makeyourAI.work · the machine teaches the human

Week 2: Data, ML, and How Models Learn

Evaluation, Leakage, and GDPR Boundaries

A bad evaluation pipeline can make a useless model look great.

Core · 70 minutes · ML Decision Boundary Gate

Objective

Describe data leakage, basic evaluation metrics, and why sensitive data handling matters in training flows.

The lesson is public. The pressure loop lives inside the app, where submission, revision, and review happen.

Deliverable

A simple ML pipeline with evaluation and a leakage audit.

Each lesson contributes to a week-level artifact and eventually to the shipped AI-native SaaS.


What This Is

This lesson teaches you how to distrust a flattering metric until the evaluation design has earned your trust.

Why This Matters in Production

Leaky evaluation produces false confidence, which is one of the fastest ways to launch a bad model with executive approval. Privacy mistakes add legal and reputational cost on top.

Mental Model

Evaluation is a claim about future usefulness. Leakage and privacy failures invalidate that claim by corrupting either the data boundary or the legal boundary.

Deep Dive

Leakage happens when information from outside the legitimate training context slips into features, preprocessing, or label construction. It often hides in time-aware data, aggregated statistics, or human-generated features. At the same time, privacy boundaries matter because model training can easily absorb identifiers that should have been removed, masked, or minimized. Maturity means treating metrics and privacy controls as one coherent quality system.
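One common way leakage slips into preprocessing is fitting normalization statistics on the full dataset before splitting. A minimal stdlib-only sketch with made-up numbers shows the difference:

```python
import statistics

# Toy single-feature dataset; the test distribution is shifted upward.
train = [10.0, 12.0, 11.0, 13.0]
test = [30.0, 32.0]

# LEAKY: statistics computed on ALL data, including the test set,
# so the normalized training features already encode test-set information.
all_data = train + test
leaky_mean = statistics.mean(all_data)
leaky_std = statistics.stdev(all_data)
leaky_train = [(x - leaky_mean) / leaky_std for x in train]

# CORRECT: fit the normalization on the training split only,
# then apply those frozen parameters to the test split.
train_mean = statistics.mean(train)
train_std = statistics.stdev(train)
clean_train = [(x - train_mean) / train_std for x in train]
clean_test = [(x - train_mean) / train_std for x in test]

print(leaky_mean, train_mean)  # 18.0 vs 11.5: the leaky mean absorbed the test shift
```

The gap between the two means is exactly the information that leaked across the boundary; with a real distribution shift, the leaky version makes test performance look better than it will be in production.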

Worked Example

A support-ticket model uses a field that is only filled after escalation, but the target is escalation itself. Accuracy looks excellent. In reality, the model learned to detect a future artifact. That is leakage, not intelligence.
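The same failure can be made concrete in a few lines. This is a hypothetical sketch (the field name `resolved_by_tier2` and the ticket rows are invented for illustration): a "model" that only checks whether a post-escalation field is populated scores perfectly, because it reads the outcome rather than predicting it.

```python
# Hypothetical tickets: "resolved_by_tier2" is only populated AFTER an
# escalation happens, so it is a future artifact, not a legitimate input.
tickets = [
    {"resolved_by_tier2": None, "escalated": 0},
    {"resolved_by_tier2": None, "escalated": 0},
    {"resolved_by_tier2": "agent_7", "escalated": 1},
    {"resolved_by_tier2": "agent_3", "escalated": 1},
]

def leaky_predict(ticket):
    # Predicts escalation by checking a field that only exists post-escalation.
    return 1 if ticket["resolved_by_tier2"] is not None else 0

accuracy = sum(
    leaky_predict(t) == t["escalated"] for t in tickets
) / len(tickets)
print(accuracy)  # 1.0 on paper; useless at prediction time
```

The audit question to ask of every feature: was this value available at the moment the prediction would actually be made?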

Common Failure Modes

Frequent failures include splitting after feature engineering, using test-set information in normalization, and assuming anonymization happened because someone said “the data is safe.”
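The first two failures share one fix: split before you fit anything. A minimal sketch of a leakage audit under that assumption (row ids stand in for real data; the helper name `audit_no_overlap` is invented for illustration):

```python
import random

# Leakage-safe ordering: split FIRST, then fit any feature statistics
# or transforms on the training indices only.
rows = list(range(100))  # stand-ins for dataset row ids
random.seed(0)
random.shuffle(rows)

split = int(0.8 * len(rows))
train_idx, test_idx = set(rows[:split]), set(rows[split:])

def audit_no_overlap(fit_idx, test_idx):
    """Leakage audit: preprocessing must never be fit on test rows."""
    return len(fit_idx & test_idx) == 0

assert audit_no_overlap(train_idx, test_idx)        # safe ordering passes
assert not audit_no_overlap(set(rows), test_idx)    # fit-on-everything fails
```

The audit is cheap because it checks provenance, not model quality: record which rows every fitted statistic saw, and assert that set never intersects the evaluation split.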

References

Further reading the machine expects you to use properly.

- scikit-learn Metrics (official doc): Use official docs for metric definitions and tradeoffs.
- GDPR Principles (law): Anchor privacy handling in a real legal principle.
- Data Leakage Guidance (official doc): Supplement the lesson with a practical framing of leakage risk.