Week 2: Data, ML, and How Models Learn  /  Lesson Preview

Evaluation, Leakage, and GDPR Boundaries

A bad evaluation pipeline can make a useless model look great.

Difficulty: core
Duration: 70 min
Gate: ML Decision Boundary Gate
Objective

Describe data leakage, basic evaluation metrics, and why sensitive data handling matters in training flows.

Deliverable

A simple ML pipeline with evaluation and a leakage audit.

Each lesson contributes to a week-level artifact and eventually to the shipped AI-native SaaS.

What the machine covers in this lesson.

What This Is

This lesson teaches you how to distrust a flattering metric until the evaluation design has earned your trust.

Why This Matters in Production

Leaky evaluation produces false confidence, which is one of the fastest ways to launch a bad model with executive approval. Privacy mistakes add legal and reputational cost on top.

Mental Model

Evaluation is a claim about future usefulness. Leakage and privacy failures invalidate that claim by corrupting either the data boundary or the legal boundary.

Deep Dive

Leakage happens when information that would not be available at prediction time slips into features, preprocessing, or label construction. It often hides in time-aware data (rows from after the prediction moment), aggregated statistics computed over the full dataset, or human-generated features that were filled in once the outcome was already known. Privacy boundaries matter for a parallel reason: model training can easily absorb identifiers that should have been removed, masked, or minimized. Maturity means treating metrics and privacy controls as one coherent quality system.
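A minimal sketch of both boundaries in one preprocessing step, using only the Python standard library. The field names (`email`, `full_name`, `customer_id`, `created_at`, `escalated`), the key, and the cutoff date are hypothetical and chosen for illustration. Note that keyed hashing is pseudonymization, not anonymization: under GDPR, pseudonymized data is still personal data.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical field lists; a real project would derive these from a data inventory.
DIRECT_IDENTIFIERS = {"email", "full_name"}   # dropped entirely
PSEUDONYMIZED_FIELDS = {"customer_id"}        # replaced by a keyed hash


def minimize(record, secret_key):
    """Return a copy of the record without direct identifiers."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue
        if field in PSEUDONYMIZED_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()[:16]
        else:
            cleaned[field] = value
    return cleaned


def time_split(records, cutoff, time_field="created_at"):
    """Train on rows before the cutoff, evaluate on rows at or after it."""
    train = [r for r in records if r[time_field] < cutoff]
    test = [r for r in records if r[time_field] >= cutoff]
    return train, test


records = [
    {"customer_id": 1, "email": "a@example.com", "created_at": date(2024, 1, 5), "escalated": 0},
    {"customer_id": 2, "email": "b@example.com", "created_at": date(2024, 2, 9), "escalated": 1},
    {"customer_id": 1, "email": "a@example.com", "created_at": date(2024, 3, 2), "escalated": 1},
    {"customer_id": 3, "email": "c@example.com", "created_at": date(2024, 4, 1), "escalated": 0},
]

cleaned = [minimize(r, secret_key=b"demo-key-not-for-production") for r in records]
train, test = time_split(cleaned, cutoff=date(2024, 3, 1))
```

Splitting on time rather than at random keeps every evaluation row strictly later than every training row, which is the situation the model faces once deployed.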

Worked Example

A support-ticket model uses a field that is populated only after a ticket has been escalated, yet the target it predicts is escalation itself. Accuracy looks excellent. In reality, the model learned to detect an artifact that does not exist at the moment a prediction is needed. That is leakage, not intelligence.
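One way a leakage audit might catch this case is to check whether the mere presence of a field tracks the label almost perfectly. The sketch below is an illustrative heuristic, not a complete audit; the ticket fields (`priority`, `escalation_owner`, `escalated`) and the 0.95 threshold are assumptions made for the example.

```python
def presence_audit(rows, target, threshold=0.95):
    """Flag fields whose 'is filled in' status agrees with the label too often."""
    flagged = {}
    fields = {name for row in rows for name in row if name != target}
    for field in sorted(fields):
        agreements = sum(
            (row.get(field) is not None) == bool(row[target]) for row in rows
        )
        rate = agreements / len(rows)
        if rate >= threshold:
            flagged[field] = rate
    return flagged


# Hypothetical tickets: escalation_owner is only assigned after escalation.
tickets = [
    {"priority": "high", "escalation_owner": "tier2-anna", "escalated": 1},
    {"priority": "low", "escalation_owner": None, "escalated": 0},
    {"priority": "high", "escalation_owner": None, "escalated": 0},
    {"priority": "low", "escalation_owner": "tier2-raj", "escalated": 1},
    {"priority": "medium", "escalation_owner": None, "escalated": 0},
]

suspects = presence_audit(tickets, target="escalated")
print(suspects)
```

A flagged field is a prompt to ask when that field gets written, not proof of leakage. A field that is legitimately known at prediction time can also correlate strongly with the label.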

Common Failure Modes

Frequent failures include splitting into train and test sets only after feature engineering, using test-set statistics in normalization, and assuming anonymization happened because someone said “the data is safe.”
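The normalization failure can be illustrated with a small sketch: split first, fit the scaling statistics on the training portion only, then reuse them for the test portion. The helper names `fit_scaler` and `transform` and the sample values are made up for this example.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Compute mean and standard deviation from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant data
    return mu, sigma


def transform(values, mu, sigma):
    """Standardize values using previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


values = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]

# Split FIRST, before any statistic is computed.
train_raw, test_raw = values[:4], values[4:]

# Correct: statistics come from the training portion only.
mu, sigma = fit_scaler(train_raw)
train_scaled = transform(train_raw, mu, sigma)
test_scaled = transform(test_raw, mu, sigma)

# Anti-pattern, shown for contrast: the test rows shift the statistics.
leaky_mu, leaky_sigma = fit_scaler(values)
```

Here the leaky mean is pulled far from the training mean by the two test values, so a model trained on leaky-scaled features has already seen a summary of the data it is supposed to be evaluated on.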

Further reading the machine expects you to use properly.

scikit-learn Metrics (official-doc)

Use official docs for metric definitions and tradeoffs.

GDPR Principles (law)

Anchor privacy handling in a real legal principle.

Data Leakage Guidance (official-doc)

Supplement the lesson with a practical framing of leakage risk.
