Decision 1 · Weeks 2–3

"What's the real problem?"

Before any modeling, you have to translate a fuzzy business question into a precise data problem, then explore the data thoroughly enough to know what's actually there.

EDA · Data Cleaning · Feature Engineering · Foundational

Why this decision matters

Every other decision in this course depends on this one. If you frame the wrong problem, or miss something hidden in the data, no amount of fancy modeling will save you. Practitioners commonly estimate that 60–80% of a data scientist's time goes into this stage, and the project lives or dies by it.

By the end of this topic you'll be able to:

  • Translate a vague business request into a precise analytics question.
  • Identify whether you have the right data to answer it.
  • Profile a dataset systematically.
  • Clean common data issues (missing values, duplicates, type mismatches).
  • Build the basic features that downstream models need.
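The three cleaning steps named above (duplicates, type mismatches, missing values) can be sketched in plain Python so it runs without extra libraries. This is a minimal illustration, not the course's demo code: the records and column names (`customer_id`, `signup_date`, `monthly_spend`) are hypothetical, and it assumes every value arrives as a string, as it does when read from a CSV. It drops exact duplicate rows, coerces text to numbers and dates, and flags unusable values instead of filling them.

```python
from datetime import date
from typing import Optional

# Hypothetical raw records; every value is a string, as when read from a CSV.
raw_rows = [
    {"customer_id": "C001", "signup_date": "2023-01-15", "monthly_spend": "120.50"},
    {"customer_id": "C002", "signup_date": "2023-02-03", "monthly_spend": ""},
    {"customer_id": "C001", "signup_date": "2023-01-15", "monthly_spend": "120.50"},  # exact duplicate
    {"customer_id": "C003", "signup_date": "not recorded", "monthly_spend": "87"},
]


def parse_float(value: str) -> Optional[float]:
    """Convert a numeric string to float; return None for blanks or unparseable text."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_date(value: str) -> Optional[date]:
    """Convert a YYYY-MM-DD string to a date; return None if it doesn't parse."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate: keep only the first occurrence
        seen.add(key)
        spend = parse_float(row["monthly_spend"])
        cleaned.append({
            "customer_id": row["customer_id"],
            "signup_date": parse_date(row["signup_date"]),
            "monthly_spend": spend,
            # Flag blank or unparseable values instead of filling them,
            # so the imputation choice stays an explicit later decision.
            "monthly_spend_missing": spend is None,
        })
    return cleaned


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

In pandas, the same steps roughly correspond to `DataFrame.drop_duplicates`, `pd.to_numeric(..., errors="coerce")`, `pd.to_datetime(..., errors="coerce")`, and `Series.isna`.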

Materials

Key concepts to know
  • Problem framing — turning "we want more revenue" into "predict which existing customers will spend more if contacted in the next 30 days."
  • Data profiling — every variable: type, range, missing %, distribution, weird values.
  • Wide vs. long format — most ML algorithms expect wide; many data sources hand you long. You'll learn to reshape both ways.
  • Missing data strategies — drop, impute, flag, or model. The right answer depends on why it's missing.
  • Outliers — rare event you care about, or data quality problem? Investigation, not deletion.
  • Feature engineering — derived columns (ratios, dates, encodings) that make patterns easier to learn.
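Several of these concepts (profiling, long-to-wide reshaping, and a simple derived feature) can be sketched together. This is a hypothetical example in plain Python; the (customer, month, spend) rows are made up for illustration. "Long" here means one row per customer-month pair; "wide" means one row per customer with one column per month. The profile reports type, range, missing %, and distinct counts for a column, and the ratio is one example of a derived feature.

```python
import statistics
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical long-format data: one row per (customer, month) pair.
long_rows: List[Tuple[str, str, Optional[float]]] = [
    ("C001", "2024-01", 120.0),
    ("C001", "2024-02", 95.0),
    ("C002", "2024-01", 40.0),
    ("C002", "2024-02", None),    # month recorded, value missing
    ("C003", "2024-02", 3100.0),  # no January row at all
]


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Summarize one column: size, missing %, types, distinct values, range or top value."""
    present = [v for v in values if v is not None]
    summary: Dict[str, Any] = {
        "n": len(values),
        "missing_pct": 100 * (len(values) - len(present)) / len(values) if values else 0.0,
        "types": sorted({type(v).__name__ for v in present}),
        "distinct": len(set(present)),
    }
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        summary.update(min=min(numeric), max=max(numeric), median=statistics.median(numeric))
    elif present:
        summary["top"] = Counter(present).most_common(1)[0]  # most frequent value and its count
    return summary


def long_to_wide(rows) -> Dict[str, Dict[str, Optional[float]]]:
    """Pivot (entity, column, value) rows into one dict per entity with the same keys.

    If an (entity, column) pair appears twice, the last value wins.
    """
    columns = sorted({col for _, col, _ in rows})
    grouped: Dict[str, Dict[str, Optional[float]]] = {}
    for entity, col, value in rows:
        grouped.setdefault(entity, {})[col] = value
    # An absent (entity, column) pair becomes None, just like a recorded missing value.
    return {entity: {col: cells.get(col) for col in columns} for entity, cells in grouped.items()}


def spend_ratio(cells, current: str, previous: str) -> Optional[float]:
    """Derived feature: current month's spend relative to the previous month's."""
    cur, prev = cells.get(current), cells.get(previous)
    if cur is None or prev is None or prev == 0:
        return None
    return cur / prev


spend_profile = profile_column([value for _, _, value in long_rows])
print(spend_profile)

wide = long_to_wide(long_rows)
for customer, cells in wide.items():
    print(customer, cells, spend_ratio(cells, "2024-02", "2024-01"))
```

The profile shows a maximum (3100.0) far above the median (107.5). Following the outlier guidance above, that is a prompt to investigate (a genuine big spender, or a units or entry error?) rather than a reason to delete the row. In pandas, `DataFrame.describe`, `DataFrame.pivot`, and `DataFrame.melt` cover similar ground.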
Cheat sheets & class notes
Hands-on: data prep demos

Three demo notebooks walk you through common prep scenarios. Unzip each and follow the README inside.

Practice datasets
Practice with games · Big picture & orientation

Short browser games and explainers that build intuition for what analytics is, what kinds there are, and which decision fits which type.

Practice with games · Data shape & problem framing
Optional SQL warm-up

If your data lives in a database, you'll need a little SQL to pull it. These two short demos cover the basics.
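A minimal SQL pull might look like the following. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for wherever your data actually lives; the `orders` table and its columns are hypothetical. The query filters by date, aggregates to one row per customer, and passes the cutoff date as a `?` parameter rather than pasting it into the SQL string.

```python
import sqlite3

# In-memory SQLite database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("C001", "2023-12-20", 50.0),
        ("C001", "2024-01-05", 120.0),
        ("C001", "2024-02-10", 95.0),
        ("C002", "2024-01-18", 40.0),
        ("C003", "2024-02-22", 310.0),
    ],
)

# Pull one row per customer: order count and total spend since a cutoff date.
query = """
    SELECT customer_id,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total_spend
    FROM orders
    WHERE order_date >= ?
    GROUP BY customer_id
    ORDER BY total_spend DESC
"""
rows = conn.execute(query, ("2024-01-01",)).fetchall()
conn.close()

for row in rows:
    print(row)
```

Dates stored as ISO-formatted text (YYYY-MM-DD) sort correctly as strings, which is why the `>=` comparison works here; the December 2023 order is excluded from C001's totals.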

Homework

HW-1: EDA — due end of Week 3. Pick one of the practice datasets, profile it end-to-end, and write a short memo identifying three things you'd want to investigate before modeling. Submission instructions.
