Decision 3 · Weeks 5–7, 9

"Keep or let go?"

When the answer to a business question is yes/no — will this customer churn? Will this loan default? Will this stent fail? — you reach for classification.

Logistic Regression · Decision Trees · Random Forests · Supervised

Why this decision matters

Classification is the workhorse of business analytics. Almost every "should we?" question — should we approve this loan, retain this customer, prioritize this lead, recall this device — is a classification problem in disguise.

This module covers four weeks because classification builds on itself: we start simple (logistic regression), add interpretability (decision trees), boost accuracy (random forests & ensembles), then learn how to trust the model (validation, tuning, stakeholder communication).

By the end of this topic you'll be able to

  • Frame a yes/no business question as a classification problem
  • Build a logistic regression and a decision tree on the same data
  • Read a confusion matrix and ROC curve
  • Explain a tree-based model to a non-technical executive
  • Defend whether a model is "good enough" to deploy

Materials

Key concepts to know
  • Logistic regression — predicts the probability of a yes/no outcome. Each coefficient is the change in the log-odds of "yes" for a one-unit increase in that feature.
  • Decision trees — produce human-readable if/then rules. At each node the tree greedily splits on the feature that best separates the classes. Easy to overfit.
  • Random forests & ensembles — many imperfect trees averaged together usually beat one "best" tree. Less interpretable, more accurate.
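  • Confusion matrix — counts of true positives, false positives, true negatives, and false negatives. The single most important table in classification.
  • Precision vs. recall — precision is the share of predicted positives that are correct; recall is the share of actual positives the model catches. Depending on the business cost of each error, you optimize for one or the other (or balance both via F1).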
  • ROC & AUC — model quality across all thresholds; AUC of 0.5 is a coin flip, 1.0 is perfect.
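  • Train/test split & cross-validation — never trust a model's score on data it was trained on.
  • Class imbalance — if 99% of cases are "no," a model that always predicts "no" scores 99% accuracy and catches nothing. Use metrics that account for the rare class, such as recall, precision, or F1.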
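To make a few of the concepts above concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the counts, scores, and the coefficient value are made up (not taken from any course dataset), and the helper names `metrics` and `auc` are hypothetical.

```python
import math

def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Class imbalance: 1,000 made-up customers, 10 of whom churn.
# A "model" that always predicts "no" looks great on accuracy alone.
always_no = metrics(tp=0, fp=0, tn=990, fn=10)   # accuracy 0.99, recall 0.0

# A model that flags 20 customers and catches 8 of the 10 churners.
useful = metrics(tp=8, fp=12, tn=978, fn=2)      # precision 0.4, recall 0.8

# Log-odds: a made-up coefficient of 0.7 multiplies the odds of "yes"
# by exp(0.7), roughly 2.01, for each one-unit increase in the feature.
coef = 0.7
odds_ratio = math.exp(coef)

# AUC on six made-up scores: 8 of the 9 positive/negative pairs are
# ranked correctly, so AUC = 8/9.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
model_auc = auc(scores, labels)

print(always_no, useful, round(odds_ratio, 2), round(model_auc, 3))
```

Notice that the always-"no" model beats the useful one on accuracy (0.99 vs. 0.986) while catching zero churners, which is why accuracy alone can mislead on imbalanced data.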
Slides & lecture decks
Cheat sheets & class notes
Hands-on activity

The Week 6 in-class activity walks you through building a decision tree on a real loan dataset, then defending your splits to a partner.
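If you want a feel for what "choosing a split" means before the activity, the sketch below searches for a single greedy split by Gini impurity. It is not the activity's notebook or dataset: the six loan applicants, the feature names `income_k` and `dti`, and the helpers `gini` and `best_split` are all hypothetical, and a real tree would repeat this search recursively on each branch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for a 50/50 two-class group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, feature_names):
    """Try every feature and threshold; keep the lowest weighted impurity.

    Returns (weighted_impurity, feature_name, threshold), where rows with
    feature <= threshold go left and the rest go right.
    """
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send at least one row each way
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best

# Made-up applicants: (income in $ thousands, debt-to-income ratio)
feature_names = ["income_k", "dti"]
rows = [(30, 0.50), (45, 0.45), (60, 0.20), (80, 0.40), (95, 0.15), (120, 0.10)]
labels = ["default", "default", "repay", "default", "repay", "repay"]

score, feature, threshold = best_split(rows, labels, feature_names)
print(f"Split on {feature} <= {threshold} (weighted Gini {score:.3f})")
```

On this toy data, income never separates the classes cleanly (the applicant earning 80k defaults), but debt-to-income at or below 0.20 does. That is the kind of reasoning you would use when defending a split to a partner.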

Python notebooks
Datasets
Mini-case: Stent Failure

A small medical-device company wants to predict which stents are at high risk of failure post-implantation. False negatives are catastrophic; false positives mean unnecessary surgeries. Build, evaluate, and present a classifier — once for executives, once for the technical team.
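Because the two kinds of error carry very different costs here, the decision threshold matters as much as the model. The sketch below picks a threshold by total cost. It is an illustration under stated assumptions, not the case's solution: the predicted probabilities, the outcomes, and the 50:1 cost ratio are made up, and `expected_cost` and `pick_threshold` are hypothetical helper names.

```python
def expected_cost(threshold, probs, labels, cost_fn, cost_fp):
    """Total cost when every stent with predicted risk >= threshold is flagged."""
    cost = 0.0
    for p, failed in zip(probs, labels):
        flagged = p >= threshold
        if failed and not flagged:
            cost += cost_fn   # false negative: a failure we missed
        elif flagged and not failed:
            cost += cost_fp   # false positive: an unnecessary intervention
    return cost

def pick_threshold(probs, labels, cost_fn, cost_fp, candidates):
    """Return the candidate threshold with the lowest total cost.

    On ties, the earliest candidate in the list wins.
    """
    return min(candidates,
               key=lambda t: expected_cost(t, probs, labels, cost_fn, cost_fp))

# Made-up predicted failure probabilities and true outcomes (1 = failed).
probs = [0.05, 0.10, 0.15, 0.30, 0.40, 0.55, 0.70, 0.90]
labels = [0, 0, 1, 0, 0, 1, 0, 1]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Assumption: a missed failure costs 50 times an unnecessary surgery.
cautious = pick_threshold(probs, labels, cost_fn=50, cost_fp=1,
                          candidates=candidates)

# For contrast: both errors treated as equally costly.
balanced = pick_threshold(probs, labels, cost_fn=1, cost_fp=1,
                          candidates=candidates)

print(cautious, balanced)
```

With these numbers the 50:1 weighting pushes the threshold down to 0.1, accepting four false positives to avoid any missed failure, while equal costs settle at 0.5. Presenting that trade-off in cost terms is one way to frame the result for executives.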

SAS Model Studio: Healthcare ML

Same problem, enterprise tool. Use SAS Viya Model Studio to build a healthcare-classification pipeline visually, with governed data prep and reproducible scoring.

Practice with games · Model choice & comparison

Three games that drill the "which model wins?" judgment at the heart of classification.
