Decision 3 — Classification

Why this decision matters

Classification is the workhorse of business analytics. Almost every "should we?" question — should we approve this loan, retain this customer, prioritize this lead, recall this device — is a classification problem in disguise.

This module covers four weeks because classification builds on itself: we start simple (logistic regression), add interpretability (decision trees), boost accuracy (random forests & ensembles), then learn how to trust the model (validation, tuning, stakeholder communication).

By the end of this topic you'll be able to

Frame a yes/no business question as a classification problem.
Build a logistic regression and a decision tree on the same data.
Read a confusion matrix and ROC curve.
Explain a tree-based model to a non-technical executive.
Defend whether a model is "good enough" to deploy.

Materials

Key concepts to know

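Logistic regression — predicts the probability of a yes/no outcome. Each coefficient is the change in the log-odds of "yes" for a one-unit increase in that feature; exponentiate it to get an odds ratio.
Decision trees — produce human-readable if/then rules by greedily splitting, at each node, on the feature that best separates the classes. They over-fit easily unless depth is limited or the tree is pruned.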
Random forests & ensembles — many imperfect trees averaged together usually beat one "best" tree. Less interpretable, more accurate.
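Confusion matrix — true positives, false positives, true negatives, false negatives. The single most important table in supervised classification.
Precision vs. recall — precision is the share of predicted "yes" cases that are truly "yes"; recall is the share of actual "yes" cases the model catches. Depending on the business cost, you optimize for one or the other (or balance both via F1).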
ROC & AUC — model quality across all thresholds; AUC of 0.5 is a coin flip, 1.0 is perfect.
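Train/test split & cross-validation — never trust performance measured on data the model was trained on; evaluate on held-out data.
Class imbalance — if 99% of cases are "no," a model that always predicts "no" scores 99% accuracy while catching nothing. Use metrics that account for imbalance, such as precision, recall, F1, or AUC.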
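To make the confusion matrix, precision/recall, and class-imbalance points concrete, here is a minimal sketch in plain Python. The data is made up for illustration (1,000 cases, 1% "yes"), and the helper names `confusion_counts` and `summarize` are hypothetical, not taken from the course notebooks.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means "yes"."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up data: 10 "yes" cases followed by 990 "no" cases.
y_true = [1] * 10 + [0] * 990

# Baseline that always predicts "no".
always_no = [0] * 1000

# Hypothetical model: catches 8 of the 10 "yes" cases, raises 20 false alarms.
model = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

baseline = summarize(y_true, always_no)
candidate = summarize(y_true, model)
print("always-no:", baseline)
print("model:    ", candidate)
```

In this toy setup the always-"no" baseline has the higher accuracy even though its recall is zero, which is why accuracy alone is a poor guide on imbalanced data.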

Slides & lecture decks

Classification Basics (Week 6): Foundations, covering what classification is, when to use it, and evaluation metrics.
Decision Tree Logic (Week 6): How trees split data; Gini vs. entropy; pruning; reading a tree.
End-to-End Pipeline Activity (Week 7): From raw CSV to a trained, validated, defensible classifier.
Saving Lives with BizML — Stent Failure: Real-world stakes, when a classifier influences medical-device decisions.
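The "Gini vs. entropy" topic in the Decision Tree Logic deck can be illustrated with a short sketch. It uses the standard definitions of Gini impurity and Shannon entropy; the function names and the toy approve/deny labels are assumptions for illustration, not material from the deck.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, measure):
    """Impurity after a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


# Toy node with an even class mix, and one candidate split of it.
parent = ["approve"] * 5 + ["deny"] * 5
left = ["approve"] * 4 + ["deny"] * 1
right = ["approve"] * 1 + ["deny"] * 4

for name, measure in [("gini", gini), ("entropy", entropy)]:
    before = measure(parent)
    after = weighted_impurity(left, right, measure)
    print(f"{name}: before={before:.3f} after={after:.3f} gain={before - after:.3f}")
```

Both measures drop after the split, so both would count it as an improvement; a tree learner picks the candidate split with the largest drop.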

Cheat sheets & class notes

Statistics & ML Cheat Sheet: One-page reference covering distributions, hypothesis tests, and ML terminology.
When Models Go Wrong — Case Studies: Five cautionary tales of classifiers that misled their users.
Predictive Analytics & Data Mining Handbook: Reference textbook chapters covering this entire module.

Hands-on activity

The Week 6 in-class activity walks you through building a decision tree on a real loan dataset, then defending your splits to a partner.

Decision Tree Activity — Student Worksheet: Fill-in-the-blank worksheet where you build a tree, justify your splits, and predict outcomes.
Loan Application Dataset: ~500 loan applications with approve/deny outcomes for the worksheet.
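For a sense of what "building a tree and defending your splits" looks like in code, here is a rough sketch using scikit-learn. It does not use the course's loan dataset: the data is synthetic, and the column names `income_k` and `debt_ratio` and the labelling rule are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500

# Synthetic, made-up applicant features (not the course dataset).
income_k = rng.uniform(20, 150, n)     # annual income in thousands
debt_ratio = rng.uniform(0.0, 0.8, n)  # debt-to-income ratio

# Invented labelling rule, with 5% of labels flipped to mimic noise.
approve = ((income_k > 60) & (debt_ratio < 0.4)).astype(int)
flip = rng.random(n) < 0.05
approve = np.where(flip, 1 - approve, approve)

X = np.column_stack([income_k, debt_ratio])

# A shallow tree stays readable; max_depth=2 is an arbitrary choice here.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, approve)

# export_text renders the fitted tree as nested if/then rules.
rules = export_text(tree, feature_names=["income_k", "debt_ratio"])
print(rules)
```

Each printed line is a split you should be able to justify in business terms; a rule you cannot explain to a partner is a hint that the tree may be fitting noise.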

Python notebooks

Predicting Customer Churn — Logistic Regression & Decision Tree: Full pipeline (load, EDA, split, train both models, compare results).
Week 7 — ML in Healthcare: Apply classification to a healthcare dataset; discuss bias and fairness.
Random Forest Notebook: Build a random forest; tune hyperparameters; compare against the single tree.
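The comparison the Random Forest Notebook describes (forest versus single tree) might be set up roughly as follows. This is a sketch on synthetic data from scikit-learn's `make_classification`, not the notebook's own code or dataset, and the hyperparameters shown are arbitrary defaults rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 600 rows, 12 features, 5 of them informative.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, random_state=0
)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validated AUC, so each model is scored only on held-out folds.
scores = {}
for name, model in models.items():
    cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    scores[name] = cv_auc.mean()
    print(f"{name}: mean AUC = {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")
```

Scoring both models with the same folds and the same metric keeps the comparison fair; on real data the gap between the two can be smaller or larger than on this synthetic example.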

Datasets

Telco Churn Dataset: Classic customer-churn dataset for the logistic regression walk-through.
Loan Application Dataset: Used in the Week 6 in-class activity.
Stent Failure Dataset: Medical-device classification with high-stakes, asymmetric error costs.

Mini-case: Stent Failure

A small medical-device company wants to predict which stents are at high risk of failure post-implantation. False negatives are catastrophic; false positives mean unnecessary surgeries. Build, evaluate, and present a classifier — once for executives, once for the technical team.
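When error costs are asymmetric, as they are here, one common approach is to choose the decision threshold that minimizes total cost rather than defaulting to 0.5. The sketch below illustrates the idea in plain Python. The scores, labels, and the 50-to-1 cost ratio are invented for illustration and do not come from the stent dataset; the function names are hypothetical.

```python
def total_cost(y_true, scores, threshold, cost_false_neg, cost_false_pos):
    """Total cost of flagging every case whose score is >= threshold."""
    pairs = list(zip(y_true, scores))
    false_neg = sum(1 for t, s in pairs if t == 1 and s < threshold)
    false_pos = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    return false_neg * cost_false_neg + false_pos * cost_false_pos


def best_threshold(y_true, scores, cost_false_neg, cost_false_pos, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda th: total_cost(y_true, scores, th,
                                  cost_false_neg, cost_false_pos),
    )


# Made-up labels (1 = failure) and made-up model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.15, 0.1, 0.05]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# Equal costs versus a missed failure costing 50x a false alarm (assumed ratio).
symmetric = best_threshold(y_true, scores, 1, 1, candidates)
asymmetric = best_threshold(y_true, scores, 50, 1, candidates)
print("threshold with equal costs:      ", symmetric)
print("threshold with costly false negs:", asymmetric)
```

Raising the cost of a false negative pushes the chosen threshold down, so the model flags more stents and accepts more false alarms. The real cost ratio is a business and clinical judgment, which is part of what the two presentations need to defend.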

Stent Failure Dataset: ~2,000 procedures with patient features and failure labels.
Executive Presentation: Sample executive-audience deck showing what to lead with and what to leave out.
Technical Review: Sample technical-audience deck covering methodology, validation, and caveats.

SAS Model Studio: Healthcare ML

Same problem, enterprise tool. Use SAS Viya Model Studio to build a healthcare-classification pipeline visually, with governed data prep and reproducible scoring.

SAS Model Studio Instructions: Step-by-step guide to the SAS Viya pipeline.
Case Study Background: Business context for the healthcare classifier.
Sample Business Presentation: How to frame the model for the hospital leadership audience.
Sample Technical Presentation: How to present the same work to the data team.

Practice with games · Model choice & comparison

Three games that drill the "which model wins?" judgment at the heart of classification.

Analytics Consultant: Play consultant by reading the scenario, picking the right method, and defending it.
Model Matchmaker: Match models to problems based on data shape and outcome type.
Prediction Showdown: Compare candidate models head-to-head on the same dataset.

"Keep or let go?"
