Why this decision matters
Classification is the workhorse of business analytics. Almost every "should we?" question — should we approve this loan, retain this customer, prioritize this lead, recall this device — is a classification problem in disguise.
This module covers four weeks because classification builds on itself: we start simple (logistic regression), add interpretability (decision trees), boost accuracy (random forests & ensembles), then learn how to trust the model (validation, tuning, stakeholder communication).
By the end of this topic you'll be able to:
- Frame a yes/no business question as a classification problem.
- Build a logistic regression and a decision tree on the same data.
- Read a confusion matrix and ROC curve.
- Explain a tree-based model to a non-technical executive.
- Defend whether a model is "good enough" to deploy.
Materials
Key concepts to know
- Logistic regression: predicts the probability of a yes/no outcome. Each coefficient is the change in the log-odds of "yes" for a one-unit increase in that feature; exponentiate it to get an odds ratio.
- Decision trees: produce human-readable if/then rules. At each node the tree greedily splits on the feature that best separates the classes. Easy to overfit without pruning or a depth limit.
- Random forests & ensembles: many imperfect trees averaged together usually beat one "best" tree. They are less interpretable but typically more accurate.
- Confusion matrix: counts of true positives, false positives, true negatives, and false negatives. The single most important table in classification.
- Precision vs. recall: precision is the share of predicted "yes" cases that really are "yes"; recall is the share of actual "yes" cases the model catches. Depending on the business cost, you optimize for one or the other (or balance both via F1).
- ROC & AUC: model quality across all thresholds; an AUC of 0.5 is a coin flip, 1.0 is perfect.
- Train/test split & cross-validation: never trust performance measured on data the model was trained on.
- Class imbalance: if 99% of cases are "no," a model that always predicts "no" is 99% accurate and useless. Use metrics that account for imbalance, such as precision, recall, F1, or balanced accuracy.
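To make the confusion matrix and class imbalance points concrete, here is a minimal sketch in plain Python. The data is made up for illustration (1,000 cases, 10 of them "yes"), and the helper names `confusion_counts` and `metrics` are hypothetical, not taken from the course notebooks. It compares an "always no" baseline with an imagined model that catches 8 of the 10 positives at the price of 20 false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = "yes" and 0 = "no"."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the model never predicts "yes"
    # or when there are no actual "yes" cases.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up data: 1,000 cases with a 1% positive rate.
y_true = [1] * 10 + [0] * 990

# Baseline that always predicts "no".
always_no = [0] * 1000

# Hypothetical model: finds 8 of 10 positives, raises 20 false alarms.
some_model = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

baseline = metrics(y_true, always_no)
model = metrics(y_true, some_model)

print("always no:", baseline)
print("model:    ", model)
```

On this toy data the baseline wins on accuracy but has zero recall, while the imagined model has slightly lower accuracy and catches most of the positives. That is the sense in which accuracy alone can mislead on imbalanced data.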
Slides & lecture decks
- Classification Basics (Week 6): Foundations (what classification is, when to use it, evaluation metrics).
- Decision Tree Logic (Week 6): How trees split data; Gini vs. entropy; pruning; reading a tree.
- End-to-End Pipeline Activity (Week 7): From raw CSV to a trained, validated, defensible classifier.
- Saving Lives with BizML — Stent Failure: Real-world stakes when a classifier influences medical-device decisions.
Cheat sheets & class notes
- Statistics & ML Cheat Sheet: One-page reference covering distributions, hypothesis tests, and ML terminology.
- When Models Go Wrong — Case Studies: Five cautionary tales of classifiers that misled their users.
- Predictive Analytics & Data Mining Handbook: Reference textbook chapters covering this entire module.
Hands-on activity
The Week 6 in-class activity walks you through building a decision tree on a real loan dataset, then defending your splits to a partner.
- Decision Tree Activity — Student Worksheet: Fill-in-the-blank worksheet to build a tree, justify your splits, and predict outcomes.
- Loan Application Dataset: ~500 loan applications with approve/deny outcomes for the worksheet.
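One way to defend a split is to compare candidate splits by weighted Gini impurity, where lower means the two child groups are purer. The sketch below assumes that approach; the six rows and the column names (`income`, `debt_ratio`, `approved`) are invented for illustration and are not the columns of the course's loan dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, feature, threshold, label="approved"):
    """Weighted Gini impurity after splitting rows on feature <= threshold."""
    left = [r[label] for r in rows if r[feature] <= threshold]
    right = [r[label] for r in rows if r[feature] > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy rows (income in thousands); NOT the course dataset.
rows = [
    {"income": 25, "debt_ratio": 0.50, "approved": 0},
    {"income": 30, "debt_ratio": 0.20, "approved": 0},
    {"income": 42, "debt_ratio": 0.45, "approved": 0},
    {"income": 48, "debt_ratio": 0.30, "approved": 1},
    {"income": 60, "debt_ratio": 0.40, "approved": 1},
    {"income": 75, "debt_ratio": 0.15, "approved": 1},
]

parent = gini([r["approved"] for r in rows])
income_split = split_gini(rows, "income", 45)
debt_split = split_gini(rows, "debt_ratio", 0.35)

print(f"parent impurity:        {parent:.3f}")
print(f"split on income <= 45:  {income_split:.3f}")
print(f"split on debt <= 0.35:  {debt_split:.3f}")
```

A greedy tree would pick whichever candidate gives the lowest weighted impurity at that node. On these made-up rows the income split separates the classes completely, so it is the easier one to defend.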
Python notebooks
- Predicting Customer Churn — Logistic Regression & Decision Tree: Full pipeline (load, EDA, split, train both models, compare results).
- Week 7 — ML in Healthcare: Apply classification to a healthcare dataset; discuss bias and fairness.
- Random Forest Notebook: Build a random forest; tune hyperparameters; compare against the single tree.
Datasets
- Telco Churn Dataset: Classic customer-churn dataset for the logistic regression walk-through.
- Loan Application Dataset: Used in the Week 6 in-class activity.
- Stent Failure Dataset: Medical-device classification with high stakes and asymmetric error costs.
Mini-case: Stent Failure
A small medical-device company wants to predict which stents are at high risk of failure post-implantation. False negatives are catastrophic; false positives mean unnecessary surgeries. Build, evaluate, and present a classifier — once for executives, once for the technical team.
- Stent Failure Dataset: ~2,000 procedures with patient features and failure labels.
- Executive Presentation: Sample executive-audience deck showing what to lead with and what to leave out.
- Technical Review: Sample technical-audience deck covering methodology, validation, and caveats.
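Because the two kinds of error in the stent case cost very different amounts, the default 0.5 decision threshold is not necessarily the right one. The sketch below shows one common way to handle this: choose the threshold that minimizes total misclassification cost. The labels, scores, candidate thresholds, and the 50-to-1 cost ratio are all invented for illustration; they are not estimates from the stent data, and real costs would have to come from the business and clinical side.

```python
def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Total cost of errors when predicting "failure" for score >= threshold."""
    pairs = list(zip(y_true, scores))
    false_neg = sum(1 for t, s in pairs if t == 1 and s < threshold)
    false_pos = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    return false_neg * cost_fn + false_pos * cost_fp


def best_threshold(y_true, scores, thresholds, cost_fn, cost_fp):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        thresholds,
        key=lambda th: total_cost(y_true, scores, th, cost_fn, cost_fp),
    )


# Made-up labels (1 = failure) and predicted failure probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.35, 0.2, 0.15, 0.1, 0.05]
thresholds = [0.1, 0.25, 0.5, 0.75]

# Assumption: a missed failure costs 50 times as much as a false alarm.
asymmetric = best_threshold(y_true, scores, thresholds, cost_fn=50, cost_fp=1)

# For comparison: both errors cost the same.
symmetric = best_threshold(y_true, scores, thresholds, cost_fn=1, cost_fp=1)

print("best threshold, 50:1 costs:", asymmetric)
print("best threshold, 1:1 costs: ", symmetric)
```

With the assumed 50-to-1 costs, the cheapest threshold on this toy data is a low one that flags more cases, trading extra false positives for fewer missed failures. With equal costs a higher threshold wins. The same model can therefore call for different operating points depending on how the errors are priced.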
SAS Model Studio: Healthcare ML
Same problem, enterprise tool. Use SAS Viya Model Studio to build a healthcare-classification pipeline visually, with governed data prep and reproducible scoring.
- SAS Model Studio Instructions: Step-by-step guide to the SAS Viya pipeline.
- Case Study Background: Business context for the healthcare classifier.
- Sample Business Presentation: How to frame the model for the hospital leadership audience.
- Sample Technical Presentation: How to present the same work to the data team.
Practice with games · Model choice & comparison
Three games that drill the "which model wins?" judgment that classification is really about.
- Analytics Consultant: Play consultant by reading the scenario, picking the right method, and defending it.
- Model Matchmaker: Match models to problems based on data shape and outcome type.
- Prediction Showdown: Compare candidate models head-to-head on the same dataset.