The business problem
ShieldScore Insurance is losing an estimated $4–6M a year to fraudulent claims. Their adjusters can investigate roughly 8% of submitted claims in depth. They want a model that ranks every incoming claim by fraud risk so adjusters spend their time on the riskiest 8%, not a random 8%.
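The ranking idea fits in a few lines. This is a minimal illustration, not the case's reference code. `flag_top_fraction` and `fraud_capture_rate` are hypothetical helper names, and the scores and labels are made-up toy values. The sketch flags the highest-scoring 8% of claims and measures what share of known fraud lands in that set. A random 8% catches about 8% of fraud on average, so that is the baseline a useful ranking has to beat.

```python
import numpy as np


def flag_top_fraction(scores, capacity=0.08):
    """Mark the highest-scoring `capacity` share of claims for in-depth review."""
    scores = np.asarray(scores, dtype=float)
    # Number of claims adjusters can investigate, rounded to a whole claim.
    n_flag = int(round(capacity * len(scores)))
    # Indices sorted from riskiest to least risky; "stable" keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    mask = np.zeros(len(scores), dtype=bool)
    mask[order[:n_flag]] = True
    return mask


def fraud_capture_rate(mask, labels):
    """Share of all known fraud cases that fall inside the flagged set."""
    labels = np.asarray(labels, dtype=bool)
    total_fraud = int(labels.sum())
    if total_fraud == 0:
        return 0.0
    return float((mask & labels).sum()) / total_fraud


# Illustrative toy data only: 25 claims, 2 of them known fraud.
scores = [0.05] * 20 + [0.9, 0.7, 0.4, 0.3, 0.2]
labels = [0] * 20 + [1, 0, 1, 0, 0]
mask = flag_top_fraction(scores, capacity=0.08)
print(int(mask.sum()), fraud_capture_rate(mask, labels))  # 2 0.5
```

On real data, comparing this capture rate against the 8% random baseline gives a rough sense of how much extra fraud the ranking surfaces for the same adjuster time.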
This case combines everything from Decision 3 (Classification) with the deployment thinking from the operationalization materials. The deliverable isn't a notebook — it's a scoring tool the claims team can use Monday morning.
What "deployment" means here
You'll build the model in Python, then translate it into an Excel scorer that operations can run without any code. Same logic, different runtime. This is how many small and mid-size companies actually consume ML.
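The reference model is a logistic regression, so "same logic, different runtime" comes down to one formula. A weighted sum of features is passed through the sigmoid, and Excel can express that with `EXP`. The sketch below assumes hypothetical feature names and coefficient values; they are placeholders, not the case's fitted model. It scores a claim in Python and generates the matching Excel formula from the same numbers, so both runtimes are built from a single set of coefficients.

```python
import math

# Hypothetical model: feature names and coefficient values are placeholders,
# not the case's fitted values.
INTERCEPT = -3.2
COEFFICIENTS = {
    "claim_amount_k": 0.04,        # claim amount in $ thousands
    "days_to_report": 0.09,        # days between incident and report
    "police_report_filed": -1.1,   # 1 if a police report exists, else 0
}


def fraud_probability(claim):
    """Logistic regression score: sigmoid(intercept + sum of coef * feature)."""
    z = INTERCEPT + sum(coef * claim[name] for name, coef in COEFFICIENTS.items())
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def excel_formula(cells):
    """Build the same calculation as an Excel formula, given each feature's cell."""
    terms = [repr(INTERCEPT)]
    for name, coef in COEFFICIENTS.items():
        sign = "-" if coef < 0 else "+"
        terms.append(f"{sign}{abs(coef)}*{cells[name]}")
    return "=1/(1+EXP(-(" + "".join(terms) + ")))"


# Hypothetical sheet layout: one claim per row, features in columns B to D.
cells = {"claim_amount_k": "B2", "days_to_report": "C2", "police_report_filed": "D2"}
print(excel_formula(cells))
# =1/(1+EXP(-(-3.2+0.04*B2+0.09*C2-1.1*D2)))
print(round(fraud_probability({"claim_amount_k": 10, "days_to_report": 0,
                               "police_report_filed": 0}), 4))
# 0.0573
```

Generating the formula from the coefficients, rather than retyping them by hand, removes one common source of transcription errors. The cell references are assumptions about how the sheet is laid out.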
Case kit
Everything you need
- Student Worksheet: Frame the problem, plan the model, defend your choices. Start here.
- Auto Claims Dataset (Erie region): ~10k claims with features and known fraud labels.
- Python Modeling Notebook: Reference implementation covering EDA → feature engineering → logistic regression → evaluation.
- Excel Deployment Scorer: The model translated into Excel formulas, so new claims can be scored without writing any code.
- Model Artifacts (JSON): Coefficients and threshold for re-implementing the scorer in another tool.
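The JSON artifacts are what let another tool reproduce the scorer. This is a minimal sketch that assumes a hypothetical layout with `intercept`, `coefficients`, and `threshold` keys, so check the actual file for its real structure. It loads the artifact, scores claims, and checks that the re-implementation matches reference probabilities from the Python model within a tolerance. The reference set could be, for example, a sample of claims scored in the notebook. `load_scorer` and `parity_failures` are illustrative names, and all numbers are placeholders.

```python
import json
import math

# Hypothetical artifact layout and values; the real JSON may use different keys.
ARTIFACT_JSON = """
{
  "intercept": -3.2,
  "coefficients": {
    "claim_amount_k": 0.04,
    "days_to_report": 0.09,
    "police_report_filed": -1.1
  },
  "threshold": 0.35
}
"""


def load_scorer(text):
    """Turn the JSON artifact into a function returning (probability, flag)."""
    art = json.loads(text)
    intercept = float(art["intercept"])
    coefs = {name: float(value) for name, value in art["coefficients"].items()}
    threshold = float(art["threshold"])

    def score(claim):
        z = intercept + sum(c * claim[name] for name, c in coefs.items())
        # Numerically stable sigmoid for both signs of z.
        if z >= 0:
            p = 1.0 / (1.0 + math.exp(-z))
        else:
            e = math.exp(z)
            p = e / (1.0 + e)
        return p, p >= threshold

    return score


def parity_failures(score, claims, reference_probs, tol=1e-6):
    """Indices of claims where the re-implementation disagrees with reference scores."""
    return [
        i
        for i, (claim, ref) in enumerate(zip(claims, reference_probs))
        if abs(score(claim)[0] - ref) > tol
    ]


score = load_scorer(ARTIFACT_JSON)
claims = [
    {"claim_amount_k": 10, "days_to_report": 0, "police_report_filed": 0},
    {"claim_amount_k": 50, "days_to_report": 20, "police_report_filed": 0},
]
print([flag for _, flag in map(score, claims)])  # [False, True]
```

A parity check like this is a cheap guard against the usual re-implementation bugs: a flipped sign, a missing feature, or a changed cutoff. If the model was trained on transformed features (for example, scaled or log-transformed values), the other tool has to reproduce the same transform before applying the coefficients.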
Sample presentations
Two McKinsey-style decks for the same analysis — one for the executive audience, one for the technical review.
- Executive Presentation: For the CFO and Head of Claims. Lead with dollars, not models.
- Technical Presentation: For the data team and IT. Methodology, validation, deployment plan.
Topics you'll be applying
- Decision 1 — Data Prep & EDA — claims data is messy
- Decision 3 — Classification — your core technique
- Decision 5 — Anomaly Detection — useful framing for the long tail of unusual claims