ShieldScore — Fraud Detection

The business problem

ShieldScore Insurance is losing an estimated $4–6M a year to fraudulent claims. Their adjusters can investigate roughly 8% of submitted claims in depth. They want a model that ranks every incoming claim by fraud risk so adjusters spend their time on the riskiest 8%, not a random 8%.
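
The ranking idea above can be sketched in a few lines of Python. This is a minimal illustration, not the case's reference implementation: the claim IDs, the scores, and the helper names `review_queue` and `capture_rate` are all hypothetical, and the scores would come from whatever model you build.

```python
import math

def review_queue(scores, capacity=0.08):
    """Return the claim IDs in the top `capacity` share, highest score first."""
    n_review = math.floor(len(scores) * capacity)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_review]

def capture_rate(queue, fraud_ids):
    """Share of known fraudulent claims that landed in the review queue."""
    if not fraud_ids:
        return 0.0
    return len(set(queue) & set(fraud_ids)) / len(fraud_ids)

# Hypothetical model scores for 25 claims (claim i gets score i/25).
scores = {f"C{i:02d}": i / 25 for i in range(25)}

# With 8% capacity, adjusters can review 2 of the 25 claims.
queue = review_queue(scores)

# Hypothetical fraud labels, used only to illustrate the metric.
known_fraud = {"C24", "C23", "C10", "C02"}
rate = capture_rate(queue, known_fraud)
```

Comparing the capture rate of the ranked queue against a random 8% sample is one way to express the model's value in terms the claims team cares about.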

This case combines everything from Decision 3 (Classification) with the deployment thinking from the operationalization materials. The deliverable isn't a notebook — it's a scoring tool the claims team can use Monday morning.

What "deployment" means here

You'll build the model in Python, then translate it into an Excel scorer that operations can run without any code. Same logic, different runtime. This is how a lot of small-to-mid-size companies actually consume ML.
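
To make "same logic, different runtime" concrete, here is a hedged sketch of scoring one claim from exported logistic regression coefficients. The JSON layout, feature names, coefficient values, and threshold below are assumptions for illustration only; the actual artifact file in the case kit may be structured differently.

```python
import json
import math

# Hypothetical artifact contents; the real file's schema may differ.
ARTIFACTS_JSON = """
{
  "intercept": -3.0,
  "coefficients": {
    "claim_amount_k": 0.10,
    "prior_claims": 0.50,
    "police_report": -1.00
  },
  "threshold": 0.5
}
"""

def score_claim(claim, artifacts):
    """Logistic regression score: sigmoid(intercept + sum(coef * value))."""
    z = artifacts["intercept"] + sum(
        coef * claim[name] for name, coef in artifacts["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

artifacts = json.loads(ARTIFACTS_JSON)

# Hypothetical incoming claim: $20k amount, 2 prior claims, no police report.
claim = {"claim_amount_k": 20.0, "prior_claims": 2, "police_report": 0}

probability = score_claim(claim, artifacts)  # z = -3 + 2 + 1 + 0 = 0
flagged = probability >= artifacts["threshold"]

# A spreadsheet can express the same arithmetic, assuming coefficients and
# claim values sit in matching ranges:
#   =1/(1+EXP(-(intercept + SUMPRODUCT(coef_range, value_range))))
```

Because the score is only a weighted sum passed through a sigmoid, the Excel version needs no macros or add-ins, which is what lets operations run it without code.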

Case kit

Everything you need

Student Worksheet: Frame the problem, plan the model, defend your choices. Start here.
Auto Claims Dataset (Erie region): ~10k claims with features and known fraud labels.
Python Modeling Notebook: Reference implementation: EDA → feature engineering → logistic regression → evaluation.
Excel Deployment Scorer: The model translated into Excel formulas, with no code required to score new claims.
Model Artifacts (JSON): Coefficients and threshold for re-implementing the scorer in another tool.

Sample presentations

Two McKinsey-style decks for the same analysis — one for the executive audience, one for the technical review.

Executive Presentation: For the CFO and Head of Claims. Lead with dollars, not models.
Technical Presentation: For the data team and IT. Methodology, validation, deployment plan.

Topics you'll be applying

Decision 1 — Data Prep & EDA — claims data is messy
Decision 3 — Classification — your core technique
Decision 5 — Anomaly Detection — useful framing for the long tail of unusual claims
