{
"cells": [
{
"cell_type": "markdown",
"id": "972b28c7",
"metadata": {},
"source": [
"# š”ļø ShieldScore: Predicting High-Risk Auto Claims\n",
"## Erie Insurance Group ā End-to-End Predictive Analytics Demo\n",
"\n",
"**Course:** Data Analytics & Data Mining \n",
"**Created by:** Dr. Yaa
\n",
"**Framework:** BizML Lifecycle + PAIR Framework \n",
"**Techniques:** EDA ā Feature Engineering ā Classification ā Model Evaluation ā Deployment\n",
"\n",
"---\n",
"\n",
"### The Business Problem\n",
"\n",
"> *Erie Insurance wants to identify auto policyholders at elevated risk of filing a costly claim (>$5,000) in the next 12 months ā not to raise rates, but to deploy proactive interventions (safe driving programs, weather alerts, telematics discounts) that reduce claims and improve retention.*\n",
"\n",
"### PAIR Framework\n",
"\n",
"| Element | Application |\n",
"|---------|------------|\n",
"| **Prediction** | Probability of a high-cost claim in next 12 months |\n",
"| **Action** | Enroll in safe-driving program, send weather alerts, offer telematics discount |\n",
"| **Impact** | Reduced claims cost, improved retention, lower combined ratio |\n",
"| **Risk** | False positives annoy low-risk customers; ZIP-code bias; privacy concerns |\n"
]
},
{
"cell_type": "markdown",
"id": "289cbd22",
"metadata": {},
"source": [
"## Phase 1: Setup & Data Loading\n",
"\n",
"First, let's import our libraries and load the dataset.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "276fd540",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ā
Libraries loaded successfully\n"
]
}
],
"source": [
"# āā Core Libraries āā\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from matplotlib.colors import LinearSegmentedColormap\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# āā Styling āā\n",
"plt.rcParams['figure.figsize'] = (10, 5)\n",
"plt.rcParams['font.family'] = 'sans-serif'\n",
"plt.rcParams['axes.spines.top'] = False\n",
"plt.rcParams['axes.spines.right'] = False\n",
"\n",
"# Brand colors\n",
"NAVY, TEAL, GOLD = '#1B2A4A', '#2A7B88', '#D4A843'\n",
"CREAM, CORAL = '#FDF6E3', '#E07A5F'\n",
"palette = [TEAL, NAVY, GOLD, CORAL, '#81B29A', '#3D405B']\n",
"sns.set_palette(palette)\n",
"\n",
"print(\"ā
Libraries loaded successfully\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0611aa73",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset shape: 5,001 rows Ć 31 columns\n",
"\n",
"First 5 rows:\n"
]
},
{
"data": {
"text/html": [
"
| \n", " | POLICY_ID | \n", "AGE | \n", "GENDER | \n", "MARITAL_STATUS | \n", "CREDIT_SCORE | \n", "YEARS_AS_CUSTOMER | \n", "ZIP_CODE | \n", "RISK_ZONE | \n", "COVERAGE_TYPE | \n", "DEDUCTIBLE | \n", "... | \n", "DUI_HISTORY | \n", "AT_FAULT_ACCIDENTS | \n", "PAYMENT_TIMELINESS | \n", "POLICY_CHANGES_12MO | \n", "SERVICE_CALLS_12MO | \n", "DIGITAL_ENGAGEMENT | \n", "TELEMATICS_ENROLLED | \n", "ADJUSTER_NOTES | \n", "HIGH_COST_CLAIM | \n", "CLAIM_AMOUNT | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "PH-01948 | \n", "26.0 | \n", "F | \n", "Divorced | \n", "616.0 | \n", "14 | \n", "23214 | \n", "Suburban | \n", "Premium | \n", "1000 | \n", "... | \n", "0 | \n", "0 | \n", "-2.9 | \n", "1 | \n", "1 | \n", "High | \n", "1 | \n", "NaN | \n", "0 | \n", "0 | \n", "
| 1 | \n", "PH-03328 | \n", "36.0 | \n", "M | \n", "Married | \n", "657.0 | \n", "0 | \n", "17362 | \n", "Suburban | \n", "Standard | \n", "250 | \n", "... | \n", "0 | \n", "0 | \n", "-3.5 | \n", "1 | \n", "0 | \n", "High | \n", "0 | \n", "NaN | \n", "0 | \n", "0 | \n", "
| 2 | \n", "PH-04508 | \n", "62.0 | \n", "M | \n", "Married | \n", "786.0 | \n", "12 | \n", "10311 | \n", "Urban | \n", "Standard | \n", "500 | \n", "... | \n", "0 | \n", "2 | \n", "4.8 | \n", "0 | \n", "3 | \n", "Medium | \n", "0 | \n", "Claimant reported hit by uninsured motorist. E... | \n", "0 | \n", "0 | \n", "
| 3 | \n", "PH-01890 | \n", "24.0 | \n", "M | \n", "Divorced | \n", "657.0 | \n", "6 | \n", "47255 | \n", "Suburban | \n", "Premium | \n", "500 | \n", "... | \n", "0 | \n", "1 | \n", "5.4 | \n", "0 | \n", "3 | \n", "Medium | \n", "0 | \n", "NaN | \n", "0 | \n", "0 | \n", "
| 4 | \n", "PH-03491 | \n", "31.0 | \n", "F | \n", "Divorced | \n", "791.0 | \n", "3 | \n", "18926 | \n", "Suburban | \n", "Comprehensive | \n", "250 | \n", "... | \n", "0 | \n", "0 | \n", "18.5 | \n", "0 | \n", "0 | \n", "Low | \n", "0 | \n", "Claimant reported collision at merge. Weather ... | \n", "1 | \n", "13024 | \n", "
5 rows Ć 31 columns
\n", "| \n", " | Type | \n", "Non-Null | \n", "Missing | \n", "Missing % | \n", "Unique | \n", "
|---|---|---|---|---|---|
| ADJUSTER_NOTES | \n", "object | \n", "2002 | \n", "2999 | \n", "60.0 | \n", "2002 | \n", "
| CREDIT_SCORE | \n", "float64 | \n", "4743 | \n", "258 | \n", "5.2 | \n", "390 | \n", "
| ANNUAL_MILEAGE | \n", "float64 | \n", "4802 | \n", "199 | \n", "4.0 | \n", "4150 | \n", "
| AGE | \n", "float64 | \n", "4832 | \n", "169 | \n", "3.4 | \n", "66 | \n", "
| GENDER | \n", "object | \n", "4904 | \n", "97 | \n", "1.9 | \n", "3 | \n", "