{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🐼 Panda Express Site Analysis — Western Pennsylvania\n",
    "**Goal:** Rank cities in western PA by their potential as a new Panda Express location  \n",
    "**Data Source:** U.S. Census Bureau API (American Community Survey 5-Year Estimates)  \n",
    "**Last Updated:** 2024\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 0 — Setup & API Key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required libraries (run once)\n",
    "# !pip install requests pandas geopandas folium matplotlib seaborn plotly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.cm as cm\n",
    "import seaborn as sns\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as go\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# ─────────────────────────────────────────────────────────────────\n",
    "#  🔑  REPLACE WITH YOUR CENSUS API KEY\n",
    "#  Get a free key at: https://api.census.gov/data/key_signup.html\n",
    "# ─────────────────────────────────────────────────────────────────\n",
    "\n",
    "#CENSUS_API_KEY = \"YOUR_API_KEY_HERE\"\n",
    "CENSUS_API_KEY = \"5586e5b0855b88b8b1f22fd2b8ca0362f0ab417a\"\n",
    "\n",
    "# Census ACS 5-Year dataset endpoint\n",
    "BASE_URL = \"https://api.census.gov/data/2022/acs/acs5\"\n",
    "\n",
    "print(\"✅ Libraries loaded successfully\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 — Define Western PA Counties\n",
    "We'll pull data for all cities/places within these 15 counties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Western PA county FIPS codes (state FIPS = 42 for Pennsylvania)\n",
    "WESTERN_PA_COUNTIES = {\n",
    "    \"003\": \"Allegheny\",\n",
    "    \"005\": \"Armstrong\",\n",
    "    \"007\": \"Beaver\",\n",
    "    \"019\": \"Butler\",\n",
    "    \"031\": \"Clarion\",\n",
    "    \"033\": \"Clearfield\",\n",
    "    \"049\": \"Erie\",\n",
    "    \"051\": \"Fayette\",\n",
    "    \"059\": \"Greene\",\n",
    "    \"063\": \"Indiana\",\n",
    "    \"073\": \"Lawrence\",\n",
    "    \"083\": \"Mercer\",\n",
    "    \"121\": \"Venango\",\n",
    "    \"125\": \"Washington\",\n",
    "    \"129\": \"Westmoreland\",\n",
    "    \"039\": \"Crawford\",\n",
    "}\n",
    "\n",
    "STATE_FIPS = \"42\"  # Pennsylvania\n",
    "\n",
    "print(f\"📍 Analyzing {len(WESTERN_PA_COUNTIES)} counties in Western PA\")\n",
    "for fips, name in WESTERN_PA_COUNTIES.items():\n",
    "    print(f\"   {name} County ({fips})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 — Define Census Variables to Pull\n",
    "These ACS variables map to our key site selection criteria."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Census variable codes → what they mean for site selection\n",
    "VARIABLES = {\n",
    "    \"B01003_001E\": \"total_population\",          # Total population\n",
    "    \"B19013_001E\": \"median_household_income\",    # Median household income\n",
    "    \"B01001_007E\": \"male_20_24\",                 # Males 20-24\n",
    "    \"B01001_008E\": \"male_25_29\",                 # Males 25-29\n",
    "    \"B01001_009E\": \"male_30_34\",                 # Males 30-34\n",
    "    \"B01001_010E\": \"male_35_44\",                 # Males 35-44\n",
    "    \"B01001_031E\": \"female_20_24\",               # Females 20-24\n",
    "    \"B01001_032E\": \"female_25_29\",               # Females 25-29\n",
    "    \"B01001_033E\": \"female_30_34\",               # Females 30-34\n",
    "    \"B01001_034E\": \"female_35_44\",               # Females 35-44\n",
    "    \"B08301_001E\": \"total_commuters\",            # Workers who commute (proxy for daytime pop)\n",
    "    \"B08301_002E\": \"commute_by_car\",             # Drive alone to work\n",
    "    \"B08301_010E\": \"commute_by_transit\",         # Public transit commuters\n",
    "    \"B25001_001E\": \"total_housing_units\",        # Total housing units\n",
    "    \"B15003_022E\": \"bachelors_degree\",           # Bachelor's degree holders\n",
    "    \"B15003_001E\": \"pop_25_and_over\",            # Population 25+ (denominator for education)\n",
    "}\n",
    "\n",
    "var_list = \",\".join(VARIABLES.keys())\n",
    "print(f\"📊 Pulling {len(VARIABLES)} variables from Census API\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3 — Pull Data from Census API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# STEP 3 — Stable Census Pull Using County Subdivisions\n",
    "\n",
    "def fetch_county_subdivisions(county_fips, county_name):\n",
    "    params = {\n",
    "        \"get\": f\"NAME,{var_list}\",\n",
    "        \"for\": \"county subdivision:*\",\n",
    "        \"in\": f\"state:{STATE_FIPS}+county:{county_fips}\",\n",
    "        \"key\": CENSUS_API_KEY\n",
    "    }\n",
    "\n",
    "    response = requests.get(BASE_URL, params=params)\n",
    "\n",
    "    if response.status_code != 200:\n",
    "        print(\"Census API Error:\")\n",
    "        print(response.text)\n",
    "        response.raise_for_status()\n",
    "\n",
    "    data = response.json()\n",
    "    df = pd.DataFrame(data[1:], columns=data[0])\n",
    "    df[\"county_name\"] = county_name\n",
    "\n",
    "    return df\n",
    "\n",
    "\n",
    "print(\"Pulling Western PA county subdivisions...\")\n",
    "\n",
    "all_data = []\n",
    "\n",
    "for fips, name in WESTERN_PA_COUNTIES.items():\n",
    "    print(f\"   → {name} County\")\n",
    "    df_temp = fetch_county_subdivisions(fips, name)\n",
    "    all_data.append(df_temp)\n",
    "\n",
    "raw_df = pd.concat(all_data, ignore_index=True)\n",
    "\n",
    "print(f\"Pulled {len(raw_df)} total subdivisions.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4 — Clean & Transform Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rename columns to readable names\n",
    "df = raw_df.rename(columns=VARIABLES)\n",
    "\n",
    "# Clean city names (remove \", Pennsylvania\" suffix)\n",
    "df[\"city\"] = df[\"NAME\"].str.replace(\", Pennsylvania\", \"\", regex=False).str.strip()\n",
    "\n",
    "# Convert numeric columns (Census returns everything as strings)\n",
    "numeric_cols = list(VARIABLES.values())\n",
    "for col in numeric_cols:\n",
    "    df[col] = pd.to_numeric(df[col], errors=\"coerce\")\n",
    "\n",
    "# Replace negative values (Census uses -666666666 for N/A)\n",
    "df[numeric_cols] = df[numeric_cols].replace(-666666666, np.nan)\n",
    "\n",
    "# ── Derived metrics ──────────────────────────────────────────────\n",
    "\n",
    "# Core Panda Express demographic: ages 20-44\n",
    "df[\"pop_20_44\"] = (\n",
    "    df[[\"male_20_24\", \"male_25_29\", \"male_30_34\", \"male_35_44\",\n",
    "        \"female_20_24\", \"female_25_29\", \"female_30_34\", \"female_35_44\"]]\n",
    "    .sum(axis=1)\n",
    ")\n",
    "\n",
    "# % of population in target age group\n",
    "df[\"pct_target_age\"] = (df[\"pop_20_44\"] / df[\"total_population\"] * 100).round(1)\n",
    "\n",
    "# % with bachelor's degree (educated workforce = higher disposable income tendency)\n",
    "df[\"pct_college_educated\"] = (df[\"bachelors_degree\"] / df[\"pop_25_and_over\"] * 100).round(1)\n",
    "\n",
    "# Car access ratio (important for suburban fast casual)\n",
    "df[\"pct_car_commute\"] = (df[\"commute_by_car\"] / df[\"total_commuters\"] * 100).round(1)\n",
    "\n",
    "# Filter: only include places with population > 2,000 (minimum viable market)\n",
    "df = df[df[\"total_population\"] >= 2000].copy()\n",
    "\n",
    "print(f\"✅ After filtering (pop ≥ 2,000): {len(df)} cities remain\")\n",
    "df[[\"city\", \"county_name\", \"total_population\", \"median_household_income\", \"pct_target_age\"]].head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5 — Score & Rank Cities\n",
    "Each city gets a **composite score (0–100)** based on weighted criteria."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalize(series):\n",
    "    \"\"\"Min-max normalize a series to 0-100 scale.\"\"\"\n",
    "    min_val = series.min()\n",
    "    max_val = series.max()\n",
    "    if max_val == min_val:\n",
    "        return pd.Series([50] * len(series), index=series.index)\n",
    "    return ((series - min_val) / (max_val - min_val) * 100)\n",
    "\n",
    "\n",
    "# ── Scoring weights (must sum to 1.0) ────────────────────────────\n",
    "#\n",
    "#  Adjust these weights to reflect your boss's priorities!\n",
    "#\n",
    "WEIGHTS = {\n",
    "    \"total_population\":        0.30,   # Larger market = more customers\n",
    "    \"median_household_income\": 0.25,   # Spending power\n",
    "    \"pct_target_age\":          0.20,   # Core Panda Express demographic\n",
    "    \"pct_college_educated\":    0.10,   # Proxy for brand affinity\n",
    "    \"total_commuters\":         0.15,   # Daytime traffic proxy\n",
    "}\n",
    "\n",
    "print(f\"Total weight: {sum(WEIGHTS.values())} (should be 1.0)\")\n",
    "\n",
    "# Normalize each scoring dimension\n",
    "for col in WEIGHTS.keys():\n",
    "    df[f\"{col}_score\"] = normalize(df[col].fillna(df[col].median()))\n",
    "\n",
    "# Calculate composite score\n",
    "df[\"composite_score\"] = sum(\n",
    "    df[f\"{col}_score\"] * weight \n",
    "    for col, weight in WEIGHTS.items()\n",
    ").round(1)\n",
    "\n",
    "# Rank cities\n",
    "df[\"rank\"] = df[\"composite_score\"].rank(ascending=False, method=\"min\").astype(int)\n",
    "df = df.sort_values(\"rank\")\n",
    "\n",
    "print(\"\\n🏆 Top 15 Cities for Panda Express — Western PA\")\n",
    "print(\"=\" * 65)\n",
    "\n",
    "top15 = df.head(15)[[\n",
    "    \"rank\", \"city\", \"county_name\", \"total_population\",\n",
    "    \"median_household_income\", \"pct_target_age\", \"composite_score\"\n",
    "]].copy()\n",
    "\n",
    "top15[\"median_household_income\"] = top15[\"median_household_income\"].apply(\n",
    "    lambda x: f\"${x:,.0f}\" if pd.notna(x) else \"N/A\"\n",
    ")\n",
    "top15[\"total_population\"] = top15[\"total_population\"].apply(lambda x: f\"{x:,.0f}\")\n",
    "top15[\"pct_target_age\"] = top15[\"pct_target_age\"].apply(lambda x: f\"{x:.1f}%\" if pd.notna(x) else \"N/A\")\n",
    "\n",
    "top15.columns = [\"Rank\", \"City\", \"County\", \"Population\", \"Median Income\", \"% Ages 20-44\", \"Score\"]\n",
    "print(top15.to_string(index=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6 — Visualizations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ── Chart 1: Top 15 Cities by Composite Score ────────────────────\n",
    "\n",
    "top15_plot = df.head(15).copy()\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(12, 7))\n",
    "\n",
    "colors = plt.cm.RdYlGn(np.linspace(0.3, 0.9, len(top15_plot)))\n",
    "bars = ax.barh(top15_plot[\"city\"][::-1], top15_plot[\"composite_score\"][::-1], color=colors[::-1])\n",
    "\n",
    "# Add score labels\n",
    "for bar, score in zip(bars, top15_plot[\"composite_score\"][::-1]):\n",
    "    ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height()/2,\n",
    "            f'{score:.1f}', va='center', fontsize=9, fontweight='bold')\n",
    "\n",
    "ax.set_xlabel(\"Composite Site Score (0-100)\", fontsize=12)\n",
    "ax.set_title(\"🐼 Panda Express Site Rankings — Western PA\\nTop 15 Markets by Composite Score\", \n",
    "             fontsize=14, fontweight='bold', pad=15)\n",
    "ax.set_xlim(0, 110)\n",
    "ax.spines['top'].set_visible(False)\n",
    "ax.spines['right'].set_visible(False)\n",
    "plt.tight_layout()\n",
    "plt.savefig(\"site_rankings.png\", dpi=150, bbox_inches=\"tight\")\n",
    "plt.show()\n",
    "print(\"Chart saved as site_rankings.png\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ── Chart 2: Income vs Population Bubble Chart ───────────────────\n",
    "#   Bubble size = % target age group\n",
    "\n",
    "top25 = df.head(25).copy()\n",
    "\n",
    "fig = px.scatter(\n",
    "    top25,\n",
    "    x=\"total_population\",\n",
    "    y=\"median_household_income\",\n",
    "    size=\"pct_target_age\",\n",
    "    color=\"composite_score\",\n",
    "    hover_name=\"city\",\n",
    "    hover_data={\"county_name\": True, \"composite_score\": True, \"pct_target_age\": True},\n",
    "    color_continuous_scale=\"RdYlGn\",\n",
    "    size_max=40,\n",
    "    title=\"Population vs Income — Top 25 Western PA Markets<br><sup>Bubble size = % population aged 20-44 | Color = composite score</sup>\"\n",
    ")\n",
    "\n",
    "fig.update_layout(\n",
    "    xaxis_title=\"Total Population\",\n",
    "    yaxis_title=\"Median Household Income ($)\",\n",
    "    font=dict(size=12),\n",
    "    plot_bgcolor=\"white\"\n",
    ")\n",
    "\n",
    "fig.update_xaxes(showgrid=True, gridcolor=\"#f0f0f0\")\n",
    "fig.update_yaxes(showgrid=True, gridcolor=\"#f0f0f0\", tickformat=\"$,.0f\")\n",
    "\n",
    "fig.show()\n",
    "fig.write_html(\"market_bubble_chart.html\")\n",
    "print(\"Interactive chart saved as market_bubble_chart.html\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ── Chart 3: Score Breakdown Heatmap for Top 10 ──────────────────\n",
    "\n",
    "top10 = df.head(10).copy()\n",
    "\n",
    "score_cols = [f\"{col}_score\" for col in WEIGHTS.keys()]\n",
    "score_labels = [\"Population\", \"Income\", \"Target Age\", \"Education\", \"Commuters\"]\n",
    "\n",
    "heatmap_data = top10.set_index(\"city\")[score_cols]\n",
    "heatmap_data.columns = score_labels\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 6))\n",
    "sns.heatmap(\n",
    "    heatmap_data,\n",
    "    annot=True, fmt=\".0f\",\n",
    "    cmap=\"RdYlGn\",\n",
    "    linewidths=0.5,\n",
    "    ax=ax,\n",
    "    vmin=0, vmax=100\n",
    ")\n",
    "\n",
    "ax.set_title(\"Score Breakdown — Top 10 Markets\\n(0 = worst, 100 = best for each criterion)\", \n",
    "             fontsize=13, fontweight='bold', pad=12)\n",
    "ax.set_xlabel(\"\")\n",
    "ax.set_ylabel(\"\")\n",
    "plt.xticks(rotation=30, ha='right')\n",
    "plt.yticks(rotation=0)\n",
    "plt.tight_layout()\n",
    "plt.savefig(\"score_heatmap.png\", dpi=150, bbox_inches=\"tight\")\n",
    "plt.show()\n",
    "print(\"Heatmap saved as score_heatmap.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 7 — Export Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export full ranked results to Excel\n",
    "export_cols = [\n",
    "    \"rank\", \"city\", \"county_name\", \"total_population\",\n",
    "    \"median_household_income\", \"pct_target_age\", \"pct_college_educated\",\n",
    "    \"total_commuters\", \"pct_car_commute\", \"composite_score\"\n",
    "]\n",
    "\n",
    "export_df = df[export_cols].copy()\n",
    "export_df.columns = [\n",
    "    \"Rank\", \"City\", \"County\", \"Population\", \"Median HH Income\",\n",
    "    \"% Ages 20-44\", \"% College Educated\", \"Total Commuters\",\n",
    "    \"% Car Commuters\", \"Composite Score\"\n",
    "]\n",
    "\n",
    "export_df.to_excel(\"western_pa_site_analysis.xlsx\", index=False)\n",
    "print(f\"✅ Full results exported to western_pa_site_analysis.xlsx\")\n",
    "print(f\"   {len(export_df)} cities ranked\")\n",
    "\n",
    "# Print final summary\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(\"📋 EXECUTIVE SUMMARY\")\n",
    "print(\"=\"*50)\n",
    "top3 = df.head(3)\n",
    "for i, (_, row) in enumerate(top3.iterrows(), 1):\n",
    "    print(f\"\"\"\n",
    "#{i} — {row['city']} ({row['county_name']} County)\n",
    "   Score: {row['composite_score']}/100\n",
    "   Population: {row['total_population']:,.0f}\n",
    "   Median Income: ${row['median_household_income']:,.0f}\n",
    "   Ages 20-44: {row['pct_target_age']:.1f}%\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Notes & Next Steps\n",
    "\n",
    "**This analysis covers:**\n",
    "- ✅ Population size and density\n",
    "- ✅ Household income (spending power)\n",
    "- ✅ Age demographics (core Panda Express customer base)\n",
    "- ✅ Education level (brand affinity proxy)\n",
    "- ✅ Commuter traffic (daytime customer volume proxy)\n",
    "\n",
    "**What to layer in next:**\n",
    "- 🗺️ Existing Panda Express locations (avoid cannibalization)\n",
    "- 🏢 Competitor density (Google Places API)\n",
    "- 🚗 Traffic counts (state DOT data)\n",
    "- 🏬 Available retail/commercial real estate\n",
    "- 🎓 College proximity (Erie → Gannon, Edinboro; Pittsburgh → Pitt, CMU, Duquesne)\n",
    "\n",
    "**Tweak the weights in Step 5** to test different strategic priorities.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}