{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 🐼 Panda Express Site Analysis β€” Western Pennsylvania\n", "**Goal:** Rank cities in western PA by their potential as a new Panda Express location \n", "**Data Source:** U.S. Census Bureau API (American Community Survey 5-Year Estimates) \n", "**Last Updated:** 2024\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 0 β€” Setup & API Key" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install required libraries (run once)\n", "# !pip install requests pandas geopandas folium matplotlib seaborn plotly" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib.cm as cm\n", "import seaborn as sns\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# ─────────────────────────────────────────────────────────────────\n", "# πŸ”‘ REPLACE WITH YOUR CENSUS API KEY\n", "# Get a free key at: https://api.census.gov/data/key_signup.html\n", "# ─────────────────────────────────────────────────────────────────\n", "\n", "#CENSUS_API_KEY = \"YOUR_API_KEY_HERE\"\n", "CENSUS_API_KEY = \"5586e5b0855b88b8b1f22fd2b8ca0362f0ab417a\"\n", "\n", "# Census ACS 5-Year dataset endpoint\n", "BASE_URL = \"https://api.census.gov/data/2022/acs/acs5\"\n", "\n", "print(\"βœ… Libraries loaded successfully\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 β€” Define Western PA Counties\n", "We'll pull data for all cities/places within these 15 counties." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Western PA county FIPS codes (state FIPS = 42 for Pennsylvania)\n", "WESTERN_PA_COUNTIES = {\n", " \"003\": \"Allegheny\",\n", " \"005\": \"Armstrong\",\n", " \"007\": \"Beaver\",\n", " \"019\": \"Butler\",\n", " \"031\": \"Clarion\",\n", " \"033\": \"Clearfield\",\n", " \"049\": \"Erie\",\n", " \"051\": \"Fayette\",\n", " \"059\": \"Greene\",\n", " \"063\": \"Indiana\",\n", " \"073\": \"Lawrence\",\n", " \"083\": \"Mercer\",\n", " \"121\": \"Venango\",\n", " \"125\": \"Washington\",\n", " \"129\": \"Westmoreland\",\n", " \"039\": \"Crawford\",\n", "}\n", "\n", "STATE_FIPS = \"42\" # Pennsylvania\n", "\n", "print(f\"πŸ“ Analyzing {len(WESTERN_PA_COUNTIES)} counties in Western PA\")\n", "for fips, name in WESTERN_PA_COUNTIES.items():\n", " print(f\" {name} County ({fips})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 β€” Define Census Variables to Pull\n", "These ACS variables map to our key site selection criteria." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Census variable codes β†’ what they mean for site selection\n", "VARIABLES = {\n", " \"B01003_001E\": \"total_population\", # Total population\n", " \"B19013_001E\": \"median_household_income\", # Median household income\n", " \"B01001_007E\": \"male_20_24\", # Males 20-24\n", " \"B01001_008E\": \"male_25_29\", # Males 25-29\n", " \"B01001_009E\": \"male_30_34\", # Males 30-34\n", " \"B01001_010E\": \"male_35_44\", # Males 35-44\n", " \"B01001_031E\": \"female_20_24\", # Females 20-24\n", " \"B01001_032E\": \"female_25_29\", # Females 25-29\n", " \"B01001_033E\": \"female_30_34\", # Females 30-34\n", " \"B01001_034E\": \"female_35_44\", # Females 35-44\n", " \"B08301_001E\": \"total_commuters\", # Workers who commute (proxy for daytime pop)\n", " \"B08301_002E\": \"commute_by_car\", # Drive alone to work\n", " \"B08301_010E\": \"commute_by_transit\", # Public transit commuters\n", " \"B25001_001E\": \"total_housing_units\", # Total housing units\n", " \"B15003_022E\": \"bachelors_degree\", # Bachelor's degree holders\n", " \"B15003_001E\": \"pop_25_and_over\", # Population 25+ (denominator for education)\n", "}\n", "\n", "var_list = \",\".join(VARIABLES.keys())\n", "print(f\"πŸ“Š Pulling {len(VARIABLES)} variables from Census API\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 β€” Pull Data from Census API" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# STEP 3 β€” Stable Census Pull Using County Subdivisions\n", "\n", "def fetch_county_subdivisions(county_fips, county_name):\n", " params = {\n", " \"get\": f\"NAME,{var_list}\",\n", " \"for\": \"county subdivision:*\",\n", " \"in\": f\"state:{STATE_FIPS}+county:{county_fips}\",\n", " \"key\": CENSUS_API_KEY\n", " }\n", "\n", " response = requests.get(BASE_URL, params=params)\n", "\n", " if response.status_code != 200:\n", " print(\"Census API Error:\")\n", " print(response.text)\n", " response.raise_for_status()\n", "\n", " data = response.json()\n", " df = pd.DataFrame(data[1:], columns=data[0])\n", " df[\"county_name\"] = county_name\n", "\n", " return df\n", "\n", "\n", "print(\"Pulling Western PA county subdivisions...\")\n", "\n", "all_data = []\n", "\n", "for fips, name in WESTERN_PA_COUNTIES.items():\n", " print(f\" β†’ {name} County\")\n", " df_temp = fetch_county_subdivisions(fips, name)\n", " all_data.append(df_temp)\n", "\n", "raw_df = pd.concat(all_data, ignore_index=True)\n", "\n", "print(f\"Pulled {len(raw_df)} total subdivisions.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 β€” Clean & Transform Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Rename columns to readable names\n", "df = raw_df.rename(columns=VARIABLES)\n", "\n", "# Clean city names (remove \", Pennsylvania\" suffix)\n", "df[\"city\"] = df[\"NAME\"].str.replace(\", Pennsylvania\", \"\", regex=False).str.strip()\n", "\n", "# Convert numeric columns (Census returns everything as strings)\n", "numeric_cols = list(VARIABLES.values())\n", "for col in numeric_cols:\n", " df[col] = pd.to_numeric(df[col], errors=\"coerce\")\n", "\n", "# Replace negative values (Census uses -666666666 for N/A)\n", "df[numeric_cols] = df[numeric_cols].replace(-666666666, np.nan)\n", "\n", "# ── Derived metrics ──────────────────────────────────────────────\n", "\n", "# Core Panda Express demographic: ages 20-44\n", "df[\"pop_20_44\"] = (\n", " df[[\"male_20_24\", \"male_25_29\", \"male_30_34\", \"male_35_44\",\n", " \"female_20_24\", \"female_25_29\", \"female_30_34\", \"female_35_44\"]]\n", " .sum(axis=1)\n", ")\n", "\n", "# % of population in target age group\n", "df[\"pct_target_age\"] = (df[\"pop_20_44\"] / df[\"total_population\"] * 100).round(1)\n", "\n", "# % with bachelor's degree (educated workforce = higher disposable income tendency)\n", "df[\"pct_college_educated\"] = (df[\"bachelors_degree\"] / df[\"pop_25_and_over\"] * 100).round(1)\n", "\n", "# Car access ratio (important for suburban fast casual)\n", "df[\"pct_car_commute\"] = (df[\"commute_by_car\"] / df[\"total_commuters\"] * 100).round(1)\n", "\n", "# Filter: only include places with population > 2,000 (minimum viable market)\n", "df = df[df[\"total_population\"] >= 2000].copy()\n", "\n", "print(f\"βœ… After filtering (pop β‰₯ 2,000): {len(df)} cities remain\")\n", "df[[\"city\", \"county_name\", \"total_population\", \"median_household_income\", \"pct_target_age\"]].head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5 β€” Score & Rank Cities\n", "Each city gets a **composite score (0–100)** based on weighted criteria." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def normalize(series):\n", " \"\"\"Min-max normalize a series to 0-100 scale.\"\"\"\n", " min_val = series.min()\n", " max_val = series.max()\n", " if max_val == min_val:\n", " return pd.Series([50] * len(series), index=series.index)\n", " return ((series - min_val) / (max_val - min_val) * 100)\n", "\n", "\n", "# ── Scoring weights (must sum to 1.0) ────────────────────────────\n", "#\n", "# Adjust these weights to reflect your boss's priorities!\n", "#\n", "WEIGHTS = {\n", " \"total_population\": 0.30, # Larger market = more customers\n", " \"median_household_income\": 0.25, # Spending power\n", " \"pct_target_age\": 0.20, # Core Panda Express demographic\n", " \"pct_college_educated\": 0.10, # Proxy for brand affinity\n", " \"total_commuters\": 0.15, # Daytime traffic proxy\n", "}\n", "\n", "print(f\"Total weight: {sum(WEIGHTS.values())} (should be 1.0)\")\n", "\n", "# Normalize each scoring dimension\n", "for col in WEIGHTS.keys():\n", " df[f\"{col}_score\"] = normalize(df[col].fillna(df[col].median()))\n", "\n", "# Calculate composite score\n", "df[\"composite_score\"] = sum(\n", " df[f\"{col}_score\"] * weight \n", " for col, weight in WEIGHTS.items()\n", ").round(1)\n", "\n", "# Rank cities\n", "df[\"rank\"] = df[\"composite_score\"].rank(ascending=False, method=\"min\").astype(int)\n", "df = df.sort_values(\"rank\")\n", "\n", "print(\"\\nπŸ† Top 15 Cities for Panda Express β€” Western PA\")\n", "print(\"=\" * 65)\n", "\n", "top15 = df.head(15)[[\n", " \"rank\", \"city\", \"county_name\", \"total_population\",\n", " \"median_household_income\", \"pct_target_age\", \"composite_score\"\n", "]].copy()\n", "\n", "top15[\"median_household_income\"] = top15[\"median_household_income\"].apply(\n", " lambda x: f\"${x:,.0f}\" if pd.notna(x) else \"N/A\"\n", ")\n", "top15[\"total_population\"] = top15[\"total_population\"].apply(lambda x: f\"{x:,.0f}\")\n", "top15[\"pct_target_age\"] = top15[\"pct_target_age\"].apply(lambda x: f\"{x:.1f}%\" if pd.notna(x) else \"N/A\")\n", "\n", "top15.columns = [\"Rank\", \"City\", \"County\", \"Population\", \"Median Income\", \"% Ages 20-44\", \"Score\"]\n", "print(top15.to_string(index=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6 β€” Visualizations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ── Chart 1: Top 15 Cities by Composite Score ────────────────────\n", "\n", "top15_plot = df.head(15).copy()\n", "\n", "fig, ax = plt.subplots(figsize=(12, 7))\n", "\n", "colors = plt.cm.RdYlGn(np.linspace(0.3, 0.9, len(top15_plot)))\n", "bars = ax.barh(top15_plot[\"city\"][::-1], top15_plot[\"composite_score\"][::-1], color=colors[::-1])\n", "\n", "# Add score labels\n", "for bar, score in zip(bars, top15_plot[\"composite_score\"][::-1]):\n", " ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height()/2,\n", " f'{score:.1f}', va='center', fontsize=9, fontweight='bold')\n", "\n", "ax.set_xlabel(\"Composite Site Score (0-100)\", fontsize=12)\n", "ax.set_title(\"🐼 Panda Express Site Rankings β€” Western PA\\nTop 15 Markets by Composite Score\", \n", " fontsize=14, fontweight='bold', pad=15)\n", "ax.set_xlim(0, 110)\n", "ax.spines['top'].set_visible(False)\n", "ax.spines['right'].set_visible(False)\n", "plt.tight_layout()\n", "plt.savefig(\"site_rankings.png\", dpi=150, bbox_inches=\"tight\")\n", "plt.show()\n", "print(\"Chart saved as site_rankings.png\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ── Chart 2: Income vs Population Bubble Chart ───────────────────\n", "# Bubble size = % target age group\n", "\n", "top25 = df.head(25).copy()\n", "\n", "fig = px.scatter(\n", " top25,\n", " x=\"total_population\",\n", " y=\"median_household_income\",\n", " size=\"pct_target_age\",\n", " color=\"composite_score\",\n", " hover_name=\"city\",\n", " hover_data={\"county_name\": True, \"composite_score\": True, \"pct_target_age\": True},\n", " color_continuous_scale=\"RdYlGn\",\n", " size_max=40,\n", " title=\"Population vs Income β€” Top 25 Western PA Markets
Bubble size = % population aged 20-44 | Color = composite score\"\n", ")\n", "\n", "fig.update_layout(\n", " xaxis_title=\"Total Population\",\n", " yaxis_title=\"Median Household Income ($)\",\n", " font=dict(size=12),\n", " plot_bgcolor=\"white\"\n", ")\n", "\n", "fig.update_xaxes(showgrid=True, gridcolor=\"#f0f0f0\")\n", "fig.update_yaxes(showgrid=True, gridcolor=\"#f0f0f0\", tickformat=\"$,.0f\")\n", "\n", "fig.show()\n", "fig.write_html(\"market_bubble_chart.html\")\n", "print(\"Interactive chart saved as market_bubble_chart.html\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ── Chart 3: Score Breakdown Heatmap for Top 10 ──────────────────\n", "\n", "top10 = df.head(10).copy()\n", "\n", "score_cols = [f\"{col}_score\" for col in WEIGHTS.keys()]\n", "score_labels = [\"Population\", \"Income\", \"Target Age\", \"Education\", \"Commuters\"]\n", "\n", "heatmap_data = top10.set_index(\"city\")[score_cols]\n", "heatmap_data.columns = score_labels\n", "\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "sns.heatmap(\n", " heatmap_data,\n", " annot=True, fmt=\".0f\",\n", " cmap=\"RdYlGn\",\n", " linewidths=0.5,\n", " ax=ax,\n", " vmin=0, vmax=100\n", ")\n", "\n", "ax.set_title(\"Score Breakdown β€” Top 10 Markets\\n(0 = worst, 100 = best for each criterion)\", \n", " fontsize=13, fontweight='bold', pad=12)\n", "ax.set_xlabel(\"\")\n", "ax.set_ylabel(\"\")\n", "plt.xticks(rotation=30, ha='right')\n", "plt.yticks(rotation=0)\n", "plt.tight_layout()\n", "plt.savefig(\"score_heatmap.png\", dpi=150, bbox_inches=\"tight\")\n", "plt.show()\n", "print(\"Heatmap saved as score_heatmap.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7 β€” Export Results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export full ranked results to Excel\n", "export_cols = [\n", " \"rank\", \"city\", \"county_name\", \"total_population\",\n", " \"median_household_income\", \"pct_target_age\", \"pct_college_educated\",\n", " \"total_commuters\", \"pct_car_commute\", \"composite_score\"\n", "]\n", "\n", "export_df = df[export_cols].copy()\n", "export_df.columns = [\n", " \"Rank\", \"City\", \"County\", \"Population\", \"Median HH Income\",\n", " \"% Ages 20-44\", \"% College Educated\", \"Total Commuters\",\n", " \"% Car Commuters\", \"Composite Score\"\n", "]\n", "\n", "export_df.to_excel(\"western_pa_site_analysis.xlsx\", index=False)\n", "print(f\"βœ… Full results exported to western_pa_site_analysis.xlsx\")\n", "print(f\" {len(export_df)} cities ranked\")\n", "\n", "# Print final summary\n", "print(\"\\n\" + \"=\"*50)\n", "print(\"πŸ“‹ EXECUTIVE SUMMARY\")\n", "print(\"=\"*50)\n", "top3 = df.head(3)\n", "for i, (_, row) in enumerate(top3.iterrows(), 1):\n", " print(f\"\"\"\n", "#{i} β€” {row['city']} ({row['county_name']} County)\n", " Score: {row['composite_score']}/100\n", " Population: {row['total_population']:,.0f}\n", " Median Income: ${row['median_household_income']:,.0f}\n", " Ages 20-44: {row['pct_target_age']:.1f}%\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Notes & Next Steps\n", "\n", "**This analysis covers:**\n", "- βœ… Population size and density\n", "- βœ… Household income (spending power)\n", "- βœ… Age demographics (core Panda Express customer base)\n", "- βœ… Education level (brand affinity proxy)\n", "- βœ… Commuter traffic (daytime customer volume proxy)\n", "\n", "**What to layer in next:**\n", "- πŸ—ΊοΈ Existing Panda Express locations (avoid cannibalization)\n", "- 🏒 Competitor density (Google Places API)\n", "- πŸš— Traffic counts (state DOT data)\n", "- 🏬 Available retail/commercial real estate\n", "- πŸŽ“ College proximity (Erie β†’ Gannon, Edinboro; Pittsburgh β†’ Pitt, CMU, Duquesne)\n", "\n", "**Tweak the weights in Step 5** to test different strategic priorities.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }