{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# πΌ Panda Express Site Analysis β Western Pennsylvania\n",
"**Goal:** Rank cities in western PA by their potential as a new Panda Express location \n",
"**Data Source:** U.S. Census Bureau API (American Community Survey 5-Year Estimates) \n",
"**Last Updated:** 2024\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 0 β Setup & API Key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install required libraries (run once)\n",
"# !pip install requests pandas geopandas folium matplotlib seaborn plotly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.cm as cm\n",
"import seaborn as sns\n",
"import plotly.express as px\n",
"import plotly.graph_objects as go\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n",
"# π REPLACE WITH YOUR CENSUS API KEY\n",
"# Get a free key at: https://api.census.gov/data/key_signup.html\n",
"# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n",
"\n",
"#CENSUS_API_KEY = \"YOUR_API_KEY_HERE\"\n",
"CENSUS_API_KEY = \"5586e5b0855b88b8b1f22fd2b8ca0362f0ab417a\"\n",
"\n",
"# Census ACS 5-Year dataset endpoint\n",
"BASE_URL = \"https://api.census.gov/data/2022/acs/acs5\"\n",
"\n",
"print(\"β
Libraries loaded successfully\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1 β Define Western PA Counties\n",
"We'll pull data for all cities/places within these 15 counties."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Western PA county FIPS codes (state FIPS = 42 for Pennsylvania)\n",
"WESTERN_PA_COUNTIES = {\n",
" \"003\": \"Allegheny\",\n",
" \"005\": \"Armstrong\",\n",
" \"007\": \"Beaver\",\n",
" \"019\": \"Butler\",\n",
" \"031\": \"Clarion\",\n",
" \"033\": \"Clearfield\",\n",
" \"049\": \"Erie\",\n",
" \"051\": \"Fayette\",\n",
" \"059\": \"Greene\",\n",
" \"063\": \"Indiana\",\n",
" \"073\": \"Lawrence\",\n",
" \"083\": \"Mercer\",\n",
" \"121\": \"Venango\",\n",
" \"125\": \"Washington\",\n",
" \"129\": \"Westmoreland\",\n",
" \"039\": \"Crawford\",\n",
"}\n",
"\n",
"STATE_FIPS = \"42\" # Pennsylvania\n",
"\n",
"print(f\"π Analyzing {len(WESTERN_PA_COUNTIES)} counties in Western PA\")\n",
"for fips, name in WESTERN_PA_COUNTIES.items():\n",
" print(f\" {name} County ({fips})\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2 β Define Census Variables to Pull\n",
"These ACS variables map to our key site selection criteria."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Census variable codes β what they mean for site selection\n",
"VARIABLES = {\n",
" \"B01003_001E\": \"total_population\", # Total population\n",
" \"B19013_001E\": \"median_household_income\", # Median household income\n",
" \"B01001_007E\": \"male_20_24\", # Males 20-24\n",
" \"B01001_008E\": \"male_25_29\", # Males 25-29\n",
" \"B01001_009E\": \"male_30_34\", # Males 30-34\n",
" \"B01001_010E\": \"male_35_44\", # Males 35-44\n",
" \"B01001_031E\": \"female_20_24\", # Females 20-24\n",
" \"B01001_032E\": \"female_25_29\", # Females 25-29\n",
" \"B01001_033E\": \"female_30_34\", # Females 30-34\n",
" \"B01001_034E\": \"female_35_44\", # Females 35-44\n",
" \"B08301_001E\": \"total_commuters\", # Workers who commute (proxy for daytime pop)\n",
" \"B08301_002E\": \"commute_by_car\", # Drive alone to work\n",
" \"B08301_010E\": \"commute_by_transit\", # Public transit commuters\n",
" \"B25001_001E\": \"total_housing_units\", # Total housing units\n",
" \"B15003_022E\": \"bachelors_degree\", # Bachelor's degree holders\n",
" \"B15003_001E\": \"pop_25_and_over\", # Population 25+ (denominator for education)\n",
"}\n",
"\n",
"var_list = \",\".join(VARIABLES.keys())\n",
"print(f\"π Pulling {len(VARIABLES)} variables from Census API\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3 β Pull Data from Census API"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# STEP 3 β Stable Census Pull Using County Subdivisions\n",
"\n",
"def fetch_county_subdivisions(county_fips, county_name):\n",
" params = {\n",
" \"get\": f\"NAME,{var_list}\",\n",
" \"for\": \"county subdivision:*\",\n",
" \"in\": f\"state:{STATE_FIPS}+county:{county_fips}\",\n",
" \"key\": CENSUS_API_KEY\n",
" }\n",
"\n",
" response = requests.get(BASE_URL, params=params)\n",
"\n",
" if response.status_code != 200:\n",
" print(\"Census API Error:\")\n",
" print(response.text)\n",
" response.raise_for_status()\n",
"\n",
" data = response.json()\n",
" df = pd.DataFrame(data[1:], columns=data[0])\n",
" df[\"county_name\"] = county_name\n",
"\n",
" return df\n",
"\n",
"\n",
"print(\"Pulling Western PA county subdivisions...\")\n",
"\n",
"all_data = []\n",
"\n",
"for fips, name in WESTERN_PA_COUNTIES.items():\n",
" print(f\" β {name} County\")\n",
" df_temp = fetch_county_subdivisions(fips, name)\n",
" all_data.append(df_temp)\n",
"\n",
"raw_df = pd.concat(all_data, ignore_index=True)\n",
"\n",
"print(f\"Pulled {len(raw_df)} total subdivisions.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4 β Clean & Transform Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rename columns to readable names\n",
"df = raw_df.rename(columns=VARIABLES)\n",
"\n",
"# Clean city names (remove \", Pennsylvania\" suffix)\n",
"df[\"city\"] = df[\"NAME\"].str.replace(\", Pennsylvania\", \"\", regex=False).str.strip()\n",
"\n",
"# Convert numeric columns (Census returns everything as strings)\n",
"numeric_cols = list(VARIABLES.values())\n",
"for col in numeric_cols:\n",
" df[col] = pd.to_numeric(df[col], errors=\"coerce\")\n",
"\n",
"# Replace negative values (Census uses -666666666 for N/A)\n",
"df[numeric_cols] = df[numeric_cols].replace(-666666666, np.nan)\n",
"\n",
"# ββ Derived metrics ββββββββββββββββββββββββββββββββββββββββββββββ\n",
"\n",
"# Core Panda Express demographic: ages 20-44\n",
"df[\"pop_20_44\"] = (\n",
" df[[\"male_20_24\", \"male_25_29\", \"male_30_34\", \"male_35_44\",\n",
" \"female_20_24\", \"female_25_29\", \"female_30_34\", \"female_35_44\"]]\n",
" .sum(axis=1)\n",
")\n",
"\n",
"# % of population in target age group\n",
"df[\"pct_target_age\"] = (df[\"pop_20_44\"] / df[\"total_population\"] * 100).round(1)\n",
"\n",
"# % with bachelor's degree (educated workforce = higher disposable income tendency)\n",
"df[\"pct_college_educated\"] = (df[\"bachelors_degree\"] / df[\"pop_25_and_over\"] * 100).round(1)\n",
"\n",
"# Car access ratio (important for suburban fast casual)\n",
"df[\"pct_car_commute\"] = (df[\"commute_by_car\"] / df[\"total_commuters\"] * 100).round(1)\n",
"\n",
"# Filter: only include places with population > 2,000 (minimum viable market)\n",
"df = df[df[\"total_population\"] >= 2000].copy()\n",
"\n",
"print(f\"β
After filtering (pop β₯ 2,000): {len(df)} cities remain\")\n",
"df[[\"city\", \"county_name\", \"total_population\", \"median_household_income\", \"pct_target_age\"]].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5 β Score & Rank Cities\n",
"Each city gets a **composite score (0β100)** based on weighted criteria."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def normalize(series):\n",
" \"\"\"Min-max normalize a series to 0-100 scale.\"\"\"\n",
" min_val = series.min()\n",
" max_val = series.max()\n",
" if max_val == min_val:\n",
" return pd.Series([50] * len(series), index=series.index)\n",
" return ((series - min_val) / (max_val - min_val) * 100)\n",
"\n",
"\n",
"# ββ Scoring weights (must sum to 1.0) ββββββββββββββββββββββββββββ\n",
"#\n",
"# Adjust these weights to reflect your boss's priorities!\n",
"#\n",
"WEIGHTS = {\n",
" \"total_population\": 0.30, # Larger market = more customers\n",
" \"median_household_income\": 0.25, # Spending power\n",
" \"pct_target_age\": 0.20, # Core Panda Express demographic\n",
" \"pct_college_educated\": 0.10, # Proxy for brand affinity\n",
" \"total_commuters\": 0.15, # Daytime traffic proxy\n",
"}\n",
"\n",
"print(f\"Total weight: {sum(WEIGHTS.values())} (should be 1.0)\")\n",
"\n",
"# Normalize each scoring dimension\n",
"for col in WEIGHTS.keys():\n",
" df[f\"{col}_score\"] = normalize(df[col].fillna(df[col].median()))\n",
"\n",
"# Calculate composite score\n",
"df[\"composite_score\"] = sum(\n",
" df[f\"{col}_score\"] * weight \n",
" for col, weight in WEIGHTS.items()\n",
").round(1)\n",
"\n",
"# Rank cities\n",
"df[\"rank\"] = df[\"composite_score\"].rank(ascending=False, method=\"min\").astype(int)\n",
"df = df.sort_values(\"rank\")\n",
"\n",
"print(\"\\nπ Top 15 Cities for Panda Express β Western PA\")\n",
"print(\"=\" * 65)\n",
"\n",
"top15 = df.head(15)[[\n",
" \"rank\", \"city\", \"county_name\", \"total_population\",\n",
" \"median_household_income\", \"pct_target_age\", \"composite_score\"\n",
"]].copy()\n",
"\n",
"top15[\"median_household_income\"] = top15[\"median_household_income\"].apply(\n",
" lambda x: f\"${x:,.0f}\" if pd.notna(x) else \"N/A\"\n",
")\n",
"top15[\"total_population\"] = top15[\"total_population\"].apply(lambda x: f\"{x:,.0f}\")\n",
"top15[\"pct_target_age\"] = top15[\"pct_target_age\"].apply(lambda x: f\"{x:.1f}%\" if pd.notna(x) else \"N/A\")\n",
"\n",
"top15.columns = [\"Rank\", \"City\", \"County\", \"Population\", \"Median Income\", \"% Ages 20-44\", \"Score\"]\n",
"print(top15.to_string(index=False))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6 β Visualizations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ββ Chart 1: Top 15 Cities by Composite Score ββββββββββββββββββββ\n",
"\n",
"top15_plot = df.head(15).copy()\n",
"\n",
"fig, ax = plt.subplots(figsize=(12, 7))\n",
"\n",
"colors = plt.cm.RdYlGn(np.linspace(0.3, 0.9, len(top15_plot)))\n",
"bars = ax.barh(top15_plot[\"city\"][::-1], top15_plot[\"composite_score\"][::-1], color=colors[::-1])\n",
"\n",
"# Add score labels\n",
"for bar, score in zip(bars, top15_plot[\"composite_score\"][::-1]):\n",
" ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height()/2,\n",
" f'{score:.1f}', va='center', fontsize=9, fontweight='bold')\n",
"\n",
"ax.set_xlabel(\"Composite Site Score (0-100)\", fontsize=12)\n",
"ax.set_title(\"πΌ Panda Express Site Rankings β Western PA\\nTop 15 Markets by Composite Score\", \n",
" fontsize=14, fontweight='bold', pad=15)\n",
"ax.set_xlim(0, 110)\n",
"ax.spines['top'].set_visible(False)\n",
"ax.spines['right'].set_visible(False)\n",
"plt.tight_layout()\n",
"plt.savefig(\"site_rankings.png\", dpi=150, bbox_inches=\"tight\")\n",
"plt.show()\n",
"print(\"Chart saved as site_rankings.png\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ββ Chart 2: Income vs Population Bubble Chart βββββββββββββββββββ\n",
"# Bubble size = % target age group\n",
"\n",
"top25 = df.head(25).copy()\n",
"\n",
"fig = px.scatter(\n",
" top25,\n",
" x=\"total_population\",\n",
" y=\"median_household_income\",\n",
" size=\"pct_target_age\",\n",
" color=\"composite_score\",\n",
" hover_name=\"city\",\n",
" hover_data={\"county_name\": True, \"composite_score\": True, \"pct_target_age\": True},\n",
" color_continuous_scale=\"RdYlGn\",\n",
" size_max=40,\n",
" title=\"Population vs Income β Top 25 Western PA Markets
Bubble size = % population aged 20-44 | Color = composite score\"\n",
")\n",
"\n",
"fig.update_layout(\n",
" xaxis_title=\"Total Population\",\n",
" yaxis_title=\"Median Household Income ($)\",\n",
" font=dict(size=12),\n",
" plot_bgcolor=\"white\"\n",
")\n",
"\n",
"fig.update_xaxes(showgrid=True, gridcolor=\"#f0f0f0\")\n",
"fig.update_yaxes(showgrid=True, gridcolor=\"#f0f0f0\", tickformat=\"$,.0f\")\n",
"\n",
"fig.show()\n",
"fig.write_html(\"market_bubble_chart.html\")\n",
"print(\"Interactive chart saved as market_bubble_chart.html\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ββ Chart 3: Score Breakdown Heatmap for Top 10 ββββββββββββββββββ\n",
"\n",
"top10 = df.head(10).copy()\n",
"\n",
"score_cols = [f\"{col}_score\" for col in WEIGHTS.keys()]\n",
"score_labels = [\"Population\", \"Income\", \"Target Age\", \"Education\", \"Commuters\"]\n",
"\n",
"heatmap_data = top10.set_index(\"city\")[score_cols]\n",
"heatmap_data.columns = score_labels\n",
"\n",
"fig, ax = plt.subplots(figsize=(10, 6))\n",
"sns.heatmap(\n",
" heatmap_data,\n",
" annot=True, fmt=\".0f\",\n",
" cmap=\"RdYlGn\",\n",
" linewidths=0.5,\n",
" ax=ax,\n",
" vmin=0, vmax=100\n",
")\n",
"\n",
"ax.set_title(\"Score Breakdown β Top 10 Markets\\n(0 = worst, 100 = best for each criterion)\", \n",
" fontsize=13, fontweight='bold', pad=12)\n",
"ax.set_xlabel(\"\")\n",
"ax.set_ylabel(\"\")\n",
"plt.xticks(rotation=30, ha='right')\n",
"plt.yticks(rotation=0)\n",
"plt.tight_layout()\n",
"plt.savefig(\"score_heatmap.png\", dpi=150, bbox_inches=\"tight\")\n",
"plt.show()\n",
"print(\"Heatmap saved as score_heatmap.png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 7 β Export Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Export full ranked results to Excel\n",
"export_cols = [\n",
" \"rank\", \"city\", \"county_name\", \"total_population\",\n",
" \"median_household_income\", \"pct_target_age\", \"pct_college_educated\",\n",
" \"total_commuters\", \"pct_car_commute\", \"composite_score\"\n",
"]\n",
"\n",
"export_df = df[export_cols].copy()\n",
"export_df.columns = [\n",
" \"Rank\", \"City\", \"County\", \"Population\", \"Median HH Income\",\n",
" \"% Ages 20-44\", \"% College Educated\", \"Total Commuters\",\n",
" \"% Car Commuters\", \"Composite Score\"\n",
"]\n",
"\n",
"export_df.to_excel(\"western_pa_site_analysis.xlsx\", index=False)\n",
"print(f\"β
Full results exported to western_pa_site_analysis.xlsx\")\n",
"print(f\" {len(export_df)} cities ranked\")\n",
"\n",
"# Print final summary\n",
"print(\"\\n\" + \"=\"*50)\n",
"print(\"π EXECUTIVE SUMMARY\")\n",
"print(\"=\"*50)\n",
"top3 = df.head(3)\n",
"for i, (_, row) in enumerate(top3.iterrows(), 1):\n",
" print(f\"\"\"\n",
"#{i} β {row['city']} ({row['county_name']} County)\n",
" Score: {row['composite_score']}/100\n",
" Population: {row['total_population']:,.0f}\n",
" Median Income: ${row['median_household_income']:,.0f}\n",
" Ages 20-44: {row['pct_target_age']:.1f}%\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Notes & Next Steps\n",
"\n",
"**This analysis covers:**\n",
"- β
Population size and density\n",
"- β
Household income (spending power)\n",
"- β
Age demographics (core Panda Express customer base)\n",
"- β
Education level (brand affinity proxy)\n",
"- β
Commuter traffic (daytime customer volume proxy)\n",
"\n",
"**What to layer in next:**\n",
"- πΊοΈ Existing Panda Express locations (avoid cannibalization)\n",
"- π’ Competitor density (Google Places API)\n",
"- π Traffic counts (state DOT data)\n",
"- π¬ Available retail/commercial real estate\n",
"- π College proximity (Erie β Gannon, Edinboro; Pittsburgh β Pitt, CMU, Duquesne)\n",
"\n",
"**Tweak the weights in Step 5** to test different strategic priorities.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}