{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a598f44c",
   "metadata": {},
   "source": [
    "# 🏦 Economic Data Explorer: Fed & World Bank APIs in Python\n",
    "\n",
    "## FINC 332 — Introduction to Data Analytics, Data Mining, and Data Visualization\n",
    "### Developed by Dr. Benyawarath \"Yaa\" Nithithanatchinnapat\n",
    "**Learning Objectives:**\n",
    "- Connect to real-world economic data APIs (FRED & World Bank)\n",
    "- Clean, transform, and merge time-series data with `pandas`\n",
    "- Create compelling visualizations with `matplotlib` and `seaborn`\n",
    "- Perform basic statistical analysis (correlation, regression, distributions)\n",
    "\n",
    "**Why this matters:** Economists, analysts, and data scientists don't work with pre-cleaned CSVs — they pull live data from institutional APIs. Today you'll do exactly that.\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e37ee5d5",
   "metadata": {},
   "source": [
    "## 🔧 Section 0: Setup & Installations\n",
    "\n",
    "Run this cell first to make sure all packages are installed.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33c335c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages (this may take a moment the first time)\n",
    "!pip install fredapi wbgapi pandas matplotlib seaborn scipy -q\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.dates as mdates\n",
    "import seaborn as sns\n",
    "from scipy import stats\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Make our plots look nice\n",
    "sns.set_style(\"whitegrid\")\n",
    "plt.rcParams['figure.figsize'] = (12, 6)\n",
    "plt.rcParams['font.size'] = 12\n",
    "\n",
    "print(\"✅ All packages loaded successfully!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1db13fb9",
   "metadata": {},
   "source": [
    "---\n",
    "## 📊 Section 1: Getting Data from the Federal Reserve (FRED)\n",
    "\n",
    "**FRED** (Federal Reserve Economic Data) is the gold standard for US economic data. It has over 800,000 data series — everything from GDP to avocado prices.\n",
    "\n",
    "### Step 1: Get Your Free API Key\n",
    "1. Go to https://fred.stlouisfed.org/docs/api/api_key.html\n",
    "2. Create a free account\n",
    "3. Copy your API key\n",
    "\n",
    "> **💡 Pro tip:** Never hardcode API keys in your notebook! Use environment variables or a separate config file. For today's class demo, we'll use a simple variable — but in production, use `os.environ` or a `.env` file.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9b15c70",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ============================================\n",
    "# 🔑 PASTE YOUR FRED API KEY HERE\n",
    "# ============================================\n",
    "FRED_API_KEY = \"YOUR FRED API KEY\"\n",
    "\n",
    "# In production, you'd do this instead:\n",
    "# import os\n",
    "# FRED_API_KEY = os.environ.get(\"FRED_API_KEY\")\n",
    "\n",
    "# For the sake of time, here's my API key: e85280ebf7739fb5396c6a583c6a26b5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4fdf8304",
   "metadata": {},
   "outputs": [],
   "source": [
    "from fredapi import Fred\n",
    "\n",
    "fred = Fred(api_key=FRED_API_KEY)\n",
    "\n",
    "# Let's grab some key economic indicators\n",
    "# Each series has a unique ID — you can search at https://fred.stlouisfed.org\n",
    "\n",
    "print(\"📡 Fetching data from the Federal Reserve...\")\n",
    "print(\"   This is a LIVE API call — you're pulling real data right now!\\n\")\n",
    "\n",
    "# GDP (Quarterly, Seasonally Adjusted Annual Rate)\n",
    "gdp = fred.get_series('GDP', observation_start='2000-01-01')\n",
    "gdp.name = 'GDP'\n",
    "\n",
    "# Unemployment Rate (Monthly)\n",
    "unemployment = fred.get_series('UNRATE', observation_start='2000-01-01')\n",
    "unemployment.name = 'Unemployment Rate'\n",
    "\n",
    "# Consumer Price Index (Monthly) — our inflation measure\n",
    "cpi = fred.get_series('CPIAUCSL', observation_start='2000-01-01')\n",
    "cpi.name = 'CPI'\n",
    "\n",
    "# Federal Funds Rate (Monthly)\n",
    "fed_funds = fred.get_series('FEDFUNDS', observation_start='2000-01-01')\n",
    "fed_funds.name = 'Fed Funds Rate'\n",
    "\n",
    "# 10-Year Treasury Yield\n",
    "treasury_10y = fred.get_series('DGS10', observation_start='2000-01-01')\n",
    "treasury_10y.name = '10-Year Treasury'\n",
    "\n",
    "print(f\"✅ GDP:              {len(gdp)} observations\")\n",
    "print(f\"✅ Unemployment:     {len(unemployment)} observations\")  \n",
    "print(f\"✅ CPI:              {len(cpi)} observations\")\n",
    "print(f\"✅ Fed Funds Rate:   {len(fed_funds)} observations\")\n",
    "print(f\"✅ 10-Yr Treasury:   {len(treasury_10y)} observations\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6b79e46",
   "metadata": {},
   "source": [
    "### 🔍 Quick Look: What Did We Just Get?\n",
    "\n",
    "Let's inspect the data. Notice these are **pandas Series** with datetime indices — perfect for time-series analysis.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4c16dc2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's look at what a FRED series looks like\n",
    "print(\"Type:\", type(gdp))\n",
    "print(\"\\nFirst 5 values:\")\n",
    "print(gdp.head())\n",
    "print(\"\\nLast 5 values:\")\n",
    "print(gdp.tail())\n",
    "print(f\"\\nDate range: {gdp.index[0].strftime('%B %Y')} to {gdp.index[-1].strftime('%B %Y')}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40f85ec3",
   "metadata": {},
   "source": [
    "---\n",
    "## 📈 Section 2: Visualizing Economic Trends\n",
    "\n",
    "Let's start with the question every American cares about:\n",
    "\n",
    "> **\"How has the economy been doing?\"**\n",
    "\n",
    "We'll answer this with data, not opinions.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f41df8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# PLOT 1: GDP Over Time (with recession shading!)\n",
    "# -------------------------------------------------\n",
    "\n",
    "# Get recession dates from FRED (this is a binary 0/1 series)\n",
    "recessions = fred.get_series('USREC', observation_start='2000-01-01')\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(14, 6))\n",
    "\n",
    "# Plot GDP\n",
    "ax.plot(gdp.index, gdp.values / 1000, color='#2196F3', linewidth=2.5, label='US GDP')\n",
    "\n",
    "# Shade recession periods — this is the magic trick!\n",
    "# We find where recessions == 1 and shade those periods gray\n",
    "recession_starts = recessions[recessions.diff() == 1].index\n",
    "recession_ends = recessions[recessions.diff() == -1].index\n",
    "\n",
    "# Handle edge case: if we start in a recession\n",
    "if recessions.iloc[0] == 1:\n",
    "    recession_starts = recession_starts.insert(0, recessions.index[0])\n",
    "# Handle edge case: if we end in a recession    \n",
    "if recessions.iloc[-1] == 1:\n",
    "    recession_ends = recession_ends.append(pd.DatetimeIndex([recessions.index[-1]]))\n",
    "\n",
    "for start, end in zip(recession_starts, recession_ends):\n",
    "    ax.axvspan(start, end, alpha=0.15, color='gray', label='_nolegend_')\n",
    "\n",
    "# Add a single recession label for the legend\n",
    "ax.axvspan(pd.Timestamp('1900-01-01'), pd.Timestamp('1900-01-02'), \n",
    "           alpha=0.15, color='gray', label='Recession')\n",
    "\n",
    "ax.set_title('US Gross Domestic Product (2000–Present)', fontsize=16, fontweight='bold')\n",
    "ax.set_ylabel('GDP (Trillions $)', fontsize=13)\n",
    "ax.set_xlabel('')\n",
    "ax.legend(fontsize=12)\n",
    "ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}T'))\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"\\n💡 Notice how GDP dips during the gray recession bands.\")\n",
    "print(\"   The 2008 financial crisis and 2020 COVID shock are clearly visible.\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ff26771",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# PLOT 2: The Big Four — Economic Dashboard\n",
    "# -------------------------------------------------\n",
    "\n",
    "fig, axes = plt.subplots(2, 2, figsize=(16, 10))\n",
    "fig.suptitle('US Economic Dashboard (2000–Present)', fontsize=18, fontweight='bold', y=1.02)\n",
    "\n",
    "# GDP Growth Rate (quarter-over-quarter, annualized)\n",
    "gdp_growth = gdp.pct_change() * 4 * 100  # Annualized quarterly growth\n",
    "axes[0,0].plot(gdp_growth.index, gdp_growth.values, color='#2196F3', linewidth=1.5)\n",
    "axes[0,0].axhline(y=0, color='black', linewidth=0.8, linestyle='--')\n",
    "axes[0,0].fill_between(gdp_growth.index, 0, gdp_growth.values, \n",
    "                        where=gdp_growth.values >= 0, alpha=0.3, color='green')\n",
    "axes[0,0].fill_between(gdp_growth.index, 0, gdp_growth.values, \n",
    "                        where=gdp_growth.values < 0, alpha=0.3, color='red')\n",
    "axes[0,0].set_title('GDP Growth Rate (Annualized %)', fontsize=13)\n",
    "axes[0,0].set_ylabel('%')\n",
    "\n",
    "# Unemployment\n",
    "axes[0,1].plot(unemployment.index, unemployment.values, color='#FF5722', linewidth=1.5)\n",
    "axes[0,1].fill_between(unemployment.index, unemployment.values, alpha=0.2, color='#FF5722')\n",
    "axes[0,1].set_title('Unemployment Rate (%)', fontsize=13)\n",
    "axes[0,1].set_ylabel('%')\n",
    "\n",
    "# Inflation (Year-over-year CPI change)\n",
    "inflation = cpi.pct_change(12) * 100  # Year-over-year % change\n",
    "axes[1,0].plot(inflation.index, inflation.values, color='#E91E63', linewidth=1.5)\n",
    "axes[1,0].axhline(y=2, color='green', linewidth=1.5, linestyle='--', label='Fed 2% Target')\n",
    "axes[1,0].set_title('Inflation Rate (Year-over-Year %)', fontsize=13)\n",
    "axes[1,0].set_ylabel('%')\n",
    "axes[1,0].legend()\n",
    "\n",
    "# Interest Rates\n",
    "axes[1,1].plot(fed_funds.index, fed_funds.values, color='#9C27B0', linewidth=1.5, label='Fed Funds Rate')\n",
    "axes[1,1].plot(treasury_10y.index, treasury_10y.values, color='#FF9800', linewidth=1, alpha=0.7, label='10-Yr Treasury')\n",
    "axes[1,1].set_title('Interest Rates (%)', fontsize=13)\n",
    "axes[1,1].set_ylabel('%')\n",
    "axes[1,1].legend()\n",
    "\n",
    "for ax in axes.flat:\n",
    "    ax.tick_params(axis='x', rotation=30)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac0a3673",
   "metadata": {},
   "source": [
    "---\n",
    "## 🌍 Section 3: Going Global — World Bank API\n",
    "\n",
    "The World Bank API gives us data for **every country on Earth**. No API key needed!\n",
    "\n",
    "Let's compare economic indicators across countries.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7926fe8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import wbgapi as wb\n",
    "\n",
    "# Let's see what's available\n",
    "print(\"🌍 The World Bank has data organized by topics:\\n\")\n",
    "for topic in wb.topic.list():\n",
    "    print(f\"   {topic['id']:>3}. {topic['value']}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb19ea1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# Pull GDP per capita for interesting countries\n",
    "# -------------------------------------------------\n",
    "\n",
    "# Pick countries to compare (ISO country codes)\n",
    "countries = ['USA', 'CHN', 'DEU', 'JPN', 'BRA', 'IND', 'NGA', 'GBR']\n",
    "country_names = {\n",
    "    'USA': 'United States', 'CHN': 'China', 'DEU': 'Germany', \n",
    "    'JPN': 'Japan', 'BRA': 'Brazil', 'IND': 'India', \n",
    "    'NGA': 'Nigeria', 'GBR': 'United Kingdom'\n",
    "}\n",
    "\n",
    "print(\"📡 Fetching GDP per capita from the World Bank...\\n\")\n",
    "\n",
    "# NY.GDP.PCAP.CD = GDP per capita (current US$)\n",
    "gdp_pc = wb.data.DataFrame(\n",
    "    'NY.GDP.PCAP.CD', \n",
    "    economy=countries, \n",
    "    time=range(2000, 2024)\n",
    ")\n",
    "\n",
    "# Clean up the dataframe\n",
    "# The wbgapi output has countries as rows and 'YR20XX' columns\n",
    "# Drop any non-numeric label columns if they exist (varies by version)\n",
    "label_cols = [c for c in gdp_pc.columns if not c.startswith('YR')]\n",
    "if label_cols:\n",
    "    gdp_pc = gdp_pc.drop(columns=label_cols)\n",
    "\n",
    "# Transpose so years are rows and countries are columns\n",
    "gdp_pc = gdp_pc.T\n",
    "gdp_pc.index = gdp_pc.index.str.replace('YR', '').astype(int)\n",
    "gdp_pc.index.name = 'Year'\n",
    "gdp_pc = gdp_pc.sort_index()\n",
    "\n",
    "print(\"✅ Data retrieved! Shape:\", gdp_pc.shape)\n",
    "print(\"\\nPreview:\")\n",
    "gdp_pc.tail()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "671da325",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# PLOT 3: GDP Per Capita — Country Comparison\n",
    "# -------------------------------------------------\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(14, 7))\n",
    "\n",
    "colors = ['#2196F3', '#F44336', '#4CAF50', '#FF9800', '#9C27B0', '#00BCD4', '#795548', '#607D8B']\n",
    "\n",
    "for i, country in enumerate(countries):\n",
    "    if country in gdp_pc.columns:\n",
    "        ax.plot(gdp_pc.index, gdp_pc[country], \n",
    "                linewidth=2.5, label=country_names[country], color=colors[i],\n",
    "                marker='o', markersize=3)\n",
    "\n",
    "ax.set_title('GDP Per Capita by Country (2000–2023)', fontsize=16, fontweight='bold')\n",
    "ax.set_ylabel('GDP Per Capita (Current US$)', fontsize=13)\n",
    "ax.set_xlabel('')\n",
    "ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n",
    "ax.legend(fontsize=11, loc='upper left')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"\\n💡 Discussion question: Why is the gap between rich and poor countries\")\n",
    "print(\"   growing in absolute terms but shrinking in percentage terms?\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "357e6bac",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# Pull MORE World Bank indicators for deeper analysis\n",
    "# -------------------------------------------------\n",
    "\n",
    "def clean_wb_dataframe(indicator, economies, years):\n",
    "    \"\"\"Helper function to pull and clean World Bank data.\"\"\"\n",
    "    try:\n",
    "        df = wb.data.DataFrame(indicator, economy=economies, time=years)\n",
    "        # Drop any non-YR columns (labels that vary by wbgapi version)\n",
    "        label_cols = [c for c in df.columns if not c.startswith('YR')]\n",
    "        if label_cols:\n",
    "            df = df.drop(columns=label_cols)\n",
    "        df = df.T\n",
    "        df.index = df.index.str.replace('YR', '').astype(int)\n",
    "        df.index.name = 'Year'\n",
    "        return df.sort_index()\n",
    "    except Exception as e:\n",
    "        print(f\"   ⚠️ Could not fetch {indicator}: {e}\")\n",
    "        return None\n",
    "\n",
    "print(\"📡 Fetching additional indicators...\\n\")\n",
    "\n",
    "# Life expectancy at birth\n",
    "life_exp = clean_wb_dataframe('SP.DYN.LE00.IN', countries, range(2000, 2023))\n",
    "\n",
    "# Internet users (% of population)\n",
    "internet = clean_wb_dataframe('IT.NET.USER.ZS', countries, range(2000, 2023))\n",
    "\n",
    "# Population growth (annual %)\n",
    "pop_growth = clean_wb_dataframe('SP.POP.GROW', countries, range(2000, 2023))\n",
    "\n",
    "for name, df in [(\"Life expectancy\", life_exp), (\"Internet usage\", internet), \n",
    "                  (\"Population growth\", pop_growth)]:\n",
    "    if df is not None:\n",
    "        print(f\"✅ {name} data shape: {df.shape}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "491a513a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# PLOT 4: Does Money Buy Happiness (or at least health)?\n",
    "# GDP per capita vs Life Expectancy — a \"Gapminder\" style plot\n",
    "# -------------------------------------------------\n",
    "\n",
    "if life_exp is not None:\n",
    "    # Get the most recent year with data for both indicators\n",
    "    latest_year_gdp = gdp_pc.dropna(how='all').index[-1]\n",
    "    latest_year_life = life_exp.dropna(how='all').index[-1]\n",
    "    year = min(latest_year_gdp, latest_year_life)\n",
    "\n",
    "    fig, ax = plt.subplots(figsize=(12, 8))\n",
    "\n",
    "    for i, country in enumerate(countries):\n",
    "        try:\n",
    "            x_val = gdp_pc.loc[year, country]\n",
    "            y_val = life_exp.loc[year, country]\n",
    "            if pd.notna(x_val) and pd.notna(y_val):\n",
    "                ax.scatter(x_val, y_val, s=200, color=colors[i], \n",
    "                          edgecolors='white', linewidth=2, zorder=5)\n",
    "                ax.annotate(country_names[country], (x_val, y_val),\n",
    "                           textcoords=\"offset points\", xytext=(10, 10),\n",
    "                           fontsize=11, fontweight='bold', color=colors[i])\n",
    "        except (KeyError, IndexError):\n",
    "            pass\n",
    "\n",
    "    ax.set_title(f'Does Wealth Buy Health? ({year})', fontsize=16, fontweight='bold')\n",
    "    ax.set_xlabel('GDP Per Capita (US$)', fontsize=13)\n",
    "    ax.set_ylabel('Life Expectancy (Years)', fontsize=13)\n",
    "    ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n",
    "\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "    print(f\"\\n💡 This is the famous 'Preston Curve' — wealthier countries tend to\")\n",
    "    print(f\"   have higher life expectancy, but with diminishing returns.\")\n",
    "else:\n",
    "    print(\"⚠️ Life expectancy data was unavailable. Skipping this plot.\")\n",
    "    print(\"   Try re-running the cell above — the World Bank API can be intermittent.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10240914",
   "metadata": {},
   "source": [
    "---\n",
    "## 📐 Section 4: Statistical Analysis\n",
    "\n",
    "Now let's move beyond visualization into actual statistical analysis. We'll test real economic hypotheses with data.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "736e2ff1",
   "metadata": {},
   "source": [
    "### 4.1 The Phillips Curve: Unemployment vs. Inflation\n",
    "\n",
    "One of the most famous relationships in economics: **when unemployment goes down, inflation tends to go up** (and vice versa).\n",
    "\n",
    "Let's see if the data supports this.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e23c82f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# Merge unemployment and inflation into one dataframe\n",
    "# -------------------------------------------------\n",
    "\n",
    "# Resample both to monthly and align\n",
    "df_phillips = pd.DataFrame({\n",
    "    'Unemployment': unemployment,\n",
    "    'Inflation': inflation\n",
    "}).dropna()\n",
    "\n",
    "print(f\"Merged dataset: {len(df_phillips)} monthly observations\")\n",
    "print(f\"Date range: {df_phillips.index[0].strftime('%B %Y')} to {df_phillips.index[-1].strftime('%B %Y')}\")\n",
    "print(f\"\\nBasic statistics:\")\n",
    "print(df_phillips.describe().round(2))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5433bd57",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# PLOT 5: Phillips Curve — Scatter with Regression\n",
    "# -------------------------------------------------\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(16, 7))\n",
    "\n",
    "# Left panel: Time series\n",
    "axes[0].plot(df_phillips.index, df_phillips['Unemployment'], \n",
    "             color='#FF5722', linewidth=1.5, label='Unemployment %')\n",
    "axes[0].plot(df_phillips.index, df_phillips['Inflation'], \n",
    "             color='#2196F3', linewidth=1.5, label='Inflation %')\n",
    "axes[0].axhline(y=2, color='green', linewidth=1, linestyle='--', alpha=0.5, label='2% Target')\n",
    "axes[0].set_title('Unemployment & Inflation Over Time', fontsize=14, fontweight='bold')\n",
    "axes[0].set_ylabel('%')\n",
    "axes[0].legend(fontsize=11)\n",
    "\n",
    "# Right panel: Scatter plot with regression line\n",
    "x = df_phillips['Unemployment'].values\n",
    "y = df_phillips['Inflation'].values\n",
    "\n",
    "# Linear regression\n",
    "slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)\n",
    "\n",
    "axes[1].scatter(x, y, alpha=0.4, s=30, color='#9C27B0', edgecolors='white')\n",
    "\n",
    "# Regression line\n",
    "x_line = np.linspace(x.min(), x.max(), 100)\n",
    "y_line = slope * x_line + intercept\n",
    "axes[1].plot(x_line, y_line, color='red', linewidth=2.5, \n",
    "             label=f'y = {slope:.2f}x + {intercept:.2f}')\n",
    "\n",
    "axes[1].set_title('Phillips Curve: Does It Hold?', fontsize=14, fontweight='bold')\n",
    "axes[1].set_xlabel('Unemployment Rate (%)', fontsize=13)\n",
    "axes[1].set_ylabel('Inflation Rate (%)', fontsize=13)\n",
    "\n",
    "# Add stats annotation\n",
    "stats_text = f'R² = {r_value**2:.3f}\\np-value = {p_value:.4f}\\nn = {len(x)}'\n",
    "axes[1].annotate(stats_text, xy=(0.05, 0.95), xycoords='axes fraction',\n",
    "                fontsize=12, verticalalignment='top',\n",
    "                bbox=dict(boxstyle='round,pad=0.5', facecolor='lightyellow', alpha=0.8))\n",
    "axes[1].legend(fontsize=11)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"\\n📊 Regression Results:\")\n",
    "print(f\"   Slope:     {slope:.4f} (for every 1% increase in unemployment,\")\n",
    "print(f\"              inflation changes by {slope:.2f} percentage points)\")\n",
    "print(f\"   R-squared: {r_value**2:.4f}\")\n",
    "print(f\"   P-value:   {p_value:.6f}\")\n",
    "print(f\"\\n💡 Is the Phillips Curve alive? An R² of {r_value**2:.3f} suggests\")\n",
    "print(f\"   {'a meaningful' if r_value**2 > 0.1 else 'a weak'} relationship in the modern era.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b530cd42",
   "metadata": {},
   "source": [
    "### 4.2 Correlation Matrix: How Do Economic Indicators Move Together?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d227363c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# Build a comprehensive monthly indicator dataframe\n",
    "# -------------------------------------------------\n",
    "\n",
    "econ_df = pd.DataFrame({\n",
    "    'Unemployment': unemployment,\n",
    "    'Inflation': inflation,\n",
    "    'Fed Funds Rate': fed_funds,\n",
    "    '10-Yr Treasury': treasury_10y,\n",
    "    'GDP Growth': gdp.pct_change() * 100\n",
    "}).dropna()\n",
    "\n",
    "print(f\"Combined dataset: {len(econ_df)} observations, {econ_df.shape[1]} variables\\n\")\n",
    "\n",
    "# Correlation matrix\n",
    "corr_matrix = econ_df.corr()\n",
    "\n",
    "# -------------------------------------------------\n",
    "# PLOT 6: Correlation Heatmap\n",
    "# -------------------------------------------------\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 8))\n",
    "\n",
    "mask = np.triu(np.ones_like(corr_matrix, dtype=bool), k=1)  # Upper triangle mask\n",
    "sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='RdBu_r', center=0,\n",
    "            square=True, linewidths=2, linecolor='white',\n",
    "            annot_kws={'fontsize': 13, 'fontweight': 'bold'},\n",
    "            vmin=-1, vmax=1, ax=ax)\n",
    "\n",
    "ax.set_title('Correlation Matrix: US Economic Indicators', fontsize=16, fontweight='bold')\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(\"\\n💡 Key observations:\")\n",
    "print(\"   • Positive correlation = they move together\")\n",
    "print(\"   • Negative correlation = they move in opposite directions\")\n",
    "print(\"   • Closer to ±1 = stronger relationship\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25ee40f7",
   "metadata": {},
   "source": [
    "### 4.3 Distribution Analysis: Is Economic Growth \"Normal\"?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de8ecb78",
   "metadata": {},
   "outputs": [],
   "source": [
    "# -------------------------------------------------\n",
    "# PLOT 7: Distribution of GDP Growth Rates\n",
    "# -------------------------------------------------\n",
    "\n",
    "gdp_growth_clean = (gdp.pct_change() * 4 * 100).dropna()  # Annualized quarterly\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(16, 6))\n",
    "\n",
    "# Histogram with KDE\n",
    "axes[0].hist(gdp_growth_clean.values, bins=30, density=True, alpha=0.6, \n",
    "             color='#2196F3', edgecolor='white', linewidth=1.2)\n",
    "gdp_growth_clean.plot.kde(ax=axes[0], color='#F44336', linewidth=2.5, label='KDE')\n",
    "axes[0].axvline(x=gdp_growth_clean.mean(), color='green', linewidth=2, \n",
    "                linestyle='--', label=f'Mean: {gdp_growth_clean.mean():.1f}%')\n",
    "axes[0].axvline(x=0, color='black', linewidth=1, linestyle='-', alpha=0.5)\n",
    "axes[0].set_title('Distribution of US GDP Growth Rates', fontsize=14, fontweight='bold')\n",
    "axes[0].set_xlabel('Annualized Growth Rate (%)')\n",
    "axes[0].set_ylabel('Density')\n",
    "axes[0].legend(fontsize=11)\n",
    "\n",
    "# QQ Plot — tests if the distribution is normal\n",
    "stats.probplot(gdp_growth_clean.values, dist=\"norm\", plot=axes[1])\n",
    "axes[1].set_title('Q-Q Plot: Is GDP Growth Normal?', fontsize=14, fontweight='bold')\n",
    "axes[1].get_lines()[0].set(color='#2196F3', markersize=5)\n",
    "axes[1].get_lines()[1].set(color='red', linewidth=2)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Formal normality test\n",
    "shapiro_stat, shapiro_p = stats.shapiro(gdp_growth_clean.values[-100:])  # Last 100 obs\n",
    "\n",
    "print(f\"\\n📊 Distribution Statistics:\")\n",
    "print(f\"   Mean:       {gdp_growth_clean.mean():.2f}%\")\n",
    "print(f\"   Median:     {gdp_growth_clean.median():.2f}%\")\n",
    "print(f\"   Std Dev:    {gdp_growth_clean.std():.2f}%\")\n",
    "print(f\"   Skewness:   {gdp_growth_clean.skew():.3f} {'(left-skewed ⬅️)' if gdp_growth_clean.skew() < 0 else '(right-skewed ➡️)'}\")\n",
    "print(f\"   Kurtosis:   {gdp_growth_clean.kurtosis():.3f} {'(heavy tails — more extreme events!)' if gdp_growth_clean.kurtosis() > 0 else '(light tails)'}\")\n",
    "print(f\"\\n   Shapiro-Wilk test: W={shapiro_stat:.4f}, p={shapiro_p:.4f}\")\n",
    "print(f\"   {'   → Distribution is NOT normal (p < 0.05)' if shapiro_p < 0.05 else '   → Distribution appears normal (p >= 0.05)'}\")\n",
    "print(f\"\\n💡 The skewness and heavy tails tell us that economic downturns\")\n",
    "print(f\"   tend to be sharper than expansions — recessions hit fast!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb32f4b2",
   "metadata": {},
   "source": [
    "---\n",
    "## 🎯 Section 5: Your Turn! (Practice Exercises)\n",
    "\n",
    "Now it's your turn. Try these exercises using the skills you've learned above.\n",
    "\n",
    "### Exercise 1: Housing Market\n",
    "Pull the **S&P/Case-Shiller Home Price Index** from FRED (series ID: `CSUSHPINSA`) and plot it from 2000 to present. Can you spot the 2008 housing bubble?\n",
    "\n",
    "### Exercise 2: Global Internet Adoption\n",
    "Using the World Bank API, create a line chart showing internet usage (% of population) for 5 countries of your choice. Which country had the fastest adoption?\n",
    "\n",
    "### Exercise 3: Hypothesis Test\n",
    "Is the average inflation rate in the 2010s (2010–2019) significantly different from the 2020s (2020–present)? Use a **t-test** to find out. (Hint: `stats.ttest_ind()`)\n",
    "\n",
    "### Exercise 4: Build Your Own Dashboard\n",
    "Choose 3 FRED indicators that interest you and create a dashboard similar to the one in Section 2. Add recession shading for bonus points!\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6b9394f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 1: Housing Market — Starter Code\n",
    "# =============================================\n",
    "\n",
    "# housing = fred.get_series('CSUSHPINSA', observation_start='2000-01-01')\n",
    "# \n",
    "# fig, ax = plt.subplots(figsize=(14, 6))\n",
    "# ax.plot(housing.index, housing.values)\n",
    "# ax.set_title('S&P/Case-Shiller Home Price Index')\n",
    "# plt.show()\n",
    "\n",
    "# YOUR CODE HERE 👇\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c09add59",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 2: Global Internet Adoption — Starter Code\n",
    "# ====================================================\n",
    "\n",
    "# my_countries = ['USA', 'KOR', 'KEN', 'BRA', 'CHN']  # Choose your own!\n",
    "# \n",
    "# internet_data = wb.data.DataFrame(\n",
    "#     'IT.NET.USER.ZS',\n",
    "#     economy=my_countries,\n",
    "#     time=range(2000, 2023),\n",
    "#     labels=True\n",
    "# )\n",
    "\n",
    "# YOUR CODE HERE 👇\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a455059",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 3: Hypothesis Test — Starter Code\n",
    "# =============================================\n",
    "\n",
    "# inflation_2010s = inflation['2010':'2019'].dropna()\n",
    "# inflation_2020s = inflation['2020':].dropna()\n",
    "# \n",
    "# t_stat, p_value = stats.ttest_ind(inflation_2010s, inflation_2020s)\n",
    "# print(f\"T-statistic: {t_stat:.4f}\")\n",
    "# print(f\"P-value: {p_value:.6f}\")\n",
    "\n",
    "# YOUR CODE HERE 👇\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b43c6c01",
   "metadata": {},
   "source": [
    "---\n",
    "## 📚 Section 6: Quick Reference\n",
    "\n",
    "### Useful FRED Series IDs\n",
    "| Series ID | Description |\n",
    "|-----------|-------------|\n",
    "| `GDP` | Gross Domestic Product |\n",
    "| `UNRATE` | Unemployment Rate |\n",
    "| `CPIAUCSL` | Consumer Price Index (All Urban) |\n",
    "| `FEDFUNDS` | Federal Funds Effective Rate |\n",
    "| `DGS10` | 10-Year Treasury Yield |\n",
    "| `CSUSHPINSA` | Case-Shiller Home Price Index |\n",
    "| `UMCSENT` | Consumer Sentiment |\n",
    "| `PAYEMS` | Total Nonfarm Payrolls |\n",
    "| `M2SL` | M2 Money Supply |\n",
    "| `USREC` | Recession Indicator (0/1) |\n",
    "\n",
    "### Useful World Bank Indicator Codes\n",
    "| Code | Description |\n",
    "|------|-------------|\n",
    "| `NY.GDP.PCAP.CD` | GDP per capita (current US$) |\n",
    "| `SP.DYN.LE00.IN` | Life expectancy at birth |\n",
    "| `IT.NET.USER.ZS` | Internet users (% of pop.) |\n",
    "| `EN.ATM.CO2E.PC` | CO2 emissions per capita |\n",
    "| `SE.ADT.LITR.ZS` | Adult literacy rate |\n",
    "| `SL.UEM.TOTL.ZS` | Unemployment (ILO estimate) |\n",
    "\n",
    "### Key `scipy.stats` Functions\n",
    "| Function | Purpose |\n",
    "|----------|---------|\n",
    "| `linregress(x, y)` | Simple linear regression |\n",
    "| `pearsonr(x, y)` | Pearson correlation + p-value |\n",
    "| `ttest_ind(a, b)` | Independent samples t-test |\n",
    "| `shapiro(x)` | Shapiro-Wilk normality test |\n",
    "\n",
    "---\n",
    "\n",
    "*Notebook created for DATA 110 — Introduction to Data Science*  \n",
    "*APIs used: FRED (Federal Reserve) & World Bank*  \n",
    "*Have questions? Check: https://fred.stlouisfed.org and https://data.worldbank.org*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c52722a-27f6-4c54-9dcc-f6804727d6fa",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}