{ "cells": [ { "cell_type": "markdown", "id": "a598f44c", "metadata": {}, "source": [ "# 🏦 Economic Data Explorer: Fed & World Bank APIs in Python\n", "\n", "## FINC 332: Introduction to Data Analytics, Data Mining, and Data Visualization\n", "### Developed by Dr. Benyawarath \"Yaa\" Nithithanatchinnapat\n", "**Learning Objectives:**\n", "- Connect to real-world economic data APIs (FRED & World Bank)\n", "- Clean, transform, and merge time-series data with `pandas`\n", "- Create compelling visualizations with `matplotlib` and `seaborn`\n", "- Perform basic statistical analysis (correlation, regression, distributions)\n", "\n", "**Why this matters:** Economists, analysts, and data scientists don't work with pre-cleaned CSVs; they pull live data from institutional APIs. Today you'll do exactly that.\n", "\n", "---\n" ] }, { "cell_type": "markdown", "id": "e37ee5d5", "metadata": {}, "source": [ "## 🔧 Section 0: Setup & Installations\n", "\n", "Run this cell first to make sure all packages are installed.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "33c335c0", "metadata": {}, "outputs": [], "source": [ "# Install required packages (this may take a moment the first time)\n", "!pip install fredapi wbgapi pandas matplotlib seaborn scipy -q\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib.dates as mdates\n", "import seaborn as sns\n", "from scipy import stats\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# Make our plots look nice\n", "sns.set_style(\"whitegrid\")\n", "plt.rcParams['figure.figsize'] = (12, 6)\n", "plt.rcParams['font.size'] = 12\n", "\n", "print(\"✅ All packages loaded successfully!\")\n" ] }, { "cell_type": "markdown", "id": "1db13fb9", "metadata": {}, "source": [ "---\n", "## 📊 Section 1: Getting Data from the Federal Reserve (FRED)\n", "\n", "**FRED** (Federal Reserve Economic Data) is the gold standard for US economic data. 
It has over 800,000 data series, covering everything from GDP to avocado prices.\n", "\n", "### Step 1: Get Your Free API Key\n", "1. Go to https://fred.stlouisfed.org/docs/api/api_key.html\n", "2. Create a free account\n", "3. Copy your API key\n", "\n", "> **💡 Pro tip:** Never hardcode API keys in your notebook! Use environment variables or a separate config file. For today's class demo, we'll use a simple variable, but in production, use `os.environ` or a `.env` file.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f9b15c70", "metadata": {}, "outputs": [], "source": [ "# ============================================\n", "# 🔑 PASTE YOUR FRED API KEY HERE\n", "# ============================================\n", "FRED_API_KEY = \"YOUR FRED API KEY\"\n", "\n", "# In production, you'd do this instead:\n", "# import os\n", "# FRED_API_KEY = os.environ.get(\"FRED_API_KEY\")\n", "\n", "# For the sake of time, here's my API key: e85280ebf7739fb5396c6a583c6a26b5" ] }, { "cell_type": "code", "execution_count": null, "id": "4fdf8304", "metadata": {}, "outputs": [], "source": [ "from fredapi import Fred\n", "\n", "fred = Fred(api_key=FRED_API_KEY)\n", "\n", "# Let's grab some key economic indicators\n", "# Each series has a unique ID; you can search at https://fred.stlouisfed.org\n", "\n", "print(\"📡 Fetching data from the Federal Reserve...\")\n", "print(\" This is a LIVE API call: you're pulling real data right now!\\n\")\n", "\n", "# GDP (Quarterly, Seasonally Adjusted Annual Rate)\n", "gdp = fred.get_series('GDP', observation_start='2000-01-01')\n", "gdp.name = 'GDP'\n", "\n", "# Unemployment Rate (Monthly)\n", "unemployment = fred.get_series('UNRATE', observation_start='2000-01-01')\n", "unemployment.name = 'Unemployment Rate'\n", "\n", "# Consumer Price Index (Monthly), our inflation measure\n", "cpi = fred.get_series('CPIAUCSL', observation_start='2000-01-01')\n", "cpi.name = 'CPI'\n", "\n", "# Federal Funds Rate (Monthly)\n", "fed_funds = 
fred.get_series('FEDFUNDS', observation_start='2000-01-01')\n", "fed_funds.name = 'Fed Funds Rate'\n", "\n", "# 10-Year Treasury Yield (Daily)\n", "treasury_10y = fred.get_series('DGS10', observation_start='2000-01-01')\n", "treasury_10y.name = '10-Year Treasury'\n", "\n", "print(f\"✅ GDP: {len(gdp)} observations\")\n", "print(f\"✅ Unemployment: {len(unemployment)} observations\")\n", "print(f\"✅ CPI: {len(cpi)} observations\")\n", "print(f\"✅ Fed Funds Rate: {len(fed_funds)} observations\")\n", "print(f\"✅ 10-Yr Treasury: {len(treasury_10y)} observations\")\n" ] }, { "cell_type": "markdown", "id": "a6b79e46", "metadata": {}, "source": [ "### 🔍 Quick Look: What Did We Just Get?\n", "\n", "Let's inspect the data. Notice these are **pandas Series** with datetime indices, which makes them well suited to time-series analysis.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a4c16dc2", "metadata": {}, "outputs": [], "source": [ "# Let's look at what a FRED series looks like\n", "print(\"Type:\", type(gdp))\n", "print(\"\\nFirst 5 values:\")\n", "print(gdp.head())\n", "print(\"\\nLast 5 values:\")\n", "print(gdp.tail())\n", "print(f\"\\nDate range: {gdp.index[0].strftime('%B %Y')} to {gdp.index[-1].strftime('%B %Y')}\")\n" ] }, { "cell_type": "markdown", "id": "40f85ec3", "metadata": {}, "source": [ "---\n", "## 📈 Section 2: Visualizing Economic Trends\n", "\n", "Let's start with the question every American cares about:\n", "\n", "> **\"How has the economy been doing?\"**\n", "\n", "We'll answer this with data, not opinions.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9f41df8f", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# PLOT 1: GDP Over Time (with recession shading!)\n", "# -------------------------------------------------\n", "\n", "# Get recession dates from FRED (this is a binary 0/1 series)\n", "recessions = fred.get_series('USREC', observation_start='2000-01-01')\n", "\n", "fig, ax 
= plt.subplots(figsize=(14, 6))\n", "\n", "# Plot GDP (FRED reports billions of dollars, so divide by 1000 for trillions)\n", "ax.plot(gdp.index, gdp.values / 1000, color='#2196F3', linewidth=2.5, label='US GDP')\n", "\n", "# Shade recession periods: this is the key trick!\n", "# diff() == 1 marks the month a recession starts; diff() == -1 marks the month it ends\n", "recession_starts = recessions[recessions.diff() == 1].index\n", "recession_ends = recessions[recessions.diff() == -1].index\n", "\n", "# Handle edge case: if we start in a recession\n", "if recessions.iloc[0] == 1:\n", "    recession_starts = recession_starts.insert(0, recessions.index[0])\n", "# Handle edge case: if we end in a recession\n", "if recessions.iloc[-1] == 1:\n", "    recession_ends = recession_ends.append(pd.DatetimeIndex([recessions.index[-1]]))\n", "\n", "# Label only the first band so the legend shows a single 'Recession' entry\n", "# (a dummy span dated 1900 would stretch the x-axis back to 1900)\n", "for i, (start, end) in enumerate(zip(recession_starts, recession_ends)):\n", "    ax.axvspan(start, end, alpha=0.15, color='gray',\n", "               label='Recession' if i == 0 else '_nolegend_')\n", "\n", "ax.set_title('US Gross Domestic Product (2000–Present)', fontsize=16, fontweight='bold')\n", "ax.set_ylabel('GDP (Trillions $)', fontsize=13)\n", "ax.set_xlabel('')\n", "ax.legend(fontsize=12)\n", "ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}T'))\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"\\n💡 Notice how GDP dips during the gray recession bands.\")\n", "print(\" The 2008 financial crisis and 2020 COVID shock are clearly visible.\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6ff26771", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# PLOT 2: The Big Four: Economic Dashboard\n", "# -------------------------------------------------\n", "\n", "fig, axes = plt.subplots(2, 2, figsize=(16, 10))\n", "fig.suptitle('US Economic Dashboard (2000–Present)', fontsize=18, fontweight='bold', 
y=1.02)\n", "\n", "# GDP Growth Rate (quarter-over-quarter, annualized)\n", "gdp_growth = gdp.pct_change() * 4 * 100 # Annualized quarterly growth\n", "axes[0,0].plot(gdp_growth.index, gdp_growth.values, color='#2196F3', linewidth=1.5)\n", "axes[0,0].axhline(y=0, color='black', linewidth=0.8, linestyle='--')\n", "axes[0,0].fill_between(gdp_growth.index, 0, gdp_growth.values,\n", "                       where=gdp_growth.values >= 0, alpha=0.3, color='green')\n", "axes[0,0].fill_between(gdp_growth.index, 0, gdp_growth.values,\n", "                       where=gdp_growth.values < 0, alpha=0.3, color='red')\n", "axes[0,0].set_title('GDP Growth Rate (Annualized %)', fontsize=13)\n", "axes[0,0].set_ylabel('%')\n", "\n", "# Unemployment\n", "axes[0,1].plot(unemployment.index, unemployment.values, color='#FF5722', linewidth=1.5)\n", "axes[0,1].fill_between(unemployment.index, unemployment.values, alpha=0.2, color='#FF5722')\n", "axes[0,1].set_title('Unemployment Rate (%)', fontsize=13)\n", "axes[0,1].set_ylabel('%')\n", "\n", "# Inflation (Year-over-year CPI change)\n", "inflation = cpi.pct_change(12) * 100 # Year-over-year % change\n", "axes[1,0].plot(inflation.index, inflation.values, color='#E91E63', linewidth=1.5)\n", "axes[1,0].axhline(y=2, color='green', linewidth=1.5, linestyle='--', label='Fed 2% Target')\n", "axes[1,0].set_title('Inflation Rate (Year-over-Year %)', fontsize=13)\n", "axes[1,0].set_ylabel('%')\n", "axes[1,0].legend()\n", "\n", "# Interest Rates\n", "axes[1,1].plot(fed_funds.index, fed_funds.values, color='#9C27B0', linewidth=1.5, label='Fed Funds Rate')\n", "axes[1,1].plot(treasury_10y.index, treasury_10y.values, color='#FF9800', linewidth=1, alpha=0.7, label='10-Yr Treasury')\n", "axes[1,1].set_title('Interest Rates (%)', fontsize=13)\n", "axes[1,1].set_ylabel('%')\n", "axes[1,1].legend()\n", "\n", "for ax in axes.flat:\n", "    ax.tick_params(axis='x', rotation=30)\n", "\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "ac0a3673", "metadata": {}, "source": [ 
"---\n", "## 🌍 Section 3: Going Global with the World Bank API\n", "\n", "The World Bank API gives us data for **every country on Earth**. No API key needed!\n", "\n", "Let's compare economic indicators across countries.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b7926fe8", "metadata": {}, "outputs": [], "source": [ "import wbgapi as wb\n", "\n", "# Let's see what's available\n", "print(\"🌍 The World Bank has data organized by topics:\\n\")\n", "for topic in wb.topic.list():\n", "    print(f\" {topic['id']:>3}. {topic['value']}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "fb19ea1c", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# Pull GDP per capita for interesting countries\n", "# -------------------------------------------------\n", "\n", "# Pick countries to compare (ISO country codes)\n", "countries = ['USA', 'CHN', 'DEU', 'JPN', 'BRA', 'IND', 'NGA', 'GBR']\n", "country_names = {\n", "    'USA': 'United States', 'CHN': 'China', 'DEU': 'Germany',\n", "    'JPN': 'Japan', 'BRA': 'Brazil', 'IND': 'India',\n", "    'NGA': 'Nigeria', 'GBR': 'United Kingdom'\n", "}\n", "\n", "print(\"📡 Fetching GDP per capita from the World Bank...\\n\")\n", "\n", "# NY.GDP.PCAP.CD = GDP per capita (current US$)\n", "gdp_pc = wb.data.DataFrame(\n", "    'NY.GDP.PCAP.CD',\n", "    economy=countries,\n", "    time=range(2000, 2024)\n", ")\n", "\n", "# Clean up the dataframe\n", "# The wbgapi output has countries as rows and 'YR20XX' columns\n", "# Drop any non-numeric label columns if they exist (varies by version)\n", "label_cols = [c for c in gdp_pc.columns if not c.startswith('YR')]\n", "if label_cols:\n", "    gdp_pc = gdp_pc.drop(columns=label_cols)\n", "\n", "# Transpose so years are rows and countries are columns\n", "gdp_pc = gdp_pc.T\n", "gdp_pc.index = gdp_pc.index.str.replace('YR', '').astype(int)\n", "gdp_pc.index.name = 'Year'\n", "gdp_pc = gdp_pc.sort_index()\n", "\n", "print(\"✅ Data retrieved! 
Shape:\", gdp_pc.shape)\n", "print(\"\\nPreview:\")\n", "gdp_pc.tail()\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "671da325", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# PLOT 3: GDP Per Capita: Country Comparison\n", "# -------------------------------------------------\n", "\n", "fig, ax = plt.subplots(figsize=(14, 7))\n", "\n", "colors = ['#2196F3', '#F44336', '#4CAF50', '#FF9800', '#9C27B0', '#00BCD4', '#795548', '#607D8B']\n", "\n", "for i, country in enumerate(countries):\n", "    if country in gdp_pc.columns:\n", "        ax.plot(gdp_pc.index, gdp_pc[country],\n", "                linewidth=2.5, label=country_names[country], color=colors[i],\n", "                marker='o', markersize=3)\n", "\n", "ax.set_title('GDP Per Capita by Country (2000–2023)', fontsize=16, fontweight='bold')\n", "ax.set_ylabel('GDP Per Capita (Current US$)', fontsize=13)\n", "ax.set_xlabel('')\n", "ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n", "ax.legend(fontsize=11, loc='upper left')\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"\\n💡 Discussion question: Why is the gap between rich and poor countries\")\n", "print(\" growing in absolute terms but shrinking in percentage terms?\")\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "357e6bac", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# Pull MORE World Bank indicators for deeper analysis\n", "# -------------------------------------------------\n", "\n", "def clean_wb_dataframe(indicator, economies, years):\n", "    \"\"\"Helper function to pull and clean World Bank data.\"\"\"\n", "    try:\n", "        df = wb.data.DataFrame(indicator, economy=economies, time=years)\n", "        # Drop any non-YR columns (labels that vary by wbgapi version)\n", "        label_cols = [c for c in df.columns if not c.startswith('YR')]\n", "        if label_cols:\n", "            df = df.drop(columns=label_cols)\n", "        df = df.T\n", "        df.index = df.index.str.replace('YR', '').astype(int)\n", "        df.index.name = 'Year'\n", "        return df.sort_index()\n", "    except Exception as e:\n", "        print(f\" ⚠️ Could not fetch {indicator}: {e}\")\n", "        return None\n", "\n", "print(\"📡 Fetching additional indicators...\\n\")\n", "\n", "# Life expectancy at birth\n", "life_exp = clean_wb_dataframe('SP.DYN.LE00.IN', countries, range(2000, 2023))\n", "\n", "# Internet users (% of population)\n", "internet = clean_wb_dataframe('IT.NET.USER.ZS', countries, range(2000, 2023))\n", "\n", "# Population growth (annual %)\n", "pop_growth = clean_wb_dataframe('SP.POP.GROW', countries, range(2000, 2023))\n", "\n", "for name, df in [(\"Life expectancy\", life_exp), (\"Internet usage\", internet),\n", "                 (\"Population growth\", pop_growth)]:\n", "    if df is not None:\n", "        print(f\"✅ {name} data shape: {df.shape}\")\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "491a513a", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# PLOT 4: Does Money Buy Happiness (or at least health)?\n", "# GDP per capita vs Life Expectancy, a \"Gapminder\" style plot\n", "# -------------------------------------------------\n", "\n", "if life_exp is not None:\n", "    # Get the most recent year with data for both indicators\n", "    latest_year_gdp = gdp_pc.dropna(how='all').index[-1]\n", "    latest_year_life = life_exp.dropna(how='all').index[-1]\n", "    year = min(latest_year_gdp, latest_year_life)\n", "\n", "    fig, ax = plt.subplots(figsize=(12, 8))\n", "\n", "    for i, country in enumerate(countries):\n", "        try:\n", "            x_val = gdp_pc.loc[year, country]\n", "            y_val = life_exp.loc[year, country]\n", "            if pd.notna(x_val) and pd.notna(y_val):\n", "                ax.scatter(x_val, y_val, s=200, color=colors[i],\n", "                           edgecolors='white', linewidth=2, zorder=5)\n", "                ax.annotate(country_names[country], (x_val, y_val),\n", "                            textcoords=\"offset points\", xytext=(10, 10),\n", "                            fontsize=11, fontweight='bold', color=colors[i])\n", "        except (KeyError, IndexError):\n", "            pass\n", "\n", "    ax.set_title(f'Does Wealth Buy Health? ({year})', fontsize=16, fontweight='bold')\n", "    ax.set_xlabel('GDP Per Capita (US$)', fontsize=13)\n", "    ax.set_ylabel('Life Expectancy (Years)', fontsize=13)\n", "    ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n", "\n", "    plt.tight_layout()\n", "    plt.show()\n", "\n", "    print(f\"\\n💡 This is the famous 'Preston Curve': wealthier countries tend to\")\n", "    print(f\" have higher life expectancy, but with diminishing returns.\")\n", "else:\n", "    print(\"⚠️ Life expectancy data was unavailable. Skipping this plot.\")\n", "    print(\" Try re-running the cell above; the World Bank API can be intermittent.\")\n" ] },
{ "cell_type": "markdown", "id": "10240914", "metadata": {}, "source": [ "---\n", "## Section 4: Statistical Analysis\n", "\n", "Now let's move beyond visualization into actual statistical analysis. We'll test real economic hypotheses with data.\n" ] }, { "cell_type": "markdown", "id": "736e2ff1", "metadata": {}, "source": [ "### 4.1 The Phillips Curve: Unemployment vs. 
Inflation\n", "\n", "One of the most famous relationships in economics: **when unemployment goes down, inflation tends to go up** (and vice versa).\n", "\n", "Let's see if the data supports this.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8e23c82f", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# Merge unemployment and inflation into one dataframe\n", "# -------------------------------------------------\n", "\n", "# Both series are already monthly; the DataFrame constructor aligns them on their date index\n", "df_phillips = pd.DataFrame({\n", "    'Unemployment': unemployment,\n", "    'Inflation': inflation\n", "}).dropna()\n", "\n", "print(f\"Merged dataset: {len(df_phillips)} monthly observations\")\n", "print(f\"Date range: {df_phillips.index[0].strftime('%B %Y')} to {df_phillips.index[-1].strftime('%B %Y')}\")\n", "print(f\"\\nBasic statistics:\")\n", "print(df_phillips.describe().round(2))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5433bd57", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# PLOT 5: Phillips Curve: Scatter with Regression\n", "# -------------------------------------------------\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(16, 7))\n", "\n", "# Left panel: Time series\n", "axes[0].plot(df_phillips.index, df_phillips['Unemployment'],\n", "             color='#FF5722', linewidth=1.5, label='Unemployment %')\n", "axes[0].plot(df_phillips.index, df_phillips['Inflation'],\n", "             color='#2196F3', linewidth=1.5, label='Inflation %')\n", "axes[0].axhline(y=2, color='green', linewidth=1, linestyle='--', alpha=0.5, label='2% Target')\n", "axes[0].set_title('Unemployment & Inflation Over Time', fontsize=14, fontweight='bold')\n", "axes[0].set_ylabel('%')\n", "axes[0].legend(fontsize=11)\n", "\n", "# Right panel: Scatter plot with regression line\n", "x = df_phillips['Unemployment'].values\n", "y = df_phillips['Inflation'].values\n", "\n", "# Linear regression\n", "slope, 
intercept, r_value, p_value, std_err = stats.linregress(x, y)\n", "\n", "axes[1].scatter(x, y, alpha=0.4, s=30, color='#9C27B0', edgecolors='white')\n", "\n", "# Regression line\n", "x_line = np.linspace(x.min(), x.max(), 100)\n", "y_line = slope * x_line + intercept\n", "axes[1].plot(x_line, y_line, color='red', linewidth=2.5,\n", "             label=f'y = {slope:.2f}x + {intercept:.2f}')\n", "\n", "axes[1].set_title('Phillips Curve: Does It Hold?', fontsize=14, fontweight='bold')\n", "axes[1].set_xlabel('Unemployment Rate (%)', fontsize=13)\n", "axes[1].set_ylabel('Inflation Rate (%)', fontsize=13)\n", "\n", "# Add stats annotation\n", "stats_text = f'R² = {r_value**2:.3f}\\np-value = {p_value:.4f}\\nn = {len(x)}'\n", "axes[1].annotate(stats_text, xy=(0.05, 0.95), xycoords='axes fraction',\n", "                 fontsize=12, verticalalignment='top',\n", "                 bbox=dict(boxstyle='round,pad=0.5', facecolor='lightyellow', alpha=0.8))\n", "axes[1].legend(fontsize=11)\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(f\"\\n📊 Regression Results:\")\n", "print(f\" Slope: {slope:.4f} (for every 1-percentage-point increase in unemployment,\")\n", "print(f\" inflation changes by {slope:.2f} percentage points)\")\n", "print(f\" R-squared: {r_value**2:.4f}\")\n", "print(f\" P-value: {p_value:.6f}\")\n", "print(f\"\\n💡 Is the Phillips Curve alive? 
An R² of {r_value**2:.3f} suggests\")\n", "print(f\" {'a meaningful' if r_value**2 > 0.1 else 'a weak'} relationship in the modern era.\")\n" ] }, { "cell_type": "markdown", "id": "b530cd42", "metadata": {}, "source": [ "### 4.2 Correlation Matrix: How Do Economic Indicators Move Together?\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d227363c", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# Build a combined indicator dataframe\n", "# (GDP is quarterly, so after dropna() the rows are quarterly)\n", "# -------------------------------------------------\n", "\n", "econ_df = pd.DataFrame({\n", "    'Unemployment': unemployment,\n", "    'Inflation': inflation,\n", "    'Fed Funds Rate': fed_funds,\n", "    # DGS10 is daily; average it by month so it lines up with the month-start dates\n", "    '10-Yr Treasury': treasury_10y.resample('MS').mean(),\n", "    'GDP Growth': gdp.pct_change() * 100\n", "}).dropna()\n", "\n", "print(f\"Combined dataset: {len(econ_df)} observations, {econ_df.shape[1]} variables\\n\")\n", "\n", "# Correlation matrix\n", "corr_matrix = econ_df.corr()\n", "\n", "# -------------------------------------------------\n", "# PLOT 6: Correlation Heatmap\n", "# -------------------------------------------------\n", "\n", "fig, ax = plt.subplots(figsize=(10, 8))\n", "\n", "# Hide the redundant upper triangle (the matrix is symmetric)\n", "mask = np.triu(np.ones_like(corr_matrix, dtype=bool), k=1)\n", "sns.heatmap(corr_matrix, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r', center=0,\n", "            square=True, linewidths=2, linecolor='white',\n", "            annot_kws={'fontsize': 13, 'fontweight': 'bold'},\n", "            vmin=-1, vmax=1, ax=ax)\n", "\n", "ax.set_title('Correlation Matrix: US Economic Indicators', fontsize=16, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"\\n💡 Key observations:\")\n", "print(\" • Positive correlation = they move together\")\n", "print(\" • Negative correlation = they move in opposite directions\")\n", "print(\" • Closer to ±1 = stronger relationship\")\n" ] }, { "cell_type": "markdown", "id": "25ee40f7", "metadata": {}, "source": [ "### 4.3 Distribution Analysis: Is 
Economic Growth \"Normal\"?\n" ] }, { "cell_type": "code", "execution_count": null, "id": "de8ecb78", "metadata": {}, "outputs": [], "source": [ "# -------------------------------------------------\n", "# PLOT 7: Distribution of GDP Growth Rates\n", "# -------------------------------------------------\n", "\n", "gdp_growth_clean = (gdp.pct_change() * 4 * 100).dropna() # Annualized quarterly\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(16, 6))\n", "\n", "# Histogram with KDE\n", "axes[0].hist(gdp_growth_clean.values, bins=30, density=True, alpha=0.6,\n", "             color='#2196F3', edgecolor='white', linewidth=1.2)\n", "gdp_growth_clean.plot.kde(ax=axes[0], color='#F44336', linewidth=2.5, label='KDE')\n", "axes[0].axvline(x=gdp_growth_clean.mean(), color='green', linewidth=2,\n", "                linestyle='--', label=f'Mean: {gdp_growth_clean.mean():.1f}%')\n", "axes[0].axvline(x=0, color='black', linewidth=1, linestyle='-', alpha=0.5)\n", "axes[0].set_title('Distribution of US GDP Growth Rates', fontsize=14, fontweight='bold')\n", "axes[0].set_xlabel('Annualized Growth Rate (%)')\n", "axes[0].set_ylabel('Density')\n", "axes[0].legend(fontsize=11)\n", "\n", "# Q-Q plot: checks whether the distribution is normal\n", "stats.probplot(gdp_growth_clean.values, dist=\"norm\", plot=axes[1])\n", "axes[1].set_title('Q-Q Plot: Is GDP Growth Normal?', fontsize=14, fontweight='bold')\n", "axes[1].get_lines()[0].set(color='#2196F3', markersize=5)\n", "axes[1].get_lines()[1].set(color='red', linewidth=2)\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "# Formal normality test\n", "shapiro_stat, shapiro_p = stats.shapiro(gdp_growth_clean.values[-100:]) # Last 100 obs\n", "\n", "print(f\"\\n📊 Distribution Statistics:\")\n", "print(f\" Mean: {gdp_growth_clean.mean():.2f}%\")\n", "print(f\" Median: {gdp_growth_clean.median():.2f}%\")\n", "print(f\" Std Dev: {gdp_growth_clean.std():.2f}%\")\n", "print(f\" Skewness: {gdp_growth_clean.skew():.3f} {'(left-skewed ⬅️)' if gdp_growth_clean.skew() < 
0 else '(right-skewed ➡️)'}\")\n", "print(f\" Kurtosis: {gdp_growth_clean.kurtosis():.3f} {'(heavy tails: more extreme events!)' if gdp_growth_clean.kurtosis() > 0 else '(light tails)'}\")\n", "print(f\"\\n Shapiro-Wilk test: W={shapiro_stat:.4f}, p={shapiro_p:.4f}\")\n", "print(f\" {' → Distribution is NOT normal (p < 0.05)' if shapiro_p < 0.05 else ' → Distribution appears normal (p >= 0.05)'}\")\n", "print(f\"\\n💡 Left skew combined with heavy tails would mean that economic downturns\")\n", "print(f\" tend to be sharper than expansions: recessions hit fast!\")\n" ] }, { "cell_type": "markdown", "id": "fb32f4b2", "metadata": {}, "source": [ "---\n", "## 🎯 Section 5: Your Turn! (Practice Exercises)\n", "\n", "Now it's your turn. Try these exercises using the skills you've learned above.\n", "\n", "### Exercise 1: Housing Market\n", "Pull the **S&P/Case-Shiller Home Price Index** from FRED (series ID: `CSUSHPINSA`) and plot it from 2000 to present. Can you spot the housing bubble and the 2008 crash?\n", "\n", "### Exercise 2: Global Internet Adoption\n", "Using the World Bank API, create a line chart showing internet usage (% of population) for 5 countries of your choice. Which country had the fastest adoption?\n", "\n", "### Exercise 3: Hypothesis Test\n", "Is the average inflation rate in the 2010s (2010–2019) significantly different from the 2020s (2020–present)? Use a **t-test** to find out. (Hint: `stats.ttest_ind()`)\n", "\n", "### Exercise 4: Build Your Own Dashboard\n", "Choose 3 FRED indicators that interest you and create a dashboard similar to the one in Section 2. 
Add recession shading for bonus points!\n", "\n", "---\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "f6b9394f", "metadata": {}, "outputs": [], "source": [ "# Exercise 1: Housing Market: Starter Code\n", "# =============================================\n", "\n", "# housing = fred.get_series('CSUSHPINSA', observation_start='2000-01-01')\n", "#\n", "# fig, ax = plt.subplots(figsize=(14, 6))\n", "# ax.plot(housing.index, housing.values)\n", "# ax.set_title('S&P/Case-Shiller Home Price Index')\n", "# plt.show()\n", "\n", "# YOUR CODE HERE 👇\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "c09add59", "metadata": {}, "outputs": [], "source": [ "# Exercise 2: Global Internet Adoption: Starter Code\n", "# ====================================================\n", "\n", "# my_countries = ['USA', 'KOR', 'KEN', 'BRA', 'CHN'] # Choose your own!\n", "#\n", "# internet_data = wb.data.DataFrame(\n", "#     'IT.NET.USER.ZS',\n", "#     economy=my_countries,\n", "#     time=range(2000, 2023),\n", "#     labels=True\n", "# )\n", "\n", "# YOUR CODE HERE 👇\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "7a455059", "metadata": {}, "outputs": [], "source": [ "# Exercise 3: Hypothesis Test: Starter Code\n", "# =============================================\n", "\n", "# inflation_2010s = inflation['2010':'2019'].dropna()\n", "# inflation_2020s = inflation['2020':].dropna()\n", "#\n", "# t_stat, p_value = stats.ttest_ind(inflation_2010s, inflation_2020s)\n", "# print(f\"T-statistic: {t_stat:.4f}\")\n", "# print(f\"P-value: {p_value:.6f}\")\n", "\n", "# YOUR CODE HERE 👇\n", "\n" ] },
{ "cell_type": "markdown", "id": "b43c6c01", "metadata": {}, "source": [ "---\n", "## 📚 Section 6: Quick Reference\n", "\n", "### Useful FRED Series IDs\n", "| Series ID | Description |\n", "|-----------|-------------|\n", "| `GDP` | Gross Domestic Product |\n", "| `UNRATE` | Unemployment Rate |\n", "| `CPIAUCSL` | Consumer Price Index (All Urban) |\n", "| `FEDFUNDS` | Federal Funds Effective Rate |\n", "| `DGS10` | 10-Year Treasury Yield |\n", "| `CSUSHPINSA` | Case-Shiller Home Price Index |\n", "| `UMCSENT` | Consumer Sentiment |\n", "| `PAYEMS` | Total Nonfarm Payrolls |\n", "| `M2SL` | M2 Money Supply |\n", "| `USREC` | Recession Indicator (0/1) |\n", "\n", "### Useful World Bank Indicator Codes\n", "| Code | Description |\n", "|------|-------------|\n", "| `NY.GDP.PCAP.CD` | GDP per capita (current US$) |\n", "| `SP.DYN.LE00.IN` | Life expectancy at birth |\n", "| `IT.NET.USER.ZS` | Internet users (% of pop.) |\n", "| `EN.ATM.CO2E.PC` | CO2 emissions per capita |\n", "| `SE.ADT.LITR.ZS` | Adult literacy rate |\n", "| `SL.UEM.TOTL.ZS` | Unemployment (ILO estimate) |\n", "\n", "### Key `scipy.stats` Functions\n", "| Function | Purpose |\n", "|----------|---------|\n", "| `linregress(x, y)` | Simple linear regression |\n", "| `pearsonr(x, y)` | Pearson correlation + p-value |\n", "| `ttest_ind(a, b)` | Independent samples t-test |\n", "| `shapiro(x)` | Shapiro-Wilk normality test |\n", "\n", "---\n", "\n", "*Notebook created for DATA 110: Introduction to Data Science*  \n", "*APIs used: FRED (Federal Reserve) & World Bank*  \n", "*Have questions? Check: https://fred.stlouisfed.org and https://data.worldbank.org*\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "0c52722a-27f6-4c54-9dcc-f6804727d6fa", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }