{ "cells": [ { "cell_type": "markdown", "id": "2551c474-b7d9-4ed5-8656-6b649450e55a", "metadata": {}, "source": [ "## Data Art: The 100,000-Step Random Walk\n", "You will need the foundational data science libraries, `numpy` and `matplotlib`, installed in your environment to run this. \n", "### What you will learn:\n", "* **Scale and Speed:** You are generating, processing, and plotting 100,000 points (200,000 values across the x and y arrays) in about a second. This immediately highlights Python's computational efficiency over manual tools.\n", "* **Vectorization:** It introduces the power of `numpy` arrays. By using `np.cumsum` (cumulative sum) instead of writing a slow `for` loop, you are seeing \"the Pythonic way\" to handle big data.\n", "* **Data Storytelling:** The `c=range(steps)` argument colors each point by its time step (from the 1st point to the 100,000th), and `cmap='inferno'` picks the palette. It shows you that data visualization isn't just about bar charts; it is about encoding variables (like time) into visuals (like color) to reveal patterns." ] }, { "cell_type": "code", "execution_count": null, "id": "69ebed8b-0a50-44a4-93fc-358f8a66ed01", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# 1. Set up the canvas aesthetic\n", "plt.style.use('dark_background')\n", "plt.figure(figsize=(10, 10))\n", "\n", "# 2. Generate the \"data\" (100,000 steps of a random walk)\n", "# We use numpy to quickly calculate cumulative sums of random numbers\n", "steps = 100000\n", "x = np.cumsum(np.random.randn(steps))\n", "y = np.cumsum(np.random.randn(steps))\n", "\n", "# 3. Plot the data art\n", "# We color the points based on their step number to show the progression over time\n", "plt.scatter(x, y, c=range(steps), cmap='inferno', s=1, alpha=0.8)\n", "\n", "# 4. 
Clean up and display\n", "plt.axis('off') # Hide the axes for a cleaner \"art\" look\n", "plt.title(\"The Beauty of Randomness\", fontsize=20, color='white', pad=20)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "702c7597-d0f8-4d90-b4b9-3f00ce535245", "metadata": {}, "source": [ "## The 5-Line Interactive Data Dashboard\n", "You will need the `plotly` library for this. It is a popular choice for interactive charts in Python. \n", "### Key Features of this Visualization:\n", "* **Zero to Dashboard in Seconds:** You will see an interactive chart appear right in the notebook. You can hover your mouse over the bubbles to see specific country data, zoom in on specific regions, and click continents on the legend to filter the data.\n", "* **The \"Play\" Button:** The `animation_frame` argument automatically builds a play button and a slider at the bottom of the chart. When you click play, you watch the entire world develop economically from 1952 to 2007.\n", "* **Dimensionality:** MBAs often struggle with how to present complex data without making slides look cluttered. This teaches you how to visualize five dimensions of data at once:\n", " 1. **X-axis:** Wealth (GDP per capita)\n", " 2. **Y-axis:** Health (Life Expectancy)\n", " 3. **Bubble Size:** Population\n", " 4. **Bubble Color:** Continent (Categorical grouping)\n", " 5. **Animation Slider:** Time\n", "* **The Logarithmic Scale:** The `log_x=True` argument is a great teaching moment for business students. It shows how we can use math to clearly visualize massive disparities in income (comparing a country with a GDP per capita of $500 to one with $50,000) on the same screen without compressing the lower-income countries into a single clump." ] }, { "cell_type": "code", "execution_count": null, "id": "f4304430-1d4d-47eb-9b84-1c111c462011", "metadata": {}, "outputs": [], "source": [ "import plotly.express as px\n", "\n", "# 1. 
Load a built-in economic and demographic dataset\n", "df = px.data.gapminder()\n", "\n", "# 2. Create an animated 5-dimensional chart (X, Y, Color, Size, and Time)\n", "fig = px.scatter(\n", " df, \n", " x=\"gdpPercap\", \n", " y=\"lifeExp\", \n", " animation_frame=\"year\", \n", " animation_group=\"country\",\n", " size=\"pop\", \n", " color=\"continent\", \n", " hover_name=\"country\",\n", " log_x=True, \n", " size_max=55, \n", " range_x=[100, 100000], \n", " range_y=[25, 90],\n", " title=\"Global Wealth vs. Health (1952 - 2007)\"\n", ")\n", "\n", "# 3. Render the interactive dashboard\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "73f74441-1dc5-41b8-863a-2d32414f08f1", "metadata": {}, "source": [ "## For the Finance Students: The \"Wall Street\" Live Dashboard\n", "Python can bypass clunky financial software and pull live market data directly into an interactive environment. This script uses the `yfinance` and `plotly` libraries to pull the last 6 months of Apple's daily stock prices and generate a professional, interactive Candlestick chart. \n", "### Why this works for finance:\n", "* **Live, Real-World Data:** You aren't looking at a static textbook CSV; you are analyzing up-to-date market prices. You can instantly change AAPL to TSLA or SPY.\n", "* **Professional Visualization:** Candlestick charts are the universal language of traders. Being able to hover over a specific day to see the exact open, close, high, and low prices shows you how analysts build custom trading dashboards." ] }, { "cell_type": "code", "execution_count": null, "id": "3cb5b6b6-99d4-4de5-924f-3a42ca268ef5", "metadata": {}, "outputs": [], "source": [ "%pip install yfinance plotly\n", "\n", "import yfinance as yf\n", "import plotly.graph_objects as go\n", "\n", "# 1. Fetch live market data (e.g., Apple stock over the last 6 months)\n", "ticker = \"AAPL\"\n", "# Ticker.history() returns flat Open/High/Low/Close columns. Recent yfinance versions of\n", "# yf.download() can return MultiIndex columns, which would break the column lookups below.\n", "stock_data = yf.Ticker(ticker).history(period=\"6mo\")\n", "\n", "# 2. 
Create the interactive Candlestick chart\n", "fig = go.Figure(data=[go.Candlestick(\n", " x=stock_data.index,\n", " open=stock_data['Open'],\n", " high=stock_data['High'],\n", " low=stock_data['Low'],\n", " close=stock_data['Close']\n", ")])\n", "\n", "# 3. Add Wall Street styling and display\n", "fig.update_layout(\n", " title=f\"Live Interactive Market Data: {ticker}\",\n", " yaxis_title=\"Stock Price (USD)\",\n", " template=\"plotly_dark\",\n", " xaxis_rangeslider_visible=False\n", ")\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "f82dfd31-847d-438a-85fe-ce1161677dee", "metadata": {}, "source": [ "## For the Baseball Athletes: The \"Moneyball\" Pitch Heatmap\n", "Modern baseball is driven entirely by data. The `pybaseball` library allows anyone to query MLB's Statcast database just like an MLB analyst. This script pulls actual pitch data for a star pitcher (like Gerrit Cole) and uses `seaborn` to generate a fiery heat map of where his fastballs cross the plate. \n", "### Why this works for athletes:\n", "* **Tangible Connection:** You know what a 4-seam fastball looks like from the batter's box. Now you are seeing the math behind it.\n", "* **Big Data Made Beautiful:** Statcast data contains dozens of columns (spin rate, launch angle, break). Extracting just the `plate_x` and `plate_z` coordinates to map density teaches you how to distill complex datasets into actionable coaching insights." ] }, { "cell_type": "code", "execution_count": null, "id": "fe0a1ccf-fcb4-44a4-8eec-1b0976d933e9", "metadata": {}, "outputs": [], "source": [ "%pip install pybaseball seaborn matplotlib\n", "\n", "from pybaseball import statcast_pitcher\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "# 1. Pull Statcast data for a pitcher using their MLBAM ID\n", "# Gerrit Cole's ID is 543037. Let's pull his Cy Young season data!\n", "pitches = statcast_pitcher('2023-04-01', '2023-09-30', 543037)\n", "\n", "# 2. 
Filter the data to only look at 4-Seam Fastballs (FF)\n", "fastballs = pitches[pitches['pitch_type'] == 'FF']\n", "\n", "# 3. Set up the visual canvas\n", "plt.figure(figsize=(7, 9))\n", "plt.style.use('dark_background')\n", "\n", "# 4. Create a \"Moneyball\" Heatmap of pitch locations \n", "# plate_x (horizontal) and plate_z (vertical) from the catcher's perspective\n", "sns.kdeplot(\n", " x=fastballs['plate_x'], \n", " y=fastballs['plate_z'], \n", " fill=True, \n", " cmap=\"inferno\", \n", " thresh=0.05,\n", " alpha=0.8\n", ")\n", "\n", "# 5. Draw the Strike Zone for context\n", "plt.plot([-0.7, 0.7, 0.7, -0.7, -0.7], [1.5, 1.5, 3.5, 3.5, 1.5], color='white', linewidth=3, linestyle='--')\n", "\n", "plt.title(\"Gerrit Cole: 4-Seam Fastball Heatmap\", fontsize=16, pad=15)\n", "plt.xlim(-3, 3)\n", "plt.ylim(0, 5)\n", "plt.axis('off')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "59f3749e-bdfa-4b4e-9153-1252e7f9ca7e", "metadata": {}, "source": [ "## The Anatomy of a World Record\n", "For track and field student-athletes, the numbers on a stopwatch are everything. Showing you how to use Python to dissect a real world-record race will instantly make data analytics relevant to your daily training. \n", "\n", "Here is a script that breaks down the greatest sprint in human history: Usain Bolt's 2009 100m World Record (9.58 seconds). It takes his actual, historical 10-meter split times, calculates his velocity for each segment, and plots his speed curve. \n", "Just like before, open a new Jupyter cell. Since we already installed `matplotlib`, you just need to run this code using standard `numpy` arrays:\n", "\n", "### For track athletes:\n", "* **Translating Time into Velocity:** Athletes understand split times, but calculating and visualizing speed makes the physics of a race tangible. 
Seeing Bolt hit nearly 45 km/h puts his athletic achievement into sharp perspective.\n", "* **The Three Phases of a Sprint:** The graph visually proves what sprint coaches preach every day. You will clearly see the steep climb (Acceleration Phase), the peak (Max Velocity around 60-70 meters), and the crucial slight dip at the end (Speed Endurance/Deceleration).\n", "* **Array Operations (Vectorization):** From a coding perspective, subtracting entire arrays of data simultaneously using `np.diff()` is a brilliant way to introduce Python's efficiency over doing manual, row-by-row math in Excel." ] }, { "cell_type": "code", "execution_count": null, "id": "7f0f7b43-9b9e-47ac-a649-9b2334561071", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# 1. Usain Bolt's 2009 WR 100m splits (Cumulative time in seconds)\n", "# Distance markers every 10 meters\n", "distance = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])\n", "\n", "# Bolt's actual race times as he crossed each 10m line\n", "time = np.array([0, 1.89, 2.88, 3.78, 4.64, 5.47, 6.29, 7.10, 7.92, 8.75, 9.58])\n", "\n", "# 2. Calculate the change in distance and time for each 10m segment\n", "delta_d = np.diff(distance) \n", "delta_t = np.diff(time) \n", "\n", "# 3. Calculate Velocity (Speed = Distance / Time)\n", "# We multiply by 3.6 to convert meters/second to kilometers/hour!\n", "velocity_mps = delta_d / delta_t\n", "velocity_kmh = velocity_mps * 3.6\n", "\n", "# Find the midpoint of each segment to center our data points on the graph\n", "midpoints = distance[:-1] + 5 \n", "\n", "# 4. Set up the visual canvas\n", "plt.figure(figsize=(10, 6))\n", "plt.style.use('dark_background')\n", "\n", "# 5. 
Plot the Velocity Curve\n", "plt.plot(midpoints, velocity_kmh, marker='o', color='cyan', linewidth=3, markersize=8)\n", "\n", "# Highlight his absolute Max Speed\n", "max_speed_idx = np.argmax(velocity_kmh)\n", "max_speed_val = velocity_kmh[max_speed_idx]\n", "plt.scatter(midpoints[max_speed_idx], max_speed_val, color='gold', s=200, zorder=5)\n", "\n", "# Add annotation with adjusted position\n", "plt.annotate(f'Top Speed:\\n{max_speed_val:.1f} km/h', \n", " (midpoints[max_speed_idx], max_speed_val), \n", " textcoords=\"offset points\", xytext=(0, 25), ha='center', color='gold', fontsize=12, fontweight='bold')\n", "\n", "# Formatting for presentation\n", "# Increased pad to 30 to give the title more room\n", "plt.title(\"The Anatomy of a World Record: Usain Bolt (9.58s)\", fontsize=16, pad=30)\n", "plt.xlabel(\"Distance (Meters)\", fontsize=12)\n", "plt.ylabel(\"Velocity (km/h)\", fontsize=12)\n", "plt.xticks(distance)\n", "# Set the y-axis limit a bit higher to create breathing room at the top\n", "plt.ylim(0, max_speed_val + 5) \n", "plt.grid(axis='y', alpha=0.3)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "7688b55e-09f6-4dff-a800-86e32007f855", "metadata": {}, "source": [ "## The Beauty Launch 3D Profit Dashboard\n", "Accountants are the unsung heroes of the beauty industry. Behind every viral Rare Beauty blush or luxurious Estée Lauder serum launch, there is an accounting and data team figuring out the exact pricing strategy to make it profitable. \n", "This code shows how Python takes standard textbook formulas (like Break-Even Analysis) and turns them into interactive, executive-level presentations by building a 3D Profitability Surface Plot. \n", "\n", "This script models a hypothetical new \"Viral Liquid Blush\" launch. It takes core accounting principles—Fixed Costs, Variable Costs, and Revenue—and visualizes every possible profit outcome based on how the product is priced and how many units sell. 
Since we already installed `plotly` in our first finance example, you can paste this straight into a new Jupyter cell.\n", "\n", "### For accounting students:\n", "* **Accounting Meets Analytics:** It elevates a static concept (Cost-Volume-Profit analysis) into a dynamic visual. Instead of calculating a single break-even point in Excel, you are using `numpy` arrays to instantly calculate and visualize 2,500 different financial scenarios at once.\n", "* **The \"Zero Line\":** Notice the `zerolinecolor=\"red\"` argument in the code? When you spin the 3D graph around, you will see a thick red line on the Z-axis marking $0. Everything below that line is a loss; everything above is profit. It visually maps out exactly where the product becomes a financial success.\n", "* **Executive Presentation:** At major conglomerates like Estée Lauder, accountants don't just hand over spreadsheets; they present insights to marketing executives. Being able to click, drag, and rotate a 3D financial model in real-time shows you how Python can make you a better storyteller in the boardroom." ] }, { "cell_type": "code", "execution_count": null, "id": "7655df04-f996-4409-9dcf-d06d37a1666a", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import plotly.graph_objects as go\n", "\n", "# 1. Set the Accounting Variables for a new \"Viral Blush\" Launch\n", "fixed_costs = 500000 # R&D, Influencer Marketing, and Overhead\n", "variable_cost_per_unit = 4.50 # Formula, packaging, and shipping per unit\n", "\n", "# 2. Forecast a range of possible Retail Prices and Sales Volumes\n", "# Testing prices from $15 to $40, and sales from 20,000 to 150,000 units\n", "retail_price = np.linspace(15, 40, 50) \n", "units_sold = np.linspace(20000, 150000, 50) \n", "\n", "# Create a 2D matrix (grid) to test every combination of price and volume\n", "Price_Grid, Units_Grid = np.meshgrid(retail_price, units_sold)\n", "\n", "# 3. 
The Core Accounting Formula: Calculate Net Profit for every scenario\n", "Revenue = Price_Grid * Units_Grid\n", "Total_Costs = fixed_costs + (variable_cost_per_unit * Units_Grid)\n", "Profit = Revenue - Total_Costs\n", "\n", "# 4. Build the 3D Interactive Surface Plot\n", "fig = go.Figure(data=[go.Surface(\n", " z=Profit, \n", " x=Price_Grid, \n", " y=Units_Grid,\n", " colorscale='RdPu', # We use a Red-Purple color map for a cosmetics brand vibe!\n", " colorbar_title=\"Net Profit ($)\"\n", ")])\n", "\n", "# 5. Style the dashboard for an executive pitch\n", "fig.update_layout(\n", " title=\"New Cosmetic Launch: Profitability Matrix\",\n", " scene=dict(\n", " xaxis_title=\"Retail Price ($)\",\n", " yaxis_title=\"Volume (Units Sold)\",\n", " zaxis_title=\"Net Profit ($)\",\n", " # Highlight the \"Break-Even\" line at $0 Profit\n", " zaxis=dict(backgroundcolor=\"lightgrey\", gridcolor=\"white\", showbackground=True, zerolinecolor=\"red\", zerolinewidth=5)\n", " ),\n", " margin=dict(l=0, r=0, b=0, t=50),\n", " template=\"plotly_dark\"\n", ")\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "8b3cdde8-d387-4196-b7fd-96b253b685ec", "metadata": {}, "source": [ "## The Monte Carlo Project Risk Dashboard\n", "To bridge the gap between financial risk modeling, data mining, and the Spanish construction industry, we can build a Monte Carlo Risk Simulation. \n", "In construction, project costs are notoriously unpredictable. Material costs (like steel and concrete) fluctuate, labor can run into overtime, and weather or permit delays add unforeseen expenses. Traditional spreadsheets handle this poorly because they rely on static, single-point estimates.\n", "\n", "This script uses data analytics to simulate a new commercial build in Madrid 10,000 times in a fraction of a second. It uses statistical distributions to model the uncertainty of different cost factors, giving you a dynamic, executive-level financial risk profile. 
This uses `numpy` for the heavy statistical lifting and `seaborn` (which we installed earlier) for the visualization.\n", "\n", "### What you will learn:\n", "* **Real-World Business Application:** When bidding on massive construction projects, knowing there is a specific, mathematically backed percentage chance of going over budget is the difference between a profitable year and bankruptcy.\n", "* **Intro to Advanced Analytics:** Monte Carlo simulations are a staple of both Wall Street quantitative finance and advanced data mining. The simulation shows how \"mining\" historical data for the typical level and standard deviation of labor and material costs lets you put a probability on future outcomes. The estimate is only as good as the distributions you assume.\n", "* **The Power of Distributions:** It illustrates that not all data behaves the same way. Material costs might follow a triangular distribution (they rarely drop below a certain price but can skyrocket), while labor is modeled here with a normal distribution (the standard bell curve). Python handles these distinct behaviors effortlessly." ] }, { "cell_type": "code", "execution_count": null, "id": "68715830-637b-4e7b-b2c1-a3e9b38d5f68", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "# 1. Set up the parameters for a Madrid Construction Project (in Euros)\n", "simulations = 10000\n", "\n", "# Simulate random costs based on historical data distributions\n", "# Labor costs (Normal distribution: mean=€15M, std_dev=€1.5M)\n", "labor_costs = np.random.normal(15000000, 1500000, simulations)\n", "\n", "# Material costs (Triangular distribution: min=€18M, expected=€20M, max=€26M)\n", "# Steel and concrete prices have a hard floor but can spike significantly\n", "material_costs = np.random.triangular(18000000, 20000000, 26000000, simulations)\n", "\n", "# Permit & weather delays (Exponential distribution representing sudden risk)\n", "delay_costs = np.random.exponential(2000000, simulations)\n", "\n", "# 2. 
Calculate Total Project Cost for all 10,000 alternate realities\n", "total_costs = labor_costs + material_costs + delay_costs\n", "total_costs_millions = total_costs / 1000000 # Convert to millions for readability\n", "\n", "# 3. Financial Analysis: What is the exact risk of going over our €42M budget?\n", "budget_target = 42.0\n", "over_budget_probability = np.mean(total_costs_millions > budget_target) * 100\n", "\n", "# 4. Visualize the Financial Risk Profile\n", "plt.figure(figsize=(10, 6))\n", "plt.style.use('dark_background')\n", "\n", "# Plot the probability distribution of possible costs\n", "sns.histplot(total_costs_millions, bins=50, kde=True, color='dodgerblue', edgecolor='black')\n", "\n", "# Add the \"Budget Line\"\n", "plt.axvline(budget_target, color='red', linestyle='--', linewidth=3, label=f'Target Budget (€{budget_target}M)')\n", "\n", "# Add context to the chart\n", "plt.title(\"Commercial Build in Madrid: Monte Carlo Cost Analysis\", fontsize=16, pad=20)\n", "plt.xlabel(\"Total Projected Cost (Millions €)\", fontsize=12)\n", "plt.ylabel(\"Frequency (Out of 10,000 Scenarios)\", fontsize=12)\n", "\n", "# Add a text box with the bottom-line financial insight\n", "plt.text(total_costs_millions.max() - 4, 400, f'Risk of Overrun:\\n{over_budget_probability:.1f}%', \n", " fontsize=14, color='white', bbox=dict(facecolor='red', alpha=0.5))\n", "\n", "plt.legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "3ed347a3-8599-4429-876b-d173f657cf5b", "metadata": {}, "source": [ "## The Tip Forecaster: Lunch vs. Dinner Revenue\n", "Working in a restaurant is the ultimate crash course in cash flow, unit economics, and maximizing hourly yield. To bridge your day job with your finance degree, we'll use Python for Predictive Revenue Modeling. \n", "Every modern restaurant Point of Sale (POS) system like Toast or Square exports massive CSV files of daily transactions. 
This script uses a famous built-in dataset of restaurant tips and applies Linear Regression—a core financial forecasting tool—to predict exactly how much money a server will make based on the size of the bill, split by their shift.\n", "\n", "Since we already installed `seaborn` and `matplotlib` earlier, we can run this immediately in a new Jupyter cell. `seaborn` actually has this \"tips\" dataset built right into it for practice!\n", "\n", "### Why this bridges the gap between Finance and Hospitality:\n", "* **Predictive Analytics in Action:** In finance, linear regression is used to forecast stock prices or housing markets. Here, you are using it to forecast your own cash flow. The solid trendline drawn through the scatter plot represents the mathematical prediction of your tip based on any bill size.\n", "* **Finding the Margin:** If you look closely at the charts, the slope of the trendline for Dinner is typically steeper than Lunch, and the data points are much wider spread. It visually proves that while Lunch is highly predictable (tighter cluster), Dinner offers a higher financial ceiling for \"upselling\" appetizers and drinks to increase the total bill.\n", "* **Data Subsetting (The `col` argument):** By simply adding `col=\"time\"`, Python automatically splits the single dataset into two perfectly formatted charts. If you tried to do this in Excel, you would be manually filtering rows, copying data to new sheets, and building two separate graphs." ] }, { "cell_type": "code", "execution_count": null, "id": "9356d052-e660-4e8a-bb42-a31ff12a8c0a", "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "# 1. Load the historical POS transaction data\n", "# This dataset contains the total bill, tip amount, day, time, and party size\n", "df = sns.load_dataset(\"tips\")\n", "\n", "# 2. Set up the financial dashboard aesthetic\n", "plt.style.use('dark_background')\n", "\n", "# 3. 
Build a Predictive Linear Regression Model (lmplot)\n", "# This plots the data AND automatically calculates the financial trendline\n", "g = sns.lmplot(\n", " data=df,\n", " x=\"total_bill\", \n", " y=\"tip\", \n", " hue=\"time\", # Color-code by Lunch vs. Dinner\n", " col=\"time\", # Split into two separate side-by-side charts\n", " palette=\"spring\", # Use a vibrant color palette\n", " height=6, # Chart size\n", " scatter_kws={\"s\": 100, \"alpha\": 0.6, \"edgecolor\": \"white\"}, # Bubble styling\n", " line_kws={\"linewidth\": 4} # Trendline styling\n", ")\n", "\n", "# 4. Format for an executive presentation\n", "g.set_axis_labels(\"Total Bill ($)\", \"Expected Tip Amount ($)\", fontsize=12)\n", "g.set_titles(col_template=\"{col_name} Shift Forecasting\", size=14, weight='bold')\n", "g.fig.suptitle(\"Predictive Revenue Modeling: Table Value vs. Tip Yield\", y=1.05, fontsize=18)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "96f6b40b-7b6b-4a5c-b68a-2ca99c16939e", "metadata": {}, "source": [ "## The Sports Medicine Actuary Dashboard\n", "Combining pediatrics, finance, and baseball might seem impossible at first glance, but in the world of data analytics, it is the exact recipe for Sports Medicine Hospital Administration or Sports Insurance Actuarial Science. To hit all three passions at once, we are going to build a Pediatric Arm Care & Financial Risk Model. \n", "In youth baseball, the biggest medical crisis is \"Little League Elbow\" (UCL injuries) caused by young kids throwing too many pitches before their bodies develop. This script generates a simulated database of 500 youth patients. It uses medical logic (Age vs. 
Pitch Count) to calculate their injury risk, and then applies a core finance concept (Expected Value) to calculate the projected medical costs for a sports clinic or insurance provider.\n", "\n", "This uses `pandas` to build the medical database and `plotly` to create an interactive financial-medical dashboard.\n", "\n", "### Why this hits a grand slam for your specific interests:\n", "* **The Pediatrics (Medical Logic):** You will instantly see the medical truth in the data. The graph will show a cluster of bright red bubbles (high risk) in the top-left corner—representing young 8-to-10-year-olds throwing an irresponsible number of pitches.\n", "* **The Finance (Expected Value Modeling):** This introduces you to \"Actuarial Science,\" which is how health insurance companies make money. By multiplying the probability of an event by the cost of that event, you are calculating the financial risk portfolio of these young athletes. The larger the bubble, the bigger the financial liability.\n", "* **The Baseball (Arm Care Analytics):** MLB front offices use this exact type of modeling to decide whether to draft a high school pitcher. If a kid's \"medical cost\" bubble is too big because he's been overworked by his youth coaches, a team will pass on him to avoid a million-dollar surgery bill.\n", "* **Data Engineering (Pandas):** You are learning how to build and manipulate a `pandas DataFrame`, which is the absolute backbone of almost all data mining and analytics jobs today." ] }, { "cell_type": "code", "execution_count": null, "id": "71033a98-0dcf-4357-aa31-b0b30e8e1077", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import plotly.express as px\n", "\n", "# 1. 
Simulate the Pediatric Patient Database (500 youth baseball players)\n", "# We use a set seed so the random data is the same every time you run it\n", "np.random.seed(42)\n", "num_patients = 500\n", "\n", "# Generate ages (Little League through High School: 8 to 18 years old)\n", "ages = np.random.randint(8, 19, num_patients)\n", "\n", "# Generate weekly pitch counts (Older kids generally pitch more, but with wide variance)\n", "pitch_counts = (ages * 5) + np.random.normal(20, 15, num_patients)\n", "pitch_counts = np.clip(pitch_counts, 10, 120) # Keep the data realistic\n", "\n", "# 2. The Pediatric Medical Model\n", "# Medical reality: High pitch counts at a young age spike injury risk.\n", "# We create a formula where risk increases as pitches go up, but decreases as age goes up.\n", "injury_probability = (pitch_counts / ages) * 0.06\n", "injury_probability = np.clip(injury_probability, 0.01, 0.95) # Cap at 95% risk\n", "\n", "# 3. The Finance Model (Expected Value)\n", "# Expected Cost = Base physical therapy cost + (Surgery Cost * Probability of Injury)\n", "# Assume basic PT is $1,000 and Tommy John Surgery is $25,000\n", "expected_costs = 1000 + (25000 * injury_probability)\n", "\n", "# 4. Combine into a \"Hospital Database\" (Pandas DataFrame)\n", "df = pd.DataFrame({\n", " 'Patient_Age': ages,\n", " 'Weekly_Pitches': pitch_counts.round(0),\n", " 'Injury_Risk_Pct': (injury_probability * 100).round(1),\n", " 'Expected_Medical_Cost_USD': expected_costs.round(2)\n", "})\n", "\n", "# 5. 
Build the Interactive 4D Scatter Plot\n", "fig = px.scatter(\n", " df, \n", " x=\"Patient_Age\", \n", " y=\"Weekly_Pitches\", \n", " size=\"Expected_Medical_Cost_USD\", # Bubble size represents financial cost\n", " color=\"Injury_Risk_Pct\", # Color represents medical danger\n", " color_continuous_scale=\"RdYlGn_r\", # Green (Safe) to Red (Danger)\n", " hover_data=['Expected_Medical_Cost_USD'], # Show exact dollar amount on hover\n", " labels={\n", " \"Patient_Age\": \"Patient Age (Years)\",\n", " \"Weekly_Pitches\": \"Avg. Weekly Pitches\",\n", " \"Injury_Risk_Pct\": \"Arm Injury Risk (%)\"\n", " },\n", " title=\"Pediatric Sports Medicine: Youth Pitching Risk & Expected Insurance Costs\"\n", ")\n", "\n", "# Style for a professional hospital board presentation\n", "fig.update_layout(template=\"plotly_dark\", title_x=0.5)\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "4d91c71e-03a1-41b4-acfe-344ae1a28f22", "metadata": {}, "source": [ "## The Quantitative Match Simulator\n", "\n", "Wall Street actively recruits former college wrestlers because of their ability to operate under pressure and intuitively understand \"risk versus reward.\" To bridge quantitative finance with time on the mat, this demo uses **Binomial Distributions and Monte Carlo Simulations**.\n", "\n", "In finance, these exact algorithms are used to price complex derivatives (like the Binomial Options Pricing Model). Here, we use the same math to simulate a 3-period match 10,000 times to calculate the probability of a win based on two wrestlers' historical stats.\n", "\n", "Open a new Jupyter cell. 
Since we already installed `seaborn` and `numpy` in previous exercises, you can run this immediately to simulate a matchup between \"Wrestler A\" (an aggressive attacker) and \"Wrestler B\" (a defensive counter-wrestler).\n", "\n", "\n", "\n", "### Why this hits the mark for a Finance/Wrestling student:\n", "\n", "* **Binomial Options Pricing in Action:** In a finance textbook, you learn about Binomial Trees, a model that calculates asset prices by moving step-by-step through time (up or down). Here, `np.random.binomial` uses the same building block: each exchange is an independent success-or-failure trial, and the function counts how many times an \"asset\" (a takedown) is acquired across the exchanges based on statistical probability.\n", "* **The \"Expected Value\" of a Match:** Wrestlers inherently understand expected value. They know if they wrestle the same opponent 10 times, they might win 7 and lose 3. This code visualizes that gut feeling as a mathematical bell curve. \n", "* **Rule Change Analytics:** Try changing the takedown multiplier in the code from `3` back to `2` (the old NCAA rule). You will instantly see how the historic rule change mathematically shifted the win probability to favor aggressive attackers over defensive wrestlers. That is exactly the kind of \"what-if\" regulatory analysis quants do on Wall Street!" ] }, { "cell_type": "code", "execution_count": null, "id": "491f8c9c-9ece-4aea-b3a8-94945d0eab8b", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "# 1. Set the parameters for 10,000 alternate realities\n", "simulations = 10000\n", "exchanges = 6 # Average number of neutral action sequences in a college match\n", "\n", "# Wrestler A: High offensive output, average defense\n", "takedown_prob_A = 0.55\n", "escape_prob_A = 0.40\n", "\n", "# Wrestler B: Lower offense, elite defense/escaping\n", "takedown_prob_B = 0.30\n", "escape_prob_B = 0.85\n", "\n", "# 2. 
The Binomial Finance Model \n", "# We use a Binomial Distribution to simulate how many takedowns each hits in a match\n", "takedowns_A = np.random.binomial(exchanges, takedown_prob_A, simulations)\n", "takedowns_B = np.random.binomial(exchanges, takedown_prob_B, simulations)\n", "\n", "# 3. Calculate the Scrambles (NCAA Rules: 3 pts for takedown, 1 pt for escape)\n", "# If A gets a takedown, B gets a chance to escape based on their escape probability\n", "escapes_B = np.random.binomial(takedowns_A, escape_prob_B)\n", "escapes_A = np.random.binomial(takedowns_B, escape_prob_A)\n", "\n", "# Calculate final scores for all 10,000 matches\n", "score_A = (takedowns_A * 3) + (escapes_A * 1)\n", "score_B = (takedowns_B * 3) + (escapes_B * 1)\n", "\n", "# 4. Financial \"Margin\" Analysis: The Point Spread\n", "point_spread = score_A - score_B\n", "win_prob_A = np.mean(point_spread > 0) * 100\n", "\n", "# 5. Visualize the Mathematical Outcome\n", "plt.figure(figsize=(10, 6))\n", "plt.style.use('dark_background')\n", "\n", "# Plot the distribution of the point spread\n", "\n", "sns.histplot(point_spread, bins=range(-10, 15), kde=True, color='purple', discrete=True, alpha=0.7)\n", "\n", "# Highlight the \"Break-Even\" Overtime Line\n", "plt.axvline(0, color='white', linestyle='--', linewidth=2, label='Overtime (Tie)')\n", "plt.axvline(np.mean(point_spread), color='gold', linestyle='-', linewidth=3, label=f'Expected Spread (A by {np.mean(point_spread):.1f} pts)')\n", "\n", "plt.title(\"Monte Carlo Match Simulation: Wrestler A vs. 
Wrestler B\", fontsize=16, pad=15)\n", "plt.xlabel(\"Final Point Spread (Positive = Wrestler A Wins)\", fontsize=12)\n", "plt.ylabel(\"Frequency (Out of 10,000 Matches)\", fontsize=12)\n", "\n", "# Add the bottom-line probability insight\n", "plt.text(7, 800, f'Wrestler A Win Prob:\\n{win_prob_A:.1f}%',\n", "         fontsize=14, color='white', bbox=dict(facecolor='purple', alpha=0.5))\n", "\n", "plt.legend()\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "cd27c46a-0a6a-4882-8ca6-06ed0e4c39d3", "metadata": {}, "source": [ "## The Carbon-Plated Asset: Depreciation & Risk Model\n", "For distance runners, a pair of $250 carbon-plated racing shoes (like the Nike Alphafly) is an investment. But as you log miles, the foam breaks down. For accounting students, that shoe is a depreciating fixed asset. For economics students, pushing that shoe past its lifespan introduces the marginal cost of injury risk (physical therapy bills, lost training time).\n", "\n", "This script builds a \"Shoe Replacement Optimizer.\" It finds the mile marker, to the nearest 10 miles, where it makes the most financial and economic sense to retire the shoes and buy a new pair.\n", "\n", "Open a new Jupyter cell. Since we already installed `pandas` and `plotly` in previous exercises, you can paste the code directly into your notebook.\n", "\n", "### Why this hits the mark for this specific trio:\n", "* **For the Accounting Majors (Asset Management):** You are seeing \"Straight-Line Depreciation\" applied to a physical asset you use every day. The code `shoe_cost - (shoe_cost / max_useful_life_miles) * miles` is the straight-line formula from Intro to Financial Accounting with zero salvage value, spread over miles instead of years (the usage-based variant is often called units-of-production depreciation). Python just calculates it at all 51 mileage checkpoints instantly.\n", "* **For the Economics Major (Optimization):** This is the classic Marginal Cost vs. Marginal Benefit trade-off. You are finding the absolute minimum of the `total_cost` curve. 
It perfectly illustrates that trying to squeeze an extra 50 miles out of a \"free\" (fully depreciated) shoe actually *costs* you money due to the exponentially rising risk curve. \n", "* **For the Cross Country Athletes:** The interactive `hovermode=\"x unified\"` argument allows you to drag your mouse across the chart. You can literally look at \"Mile 350\" and see exactly how much financial danger your shins and knees are in compared to the remaining book value of the shoe." ] }, { "cell_type": "code", "execution_count": null, "id": "a8f7b7ce-79f5-4e9a-b1ef-c1e9f92fa161", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import plotly.graph_objects as go\n", "\n", "# 1. The Asset: A high-end running shoe\n", "shoe_cost = 250.00\n", "max_useful_life_miles = 400\n", "\n", "# Generate an array of miles run (from 0 to 500 in increments of 10)\n", "miles = np.arange(0, 510, 10)\n", "\n", "# 2. The Accounting Model: Straight-Line Depreciation\n", "# The shoe steadily loses value until it is worth $0 at its max useful life\n", "book_value = np.maximum(shoe_cost - (shoe_cost / max_useful_life_miles) * miles, 0)\n", "\n", "# 3. The Economics Model: Marginal Cost of Injury\n", "# As the foam dies, the financial risk of shin splints or physical therapy rises exponentially\n", "base_medical_risk = 5.00\n", "injury_risk_cost = base_medical_risk * np.exp(miles / 110) \n", "\n", "# 4. Total Economic Cost\n", "total_cost = book_value + injury_risk_cost\n", "\n", "# Find the exact optimal point to buy a new shoe (Lowest Total Economic Cost)\n", "optimal_idx = np.argmin(total_cost)\n", "optimal_mile = miles[optimal_idx]\n", "lowest_cost = total_cost[optimal_idx]\n", "\n", "# 5. 
Build the Interactive Dashboard\n", "fig = go.Figure()\n", "\n", "# Plot the Accounting Depreciation Curve\n", "fig.add_trace(go.Scatter(x=miles, y=book_value, mode='lines', name='Shoe Book Value ($)', line=dict(color='cyan', width=3)))\n", "\n", "# Plot the Economic Risk Curve\n", "fig.add_trace(go.Scatter(x=miles, y=injury_risk_cost, mode='lines', name='Injury Risk Cost ($)', line=dict(color='red', width=3)))\n", "\n", "# Plot the Total Cost Curve\n", "fig.add_trace(go.Scatter(x=miles, y=total_cost, mode='lines', name='Total Economic Cost ($)', line=dict(color='gold', width=4, dash='dot')))\n", "\n", "# Highlight the Optimal Replacement Point\n", "fig.add_annotation(\n", " x=optimal_mile, y=lowest_cost,\n", " text=f\"Optimal Replacement: {optimal_mile} Miles\",\n", " showarrow=True, arrowhead=2, arrowsize=1.5, arrowcolor=\"white\",\n", " font=dict(size=14, color=\"black\"), bgcolor=\"gold\", bordercolor=\"white\", borderwidth=2\n", ")\n", "\n", "# Style for presentation\n", "fig.update_layout(\n", " title=\"Athletic Asset Management: When to Replace Your Running Shoes\",\n", " xaxis_title=\"Cumulative Mileage\",\n", " yaxis_title=\"Financial Cost ($)\",\n", " template=\"plotly_dark\",\n", " hovermode=\"x unified\",\n", " legend=dict(yanchor=\"top\", y=0.99, xanchor=\"left\", x=0.01)\n", ")\n", "\n", "fig.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }