{ "cells": [ { "cell_type": "markdown", "id": "10fed78c", "metadata": {}, "source": [ "# šŸƒ Anomaly Detection in Sports: Athlete Performance & Injury Risk Monitoring\n", "\n", "## Course: DATA 110 (Introduction to Data Science)\n", "### School of Data Science and Society, UNC Chapel Hill\n", "\n", "---\n", "\n", "**Scenario:** You are a sports data analyst for a collegiate athletics program. Coaches rely on you to monitor 20 athletes across 5 sports over a 30-day training period. Your goal is to detect:\n", "\n", "- **Overtraining:** athletes pushing too hard and risking burnout\n", "- **Injury risk:** declining performance combined with poor recovery signals\n", "- **Peak performance:** unusually high output (a positive anomaly!)\n", "\n", "You will use both **supervised** and **unsupervised** anomaly detection to flag athletes who need attention.\n", "\n", "---\n", "\n", "### šŸ“‹ What You'll Learn\n", "- Exploring sports performance data with visualizations\n", "- **Supervised:** Random Forest to classify known anomaly patterns\n", "- **Unsupervised:** Isolation Forest and Local Outlier Factor (LOF) to discover anomalies without labels\n", "- Comparing methods and understanding trade-offs\n", "- Real-world applications in sports analytics\n", "\n", "### šŸ“¦ Dataset\n", "- **600 daily records** (20 athletes Ɨ 30 days)\n", "- 8 performance/wellness metrics per record\n", "- **540 normal** records and **60 anomalies** (overtraining, injury risk, peak performance)\n" ] }, { "cell_type": "markdown", "id": "84fa77fd", "metadata": {}, "source": [ "## Step 1: Setup (Import Libraries and Load Data)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8c91fef6", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# IMPORT LIBRARIES\n", "# ============================================================\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", 
"\n", "# Machine Learning\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.ensemble import RandomForestClassifier, IsolationForest\n", "from sklearn.neighbors import LocalOutlierFactor\n", "from sklearn.metrics import (classification_report, confusion_matrix,\n", "                             accuracy_score, ConfusionMatrixDisplay,\n", "                             precision_score, recall_score, f1_score)\n", "\n", "plt.style.use('seaborn-v0_8-whitegrid')\n", "sns.set_palette(\"husl\")\n", "plt.rcParams['figure.figsize'] = (10, 6)\n", "plt.rcParams['font.size'] = 12\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "print(\"āœ… All libraries loaded successfully!\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c64331ec", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# LOAD THE DATASET\n", "# ============================================================\n", "# Update the path to wherever you saved the Excel file\n", "df = pd.read_excel('sports_athlete_anomaly_data.xlsx', sheet_name='Athlete_Performance')\n", "\n", "# Make sure Date is a datetime column (the overview below calls .date() on it)\n", "df['Date'] = pd.to_datetime(df['Date'])\n", "\n", "print(f\"Dataset shape: {df.shape[0]} rows Ɨ {df.shape[1]} columns\")\n", "print(\"\\nFirst 5 rows:\")\n", "df.head()\n" ] }, { "cell_type": "markdown", "id": "99d75ac7", "metadata": {}, "source": [ "## Step 2: Exploratory Data Analysis (EDA)\n", "\n", "### 2.1 Understand the Data Structure\n" ] }, { "cell_type": "code", "execution_count": null, "id": "abfe5490", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# DATASET OVERVIEW\n", "# ============================================================\n", "print(\"=\" * 60)\n", "print(\"DATASET OVERVIEW\")\n", "print(\"=\" * 60)\n", "\n", "print(f\"\\nTotal records: {len(df)}\")\n", "print(f\"Unique athletes: {df['Athlete_ID'].nunique()}\")\n", "print(f\"Sports: {', '.join(df['Sport'].unique())}\")\n", "print(f\"Date 
range: {df['Date'].min().date()} to {df['Date'].max().date()}\")\n", "\n", "print(\"\\nLabel distribution:\")\n", "print(df['Anomaly_Label'].value_counts())\n", "print(f\"\\nAnomaly percentage: {(df['Anomaly_Label'] != 'Normal').mean() * 100:.1f}%\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "64f85baa", "metadata": {}, "outputs": [], "source": [ "# Define our feature columns (the performance metrics)\n", "feature_cols = ['Training_Load_AU', 'Sprint_Speed_km_h', 'Resting_Heart_Rate_bpm',\n", "                'Recovery_Heart_Rate_bpm', 'Sleep_Hours', 'Hydration_Level_pct',\n", "                'Perceived_Exertion_1_10', 'Performance_Score_0_100']\n", "\n", "df[feature_cols].describe().round(2)\n" ] }, { "cell_type": "markdown", "id": "b1182f0f", "metadata": {}, "source": [ "### 2.2 Visualize Performance Metrics\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a367c153", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# DISTRIBUTION PLOTS: Compare Normal vs Anomaly types\n", "# ============================================================\n", "fig, axes = plt.subplots(2, 4, figsize=(22, 10))\n", "axes = axes.flatten()\n", "\n", "for i, col in enumerate(feature_cols):\n", "    for label in df['Anomaly_Label'].unique():\n", "        subset = df[df['Anomaly_Label'] == label][col]\n", "        axes[i].hist(subset, bins=20, alpha=0.5, label=label, density=True)\n", "    axes[i].set_title(col.replace('_', ' '), fontsize=10, fontweight='bold')\n", "    axes[i].set_ylabel('Density')\n", "\n", "# Single legend for all subplots\n", "handles, labels = axes[0].get_legend_handles_labels()\n", "fig.legend(handles, labels, loc='lower center', ncol=4, fontsize=10,\n", "           bbox_to_anchor=(0.5, -0.02))\n", "plt.suptitle('Performance Metric Distributions by Status', fontsize=16, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"šŸ’” OBSERVE: How do overtraining and injury risk patterns differ from normal?\")\n", 
"print(\"   Notice that peak performance anomalies are POSITIVE outliers!\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "0ce3c47e", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# SCATTER: Training Load vs Performance Score\n", "# ============================================================\n", "plt.figure(figsize=(10, 7))\n", "\n", "colors = {'Normal': '#4CAF50', 'Anomaly - Overtraining': '#F44336',\n", "          'Anomaly - Injury Risk': '#FF9800', 'Anomaly - Peak Performance': '#2196F3'}\n", "\n", "for label, color in colors.items():\n", "    mask = df['Anomaly_Label'] == label\n", "    plt.scatter(df.loc[mask, 'Training_Load_AU'],\n", "                df.loc[mask, 'Performance_Score_0_100'],\n", "                c=color, label=label, alpha=0.6, s=60, edgecolors='white')\n", "\n", "plt.xlabel('Training Load (AU)', fontsize=12)\n", "plt.ylabel('Performance Score (0-100)', fontsize=12)\n", "plt.title('Training Load vs. Performance Score', fontsize=14, fontweight='bold')\n", "plt.legend(fontsize=10, loc='upper left')\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"šŸ’” KEY INSIGHT:\")\n", "print(\"   - Overtraining: HIGH load, LOW performance (burning out)\")\n", "print(\"   - Injury Risk: HIGH load, LOW performance + poor sprint speed\")\n", "print(\"   - Peak Performance: MODERATE load, VERY HIGH performance (in the zone!)\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d4744154", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# RADAR CHART: Average profile for each group\n", "# ============================================================\n", "from math import pi\n", "\n", "# Calculate mean for each group\n", "groups = df.groupby('Anomaly_Label')[feature_cols].mean()\n", "\n", "# Min-max normalize each metric across the group means (0 = lowest group, 1 = highest)\n", "groups_norm = (groups - groups.min()) / (groups.max() - groups.min())\n", "\n", "categories = [c.replace('_', '\\n') 
for c in feature_cols]\n", "N = len(categories)\n", "angles = [n / float(N) * 2 * pi for n in range(N)]\n", "angles += angles[:1]  # repeat the first angle to close the polygon\n", "\n", "fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))\n", "\n", "for label, color in colors.items():\n", "    values = groups_norm.loc[label].values.tolist()\n", "    values += values[:1]  # close the polygon\n", "    ax.plot(angles, values, 'o-', linewidth=2, label=label, color=color)\n", "    ax.fill(angles, values, alpha=0.1, color=color)\n", "\n", "ax.set_xticks(angles[:-1])\n", "ax.set_xticklabels(categories, fontsize=9)\n", "ax.set_ylim(0, 1)\n", "ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1), fontsize=10)\n", "plt.title('Average Profile by Group (Normalized)', fontsize=14, fontweight='bold', y=1.08)\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"šŸ’” The radar chart shows each group's 'fingerprint' across all metrics.\")\n", "print(\"   Different anomaly types have distinctly different shapes!\")\n" ] }, { "cell_type": "markdown", "id": "29e6e62d", "metadata": {}, "source": [ "---\n", "\n", "## Step 3: SUPERVISED Anomaly Detection with Random Forest\n", "\n", "Since we have labeled data, we can train a classifier. 
We'll train two models: a **binary** classifier (normal vs. anomaly) and a **multi-class** classifier that distinguishes the three types of anomalies.\n", "\n", "### 3.1 Prepare Data\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4b7b17ce", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# PREPARE DATA\n", "# ============================================================\n", "X = df[feature_cols].copy()\n", "\n", "# Binary labels for overall anomaly detection\n", "y_binary = (df['Anomaly_Label'] != 'Normal').astype(int)\n", "\n", "# Multi-class labels for specific anomaly type detection\n", "y_multi = df['Anomaly_Label'].copy()\n", "\n", "# Split data (70% train / 30% test). Stratifying on the multi-class label keeps the\n", "# proportion of every anomaly type (and therefore of all anomalies) similar in both splits.\n", "X_train, X_test, yb_train, yb_test, ym_train, ym_test = train_test_split(\n", "    X, y_binary, y_multi, test_size=0.30, random_state=42, stratify=y_multi\n", ")\n", "\n", "# Scale features: fit on the training split only so no test information leaks in.\n", "# Tree models don't require scaling, but it is harmless here.\n", "scaler = StandardScaler()\n", "X_train_scaled = scaler.fit_transform(X_train)\n", "X_test_scaled = scaler.transform(X_test)\n", "\n", "print(f\"Training set: {X_train.shape[0]} records\")\n", "print(f\"Test set: {X_test.shape[0]} records\")\n" ] }, { "cell_type": "markdown", "id": "3c3c8552", "metadata": {}, "source": [ "### 3.2 Train and Evaluate: Binary Classification (Normal vs. Anomaly)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1f017e72", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# BINARY RANDOM FOREST: Normal vs Anomaly\n", "# ============================================================\n", "rf_binary = RandomForestClassifier(\n", "    n_estimators=100, max_depth=10, random_state=42, class_weight='balanced'\n", ")\n", "rf_binary.fit(X_train_scaled, yb_train)\n", "yb_pred = rf_binary.predict(X_test_scaled)\n", "\n", "print(\"BINARY CLASSIFICATION: Normal vs. 
Anomaly\")\n", "print(\"=\" * 50)\n", "print(classification_report(yb_test, yb_pred, target_names=['Normal', 'Anomaly']))\n", "\n", "fig, ax = plt.subplots(figsize=(8, 6))\n", "ConfusionMatrixDisplay(confusion_matrix(yb_test, yb_pred),\n", "                       display_labels=['Normal', 'Anomaly']).plot(cmap='Blues', ax=ax)\n", "plt.title('Binary Classification: Confusion Matrix', fontsize=14, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "7987354c", "metadata": {}, "source": [ "### 3.3 Train and Evaluate: Multi-Class (Specific Anomaly Types)\n", "\n", "Can the model tell us **what kind** of anomaly it is?\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a613a374", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# MULTI-CLASS RANDOM FOREST: Predict specific anomaly type\n", "# ============================================================\n", "rf_multi = RandomForestClassifier(\n", "    n_estimators=100, max_depth=12, random_state=42, class_weight='balanced'\n", ")\n", "rf_multi.fit(X_train_scaled, ym_train)\n", "ym_pred = rf_multi.predict(X_test_scaled)\n", "\n", "print(\"MULTI-CLASS CLASSIFICATION: Specific Anomaly Types\")\n", "print(\"=\" * 60)\n", "print(classification_report(ym_test, ym_pred))\n", "\n", "fig, ax = plt.subplots(figsize=(10, 8))\n", "cm_multi = confusion_matrix(ym_test, ym_pred, labels=rf_multi.classes_)\n", "disp = ConfusionMatrixDisplay(cm_multi, display_labels=rf_multi.classes_)\n", "disp.plot(cmap='Blues', ax=ax, xticks_rotation=30)\n", "plt.title('Multi-Class Classification: Confusion Matrix', fontsize=14, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"šŸ’” The multi-class model can identify the TYPE of anomaly,\")\n", "print(\"   which helps coaches take the right corrective action!\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7c893e68", "metadata": {}, "outputs": [], "source": [ 
"# ============================================================\n", "# FEATURE IMPORTANCE (impurity-based, from the binary model)\n", "# ============================================================\n", "importance = pd.Series(rf_binary.feature_importances_, index=feature_cols)\n", "importance = importance.sort_values(ascending=True)\n", "\n", "plt.figure(figsize=(10, 6))\n", "# Blue = at or above the median importance, green = below the median\n", "colors_bar = ['#4CAF50' if v < importance.median() else '#1976D2' for v in importance.values]\n", "importance.plot(kind='barh', color=colors_bar, edgecolor='white')\n", "plt.xlabel('Importance Score', fontsize=12)\n", "plt.title('Which Metrics Are Most Important for Detecting Anomalies?',\n", "          fontsize=14, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(f\"\\nšŸ’” Top 3 most important features: {', '.join(importance.index[-3:][::-1])}\")\n", "print(\"   These metrics carried the most signal for the Normal vs. Anomaly model,\")\n", "print(\"   so they are good candidates for coaches to watch closely.\")\n" ] }, { "cell_type": "markdown", "id": "c5bcbdd0", "metadata": {}, "source": [ "---\n", "\n", "## Step 4: UNSUPERVISED Anomaly Detection\n", "\n", "Now let's **pretend we don't have labels**. Can we still find anomalies?\n", "\n", "We'll use:\n", "1. **Isolation Forest**: builds random trees that split the data on random features and thresholds; anomalies are isolated in fewer splits than normal points\n", "2. 
**Local Outlier Factor (LOF)**: compares each point's local density with the densities of its nearest neighbors\n", "\n", "---\n", "\n", "### 4.1 Isolation Forest\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6829697a", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# ISOLATION FOREST\n", "# ============================================================\n", "# Scale ALL rows with a fresh scaler (no labels are used). This avoids refitting,\n", "# and silently changing, the scaler that was fit on the training split in Step 3.\n", "X_all_scaled = StandardScaler().fit_transform(X)\n", "\n", "iso_forest = IsolationForest(\n", "    n_estimators=100,\n", "    contamination=0.10,  # Fraction of points to flag; 0.10 matches this dataset's 60/600 anomaly rate\n", "    random_state=42\n", ")\n", "\n", "# fit_predict returns -1 for anomalies and 1 for normal points\n", "iso_predictions = iso_forest.fit_predict(X_all_scaled)\n", "iso_labels = (iso_predictions == -1).astype(int)\n", "\n", "print(\"ISOLATION FOREST RESULTS\")\n", "print(\"=\" * 50)\n", "print(f\"Predicted Normal: {(iso_labels == 0).sum()}\")\n", "print(f\"Predicted Anomaly: {(iso_labels == 1).sum()}\")\n", "print()\n", "print(classification_report(y_binary, iso_labels, target_names=['Normal', 'Anomaly']))\n", "\n", "fig, ax = plt.subplots(figsize=(8, 6))\n", "ConfusionMatrixDisplay(confusion_matrix(y_binary, iso_labels),\n", "                       display_labels=['Normal', 'Anomaly']).plot(cmap='Oranges', ax=ax)\n", "plt.title('Isolation Forest (Unsupervised): Confusion Matrix', fontsize=14, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "dd051830", "metadata": {}, "source": [ "### 4.2 Local Outlier Factor (LOF)\n", "\n", "**How it works:** LOF measures the local density around each data point compared to its neighbors. 
If a point's density is much lower than that of its neighbors, it is considered an outlier.\n", "\n", "**Sports analogy:** If most athletes at a similar training load have similar performance scores, but one athlete's performance on a given day is dramatically different, LOF will flag that day's record.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e774bb29", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# LOCAL OUTLIER FACTOR\n", "# ============================================================\n", "lof = LocalOutlierFactor(\n", "    n_neighbors=20,     # Number of neighbors used to estimate local density\n", "    contamination=0.10  # Expected proportion of outliers\n", ")\n", "\n", "# fit_predict returns -1 for outliers and 1 for inliers\n", "lof_predictions = lof.fit_predict(X_all_scaled)\n", "lof_labels = (lof_predictions == -1).astype(int)\n", "\n", "print(\"LOCAL OUTLIER FACTOR RESULTS\")\n", "print(\"=\" * 50)\n", "print(f\"Predicted Normal: {(lof_labels == 0).sum()}\")\n", "print(f\"Predicted Anomaly: {(lof_labels == 1).sum()}\")\n", "print()\n", "print(classification_report(y_binary, lof_labels, target_names=['Normal', 'Anomaly']))\n", "\n", "fig, ax = plt.subplots(figsize=(8, 6))\n", "ConfusionMatrixDisplay(confusion_matrix(y_binary, lof_labels),\n", "                       display_labels=['Normal', 'Anomaly']).plot(cmap='Purples', ax=ax)\n", "plt.title('Local Outlier Factor (Unsupervised): Confusion Matrix', fontsize=14, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "03a4b774", "metadata": {}, "source": [ "---\n", "\n", "## Step 5: Compare All Three Methods\n" ] }, { "cell_type": "code", "execution_count": null, "id": "36e31ca5", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# SIDE-BY-SIDE COMPARISON\n", "# ============================================================\n", "# Isolation Forest and LOF scored every row, so the Random Forest needs a prediction for\n", "# every row too. Predicting all of X with rf_binary would include the 70% of rows it was\n", "# trained on and inflate its scores, so we use 5-fold out-of-fold predictions instead:\n", "# each row is predicted by a model that never saw it during training.\n", "from sklearn.model_selection import cross_val_predict, StratifiedKFold\n", "from sklearn.pipeline import make_pipeline\n", "\n", "rf_pipeline = make_pipeline(\n", "    StandardScaler(),\n", "    RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42, class_weight='balanced')\n", ")\n", "cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)\n", "rf_full_pred = cross_val_predict(rf_pipeline, X, y_binary, cv=cv)\n", "\n", "methods = ['Random Forest\\n(Supervised)', 'Isolation 
Forest\\n(Unsupervised)', 'LOF\\n(Unsupervised)']\n", "predictions = [rf_full_pred, iso_labels, lof_labels]\n", "\n", "# Note: accuracy alone is misleading here, since always predicting 'Normal' scores 0.90\n", "metrics = {'Accuracy': [], 'Precision': [], 'Recall': [], 'F1 Score': []}\n", "for pred in predictions:\n", "    metrics['Accuracy'].append(accuracy_score(y_binary, pred))\n", "    metrics['Precision'].append(precision_score(y_binary, pred, zero_division=0))\n", "    metrics['Recall'].append(recall_score(y_binary, pred, zero_division=0))\n", "    metrics['F1 Score'].append(f1_score(y_binary, pred, zero_division=0))\n", "\n", "fig, axes = plt.subplots(1, 4, figsize=(20, 5))\n", "bar_colors = ['#1976D2', '#F57C00', '#7B1FA2']\n", "\n", "for i, (metric_name, values) in enumerate(metrics.items()):\n", "    bars = axes[i].bar(methods, values, color=bar_colors, edgecolor='white', linewidth=1.5)\n", "    axes[i].set_title(metric_name, fontsize=14, fontweight='bold')\n", "    axes[i].set_ylim(0, 1.05)\n", "    axes[i].axhline(y=0.5, color='gray', linestyle='--', alpha=0.3)\n", "    for bar, val in zip(bars, values):\n", "        axes[i].text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.02,\n", "                     f'{val:.2f}', ha='center', fontsize=12, fontweight='bold')\n", "\n", "plt.suptitle('Model Comparison: Supervised vs. 
Unsupervised Anomaly Detection',\n", "             fontsize=16, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a55af6e5", "metadata": {}, "outputs": [], "source": [ "# ============================================================\n", "# BONUS: Which specific anomaly types did each method catch?\n", "# ============================================================\n", "results_df = df[['Athlete_Name', 'Sport', 'Anomaly_Label']].copy()\n", "results_df['RF_Detected'] = rf_full_pred\n", "results_df['IsoForest_Detected'] = iso_labels\n", "results_df['LOF_Detected'] = lof_labels\n", "\n", "# Look at actual anomalies: how many did each method catch?\n", "actual_anomalies = results_df[results_df['Anomaly_Label'] != 'Normal'].copy()\n", "\n", "print(\"DETECTION RATE BY ANOMALY TYPE\")\n", "print(\"=\" * 70)\n", "for anomaly_type in actual_anomalies['Anomaly_Label'].unique():\n", "    subset = actual_anomalies[actual_anomalies['Anomaly_Label'] == anomaly_type]\n", "    n = len(subset)\n", "    rf_caught = subset['RF_Detected'].sum()\n", "    iso_caught = subset['IsoForest_Detected'].sum()\n", "    lof_caught = subset['LOF_Detected'].sum()\n", "    print(f\"\\n{anomaly_type} ({n} cases):\")\n", "    print(f\"  Random Forest:    {rf_caught}/{n} caught ({rf_caught/n*100:.0f}%)\")\n", "    print(f\"  Isolation Forest: {iso_caught}/{n} caught ({iso_caught/n*100:.0f}%)\")\n", "    print(f\"  LOF:              {lof_caught}/{n} caught ({lof_caught/n*100:.0f}%)\")\n" ] }, { "cell_type": "markdown", "id": "492a855b", "metadata": {}, "source": [ "---\n", "\n", "## Step 6: Key Takeaways\n", "\n", "### šŸ“Š Method Comparison Summary\n", "\n", "| Method | Type | Strengths | Weaknesses | Best For |\n", "|--------|------|-----------|------------|----------|\n", "| **Random Forest** | Supervised | Usually the most accurate when good labels exist; the multi-class model also identifies the anomaly *type* | Needs labeled data; can't find new anomaly types | Known patterns (overtraining protocols) |\n", "| **Isolation Forest** | Unsupervised 
| No labels needed; fast; good with high-dimensional data | Can't tell you *why* something is anomalous | General monitoring; new datasets |\n", "| **LOF** | Unsupervised | Detects local outliers; good for clustered data | Sensitive to `n_neighbors` and `contamination`; slower on large data | Comparing athletes within groups |\n", "\n", "### šŸƒ Real-World Sports Applications\n", "- **Pre-season screening:** Use unsupervised methods on baseline data to establish each athlete's \"normal\" profile\n", "- **In-season monitoring:** Use supervised models trained on historical injury data to flag at-risk athletes daily\n", "- **Recovery tracking:** Detect when recovery patterns deviate from an athlete's personal baseline\n", "- **Talent scouting:** Peak performance anomalies can identify standout performers\n", "\n", "### šŸ¤” Discussion Questions\n", "1. An athlete is flagged as \"anomalous\" but feels fine. Should the coach still intervene? Why or why not?\n", "2. How might the **contamination** parameter in Isolation Forest change results? What if we set it too high or too low?\n", "3. In sports, some anomalies are *good* (peak performance). How should we handle positive vs. negative anomalies differently?\n", "4. If you could add one more metric to the dataset, what would it be and why?\n", "\n", "---\n", "*Notebook created for DATA 110 (Introduction to Data Science)*  \n", "*School of Data Science and Society, UNC Chapel Hill*\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }