Predictive Analytics & Data Mining (GMBA 621)
Wide format: each subject gets one row, and its attributes (such as sales in each quarter) are spread across separate columns. You will use this format for most predictive modeling.
| CustomerID | Q1_Sales | Q2_Sales | Q3_Sales |
|---|---|---|---|
| C-101 | $500 | $450 | $600 |
| C-102 | $200 | $220 | $210 |
Long format: each row represents a single observation. A customer tracked for three quarters has three separate rows, one per quarter. This layout is standard for time-series data and big data storage.
| CustomerID | Time_Period | Sales_Amount |
|---|---|---|
| C-101 | Q1 | $500 |
| C-101 | Q2 | $450 |
| C-101 | Q3 | $600 |
| C-102 | Q1 | $200 |
| C-102 | Q2 | $220 |
| C-102 | Q3 | $210 |

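
Moving from one layout to the other is a mechanical reshaping step. The sketch below is a minimal illustration in standard-library Python that converts the wide sales table above into the long layout. The function name `wide_to_long`, the `period_cols` mapping, and the choice to store sales as plain numbers without the dollar sign are assumptions made for this example, not part of the course materials.

```python
# Wide table from the example above: one row per customer.
# Sales are stored as numbers (dollar signs dropped) so they can be modeled.
wide_rows = [
    {"CustomerID": "C-101", "Q1_Sales": 500, "Q2_Sales": 450, "Q3_Sales": 600},
    {"CustomerID": "C-102", "Q1_Sales": 200, "Q2_Sales": 220, "Q3_Sales": 210},
]


def wide_to_long(rows, id_col, period_cols):
    """Turn one row per subject into one row per (subject, period).

    period_cols maps each wide column name to the label it should get
    in the Time_Period column (hypothetical helper for illustration).
    """
    long_rows = []
    for row in rows:
        for col, period in period_cols.items():
            long_rows.append({
                id_col: row[id_col],
                "Time_Period": period,
                "Sales_Amount": row[col],
            })
    return long_rows


long_rows = wide_to_long(
    wide_rows,
    id_col="CustomerID",
    period_cols={"Q1_Sales": "Q1", "Q2_Sales": "Q2", "Q3_Sales": "Q3"},
)

for r in long_rows:
    print(r["CustomerID"], r["Time_Period"], r["Sales_Amount"])
```

Two customers with three quarters each produce six long rows. If you work in pandas, `DataFrame.melt` performs the same kind of reshaping in a single call.
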
| Industry | Use Wide Format When... | Use Long Format When... |
|---|---|---|
| Finance | Building credit scores (current profile). | Analyzing historical stock market trends. |
| Healthcare | Comparing patient vitals in one chart. | Tracking heart rate over time (24h study). |
| Marketing | Customer segmentation & churn prediction. | Analyzing clickstreams or social media feeds. |

Use the following table as a guide to structuring your data before you start each modeling assignment.

| Technique | Required Format | Course Application | Why? |
|---|---|---|---|
| Linear & Multiple Regression | Wide | Project 2 Competition | Requires one row per subject with all predictors as columns. |
| Logistic Regression | Wide | Exam 1 Practical | Predicts binary outcomes based on multiple feature columns. |
| Decision Trees & Random Forests | Wide | Week 6-7 Topics | Splits data points based on features in different columns. |
| Cluster Analysis (K-Means) | Wide | Week 9-10 Project | Groups rows based on coordinates across many columns. |
| Time Series Forecasting | Long | Week 14 Topics | Needs a sequential "Time" column to find patterns over time. |
| Text Analytics | Long | Project 3 (Text) | Unstructured text (reviews, comments) usually arrives as one row per document. |
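
Because most techniques in the table expect wide data, records stored in long format usually need to be pivoted before modeling. The following is a minimal sketch in standard-library Python, assuming the long sales layout shown earlier. The helper name `long_to_wide` and its `suffix` parameter are hypothetical, chosen only to reproduce the `Q1_Sales`-style column names.

```python
def long_to_wide(rows, id_col, period_col, value_col, suffix="Sales"):
    """Pivot one row per (subject, period) into one row per subject."""
    wide = {}  # subject id -> that subject's single wide row
    for row in rows:
        subject = wide.setdefault(row[id_col], {id_col: row[id_col]})
        column = f"{row[period_col]}_{suffix}"  # e.g. "Q1_Sales"
        if column in subject:
            # Two values for the same subject and period cannot share one cell.
            raise ValueError(
                f"duplicate {row[period_col]} entry for {row[id_col]}"
            )
        subject[column] = row[value_col]
    return list(wide.values())


# Long-format records matching the example sales data.
long_rows = [
    {"CustomerID": "C-101", "Time_Period": "Q1", "Sales_Amount": 500},
    {"CustomerID": "C-101", "Time_Period": "Q2", "Sales_Amount": 450},
    {"CustomerID": "C-101", "Time_Period": "Q3", "Sales_Amount": 600},
    {"CustomerID": "C-102", "Time_Period": "Q1", "Sales_Amount": 200},
    {"CustomerID": "C-102", "Time_Period": "Q2", "Sales_Amount": 220},
    {"CustomerID": "C-102", "Time_Period": "Q3", "Sales_Amount": 210},
]

wide_rows = long_to_wide(long_rows, "CustomerID", "Time_Period", "Sales_Amount")

# Sanity check before modeling: every subject appears exactly once.
ids = [r["CustomerID"] for r in wide_rows]
assert len(ids) == len(set(ids))

for r in wide_rows:
    print(r)
```

The duplicate check is a design choice: a repeated customer and quarter pair would otherwise silently overwrite an earlier value. In pandas, `DataFrame.pivot` does comparable work and likewise raises an error on duplicate index and column pairs.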