Mastering Data Formats

Predictive Analytics & Data Mining (GMBA 621)

1. Wide Format: The "At-a-Glance" View

Each subject (here, a customer) gets one row. Every attribute, including each repeated measurement such as sales in a given quarter, is spread out into its own column. This is the format you will use for most predictive modeling.

CustomerID Q1_Sales Q2_Sales Q3_Sales
C-101 $500 $450 $600
C-102 $200 $220 $210
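In code, a wide table maps naturally onto one record per customer. The sketch below uses plain Python with no third-party libraries. It stores the two example rows as dictionaries, with sales as whole dollars for simplicity, and builds the feature matrix a model would consume: one row per customer, one column per quarter. The helper name `to_feature_matrix` and the derived column `Q1_to_Q3_Change` are hypothetical, chosen only for illustration.

```python
# Wide format: one record per customer, one column per quarter.
# Sales are stored as whole dollars (values from the table above).
wide = [
    {"CustomerID": "C-101", "Q1_Sales": 500, "Q2_Sales": 450, "Q3_Sales": 600},
    {"CustomerID": "C-102", "Q1_Sales": 200, "Q2_Sales": 220, "Q3_Sales": 210},
]

FEATURES = ["Q1_Sales", "Q2_Sales", "Q3_Sales"]


def to_feature_matrix(rows, features):
    """Return (ids, X): one ID and one feature vector per customer row."""
    ids = [row["CustomerID"] for row in rows]
    X = [[row[f] for f in features] for row in rows]
    return ids, X


ids, X = to_feature_matrix(wide, FEATURES)
for cid, vector in zip(ids, X):
    print(cid, vector)
# C-101 [500, 450, 600]
# C-102 [200, 220, 210]

# Because every quarter sits in the same row, a derived feature
# is a simple within-row calculation (hypothetical column name).
for row in wide:
    row["Q1_to_Q3_Change"] = row["Q3_Sales"] - row["Q1_Sales"]
```

This within-row convenience is the main reason wide data suits most predictive models: each row already is one complete observation.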

2. Long Format: The "Deep Dive" View

Each row represents a single observation: one subject at one point in time. If you track a customer for three quarters, that customer will have three separate rows. This layout is standard for time-series analysis and for storing large volumes of data.

CustomerID Time_Period Sales_Amount
C-101 Q1 $500
C-101 Q2 $450
C-101 Q3 $600
C-102 Q1 $200
C-102 Q2 $220
C-102 Q3 $210
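Converting wide data to long is often called "melting" or "unpivoting"; in pandas this is `pandas.melt`. The plain-Python sketch below does the same job so the mechanics are visible. The function name `wide_to_long` and the rule for deriving the period label (the text before the underscore in each column name, e.g. `Q1_Sales` to `Q1`) are assumptions made for this example.

```python
# Wide data from Section 1 (sales in whole dollars).
wide = [
    {"CustomerID": "C-101", "Q1_Sales": 500, "Q2_Sales": 450, "Q3_Sales": 600},
    {"CustomerID": "C-102", "Q1_Sales": 200, "Q2_Sales": 220, "Q3_Sales": 210},
]


def wide_to_long(rows, id_col, value_cols):
    """Emit one (id, period, value) row per subject per measurement column."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            # Assumed naming convention: "<period>_Sales", e.g. "Q1_Sales" -> "Q1".
            period = col.split("_")[0]
            long_rows.append(
                {id_col: row[id_col], "Time_Period": period, "Sales_Amount": row[col]}
            )
    return long_rows


long_data = wide_to_long(wide, "CustomerID", ["Q1_Sales", "Q2_Sales", "Q3_Sales"])
for r in long_data:
    print(r["CustomerID"], r["Time_Period"], r["Sales_Amount"])
```

Two customers with three quarters each become six rows: the number of rows grows with the number of time periods, while the number of columns stays fixed.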

3. Industry Application Cheat Sheet

Industry | Use Wide Format When... | Use Long Format When...
Finance | Building credit scores (current profile). | Analyzing historical stock market trends.
Healthcare | Comparing patient vitals in one chart. | Tracking heart rate over time (24h study).
Marketing | Customer segmentation & churn prediction. | Analyzing clickstreams or social media feeds.

4. Model Compatibility Guide

Use this table as a guide to structure your data correctly before starting your modeling assignments.

Technique | Required Format | Course Application | Why?
Linear & Multiple Regression | Wide | Project 2 Competition | Requires one row per subject with all predictors as columns.
Logistic Regression | Wide | Exam 1 Practical | Predicts binary outcomes based on multiple feature columns.
Decision Trees & Random Forests | Wide | Week 6-7 Topics | Splits observations on the values of features stored in separate columns.
Cluster Analysis (K-Means) | Wide | Week 9-10 Project | Groups rows based on their coordinates across many feature columns.
Time Series Forecasting | Long | Week 14 Topics | Needs a sequential "Time" column to find patterns over time.
Text Analytics | Long | Project 3 (Text) | Unstructured data (reviews/comments) usually starts as one document per row.
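Since most techniques in this table expect wide data, long data often has to be pivoted back before modeling. pandas provides `DataFrame.pivot` for this; the plain-Python sketch below shows the underlying logic. The function name `long_to_wide` and the `<period>_Sales` column-naming rule are hypothetical choices for this example.

```python
# Long data: one row per customer per quarter (values from Section 2).
long_data = [
    ("C-101", "Q1", 500), ("C-101", "Q2", 450), ("C-101", "Q3", 600),
    ("C-102", "Q1", 200), ("C-102", "Q2", 220), ("C-102", "Q3", 210),
]


def long_to_wide(rows):
    """Collect each customer's observations into a single row keyed by period."""
    by_customer = {}  # dicts keep insertion order, so customers stay in input order
    for customer_id, period, amount in rows:
        record = by_customer.setdefault(customer_id, {"CustomerID": customer_id})
        key = f"{period}_Sales"  # hypothetical naming rule, e.g. "Q1" -> "Q1_Sales"
        if key in record:
            # A pivot needs exactly one value per (customer, period) cell.
            raise ValueError(f"duplicate entry for {customer_id} {period}")
        record[key] = amount
    return list(by_customer.values())


wide = long_to_wide(long_data)
for row in wide:
    print(row)
```

A customer with no Q2 row would simply get no `Q2_Sales` key in this sketch; `DataFrame.pivot` would instead fill that cell with NaN, which you would then need to handle before fitting a regression or clustering model.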