⚡ Operational Systems (OLTP)
This is where data is born. Every time a customer swipes a card, places an order, or creates an account, that transaction happens in an operational system.
- Optimized for speed: many small reads and writes, often thousands of transactions per second
- Examples: Cash registers, CRM systems, order management, inventory systems
- NOT designed for analysis: large analytical queries compete with live transactions for the same resources and can slow down operations
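To make "optimized for speed" concrete, here is a minimal sketch of one OLTP-style transaction. It assumes Python's built-in sqlite3 module as a stand-in for a production operational database, and the table names, columns, and the place_order helper are hypothetical. Each order touches only a couple of rows and either fully succeeds or fully rolls back, which is the kind of workload these systems are tuned for.

```python
import sqlite3

# sqlite3 stands in for an operational database (assumption for illustration);
# a real OLTP system would typically be a server database such as PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    sku TEXT NOT NULL,
    qty INTEGER NOT NULL
);
INSERT INTO inventory VALUES ('SKU-1', 10);
""")


def place_order(conn, customer_id, sku, qty):
    """Hypothetical helper: record one order and decrement stock atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE inventory SET on_hand = on_hand - ? WHERE sku = ? AND on_hand >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount == 0:  # no row updated: unknown SKU or not enough stock
            raise ValueError(f"insufficient stock for {sku}")
        cur = conn.execute(
            "INSERT INTO orders (customer_id, sku, qty) VALUES (?, ?, ?)",
            (customer_id, sku, qty),
        )
        return cur.lastrowid


order_id = place_order(conn, customer_id=42, sku="SKU-1", qty=3)
stock = conn.execute("SELECT on_hand FROM inventory WHERE sku = ?", ("SKU-1",)).fetchone()[0]
print(order_id, stock)  # prints: 1 7
```

A report that scans every row of orders would compete for the same CPU and I/O as these short transactions, which is why analysis is usually moved to a separate system.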
🌊 Data Lake
A massive storage pool that accepts data in any format—structured, unstructured, or semi-structured. Think of it as a "dump everything here" approach.
- Stores raw data in its original format (videos, logs, JSON, images)
- Cheap storage for massive volumes
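- Structure is applied when data is read (schema-on-read), so data scientists can explore and experiment freely
- Risk: Without cataloging and governance, it can become a "data swamp" where nobody can find or trust the data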
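A minimal sketch of landing files in a lake, assuming a local directory stands in for object storage such as Amazon S3. The folder layout (raw/&lt;source&gt;/dt=&lt;date&gt;/) and the land_raw and read_events helpers are hypothetical; the point is that bytes are stored exactly as received, and JSON is parsed only when someone reads it.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, date: str, filename: str, payload: bytes) -> Path:
    """Store bytes exactly as received under <lake_root>/raw/<source>/dt=<date>/.

    Nothing is parsed or validated on the way in. The 'dt=' folder naming is a
    common partitioning convention, used here as an assumption.
    """
    target = lake_root / "raw" / source / f"dt={date}" / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_events(lake_root: Path, source: str, date: str):
    """Schema-on-read: JSON Lines files are parsed only when they are queried."""
    folder = lake_root / "raw" / source / f"dt={date}"
    for path in sorted(folder.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            if line.strip():  # skip blank lines
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Different formats land side by side; the second record has an extra field.
    land_raw(root, "web", "2024-05-01", "clicks.jsonl",
             b'{"user": 1, "page": "/home"}\n{"user": 2, "page": "/cart", "promo": "X"}\n')
    # Placeholder bytes standing in for an audio recording.
    land_raw(root, "support", "2024-05-01", "call_0001.wav", b"\x00\x01\x02")
    events = list(read_events(root, "web", "2024-05-01"))
    print(len(events), events[1].get("promo"))  # prints: 2 X
```

Because nothing enforces a schema on write, the extra "promo" field simply appears for readers who look for it. That flexibility is also how a lake drifts toward a swamp if nobody catalogs what lands there.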
🏛️ Data Warehouse
The clean, organized home for analysis-ready data. Data is structured, validated, and optimized for business intelligence queries.
- Optimized for complex analytical queries (OLAP)
- Data is cleaned, transformed, and well-documented
- Powers dashboards, reports, and business decisions
- Examples: Snowflake, BigQuery, Redshift
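A typical warehouse query joins a fact table to a dimension table and aggregates many rows into a small summary. The sketch below assumes sqlite3 as a local stand-in and uses a hypothetical star schema (fact_sales, dim_product) with a few toy rows; the SQL shape is what carries over to a real warehouse, which would run it over far larger, columnar-stored tables.

```python
import sqlite3

# sqlite3 is only a stand-in so the example runs locally; table names,
# columns, and rows are hypothetical toy data.
wh = sqlite3.connect(":memory:")
wh.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    sale_date TEXT,
    product_key INTEGER REFERENCES dim_product(product_key),
    qty INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES
    (1, 'Espresso', 'Coffee'), (2, 'Latte', 'Coffee'), (3, 'Croissant', 'Bakery');
INSERT INTO fact_sales VALUES
    ('2024-05-01', 1, 2, 6.0), ('2024-05-01', 3, 1, 3.5),
    ('2024-05-02', 2, 3, 13.5), ('2024-05-02', 3, 2, 7.0);
""")

# An OLAP-style question for a dashboard: revenue by category per day.
REPORT = """
SELECT f.sale_date, p.category, SUM(f.revenue) AS revenue
FROM fact_sales AS f
JOIN dim_product AS p USING (product_key)
GROUP BY f.sale_date, p.category
ORDER BY f.sale_date, p.category
"""
for row in wh.execute(REPORT):
    print(row)
```

Keeping descriptive attributes (like category) in a dimension table and numeric measures in the fact table is what lets the same few tables answer many different reporting questions.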
🔄 ETL Pipeline
Extract, Transform, Load—the journey data takes from source to destination.
- Extract: Pull data from operational systems
- Transform: Clean, validate, reshape the data
- Load: Put it in the warehouse or lake
- The "plumbing" of analytics—invisible but essential
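The three steps map naturally onto three functions. This is a minimal sketch assuming two in-memory sqlite3 databases stand in for the operational source and the warehouse; the table names, columns, and validation rules are hypothetical examples, not a prescribed design.

```python
import sqlite3
from datetime import datetime

# Hypothetical source system with a few messy rows (toy data).
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT, amount_cents INTEGER, created_at TEXT);
INSERT INTO orders VALUES
    (1, ' Ana@Example.com ', 1250, '2024-05-01T09:15:00'),
    (2, NULL, 900, '2024-05-01T10:02:00'),
    (3, 'bo@example.com', -50, '2024-05-02T16:40:00'),
    (4, 'cy@example.com', 4000, '2024-05-02T17:05:00');
""")

# Hypothetical warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer_email TEXT, "
    "amount REAL, order_date TEXT)"
)


def extract(conn):
    """Extract: pull raw rows out of the operational system."""
    return conn.execute("SELECT id, email, amount_cents, created_at FROM orders").fetchall()


def transform(rows):
    """Transform: clean, validate, and reshape rows for analysis."""
    out = []
    for order_id, email, cents, created_at in rows:
        if email is None or cents < 0:  # validate: drop rows failing basic checks
            continue
        out.append((
            order_id,
            email.strip().lower(),                                 # clean
            cents / 100,                                           # reshape units
            datetime.fromisoformat(created_at).date().isoformat(),  # reshape grain
        ))
    return out


def load(conn, rows):
    """Load: write cleaned rows into the warehouse in one transaction."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)


load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
# [(1, 'ana@example.com', 12.5, '2024-05-01'), (4, 'cy@example.com', 40.0, '2024-05-02')]
```

Using INSERT OR REPLACE keyed on the order ID makes the load idempotent, so re-running the pipeline after a failure does not duplicate rows.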
🎯 The Key Insight
Data flows from where it's created (operational systems) to where it's analyzed (warehouses). The ETL pipeline is the bridge. The lake is the flexible storage for everything that doesn't fit neatly into a warehouse's tables.