Lesson 5: Data Is the Hardest Problem

You can buy chips. You can rent compute. But the thing that actually makes an AI yours — and good — is data. Jensen says it plainly, and it's the most career-relevant idea in the whole keynote for a business-and-tech person.

"The biggest problem is data."

— Jensen Huang, NVIDIA GTC Taipei 2026

Why the right data is so scarce

Language models had it easy: the internet is full of human-written text. But teaching a robot to move is different — you need data from the robot's point of view, and that barely exists:

"In order to create data for AI robotics, it has to be from the perspective of the robot… most of the world's video data is from a third person, not first person."

— Jensen Huang

The lesson generalizes far beyond robots: the data you need is usually specific, and often doesn't exist yet. Generic data is everywhere; the right data for your problem is rare. That scarcity is exactly what makes data valuable.

Memory and storage get reinvented

Agents have to remember and retrieve information constantly — fast. Jensen says that need will overhaul how we store data:

"The memory system of AI is going to cause the storage system to be completely revolutionized."

— Jensen Huang

He gets into the weeds — working memory, what to remember, how to retrieve, whether data is structured or unstructured, and the ontology (how all the pieces of data relate to each other). You don't need the engineering, but notice the words: this is data management and knowledge representation — squarely your field.
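
To make "ontology" concrete, here is a minimal sketch of data stored as relationships rather than as isolated rows. It is only an illustration, not how NVIDIA or any particular product stores memory. The entity names, relation names, and the `related` and `follow` helpers are all hypothetical.

```python
# Hypothetical facts stored as (subject, relation, object) triples.
# Every identifier below is made up for illustration.
triples = [
    ("order_1001", "placed_by", "customer_42"),
    ("order_1001", "contains", "product_7"),
    ("customer_42", "located_in", "Taipei"),
    ("product_7", "made_by", "supplier_3"),
]


def related(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


def follow(start, path):
    """Follow a chain of relations, one hop at a time, from a starting entity."""
    current = [start]
    for relation in path:
        current = [o for s in current for o in related(s, relation)]
    return current


# "Which city did order 1001 ship to?" takes two hops: order -> customer -> city.
city = follow("order_1001", ["placed_by", "located_in"])
supplier = follow("order_1001", ["contains", "made_by"])
```

The point of the sketch is that the question can only be answered because the relationships were recorded explicitly. Deciding which relationships to record is the knowledge-representation work the paragraph above describes.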

When you can't collect data, generate it

Here's the clever twist. NVIDIA's Cosmos model creates synthetic data: realistic, physics-accurate scenarios for training robots, including situations that would be impossible to film in the real world. Jensen sums up the shift:

"Text data plus compute gives you AI. Now that we have AI, compute is data."

— Jensen Huang

Read that as: we used to feed data into computers to get AI. Now we can use AI + compute to manufacture the data we're missing. Synthetic data is powerful — and it comes with its own cautions (is it realistic? does it bake in bias? can you trust it?). Knowing when synthetic data helps versus misleads is a real judgment call.
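
One way to start answering "is it realistic?" is to compare simple statistics of the synthetic data against the real data it is meant to stand in for. The sketch below is an assumption-laden illustration and has nothing to do with how Cosmos works: the order amounts are invented, the generator is a plain normal distribution, and `relative_gap` is a hypothetical helper.

```python
import random
import statistics

# Hypothetical real order amounts; the numbers are invented for illustration.
real = [20.0, 25.0, 30.0, 35.0, 40.0]

# A deliberately simple generator (an assumption): draw synthetic values from
# a normal distribution fitted to the real mean and standard deviation.
rng = random.Random(0)  # fixed seed so the run is repeatable
mu = statistics.mean(real)
sigma = statistics.stdev(real)
synthetic = [rng.gauss(mu, sigma) for _ in range(1000)]


def relative_gap(real_value, synthetic_value):
    """Return the absolute difference as a fraction of the real value."""
    return abs(synthetic_value - real_value) / abs(real_value)


mean_gap = relative_gap(mu, statistics.mean(synthetic))
spread_gap = relative_gap(sigma, statistics.stdev(synthetic))

# Share of synthetic values that fall outside anything seen in the real data.
outside = sum(v < min(real) or v > max(real) for v in synthetic) / len(synthetic)
```

Small gaps in mean and spread only show that the synthetic data resembles the real data on those two measures. The `outside` share is a reminder that a generator can also produce values nobody ever observed, which is where the judgment call about trust and bias begins.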

The throughline: chips are buyable; the right data is scarce and often proprietary. That's why, for most organizations, data — not hardware — is the real competitive moat.

This is the most "MIS" lesson in the module. Data management, data quality, data governance, knowledge representation (ontologies), and data as a strategic asset are core to your field — and this keynote shows they're now the bottleneck for the most advanced AI on Earth. Two takeaways: (1) An AI is only as good as its data — "garbage in, garbage out" decides real projects. (2) A company's proprietary data (its customers, transactions, processes) is often its most defensible advantage in the AI era. The person who can find, clean, organize, and govern that data is indispensable.

Quick check

Why is data "the hardest problem" for training robots?

The internet gave language models a vast supply of human-written text. Robots need first-person, physical-world data that mostly doesn't exist, so NVIDIA generates it synthetically with Cosmos ("compute is data").

🧠 Think like an MIS analyst

"We want AI, but our data is a mess"

A company is excited to build an AI feature, but their customer data is scattered across spreadsheets, half-empty fields, and three systems that don't talk. Leadership wants to "just add AI." Where do you point them first?

You gently move the conversation upstream: the AI will only be as good as the data feeding it, so data strategy comes before the model. First, run an honest audit: what data do we have, how clean is it, where does it live, who owns it, and is it allowed to be used (privacy/governance)? Then fix the foundations: consolidate, clean, fill the gaps, and define how the pieces relate. Where real data is missing, consider carefully whether synthetic or third-party data can responsibly fill it. Only then layer on AI. Skipping the data work is one of the most common ways AI projects quietly fail.
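
The first audit questions ("what do we have, and how clean is it?") can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full data-quality tool: the records, field names, and the `completeness` and `duplicate_values` helpers are all hypothetical.

```python
# Hypothetical customer records merged from three systems; every field name
# and value here is made up for illustration.
records = [
    {"id": 1, "email": "ana@example.com", "phone": "555-0100", "source": "crm"},
    {"id": 2, "email": "", "phone": None, "source": "spreadsheet"},
    {"id": 3, "email": "lee@example.com", "phone": "", "source": "billing"},
    {"id": 4, "email": "ana@example.com", "phone": "555-0100", "source": "billing"},
]


def is_filled(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(rows, fields):
    """Return the share of rows (0.0 to 1.0) with a usable value for each field."""
    total = len(rows)
    return {
        f: (sum(is_filled(r.get(f)) for r in rows) / total if total else 0.0)
        for f in fields
    }


def duplicate_values(rows, key):
    """Return the non-empty values of `key` that appear in more than one row."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(key)
        if not is_filled(value):
            continue
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


report = completeness(records, ["email", "phone"])
dupes = duplicate_values(records, "email")
```

Here `report` shows how often each field is actually filled in, and `dupes` flags the same customer appearing in two systems. Ownership and permission to use the data are governance questions that code alone cannot answer.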

✍️ Your turn

Think of a company you admire. What proprietary data do they have that competitors can't easily copy — and why is it valuable?

Synthetic data can fill gaps but also mislead. Where do you think generating data is smart, and where is it risky?

Key takeaway: Hardware is buyable; the right data is scarce, specific, and often proprietary — which makes data the real moat. Memory, retrieval, and data management are being reinvented, and synthetic data ("compute is data") can fill gaps with care. An AI is only as good as its data, and organizing that data is core MIS work.
