Decision 4 · Week 10

"Who are they?"

Nobody labeled your customers for you. Clustering finds the natural groups in the data — the segments that drive your marketing, your product, and your pricing.

K-means · Hierarchical · Unsupervised

Why this decision matters

This is your first unsupervised technique: there's no "right answer" the model is trying to match. That makes it powerful (you discover structure you didn't know was there) and dangerous (a clustering algorithm will return clusters even when the data is pure random noise). The hard part is choosing the number of clusters and checking that the clusters you get are meaningful.
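You can see the "dangerous" half for yourself by running K-means on data that has no groups at all. The sketch below is a minimal illustration in Python with NumPy, not the course notebook. `kmeans` is a hypothetical bare-bones Lloyd's algorithm with a deterministic farthest-point start, `silhouette` is a hand-rolled mean silhouette coefficient, and both datasets are synthetic.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Hypothetical bare-bones Lloyd's algorithm with farthest-point initialization."""
    # Start at X[0], then repeatedly add the point farthest from all chosen centroids
    # (deterministic, but sensitive to outliers).
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid, shape (n, k)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def silhouette(X, labels):
    """Mean silhouette coefficient (b - a) / max(a, b); assumes at least two clusters."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():                   # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = D[i, same].mean()                # mean distance to own cluster
        b = min(D[i, labels == c].mean()     # mean distance to the nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(42)
noise = rng.uniform(0, 1, size=(300, 2))                    # no real groups at all
blobs = np.vstack([rng.normal(center, 0.05, size=(100, 2))  # three genuine, well-separated groups
                   for center in [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]])

noise_labels, _ = kmeans(noise, k=3)
blob_labels, _ = kmeans(blobs, k=3)
s_noise = silhouette(noise, noise_labels)
s_blobs = silhouette(blobs, blob_labels)
print("clusters returned for pure noise:", len(np.unique(noise_labels)))
print(f"silhouette: noise {s_noise:.2f}, blobs {s_blobs:.2f}")
```

K-means still returns exactly three groups for the noise. Only a validation check such as the silhouette score (near 1 for well-separated groups, lower for arbitrary splits) indicates that the structure is weak. The exact numbers will change with the random seed.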

By the end of this topic you'll be able to

  • Choose between K-means and hierarchical clustering for a given problem
  • Pick a defensible number of clusters
  • Profile and name the resulting segments
  • Explain why two analysts running the same algorithm might disagree
  • Turn clusters into a marketing-ready segmentation

Materials

Key concepts to know
  • K-means: fast and simple, but you must pick K upfront. It is sensitive to variable scale and to outliers (each centroid is an average, so extreme points drag it toward them), and results can shift with the random starting centroids.
  • Hierarchical clustering: builds a tree of nested groups that you cut where it makes sense, so no K is needed in advance.
  • Distance metrics: Euclidean is the usual default but not always the right choice. Standardize variables first so that one measured in large units (income in dollars, say) doesn't dominate the distance.
  • Elbow method & silhouette score: two heuristics for choosing K. Use both, plus business judgment.
  • Variable selection: what you cluster on determines what clusters you find. Garbage in, weird segments out.
  • Cluster profiling: once you have clusters, you have to describe them in words a marketer can act on.
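Hierarchical clustering records the whole merge history, so you can choose K afterwards by looking for a large jump in merge distance. Below is a naive average-linkage sketch in Python with NumPy. It is O(n³) and only suitable for a handful of points. In practice you would use a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")`. The function names `average_linkage` and `cut_tree` and the seven-point dataset are hypothetical.

```python
import numpy as np

def average_linkage(X):
    """Naive agglomerative clustering with average linkage.
    Returns merges as (cluster_a, cluster_b, distance), clusters being frozensets of indices."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [frozenset([i]) for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster point pairs
                d = D[np.ix_(sorted(clusters[i]), sorted(clusters[j]))].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], float(d)))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [clusters[i] | clusters[j]]
    return merges

def cut_tree(merges, n_points, k):
    """Replay the first n_points - k merges, leaving k clusters; return one label per point."""
    clusters = [frozenset([i]) for i in range(n_points)]
    for a, b, _ in merges[: n_points - k]:
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    labels = np.empty(n_points, dtype=int)
    for label, c in enumerate(clusters):
        labels[sorted(c)] = label
    return labels

X = np.array([[1.0, 1.0], [1.2, 1.1], [0.9, 1.3],   # tight group near (1, 1)
              [5.0, 5.0], [5.1, 5.2],               # tight group near (5, 5)
              [9.0, 1.0], [9.2, 0.8]])              # tight group near (9, 1)

merges = average_linkage(X)
print("merge heights:", [round(h, 2) for _, _, h in merges])
labels3 = cut_tree(merges, len(X), k=3)   # cut below the big jump
labels2 = cut_tree(merges, len(X), k=2)   # cut one level higher
print(labels3, labels2)
```

The first four merge heights are small (within-group merges) and the fifth jumps sharply. That jump is the natural place to cut, giving three clusters. Because the tree is already built, cutting one level higher to get two clusters needs no re-run.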
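Standardizing matters because Euclidean distance simply adds up squared gaps in whatever units each column uses. The sketch below uses six hypothetical customers described by income in dollars and store visits per month, and compares distances before and after z-scoring. It uses NumPy's default population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Hypothetical customers: [annual income in $, store visits per month]
X = np.array([
    [50_000,  3],
    [52_000, 13],
    [55_000,  3],
    [30_000,  5],
    [75_000,  4],
    [60_000,  8],
], dtype=float)

def euclid(a, b):
    return float(np.sqrt(((a - b) ** 2).sum()))

# Raw units: a $2,000 income gap outweighs a 10-visit gap,
# so customer 1 looks closer to customer 0 than customer 2 does.
raw_01, raw_02 = euclid(X[0], X[1]), euclid(X[0], X[2])

# Standardize (z-score) each column: subtract its mean, divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
# Now the 10-visit gap counts for more than the $5,000 gap, so customer 2 is the nearer one.
z_01, z_02 = euclid(Z[0], Z[1]), euclid(Z[0], Z[2])

print(f"raw:          d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"standardized: d(0,1)={z_01:.2f}  d(0,2)={z_02:.2f}")
```

Which customer counts as a "neighbor" reverses after scaling. Any clustering built on these distances will reverse with it.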
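The elbow method plots inertia (the total squared distance from each point to its assigned centroid) against K and looks for the point where extra clusters stop buying much. Here is a minimal NumPy sketch on synthetic data with four known groups. `kmeans_inertia` is a hypothetical helper using Lloyd's algorithm with a deterministic farthest-point start. In scikit-learn the same quantity is available as `KMeans(n_clusters=k).fit(X).inertia_`, and `sklearn.metrics.silhouette_score(X, labels)` gives the silhouette for each candidate K.

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=100):
    """Lloyd's algorithm with farthest-point initialization; returns the final inertia
    (sum of squared distances from each point to its nearest centroid)."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

# Synthetic data: four well-separated groups of 50 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(center, 0.5, size=(50, 2))
               for center in [(0, 0), (5, 0), (0, 5), (5, 5)]])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, w in inertias.items():
    print(k, round(w, 1))
# The "elbow" is the K after which adding clusters stops reducing inertia by much.
```

Inertia always falls as K grows, so look for the bend rather than the minimum. Real customer data rarely bends this cleanly, which is why the silhouette score and business judgment are needed as well.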
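Profiling usually starts with a table of cluster sizes and average values, plus how far each average sits from the overall mean. The sketch below uses hypothetical customers and cluster labels, assumed to come from an earlier clustering run. It reports each segment's means in original units alongside a standardized deviation, which shows which variables make a segment distinctive.

```python
import numpy as np

features = ["income_k", "visits_per_month", "avg_basket_usd"]
# Hypothetical customers (rows) and the cluster label each received in an earlier run
X = np.array([
    [45, 12, 18], [50, 14, 20], [48, 11, 17],      # cluster 0
    [95, 2, 140], [110, 3, 160], [100, 2, 150],    # cluster 1
    [60, 6, 55], [65, 5, 60],                      # cluster 2
], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2])

overall_mean, overall_std = X.mean(axis=0), X.std(axis=0)
profile = {}
for c in np.unique(labels):
    members = X[labels == c]
    means = members.mean(axis=0)                 # segment averages in original units
    z = (means - overall_mean) / overall_std     # gap from the average customer, in std units
    profile[int(c)] = {"size": len(members), "mean": means, "z": z}
    summary = ", ".join(f"{f}={m:.1f} ({s:+.1f} sd)" for f, m, s in zip(features, means, z))
    print(f"cluster {c} (n={len(members)}): {summary}")
```

Naming is the analyst's job. With these made-up numbers, cluster 0 reads like "frequent, small-basket shoppers" and cluster 1 like "higher-income, infrequent, big-basket shoppers".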
Slides & lecture decks
Readings & class notes
Hands-on: clustering in Excel

Start in Excel to see the mechanics with your own eyes before letting Python or SAS do it.
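For reference, here is a single K-means iteration sketched in plain Python, step by step as a spreadsheet would lay it out. The six points and the two starting centroids are hypothetical. The Excel analogues in the comments (MIN/MATCH for the nearest centroid, AVERAGEIF for the new centroid) are one plausible layout, not necessarily the one used in the course workbook.

```python
# One K-means iteration done "by hand": distance to each centroid,
# nearest-centroid label, then new centroid = average of its points.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids = [(2.0, 2.0), (5.0, 5.0)]  # hypothetical starting guesses

def sq_dist(p, c):
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

# Assignment step (roughly =MATCH(MIN(dist_range), dist_range, 0) per row in Excel)
labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]

# Update step (roughly AVERAGEIF over the label column, once per coordinate)
new_centroids = []
for j in range(len(centroids)):
    members = [p for p, lab in zip(points, labels) if lab == j]
    new_centroids.append((sum(p[0] for p in members) / len(members),
                          sum(p[1] for p in members) / len(members)))

print(labels)         # [0, 0, 0, 1, 1, 1]
print(new_centroids)  # approximately [(1.33, 1.33), (8.33, 8.33)]
```

Repeating these two steps until the labels stop changing is the whole algorithm. Python and SAS automate the loop and add smarter starting points.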

Hands-on: clustering in Python
Hands-on: clustering in SAS Viya

Same problem, enterprise tool. Visual Analytics gives you point-and-click clustering with governed data lineage.
