Decision 4 — Clustering

Why this decision matters

This is your first unsupervised technique — there's no "right answer" the model is trying to match. That makes it powerful (you discover structure you didn't know was there) and dangerous (any algorithm will find some clusters, even in random noise). Knowing how to choose the number of clusters and validate that they're meaningful is the hard part.
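To see the "dangerous" half concretely, here is a minimal sketch, assuming Python with NumPy and scikit-learn (neither is named on this page). It generates points scattered uniformly at random, so there are no real groups. K-means still returns exactly the number of clusters you ask for, with every point neatly assigned.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Uniform random points: by construction there is no group structure here.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 1, size=(500, 2))

# K-means still carves the noise into exactly the K clusters requested.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = km.labels_

print("cluster sizes:", np.bincount(labels))
print("silhouette:", round(silhouette_score(X, labels), 3))
```

The output looks like a tidy segmentation, which is exactly the trap. Comparing a score like silhouette against what you get on structureless data like this is one way to check that your real clusters are more than an artifact of the algorithm.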

By the end of this topic you'll be able to

Choose between K-means and hierarchical clustering for a given problem; pick a defensible number of clusters; profile and name the resulting segments; explain why two analysts running the same algorithm might disagree; turn clusters into a marketing-ready segmentation.

Materials

Key concepts to know

K-means — fast, simple, requires you to pick K upfront. Sensitive to scale and outliers.
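Hierarchical clustering — builds a tree of nested merges (a dendrogram); you cut it at the level that gives sensible groups. No K needed in advance.
Distance metrics — Euclidean distance is the default but not always the right choice. Always standardize first, so that a variable measured in large units (such as income in dollars) does not dominate the distance.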
Elbow method & silhouette score — two heuristics for choosing K. Use both, plus business judgment.
Variable selection — what you cluster on determines what clusters you find. Garbage in, weird segments out.
Cluster profiling — once you have clusters, you have to describe them in words a marketer can act on.
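The standardize, elbow, and silhouette ideas above fit in a few lines. This is a sketch assuming scikit-learn, run on synthetic data from `make_blobs` standing in for a real customer table. `inertia_` is the within-cluster sum of squares that an elbow plot tracks; the silhouette score ranges from -1 to 1, with higher values meaning tighter, better-separated clusters.

```python
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a customer table: 600 rows, 4 numeric variables.
X_raw, _ = make_blobs(n_samples=600, centers=3, n_features=4, random_state=42)
# Inflate one column (think: income in dollars) to mimic unscaled raw data.
X_raw[:, 0] *= 1000

# Standardize: every variable gets mean 0 and standard deviation 1.
X = StandardScaler().fit_transform(X_raw)

results = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ feeds the elbow plot; silhouette is computed on the same labels.
    results[k] = (km.inertia_, silhouette_score(X, km.labels_))

for k, (inertia, sil) in results.items():
    print(f"K={k}: inertia={inertia:,.1f}  silhouette={sil:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("highest silhouette at K =", best_k)
```

Treat `best_k` as one input, not the answer: look for the bend in the inertia column, check the silhouette, then ask whether that many segments is something the business can actually act on.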
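For hierarchical clustering, "cutting the tree" can mean either asking for a fixed number of clusters or choosing a merge height and letting the count follow. A minimal sketch, assuming SciPy's `linkage` and `fcluster` with Ward linkage (one common choice among several) on synthetic, standardized data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X_raw, _ = make_blobs(n_samples=200, centers=3, n_features=4, random_state=7)
X = StandardScaler().fit_transform(X_raw)

# Ward linkage: at each step, merge the pair of clusters that least
# increases total within-cluster variance. Z records every merge.
Z = linkage(X, method="ward")

# Cut option 1: ask for (at most) 3 flat clusters.
labels_k3 = fcluster(Z, t=3, criterion="maxclust")

# Cut option 2: cut at a chosen merge height; the number of clusters falls out.
cut_height = 0.7 * Z[:, 2].max()
labels_by_height = fcluster(Z, t=cut_height, criterion="distance")

# fcluster labels start at 1, so drop the empty 0 bin.
print("sizes at K=3:", np.bincount(labels_k3)[1:])
print("clusters when cut at height", round(cut_height, 2), ":",
      len(np.unique(labels_by_height)))
```

To see the tree itself, `scipy.cluster.hierarchy.dendrogram(Z)` draws it with Matplotlib; long vertical gaps between merges are the natural places to cut.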

Slides & lecture decks

Unsupervised Learning — Lecture Deck. The big picture: where unsupervised fits, when to use clustering vs. other unsupervised methods.
Cluster Analysis — In-Class Activity. Slides for the Week 10 hands-on session.

Readings & class notes

Unsupervised Learning — Class Notes. Self-contained explainer of clustering and dimensionality reduction.
Cluster Analysis — Deep Dive. Detailed walkthrough of K-means, hierarchical methods, and validation.
Variable Selection for Clustering — Demo. How the choice of variables changes the clusters you find.
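Here is a compact illustration of why variable choice matters, as a sketch assuming scikit-learn and entirely made-up customer data (the columns `age`, `tenure`, `spend`, and `visits` are hypothetical). Two independent "stories" are planted in the same customers; clustering on demographic columns recovers one, clustering on behavioral columns recovers the other. The adjusted Rand index (ARI) compares two partitions: 1 means identical, values near 0 mean no more agreement than chance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
n = 400
# Hypothetical hidden groups, drawn independently of each other.
age_group = rng.integers(0, 2, n)    # drives age and tenure
spend_group = rng.integers(0, 2, n)  # drives spend and visit frequency
age = 30 + 25 * age_group + rng.normal(0, 4, n)
tenure = 2 + 8 * age_group + rng.normal(0, 1.5, n)
spend = 50 + 150 * spend_group + rng.normal(0, 20, n)
visits = 3 + 9 * spend_group + rng.normal(0, 1.5, n)

def segment(*cols, k=2):
    """Standardize the chosen columns and return K-means cluster labels."""
    X = StandardScaler().fit_transform(np.column_stack(cols))
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

demo_labels = segment(age, tenure)
behavior_labels = segment(spend, visits)

print("demographic vs behavioral segments, ARI:",
      round(adjusted_rand_score(demo_labels, behavior_labels), 3))
print("demographic segments vs hidden age_group, ARI:",
      round(adjusted_rand_score(demo_labels, age_group), 3))
```

Same customers, same algorithm, same K, yet two unrelated segmentations. This is also one reason two analysts running the same algorithm can disagree: they chose different variables.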

Hands-on: clustering in Excel

Start in Excel to see the mechanics with your own eyes before letting Python or SAS do it.

Clustering in Excel — Demo Workbook. Step-by-step K-means worked out in Excel formulas. No black box.
BrewRight Customer Data. Practice dataset for the in-class activity — segment a brewery's customer base.
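If you want to connect the spreadsheet mechanics to code, here is a from-scratch sketch of standard K-means (Lloyd's algorithm) using only NumPy. It is not taken from the workbook; it simply spells out the two steps a spreadsheet version typically lays out cell by cell: compute each point's distance to every centroid and assign it to the nearest one, then move each centroid to the average of its points. The function name `kmeans_lloyd` is hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Assign step: index of the nearest centroid (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain K-means, alternating assign and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    # Final assignment so labels always match the returned centroids.
    return assign(X, centroids), centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
print(centroids.round(2))
```

Because the starting centroids are random, changing `seed` can change the result on messier data. Library implementations such as scikit-learn's `KMeans` run several starts (`n_init`) and keep the best, which is why results differ between tools and runs.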

Hands-on: clustering in Python

Bank Customer Segmentation — Demo Notebook. Full pipeline: scale, cluster, profile, name. The reference walkthrough.
Bank Customer Dataset. Dataset used in the demo above.
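The four pipeline steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the demo notebook's code: it uses pandas and scikit-learn on a synthetic table whose columns (`balance`, `monthly_txns`, `tenure_years`) and segment names are hypothetical. Note that profiling is done in the original units, since a marketer cannot act on z-scores.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Synthetic stand-in for a bank customer table (hypothetical columns).
rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "balance": np.concatenate([rng.normal(2_000, 500, n), rng.normal(40_000, 8_000, n)]),
    "monthly_txns": np.concatenate([rng.normal(45, 8, n), rng.normal(10, 3, n)]),
    "tenure_years": np.concatenate([rng.normal(2, 1, n), rng.normal(12, 3, n)]),
})
features = ["balance", "monthly_txns", "tenure_years"]

# 1. Scale so no feature dominates the distance calculation.
X = StandardScaler().fit_transform(df[features])

# 2. Cluster.
df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 3. Profile: per-cluster means in original units, plus cluster size.
profile = df.groupby("cluster")[features].mean().round(1)
profile["size"] = df["cluster"].value_counts().sort_index()
print(profile)

# 4. Name: a human judgment made after reading the profile. Here the
#    cluster with the higher mean balance gets the "savers" label.
affluent = profile["balance"].idxmax()
names = {c: ("Established savers" if c == affluent else "Everyday transactors")
         for c in profile.index}
df["segment"] = df["cluster"].map(names)
print(df["segment"].value_counts())
```

Step 4 is deliberately not automated beyond a simple rule: in practice you read the profile table and choose names that describe behavior a marketer can target.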

Hands-on: clustering in SAS Viya

Same problem, enterprise tool. Visual Analytics gives you point-and-click clustering with governed data lineage.

SAS VA Clustering Lab — Bank Customers. Lab instructions: run K-means in SAS Visual Analytics.
SAS VA Clustering Options — Reference Guide. What each clustering parameter actually does, with screenshots.
