Every basic model a data mining/data analytics course covers: what it outputs, what gets deployed, and how it reaches production
| Model / Technique | Type | What's the "Model"? | What Gets Deployed? | Deployment Vehicle | How You Evaluate It | When to Retrain |
|---|---|---|---|---|---|---|
| SUPERVISED LEARNING: Predict a Known Target | | | | | | |
| Linear Regression (predict a continuous number) | Supervised | Coefficients (β₀, β₁, β₂, …) that form a prediction equation: ŷ = β₀ + β₁x₁ + β₂x₂ + … | The coefficient table, literally a list of weights. Multiply each input by its weight, sum them up, done. | Excel SUMPRODUCT; REST API; batch SQL | R², adjusted R², RMSE, MAE, residual plots, p-values | When R² drops on new data or residual patterns appear, which suggests the relationships have shifted |
| Logistic Regression (predict yes/no probability) | Supervised | Coefficients + intercept that output a probability via the sigmoid function: P = 1 / (1 + e^-(β₀ + β₁x₁ + …)) | Coefficient table + chosen threshold. Score = probability. Decision = "if P > threshold, then yes." | Excel formula; real-time API; batch scoring | AUC, accuracy, precision, recall, F1, confusion matrix, lift chart | When AUC drops below the business threshold, or when the class balance or the input-to-target relationship shifts (concept drift) |
| Decision Tree (if-then rules for prediction) | Supervised | A set of IF-THEN-ELSE rules that split on feature thresholds, i.e. the full tree structure | The rule set. Each rule is a nested IF statement. Can be written out as plain English or Excel IF() chains. | Nested IF() formulas; REST API; batch SQL CASE | Accuracy, precision, recall, confusion matrix, feature importance, pruning validation | When accuracy degrades, or when new categories appear that the tree never saw |
| Random Forest (ensemble of many trees) | Supervised | Hundreds of decision trees that each vote, and the majority wins. The forest = collection of trees | The full model object (pickle/ONNX). Too complex for Excel. Deploy as an API or use a surrogate decision tree. | REST API; batch scoring; ⚠️ not practical in native Excel | AUC, accuracy, precision, recall, OOB error, feature importance, cross-validation | When OOB error increases, or feature importances shift significantly |
| Gradient Boosting / XGBoost (sequential error-correcting trees) | Supervised | A sequence of small trees, each fixing the previous tree's mistakes. Final score = sum of all tree outputs | The model object (pickle/PMML/ONNX). Like Random Forest, it needs an API or in-database runtime. | REST API; batch scoring; ⚠️ not practical in native Excel | AUC, log-loss, RMSE, learning curves, SHAP values for explainability | When validation metrics degrade, or when retraining on recent data significantly changes predictions |
| Neural Network (layers of weighted connections) | Supervised | Weight matrices + bias vectors for each layer, plus activation functions. A computational graph | The trained model file (H5, SavedModel, ONNX). Requires a runtime environment to execute. | REST API (TF Serving); batch inference; ⚠️ black box, pair with SHAP | Accuracy, AUC, loss curves, confusion matrix: same as logistic regression, but less interpretable | When performance degrades, or when data distribution shifts (monitor input distributions) |
| SVM (Support Vector Machine) | Supervised | Support vectors + hyperplane equation that maximizes the margin between classes | The support vectors and kernel parameters. For linear SVM: equivalent to a coefficient table. | REST API; batch scoring; Excel (linear only) | Accuracy, AUC, precision, recall, margin width, support vector count | When class boundaries shift, or new data points fall in unexpected regions |
| Naive Bayes (probability lookup table) | Supervised | Probability tables: P(class) and P(feature \| class) for every feature-class combination | The probability tables. Scoring = multiply prior × likelihoods. Pure lookup, no matrix math. | Excel VLOOKUP + PRODUCT; REST API; batch SQL | Accuracy, precision, recall, F1; especially useful for text classification tasks | When vocabulary shifts (for text) or when the conditional independence assumption breaks down |
| UNSUPERVISED LEARNING: Discover Hidden Structure | | | | | | |
| K-Means Clustering (find natural customer groups) | Unsupervised | K cluster centroids (the center point of each group in feature space) | Centroids + scaler parameters. New customer → standardize → calculate distance to each centroid → assign nearest. | Excel SUMPRODUCT + SQRT; batch nightly scoring; REST API | Silhouette score, elbow method, cluster stability, business naming test ("can you name it?") | When cluster sizes shift dramatically, or average distance-to-centroid increases (segment drift) |
| Hierarchical Clustering (tree of nested groups) | Unsupervised | A dendrogram (tree) showing how groups merge; cut at the desired height to get K clusters | After cutting the tree: same as K-Means, i.e. centroids (group averages) + scaler. The dendrogram is for analysis, not scoring. | Excel (post-cut centroids); batch scoring; dendrogram = analysis artifact only | Dendrogram visual inspection, cophenetic correlation, silhouette score at chosen cut | When the dendrogram structure changes significantly with new data |
| Anomaly Detection (find what doesn't belong) | Unsupervised | A boundary defining "normal"; anything outside it is an anomaly. One-Class SVM / Isolation Forest | The boundary model (decision function). New record → score → if score < threshold → flag as anomaly. | Real-time API (fraud); batch scanning; ⚠️ requires careful threshold tuning | Precision & recall on known anomalies (if available), false positive rate, domain expert review | When "normal" behavior evolves: seasonal patterns, new products, market changes |
| Market Basket / Association Rules (what items go together?) | Unsupervised | A rules table: "If {chips, salsa} then {beer}" with support, confidence, and lift | The rules table itself. No scoring function; it's a lookup: "customer has X → recommend Y." | Excel filtered rules table; batch recommendation; recommendation API; rules ARE the deployment, no model object | Support, confidence, lift. Lift > 1 = positive association. Business validation: do the rules make sense? | When the product catalog changes, seasonal shifts occur, or lift values decay (products no longer co-purchased) |
| TEXT ANALYTICS: Turn Words into Numbers, Then Actions | | | | | | |
| Sentiment Analysis (VADER / TextBlob / Custom) | Varies | A sentiment lexicon (word → score dictionary) + scoring rules for negation, intensity, punctuation | The lexicon + rules engine. New text → tokenize → look up each word → apply rules → output polarity score. | Batch CSV scoring; real-time API; Excel VBA (simplified) | Accuracy vs. hand-labeled sample, precision/recall on pos/neg classes, domain coverage check | When domain language evolves (new slang), or when accuracy on fresh hand-labeled data drops |
| Topic Extraction / Text Clustering (TF-IDF + Clustering / LDA) | Unsupervised | A vocabulary + topic-word distributions (which words define each topic) from TF-IDF or LDA | The vocabulary (word → index mapping) + topic centroids. New document → vectorize with same vocab → assign to nearest topic. | Batch categorization; API for ticket routing; Excel word-match lookup; vocabulary must match training exactly | Topic coherence score, human interpretability ("can you name each topic?"), classification accuracy if labels exist | When new topics emerge, vocabulary drifts, or topic assignments no longer match domain expert expectations |
| TIME SERIES: Predict What Happens Next | | | | | | |
| Exponential Smoothing (trend + seasonality forecasting) | Supervised | Smoothing parameters (α, β, γ) for level, trend, and seasonal components. State-space model | The smoothing parameters + last known state. Forecast = apply parameters to generate future points + confidence intervals. | Excel FORECAST.ETS; batch monthly forecasts; streaming updates | MAE, RMSE, MAPE, forecast vs. actuals plot, prediction interval coverage | Every forecast cycle (monthly/quarterly) with new actuals; time series models are inherently refresh-heavy |
| ARIMA / Prophet (advanced forecasting) | Supervised | AR/MA coefficients (ARIMA) or trend changepoints + Fourier seasonality (Prophet). Parametric time model | The fitted model parameters. Generate forecasts for horizon N with confidence bands. | Batch forecasting; API for demand planning; Excel: limited to ETS, ARIMA needs Python/R | MAE, RMSE, MAPE, AIC/BIC for model selection, residual autocorrelation (Ljung-Box test) | With every new data cycle: refit on latest actuals and compare forecast accuracy to the prior version |
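
The Market Basket row says the rules table is itself the deployment: a lookup rather than a scoring function. A minimal sketch of that lookup follows. The rules, confidence values, and lift values are made up for illustration, and `recommend` is a hypothetical helper, not part of any library.

```python
# Hypothetical rules table, as it might be exported from an association-rule miner.
# All items and numbers are invented for illustration.
RULES = [
    {"if": {"chips", "salsa"}, "then": "beer", "confidence": 0.62, "lift": 1.8},
    {"if": {"bread"}, "then": "butter", "confidence": 0.45, "lift": 1.3},
    {"if": {"chips"}, "then": "soda", "confidence": 0.30, "lift": 0.9},
]


def recommend(basket, rules=RULES, min_lift=1.0):
    """Return recommended items for a basket, strongest lift first."""
    basket = set(basket)
    matches = [
        rule
        for rule in rules
        # The rule fires when every antecedent item is already in the basket,
        # the recommended item is not, and the association is positive.
        if rule["if"] <= basket
        and rule["then"] not in basket
        and rule["lift"] > min_lift
    ]
    matches.sort(key=lambda rule: rule["lift"], reverse=True)
    return [rule["then"] for rule in matches]


suggestions = recommend(["chips", "salsa", "bread"])
```

The `min_lift=1.0` default mirrors the table's "Lift > 1 = positive association" rule of thumb, so the chips-to-soda rule (lift 0.9) is filtered out here.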
Supervised models deploy a scoring function: feed in new inputs, get a prediction back.
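
As a rough illustration of a deployed scoring function, the sketch below scores one record with a logistic regression coefficient table and a threshold. The feature names, coefficient values, and threshold are assumptions chosen for the example, not values from any real model.

```python
import math

# Hypothetical coefficient table extracted from a trained logistic regression.
COEFFICIENTS = {"intercept": -1.5, "tenure_years": -0.4, "support_tickets": 0.8}
THRESHOLD = 0.5  # assumed business threshold


def score(record, coefficients=COEFFICIENTS):
    """Return P(yes) for one record using the sigmoid of the weighted sum."""
    z = coefficients["intercept"] + sum(
        weight * record[name]
        for name, weight in coefficients.items()
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def decide(record, threshold=THRESHOLD):
    """Turn the probability into a yes/no decision."""
    return score(record) > threshold


customer = {"tenure_years": 2.0, "support_tickets": 3.0}
probability = score(customer)  # z = -1.5 - 0.8 + 2.4 = 0.1, so P is about 0.525
flagged = decide(customer)
```

The same arithmetic is what an Excel formula or a SQL expression would reproduce; only the coefficient table and the threshold need to travel from training to production.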
Unsupervised models deploy output artifacts (centroids, rules tables, lexicons) that become the inputs to downstream business logic.
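
To make the "artifact" idea concrete, here is a minimal sketch of K-Means scoring from exported centroids and scaler parameters: standardize, measure distance to each centroid, assign the nearest. The feature choice, scaler values, centroid coordinates, and segment names are all hypothetical.

```python
import math

# Hypothetical artifacts exported from training. Centroids are expressed in
# standardized feature space; the two features are assumed to be age and
# monthly spend.
SCALER_MEAN = [40.0, 500.0]
SCALER_STD = [10.0, 200.0]
CENTROIDS = {
    "budget": [-1.0, -1.0],
    "mainstream": [0.0, 0.0],
    "premium": [1.0, 1.5],
}


def standardize(raw):
    """Apply the training-time scaler to a raw feature vector."""
    return [(x - m) / s for x, m, s in zip(raw, SCALER_MEAN, SCALER_STD)]


def assign_segment(raw):
    """Return the name of the centroid nearest to the standardized record."""
    z = standardize(raw)
    distances = {name: math.dist(z, c) for name, c in CENTROIDS.items()}
    return min(distances, key=distances.get)


segment = assign_segment([52.0, 820.0])  # standardizes to [1.2, 1.6]
```

Reusing the training-time scaler matters: centroids computed on standardized data are meaningless against raw inputs.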
Time series models deploy parameters + state: they generate forecasts forward from the last known data point.
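
A small sketch of "parameters + state", using Holt's linear-trend exponential smoothing and leaving out the seasonal component for brevity. The smoothing parameters and the starting level and trend are assumed values; in practice they would come from the fitted model.

```python
# Hypothetical deployed artifacts: smoothing parameters plus last known state.
ALPHA, BETA = 0.5, 0.2
state = {"level": 100.0, "trend": 2.0}


def forecast(state, horizon):
    """Project forward from the last state: level + h * trend for h = 1..horizon."""
    return [state["level"] + h * state["trend"] for h in range(1, horizon + 1)]


def update(state, actual, alpha=ALPHA, beta=BETA):
    """Fold one new actual into the state (Holt's additive-trend update)."""
    prev_level, prev_trend = state["level"], state["trend"]
    level = alpha * actual + (1 - alpha) * (prev_level + prev_trend)
    trend = beta * (level - prev_level) + (1 - beta) * prev_trend
    return {"level": level, "trend": trend}


next_three = forecast(state, 3)          # [102.0, 104.0, 106.0]
new_state = update(state, actual=106.0)  # level 104.0, trend about 2.4
```

Each new actual shifts the state, which is why these models are refreshed every forecast cycle rather than scored from a frozen artifact.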
In every case, the deployment question is the same: "What artifact do I extract from training, and where does it live so decisions happen automatically?"