Machine Learning Interactive
Model Ensembling
Combine predictions from multiple models for better accuracy. Learn stacking, blending, and optimal weight selection for sports prediction models.
The Wisdom of Crowds

- Condorcet Jury Theorem: if each model is independently better than 50% accurate, the accuracy of a majority vote approaches 100% as the number of models grows.
- Error Cancellation: independent errors average out; bias remains, but variance decreases.
- Complementarity: different models capture different patterns in the data.
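The Condorcet effect is easy to check by simulation. A minimal R sketch (the function name and trial count are illustrative, not from the page):

```r
# Simulate independent binary classifiers, each correct with probability p,
# and measure how often a strict majority of them is correct.
majority_vote_accuracy <- function(n_models, p, n_trials = 10000) {
  set.seed(42)
  # Each row is one trial; 1 = that model is correct, 0 = wrong
  correct <- matrix(rbinom(n_trials * n_models, 1, p),
                    nrow = n_trials, ncol = n_models)
  mean(rowMeans(correct) > 0.5)  # strict majority correct
}

majority_vote_accuracy(1, 0.6)   # single model: close to 0.60
majority_vote_accuracy(15, 0.6)  # theory: 1 - pbinom(7, 15, 0.6), about 0.79
```

With 15 models that are each only 60% accurate, the majority vote already approaches 80%; the gain assumes independence, which real models only approximate.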
Ensemble Configuration (interactive demo)

[Interactive widget: sliders set the number of models (2-10), base model accuracy (55-75%), and model correlation (0.1-0.8); individual model accuracies are shown for RF, Linear, XGBoost, CatBoost, and LightGBM.]

Example result from the demo:

- Best single model: 63.9%
- Ensemble accuracy: 72.2%
- Ensemble lift: +8.3%
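The way correlation caps the ensemble gain can be reproduced offline. A hedged R sketch (the parameter values are illustrative, not the widget's exact numbers):

```r
# Build k models whose errors share a common component (pairwise
# correlation rho), then compare single-model MSE to the averaged ensemble.
set.seed(1)
n   <- 5000   # observations
k   <- 5      # models
rho <- 0.3    # assumed pairwise error correlation

truth  <- rnorm(n)
shared <- rnorm(n)  # error component common to all models
errors <- sapply(1:k, function(i) sqrt(rho) * shared +
                                  sqrt(1 - rho) * rnorm(n))
preds  <- truth + errors  # each column: one model's predictions

single_mse   <- mean((preds[, 1] - truth)^2)       # about 1
ensemble_mse <- mean((rowMeans(preds) - truth)^2)  # about rho + (1 - rho)/k
c(single = single_mse, ensemble = ensemble_mse)
```

The correlated part of the error (rho) never averages out, which is why the demo's lift shrinks as the correlation slider rises.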
Stacking Architecture

1. Base Models: XGBoost, Random Forest, LightGBM, CatBoost (diverse tree-based models)
2. Meta Features: out-of-fold (OOF) predictions, rank features, confidence scores
3. Meta Learner: Logistic Regression or Ridge, trained to learn the optimal blend
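The key mechanic in step 2 is that each row's meta-feature comes from a fold that did not train on that row. A minimal R sketch (the helper name and the linear base model are illustrative):

```r
# Generate out-of-fold predictions: for each fold, fit on the other
# folds and predict only the held-out rows, so the meta-learner never
# sees in-fold (leaked) predictions.
make_oof <- function(x, y, k = 5, fit_fun, pred_fun) {
  folds <- sample(rep(1:k, length.out = length(y)))
  oof <- numeric(length(y))
  for (f in 1:k) {
    fit <- fit_fun(x[folds != f, , drop = FALSE], y[folds != f])
    oof[folds == f] <- pred_fun(fit, x[folds == f, , drop = FALSE])
  }
  oof
}

# Usage with a plain linear base model on synthetic data
set.seed(7)
x <- matrix(rnorm(200 * 3), ncol = 3)
y <- as.vector(x %*% c(1, -2, 0.5)) + rnorm(200)
oof_lm <- make_oof(x, y, k = 5,
                   fit_fun  = function(x, y) lm.fit(cbind(1, x), y),
                   pred_fun = function(fit, x)
                     as.vector(cbind(1, x) %*% fit$coefficients))
```

Stacking several such OOF columns side by side gives the meta-learner's training matrix.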
Blending Strategies

- Simple Average: (1/n) × Σ predictionᵢ. Best when: equal model quality.
- Weighted Average: Σ wᵢ × predictionᵢ. Best when: known model quality.
- Rank Average: average of per-model ranks. Best when: predictions are on different scales.
- Geometric Mean: (∏ predictionᵢ)^(1/n). Best when: multiplicative effects.
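The four blends above fit in a few lines of R. A sketch assuming `preds` is a matrix with one column per model, values on a common probability scale (function names are illustrative):

```r
simple_avg   <- function(preds) rowMeans(preds)
weighted_avg <- function(preds, w) as.vector(preds %*% (w / sum(w)))
# Rank each model's predictions, then average the ranks (scale-free)
rank_avg     <- function(preds) rowMeans(apply(preds, 2, rank)) / nrow(preds)
# Geometric mean via logs: exp(mean(log p)) = (prod p)^(1/n)
geo_mean     <- function(preds) exp(rowMeans(log(preds)))
```

The geometric mean requires strictly positive predictions; clip probabilities away from 0 before using it.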
Practical Tips for Sports Models

Model Selection

- Include different algorithm families (trees, linear, neural)
- Use different feature subsets per model
- Vary hyperparameters for diversity
- Remove highly correlated models (correlation > 0.9)
Weight Optimization

- Use cross-validation for weight selection
- Constrain weights to sum to 1 (a convex combination)
- Consider time-weighted ensembles to handle drift
- A simple average often beats complex optimization
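Convex weight selection can be sketched with base R's `optim` and a softmax reparameterisation, so the weights are non-negative and sum to 1 by construction (the function name and synthetic data are illustrative; in practice `oof_preds` would be held-out OOF predictions):

```r
# Choose convex blend weights that minimise log loss on OOF predictions.
optimise_weights <- function(oof_preds, y) {
  obj <- function(z) {
    w <- exp(z) / sum(exp(z))  # softmax -> non-negative, sums to 1
    p <- pmin(pmax(as.vector(oof_preds %*% w), 1e-15), 1 - 1e-15)
    -mean(y * log(p) + (1 - y) * log(1 - p))  # log loss
  }
  z <- optim(rep(0, ncol(oof_preds)), obj)$par
  exp(z) / sum(exp(z))
}

# Usage: one informative model, one pure-noise model
set.seed(3)
n <- 1000
y <- rbinom(n, 1, 0.5)
good  <- pmin(pmax(ifelse(y == 1, 0.75, 0.25) + rnorm(n, 0, 0.1), 0.01), 0.99)
noise <- runif(n)
w <- optimise_weights(cbind(good, noise), y)  # most weight on `good`
```

Even here, compare the optimised blend against the simple average on a holdout before trusting it.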
R Code Equivalent

# Model stacking with caret
library(caret)
library(caretEnsemble)

# Define base models on shared CV folds
model_list <- caretList(
  outcome ~ .,
  data = train_data,
  trControl = trainControl(
    method = "cv",
    number = 5,
    savePredictions = "final"
  ),
  methodList = c("xgbTree", "rf", "glmnet")
)

# Check model correlations (consider dropping pairs above 0.9)
modelCor(resamples(model_list))

# Stack with a meta-learner
stack <- caretStack(
  model_list,
  method = "glm",  # simple meta-learner
  trControl = trainControl(method = "cv", number = 5)
)

# Predict on held-out data
ensemble_pred <- predict(stack, test_data)

# Simple weighted ensemble (defaults to an equal-weight average)
blend_predictions <- function(predictions, weights = NULL) {
  if (is.null(weights)) {
    weights <- rep(1 / ncol(predictions), ncol(predictions))
  }
  as.vector(as.matrix(predictions) %*% weights)
}

Key Takeaways
- Diversity is more important than individual accuracy
- Stacking: train a meta-learner on out-of-fold (OOF) predictions
- Simple averaging is a strong baseline
- More models bring diminishing returns beyond 5-7
- Validate the ensemble on a holdout set, not only in CV
- Remove correlated models to improve diversity