Ensemble Methods
Combine multiple models for better predictions. Ensembles reduce variance, improve robustness, and often outperform any single model.
Why Ensembles Work
Reduce Variance
Averaging predictions smooths out noise from individual models (see the sketch after these cards).
Reduce Bias
Different models capture different patterns in the data.
Robustness
Less sensitive to outliers and edge cases.
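A quick R sketch of the variance-reduction effect (simulated data, illustrative numbers only): averaging three unbiased models with independent errors cuts the error variance to roughly one third of a single model's.
# Simulated illustration: averaging k independent, unbiased predictions
# reduces error variance by roughly 1/k (here k = 3)
set.seed(42)
truth <- rnorm(10000)                  # true values
pred1 <- truth + rnorm(10000)          # three models with independent errors
pred2 <- truth + rnorm(10000)
pred3 <- truth + rnorm(10000)
avg   <- (pred1 + pred2 + pred3) / 3   # simple unweighted ensemble
var(pred1 - truth)                     # ~1.0  single-model error variance
var(avg - truth)                       # ~0.33 ensemble error variance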
[Interactive demo: sliders set each individual model's accuracy, the error correlation between models, and the ensemble weights (which should total 100%); an accuracy-comparison chart shows each model against the ensemble and the ensemble boost over the best individual model. Lower correlation = more diversity = a bigger ensemble boost.]
The Diversity Principle
High Correlation (Bad)
The models make the same errors, so the ensemble behaves like a single model: no diversity benefit.
Low Correlation (Good)
The models make different errors that partly cancel out, giving the maximum diversity benefit (see the sketch below).
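A hedged simulation of the diversity principle: give each model's error a shared component so the correlation can be dialed up or down, and watch the ensemble's error variance grow or shrink (illustrative values, not taken from the demo above).
# Each model's error = shared component + its own noise; rho is the error
# correlation between any two models (per-model error variance stays at 1)
set.seed(42)
ensemble_error_var <- function(rho, n = 10000, k = 3) {
  shared <- rnorm(n)
  errs <- replicate(k, sqrt(rho) * shared + sqrt(1 - rho) * rnorm(n))
  var(rowMeans(errs))        # error variance of the averaged ensemble
}
ensemble_error_var(0.0)      # ~0.33: uncorrelated errors, full benefit
ensemble_error_var(0.9)      # ~0.93: highly correlated, almost no benefit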
Ensemble Methods
- Bagging: train on bootstrap samples and average the predictions (see the sketch after this list). Example: Random Forest
- Boosting: train models sequentially, each round focusing on the previous errors. Example: XGBoost, LightGBM
- Stacking: a meta-model learns how to combine the base models. Example: blending layers
- Voting: simple majority or weighted vote. Example: VotingClassifier
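A minimal sketch of the bagging idea, assuming a data frame df with a numeric outcome column y and the rpart package (both names are placeholders): fit one model per bootstrap resample, then average the predictions.
library(rpart)
# Fit one regression tree per bootstrap resample of the training data
bagged_fit <- function(df, n_models = 25) {
  lapply(seq_len(n_models), function(i) {
    boot <- df[sample(nrow(df), replace = TRUE), ]   # bootstrap sample
    rpart(y ~ ., data = boot)
  })
}
# Average the per-tree predictions to get the bagged prediction
bagged_predict <- function(models, newdata) {
  preds <- sapply(models, predict, newdata = newdata)
  rowMeans(preds)
}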
Sports Pricing Applications
Player Projection Ensemble
- Model 1: Season-average regression
- Model 2: Recent-form weighting
- Model 3: Matchup-based adjustment
- Ensemble: Weighted combination
Win Probability Ensemble
- Model 1: Elo ratings
- Model 2: Vegas line implied probability
- Model 3: Advanced stats model
- Ensemble: Calibrated blend (sketched below)
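One way to read the "calibrated blend" step, as a hedged sketch: average the three win probabilities on the log-odds scale, then recalibrate the blend against observed outcomes with a logistic regression. Here games, elo_prob, vegas_prob, stats_prob, and won are hypothetical columns, not from any specific dataset.
# Blend win probabilities on the log-odds scale, then recalibrate with glm
logit <- function(p) log(p / (1 - p))
blend_logodds <- with(games,
  (logit(elo_prob) + logit(vegas_prob) + logit(stats_prob)) / 3)
# Logistic recalibration maps the blended log-odds to a calibrated probability
calib_model     <- glm(games$won ~ blend_logodds, family = binomial())
calibrated_prob <- predict(calib_model, type = "response")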
R Code Equivalent
# Simple weighted ensemble
weighted_ensemble <- function(predictions, weights) {
  # predictions: list of model predictions
  # weights: vector of weights (should sum to 1)
  ensemble <- Reduce(`+`, Map(`*`, predictions, weights))
  return(ensemble)
}

# Stacking with meta-learner
library(caret)
stack_models <- function(train_data, models) {
  # Get predictions from base models
  base_preds <- lapply(models, function(m) predict(m, train_data))
  meta_features <- do.call(cbind, base_preds)
  # Train meta-learner
  meta_model <- train(meta_features, train_data$y, method = "glm")
  return(meta_model)
}

# Example
weights <- c(0.4, 0.35, 0.25)
ensemble_pred <- weighted_ensemble(
  list(model1_pred, model2_pred, model3_pred),
  weights
)

Key Takeaways
- Ensembles usually beat single models
- Diversity is key: uncorrelated errors cancel
- Weight by performance or learn a meta-model
- Bagging reduces variance, boosting reduces bias
- Stacking is the most flexible but the most complex
- Blend different model types for the best results