Ensemble Methods
Combine multiple models for better predictions. Ensembles reduce variance, improve robustness, and often outperform any single model.
Why Ensembles Work
Reduce Variance
Averaging predictions smooths out noise from individual models (see the sketch after these cards).
Reduce Bias
Different models capture different patterns in the data.
Robustness
Less sensitive to outliers and edge cases.
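A quick R sketch of the variance-reduction effect (simulated data, illustrative numbers only): averaging three unbiased models with independent errors cuts the error variance to roughly one third of a single model's.
# Simulated illustration: averaging k independent, unbiased predictions
# reduces error variance by roughly 1/k (here k = 3)
set.seed(42)
truth <- rnorm(10000)                  # true values
pred1 <- truth + rnorm(10000)          # three models with independent errors
pred2 <- truth + rnorm(10000)
pred3 <- truth + rnorm(10000)
avg   <- (pred1 + pred2 + pred3) / 3   # simple unweighted ensemble
var(pred1 - truth)                     # ~1.0  single-model error variance
var(avg - truth)                       # ~0.33 ensemble error variance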
[Interactive demo: sliders set each individual model's accuracy, the error correlation between models, and the ensemble weights (which should total 100%); an accuracy-comparison chart shows each model against the ensemble and the ensemble boost over the best individual model. Lower correlation = more diversity = a bigger ensemble boost.]
The Diversity Principle
High Correlation (Bad)
The models make the same errors, so the ensemble behaves like a single model: no diversity benefit.
Low Correlation (Good)
The models make different errors that partly cancel out, giving the maximum diversity benefit (see the sketch below).
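A hedged simulation of the diversity principle: give each model's error a shared component so the correlation can be dialed up or down, and watch the ensemble's error variance grow or shrink (illustrative values, not taken from the demo above).
# Each model's error = shared component + its own noise; rho is the error
# correlation between any two models (per-model error variance stays at 1)
set.seed(42)
ensemble_error_var <- function(rho, n = 10000, k = 3) {
  shared <- rnorm(n)
  errs <- replicate(k, sqrt(rho) * shared + sqrt(1 - rho) * rnorm(n))
  var(rowMeans(errs))        # error variance of the averaged ensemble
}
ensemble_error_var(0.0)      # ~0.33: uncorrelated errors, full benefit
ensemble_error_var(0.9)      # ~0.93: highly correlated, almost no benefit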
Ensemble Methods
- Bagging: train on bootstrap samples and average the predictions (see the sketch after this list). Example: Random Forest
- Boosting: train models sequentially, each round focusing on the previous errors. Example: XGBoost, LightGBM
- Stacking: a meta-model learns how to combine the base models. Example: blending layers
- Voting: simple majority or weighted vote. Example: VotingClassifier
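A minimal sketch of the bagging idea, assuming a data frame df with a numeric outcome column y and the rpart package (both names are placeholders): fit one model per bootstrap resample, then average the predictions.
library(rpart)
# Fit one regression tree per bootstrap resample of the training data
bagged_fit <- function(df, n_models = 25) {
  lapply(seq_len(n_models), function(i) {
    boot <- df[sample(nrow(df), replace = TRUE), ]   # bootstrap sample
    rpart(y ~ ., data = boot)
  })
}
# Average the per-tree predictions to get the bagged prediction
bagged_predict <- function(models, newdata) {
  preds <- sapply(models, predict, newdata = newdata)
  rowMeans(preds)
}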
Sports Pricing Applications
Player Projection Ensemble
- Model 1: Season-average regression
- Model 2: Recent-form weighting
- Model 3: Matchup-based adjustment
- Ensemble: Weighted combination
Win Probability Ensemble
- Model 1: Elo ratings
- Model 2: Vegas line implied probability
- Model 3: Advanced stats model
- Ensemble: Calibrated blend (sketched below)
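One way to read the "calibrated blend" step, as a hedged sketch: average the three win probabilities on the log-odds scale, then recalibrate the blend against observed outcomes with a logistic regression. Here games, elo_prob, vegas_prob, stats_prob, and won are hypothetical columns, not from any specific dataset.
# Blend win probabilities on the log-odds scale, then recalibrate with glm
logit <- function(p) log(p / (1 - p))
blend_logodds <- with(games,
  (logit(elo_prob) + logit(vegas_prob) + logit(stats_prob)) / 3)
# Logistic recalibration maps the blended log-odds to a calibrated probability
calib_model     <- glm(games$won ~ blend_logodds, family = binomial())
calibrated_prob <- predict(calib_model, type = "response")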
R Code Equivalent
# Simple weighted ensemble
weighted_ensemble <- function(predictions, weights) {
  # predictions: list of model predictions
  # weights: vector of weights (should sum to 1)
  ensemble <- Reduce(`+`, Map(`*`, predictions, weights))
  return(ensemble)
}

# Stacking with meta-learner
library(caret)
stack_models <- function(train_data, models) {
  # Get predictions from base models
  base_preds <- lapply(models, function(m) predict(m, train_data))
  meta_features <- do.call(cbind, base_preds)
  # Train meta-learner
  meta_model <- train(meta_features, train_data$y, method = "glm")
  return(meta_model)
}

# Example
weights <- c(0.4, 0.35, 0.25)
ensemble_pred <- weighted_ensemble(
  list(model1_pred, model2_pred, model3_pred),
  weights
)

Key Takeaways
- Ensembles usually beat single models
- Diversity is key: uncorrelated errors cancel
- Weight by performance or learn a meta-model
- Bagging reduces variance, boosting reduces bias
- Stacking is the most flexible but the most complex
- Blend different model types for the best results