Model Calibration
Ensure predicted probabilities match observed frequencies. A 70% prediction should win 70% of the time. Critical for pricing and risk assessment.
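As a quick illustration of that requirement, here is a minimal sketch on simulated data (variable names pred and won are hypothetical): take the bets priced near 70% and check their observed win rate.
# Sanity check on simulated data: do ~70% predictions win ~70% of the time?
set.seed(42)
pred <- runif(10000)                 # hypothetical model probabilities
won  <- rbinom(10000, 1, pred)       # outcomes drawn from those probabilities (calibrated by construction)
near_70 <- abs(pred - 0.70) < 0.02   # bets priced close to 70%
mean(won[near_70])                   # should come out near 0.70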
🎯 Why Calibration Matters
Accurate Pricing
Miscalibrated probabilities mean mispriced lines: you either give away edge or lose customers.
Risk Assessment
VaR and stress tests need calibrated probabilities for valid estimates.
Bettor Trust
Well-calibrated odds feel fair. Builds long-term customer relationships.
Overconfidence: an overconfident model pushes its predictions away from 50%, reporting probabilities more extreme than the frequencies actually observed.
📊 Metrics
Expected Calibration Error (ECE): the bin-weighted average gap between predicted and observed frequencies; values near zero indicate acceptable calibration.
Reliability Diagram
Perfect calibration: line follows diagonal. Above = underconfident, Below = overconfident.
Reading the Chart
Well Calibrated
Green (actual) line follows yellow (predicted) line closely. 70% predictions win ~70% of the time.
Overconfident
Actual outcomes fall below predictions at high probabilities: the model says 80%, but the event happens only about 60% of the time.
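A reliability diagram like the one described above can be sketched in base R by binning predictions and comparing each bin's observed win rate with its average predicted probability (a minimal sketch on simulated data; variable names are hypothetical):
# Reliability diagram sketch: observed frequency vs. predicted probability per bin
set.seed(1)
pred <- runif(5000)
won  <- rbinom(5000, 1, pred)                          # well calibrated by construction
bins <- cut(pred, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
avg_pred   <- tapply(pred, bins, mean)                 # average prediction per bin
avg_actual <- tapply(won, bins, mean)                  # observed win rate per bin
plot(avg_pred, avg_actual, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Predicted probability", ylab = "Observed frequency")
abline(0, 1, lty = 2)                                  # diagonal = perfect calibration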
🔧 Calibration Methods
Platt Scaling
Parametric: fit a sigmoid to the model outputs.
Isotonic Regression
Non-parametric: monotonic fit of outcomes on predictions (see the sketch after this list).
Temperature Scaling
Parametric: single-parameter softmax rescaling.
Histogram Binning
Non-parametric: bin-specific corrections.
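Platt and temperature scaling appear in the R code section below; isotonic regression can be sketched with base R's isoreg() (a minimal sketch on simulated data; variable names are hypothetical):
# Isotonic regression calibration using base R's isoreg()
set.seed(7)
pred <- runif(2000)
won  <- rbinom(2000, 1, plogis(0.5 * qlogis(pred)))   # true win rates shrunk toward 50% -> overconfident model
o    <- order(pred)                                   # sort so fitted values line up with x
fit  <- isoreg(pred[o], won[o])                       # monotonic fit of outcomes on predictions
calib_map  <- approxfun(fit$x, fit$yf, rule = 2, ties = mean)  # step fit -> calibration map
calibrated <- calib_map(pred)                         # calibrated probabilities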
💡 Practical Tips
When to Calibrate
- ✅ After training any classifier (SVM, NN, tree-based)
- ✅ Before using probabilities for pricing
- ✅ Regularly as the model drifts
Best Practices
- ✅ Use a holdout set for calibration, not the training data (see the sketch after this list)
- ✅ Platt scaling works well when the calibration set is small; isotonic regression needs more data
- ✅ Temperature scaling for neural networks
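A minimal sketch of the holdout workflow (simulated data; variable names are hypothetical): fit the calibration map on a held-out split, then apply it to unseen predictions.
# Fit the calibrator on a held-out split, apply it to the rest
set.seed(3)
pred <- runif(4000)
won  <- rbinom(4000, 1, plogis(0.5 * qlogis(pred)))   # simulate an overconfident model
cal_idx <- sample(seq_along(pred), 2000)              # calibration (holdout) split
cal_df  <- data.frame(logit = qlogis(pred[cal_idx]), won = won[cal_idx])
fit     <- glm(won ~ logit, family = binomial, data = cal_df)   # Platt-style logistic fit
test_df <- data.frame(logit = qlogis(pred[-cal_idx]))
calibrated <- predict(fit, newdata = test_df, type = "response")  # applied only to unseen predictions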
R Code Equivalent
# Calibration analysis
library(CalibrationCurves)  # optional: calibration-curve plotting helpers (not used by the functions below)
# Calculate Expected Calibration Error
calculate_ece <- function(predictions, actuals, n_bins = 10) {
  # Bin predictions into equal-width probability bins (include 0 in the first bin)
  bins <- cut(predictions, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  ece <- 0
  n_total <- length(predictions)
  for (b in levels(bins)) {
    in_bin <- which(bins == b)
    if (length(in_bin) > 0) {
      avg_pred <- mean(predictions[in_bin])    # mean predicted probability in the bin
      avg_actual <- mean(actuals[in_bin])      # observed frequency in the bin
      weight <- length(in_bin) / n_total       # bin weight = share of observations
      ece <- ece + weight * abs(avg_pred - avg_actual)
    }
  }
  return(ece)
}
# Platt scaling calibration (logistic fit on the logit of the predictions)
platt_calibrate <- function(predictions, actuals) {
  # Clip away from 0/1 so the logit transform stays finite
  p <- pmin(pmax(predictions, 1e-6), 1 - 1e-6)
  model <- glm(actuals ~ qlogis(p), family = binomial)
  calibrated <- predict(model, type = "response")
  return(calibrated)  # fitted on the data passed in; use a holdout set in practice
}
# Temperature scaling (for neural nets): divide logits by a learned temperature
temperature_scale <- function(logits, temperature) {
  scaled <- logits / temperature
  probs <- plogis(scaled)  # numerically stable sigmoid (binary case)
  return(probs)
}
✅ Key Takeaways
- A 70% prediction should win 70% of the time
- Use reliability diagrams to visualize calibration
- ECE measures overall calibration quality
- Platt scaling and isotonic regression are common fixes
- Always calibrate on held-out data
- Re-calibrate as the model drifts