Model Calibration

Calibration ensures that predicted probabilities match observed frequencies: a 70% prediction should win about 70% of the time. This is critical for pricing and risk assessment.

🎯 Why Calibration Matters

📊 Accurate Pricing

Miscalibrated probabilities mean mispriced lines: you either give away edge or lose customers.

⚖️ Risk Assessment

VaR and stress tests need calibrated probabilities to produce valid estimates.

🎰 Bettor Trust

Well-calibrated odds feel fair and build long-term customer relationships.

Model Parameters (interactive demo)

  • Calibration Error (%): 10 (slider range 0 to 30)
  • Overconfidence (%): 15 (slider range -20 to 40); overconfidence pushes predictions away from 50%

📊 Metrics (at these settings)

  • ECE (Expected Calibration Error): 7.1%
  • Brier Score: 0.181
  • Verdict: acceptable calibration
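
The Brier score reported in the panel is simply the mean squared error between predicted probabilities and 0/1 outcomes. A minimal R sketch (the helper name brier_score is ours; ECE is computed by calculate_ece() in the R Code Equivalent section below):

# Brier score: mean squared difference between predicted probability and outcome
brier_score <- function(predictions, actuals) {
  mean((predictions - actuals)^2)
}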

Reliability Diagram

Perfect calibration: the curve follows the diagonal. Points above the diagonal mean the model is underconfident (outcomes occur more often than predicted); points below mean it is overconfident.

Reading the Chart

Well Calibrated

Green (actual) line follows yellow (predicted) line closely. 70% predictions win ~70% of the time.

Overconfident

Actual outcomes fall below predictions at high probabilities: the model says 80%, but the event happens only about 60% of the time.
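
To reproduce a reliability diagram outside the interactive chart, the sketch below bins predictions into deciles and plots the observed win rate against the mean prediction per bin. This is a base-R illustration under assumed variable names, not the code behind the chart above.

# Reliability diagram: observed frequency vs. mean predicted probability per bin
plot_reliability <- function(predictions, actuals, n_bins = 10) {
  bins <- cut(predictions, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  avg_pred   <- tapply(predictions, bins, mean)   # x: mean prediction per bin
  avg_actual <- tapply(actuals, bins, mean)       # y: observed win rate per bin
  plot(avg_pred, avg_actual, type = "b", xlim = c(0, 1), ylim = c(0, 1),
       xlab = "Predicted probability", ylab = "Observed frequency")
  abline(0, 1, lty = 2)  # diagonal = perfect calibration
}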

🔧 Calibration Methods

  • Platt Scaling (parametric): fit a sigmoid to the model's output scores
  • Isotonic Regression (non-parametric): monotonic step-function fit (see the sketch below)
  • Temperature Scaling (parametric): divide the logits by a single learned temperature
  • Histogram Binning (non-parametric): per-bin corrections from observed frequencies
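
Base R ships isoreg(), so isotonic calibration needs no extra packages. A minimal sketch (the wrapper name iso_calibrate is illustrative, not from a package):

# Isotonic regression calibration: learn a monotone step-function mapping
iso_calibrate <- function(predictions, actuals) {
  fit <- isoreg(predictions, actuals)  # least-squares monotone fit
  step_fn <- as.stepfun(fit)           # piecewise-constant calibration map
  step_fn(predictions)                 # calibrated probabilities
}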

💡 Practical Tips

When to Calibrate

  • After training any classifier (SVM, NN, tree-based)
  • Before using probabilities for pricing
  • Regularly, as the model drifts

Best Practices

  • Use a holdout set for calibration, not the training data (see the workflow sketch after this list)
  • Platt scaling is a good default when calibration data is limited; isotonic regression needs larger samples
  • Temperature scaling works well for neural networks
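
A sketch of the holdout workflow from the first bullet, with placeholder names (my_data and its outcome column are assumptions, not objects from this tutorial):

# Split once: train the base model, fit the calibration map on the holdout
set.seed(42)
idx   <- sample(seq_len(nrow(my_data)), size = floor(0.7 * nrow(my_data)))
train <- my_data[idx, ]
calib <- my_data[-idx, ]

base_model <- glm(outcome ~ ., data = train, family = binomial)
raw_probs  <- predict(base_model, newdata = calib, type = "response")

# Calibration map fitted on holdout predictions only (Platt-style logistic fit)
cal_data <- data.frame(outcome = calib$outcome, raw_probs = raw_probs)
cal_map  <- glm(outcome ~ raw_probs, data = cal_data, family = binomial)

# Apply the same map to any future predictions
calibrate_new <- function(raw) {
  predict(cal_map, newdata = data.frame(raw_probs = raw), type = "response")
}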

R Code Equivalent

# Calibration analysis
library(CalibrationCurves)  # calibration-curve plotting helpers; not required by the functions below

# Expected Calibration Error: bin-weighted average gap between the mean
# predicted probability and the observed event rate in each bin
calculate_ece <- function(predictions, actuals, n_bins = 10) {
  bins <- cut(predictions, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)  # keep predictions of exactly 0 in the first bin
  
  ece <- 0
  n_total <- length(predictions)
  
  for (b in levels(bins)) { 
    in_bin <- which(bins == b)
    if (length(in_bin) > 0) { 
      avg_pred <- mean(predictions[in_bin])
      avg_actual <- mean(actuals[in_bin])
      weight <- length(in_bin) / n_total
      ece <- ece + weight * abs(avg_pred - avg_actual)
    }
  }
  
  return(ece)
}

# Platt scaling calibration: fit a logistic (sigmoid) map to the outputs.
# Classic Platt scaling uses raw scores/logits and a held-out set; here it is
# applied directly to predicted probabilities for brevity.
platt_calibrate <- function(predictions, actuals) {
  model <- glm(actuals ~ predictions, family = binomial)
  calibrated <- predict(model, type = "response")  # fitted values on the same data
  return(calibrated)
}

# Temperature scaling (for neural nets): divide logits by a learned temperature
temperature_scale <- function(logits, temperature) {
  scaled <- logits / temperature
  probs <- plogis(scaled)  # sigmoid: the binary case of a temperature-scaled softmax
  return(probs)
}
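
A quick usage example for the functions above, using simulated and deliberately overconfident predictions (assumed data, not from the tutorial):

# Simulate an overconfident model: true logits inflated by a factor of 1.5
set.seed(1)
true_prob <- runif(1000)
actuals   <- rbinom(1000, 1, true_prob)
logits    <- qlogis(true_prob) * 1.5
preds     <- plogis(logits)

calculate_ece(preds, actuals)                            # ECE before calibration
calculate_ece(temperature_scale(logits, 1.5), actuals)   # ECE after temperature scaling
mean((preds - actuals)^2)                                # Brier score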

✅ Key Takeaways

  • A 70% prediction should win 70% of the time
  • Use reliability diagrams to visualize calibration
  • ECE measures overall calibration quality
  • Platt scaling and isotonic regression are common fixes
  • Always calibrate on held-out data
  • Re-calibrate as the model drifts
