Metrics & Evaluation Interactive
Log Loss (Cross-Entropy)
Log loss measures how well probability predictions match actual outcomes, heavily penalizing confident wrong predictions. It is the standard loss function for classification.
The Log Loss Formula
LogLoss = -1/N × Σ[y·log(p) + (1-y)·log(1-p)]
- y = Actual outcome (0 or 1)
- p = Predicted probability
- N = Number of predictions
Key Properties
- 0 = Perfect predictions
- 0.693 = Random (50/50 guessing)
- ∞ = 100% confident and wrong
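These reference points are easy to verify; a minimal standalone sketch in R (the helper name `ll` is just for illustration):

```r
# Per-observation log loss: -[y*log(p) + (1-y)*log(1-p)]
ll <- function(p, y) -(y * log(p) + (1 - y) * log(1 - p))

ll(0.999, 1)  # near-perfect prediction: loss close to 0
ll(0.5, 1)    # 50/50 guess: log(2) = 0.693
ll(0.001, 1)  # confident and wrong: about 6.9 here, unbounded as p -> 0
```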
Single Prediction
Example: predicted probability p = 0.70, actual outcome = 1.
Result: -log(0.70) ≈ 0.3567. Moderate loss; the prediction was uncertain.
Log Loss Penalty Curves
When actual = 1, low-probability predictions are heavily penalized (the loss curve rises steeply as p approaches 0); symmetrically, when actual = 0 the loss rises as p approaches 1.
Model Comparison
- Well-Calibrated Model: 0.507
- Overconfident Model: 1.636
- Random Baseline: 0.693

Insight: Overconfident models get punished hard by log loss when they're wrong, even if their accuracy is similar.
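The exact numbers above come from the page's interactive sample, but the effect is easy to reproduce; a sketch with hypothetical predictions (the probabilities below are illustrative, not the interactive's data):

```r
# Mean log loss over a vector of predictions
log_loss_mean <- function(p, y) mean(-(y * log(p) + (1 - y) * log(1 - p)))

actual        <- c(1, 1, 1, 0, 0)
calibrated    <- c(0.70, 0.65, 0.80, 0.30, 0.25)  # honest, moderate probabilities
overconfident <- c(0.95, 0.95, 0.99, 0.05, 0.90)  # pushed to extremes; last one is wrong

log_loss_mean(calibrated, actual)     # roughly 0.33
log_loss_mean(overconfident, actual)  # roughly 0.49 -- one confident miss dominates
```

Both models get four of five calls "right", yet the single confident miss (-log(0.10) ≈ 2.30) drags the overconfident model's average well above the calibrated one's.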
Log Loss vs Accuracy
Accuracy
Only cares about correct/incorrect. Ignores confidence.
- 51% prediction, actual=1 → ✓ Correct
- 99% prediction, actual=1 → ✓ Correct (same credit)
- Doesn't reward calibrated probabilities
Log Loss
Measures confidence AND correctness. Penalizes overconfidence.
- 51% prediction, actual=1 → Loss: 0.67
- 99% prediction, actual=1 → Loss: 0.01 (much better)
- Rewards well-calibrated probabilities
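A quick check of the two bullet values (both predictions count as "correct" for accuracy, but log loss tells them apart):

```r
# Actual outcome is 1 in both cases, so the loss is just -log(p)
loss_51 <- -log(0.51)  # hesitant prediction
loss_99 <- -log(0.99)  # confident prediction
round(loss_51, 2)  # 0.67
round(loss_99, 2)  # 0.01
```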
Sports Betting Applications
When to Use Log Loss
- ✓ Evaluating probability predictions (not just picks)
- ✓ Training classification models
- ✓ Comparing model calibration
Practical Tips
- ✓ Lower is better (unlike accuracy)
- ✓ 0.693 = random baseline (beat this!)
- ✓ Clip probabilities to [0.01, 0.99] for stability
R Code Equivalent
# Calculate Log Loss
log_loss <- function(predicted, actual, eps = 1e-15) {
# Clip predictions to avoid log(0)
predicted <- pmax(eps, pmin(1 - eps, predicted))
# Calculate loss
loss <- -(actual * log(predicted) + (1 - actual) * log(1 - predicted))
return(mean(loss))
}
# Example
pred <- c(0.7)
actual <- c(1)
ll <- log_loss(pred, actual)
cat(sprintf("Log Loss: %.4f\n", ll))
# Compare to baseline
baseline_loss <- log(2) # Random 50/50
cat(sprintf("Baseline (random): %.4f\n", baseline_loss))
cat(sprintf("Improvement: %.1f%%\n", (1 - ll/baseline_loss) * 100))
Key Takeaways
- Log loss: lower is better (0 = perfect)
- Heavily penalizes confident wrong predictions
- 0.693 = random baseline (50/50 guessing)
- Better than accuracy for probability evaluation
- Use for training classification models
- Clip predictions to avoid infinity