Metrics & Evaluation Interactive
Log Loss (Cross-Entropy)
Log loss measures how well probability predictions match actual outcomes, heavily penalizing confident wrong predictions. It is the standard loss function for classification.
The Log Loss Formula
LogLoss = -1/N × Σ[y·log(p) + (1-y)·log(1-p)]
- y = Actual outcome (0 or 1)
- p = Predicted probability
- N = Number of predictions
Key Properties
- 0 = Perfect predictions
- 0.693 = Random (50/50 guessing)
- ∞ = 100% confident and wrong
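These reference points are easy to verify; a minimal standalone sketch in R (the helper name `ll` is just for illustration):

```r
# Per-observation log loss: -[y*log(p) + (1-y)*log(1-p)]
ll <- function(p, y) -(y * log(p) + (1 - y) * log(1 - p))

ll(0.999, 1)  # near-perfect prediction: loss close to 0
ll(0.5, 1)    # 50/50 guess: log(2) = 0.693
ll(0.001, 1)  # confident and wrong: about 6.9 here, unbounded as p -> 0
```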
Single Prediction
Example: predicted probability p = 0.70, actual outcome = 1.
Result: -log(0.70) ≈ 0.3567. Moderate loss; the prediction was uncertain.
Log Loss Penalty Curves
When actual = 1, low-probability predictions are heavily penalized (the loss curve rises steeply as p approaches 0); symmetrically, when actual = 0 the loss rises as p approaches 1.
Model Comparison
- Well-Calibrated Model: 0.507
- Overconfident Model: 1.636
- Random Baseline: 0.693

Insight: Overconfident models get punished hard by log loss when they're wrong, even if their accuracy is similar.
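The exact numbers above come from the page's interactive sample, but the effect is easy to reproduce; a sketch with hypothetical predictions (the probabilities below are illustrative, not the interactive's data):

```r
# Mean log loss over a vector of predictions
log_loss_mean <- function(p, y) mean(-(y * log(p) + (1 - y) * log(1 - p)))

actual        <- c(1, 1, 1, 0, 0)
calibrated    <- c(0.70, 0.65, 0.80, 0.30, 0.25)  # honest, moderate probabilities
overconfident <- c(0.95, 0.95, 0.99, 0.05, 0.90)  # pushed to extremes; last one is wrong

log_loss_mean(calibrated, actual)     # roughly 0.33
log_loss_mean(overconfident, actual)  # roughly 0.49 -- one confident miss dominates
```

Both models get four of five calls "right", yet the single confident miss (-log(0.10) ≈ 2.30) drags the overconfident model's average well above the calibrated one's.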
Log Loss vs Accuracy
Accuracy
Only cares about correct/incorrect. Ignores confidence.
- 51% prediction, actual=1 → ✓ Correct
- 99% prediction, actual=1 → ✓ Correct (same credit)
- Doesn't reward calibrated probabilities
Log Loss
Measures confidence AND correctness. Penalizes overconfidence.
- 51% prediction, actual=1 → Loss: 0.67
- 99% prediction, actual=1 → Loss: 0.01 (much better)
- Rewards well-calibrated probabilities
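A quick check of the two bullet values (both predictions count as "correct" for accuracy, but log loss tells them apart):

```r
# Actual outcome is 1 in both cases, so the loss is just -log(p)
loss_51 <- -log(0.51)  # hesitant prediction
loss_99 <- -log(0.99)  # confident prediction
round(loss_51, 2)  # 0.67
round(loss_99, 2)  # 0.01
```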
Sports Betting Applications
When to Use Log Loss
- ✓ Evaluating probability predictions (not just picks)
- ✓ Training classification models
- ✓ Comparing model calibration
Practical Tips
- ✓ Lower is better (unlike accuracy)
- ✓ 0.693 = random baseline (beat this!)
- ✓ Clip probabilities to [0.01, 0.99] for stability
R Code Equivalent
# Calculate Log Loss
log_loss <- function(predicted, actual, eps = 1e-15) {
# Clip predictions to avoid log(0)
predicted <- pmax(eps, pmin(1 - eps, predicted))
# Calculate loss
loss <- -(actual * log(predicted) + (1 - actual) * log(1 - predicted))
return(mean(loss))
}
# Example
pred <- c(0.7)
actual <- c(1)
ll <- log_loss(pred, actual)
cat(sprintf("Log Loss: %.4f\n", ll))
# Compare to baseline
baseline_loss <- log(2) # Random 50/50
cat(sprintf("Baseline (random): %.4f\n", baseline_loss))
cat(sprintf("Improvement: %.1f%%\n", (1 - ll/baseline_loss) * 100))
Key Takeaways
- Log loss: lower is better (0 = perfect)
- Heavily penalizes confident wrong predictions
- 0.693 = random baseline (50/50 guessing)
- Better than accuracy for probability evaluation
- Use for training classification models
- Clip predictions to avoid infinity