Econometrics Interactive

Hierarchical Models

Model data that has a natural grouping structure. Borrow strength across groups while still allowing each group to differ. Especially useful for small-sample sports data.

📊 The Small Sample Problem

The Problem

A player has played 3 games this season and is averaging 28 PPG. Do we really believe they are a 28 PPG player?

Small samples have high variance, so extreme values are likely to be driven largely by noise.

The Solution: Shrinkage

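"Shrink" the estimate toward a population mean. The fewer observations a group has, the more its estimate is pulled toward that mean. The more observations it has, the closer the estimate stays to the group's own average.

Hierarchical models do this automatically, using the data to decide how much to shrink.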
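The sketch below shows the shrinkage rule applied to the 28 PPG player as their game count grows. It assumes a normal model in which the population mean and both variance components are known. The helper `shrink()` is hypothetical. The variances are also hypothetical values chosen for illustration, not estimates from real data: 16 between players and 49 game to game.

```r
# Hypothetical helper: partial-pooling estimate for one group, assuming the
# population mean and both variance components are known.
#   raw    : the group's sample mean
#   n      : number of observations behind that mean
#   mu     : population (global) mean
#   tau2   : between-group variance
#   sigma2 : within-group (per-observation) variance
shrink <- function(raw, n, mu, tau2, sigma2) {
  se2 <- sigma2 / n            # sampling variance of the raw mean
  B   <- se2 / (tau2 + se2)    # pooling factor: weight on the global mean
  B * mu + (1 - B) * raw
}

# Hypothetical player averaging 28 PPG, league mean 22 (assumed variances)
games <- c(3, 10, 30, 82)
est <- shrink(raw = 28, n = games, mu = 22, tau2 = 16, sigma2 = 49)
round(est, 1)
# 25.0 26.6 27.4 27.8
```

After 3 games the estimate sits roughly halfway between 28 and 22. After a full 82-game season it is almost back to the raw average.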

Model Parameters

  • Global mean: 22 PPG (range 15 to 30)
  • Between-group variance: 3 (range 1 to 8)
  • Within-group variance: 2 (range 1 to 6)
  • Games per team: ~10 (range 3 to 20)

📊 Pooling Analysis

Pooling factor: 6%

Low pooling: estimates stay close to the raw averages. This happens when each group has many observations, or when the between-group variance is large relative to the within-group variance.
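The 6% figure follows from the same pooling formula used in the R code at the end of this section. Here τ² denotes the between-group variance (3), σ² the within-group variance (2), n the games per team (about 10), ȳ_j team j's raw mean and μ the global mean. The symbol names are notation introduced for this calculation:

```latex
B = \frac{\sigma^2 / n}{\tau^2 + \sigma^2 / n}
  = \frac{2/10}{3 + 2/10}
  = \frac{0.2}{3.2}
  = 0.0625 \approx 6\%,
\qquad
\hat{\theta}_j = B\,\mu + (1 - B)\,\bar{y}_j
```

Each team's shrunk estimate therefore keeps about 94% of its own raw mean and takes about 6% from the global mean.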

Team Estimates (raw mean → shrunk estimate; global mean = 22 PPG)

Team        n    Raw    Shrunk
Lakers     13   19.2    19.3
Celtics    13   25.4    25.2
Warriors    6   21.4    21.4
Heat       14   22.7    22.7
Bucks       7   21.5    21.5
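As a check, the team table can be recomputed from the raw means, sample sizes and parameter settings listed above. The sketch uses the same pooling formula as the R code later in this section. It treats the variance components as known and takes the raw means as displayed on the page, which are rounded to one decimal:

```r
# Raw means and sample sizes as displayed in the team table (rounded values)
teams <- data.frame(
  team = c("Lakers", "Celtics", "Warriors", "Heat", "Bucks"),
  n    = c(13, 13, 6, 14, 7),
  raw  = c(19.2, 25.4, 21.4, 22.7, 21.5)
)
global_mean <- 22  # global mean (PPG)
tau2        <- 3   # between-group variance
sigma2      <- 2   # within-group variance

# Per-team pooling factor: the weight placed on the global mean
teams$pooling <- (sigma2 / teams$n) / (tau2 + sigma2 / teams$n)
teams$shrunk  <- teams$pooling * global_mean + (1 - teams$pooling) * teams$raw

print(transform(teams, pooling = round(pooling, 3), shrunk = round(shrunk, 2)))
```

Most recomputed values match the displayed shrunk estimates after rounding. The exception is Warriors, which comes out at 21.46 instead of 21.4. The likely cause is that the page rounds the raw mean it displays. Warriors also has the largest pooling factor (0.1), because it has the fewest games.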

Pooling Strategies

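Complete Pooling

Ignore groups; use one estimate for everyone

✓ Low variance

✗ High bias: real differences between groups are ignored

No Pooling

Separate estimate per group, using only that group's data

✓ No bias

✗ High variance when a group has few observations

Partial Pooling

Shrink each group's estimate toward the global mean

✓ Balances bias and variance

✗ Assumes the groups are drawn from a common distribution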
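A small Monte Carlo sketch makes the trade-off concrete. It assumes the page's parameter settings (global mean 22, between-group variance 3, within-group variance 2) are the true data-generating process, with group sizes between 3 and 20. For simplicity, partial pooling here uses the known variances rather than estimated ones. Accuracy is measured as mean squared error against each group's true mean:

```r
# Simulated comparison of complete, no, and partial pooling
set.seed(42)
n_groups <- 20000
mu <- 22; tau2 <- 3; sigma2 <- 2   # assumed true settings (from the page)

true_means <- rnorm(n_groups, mean = mu, sd = sqrt(tau2))   # each group's true mean
n <- sample(3:20, n_groups, replace = TRUE)                 # observations per group
# A group's raw mean = true mean + sampling noise with variance sigma2 / n
raw_means <- rnorm(n_groups, mean = true_means, sd = sqrt(sigma2 / n))

grand_mean <- mean(raw_means)
complete <- rep(grand_mean, n_groups)                # one estimate for everyone
no_pool  <- raw_means                                # each group's own mean
B        <- (sigma2 / n) / (tau2 + sigma2 / n)       # per-group pooling factor
partial  <- B * grand_mean + (1 - B) * raw_means     # shrink toward the grand mean

mse <- function(est) mean((est - true_means)^2)
round(c(complete = mse(complete), no_pool = mse(no_pool), partial = mse(partial)), 3)
```

Complete pooling has the largest error, because it ignores the real spread between groups. Partial pooling beats no pooling, and the gain is concentrated in the groups with few observations.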

๐Ÿˆ Betting Applications

Player Props

Shrink early-season stats toward career/positional average

Team Ratings

Estimate team strength accounting for roster turnover

Situational Splits

Home/away, day/night with limited data

New Customers

Estimate LTV with few transactions
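For situational splits, the same rule can shrink a player's observed home/away difference toward a league-typical split. In the sketch below, every number is a hypothetical assumption chosen for illustration: the player's averages, the game counts, the game-to-game variance, the league home edge and its between-player variance. It also assumes home and away games are independent with equal variance:

```r
# Hypothetical inputs (all assumed for illustration)
home_ppg <- 26; n_home <- 4      # player's home average and games
away_ppg <- 20; n_away <- 5      # player's away average and games
sigma2   <- 49                   # assumed game-to-game variance of points
prior_split <- 1.0               # assumed league-typical home edge (PPG)
tau2_split  <- 1.0               # assumed between-player variance of that edge

raw_split <- home_ppg - away_ppg                  # observed split: 6 PPG
# Sampling variance of a difference of two independent means
se2 <- sigma2 / n_home + sigma2 / n_away
B   <- se2 / (tau2_split + se2)                   # pooling factor
shrunk_split <- B * prior_split + (1 - B) * raw_split
round(c(raw = raw_split, pooling = B, shrunk = shrunk_split), 2)
# raw = 6, pooling ~ 0.96, shrunk ~ 1.22
```

Under these assumptions, a 6-point split over nine games barely moves the estimate away from the league edge. The difference of two small-sample means is very noisy, so the pooling factor is close to 1. With real data, `prior_split` and `tau2_split` would themselves be estimated across players.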

R Code Equivalent

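# Hierarchical (random-intercepts) model with lme4
# Assumes df has one row per game, with columns `points` and `team`
library(lme4)

# Random intercept for each team around a global mean
model <- lmer(points ~ 1 + (1 | team), data = df)

# Extract estimates
fixed  <- fixef(model)[["(Intercept)"]]   # global mean
random <- ranef(model)$team               # team deviations (data frame)
shrunk_estimates <- setNames(fixed + random[["(Intercept)"]], rownames(random))
# Same result in one step: coef(model)$team

# Compare to raw (unpooled) means
raw_means <- aggregate(points ~ team, data = df, FUN = mean)

# Pooling factor: share of each team's estimate taken from the global mean
vc <- as.data.frame(VarCorr(model))
between_var <- vc$vcov[vc$grp == "team"]      # between-team variance
within_var  <- vc$vcov[vc$grp == "Residual"]  # within-team (residual) variance
n_per_group <- table(df$team)                 # games per team
pooling <- (within_var / n_per_group) / (between_var + within_var / n_per_group)
cat(sprintf("%s pooling factor: %.0f%%\n", names(pooling), as.numeric(pooling) * 100), sep = "")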

✅ Key Takeaways

  • Hierarchical models handle grouped data
  • Shrinkage reduces noise in small samples
  • Groups with fewer observations are shrunk more
  • Partial pooling balances the bias of complete pooling against the variance of no pooling
  • Essential for early-season projections
  • Use lme4 (R) or PyMC (Python)

Pricing Models & Frameworks Tutorial
