Panel Data Analysis
Analyze data with multiple entities observed over time. Control for unobserved heterogeneity using fixed and random effects.
Panel Data Structure
Panel data = same entities tracked over multiple time periods. Combines cross-sectional and time-series dimensions.
Advantages
- Control for unobserved individual effects
- More observations = more power
- Study dynamics over time
- Reduce omitted variable bias
| Player | S1 | S2 | S3 |
|---|---|---|---|
| Player 1 | 9.6 | 6.9 | 11.0 |
| Player 2 | 31.5 | 28.9 | 34.3 |
| Player 3 | 28.7 | 32.2 | 30.8 |
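Panel estimators usually expect "long" data, with one row per entity-period, rather than the wide layout of the table above. The following is a minimal base R sketch of that conversion. It assumes that S1 to S3 are consecutive seasons and that the values are points; the column names `player_id`, `season` and `pts` are illustrative choices, not taken from the table.

```r
# Wide table from the text: one row per player, one column per season
wide <- data.frame(
  player_id = c("Player 1", "Player 2", "Player 3"),
  S1 = c(9.6, 31.5, 28.7),
  S2 = c(6.9, 28.9, 32.2),
  S3 = c(11.0, 34.3, 30.8)
)

# Long format: one row per player-season
long <- reshape(
  wide,
  direction = "long",
  varying   = c("S1", "S2", "S3"),
  v.names   = "pts",       # assumed name for the measured outcome
  timevar   = "season",    # assumed name for the time index
  times     = 1:3,
  idvar     = "player_id"
)
long <- long[order(long$player_id, long$season), ]
rownames(long) <- NULL

# A balanced panel has every player observed in every season
n_players   <- length(unique(long$player_id))
n_seasons   <- length(unique(long$season))
is_balanced <- nrow(long) == n_players * n_seasons

print(long)
```

A data frame in this shape can be passed to `pdata.frame()` with `index = c("player_id", "season")`.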
Player Trajectories
Each line = one player. Parallel upward trend = time effect. Vertical spread = player fixed effects.
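One rough way to see these two components in numbers is to split each value of the small example table into a grand mean, a player effect and a season effect. The sketch below uses base R only and assumes that the table columns S1 to S3 are seasons; the variable names are illustrative.

```r
# Values from the example table: rows are players, columns are seasons
pts <- matrix(
  c( 9.6,  6.9, 11.0,
    31.5, 28.9, 34.3,
    28.7, 32.2, 30.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(player = c("Player 1", "Player 2", "Player 3"),
                  season = c("S1", "S2", "S3"))
)

grand_mean    <- mean(pts)
player_effect <- rowMeans(pts) - grand_mean   # vertical spread between players
season_effect <- colMeans(pts) - grand_mean   # shift shared by all players in a season

# What is left after removing both sets of effects
residual <- pts - grand_mean - outer(player_effect, season_effect, "+")

print(round(player_effect, 2))
print(round(season_effect, 2))
print(round(residual, 2))
```

With only three players and three seasons the season effects are noisy, so this is an illustration of the decomposition rather than evidence of a trend.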
Fixed Effects Intuition
Within Variation
Compare each player to their own average. Removes time-invariant differences.
"How does this player's performance change when X changes?"
Between Variation
Compare averages across players. Confounded by unobserved differences.
"High scorers may simply be better players overall, not better because of X"
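The contrast between within and between variation can be sketched with simulated data. Everything below is an assumption made for illustration: the data are generated, not real; `talent` stands in for an unobserved, time-invariant player effect; and the true effect of `usage` on `pts` is set to 0.5. Only base R is used.

```r
set.seed(1)

# Simulated panel: 30 players observed over 5 seasons (illustrative only)
n_players <- 30
n_seasons <- 5
player_id <- rep(seq_len(n_players), each = n_seasons)

talent <- rep(rnorm(n_players, sd = 5), each = n_seasons)   # unobserved, constant per player
usage  <- 20 + 0.8 * talent + rnorm(n_players * n_seasons)  # correlated with talent
pts    <- 10 + 0.5 * usage + talent + rnorm(n_players * n_seasons)

# Within variation: compare each player to their own average
usage_w <- usage - ave(usage, player_id)
pts_w   <- pts   - ave(pts, player_id)
beta_within <- coef(lm(pts_w ~ usage_w - 1))[["usage_w"]]

# Between variation: compare player averages with each other
usage_b <- tapply(usage, player_id, mean)
pts_b   <- tapply(pts, player_id, mean)
beta_between <- coef(lm(pts_b ~ usage_b))[["usage_b"]]

# Pooled OLS ignores the panel structure altogether
beta_pooled <- coef(lm(pts ~ usage))[["usage"]]

# Player dummies give the same slope as demeaning by player
beta_lsdv <- coef(lm(pts ~ usage + factor(player_id)))[["usage"]]

print(c(within = beta_within, lsdv = beta_lsdv,
        between = beta_between, pooled = beta_pooled))
```

In this setup the within estimate should land near the true 0.5, while the between and pooled estimates absorb the effect of `talent` and come out larger.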
Model Comparison
Pooled OLS
Ignores panel structure
⚠️ Omitted variable bias
✓ Simple baseline
Fixed Effects
Controls for time-invariant individual effects
⚠️ Cannot estimate the effects of time-invariant variables
✓ Most common for causal inference
Random Effects
Assumes individual effects uncorrelated with X
⚠️ Biased if assumption violated
✓ More efficient if valid
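The three models can be written side by side. The notation here is an assumption added for illustration: $i$ indexes players, $t$ indexes periods, $x_{it}$ is the vector of regressors, and $\alpha_i$ is the unobserved individual effect. The formula for $\theta$ holds for a balanced panel with $T$ periods per player.

$$
\begin{aligned}
\text{Pooled OLS:} \quad & y_{it} = \alpha + x_{it}'\beta + u_{it} \\
\text{Fixed effects:} \quad & y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \\
\text{Within transformation:} \quad & y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
\text{Random effects:} \quad & y_{it} = \alpha + x_{it}'\beta + (\alpha_i + \varepsilon_{it}), \qquad \operatorname{Cov}(\alpha_i, x_{it}) = 0 \\
\text{RE quasi-demeaning:} \quad & y_{it} - \theta\,\bar{y}_i, \qquad \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_\alpha^2}}
\end{aligned}
$$

The within transformation removes $\alpha_i$ entirely, which is also why any regressor that is constant over time drops out. Random effects only partially demeans the data, so it keeps some between variation and gains efficiency, at the cost of bias when $\alpha_i$ is correlated with $x_{it}$.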
R Code Equivalent
# Panel data analysis with plm
library(plm)
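# Convert to a panel data frame. `df` is assumed to be a long-format data frame
# with one row per player-season and columns player_id, season, pts, usage, matchup.
pdata <- pdata.frame(df, index = c("player_id", "season"))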
# Pooled OLS (ignores panel structure)
pooled <- plm(pts ~ usage + matchup, data = pdata, model = "pooling")
# Fixed Effects (within estimator)
fe_model <- plm(pts ~ usage + matchup, data = pdata, model = "within")
# Random Effects
re_model <- plm(pts ~ usage + matchup, data = pdata, model = "random")
# Hausman test: FE vs RE. A small p-value (e.g. p < 0.05) rejects the RE assumption, so use FE
phtest(fe_model, re_model)
# Two-way fixed effects (player + time)
twoway_fe <- plm(pts ~ usage + matchup, data = pdata,
                 model = "within", effect = "twoways")
Key Takeaways
- Panel = entities × time periods
- Fixed effects control for unobserved, time-invariant heterogeneity
- Fixed effects use "within" variation (each entity compared with itself)
- The Hausman test helps choose between FE and RE
- Time fixed effects can be added as well
- More power than a cross-section or a time series alone