Machine Learning Interactive
Feature Engineering
Transform raw data into predictive features. Good features are the difference between a mediocre model and a winning one.
The Core Principle
Raw Data → Feature Engineering → Predictive Features
Raw data (points scored, game logs, box scores) passes through feature engineering (rolling averages, pace adjustment, context) to become predictive features: signals that actually improve predictions.
Feature Selection (interactive demo): with 4 features selected, the model reaches 60.0% prediction accuracy.
Feature Importance

| Feature | Importance | Category |
|---|---|---|
| Season Average | 35% | Base |
| Rolling Avg (L5) | 18% | Trend |
| Matchup Stats | 15% | Context |
| Home/Away | 8% | Context |
| Rest Days | 6% | Context |
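Importances like the ones above are typically read off a fitted tree ensemble. Below is a minimal scikit-learn sketch on synthetic data; the feature names mirror this section and the coefficients are invented, so the learned importances will not match the percentages shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
# Make the target depend most on column 0, mimicking "season average"
# being the strongest signal (coefficients are illustrative).
y = 2.0 * X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
names = ["season_avg", "roll_5", "matchup", "home_away", "rest_days"]
importances = dict(zip(names, model.feature_importances_))
```

The `feature_importances_` vector sums to 1, so each entry reads as a share of the model's total split gain.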
Engineering Best Practices
✓ Do
- Create rolling averages at multiple windows
- Normalize features to the same scale
- Create interaction terms
- Handle missing values explicitly
✗ Don't
- Use future information (data leakage)
- Create too many features (overfitting)
- Ignore correlation between features
- Use raw counts without normalization
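The data-leakage warning deserves one concrete illustration: a rolling average that includes the current game silently uses the very value being predicted. A pandas sketch with an invented game log:

```python
import pandas as pd

# Hypothetical game log: points per game in chronological order.
games = pd.DataFrame({"pts": [22, 30, 18, 25, 35, 28]})

# LEAKY: the rolling window includes the current game, so the
# "feature" for game i already contains the target for game i.
games["roll3_leaky"] = games["pts"].rolling(3).mean()

# SAFE: shift(1) moves the series back one game, so the feature
# for game i only uses games 0..i-1.
games["roll3_safe"] = games["pts"].shift(1).rolling(3).mean()
```

The safe version pays a price of one extra NaN row at the start, which is exactly why missing values should be handled explicitly rather than silently dropped.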
Feature Categories
Base Statistics
Historical performance baselines
- Season average
- Career average
- Position average
Trend Features
Recent form and direction
- Rolling averages (L5, L10)
- Momentum
- Hot/cold streak
Context Features
Situational adjustments
- Home/away
- Rest days
- Back-to-back
- Opponent strength
Pace/Tempo
Opportunity-based scaling
- Team pace
- Opponent pace
- Projected possessions
Usage Features
Role and opportunity
- Projected minutes
- Usage rate
- Lineup impact
Derived/Interaction
Combined effects
- Pace × Minutes
- Home × Rest
- Matchup × Usage
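The derived/interaction features above reduce to simple column products: a fast game only helps a player who is actually on the floor. A pandas sketch with invented values:

```python
import pandas as pd

# Hypothetical per-game rows with the raw ingredients.
df = pd.DataFrame({
    "team_pace": [102.0, 96.5],   # possessions per 48 minutes
    "proj_minutes": [34.0, 28.0],
    "is_home": [1, 0],
    "rest_days": [2, 0],
})

# Interaction terms: multiply the columns whose effects compound.
df["pace_x_minutes"] = df["team_pace"] * df["proj_minutes"]
df["home_x_rest"] = df["is_home"] * df["rest_days"]
```

Because products blow up the scale, interaction terms are a prime candidate for the normalization step listed under best practices.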
Player Props Feature Set

| Feature | Formula | Why It Matters |
|---|---|---|
| Pace-Adjusted Avg | season_avg × (matchup_pace / league_pace) | Scales for fast/slow games |
| Minutes-Weighted | per_min_rate × proj_minutes | Accounts for opportunity |
| Matchup Factor | opp_def_rating / league_avg | Defense quality adjustment |
| Rest Impact | 1 + 0.02 × (rest_days - 1) | Back-to-back penalty |
| Trend Score | (L5_avg - season_avg) / season_std | Hot/cold streak signal |
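The table's formulas can be checked end to end with made-up inputs; none of these numbers come from real data.

```python
# Illustrative inputs only.
season_avg, season_std = 24.0, 6.0
matchup_pace, league_pace = 103.0, 100.0
per_min_rate, proj_minutes = 0.75, 34.0
opp_def_rating, league_avg_def = 110.0, 112.0
rest_days = 0            # back-to-back
l5_avg = 28.0            # average over the last 5 games

pace_adjusted = season_avg * (matchup_pace / league_pace)   # scaled up: fast game
minutes_weighted = per_min_rate * proj_minutes              # opportunity-based line
matchup_factor = opp_def_rating / league_avg_def            # < 1: tough defense
rest_impact = 1 + 0.02 * (rest_days - 1)                    # 0.98 penalty on a B2B
trend_score = (l5_avg - season_avg) / season_std            # positive: running hot
```

Note that rest_days = 0 yields a factor below 1, which is exactly the back-to-back penalty the table describes.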
R Code Equivalent

```r
# Feature engineering for player props
library(dplyr)
library(zoo)

# Assumes player_games already carries per-game columns such as
# season_avg, season_std, team_pace, league_avg_pace, minutes,
# proj_minutes, opp_pts_allowed and league_avg.
create_features <- function(player_games) {
  player_games %>%
    arrange(game_date) %>%
    mutate(
      # Rolling averages over the last 5 and 10 games
      roll_5  = rollmean(pts, k = 5,  fill = NA, align = "right"),
      roll_10 = rollmean(pts, k = 10, fill = NA, align = "right"),
      # Trend: distance of recent form from the season baseline
      trend = (roll_5 - season_avg) / season_std,
      # Pace adjustment
      pace_factor = team_pace / league_avg_pace,
      pace_adj_pts = season_avg * pace_factor,
      # Minutes projection
      per_min = pts / minutes,
      min_proj_pts = per_min * proj_minutes,
      # Matchup adjustment
      matchup_factor = opp_pts_allowed / league_avg,
      # Combined projection
      projection = pace_adj_pts * matchup_factor * (1 + trend * 0.1)
    )
}

# Feature importance
library(randomForest)
rf_model <- randomForest(pts ~ ., data = features)
importance(rf_model)
```

Key Takeaways
- Features matter more than model choice
- Rolling averages capture recent form
- Pace/tempo adjusts for opportunity
- Watch for data leakage (using future info)
- More features ≠ better model
- Domain knowledge guides feature creation