Machine Learning Interactive

Feature Engineering

Transform raw data into predictive features. Good features are the difference between a mediocre model and a winning one.

💡 The Core Principle

📊 Raw Data (points scored, game logs, box scores) → 🔧 Feature Engineering (rolling averages, pace-adjusted stats, context) → 🎯 Predictive Features (signals that improve predictions)

Feature Selection

📊 Model Performance: 60.0% prediction accuracy with 4 features selected.

Feature Importance

| Feature | Importance | Category |
|---|---|---|
| Season Average | 35% | Base |
| Rolling Avg (L5) | 18% | Trend |
| Matchup Stats | 15% | Context |
| Home/Away | 8% | Context |
| Rest Days | 6% | Context |

Engineering Best Practices

✓ Do

  • Create rolling averages at multiple windows
  • Normalize features to the same scale
  • Create interaction terms
  • Handle missing values explicitly

✗ Don't

  • Use future information (data leakage)
  • Create too many features (overfitting)
  • Ignore correlation between features
  • Use raw counts without normalization
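Two of the "Do" points above, normalizing to a common scale and checking correlation between features, can be sketched in a few lines of R. All numbers and column names here are made up for illustration:

```r
# Toy feature frame: two rolling averages and rest days
feats <- data.frame(
  roll_5  = c(22, 25, 19, 30, 27),
  roll_10 = c(21, 24, 20, 28, 26),
  rest    = c(1, 2, 0, 3, 1)
)

# Standardize every column to mean 0, sd 1
scaled <- as.data.frame(scale(feats))

# Flag highly correlated pairs -- candidates for dropping one of the two
cors <- cor(scaled)
high <- which(abs(cors) > 0.9 & upper.tri(cors), arr.ind = TRUE)
high  # here roll_5 and roll_10 are nearly redundant
```

Standardizing does not change correlations, but it keeps distance- and penalty-based models from being dominated by whichever feature happens to have the largest raw units.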

📚 Feature Categories

Base Statistics

Historical performance baselines

  • Season average
  • Career average
  • Position average

Trend Features

Recent form and direction

  • Rolling averages (L5, L10)
  • Momentum
  • Hot/cold streak

Context Features

Situational adjustments

  • Home/away
  • Rest days
  • Back-to-back
  • Opponent strength

Pace/Tempo

Opportunity-based scaling

  • Team pace
  • Opponent pace
  • Projected possessions

Usage Features

Role and opportunity

  • Projected minutes
  • Usage rate
  • Lineup impact

Derived/Interaction

Combined effects

  • Pace × Minutes
  • Home × Rest
  • Matchup × Usage
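Interaction features are built directly from simpler columns by multiplying them. A minimal sketch with invented data (all column names here are illustrative):

```r
library(dplyr)

games <- data.frame(
  pace_factor  = c(1.05, 0.97, 1.02),  # team pace / league pace
  proj_minutes = c(34, 28, 31),
  is_home      = c(1, 0, 1),
  rest_days    = c(2, 1, 3)
)

games <- games %>%
  mutate(
    pace_x_min  = pace_factor * proj_minutes,  # Pace × Minutes
    home_x_rest = is_home * rest_days          # Home × Rest
  )
```

The product captures a joint effect a linear model cannot see from the two columns separately, e.g. a high-minutes player in a fast game.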

๐Ÿ€ Player Props Feature Set

| Feature | Formula | Why It Matters |
|---|---|---|
| Pace-Adjusted Avg | season_avg × (matchup_pace / league_pace) | Scales for fast/slow games |
| Minutes-Weighted | per_min_rate × proj_minutes | Accounts for opportunity |
| Matchup Factor | opp_def_rating / league_avg | Defense quality adjustment |
| Rest Impact | 1 + 0.02 × (rest_days - 1) | Back-to-back penalty |
| Trend Score | (L5_avg - season_avg) / season_std | Hot/cold streak signal |
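Plugging illustrative numbers into three of these formulas makes the scale of each adjustment concrete (all inputs are invented for the example):

```r
season_avg   <- 24    # points per game baseline
matchup_pace <- 102
league_pace  <- 100
rest_days    <- 0     # back-to-back
L5_avg       <- 27    # last-5-games average
season_std   <- 6

pace_adj <- season_avg * (matchup_pace / league_pace)  # 24.48: slight bump
rest_imp <- 1 + 0.02 * (rest_days - 1)                 # 0.98: B2B penalty
trend    <- (L5_avg - season_avg) / season_std         # 0.5: running hot
```

Note that each adjustment is small on its own; the edge comes from stacking several weak but independent signals.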

R Code Equivalent

# Feature engineering for player props
# Assumes player_games has: game_date, pts, minutes, proj_minutes,
# season_avg, season_std, team_pace, league_avg_pace,
# opp_pts_allowed, league_avg
library(dplyr)
library(zoo)

create_features <- function(player_games) {
  player_games %>%
    arrange(game_date) %>%
    mutate(
      # Rolling averages, lagged one game so the current game's
      # points never leak into its own features
      roll_5  = lag(rollmean(pts, k = 5,  fill = NA, align = "right")),
      roll_10 = lag(rollmean(pts, k = 10, fill = NA, align = "right")),

      # Trend: how far recent form sits above/below the baseline
      trend = (roll_5 - season_avg) / season_std,

      # Pace adjustment
      pace_factor  = team_pace / league_avg_pace,
      pace_adj_pts = season_avg * pace_factor,

      # Minutes projection (in production, compute the per-minute
      # rate from past games only to avoid leakage)
      per_min      = pts / minutes,
      min_proj_pts = per_min * proj_minutes,

      # Matchup adjustment
      matchup_factor = opp_pts_allowed / league_avg,

      # Combined projection
      projection = pace_adj_pts * matchup_factor *
                   (1 + trend * 0.1)
    )
}

# Feature importance (drop the NA rows left by the rolling windows first)
library(randomForest)
rf_model <- randomForest(pts ~ ., data = na.omit(features))
importance(rf_model)
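One practical guard against the data-leakage pitfall listed under "Don't": validate on a time-ordered split rather than a random one, so every test game comes after every training game. A minimal sketch on a toy game log (the data here is invented):

```r
# Toy game log, already sorted by date
features <- data.frame(
  game_date = as.Date("2024-01-01") + 0:9,
  pts       = c(20, 25, 18, 30, 22, 27, 19, 31, 24, 26)
)

# Train on the first 80% of games, test on the remainder
n      <- nrow(features)
cutoff <- floor(0.8 * n)
train  <- features[seq_len(cutoff), ]
test   <- features[(cutoff + 1):n, ]

# Every test game is strictly later than every training game,
# so no future information reaches the model at fit time
stopifnot(min(test$game_date) > max(train$game_date))
```

A random split would scatter late-season games into the training set, quietly inflating accuracy the same way a leaky feature does.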

✅ Key Takeaways

  • Features matter more than model choice
  • Rolling averages capture recent form
  • Pace/tempo adjusts for opportunity
  • Watch for data leakage (using future info)
  • More features ≠ better model
  • Domain knowledge guides feature creation
