Infrastructure
Model Versioning
Track every model, dataset, and experiment. Enable reproducibility, rollback, and compliance in production ML.
๐ Version History
| Version | Date | Change | Accuracy | Actions |
|---|---|---|---|---|
| v1.0.0 | 2023-06-01 | Initial production model | 68.0% | Current |
| v1.1.0 | 2023-08-15 | Added weather features | 71.0% | |
| v2.0.0 | 2023-11-20 | Switched to XGBoost | 74.0% | |
| v2.1.0 | 2024-01-10 | Hyperparameter tuning | 76.0% | |
| v2.2.0 | 2024-01-25 | Feature engineering update | 75.0% |
๐ฆ What to Version
๐ป
Code
Tool: Git
Training scripts, configs
๐
Data
Tool: DVC
Training datasets, features
๐ง
Model
Tool: MLflow
Weights, artifacts
๐ฌ
Experiments
Tool: W&B
Hyperparams, metrics
๐ Lineage Tracking
Data
- raw_odds_2024.parquet
- player_stats_v3.csv
โ
Features
- feature_store/nfl_v2
- embeddings/team_v1
โ
Training
- config.yaml
- train.py@a3f2d1
โ
Model
- model_v2.1.0.pkl
- scaler.pkl
โ
Deploy
- Dockerfile
- k8s/deployment.yaml
Full lineage: know exactly what data, code, and config produced each model
โ Best Practices
Semantic Versioning
MAJOR.MINOR.PATCH for clarity
Immutable Artifacts
Never modify, always create new
Full Lineage
Track data โ code โ model โ predictions
Reproducibility
Same inputs โ same outputs
Rollback Ready
Previous versions instantly deployable
A/B Comparison
Compare versions in production
DVC + Git Example
# Initialize DVC in your repo
dvc init
# Track large data files
dvc add data/training_data.parquet
git add data/training_data.parquet.dvc .gitignore
# Track model files
dvc add models/player_projection_v2.pkl
git add models/player_projection_v2.pkl.dvc
# Commit everything together
git add .
git commit -m "v2.1.0: Add weather features"
git tag v2.1.0
# Push data to remote storage (S3, GCS)
dvc push
# Reproduce on another machine
git clone <repo>
git checkout v2.1.0
dvc pull # Downloads data/models from remote
# Compare versions
dvc diff v2.0.0 v2.1.0
dvc metrics diff v2.0.0 v2.1.0โ Key Takeaways
- โข Version code (Git), data (DVC), models (MLflow)
- โข Use semantic versioning (MAJOR.MINOR.PATCH)
- โข Never modifyโalways create new versions
- โข Track full lineage for reproducibility
- โข Enable instant rollback in production
- โข Required for audit and compliance