MLOps
Machine Learning Operations: deploy, monitor, and maintain ML models in production, bridging the gap between a notebook prototype and a reliable system.
ML Lifecycle
Develop
- Experiment tracking
- Feature store
- Notebooks
Train
- Hyperparameter tuning
- Cross-validation
- Model selection
Deploy
- Model registry
- CI/CD
- Containerization
Monitor
- Drift detection
- Performance tracking
- Alerting
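The cross-validation and model selection steps of the Train stage can be sketched without any ML library. This is a minimal sketch, assuming shuffled k-fold splits and mean absolute error (MAE) as the selection metric; `kfold_indices`, `cv_mae`, `select_model`, the two toy candidates (`fit_mean`, `fit_slope`) and the synthetic data are hypothetical illustrations, not part of any tool named on this page.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are not ordered
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fit_mean(xs, ys):
    """Toy baseline (hypothetical): always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_slope(xs, ys):
    """Toy model (hypothetical): least-squares line through the origin, y ~ b * x."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: b * x


def cv_mae(fit, xs, ys, k=5):
    """MAE on each validation fold, averaged over the k folds."""
    scores = []
    for train, val in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean(abs(model(xs[i]) - ys[i]) for i in val))
    return mean(scores)


def select_model(candidates, xs, ys, k=5):
    """Pick the candidate with the lowest cross-validated MAE."""
    scores = {name: cv_mae(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: y is roughly 2x with +/-0.5 noise.
xs = [float(x) for x in range(1, 21)]
ys = [2.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
best, scores = select_model({"mean": fit_mean, "slope": fit_slope}, xs, ys)
print(best, scores)
```

In a real pipeline the candidates would more likely be hyperparameter settings of a single model family, with each cross-validated score logged to the experiment tracker.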
Model Registry
Version control for models: track each version's lineage and promote it through stages (Development, Staging, Production).
| Version | Stage | Accuracy | Deployed |
|---|---|---|---|
| v3.2.1 | Production | 74.2% | 2024-01-15 |
| v3.3.0 | Staging | 75.1% | 2024-01-20 |
| v3.4.0 | Development | 74.8% | - |
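To make "track lineage, promote through stages" concrete, here is a minimal in-memory sketch. `ModelRegistry`, `ModelVersion`, the `Archived` stage and the run IDs are hypothetical and do not mirror the API of MLflow or any other real registry; the accuracies are taken from the table above. The one design rule it encodes is an assumption: only one version serves Production at a time, so promoting a new one archives the old one.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; "Archived" is an assumption not shown in the table.
STAGES = ("Development", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    version: str
    run_id: str  # lineage: the training run that produced this artifact
    metrics: dict
    stage: str = "Development"


@dataclass
class ModelRegistry:
    versions: dict = field(default_factory=dict)

    def register(self, version, run_id, metrics):
        """Add a new version in the Development stage."""
        if version in self.versions:
            raise ValueError(f"version {version} already registered")
        self.versions[version] = ModelVersion(version, run_id, dict(metrics))
        return self.versions[version]

    def promote(self, version, stage):
        """Move a version to a stage; a new Production version archives the old one."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        if stage == "Production":
            for mv in self.versions.values():
                if mv.stage == "Production":
                    mv.stage = "Archived"
        self.versions[version].stage = stage

    def in_stage(self, stage):
        """List the versions currently in the given stage."""
        return [mv.version for mv in self.versions.values() if mv.stage == stage]


registry = ModelRegistry()
registry.register("v3.2.1", run_id="run-a", metrics={"accuracy": 0.742})
registry.register("v3.3.0", run_id="run-b", metrics={"accuracy": 0.751})
registry.promote("v3.2.1", "Production")
registry.promote("v3.3.0", "Staging")

# Promoting the staging candidate archives the previous production version.
registry.promote("v3.3.0", "Production")
print(registry.in_stage("Production"), registry.in_stage("Archived"))
# → ['v3.3.0'] ['v3.2.1']
```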
Production Monitoring
Track model health, detect drift, alert on degradation.
| Metric | Value |
|---|---|
| Prediction Latency | 45 ms |
| Requests/sec | 1,250 |
| Model Drift | 0.12 |
| Data Quality | 99.2% |
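The page does not say how the drift score is computed. One common choice is the Population Stability Index (PSI), which compares the binned distribution of a feature or prediction in live traffic against a reference sample. The sketch below assumes PSI with equal-width bins built on the reference data, and uses the commonly cited rule of thumb (below 0.1 stable, 0.1 to 0.25 moderate, above 0.25 significant) as alert thresholds; `psi` and `drift_status` are hypothetical helpers, not any monitoring tool's API.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the reference sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_status(score, warn=0.1, alert=0.25):
    """Map a PSI score to a status using a rule-of-thumb (assumed) threshold."""
    if score >= alert:
        return "alert"
    if score >= warn:
        return "warn"
    return "ok"


reference = [i / 100 for i in range(1000)]  # synthetic values 0.00 .. 9.99
same = list(reference)
shifted = [x + 3.0 for x in reference]  # simulated distribution shift

print(drift_status(psi(reference, same)))     # ok (identical data gives PSI 0)
print(drift_status(psi(reference, shifted)))  # alert
```

If the dashboard's 0.12 were a PSI value, this heuristic would place it in the moderate "warn" band; the thresholds should be tuned per feature in practice.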
MLOps Tools
| Tool | Category | Purpose |
|---|---|---|
| MLflow | Experiment Tracking | Log params, metrics, artifacts |
| Weights & Biases | Experiment Tracking | Visualization, collaboration |
| Kubeflow | Orchestration | ML pipelines on K8s |
| Seldon | Serving | Model deployment at scale |
| Feast | Feature Store | Consistent feature engineering |
| Great Expectations | Data Quality | Data validation |
| Evidently | Monitoring | Model drift detection |
| BentoML | Serving | Model packaging and serving |
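Data validation tools such as Great Expectations express checks as declarative rules over a dataset. The sketch below is written in that spirit but is NOT Great Expectations' API: `validate`, the rule list and the column names (`player_id`, `projected_points`) are hypothetical. It also shows one plausible (assumed) way to compute a "Data Quality" percentage: the share of rows that pass every rule.

```python
def validate(rows, rules):
    """Return (fraction of rows passing every rule, failure count per rule)."""
    failures = {name: 0 for name, _ in rules}
    passed = 0
    for row in rows:
        ok = True
        for name, check in rules:
            if not check(row):
                failures[name] += 1
                ok = False
        passed += ok  # True counts as 1
    score = passed / len(rows) if rows else 1.0  # an empty batch passes vacuously
    return score, failures


# Hypothetical rules for a hypothetical projections table.
rules = [
    ("player_id not null", lambda r: r.get("player_id") is not None),
    ("projected_points >= 0",
     lambda r: r.get("projected_points") is not None and r["projected_points"] >= 0),
]

rows = [
    {"player_id": 1, "projected_points": 12.5},
    {"player_id": 2, "projected_points": -3.0},    # fails the range rule
    {"player_id": None, "projected_points": 8.0},  # fails the null rule
    {"player_id": 4, "projected_points": 20.1},
]
score, failures = validate(rows, rules)
print(score, failures)
# → 0.5 {'player_id not null': 1, 'projected_points >= 0': 1}
```

In a CI pipeline, a check like this would typically gate training: fail the build when the score drops below an agreed threshold.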
CI/CD for ML
Continuous Integration
- Unit tests for data pipelines
- Data validation checks
- Model training smoke tests
Continuous Training
- Scheduled retraining
- Trigger on data drift
- Auto hyperparameter tuning
Continuous Deployment
- Canary deployments
- A/B testing models
- Automatic rollback
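Canary deployment with automatic rollback comes down to a decision rule: send a small share of traffic to the new model, wait for enough requests, then compare it against the current model. This is a minimal sketch under assumed thresholds (at least 500 canary requests, error rate no more than 1 percentage point above baseline, p95 latency no more than 20% above baseline); `Stats` and `canary_decision` are hypothetical names, and a real system would read these numbers from its monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline, canary, min_requests=500,
                    max_error_increase=0.01, max_latency_ratio=1.2):
    """Return 'wait', 'rollback', or 'promote' for the canary model."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    if canary.error_rate > baseline.error_rate + max_error_increase:
        return "rollback"  # error rate regressed too far
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        return "rollback"  # latency regressed too far
    return "promote"


baseline = Stats(requests=10_000, errors=50, p95_latency_ms=45.0)
print(canary_decision(baseline, Stats(200, 1, 44.0)))     # wait
print(canary_decision(baseline, Stats(1_000, 40, 46.0)))  # rollback (4% vs 0.5% errors)
print(canary_decision(baseline, Stats(1_000, 6, 47.0)))   # promote
```

A/B testing uses the same traffic split, but typically compares business or accuracy metrics with a statistical test rather than hard guardrails.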
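Python / MLflow Example
```python
# MLflow experiment tracking
# train_model, X_train and y_train are placeholders assumed to be defined elsewhere.
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

# Start (or create) the experiment
mlflow.set_experiment("player_projection_model")

with mlflow.start_run() as run:
    # Log parameters
    mlflow.log_param("model_type", "xgboost")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)

    # Train model (placeholder for your own training code)
    model = train_model(X_train, y_train)

    # Log metrics (example values; compute them on a held-out set in practice)
    mlflow.log_metric("mae", 2.34)
    mlflow.log_metric("rmse", 3.12)
    mlflow.log_metric("r2", 0.78)

    # Log model with the sklearn flavor (works for sklearn-compatible estimators
    # such as xgboost.XGBRegressor; mlflow.xgboost is the native alternative)
    mlflow.sklearn.log_model(model, "model")

# Register model; run-relative URIs have the form runs:/<run_id>/<artifact_path>
result = mlflow.register_model(
    f"runs:/{run.info.run_id}/model",
    "player_projection",
)

# Promote the newly registered version to Production.
# Stages are deprecated in recent MLflow releases (2.9+) in favor of aliases, e.g.
# client.set_registered_model_alias("player_projection", "champion", result.version)
client = MlflowClient()
client.transition_model_version_stage(
    name="player_projection",
    version=result.version,
    stage="Production",
)
```
Key Takeaways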
- MLOps = DevOps for machine learning
- Track experiments, version models
- Automate training and deployment
- Monitor for drift and degradation
- Model registry for governance
- Start simple, add complexity as needed