Infrastructure

MLOps

Machine Learning Operations (MLOps): deploy, monitor, and maintain ML models in production. Bridge the gap between a notebook prototype and a reliable system.

🔄 ML Lifecycle

💻

Develop

• Experiment tracking
• Feature store
• Notebooks

🏋️

Train

• Hyperparameter tuning
• Cross-validation
• Model selection

🚀

Deploy

• Model registry
• CI/CD
• Containerization

📊

Monitor

• Drift detection
• Performance tracking
• Alerting

📦 Model Registry

Version control for models. Track lineage, promote through stages.

Version	Stage	Accuracy	Deployed
v3.2.1	Production	74.2%	2024-01-15
v3.3.0	Staging	75.1%	2024-01-20
v3.4.0	Development	74.8%	-
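
The promotion step can be sketched in a few lines of plain Python. This is an illustration only: ModelVersion, promote_staging, the min_gain threshold, and the "Archived" stage name are assumptions made for the example, not the API of any particular registry. It promotes the Staging model only when it beats Production by a minimum accuracy margin.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical in-memory stand-in for a model registry entry.
@dataclass
class ModelVersion:
    version: str
    stage: str        # "Development", "Staging", "Production", or "Archived"
    accuracy: float   # offline evaluation accuracy, as a fraction


def promote_staging(versions: List[ModelVersion],
                    min_gain: float = 0.005) -> Optional[ModelVersion]:
    """Promote the Staging model if it beats Production by at least min_gain.

    Returns the newly promoted version, or None if nothing changed.
    """
    production = next((v for v in versions if v.stage == "Production"), None)
    staging = next((v for v in versions if v.stage == "Staging"), None)
    if staging is None:
        return None
    if production is not None:
        if staging.accuracy - production.accuracy < min_gain:
            return None
        production.stage = "Archived"  # keep the old version for rollback
    staging.stage = "Production"
    return staging


# The three versions from the table above.
registry = [
    ModelVersion("v3.2.1", "Production", 0.742),
    ModelVersion("v3.3.0", "Staging", 0.751),
    ModelVersion("v3.4.0", "Development", 0.748),
]
promoted = promote_staging(registry)
```

Archiving the previous Production version rather than deleting it keeps a rollback target available.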

📊 Production Monitoring

Track model health, detect drift, alert on degradation.

Metric	Value
Prediction Latency	45 ms
Requests/sec	1,250
Model Drift	0.12
Data Quality	99.2%
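
The page does not say which statistic backs the drift figure. One common choice is the Population Stability Index (PSI), which compares the binned distribution of a feature in live traffic against a reference sample. The sketch below is a minimal pure-Python version; the function name, the bin count, and the smoothing constant are assumptions made for the example.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # smoothing so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]   # training-time feature values
shifted = [x + 0.3 for x in reference]      # live values after a shift

no_drift = psi(reference, reference)
strong_drift = psi(reference, shifted)
```

A widely used rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift worth watching, and above 0.25 as a significant shift. Alert thresholds should still be tuned per feature.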

🔧 MLOps Tools

Tool	Category	Description
MLflow	Experiment Tracking	Log params, metrics, artifacts
Weights & Biases	Experiment Tracking	Visualization, collaboration
Kubeflow	Orchestration	ML pipelines on K8s
Seldon	Serving	Model deployment at scale
Feast	Feature Store	Consistent feature engineering
Great Expectations	Data Quality	Data validation
Evidently	Monitoring	Model drift detection
BentoML	Serving	Model packaging and serving

🔄 CI/CD for ML

Continuous Integration

• Unit tests for data pipelines
• Data validation checks
• Model training smoke tests
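
A data validation check in CI can be as small as a function that returns a list of violations, which a unit test then asserts is empty. The sketch below uses only the standard library; the column names, types, and ranges in SCHEMA are hypothetical, and a tool such as Great Expectations would replace this in a larger pipeline.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: column name -> (expected type, min, max).
SCHEMA: Dict[str, Tuple[type, float, float]] = {
    "minutes_played": (float, 0.0, 60.0),
    "age": (int, 16, 50),
}


def validate_rows(rows: List[Dict[str, Any]],
                  schema: Dict[str, Tuple[type, float, float]] = SCHEMA) -> List[str]:
    """Return one human-readable message per schema violation."""
    errors = []
    for i, row in enumerate(rows):
        for col, (typ, lo, hi) in schema.items():
            value = row.get(col)
            if value is None:
                errors.append(f"row {i}: {col} is missing")
            elif not isinstance(value, typ):
                errors.append(
                    f"row {i}: {col} has type {type(value).__name__}, "
                    f"expected {typ.__name__}"
                )
            elif not lo <= value <= hi:
                errors.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
    return errors


good_batch = [{"minutes_played": 34.5, "age": 27}]
bad_batch = [{"minutes_played": 75.0, "age": 27}, {"age": "27"}]

good_errors = validate_rows(good_batch)
bad_errors = validate_rows(bad_batch)
```

In a CI job, a non-empty error list would fail the build before any training step runs.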

Continuous Training

• Scheduled retraining
• Trigger on data drift
• Auto hyperparameter tuning
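
The first two triggers can be combined into one decision function that a scheduler calls periodically. This is a sketch under assumed defaults: the seven-day maximum age, the 0.2 drift threshold, and the function name are illustrative, not recommendations from any tool.

```python
from datetime import datetime, timedelta
from typing import Tuple


def should_retrain(last_trained: datetime,
                   now: datetime,
                   drift_score: float,
                   max_age: timedelta = timedelta(days=7),
                   drift_threshold: float = 0.2) -> Tuple[bool, str]:
    """Decide whether to retrain and report which trigger fired."""
    # Drift takes priority: stale data matters more than the calendar.
    if drift_score >= drift_threshold:
        return True, "drift"
    if now - last_trained >= max_age:
        return True, "schedule"
    return False, "none"


last_trained = datetime(2024, 1, 15)
decision_fresh = should_retrain(last_trained, datetime(2024, 1, 20), drift_score=0.12)
decision_drift = should_retrain(last_trained, datetime(2024, 1, 20), drift_score=0.30)
decision_stale = should_retrain(last_trained, datetime(2024, 1, 25), drift_score=0.12)
```

Returning the reason alongside the decision makes it easy to log why each retraining run started.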

Continuous Deployment

• Canary deployments
• A/B testing models
• Automatic rollback
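
A canary deployment with automatic rollback can be sketched as a small router that sends a fraction of traffic to the new model and disables it once its error rate crosses a threshold. The class name, the 5% traffic share, the 2% error limit, and the minimum sample size below are all assumptions made for the example; production setups usually do this in the serving layer or service mesh.

```python
import random
from dataclasses import dataclass


@dataclass
class CanaryRouter:
    canary_fraction: float = 0.05   # share of traffic sent to the new model
    max_error_rate: float = 0.02    # roll back above this canary error rate
    min_requests: int = 100         # wait for this many canary requests first
    canary_requests: int = 0
    canary_errors: int = 0
    rolled_back: bool = False

    def choose(self) -> str:
        """Pick which model serves the next request."""
        if self.rolled_back:
            return "stable"
        return "canary" if random.random() < self.canary_fraction else "stable"

    def record(self, target: str, ok: bool) -> None:
        """Record a request outcome and roll back if the canary is unhealthy."""
        if target != "canary":
            return
        self.canary_requests += 1
        if not ok:
            self.canary_errors += 1
        enough_data = self.canary_requests >= self.min_requests
        if enough_data and self.canary_errors / self.canary_requests > self.max_error_rate:
            self.rolled_back = True


# Simulate ten canary requests, the first two of which fail.
router = CanaryRouter(min_requests=10)
for i in range(10):
    router.record("canary", ok=(i >= 2))
```

After the rollback flag is set, choose() always returns "stable", so no further traffic reaches the failing model until an operator resets the router.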

Python / MLflow Example

# MLflow experiment tracking
import mlflow
from mlflow.tracking import MlflowClient

# Start experiment
mlflow.set_experiment("player_projection_model")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("model_type", "xgboost")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    
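    # Train model (train_model, X_train, and y_train are assumed to be defined elsewhere)
    model = train_model(X_train, y_train)
    
    # Log metrics (placeholder values; in practice, compute them on a held-out set)
    mlflow.log_metric("mae", 2.34)
    mlflow.log_metric("rmse", 3.12)
    mlflow.log_metric("r2", 0.78)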
    
    # Log model
    mlflow.sklearn.log_model(model, "model")
    
    # Register model (run URIs take the form runs:/<run_id>/<artifact_path>)
    mlflow.register_model(
        f"runs:/{mlflow.active_run().info.run_id}/model",
        "player_projection"
    )

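# Promote to production (version 3 is an example value).
# Note: model stages are deprecated as of MLflow 2.9 in favor of model
# version aliases; this call still works but emits a deprecation warning.
client = MlflowClient()
client.transition_model_version_stage(
    name="player_projection",
    version=3,
    stage="Production"
)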

✅ Key Takeaways

• MLOps = DevOps for machine learning
• Track experiments, version models
• Automate training and deployment
• Monitor for drift and degradation
• Model registry for governance
• Start simple, add complexity as needed