ML Deployment and MLOps
ML model deployment and MLOps practices for production systems. Covers serving patterns, model registries, safe rollout strategies, monitoring, and retraining pipelines.
Overview
ML deployment bridges the gap between a trained model and a production system that delivers value. MLOps applies DevOps principles to ML systems, addressing unique challenges like data dependency, model versioning, training-serving skew, and continuous retraining. Most ML projects fail not in modeling but in deployment; a robust MLOps practice is essential for sustainable ML.
Use this skill when moving a model from notebook to production, designing ML infrastructure, setting up monitoring and retraining pipelines, or establishing MLOps practices for a team.
Core Framework
MLOps Maturity Levels
| Level | Description | Practices |
|---|---|---|
| 0 - Manual | Manual training, manual deployment | Scripts, notebooks |
| 1 - Pipeline | Automated training pipeline | Orchestration, model registry |
| 2 - CI/CD | Automated testing and deployment | CI/CD for ML, A/B testing |
| 3 - Full Auto | Automated retraining and monitoring | Drift detection, auto-rollback |
Serving Patterns
- Batch: Precompute predictions on a schedule; simplest pattern, best for non-real-time.
- Online (synchronous): REST/gRPC API; real-time predictions with latency constraints.
- Streaming: Process events in real-time via Kafka/Flink; continuous prediction.
- Edge: Deploy to devices; requires model optimization and offline capability.
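The online pattern above can be sketched as a plain prediction handler, independent of any web framework. This is a minimal illustration, not a production server: the `Model` class is a stand-in for a real trained artifact, and the request/response shapes are assumptions. Note that the handler validates input and fails loudly rather than producing silent garbage.

```python
from dataclasses import dataclass

# Hypothetical linear model standing in for a real trained artifact.
@dataclass
class Model:
    weights: list
    bias: float

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

def handle_request(model, payload):
    """Synchronous prediction handler: validate input, predict, return a typed response."""
    features = payload.get("features")
    if not isinstance(features, list) or len(features) != len(model.weights):
        return {"status": "error",
                "detail": f"expected 'features' list of length {len(model.weights)}"}
    if not all(isinstance(x, (int, float)) for x in features):
        return {"status": "error", "detail": "features must be numeric"}
    return {"status": "ok", "prediction": model.predict(features)}

model = Model(weights=[0.5, -0.25], bias=1.0)
print(handle_request(model, {"features": [2.0, 4.0]}))   # {'status': 'ok', 'prediction': 1.0}
print(handle_request(model, {"features": [2.0]}))        # error: wrong feature count
```

The same handler body would sit behind a REST or gRPC endpoint in practice; keeping it a pure function makes it unit-testable without spinning up the server.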
Model Registry Components
- Model artifact (weights, config, preprocessing pipeline)
- Training metadata (dataset version, hyperparameters, metrics)
- Lineage tracking (data -> model -> deployment)
- Stage management (staging, production, archived)
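The registry components above can be made concrete with a small in-memory sketch. Real registries (e.g. MLflow's) persist artifacts and metadata durably; this toy version only illustrates versioning, stage management, and the single-production-version convention, all of which are assumptions about one reasonable design.

```python
import time

class ModelRegistry:
    """In-memory sketch of a model registry: versioned artifacts, metadata, stages."""
    def __init__(self):
        self._models = {}  # name -> {version -> record}

    def register(self, name, artifact, metadata):
        versions = self._models.setdefault(name, {})
        version = len(versions) + 1
        versions[version] = {
            "artifact": artifact,
            "metadata": metadata,        # dataset version, hyperparameters, metrics
            "stage": "staging",
            "registered_at": time.time(),
        }
        return version

    def promote(self, name, version, stage):
        assert stage in {"staging", "production", "archived"}
        if stage == "production":
            # Keep a single production version: archive the previous one.
            for rec in self._models[name].values():
                if rec["stage"] == "production":
                    rec["stage"] = "archived"
        self._models[name][version]["stage"] = stage

    def get_production(self, name):
        for version, rec in sorted(self._models[name].items()):
            if rec["stage"] == "production":
                return version, rec["artifact"]
        return None

registry = ModelRegistry()
v1 = registry.register("churn", artifact={"weights": [0.1]},
                       metadata={"auc": 0.81, "dataset": "2024-06"})
registry.promote("churn", v1, "production")
print(registry.get_production("churn"))  # (1, {'weights': [0.1]})
```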
Process
- Package the model with its preprocessing pipeline into a single deployable artifact.
- Containerize the serving application with pinned dependencies and reproducible builds.
- Register the model in a model registry with version, metrics, and training metadata.
- Set up a serving infrastructure appropriate to the access pattern (batch, online, or streaming).
- Implement health checks, input validation, and graceful error handling in the serving layer.
- Deploy to staging and run integration tests with production-like data.
- Deploy to production using a safe rollout strategy (canary, shadow, or A/B test).
- Implement monitoring: prediction distribution, latency, error rate, input data drift, model staleness.
- Set up alerting thresholds and automated rollback triggers.
- Design the retraining pipeline: trigger conditions, data freshness requirements, validation gates.
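The input-drift monitoring step above is often implemented with the Population Stability Index (PSI) over a feature's distribution. A minimal sketch, assuming the common rule-of-thumb thresholds (below 0.1 stable, 0.1-0.25 moderate shift, above 0.25 significant drift):

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training) sample and live inputs."""
    lo, hi = min(expected), max(expected)
    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range live values into the edge bins.
            i = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(i, bins - 1))] += 1
        # Epsilon floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]
    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]
live_same = [random.gauss(0, 1) for _ in range(5000)]
live_shifted = [random.gauss(1.5, 1) for _ in range(5000)]
print(psi(train, live_same) < 0.10)      # stable: same distribution
print(psi(train, live_shifted) > 0.25)   # drift: mean shift is flagged
```

A PSI crossing the drift threshold is a natural alerting signal and, combined with data-freshness checks, a reasonable retraining trigger.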
Key Principles
- The model is not the product; the prediction service is the product. Optimize for reliability, not just accuracy.
- Training-serving skew is the most common deployment failure; ensure feature computation is identical in both paths.
- Version everything: data, code, model, configuration, and pipeline definitions.
- Canary deployments catch production issues before they affect all traffic.
- Monitor inputs as aggressively as outputs; data drift precedes model degradation.
- Automate retraining but gate deployments on validation metric thresholds.
- Design for rollback from the start; every deployment should be reversible within minutes.
- Log predictions and ground truth for continuous evaluation and future retraining.
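The training-serving skew principle above is usually enforced structurally: one feature-computation function lives in a shared module and is imported by both the training job and the serving app. A minimal sketch with purely illustrative features:

```python
import math

def build_features(raw):
    """Single feature function imported by BOTH the training pipeline and the
    serving app; sharing it is what keeps the two paths identical."""
    return [
        float(raw["age"]) / 100.0,                    # simple scaling
        1.0 if raw.get("country") == "US" else 0.0,   # categorical flag
        math.log1p(float(raw.get("purchases", 0))),   # skew-reducing transform
    ]

record = {"age": 40, "country": "US", "purchases": 3}
training_row = build_features(record)   # offline: building the training set
serving_row = build_features(record)    # online: featurizing a live request
print(training_row == serving_row)      # identical by construction
```

Feature stores generalize this idea: the transformation is defined once and materialized consistently for both offline training and online lookup.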
Common Pitfalls
- Reimplementing feature engineering in the serving path instead of reusing the training pipeline.
- Deploying without input validation, allowing malformed data to produce silent garbage predictions.
- Monitoring only system metrics (CPU, memory) and missing model-specific degradation.
- Retraining on a schedule without checking if new data has actually improved the model.
- Skipping shadow deployments and discovering latency issues only under production load.
- Storing models as loose files instead of using a proper registry with versioning.
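The schedule-retraining pitfall above is addressed with a validation gate: promote the retrained candidate only if it actually beats production and regresses no guardrail metric. A sketch with hypothetical metric names and thresholds:

```python
def should_deploy(candidate, production, min_improvement=0.002, guardrails=None):
    """Gate an automatically retrained model: promote only if it beats production
    on the primary metric AND meets every guardrail floor."""
    if candidate["auc"] < production["auc"] + min_improvement:
        return False  # new data did not actually improve the model
    for metric, floor in (guardrails or {}).items():
        if candidate.get(metric, float("-inf")) < floor:
            return False  # better AUC but a guardrail regressed
    return True

prod = {"auc": 0.840}
print(should_deploy({"auc": 0.850, "recall_minority": 0.70}, prod,
                    guardrails={"recall_minority": 0.65}))   # True: promote
print(should_deploy({"auc": 0.841}, prod))                   # False: gain below threshold
```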
Output Format
When creating a deployment plan:
- Model Artifact: What is packaged and how (container image, model format).
- Serving Architecture: Pattern choice, infrastructure components, scaling strategy.
- API Contract: Input/output schema, error responses, SLA (latency, availability).
- Rollout Strategy: Canary/shadow/A/B plan with success criteria and rollback triggers.
- Monitoring Dashboard: Key metrics, alert thresholds, escalation procedure.
- Retraining Pipeline: Trigger conditions, data requirements, validation gates.
- Runbook: Common failure scenarios and remediation steps.
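The rollout-strategy item above often comes down to deterministic traffic splitting. One common approach (an assumption here, not the only design) is to hash a stable request identifier so assignment is sticky per user:

```python
import hashlib

def route(request_id, canary_fraction=0.05):
    """Deterministic canary routing: hashing the request id gives sticky assignment
    (a given user always sees the same variant), with ~canary_fraction on the canary."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < canary_fraction * 10_000 else "stable"

counts = {"canary": 0, "stable": 0}
for i in range(10_000):
    counts[route(f"user-{i}")] += 1
print(counts)  # canary receives roughly 5% of requests
```

Sticky assignment matters for A/B evaluation: a user who bounces between variants contaminates both arms of the comparison.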
Related Skills
Computer Vision Pipeline Design
Designing computer vision pipelines for image and video analysis tasks.
Data Preprocessing
Systematic approach to data cleaning, transformation, and feature preparation.
ML Model Evaluation
Comprehensive model evaluation and metrics selection for machine learning.
ML Model Selection
Guides you through choosing the right machine learning model for a given problem.
Neural Network Architecture Design
Guides the design of neural network architectures for various tasks.
NLP Pipeline Design
Designing end-to-end natural language processing pipelines, starting from text ingestion.