
ML Deployment and MLOps

ML model deployment and MLOps practices for production systems. Covers serving patterns, model registries, safe rollout strategies, monitoring, and retraining pipelines.


Overview

ML deployment bridges the gap between a trained model and a production system that delivers value. MLOps applies DevOps principles to ML systems, addressing unique challenges like data dependency, model versioning, training-serving skew, and continuous retraining. Most ML projects fail not in modeling but in deployment; a robust MLOps practice is essential for sustainable ML.

Use this skill when moving a model from notebook to production, designing ML infrastructure, setting up monitoring and retraining pipelines, or establishing MLOps practices for a team.

Core Framework

MLOps Maturity Levels

| Level | Description | Practices |
| --- | --- | --- |
| 0 - Manual | Manual training, manual deployment | Scripts, notebooks |
| 1 - Pipeline | Automated training pipeline | Orchestration, model registry |
| 2 - CI/CD | Automated testing and deployment | CI/CD for ML, A/B testing |
| 3 - Full Auto | Automated retraining and monitoring | Drift detection, auto-rollback |

Serving Patterns

  • Batch: Precompute predictions on a schedule; simplest pattern, best for non-real-time.
  • Online (synchronous): REST/gRPC API; real-time predictions with latency constraints.
  • Streaming: Process events in real-time via Kafka/Flink; continuous prediction.
  • Edge: Deploy to devices; requires model optimization and offline capability.
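The online (synchronous) pattern above can be sketched as a request handler that validates input before predicting. This is a minimal illustration, not a production server: the schema fields (`age`, `income`) and the `predict_fn` callable are hypothetical placeholders for a real model and feature contract.

```python
from typing import Any

# Hypothetical schema: each request must carry these numeric features,
# with plausible bounds used for range validation.
REQUIRED_FEATURES = {"age": (0, 120), "income": (0, float("inf"))}

def validate(payload: dict) -> list:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    for name, (lo, hi) in REQUIRED_FEATURES.items():
        value = payload.get(name)
        if not isinstance(value, (int, float)):
            errors.append(f"{name}: missing or non-numeric")
        elif not lo <= value <= hi:
            errors.append(f"{name}: {value} outside [{lo}, {hi}]")
    return errors

def handle_request(payload: dict, predict_fn) -> dict:
    """Synchronous prediction endpoint body: validate, then predict or fail loudly.

    Rejecting malformed input with an explicit error is what prevents the
    'silent garbage predictions' failure mode described in Common Pitfalls.
    """
    errors = validate(payload)
    if errors:
        return {"status": 400, "errors": errors}
    return {"status": 200, "prediction": predict_fn(payload)}
```

The same validate-then-predict structure applies whether the handler sits behind a REST framework, a gRPC service, or a stream consumer.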

Model Registry Components

  • Model artifact (weights, config, preprocessing pipeline)
  • Training metadata (dataset version, hyperparameters, metrics)
  • Lineage tracking (data -> model -> deployment)
  • Stage management (staging, production, archived)
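The components above can be made concrete with a toy in-memory registry. This is an illustrative sketch only; real deployments use a managed registry (MLflow, Vertex AI, SageMaker), and the class and method names here are invented for the example.

```python
import hashlib
import time

class ModelRegistry:
    """Minimal in-memory sketch of a model registry with versioning and stages."""

    STAGES = {"staging", "production", "archived"}

    def __init__(self):
        self._models = {}  # (name, version) -> record

    def register(self, name: str, artifact: bytes, metadata: dict) -> int:
        """Store an artifact with its training metadata; returns the new version."""
        version = 1 + max((v for (n, v) in self._models if n == name), default=0)
        self._models[(name, version)] = {
            "checksum": hashlib.sha256(artifact).hexdigest(),  # artifact identity
            "metadata": metadata,  # dataset version, hyperparameters, metrics
            "stage": "staging",    # every new version starts in staging
            "registered_at": time.time(),
        }
        return version

    def promote(self, name: str, version: int, stage: str) -> None:
        """Move a version to a new stage; archive any previous production model."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if stage == "production":
            for (n, v), rec in self._models.items():
                if n == name and rec["stage"] == "production":
                    rec["stage"] = "archived"
        self._models[(name, version)]["stage"] = stage
```

Keeping the checksum, metadata, and stage together per version is what makes lineage ("which data and hyperparameters produced the model now in production?") answerable after the fact.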

Process

  1. Package the model with its preprocessing pipeline into a single deployable artifact.
  2. Containerize the serving application with pinned dependencies and reproducible builds.
  3. Register the model in a model registry with version, metrics, and training metadata.
  4. Set up a serving infrastructure appropriate to the access pattern (batch, online, or streaming).
  5. Implement health checks, input validation, and graceful error handling in the serving layer.
  6. Deploy to staging and run integration tests with production-like data.
  7. Deploy to production using a safe rollout strategy (canary, shadow, or A/B test).
  8. Implement monitoring: prediction distribution, latency, error rate, input data drift, model staleness.
  9. Set up alerting thresholds and automated rollback triggers.
  10. Design the retraining pipeline: trigger conditions, data freshness requirements, validation gates.
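The canary rollout in step 7 can be sketched as deterministic traffic splitting. This is one common approach, assumed for illustration: hashing a stable request or user ID (rather than sampling randomly) keeps each caller pinned to the same model version, which makes control-vs-canary comparisons consistent across retries.

```python
import hashlib

def route_to_canary(request_id: str, canary_fraction: float = 0.05) -> bool:
    """Deterministically route a stable fraction of traffic to the canary model."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < canary_fraction

def predict(request_id: str, payload, stable_fn, canary_fn, canary_fraction=0.05):
    """Dispatch to the canary or stable model; tag the response for monitoring."""
    use_canary = route_to_canary(request_id, canary_fraction)
    model_fn = canary_fn if use_canary else stable_fn
    return {"model": "canary" if use_canary else "stable",
            "prediction": model_fn(payload)}
```

Tagging every response with the model variant is what lets the monitoring in step 8 compare error rates and prediction distributions per version, and lets the rollback trigger in step 9 act on the canary alone.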

Key Principles

  • The model is not the product; the prediction service is the product. Optimize for reliability, not just accuracy.
  • Training-serving skew is the most common deployment failure; ensure feature computation is identical in both paths.
  • Version everything: data, code, model, configuration, and pipeline definitions.
  • Canary deployments catch production issues before they affect all traffic.
  • Monitor inputs as aggressively as outputs; data drift precedes model degradation.
  • Automate retraining but gate deployments on validation metric thresholds.
  • Design for rollback from the start; every deployment should be reversible within minutes.
  • Log predictions and ground truth for continuous evaluation and future retraining.
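One way to monitor input drift before it degrades the model is the Population Stability Index, sketched below as an assumption-laden illustration (the thresholds are an industry rule of thumb, not a hard law: below 0.1 stable, 0.1 to 0.25 moderate drift, above 0.25 significant drift worth an alert).

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a training-time feature sample
    (expected) and a serving-time sample (actual)."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = sum(x > e for e in edges)  # index of the bin x falls into
            counts[idx] += 1
        # Smooth empty bins so the log term below stays finite.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Running this per feature on a sliding window of serving traffic, against a frozen reference sample from training data, gives a cheap early-warning signal that fires before accuracy metrics (which require ground truth) can.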

Common Pitfalls

  • Reimplementing feature engineering in the serving path instead of reusing the training pipeline.
  • Deploying without input validation, allowing malformed data to produce silent garbage predictions.
  • Monitoring only system metrics (CPU, memory) and missing model-specific degradation.
  • Retraining on a schedule without checking if new data has actually improved the model.
  • Skipping shadow deployments and discovering latency issues only under production load.
  • Storing models as loose files instead of using a proper registry with versioning.
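The first pitfall, reimplemented feature engineering, is avoided by having exactly one feature-computation function that both paths import. The field names and transforms below are invented for illustration; the point is the single shared implementation.

```python
import math

def build_features(raw: dict) -> list:
    """The ONE feature-computation function, imported by both the training
    pipeline and the serving path. Change it once; both paths change."""
    return [
        float(raw["age"]) / 100.0,                   # same scaling in both paths
        math.log1p(float(raw.get("income", 0.0))),   # same transform in both paths
        1.0 if raw.get("country") == "US" else 0.0,  # same encoding in both paths
    ]

# Training path: vectorize historical rows with the shared function.
def training_matrix(rows):
    return [build_features(r) for r in rows]

# Serving path: the request handler calls the *same* function, so the
# feature logic cannot silently diverge between training and inference.
def serve(request: dict, predict_fn):
    return predict_fn(build_features(request))
```

Feature stores generalize this idea at infrastructure scale, but even a single shared module eliminates the most common source of training-serving skew.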

Output Format

When creating a deployment plan:

  1. Model Artifact: What is packaged and how (container image, model format).
  2. Serving Architecture: Pattern choice, infrastructure components, scaling strategy.
  3. API Contract: Input/output schema, error responses, SLA (latency, availability).
  4. Rollout Strategy: Canary/shadow/A-B plan with success criteria and rollback triggers.
  5. Monitoring Dashboard: Key metrics, alert thresholds, escalation procedure.
  6. Retraining Pipeline: Trigger conditions, data requirements, validation gates.
  7. Runbook: Common failure scenarios and remediation steps.