ML Deployment and MLOps
ML model deployment and MLOps practices for production systems. Covers serving patterns, model registries, safe rollout strategies, monitoring, and retraining pipelines.
Overview
ML deployment bridges the gap between a trained model and a production system that delivers value. MLOps applies DevOps principles to ML systems, addressing unique challenges like data dependency, model versioning, training-serving skew, and continuous retraining. Most ML projects fail not in modeling but in deployment; a robust MLOps practice is essential for sustainable ML.
Use this skill when moving a model from notebook to production, designing ML infrastructure, setting up monitoring and retraining pipelines, or establishing MLOps practices for a team.
Core Framework
MLOps Maturity Levels
| Level | Description | Practices |
|---|---|---|
| 0 - Manual | Manual training, manual deployment | Scripts, notebooks |
| 1 - Pipeline | Automated training pipeline | Orchestration, model registry |
| 2 - CI/CD | Automated testing and deployment | CI/CD for ML, A/B testing |
| 3 - Full Auto | Automated retraining and monitoring | Drift detection, auto-rollback |
Serving Patterns
- Batch: Precompute predictions on a schedule; simplest pattern, best for non-real-time.
- Online (synchronous): REST/gRPC API; real-time predictions with latency constraints.
- Streaming: Process events in real-time via Kafka/Flink; continuous prediction.
- Edge: Deploy to devices; requires model optimization and offline capability.
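The online pattern above can be sketched as a plain prediction handler, independent of any web framework. This is a minimal illustration, not a production server: the `Model` class is a stand-in for a real trained artifact, and the request/response shapes are assumptions. Note that the handler validates input and fails loudly rather than producing silent garbage.

```python
from dataclasses import dataclass

# Hypothetical linear model standing in for a real trained artifact.
@dataclass
class Model:
    weights: list
    bias: float

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

def handle_request(model, payload):
    """Synchronous prediction handler: validate input, predict, return a typed response."""
    features = payload.get("features")
    if not isinstance(features, list) or len(features) != len(model.weights):
        return {"status": "error",
                "detail": f"expected 'features' list of length {len(model.weights)}"}
    if not all(isinstance(x, (int, float)) for x in features):
        return {"status": "error", "detail": "features must be numeric"}
    return {"status": "ok", "prediction": model.predict(features)}

model = Model(weights=[0.5, -0.25], bias=1.0)
print(handle_request(model, {"features": [2.0, 4.0]}))   # {'status': 'ok', 'prediction': 1.0}
print(handle_request(model, {"features": [2.0]}))        # error: wrong feature count
```

The same handler body would sit behind a REST or gRPC endpoint in practice; keeping it a pure function makes it unit-testable without spinning up the server.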
Model Registry Components
- Model artifact (weights, config, preprocessing pipeline)
- Training metadata (dataset version, hyperparameters, metrics)
- Lineage tracking (data -> model -> deployment)
- Stage management (staging, production, archived)
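The registry components above can be made concrete with a small in-memory sketch. Real registries (e.g. MLflow's) persist artifacts and metadata durably; this toy version only illustrates versioning, stage management, and the single-production-version convention, all of which are assumptions about one reasonable design.

```python
import time

class ModelRegistry:
    """In-memory sketch of a model registry: versioned artifacts, metadata, stages."""
    def __init__(self):
        self._models = {}  # name -> {version -> record}

    def register(self, name, artifact, metadata):
        versions = self._models.setdefault(name, {})
        version = len(versions) + 1
        versions[version] = {
            "artifact": artifact,
            "metadata": metadata,        # dataset version, hyperparameters, metrics
            "stage": "staging",
            "registered_at": time.time(),
        }
        return version

    def promote(self, name, version, stage):
        assert stage in {"staging", "production", "archived"}
        if stage == "production":
            # Keep a single production version: archive the previous one.
            for rec in self._models[name].values():
                if rec["stage"] == "production":
                    rec["stage"] = "archived"
        self._models[name][version]["stage"] = stage

    def get_production(self, name):
        for version, rec in sorted(self._models[name].items()):
            if rec["stage"] == "production":
                return version, rec["artifact"]
        return None

registry = ModelRegistry()
v1 = registry.register("churn", artifact={"weights": [0.1]},
                       metadata={"auc": 0.81, "dataset": "2024-06"})
registry.promote("churn", v1, "production")
print(registry.get_production("churn"))  # (1, {'weights': [0.1]})
```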
Process
- Package the model with its preprocessing pipeline into a single deployable artifact.
- Containerize the serving application with pinned dependencies and reproducible builds.
- Register the model in a model registry with version, metrics, and training metadata.
- Set up a serving infrastructure appropriate to the access pattern (batch, online, or streaming).
- Implement health checks, input validation, and graceful error handling in the serving layer.
- Deploy to staging and run integration tests with production-like data.
- Deploy to production using a safe rollout strategy (canary, shadow, or A/B test).
- Implement monitoring: prediction distribution, latency, error rate, input data drift, model staleness.
- Set up alerting thresholds and automated rollback triggers.
- Design the retraining pipeline: trigger conditions, data freshness requirements, validation gates.
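The input-drift monitoring step above is often implemented with the Population Stability Index (PSI) over a feature's distribution. A minimal sketch, assuming the common rule-of-thumb thresholds (below 0.1 stable, 0.1-0.25 moderate shift, above 0.25 significant drift):

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training) sample and live inputs."""
    lo, hi = min(expected), max(expected)
    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range live values into the edge bins.
            i = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(i, bins - 1))] += 1
        # Epsilon floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]
    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]
live_same = [random.gauss(0, 1) for _ in range(5000)]
live_shifted = [random.gauss(1.5, 1) for _ in range(5000)]
print(psi(train, live_same) < 0.10)      # stable: same distribution
print(psi(train, live_shifted) > 0.25)   # drift: mean shift is flagged
```

A PSI crossing the drift threshold is a natural alerting signal and, combined with data-freshness checks, a reasonable retraining trigger.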
Key Principles
- The model is not the product; the prediction service is the product. Optimize for reliability, not just accuracy.
- Training-serving skew is the most common deployment failure; ensure feature computation is identical in both paths.
- Version everything: data, code, model, configuration, and pipeline definitions.
- Canary deployments catch production issues before they affect all traffic.
- Monitor inputs as aggressively as outputs; data drift precedes model degradation.
- Automate retraining but gate deployments on validation metric thresholds.
- Design for rollback from the start; every deployment should be reversible within minutes.
- Log predictions and ground truth for continuous evaluation and future retraining.
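The training-serving skew principle above is usually enforced structurally: one feature-computation function lives in a shared module and is imported by both the training job and the serving app. A minimal sketch with purely illustrative features:

```python
import math

def build_features(raw):
    """Single feature function imported by BOTH the training pipeline and the
    serving app; sharing it is what keeps the two paths identical."""
    return [
        float(raw["age"]) / 100.0,                    # simple scaling
        1.0 if raw.get("country") == "US" else 0.0,   # categorical flag
        math.log1p(float(raw.get("purchases", 0))),   # skew-reducing transform
    ]

record = {"age": 40, "country": "US", "purchases": 3}
training_row = build_features(record)   # offline: building the training set
serving_row = build_features(record)    # online: featurizing a live request
print(training_row == serving_row)      # identical by construction
```

Feature stores generalize this idea: the transformation is defined once and materialized consistently for both offline training and online lookup.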
Common Pitfalls
- Reimplementing feature engineering in the serving path instead of reusing the training pipeline.
- Deploying without input validation, allowing malformed data to produce silent garbage predictions.
- Monitoring only system metrics (CPU, memory) and missing model-specific degradation.
- Retraining on a schedule without checking if new data has actually improved the model.
- Skipping shadow deployments and discovering latency issues only under production load.
- Storing models as loose files instead of using a proper registry with versioning.
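The schedule-retraining pitfall above is addressed with a validation gate: promote the retrained candidate only if it actually beats production and regresses no guardrail metric. A sketch with hypothetical metric names and thresholds:

```python
def should_deploy(candidate, production, min_improvement=0.002, guardrails=None):
    """Gate an automatically retrained model: promote only if it beats production
    on the primary metric AND meets every guardrail floor."""
    if candidate["auc"] < production["auc"] + min_improvement:
        return False  # new data did not actually improve the model
    for metric, floor in (guardrails or {}).items():
        if candidate.get(metric, float("-inf")) < floor:
            return False  # better AUC but a guardrail regressed
    return True

prod = {"auc": 0.840}
print(should_deploy({"auc": 0.850, "recall_minority": 0.70}, prod,
                    guardrails={"recall_minority": 0.65}))   # True: promote
print(should_deploy({"auc": 0.841}, prod))                   # False: gain below threshold
```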
Output Format
When creating a deployment plan:
- Model Artifact: What is packaged and how (container image, model format).
- Serving Architecture: Pattern choice, infrastructure components, scaling strategy.
- API Contract: Input/output schema, error responses, SLA (latency, availability).
- Rollout Strategy: Canary/shadow/A/B plan with success criteria and rollback triggers.
- Monitoring Dashboard: Key metrics, alert thresholds, escalation procedure.
- Retraining Pipeline: Trigger conditions, data requirements, validation gates.
- Runbook: Common failure scenarios and remediation steps.
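The rollout-strategy item above often comes down to deterministic traffic splitting. One common approach (an assumption here, not the only design) is to hash a stable request identifier so assignment is sticky per user:

```python
import hashlib

def route(request_id, canary_fraction=0.05):
    """Deterministic canary routing: hashing the request id gives sticky assignment
    (a given user always sees the same variant), with ~canary_fraction on the canary."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < canary_fraction * 10_000 else "stable"

counts = {"canary": 0, "stable": 0}
for i in range(10_000):
    counts[route(f"user-{i}")] += 1
print(counts)  # canary receives roughly 5% of requests
```

Sticky assignment matters for A/B evaluation: a user who bounces between variants contaminates both arms of the comparison.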
Related Skills
Computer Vision Pipeline Design
Designing computer vision pipelines for image and video analysis tasks.
Data Preprocessing
Systematic approach to data cleaning, transformation, and feature preparation.
ML Model Evaluation
Comprehensive model evaluation and metrics selection for machine learning.
ML Model Selection
Guides you through choosing the right machine learning model for a given problem.
Neural Network Architecture Design
Guides the design of neural network architectures for various tasks.
NLP Pipeline Design
Designing end-to-end natural language processing pipelines, starting from text ingestion.