
Computer Vision Pipeline Design

Designing computer vision pipelines for image and video analysis tasks. Covers task definition, data augmentation, architecture selection, training, evaluation, and deployment optimization.


Overview

A computer vision pipeline ingests raw image or video data and produces structured predictions such as class labels, bounding boxes, segmentation masks, or embeddings. Modern CV relies heavily on pretrained convolutional and transformer architectures, but pipeline design still requires careful attention to data preparation, augmentation strategy, and task-specific output heads.
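The ingest → predict → structure flow above can be sketched as a chain of three stages. This is a minimal illustration with hypothetical stage names and a toy stand-in "model", not a reference implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VisionPipeline:
    """Chain: raw image -> preprocessed input -> raw scores -> structured prediction."""
    preprocess: Callable   # e.g. resize + normalize
    model: Callable        # backbone + task-specific head
    postprocess: Callable  # e.g. argmax, NMS, mask thresholding

    def __call__(self, image):
        return self.postprocess(self.model(self.preprocess(image)))

# Toy usage: a "classifier" producing a two-class score list.
pipe = VisionPipeline(
    preprocess=lambda img: [p / 255.0 for p in img],      # scale pixels to [0, 1]
    model=lambda x: [sum(x), 1.0 - sum(x) / len(x)],      # stand-in for a real network
    postprocess=lambda scores: scores.index(max(scores)), # predicted class index
)
pred = pipe([10, 200, 30])
```

In a real system each stage would be a framework component (a transform pipeline, a pretrained backbone with a task head, and task-specific decoding), but the stage boundaries stay the same.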

Use this skill when building image classification, object detection, instance segmentation, or video analysis systems, or when deciding between classical image processing and deep learning approaches.

Core Framework

Task Architecture Map

| Task | Architecture | Output |
| --- | --- | --- |
| Classification | ResNet, EfficientNet, ViT | Class probabilities |
| Object Detection | YOLO, DETR, Faster R-CNN | Bounding boxes + classes |
| Semantic Segmentation | U-Net, DeepLab, SegFormer | Per-pixel class mask |
| Instance Segmentation | Mask R-CNN, SAM | Per-object masks |
| Pose Estimation | HRNet, MediaPipe | Keypoint coordinates |
| Image Generation | Diffusion, GAN | Synthesized images |

Data Augmentation Toolkit

  • Geometric: Random crop, flip, rotation, affine transform
  • Color: Brightness, contrast, saturation, hue jitter
  • Advanced: Mixup, CutMix, CutOut, Mosaic (for detection)
  • Domain-specific: Elastic deformation (medical), weather simulation (autonomous driving)
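Augmentation libraries compose transforms that each fire with some probability. A dependency-free sketch of that pattern, using a horizontal flip (geometric) and brightness jitter (color) on a list-of-lists grayscale image (the function names here are illustrative, not from any particular library):

```python
import random

def hflip(img):
    """Geometric: horizontal flip (each row reversed left-to-right)."""
    return [row[::-1] for row in img]

def brightness(img, max_delta=30):
    """Color: add one random brightness offset, clipped to [0, 255]."""
    delta = random.uniform(-max_delta, max_delta)
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def compose(transforms, p=0.5):
    """Apply each transform independently with probability p."""
    def pipeline(img):
        for t in transforms:
            if random.random() < p:
                img = t(img)
        return img
    return pipeline

augment = compose([hflip, brightness])
img = [[0, 128, 255], [10, 20, 30]]
out = augment(img)
```

In practice you would reach for a library such as torchvision transforms or Albumentations, but the composition-with-probability structure is the same.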

Process

  1. Define the CV task: input format (resolution, channels, video vs. static), output type, and evaluation metric.
  2. Collect and audit the dataset: class distribution, image quality, annotation quality, edge cases.
  3. Standardize input: resize to model-compatible resolution, normalize pixel values to model-expected range.
  4. Design augmentation strategy: start with standard geometric + color augmentations, add task-specific ones.
  5. Select base architecture: use pretrained ImageNet weights as default; choose model size based on compute budget.
  6. Configure the task-specific head: classification head, detection head with anchor design, or segmentation decoder.
  7. Set training parameters: SGD with momentum for CNNs, AdamW for ViTs; cosine LR schedule; batch size as large as GPU memory allows.
  8. Train with mixed precision (FP16/BF16) to reduce memory and increase throughput.
  9. Evaluate with task-appropriate metrics: top-1/5 accuracy, mAP@IoU thresholds, mIoU for segmentation.
  10. Optimize for deployment: quantization (INT8), pruning, ONNX export, or TensorRT compilation.
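Step 3 (input standardization) has a concrete formula worth pinning down: scale uint8 pixels to [0, 1], then normalize per channel with the ImageNet statistics that most pretrained backbones expect. A minimal sketch for a single RGB pixel:

```python
# ImageNet channel statistics used by most pretrained backbones.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """(r, g, b) uint8 values -> normalized floats via (x/255 - mean) / std per channel."""
    return tuple(
        (c / 255.0 - m) / s
        for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A near-mean gray pixel maps close to (0, 0, 0) after normalization.
normed = normalize_pixel((124, 116, 104))
```

Mismatching this normalization against the pretrained weights (e.g. feeding raw [0, 255] values into a model trained on normalized inputs) silently degrades accuracy, so it belongs in the standardization step, not as an afterthought.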

Key Principles

  • Pretrained ImageNet models transfer well to most domains; always start with transfer learning.
  • Augmentation is the cheapest way to improve generalization; invest time in a strong augmentation pipeline.
  • Resolution is a critical hyperparameter: higher resolution improves accuracy but quadratically increases compute.
  • For detection tasks, anchor-free methods (YOLOv8, DETR) simplify the pipeline compared with anchor-based approaches.
  • Test-time augmentation (TTA) provides accuracy gains without retraining, at the cost of inference latency.
  • Annotation quality directly bounds model quality; invest in annotation guidelines and quality assurance.
  • Small objects require special handling: higher resolution input, feature pyramid networks, or tiling strategies.
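The mAP@IoU metrics from step 9 all rest on box IoU, so it is worth being precise about it. A minimal sketch, assuming boxes in [x1, y1, x2, y2] corner format:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes sharing half their width: intersection 50, union 150.
iou = box_iou([0, 0, 10, 10], [5, 0, 15, 10])
```

A detection counts as a true positive at threshold t only when its IoU with a matched ground-truth box meets t; sweeping t (e.g. 0.5 to 0.95) gives the averaged mAP commonly reported for COCO-style evaluation.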

Common Pitfalls

  • Training at low resolution and expecting the model to detect small objects.
  • Applying aggressive augmentation that distorts the semantic content (e.g., vertical flip for text recognition).
  • Ignoring class imbalance in detection datasets where background dominates.
  • Using accuracy instead of mAP for detection or mIoU for segmentation evaluation.
  • Skipping mixed-precision training and wasting GPU memory on FP32 operations.
  • Deploying without benchmarking inference speed on the target hardware.
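One common remedy for the background-imbalance pitfall is focal loss (Lin et al.), which down-weights the abundant easy negatives. A scalar sketch of FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t is the model's probability for the true class:

```python
import math

def focal_loss(p_t, gamma=2.0):
    """Focal loss for one prediction: the (1 - p_t)^gamma factor shrinks the
    loss of confident (easy) examples, so abundant background doesn't dominate."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.95)  # confident background example: near-zero loss
hard = focal_loss(0.10)  # misclassified object: large loss
```

With gamma = 0 this reduces to standard cross-entropy; raising gamma shifts training effort toward the hard, rare foreground examples.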

Output Format

When designing a CV pipeline:

  1. Task Definition: Input specs, output format, accuracy targets.
  2. Dataset Summary: Size, class distribution, annotation format, quality assessment.
  3. Augmentation Plan: List of transforms with parameters and justification.
  4. Architecture Choice: Model, pretrained weights, head configuration.
  5. Training Configuration: Optimizer, LR, schedule, batch size, epochs.
  6. Evaluation Results: Primary and secondary metrics with visualizations.
  7. Deployment Plan: Model optimization steps, target hardware, expected throughput.