
Service Mesh

Service mesh patterns with Istio and Linkerd for observability, traffic management, and mTLS

Service Mesh — Networking & Infrastructure

You are an expert in service mesh patterns for building reliable networked systems.

Core Philosophy

Overview

A service mesh is a dedicated infrastructure layer for managing service-to-service communication in microservices architectures. It handles mutual TLS, traffic routing, retries, circuit breaking, observability, and access control — all without changing application code. The two dominant implementations are Istio (feature-rich, complex) and Linkerd (lightweight, simple). This skill covers when to adopt a mesh, core traffic patterns, and practical configuration.

Core Concepts

Service Mesh Architecture

Without Service Mesh:                    With Service Mesh:
┌─────────┐    ┌─────────┐              ┌─────────┬───────┐    ┌───────┬─────────┐
│Service A│───→│Service B│              │Service A│Sidecar│═══→│Sidecar│Service B│
└─────────┘    └─────────┘              └─────────┴───────┘    └───────┴─────────┘
  App handles:                             Sidecar handles:
  - Retries                                - mTLS encryption
  - Timeouts                               - Retries & timeouts
  - Circuit breaking                       - Load balancing
  - Auth                                   - Observability (metrics, traces)
  - Metrics                                - Access control
                                           - Traffic shaping

Control Plane (Istiod / Linkerd):
  - Pushes config to sidecars
  - Manages certificates
  - Collects telemetry

When to Use a Service Mesh

Adopt when:

  • 10+ microservices with complex inter-service communication
  • You need uniform mTLS without modifying every service
  • You want canary deployments and traffic splitting
  • You need distributed tracing across polyglot services

Avoid when:

  • Monolith or small number of services (< 5)
  • Simple request patterns with no cross-cutting concerns
  • Resource-constrained environments (each sidecar adds ~50MB RAM)

Implementation Patterns

Istio Installation and Setup

# Install Istio
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=default -y

# Enable sidecar injection for a namespace
kubectl label namespace default istio-injection=enabled

# Verify installation
istioctl analyze
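Namespace-wide injection can be overridden per workload, and sidecar resource usage (see the pitfalls below on overhead) can be capped with pod annotations. A minimal sketch of a pod template metadata fragment — annotation and label names follow Istio's sidecar-injection docs, so verify them against your Istio version:

```yaml
# Pod template metadata fragment (goes under spec.template in a Deployment).
metadata:
  labels:
    app: api-service
    # sidecar.istio.io/inject: "false"   # uncomment to skip injection for this pod
  annotations:
    sidecar.istio.io/proxyCPU: "100m"        # sidecar CPU request
    sidecar.istio.io/proxyMemory: "128Mi"    # sidecar memory request
    sidecar.istio.io/proxyCPULimit: "500m"   # sidecar CPU limit
    sidecar.istio.io/proxyMemoryLimit: "256Mi"
```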

Istio Traffic Management

# VirtualService — traffic routing rules
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-service
spec:
  hosts:
    - api-service
  http:
    # Canary: 90% stable, 10% canary
    - route:
        - destination:
            host: api-service
            subset: stable
          weight: 90
        - destination:
            host: api-service
            subset: canary
          weight: 10
      timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: 5xx,reset,connect-failure

---
# DestinationRule — subsets and connection settings
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-service
spec:
  host: api-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2

Istio mTLS and Authorization

# PeerAuthentication — enforce mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT

---
# AuthorizationPolicy — service-to-service access control
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-policy
  namespace: default
spec:
  selector:
    matchLabels:
      app: api-service
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/default/sa/frontend"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/*"]
    - from:
        - source:
            principals: ["cluster.local/ns/default/sa/worker"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/internal/*"]
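AuthorizationPolicy is allow-by-default: workloads matched by no policy accept all traffic. A common hardening step is an explicit deny-all baseline, so that only traffic permitted by a specific ALLOW policy (like api-policy above) gets through. A minimal sketch:

```yaml
# Deny-all baseline for the namespace. An AuthorizationPolicy with an
# empty spec matches every workload and contains no rules, so every
# request is denied unless a more specific ALLOW policy permits it.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: default
spec: {}
```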

Linkerd Setup (Lightweight Alternative)

# Install Linkerd CLI
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh

# Install control plane
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check

# Inject sidecar into a deployment
kubectl get deploy api-service -o yaml | linkerd inject - | kubectl apply -f -

# Install observability extension
linkerd viz install | kubectl apply -f -
linkerd viz dashboard &

Linkerd Traffic Split (SMI)

# TrafficSplit for canary deployments
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: api-service-split
spec:
  service: api-service
  backends:
    - service: api-service-stable
      weight: 900
    - service: api-service-canary
      weight: 100

---
# ServiceProfile — per-route metrics and retries
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: api-service.default.svc.cluster.local
spec:
  routes:
    - name: GET /api/users
      condition:
        method: GET
        pathRegex: /api/users
      isRetryable: true
      timeout: 5s
    - name: POST /api/orders
      condition:
        method: POST
        pathRegex: /api/orders
      isRetryable: false
      timeout: 30s
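Routes marked isRetryable are governed by a retry budget rather than a fixed attempt count, which bounds retry amplification under load. The budget can be tuned on the same ServiceProfile — a sketch, with field names per Linkerd's ServiceProfile reference (verify against your Linkerd version):

```yaml
# Appended to the ServiceProfile spec above: retries may add at most
# 20% extra load, with a floor of 10 retries per second.
spec:
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s   # window over which the ratio is calculated
```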

Istio Gateway (Ingress)

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: main-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: example-tls-secret
      hosts:
        - "*.example.com"

---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-ingress
spec:
  hosts:
    - "api.example.com"
  gateways:
    - main-gateway
  http:
    - match:
        - uri:
            prefix: /v1
      route:
        - destination:
            host: api-v1
            port:
              number: 8080
    - match:
        - uri:
            prefix: /v2
      route:
        - destination:
            host: api-v2
            port:
              number: 8080
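Plain-HTTP traffic to the gateway is typically redirected rather than served. One way to do this is an additional port-80 server entry on the same Gateway with httpsRedirect — a sketch, per Istio's Gateway reference:

```yaml
# Additional entry under servers: on main-gateway. Port 80 answers
# every request with a 301 redirect to HTTPS instead of serving it.
- port:
    number: 80
    name: http
    protocol: HTTP
  tls:
    httpsRedirect: true
  hosts:
    - "*.example.com"
```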

Best Practices

  • Start with Linkerd if you primarily need mTLS and observability — it has lower resource overhead and operational complexity than Istio, and covers 80% of service mesh use cases.
  • Use circuit breaking and outlier detection to prevent cascading failures — eject unhealthy instances from the load balancing pool automatically rather than letting one bad pod degrade the entire service.
  • Adopt incrementally — inject sidecars into one namespace at a time, starting with non-critical services, and validate that latency overhead (typically 1-3ms per hop) is acceptable.

Common Pitfalls

  • Sidecar resource overhead: Each Envoy sidecar consumes ~50MB RAM and adds latency. In large clusters with hundreds of pods, this adds up significantly. Set resource limits on sidecars and monitor their consumption.
  • mTLS migration breakage: Switching from PERMISSIVE to STRICT mTLS mode breaks communication with any service that does not have a sidecar injected. Audit all services and ensure sidecar injection is complete before enforcing strict mode.
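A staged mTLS migration avoids this breakage: keep the mesh in PERMISSIVE mode (accept both plaintext and mTLS) while sidecars roll out, then flip to STRICT. A sketch of the intermediate step:

```yaml
# Step 1 of an mTLS migration: accept both plaintext and mTLS while
# sidecar injection is still rolling out. Change mode to STRICT only
# after every client workload has a sidecar.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: PERMISSIVE
```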

Anti-Patterns

Over-engineering for hypothetical requirements. Building for scenarios that may never materialize adds complexity without value. Solve the problem in front of you first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide wastes time and introduces risk.

Premature abstraction. Creating elaborate frameworks before having enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at system boundaries. Internal code can trust its inputs, but boundaries with external systems require defensive validation.

Skipping documentation. What is obvious to you today will not be obvious to your colleague next month or to you next year.
