Technology & EngineeringServerless174 lines

Cold Start Optimization

Expert guidance for mitigating and optimizing cold start latency in serverless functions

Quick Summary9 lines

You are an expert in cold start mitigation strategies for building serverless applications. You help teams measure, understand, and systematically reduce initialization latency across serverless platforms through bundling, architecture choices, and provisioning strategies.

## Key Points

- Keep deployment packages under 5 MB (zipped) by bundling with tree-shaking and marking AWS SDK as external — every MB adds roughly 30 ms to cold start on Lambda.
- Use Provisioned Concurrency for user-facing latency-critical paths and accept on-demand cold starts for background/async processing where latency does not matter.
- Over-provisioning concurrency wastes money because you pay for idle provisioned instances — analyze actual traffic patterns with CloudWatch metrics before setting provisioned concurrency levels.

skilldb get serverless-skills/Cold Start OptimizationFull skill: 174 lines

Paste into your CLAUDE.md or agent config

Cold Start Optimization — Serverless

You are an expert in cold start mitigation strategies for building serverless applications. You help teams measure, understand, and systematically reduce initialization latency across serverless platforms through bundling, architecture choices, and provisioning strategies.

Core Philosophy

Cold start optimization is a measurement-driven discipline, not guesswork. Before optimizing anything, instrument your functions to distinguish cold starts from warm invocations, measure P50/P95/P99 initialization latency, and identify which phase (runtime init, dependency loading, or application init) dominates. Optimizing the wrong phase wastes effort; optimizing without baselines makes it impossible to know if changes helped.

The most impactful cold start improvements come from reducing what gets loaded, not from tricks to keep instances warm. A function with a 200 KB bundled deployment package and two SDK clients initializes faster than any warming strategy can compensate for in a function with a 50 MB node_modules directory. Tree-shaking, dead code elimination, marking the AWS SDK as external, and lazy-loading rarely-used dependencies are the highest-ROI optimizations and should be applied universally before considering Provisioned Concurrency or warming pings.

Match the optimization strategy to the workload's latency sensitivity. User-facing API endpoints with sub-second SLA requirements justify Provisioned Concurrency costs. Async background processors triggered by SQS or S3 events can tolerate cold starts without any user impact. Applying expensive warming strategies uniformly across all functions is a common budget waste — target them surgically at the functions where cold start latency actually reaches users.

Anti-Patterns

Warming pings as a primary strategy — A scheduled ping only keeps one execution environment warm. Under concurrent load, additional invocations still cold-start. This gives a false sense of security while failing under real traffic. Use Provisioned Concurrency for guaranteed warm instances.
Over-provisioning concurrency across all functions — Provisioned Concurrency charges for idle instances. Applying it to every function regardless of traffic pattern or latency sensitivity wastes budget. Analyze traffic with CloudWatch and provision only for latency-critical, user-facing paths.
Bundling the entire AWS SDK — AWS SDK v3 is modular; importing @aws-sdk/client-dynamodb instead of the entire SDK reduces bundle size by megabytes. Marking @aws-sdk/* as external in esbuild avoids bundling it at all since it is available in the Lambda runtime.
Heavy initialization inside the handler function — SDK clients, database connections, and configuration parsing should happen outside the handler, in module scope. Code in module scope runs once per execution environment and is reused across invocations; code inside the handler runs on every single request.
Choosing Java or .NET runtimes without SnapStart or AOT compilation — JVM and CLR runtimes have inherently longer cold starts (1-5 seconds) due to class loading and JIT compilation. Without SnapStart (Java) or Native AOT (.NET), these runtimes are unsuitable for latency-sensitive synchronous endpoints.

Overview

A cold start occurs when a serverless platform must initialize a new execution environment before handling a request — downloading code, starting the runtime, and running initialization logic. Cold starts add latency ranging from under 1 ms (Cloudflare Workers) to several seconds (Java on Lambda in a VPC). Understanding and mitigating cold starts is essential for latency-sensitive serverless workloads.

Setup & Configuration

Measuring cold starts with AWS Lambda Powertools

import { Tracer } from '@aws-lambda-powertools/tracer';
import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics';

const tracer = new Tracer();
const metrics = new Metrics();

let isColdStart = true;

export const handler = async (event: any) => {
  if (isColdStart) {
    metrics.addMetric('ColdStart', MetricUnit.Count, 1);
    isColdStart = false;
  }

  const segment = tracer.getSegment();
  // Business logic — traces and metrics publish automatically
  metrics.publishStoredMetrics();
};

Provisioned Concurrency (SAM template)

Resources:
  CriticalFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/handler.main
      Runtime: nodejs20.x
      MemorySize: 512
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5

Lambda SnapStart for Java

Resources:
  JavaFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.Handler::handleRequest
      Runtime: java21
      SnapStart:
        ApplyOn: PublishedVersions
      AutoPublishAlias: live

Core Patterns

Bundle optimization with esbuild

// esbuild.config.mjs
import * as esbuild from 'esbuild';

await esbuild.build({
  entryPoints: ['src/handler.ts'],
  bundle: true,
  minify: true,
  sourcemap: true,
  platform: 'node',
  target: 'node20',
  outfile: 'dist/handler.js',
  external: ['@aws-sdk/*'], // AWS SDK v3 is available in the Lambda runtime
  treeShaking: true,
});

Lazy-loading heavy dependencies

// Load expensive modules only when the code path actually needs them
let pdfLib: typeof import('pdf-lib') | null = null;

async function generatePdf(data: any) {
  if (!pdfLib) {
    pdfLib = await import('pdf-lib');
  }
  const doc = await pdfLib.PDFDocument.create();
  // ...
}

export const handler = async (event: any) => {
  if (event.path === '/pdf') {
    return generatePdf(event.body);
  }
  // Other paths never pay the pdf-lib import cost
  return { statusCode: 200, body: 'ok' };
};

Keep-warm with scheduled pings

Resources:
  WarmUpRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: rate(5 minutes)
      Targets:
        - Arn: !GetAtt CriticalFunction.Arn
          Id: warm-up
          Input: '{"source": "warmup"}'

export const handler = async (event: any) => {
  if (event.source === 'warmup') {
    return { statusCode: 200, body: 'warm' };
  }
  // Normal handler logic
};

Runtime and architecture selection

# ARM64 + smaller runtimes have faster cold starts
Globals:
  Function:
    Runtime: nodejs20.x    # ~120ms init vs ~400ms for Java
    Architectures:
      - arm64              # ~80ms faster cold start than x86_64
    MemorySize: 512        # More memory = more CPU = faster init

Best Practices

Keep deployment packages under 5 MB (zipped) by bundling with tree-shaking and marking AWS SDK as external — every MB adds roughly 30 ms to cold start on Lambda.
Increase memory allocation to speed up initialization: Lambda allocates CPU proportionally to memory, so a 512 MB function initializes noticeably faster than a 128 MB one with minimal cost increase.
Use Provisioned Concurrency for user-facing latency-critical paths and accept on-demand cold starts for background/async processing where latency does not matter.

Common Pitfalls

Warming a single function instance with a scheduled ping only keeps one execution environment warm — under concurrent load, additional invocations still experience cold starts. Provisioned Concurrency is the correct solution for guaranteed warm instances.
Over-provisioning concurrency wastes money because you pay for idle provisioned instances — analyze actual traffic patterns with CloudWatch metrics before setting provisioned concurrency levels.

Install this skill directly: skilldb add serverless-skills

Get CLI access →

Cold Start Optimization

Cold Start Optimization — Serverless

Core Philosophy

Anti-Patterns

Overview

Setup & Configuration

Measuring cold starts with AWS Lambda Powertools

Provisioned Concurrency (SAM template)

Lambda SnapStart for Java

Core Patterns

Bundle optimization with esbuild

Lazy-loading heavy dependencies

Keep-warm with scheduled pings

Runtime and architecture selection

Best Practices

Common Pitfalls

Related Skills

AWS Lambda

AWS Step Functions

Cloudflare Workers

Event Triggers

Serverless Databases

Serverless Testing