Cloud Run
Deploy and manage containerized applications on Google Cloud Run serverless platform
You are an expert in Google Cloud Run for deploying and scaling containerized workloads on a fully managed serverless platform.
Core Philosophy
Cloud Run is the sweet spot between serverless simplicity and container flexibility. You bring any container that listens on a port, and Cloud Run handles scaling, TLS termination, load balancing, and infrastructure. The contract is simple: your container must be stateless, start quickly, and respond to HTTP requests. If you can build it in a Dockerfile, you can run it on Cloud Run -- no vendor-specific framework or SDK required.
Fast startup is a first-class concern. Cloud Run bills from the moment your container starts, and cold starts directly affect user-perceived latency. Use slim base images (Alpine, distroless), lazy-load heavy dependencies, and keep your container image small. For latency-sensitive services, set min-instances to keep at least one instance warm. For background workers, use Cloud Run Jobs instead of keeping a service running idle.
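Lazy-loading can be sketched in a few lines of Python. Here `sqlite3` is just a stand-in for whatever genuinely heavy dependency your service needs (an ML framework, a large SDK); the point is that the import cost is paid on first use, not during the billed cold start:

```python
import importlib

_heavy = None

def get_heavy_module():
    # Import the heavy dependency on first use rather than at container
    # startup, keeping the cold-start window short. sqlite3 stands in
    # for any genuinely heavy import.
    global _heavy
    if _heavy is None:
        _heavy = importlib.import_module("sqlite3")
    return _heavy
```

Subsequent calls return the cached module, so only the first request on a fresh instance pays the import cost.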
Concurrency is your primary scaling lever. Unlike traditional FaaS where each function invocation gets its own instance, Cloud Run routes multiple concurrent requests to the same container. Set the concurrency limit based on your workload profile: high for I/O-bound services (web APIs waiting on databases), low for CPU-bound services (image processing). Getting concurrency right means fewer instances, lower cost, and more efficient resource usage.
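A back-of-the-envelope sizing sketch, using Little's law (in-flight requests ≈ request rate × latency), can show how the concurrency setting drives instance count. This is a rough planning heuristic, not an official Cloud Run formula:

```python
import math

def instances_needed(rps: float, avg_latency_s: float, concurrency: int) -> int:
    # Little's law: requests in flight ~= arrival rate * average latency.
    # Each instance absorbs up to `concurrency` of them.
    in_flight = rps * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# 500 req/s at 200 ms average latency with the default concurrency of 80:
instances_needed(500, 0.2, 80)   # -> 2 instances
# The same load with concurrency lowered to 10 for CPU-bound work:
instances_needed(500, 0.2, 10)   # -> 10 instances
```

Lowering concurrency for CPU-bound work trades more instances (and cost) for predictable per-request latency.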
Anti-Patterns
- Storing state on the local filesystem -- The in-memory filesystem resets between instances and is not shared. Use Cloud Storage, Firestore, or a database for any persistent data.
- Listening on localhost instead of 0.0.0.0 -- The container must bind to all network interfaces. Binding only to 127.0.0.1 makes the service unreachable from the Cloud Run ingress.
- Using bloated container images -- Large images increase cold start time and push costs up. Multi-stage builds and slim base images (node:20-slim, python:3.12-slim) make a measurable difference.
- Assuming sticky sessions or instance affinity -- Cloud Run load-balances requests across instances with no affinity. Any state that depends on hitting the same instance will break.
- Running long background work with default CPU allocation -- By default, CPU is throttled outside of request handling. Background tasks (cron jobs, queue workers) need --no-cpu-throttling (always-on CPU) to avoid being starved between requests.
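The state-related anti-patterns above share one fix: route all state through an external store rather than instance memory or disk. A minimal sketch of that shape, where `InMemoryStore` is a hypothetical stand-in for Firestore or Memorystore (an actual per-instance dict like this is exactly what breaks on Cloud Run):

```python
class InMemoryStore:
    # Stand-in for an external store (Firestore, Memorystore, Cloud SQL).
    # In production the handler must talk to a real external service.
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

def handle_request(store, session_id):
    # Read and update session state through the injected store, never
    # through module-level globals or the local filesystem.
    key = f"session:{session_id}:visits"
    count = (store.get(key) or 0) + 1
    store.put(key, count)
    return count
```

Because every instance talks to the same store, it no longer matters which instance the load balancer picks.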
Overview
Cloud Run is a managed compute platform that runs stateless containers invoked via HTTP requests or events. It abstracts away infrastructure management, automatically scales from zero to thousands of instances, and charges only for resources consumed during request handling.
Key capabilities:
- Run any container that listens on a port (HTTP/gRPC)
- Automatic horizontal scaling including scale-to-zero
- Revision-based deployments with traffic splitting
- Built-in HTTPS endpoints with custom domain support
- VPC connectivity, IAM-based access control, and secret integration
Setup & Configuration
Enable the API and authenticate
gcloud services enable run.googleapis.com
gcloud auth configure-docker
Deploy a container image
gcloud run deploy my-service \
--image gcr.io/PROJECT_ID/my-image:latest \
--region us-central1 \
--platform managed \
--allow-unauthenticated \
--port 8080 \
--memory 512Mi \
--cpu 1 \
--min-instances 0 \
--max-instances 10 \
--set-env-vars "ENV=production"
Deploy from source (Cloud Buildpacks)
gcloud run deploy my-service \
--source . \
--region us-central1
Configure secrets
gcloud run deploy my-service \
--image gcr.io/PROJECT_ID/my-image \
--set-secrets "DB_PASS=db-password:latest" \
--region us-central1
Set up a custom domain
gcloud beta run domain-mappings create \
--service my-service \
--domain my-app.example.com \
--region us-central1
Core Patterns
Dockerfile for Cloud Run
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
ENV PORT=8080
EXPOSE 8080
CMD ["node", "server.js"]
Listening on the correct port (Node.js)
const express = require('express');
const app = express();
const port = parseInt(process.env.PORT, 10) || 8080;

app.get('/', (req, res) => {
  res.send('Hello from Cloud Run');
});

app.listen(port, '0.0.0.0', () => {
  console.log(`Listening on port ${port}`);
});
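The same pattern in Python, using only the standard library for illustration (a real service would typically use Flask, FastAPI, or gunicorn, but the PORT and 0.0.0.0 rules are identical):

```python
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

def resolve_port(default=8080):
    # Cloud Run injects the port to listen on via the PORT env var.
    try:
        return int(os.environ.get("PORT", default))
    except (TypeError, ValueError):
        return default

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello from Cloud Run")

def main():
    # Bind to 0.0.0.0, not 127.0.0.1, or the Cloud Run ingress
    # cannot reach the container.
    HTTPServer(("0.0.0.0", resolve_port()), Handler).serve_forever()
```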
Traffic splitting between revisions
gcloud run services update-traffic my-service \
--to-revisions my-service-v2=90,my-service-v1=10 \
--region us-central1
IAM-authenticated service-to-service calls
import requests
import google.auth.transport.requests
import google.oauth2.id_token

def call_cloud_run_service(url):
    auth_req = google.auth.transport.requests.Request()
    id_token = google.oauth2.id_token.fetch_id_token(auth_req, url)
    headers = {"Authorization": f"Bearer {id_token}"}
    response = requests.get(url, headers=headers)
    return response.json()
Cloud Run Jobs (batch workloads)
gcloud run jobs create my-job \
--image gcr.io/PROJECT_ID/my-batch-image \
--region us-central1 \
--tasks 10 \
--max-retries 3
gcloud run jobs execute my-job --region us-central1
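Each task in a job execution receives the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables, which let tasks partition work without any coordination. A sketch of one common sharding scheme (round-robin by index):

```python
import os

def task_shard(items):
    # Cloud Run Jobs sets CLOUD_RUN_TASK_INDEX (0-based) and
    # CLOUD_RUN_TASK_COUNT on every task; each task takes the items
    # whose position maps to its index.
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", 0))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", 1))
    return [item for i, item in enumerate(items) if i % count == index]
```

With --tasks 10, ten copies of the container run this same code and each processes a disjoint tenth of the input.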
Connecting to Cloud SQL
gcloud run deploy my-service \
--image gcr.io/PROJECT_ID/my-image \
--add-cloudsql-instances PROJECT_ID:REGION:INSTANCE_NAME \
--set-env-vars "DB_HOST=/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME"
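With --add-cloudsql-instances, Cloud Run mounts a Unix domain socket under /cloudsql/PROJECT:REGION:INSTANCE, and Postgres drivers such as psycopg accept that directory as the host. A sketch of building the connection string from the env vars set in the deploy command above (the DSN keyword format shown is the psycopg/libpq style; adjust for your driver):

```python
import os

def cloudsql_dsn(user, db):
    # DB_HOST matches the env var set at deploy time, e.g.
    # /cloudsql/PROJECT_ID:REGION:INSTANCE_NAME. Passing the socket
    # directory as `host` makes libpq-style drivers connect over the
    # mounted Unix socket instead of TCP.
    socket_dir = os.environ["DB_HOST"]
    password = os.environ.get("DB_PASS", "")
    return f"host={socket_dir} dbname={db} user={user} password={password}"
```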
Best Practices
- Set concurrency appropriately. Default is 80 concurrent requests per instance. Lower it for CPU-heavy workloads, raise it for I/O-bound services.
- Use min-instances to avoid cold starts. Setting --min-instances 1 keeps at least one instance warm for latency-sensitive endpoints.
- Keep container startup fast. Cloud Run bills from container start; lazy-load heavy dependencies and defer initialization where possible.
- Use structured logging. Write JSON logs to stdout so they integrate with Cloud Logging automatically.
- Set resource limits explicitly. Define --memory and --cpu based on profiling rather than relying on defaults.
- Use revision tags for testing. Tag revisions to get unique URLs for pre-production validation without routing live traffic.
- Store secrets in Secret Manager. Never bake secrets into images; mount them via --set-secrets.
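The structured-logging practice above needs no library: one JSON object per line on stdout is enough, and Cloud Logging recognizes the special `severity` field. A minimal sketch (`log_line` is an illustrative helper name, not a Cloud SDK function):

```python
import json
import sys

def log_line(severity, message, **fields):
    # One JSON object per line; Cloud Logging parses it and honors the
    # `severity` field, and any extra keys become queryable jsonPayload
    # fields.
    return json.dumps({"severity": severity, "message": message, **fields})

print(log_line("INFO", "request handled", path="/healthz", latency_ms=12),
      file=sys.stdout)
```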
Common Pitfalls
- Listening on localhost instead of 0.0.0.0. The container must bind to all interfaces, not just 127.0.0.1.
- Exceeding the request timeout. Default is 300 seconds. Long-running tasks should use Cloud Run Jobs or Cloud Tasks instead.
- Ignoring the ephemeral filesystem. The in-memory filesystem resets between instances. Use Cloud Storage or a database for persistence.
- Assuming sticky sessions. Requests are load-balanced across instances with no affinity. Store session state externally.
- Not setting CPU allocation policy. By default, CPU is throttled outside of request handling. Use --no-cpu-throttling (always-on CPU) for background work.
- Large container images. Bloated images increase cold start time. Use slim base images and multi-stage builds.