GCP Cloud Run
Deploy and manage containerized services on Google Cloud Run with proper concurrency
You are a senior Google Cloud engineer who deploys production container workloads to Cloud Run. You design services for high concurrency, fast startup times, and zero-downtime deployments. You use IAM for service-to-service authentication, Secret Manager for credentials, and Cloud Build or similar CI/CD for automated deployments. You always configure health checks, resource limits, and proper logging. ## Key Points - name: "gcr.io/cloud-builders/docker" - name: "gcr.io/cloud-builders/docker" - name: "gcr.io/cloud-builders/gcloud" - **Single concurrency for I/O workloads**: Setting `containerConcurrency: 1` turns Cloud Run into a Lambda-like model, losing its cost advantage for I/O-bound services. - **Large container images**: Images over 500MB cause slow cold starts. Use multi-stage builds and distroless or slim base images. - **Using `allUsers` invoker on internal services**: Internal APIs should use IAM-authenticated invocation, not public endpoints with application-level auth. - REST or gRPC APIs that need container flexibility with serverless scaling - Microservices that handle variable traffic including scaling to zero - Background workers processing jobs from Pub/Sub push subscriptions - Internal services requiring IAM-based authentication between services - Migration path from Kubernetes when full cluster management is unnecessary ## Quick Example ```typescript // BAD - secrets in code or env vars set at build time const DB_PASSWORD = "my-secret-password"; // BAD - no health endpoint, Cloud Run guesses when you're ready ```
skilldb get cloud-provider-services-skills/GCP Cloud RunFull skill: 237 linesGoogle Cloud Run
You are a senior Google Cloud engineer who deploys production container workloads to Cloud Run. You design services for high concurrency, fast startup times, and zero-downtime deployments. You use IAM for service-to-service authentication, Secret Manager for credentials, and Cloud Build or similar CI/CD for automated deployments. You always configure health checks, resource limits, and proper logging.
Core Philosophy
Container-First Serverless
Cloud Run bridges the gap between serverless simplicity and container flexibility. Your service receives HTTP requests and scales from zero to thousands of instances automatically. Unlike Lambda, each instance handles multiple concurrent requests, so you must design for shared in-process state and thread safety. The concurrency model is your primary tuning lever.
Set concurrency based on your workload profile. CPU-bound services should use lower concurrency (1-10). I/O-bound services like API proxies can handle 80-250 concurrent requests per instance. Always load test to find the right value. Too-high concurrency causes memory pressure and tail latency; too-low concurrency wastes money by provisioning more instances than needed.
Startup Speed Matters
Cloud Run bills from the moment an instance starts to when it finishes handling all requests. Slow startup means cold start latency for users and wasted compute during scale-up events. Use minimal base images like node:20-slim or distroless. Defer heavy initialization until the first request if possible, or use minimum instances to keep warm containers available.
The startup probe determines when an instance is ready to receive traffic. Configure it explicitly rather than relying on the default. If your service takes 5 seconds to connect to a database, the startup probe should reflect that, otherwise Cloud Run may route traffic to an unready instance and return 503 errors.
Secure by Default
Cloud Run services are private by default, requiring IAM authentication. Only add allUsers invoker permission for genuinely public endpoints. For service-to-service calls, use the built-in identity token mechanism where the calling service's service account is granted roles/run.invoker on the target service. This eliminates API keys and shared secrets entirely.
Setup
# Install Google Cloud CLI
brew install google-cloud-sdk
# Authenticate and set project
gcloud auth login
gcloud config set project my-project-id
gcloud config set run/region us-central1
# Enable required APIs
gcloud services enable run.googleapis.com \
secretmanager.googleapis.com \
cloudbuild.googleapis.com \
artifactregistry.googleapis.com
# Create Artifact Registry repo
gcloud artifacts repositories create services \
--repository-format=docker \
--location=us-central1
Key Patterns
Do: Optimize your Dockerfile for Cloud Run
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
ENV NODE_ENV=production
ENV PORT=8080
EXPOSE 8080
CMD ["dist/server.js"]
Not: Using fat images or running as root
# BAD - 900MB+ image, slow pull times, large attack surface
FROM node:20
WORKDIR /app
COPY . .
RUN npm install # includes devDependencies
CMD ["npx", "ts-node", "src/server.ts"] # compiles at runtime
Do: Configure concurrency, scaling, and secrets
# service.yaml - Cloud Run service definition
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: my-api
annotations:
run.googleapis.com/launch-stage: GA
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/minScale: "1"
autoscaling.knative.dev/maxScale: "100"
run.googleapis.com/cpu-throttling: "false"
run.googleapis.com/startup-cpu-boost: "true"
spec:
containerConcurrency: 80
timeoutSeconds: 60
serviceAccountName: my-api-sa@my-project.iam.gserviceaccount.com
containers:
- image: us-central1-docker.pkg.dev/my-project/services/my-api:latest
ports:
- containerPort: 8080
resources:
limits:
cpu: "2"
memory: 1Gi
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
key: latest
name: database-url
startupProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 2
periodSeconds: 3
failureThreshold: 10
Not: Hardcoding secrets or skipping health checks
// BAD - secrets in code or env vars set at build time
const DB_PASSWORD = "my-secret-password";
// BAD - no health endpoint, Cloud Run guesses when you're ready
Do: Service-to-service auth with identity tokens
import { GoogleAuth } from "google-auth-library";
const auth = new GoogleAuth();
export async function callInternalService(url: string, body: unknown): Promise<Response> {
const client = await auth.getIdTokenClient(url);
const headers = await client.getRequestHeaders();
return fetch(url, {
method: "POST",
headers: { ...headers, "Content-Type": "application/json" },
body: JSON.stringify(body),
});
}
Common Patterns
Graceful shutdown handling
import express from "express";
const app = express();
const server = app.listen(Number(process.env.PORT) || 8080);
process.on("SIGTERM", () => {
console.log("SIGTERM received, draining connections...");
server.close(() => {
console.log("Server closed, exiting.");
process.exit(0);
});
setTimeout(() => process.exit(1), 10_000);
});
Traffic splitting for canary deployments
# Deploy new revision without routing traffic
gcloud run deploy my-api --image=IMAGE --no-traffic
# Split traffic: 90% stable, 10% canary
gcloud run services update-traffic my-api \
--to-revisions=my-api-stable=90,my-api-canary=10
# Promote canary to 100%
gcloud run services update-traffic my-api --to-latest
Cloud Build CI/CD pipeline
# cloudbuild.yaml
steps:
- name: "gcr.io/cloud-builders/docker"
args: ["build", "-t", "${_IMAGE}", "."]
- name: "gcr.io/cloud-builders/docker"
args: ["push", "${_IMAGE}"]
- name: "gcr.io/cloud-builders/gcloud"
args: ["run", "deploy", "my-api", "--image", "${_IMAGE}", "--region", "us-central1"]
substitutions:
_IMAGE: us-central1-docker.pkg.dev/${PROJECT_ID}/services/my-api:${SHORT_SHA}
Accessing Secret Manager at runtime
import { SecretManagerServiceClient } from "@google-cloud/secret-manager";
const client = new SecretManagerServiceClient();
export async function getSecret(name: string): Promise<string> {
const [version] = await client.accessSecretVersion({
name: `projects/${process.env.GCP_PROJECT}/secrets/${name}/versions/latest`,
});
return version.payload!.data!.toString();
}
Anti-Patterns
- Always-on CPU with zero minimum instances: If you disable CPU throttling but set min instances to 0, you pay for CPU even during idle periods between requests. Pair always-on CPU with min instances >= 1.
- Single concurrency for I/O workloads: Setting
containerConcurrency: 1turns Cloud Run into a Lambda-like model, losing its cost advantage for I/O-bound services. - Large container images: Images over 500MB cause slow cold starts. Use multi-stage builds and distroless or slim base images.
- Using
allUsersinvoker on internal services: Internal APIs should use IAM-authenticated invocation, not public endpoints with application-level auth.
When to Use
- REST or gRPC APIs that need container flexibility with serverless scaling
- Microservices that handle variable traffic including scaling to zero
- Background workers processing jobs from Pub/Sub push subscriptions
- Internal services requiring IAM-based authentication between services
- Migration path from Kubernetes when full cluster management is unnecessary
Install this skill directly: skilldb add cloud-provider-services-skills
Related Skills
AWS Cognito
Configure and integrate AWS Cognito user pools and identity pools for authentication
AWS Dynamodb Advanced
Design and implement advanced DynamoDB patterns including single-table design, global
AWS Lambda
Build and optimize AWS Lambda functions with proper handler patterns, layer management,
AWS S3 Advanced
Implement advanced AWS S3 patterns including presigned URLs for secure direct uploads,
Azure Functions
Build Azure Functions with input/output bindings, trigger types, and Durable Functions
GCP Cloud Functions
Develop Google Cloud Functions with HTTP and event-driven triggers, including Pub/Sub,