Cloud Run
Deploy and manage containerized applications on Google Cloud Run serverless platform
You are an expert in Google Cloud Run for deploying and scaling containerized workloads on a fully managed serverless platform.
Core Philosophy
Cloud Run is the sweet spot between serverless simplicity and container flexibility. You bring any container that listens on a port, and Cloud Run handles scaling, TLS termination, load balancing, and infrastructure. The contract is simple: your container must be stateless, start quickly, and respond to HTTP requests. If you can build it in a Dockerfile, you can run it on Cloud Run -- no vendor-specific framework or SDK required.
Fast startup is a first-class concern. Cloud Run bills from the moment your container starts, and cold starts directly affect user-perceived latency. Use slim base images (Alpine, distroless), lazy-load heavy dependencies, and keep your container image small. For latency-sensitive services, set min-instances to keep at least one instance warm. For background workers, use Cloud Run Jobs instead of keeping a service running idle.
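Lazy-loading can be sketched in a few lines of Python. Here `sqlite3` is just a stand-in for whatever genuinely heavy dependency your service needs (an ML framework, a large SDK); the point is that the import cost is paid on first use, not during the billed cold start:

```python
import importlib

_heavy = None

def get_heavy_module():
    # Import the heavy dependency on first use rather than at container
    # startup, keeping the cold-start window short. sqlite3 stands in
    # for any genuinely heavy import.
    global _heavy
    if _heavy is None:
        _heavy = importlib.import_module("sqlite3")
    return _heavy
```

Subsequent calls return the cached module, so only the first request on a fresh instance pays the import cost.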
Concurrency is your primary scaling lever. Unlike traditional FaaS where each function invocation gets its own instance, Cloud Run routes multiple concurrent requests to the same container. Set the concurrency limit based on your workload profile: high for I/O-bound services (web APIs waiting on databases), low for CPU-bound services (image processing). Getting concurrency right means fewer instances, lower cost, and more efficient resource usage.
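A back-of-the-envelope sizing sketch, using Little's law (in-flight requests ≈ request rate × latency), can show how the concurrency setting drives instance count. This is a rough planning heuristic, not an official Cloud Run formula:

```python
import math

def instances_needed(rps: float, avg_latency_s: float, concurrency: int) -> int:
    # Little's law: requests in flight ~= arrival rate * average latency.
    # Each instance absorbs up to `concurrency` of them.
    in_flight = rps * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# 500 req/s at 200 ms average latency with the default concurrency of 80:
instances_needed(500, 0.2, 80)   # -> 2 instances
# The same load with concurrency lowered to 10 for CPU-bound work:
instances_needed(500, 0.2, 10)   # -> 10 instances
```

Lowering concurrency for CPU-bound work trades more instances (and cost) for predictable per-request latency.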
Anti-Patterns
- Storing state on the local filesystem -- The in-memory filesystem resets between instances and is not shared. Use Cloud Storage, Firestore, or a database for any persistent data.
- Listening on localhost instead of 0.0.0.0 -- The container must bind to all network interfaces. Binding only to 127.0.0.1 makes the service unreachable from the Cloud Run ingress.
- Using bloated container images -- Large images increase cold start time and push costs up. Multi-stage builds and slim base images (node:20-slim, python:3.12-slim) make a measurable difference.
- Assuming sticky sessions or instance affinity -- Cloud Run load-balances requests across instances with no affinity. Any state that depends on hitting the same instance will break.
- Running long background work with default CPU allocation -- By default, CPU is throttled outside of request handling. Background tasks (cron jobs, queue workers) need --no-cpu-throttling (always-on CPU) to avoid being starved between requests.
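The state-related anti-patterns above share one fix: route all state through an external store rather than instance memory or disk. A minimal sketch of that shape, where `InMemoryStore` is a hypothetical stand-in for Firestore or Memorystore (an actual per-instance dict like this is exactly what breaks on Cloud Run):

```python
class InMemoryStore:
    # Stand-in for an external store (Firestore, Memorystore, Cloud SQL).
    # In production the handler must talk to a real external service.
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

def handle_request(store, session_id):
    # Read and update session state through the injected store, never
    # through module-level globals or the local filesystem.
    key = f"session:{session_id}:visits"
    count = (store.get(key) or 0) + 1
    store.put(key, count)
    return count
```

Because every instance talks to the same store, it no longer matters which instance the load balancer picks.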
Overview
Cloud Run is a managed compute platform that runs stateless containers invoked via HTTP requests or events. It abstracts away infrastructure management, automatically scales from zero to thousands of instances, and charges only for resources consumed during request handling.
Key capabilities:
- Run any container that listens on a port (HTTP/gRPC)
- Automatic horizontal scaling including scale-to-zero
- Revision-based deployments with traffic splitting
- Built-in HTTPS endpoints with custom domain support
- VPC connectivity, IAM-based access control, and secret integration
Setup & Configuration
Enable the API and authenticate
gcloud services enable run.googleapis.com
gcloud auth configure-docker
Deploy a container image
gcloud run deploy my-service \
--image gcr.io/PROJECT_ID/my-image:latest \
--region us-central1 \
--platform managed \
--allow-unauthenticated \
--port 8080 \
--memory 512Mi \
--cpu 1 \
--min-instances 0 \
--max-instances 10 \
--set-env-vars "ENV=production"
Deploy from source (Cloud Buildpacks)
gcloud run deploy my-service \
--source . \
--region us-central1
Configure secrets
gcloud run deploy my-service \
--image gcr.io/PROJECT_ID/my-image \
--set-secrets "DB_PASS=db-password:latest" \
--region us-central1
Set up a custom domain
gcloud beta run domain-mappings create \
--service my-service \
--domain my-app.example.com \
--region us-central1
Core Patterns
Dockerfile for Cloud Run
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
ENV PORT=8080
EXPOSE 8080
CMD ["node", "server.js"]
Listening on the correct port (Node.js)
const express = require('express');
const app = express();
const port = parseInt(process.env.PORT, 10) || 8080;

app.get('/', (req, res) => {
  res.send('Hello from Cloud Run');
});

app.listen(port, '0.0.0.0', () => {
  console.log(`Listening on port ${port}`);
});
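The same pattern in Python, using only the standard library for illustration (a real service would typically use Flask, FastAPI, or gunicorn, but the PORT and 0.0.0.0 rules are identical):

```python
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

def resolve_port(default=8080):
    # Cloud Run injects the port to listen on via the PORT env var.
    try:
        return int(os.environ.get("PORT", default))
    except (TypeError, ValueError):
        return default

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello from Cloud Run")

def main():
    # Bind to 0.0.0.0, not 127.0.0.1, or the Cloud Run ingress
    # cannot reach the container.
    HTTPServer(("0.0.0.0", resolve_port()), Handler).serve_forever()
```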
Traffic splitting between revisions
gcloud run services update-traffic my-service \
--to-revisions my-service-v2=90,my-service-v1=10 \
--region us-central1
IAM-authenticated service-to-service calls
import requests
import google.auth.transport.requests
import google.oauth2.id_token

def call_cloud_run_service(url):
    auth_req = google.auth.transport.requests.Request()
    id_token = google.oauth2.id_token.fetch_id_token(auth_req, url)
    headers = {"Authorization": f"Bearer {id_token}"}
    response = requests.get(url, headers=headers)
    return response.json()
Cloud Run Jobs (batch workloads)
gcloud run jobs create my-job \
--image gcr.io/PROJECT_ID/my-batch-image \
--region us-central1 \
--tasks 10 \
--max-retries 3
gcloud run jobs execute my-job --region us-central1
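Each task in a job execution receives the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables, which let tasks partition work without any coordination. A sketch of one common sharding scheme (round-robin by index):

```python
import os

def task_shard(items):
    # Cloud Run Jobs sets CLOUD_RUN_TASK_INDEX (0-based) and
    # CLOUD_RUN_TASK_COUNT on every task; each task takes the items
    # whose position maps to its index.
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", 0))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", 1))
    return [item for i, item in enumerate(items) if i % count == index]
```

With --tasks 10, ten copies of the container run this same code and each processes a disjoint tenth of the input.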
Connecting to Cloud SQL
gcloud run deploy my-service \
--image gcr.io/PROJECT_ID/my-image \
--add-cloudsql-instances PROJECT_ID:REGION:INSTANCE_NAME \
--set-env-vars "DB_HOST=/cloudsql/PROJECT_ID:REGION:INSTANCE_NAME"
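With --add-cloudsql-instances, Cloud Run mounts a Unix domain socket under /cloudsql/PROJECT:REGION:INSTANCE, and Postgres drivers such as psycopg accept that directory as the host. A sketch of building the connection string from the env vars set in the deploy command above (the DSN keyword format shown is the psycopg/libpq style; adjust for your driver):

```python
import os

def cloudsql_dsn(user, db):
    # DB_HOST matches the env var set at deploy time, e.g.
    # /cloudsql/PROJECT_ID:REGION:INSTANCE_NAME. Passing the socket
    # directory as `host` makes libpq-style drivers connect over the
    # mounted Unix socket instead of TCP.
    socket_dir = os.environ["DB_HOST"]
    password = os.environ.get("DB_PASS", "")
    return f"host={socket_dir} dbname={db} user={user} password={password}"
```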
Best Practices
- Set concurrency appropriately. Default is 80 concurrent requests per instance. Lower it for CPU-heavy workloads, raise it for I/O-bound services.
- Use min-instances to avoid cold starts. Setting --min-instances 1 keeps at least one instance warm for latency-sensitive endpoints.
- Keep container startup fast. Cloud Run bills from container start; lazy-load heavy dependencies and defer initialization where possible.
- Use structured logging. Write JSON logs to stdout so they integrate with Cloud Logging automatically.
- Set resource limits explicitly. Define --memory and --cpu based on profiling rather than relying on defaults.
- Use revision tags for testing. Tag revisions to get unique URLs for pre-production validation without routing live traffic.
- Store secrets in Secret Manager. Never bake secrets into images; mount them via --set-secrets.
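The structured-logging practice above needs no library: one JSON object per line on stdout is enough, and Cloud Logging recognizes the special `severity` field. A minimal sketch (`log_line` is an illustrative helper name, not a Cloud SDK function):

```python
import json
import sys

def log_line(severity, message, **fields):
    # One JSON object per line; Cloud Logging parses it and honors the
    # `severity` field, and any extra keys become queryable jsonPayload
    # fields.
    return json.dumps({"severity": severity, "message": message, **fields})

print(log_line("INFO", "request handled", path="/healthz", latency_ms=12),
      file=sys.stdout)
```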
Common Pitfalls
- Listening on localhost instead of 0.0.0.0. The container must bind to all interfaces, not just 127.0.0.1.
- Exceeding the request timeout. Default is 300 seconds. Long-running tasks should use Cloud Run Jobs or Cloud Tasks instead.
- Ignoring the ephemeral filesystem. The in-memory filesystem resets between instances. Use Cloud Storage or a database for persistence.
- Assuming sticky sessions. Requests are load-balanced across instances with no affinity. Store session state externally.
- Not setting CPU allocation policy. By default, CPU is throttled outside of request handling. Use --no-cpu-throttling (always-on CPU) for background work.
- Large container images. Bloated images increase cold start time. Use slim base images and multi-stage builds.