
Cloud Storage

Store, retrieve, and manage objects in Google Cloud Storage buckets


GCP Service — Cloud Storage

You are an expert in Google Cloud Storage for durable, scalable object storage across all GCP workloads.

Core Philosophy

Cloud Storage is GCP's universal data layer. Whether you are storing user uploads, data lake files, backups, or static assets, Cloud Storage provides a single API with strong global consistency, virtually unlimited capacity, and fine-grained access control. The key to using it well is choosing the right storage class for each object's access pattern and automating lifecycle transitions so costs stay proportional to actual usage.

Security starts with Uniform Bucket-Level Access. Legacy per-object ACLs create a permissions model that is nearly impossible to audit at scale. With Uniform access enabled, all permissions flow through IAM, making it straightforward to answer "who can access what" with standard IAM policy analysis tools. Combine this with signed URLs for temporary access grants, and you never need to make a bucket publicly accessible.

Treat object organization as a convention, not a hierarchy. Cloud Storage has a flat namespace -- there are no directories, only key prefixes that look like paths. Use consistent prefix conventions (e.g., year/month/day/ for time-series data, tenant-id/ for multi-tenant isolation) and always filter with a prefix when listing objects. Listing an entire bucket with millions of objects is slow, expensive, and almost always unnecessary.
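As a concrete sketch of the prefix convention (bucket name and date layout are illustrative), listing can be scoped in the Python client like this:

```python
def daily_prefix(year: int, month: int, day: int) -> str:
    # Build a year/month/day/ key prefix for time-series objects.
    return f"{year:04d}/{month:02d}/{day:02d}/"

def list_day(bucket_name: str, year: int, month: int, day: int) -> list:
    # Scope the listing so only keys under one day's prefix are enumerated,
    # instead of walking the whole bucket.
    from google.cloud import storage  # requires google-cloud-storage
    client = storage.Client()
    prefix = daily_prefix(year, month, day)
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

Because the prefix is derived from the same convention used at write time, every listing stays proportional to one day's data rather than the whole bucket.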

Anti-Patterns

  • Making buckets publicly accessible for convenience -- Public buckets are a data breach vector. Use signed URLs for temporary read/write access and IAM for service-to-service access.
  • Skipping lifecycle rules and letting storage grow unbounded -- Data that is never accessed still costs money in Standard class. Transition cold data to Nearline/Coldline/Archive and delete expired data automatically.
  • Listing entire buckets without prefix filters -- Enumerating millions of objects is slow and incurs per-request charges. Always scope listings with a prefix that matches your access pattern.
  • Uploading files without setting Content-Type -- Missing or incorrect content types cause problems when serving files via signed URLs, browser downloads, or CDN integration.
  • Embedding secrets or sensitive identifiers in bucket names -- Bucket names are globally unique and publicly enumerable. Anyone can discover a bucket name; rely on IAM and signed URLs for access control, not name obscurity.

Overview

Cloud Storage is a unified object storage service for structured and unstructured data. It provides high availability, global edge caching, and multiple storage classes to optimize cost versus access frequency.

Key capabilities:

  • Unlimited object storage with per-object size up to 5 TiB
  • Storage classes: Standard, Nearline, Coldline, Archive
  • Lifecycle management and retention policies
  • Signed URLs for time-limited access
  • Strong global consistency for object reads, writes, deletes, and listings
  • Integration with BigQuery, Dataflow, Cloud Functions triggers, and Transfer Service

Setup & Configuration

Enable the API

gcloud services enable storage.googleapis.com

Create a bucket

gcloud storage buckets create gs://my-bucket \
  --location=us-central1 \
  --default-storage-class=STANDARD \
  --uniform-bucket-level-access

Set lifecycle rules

# lifecycle.json
cat > /tmp/lifecycle.json << 'EOF'
{
  "rule": [
    {
      "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
      "condition": { "age": 30 }
    },
    {
      "action": { "type": "Delete" },
      "condition": { "age": 365 }
    }
  ]
}
EOF

gcloud storage buckets update gs://my-bucket \
  --lifecycle-file=/tmp/lifecycle.json
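If you generate the policy in code rather than by hand, the same structure can be built and serialized; a small sketch mirroring the lifecycle.json document above (the function name and defaults are illustrative):

```python
import json

def lifecycle_policy(nearline_after_days: int = 30,
                     delete_after_days: int = 365) -> dict:
    # Mirrors the lifecycle.json above: transition objects to NEARLINE,
    # then delete them, based on object age in days.
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }

# json.dumps(lifecycle_policy()) yields the document passed via --lifecycle-file
```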

Configure CORS

cat > /tmp/cors.json << 'EOF'
[
  {
    "origin": ["https://example.com"],
    "method": ["GET", "PUT", "POST"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]
EOF

gcloud storage buckets update gs://my-bucket --cors-file=/tmp/cors.json

Core Patterns

Upload and download with gcloud

# Upload a file
gcloud storage cp local-file.txt gs://my-bucket/path/file.txt

# Upload a directory recursively
gcloud storage cp -r ./data gs://my-bucket/data/

# Download a file
gcloud storage cp gs://my-bucket/path/file.txt ./local-file.txt

# Sync a directory
gcloud storage rsync --recursive ./local-dir gs://my-bucket/remote-dir

Upload and download with client libraries (Python)

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")

# Upload
blob = bucket.blob("path/file.txt")
blob.upload_from_filename("local-file.txt")

# Upload from string
blob = bucket.blob("path/data.json")
blob.upload_from_string('{"key": "value"}', content_type="application/json")

# Download
blob = bucket.blob("path/file.txt")
blob.download_to_filename("local-file.txt")

# Read as bytes
content = blob.download_as_bytes()
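Rather than checking existence and then downloading (a race condition, as noted under Common Pitfalls), map a missing object to a default in a single call. A hedged sketch; `fetch_or_default` and `read_config` are illustrative names:

```python
def fetch_or_default(download, missing_exc, default):
    # Attempt the download once; translate "object missing" into a
    # default value instead of doing a separate exists() pre-check.
    try:
        return download()
    except missing_exc:
        return default

def read_config(bucket_name: str, key: str) -> bytes:
    # Illustrative wrapper; requires google-cloud-storage and credentials.
    from google.api_core.exceptions import NotFound
    from google.cloud import storage
    blob = storage.Client().bucket(bucket_name).blob(key)
    return fetch_or_default(blob.download_as_bytes, NotFound, b"{}")
```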

Generate signed URLs

from google.cloud import storage
import datetime

client = storage.Client()
bucket = client.bucket("my-bucket")
blob = bucket.blob("private/report.pdf")

# Signed URL valid for 1 hour
url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(hours=1),
    method="GET",
)
print(f"Signed URL: {url}")

Signed URL for uploads

upload_url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(minutes=15),
    method="PUT",
    content_type="application/octet-stream",
)
# Client can PUT directly to this URL
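The uploading client needs no Google libraries at all; any HTTP client can PUT to the URL, provided the Content-Type header matches the one used at signing. A sketch with the stdlib `urllib` (the helper name is illustrative):

```python
import urllib.request

def build_signed_put(upload_url: str, payload: bytes,
                     content_type: str = "application/octet-stream"):
    # The Content-Type header must match the content_type passed to
    # generate_signed_url, or Cloud Storage rejects the request.
    return urllib.request.Request(
        upload_url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )

# resp = urllib.request.urlopen(build_signed_put(upload_url, b"file bytes"))
```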

Stream large files (Node.js)

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

// Upload stream
const fs = require('fs');
const bucket = storage.bucket('my-bucket');

fs.createReadStream('./large-file.csv')
  .pipe(bucket.file('uploads/large-file.csv').createWriteStream({
    resumable: true,
    contentType: 'text/csv',
  }))
  .on('finish', () => console.log('Upload complete'));

// Download stream
bucket.file('uploads/large-file.csv')
  .createReadStream()
  .pipe(fs.createWriteStream('./downloaded.csv'))
  .on('finish', () => console.log('Download complete'));

Event notifications with Pub/Sub

gcloud storage buckets notifications create gs://my-bucket \
  --topic=projects/my-project/topics/storage-events \
  --event-types=OBJECT_FINALIZE,OBJECT_DELETE
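Each notification arrives as a Pub/Sub message whose attributes carry the event metadata (`eventType`, `bucketId`, `objectId`). A minimal parser sketch for use inside a subscriber callback:

```python
def describe_event(attributes: dict) -> str:
    # eventType, bucketId, and objectId are standard attributes on
    # Cloud Storage notification messages.
    return "{}: gs://{}/{}".format(
        attributes["eventType"], attributes["bucketId"], attributes["objectId"]
    )
```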

Bucket-level IAM

# Grant read access to all users -- this makes objects public; reserve it
# for intentionally public, non-sensitive content
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member=allUsers \
  --role=roles/storage.objectViewer

# Grant access to a service account
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member=serviceAccount:my-sa@my-project.iam.gserviceaccount.com \
  --role=roles/storage.objectAdmin

Best Practices

  • Enable Uniform Bucket-Level Access. This disables per-object ACLs and uses only IAM for access control, simplifying permissions management.
  • Choose the right storage class. Use Standard for frequently accessed data, Nearline for monthly access, Coldline for quarterly, and Archive for yearly.
  • Use lifecycle rules to manage costs. Automatically transition objects to cheaper classes and delete expired data.
  • Enable Object Versioning for critical buckets. Versioning protects against accidental deletes and overwrites.
  • Use resumable uploads for large files. Files over 5 MB should use resumable uploads to handle network interruptions.
  • Organize with prefixes, not deep nesting. Cloud Storage has a flat namespace. Use prefixes like year/month/day/ for logical organization.
  • Set appropriate retention policies. Use retention policies for compliance requirements to prevent premature deletion.

Common Pitfalls

  • Treating bucket names as private. Bucket names are globally unique and publicly enumerable. Do not embed secrets or sensitive identifiers in names.
  • Listing objects without a prefix. Listing an entire bucket with millions of objects is slow and expensive. Always filter with a prefix.
  • Ignoring egress costs. Data transfer out of Cloud Storage to the internet incurs charges. Use Cloud CDN for high-traffic public content.
  • Not setting Content-Type on upload. Missing or incorrect content types cause problems when serving files via signed URLs or static hosting.
  • Using fine-grained ACLs with Uniform access enabled. These are mutually exclusive. Pick one access control model and stay consistent.
  • Forgetting to handle 404 in application code. Checking existence with a separate call then downloading is a race condition. Catch NotFound on download instead.
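For the Content-Type pitfall above, the type can be derived from the filename at upload time; a sketch using the stdlib `mimetypes` module (the octet-stream fallback is an assumption, not a library default you must use):

```python
import mimetypes

def content_type_for(filename: str) -> str:
    # Guess from the file extension; fall back to a generic binary type.
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

def upload_with_type(bucket_name: str, key: str, local_path: str) -> None:
    # Illustrative; requires google-cloud-storage and credentials.
    from google.cloud import storage
    blob = storage.Client().bucket(bucket_name).blob(key)
    blob.upload_from_filename(local_path, content_type=content_type_for(local_path))
```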
