# Cloud Storage
Store, retrieve, and manage objects in Google Cloud Storage buckets
You are an expert in Google Cloud Storage for durable, scalable object storage across all GCP workloads.
## Core Philosophy
Cloud Storage is GCP's universal data layer. Whether you are storing user uploads, data lake files, backups, or static assets, Cloud Storage provides a single API with strong global consistency, virtually unlimited capacity, and fine-grained access control. The key to using it well is choosing the right storage class for each object's access pattern and automating lifecycle transitions so costs stay proportional to actual usage.
Security starts with Uniform Bucket-Level Access. Legacy per-object ACLs create a permissions model that is nearly impossible to audit at scale. With Uniform access enabled, all permissions flow through IAM, making it straightforward to answer "who can access what" with standard IAM policy analysis tools. Combine this with signed URLs for temporary access grants, and you never need to make a bucket publicly accessible.
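Uniform access can also be enforced on a bucket that already exists, not just at creation time. A minimal sketch (the bucket name is a placeholder):

```shell
# Switch an existing bucket to IAM-only access control
gcloud storage buckets update gs://my-bucket \
  --uniform-bucket-level-access
```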
Treat object organization as a convention, not a hierarchy. Cloud Storage has a flat namespace -- there are no directories, only key prefixes that look like paths. Use consistent prefix conventions (e.g., year/month/day/ for time-series data, tenant-id/ for multi-tenant isolation) and always filter with a prefix when listing objects. Listing an entire bucket with millions of objects is slow, expensive, and almost always unnecessary.
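As a sketch of that convention (the names here are illustrative, not part of any API), a small helper that builds tenant-scoped, date-partitioned keys keeps every listing naturally prefix-filterable:

```python
from datetime import date

def object_key(tenant_id: str, day: date, filename: str) -> str:
    # Consistent convention: tenant-id/year/month/day/filename
    return f"{tenant_id}/{day.year:04d}/{day.month:02d}/{day.day:02d}/{filename}"

key = object_key("acme", date(2024, 3, 7), "events.json")
print(key)  # acme/2024/03/07/events.json

# Every listing is then scoped, e.g. with the Python client:
#   for blob in client.list_blobs("my-bucket", prefix="acme/2024/03/"):
#       ...
```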
## Anti-Patterns
- Making buckets publicly accessible for convenience -- Public buckets are a data breach vector. Use signed URLs for temporary read/write access and IAM for service-to-service access.
- Skipping lifecycle rules and letting storage grow unbounded -- Data that is never accessed still costs money in Standard class. Transition cold data to Nearline/Coldline/Archive and delete expired data automatically.
- Listing entire buckets without prefix filters -- Enumerating millions of objects is slow and incurs per-request charges. Always scope listings with a prefix that matches your access pattern.
- Uploading files without setting Content-Type -- Missing or incorrect content types cause problems when serving files via signed URLs, browser downloads, or CDN integration.
- Embedding secrets or sensitive identifiers in bucket names -- Bucket names are globally unique and publicly enumerable. Anyone can discover a bucket name; rely on IAM and signed URLs for access control, not name obscurity.
## Overview
Cloud Storage is a unified object storage service for structured and unstructured data. It provides high availability, global edge caching, and multiple storage classes to optimize cost versus access frequency.
Key capabilities:
- Unlimited object storage with per-object size up to 5 TiB
- Storage classes: Standard, Nearline, Coldline, Archive
- Lifecycle management and retention policies
- Signed URLs for time-limited access
- Strong, global read-after-write consistency for object operations and listings
- Integration with BigQuery, Dataflow, Cloud Functions triggers, and Transfer Service
## Setup & Configuration
### Enable the API

```bash
gcloud services enable storage.googleapis.com
```
Create a bucket
gcloud storage buckets create gs://my-bucket \
--location=us-central1 \
--default-storage-class=STANDARD \
--uniform-bucket-level-access
### Set lifecycle rules

```bash
# lifecycle.json
cat > /tmp/lifecycle.json << 'EOF'
{
  "rule": [
    {
      "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
      "condition": { "age": 30 }
    },
    {
      "action": { "type": "Delete" },
      "condition": { "age": 365 }
    }
  ]
}
EOF

gcloud storage buckets update gs://my-bucket \
  --lifecycle-file=/tmp/lifecycle.json
```
### Configure CORS

```bash
cat > /tmp/cors.json << 'EOF'
[
  {
    "origin": ["https://example.com"],
    "method": ["GET", "PUT", "POST"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]
EOF

gcloud storage buckets update gs://my-bucket --cors-file=/tmp/cors.json
```
## Core Patterns
### Upload and download with gcloud

```bash
# Upload a file
gcloud storage cp local-file.txt gs://my-bucket/path/file.txt

# Upload a directory recursively
gcloud storage cp -r ./data gs://my-bucket/data/

# Download a file
gcloud storage cp gs://my-bucket/path/file.txt ./local-file.txt

# Sync a directory
gcloud storage rsync -r ./local-dir gs://my-bucket/remote-dir
```
### Upload and download with client libraries (Python)

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")

# Upload
blob = bucket.blob("path/file.txt")
blob.upload_from_filename("local-file.txt")

# Upload from string
blob = bucket.blob("path/data.json")
blob.upload_from_string('{"key": "value"}', content_type="application/json")

# Download
blob = bucket.blob("path/file.txt")
blob.download_to_filename("local-file.txt")

# Read as bytes
content = blob.download_as_bytes()
```
### Generate signed URLs

```python
from google.cloud import storage
import datetime

client = storage.Client()
bucket = client.bucket("my-bucket")
blob = bucket.blob("private/report.pdf")

# Signed URL valid for 1 hour
url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(hours=1),
    method="GET",
)
print(f"Signed URL: {url}")
```
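Any HTTP client can then fetch the object without Google credentials until the URL expires. A sketch, where `$SIGNED_URL` stands in for the value printed above:

```shell
# Download the object via the time-limited signed URL (no auth headers needed)
curl -fSs -o report.pdf "$SIGNED_URL"
```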
### Signed URL for uploads

```python
upload_url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(minutes=15),
    method="PUT",
    content_type="application/octet-stream",
)
# Client can PUT directly to this URL; the request's Content-Type
# header must match the signed value
```
### Stream large files (Node.js)

```javascript
const { Storage } = require('@google-cloud/storage');
const fs = require('fs');

const storage = new Storage();
const bucket = storage.bucket('my-bucket');

// Upload stream
fs.createReadStream('./large-file.csv')
  .pipe(bucket.file('uploads/large-file.csv').createWriteStream({
    resumable: true,
    contentType: 'text/csv',
  }))
  .on('finish', () => console.log('Upload complete'));

// Download stream
bucket.file('uploads/large-file.csv')
  .createReadStream()
  .pipe(fs.createWriteStream('./downloaded.csv'))
  .on('finish', () => console.log('Download complete'));
```
### Event notifications with Pub/Sub

```bash
gcloud storage buckets notifications create gs://my-bucket \
  --topic=projects/my-project/topics/storage-events \
  --event-types=OBJECT_FINALIZE,OBJECT_DELETE
```
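Messages published to that topic carry the event details as Pub/Sub attributes, including `eventType`, `bucketId`, and `objectId`. A minimal dispatcher over those attributes might look like this sketch (the function name is illustrative):

```python
def handle_storage_event(attributes: dict) -> str:
    # Cloud Storage sets these Pub/Sub message attributes on each notification
    event = attributes.get("eventType")
    path = f"gs://{attributes.get('bucketId')}/{attributes.get('objectId')}"
    if event == "OBJECT_FINALIZE":
        return f"created {path}"  # a new object (or new version) was written
    if event == "OBJECT_DELETE":
        return f"deleted {path}"
    return f"ignored {event} for {path}"

print(handle_storage_event({
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "my-bucket",
    "objectId": "uploads/report.pdf",
}))  # created gs://my-bucket/uploads/report.pdf
```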
### Bucket-level IAM

```bash
# Grant read access to all users -- this makes the bucket public;
# prefer signed URLs unless the content is intentionally public
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member=allUsers \
  --role=roles/storage.objectViewer

# Grant access to a service account
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
  --member=serviceAccount:my-sa@my-project.iam.gserviceaccount.com \
  --role=roles/storage.objectAdmin
```
## Best Practices
- Enable Uniform Bucket-Level Access. This disables per-object ACLs and uses only IAM for access control, simplifying permissions management.
- Choose the right storage class. Use Standard for frequently accessed data, Nearline for monthly access, Coldline for quarterly, and Archive for yearly.
- Use lifecycle rules to manage costs. Automatically transition objects to cheaper classes and delete expired data.
- Enable Object Versioning for critical buckets. Versioning protects against accidental deletes and overwrites.
- Use resumable uploads for large files. Files over 5 MB should use resumable uploads to handle network interruptions.
- Organize with prefixes, not deep nesting. Cloud Storage has a flat namespace. Use prefixes like `year/month/day/` for logical organization.
- Set appropriate retention policies. Use retention policies for compliance requirements to prevent premature deletion.
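The versioning recommendation above is a one-flag change (sketch; bucket name is a placeholder):

```shell
# Keep noncurrent versions so accidental deletes/overwrites are recoverable
gcloud storage buckets update gs://my-bucket --versioning
```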
## Common Pitfalls
- Treating bucket names as private. Bucket names are globally unique and publicly enumerable. Do not embed secrets or sensitive identifiers in names.
- Listing objects without a prefix. Listing an entire bucket with millions of objects is slow and expensive. Always filter with a prefix.
- Ignoring egress costs. Data transfer out of Cloud Storage to the internet incurs charges. Use CDN (Cloud CDN) for high-traffic public content.
- Not setting Content-Type on upload. Missing or incorrect content types cause problems when serving files via signed URLs or static hosting.
- Using fine-grained ACLs with Uniform access enabled. These are mutually exclusive. Pick one access control model and stay consistent.
- Forgetting to handle 404 in application code. Checking existence with a separate call and then downloading is a race condition. Catch `NotFound` on the download instead.
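That pattern can be sketched as a small helper. The `NotFound` import is the client library's real exception class; the fallback stub (an assumption for portability) only exists so the sketch runs without the SDK installed:

```python
try:
    from google.api_core.exceptions import NotFound
except ImportError:
    # Stand-in so the sketch runs without google-cloud-storage installed
    class NotFound(Exception):
        pass

def read_object_or_default(bucket, name: str, default: bytes = b"") -> bytes:
    # Download directly and handle the miss; exists()-then-download is a race
    try:
        return bucket.blob(name).download_as_bytes()
    except NotFound:
        return default
```

With a real `google.cloud.storage` bucket, a missing object yields `default` instead of an unhandled exception, with no separate existence check.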