# AWS S3

AWS S3 object storage service for scalable, durable file and data storage.
You are an expert in Amazon S3 (Simple Storage Service) for cloud object storage, static hosting, and data lake architectures.
## Core Philosophy
S3 is the default storage layer for nearly everything in AWS -- application assets, backups, data lake files, logs, and static websites. Its durability is legendary (11 nines), but durability is not the same as security. Every bucket should start locked down: block all public access, enable versioning for critical data, encrypt at rest, and grant access through IAM policies rather than legacy ACLs. A bucket that is secure by default can always be opened selectively with presigned URLs; a bucket that starts open is a data breach waiting to happen.
Think in terms of object lifecycle, not just storage. Data that is hot today will be warm next month and cold next year. Lifecycle policies that transition objects through storage classes (Standard to Intelligent-Tiering to Glacier) and eventually expire them are not cost optimizations -- they are architectural hygiene. Without lifecycle management, S3 costs grow linearly forever while the business value of old objects decays.
Access should be temporary and scoped. Use presigned URLs for granting time-limited read or upload access to individual objects rather than making buckets public. Use IAM roles and bucket policies for service-to-service access. Avoid embedding AWS credentials in client-side code; instead, let your backend generate presigned URLs and hand them to the frontend.
## Anti-Patterns

- **Leaving public access enabled on buckets** -- Unless you are hosting a static website, public access is a security risk. Use the account-level and bucket-level public access blocks.
- **Skipping versioning on buckets with irreplaceable data** -- Without versioning, an accidental delete or overwrite is permanent. Versioning adds negligible cost but provides complete recovery capability.
- **Using ACLs instead of bucket policies and IAM** -- ACLs are a legacy access control mechanism that is harder to audit and reason about. AWS recommends disabling ACLs entirely with Object Ownership set to `BucketOwnerEnforced`.
- **Uploading large files without multipart** -- Files over 100 MB should use multipart upload for reliability. Files over 5 GB require it. The SDK's transfer utilities handle this automatically.
- **Listing entire buckets without prefix filtering** -- Calling `list_objects_v2` on a bucket with millions of objects is slow, expensive, and usually unnecessary. Always filter with a prefix.
## Overview
Amazon S3 provides virtually unlimited object storage with 99.999999999% (11 nines) durability. Objects are stored in buckets within regions, addressable via keys. S3 supports multiple storage classes (Standard, Intelligent-Tiering, Glacier, etc.), versioning, lifecycle policies, and event notifications.
## Setup & Configuration

### Create a Bucket (AWS CLI)

```bash
# Create a bucket in a specific region
aws s3api create-bucket \
  --bucket my-app-assets \
  --region us-east-1

# Create a bucket in a non-us-east-1 region (requires LocationConstraint)
aws s3api create-bucket \
  --bucket my-app-assets \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
```
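The same `LocationConstraint` rule applies when creating buckets from the SDK. As a minimal sketch (the helper name is ours, not part of boto3), build the `create_bucket` arguments conditionally, since `us-east-1` must *not* include a `CreateBucketConfiguration`:

```python
def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build create_bucket arguments; us-east-1 must NOT set a LocationConstraint."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Usage (client creation omitted):
# s3_client.create_bucket(**create_bucket_kwargs("my-app-assets", "eu-west-1"))
```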
### Block Public Access (default and recommended)

```bash
aws s3api put-public-access-block \
  --bucket my-app-assets \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```
### Enable Versioning

```bash
aws s3api put-bucket-versioning \
  --bucket my-app-assets \
  --versioning-configuration Status=Enabled
```
### SDK Setup (Node.js v3)

```javascript
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
```
### SDK Setup (Python boto3)

```python
import boto3

s3_client = boto3.client("s3", region_name="us-east-1")
s3_resource = boto3.resource("s3")
```
## Core Patterns
### Upload Objects

```python
import json

# Upload a file from disk
s3_client.upload_file("local-file.png", "my-app-assets", "images/photo.png")

# Upload bytes with metadata
s3_client.put_object(
    Bucket="my-app-assets",
    Key="data/report.json",
    Body=json.dumps(report).encode("utf-8"),
    ContentType="application/json",
    Metadata={"generated-by": "pipeline-v2"},
)
```

```javascript
// Node.js v3
await s3.send(new PutObjectCommand({
  Bucket: "my-app-assets",
  Key: "data/report.json",
  Body: JSON.stringify(report),
  ContentType: "application/json",
}));
```
### Download and Read Objects

```python
response = s3_client.get_object(Bucket="my-app-assets", Key="data/report.json")
body = response["Body"].read().decode("utf-8")
data = json.loads(body)
```
### Generate Presigned URLs

```python
url = s3_client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-app-assets", "Key": "images/photo.png"},
    ExpiresIn=3600,  # 1 hour
)
```

```javascript
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const url = await getSignedUrl(s3, new GetObjectCommand({
  Bucket: "my-app-assets",
  Key: "images/photo.png",
}), { expiresIn: 3600 });
```
### Lifecycle Rules

```bash
# Transition to Glacier after 90 days, delete after 365
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-app-assets \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'
```
### S3 Event Notifications to Lambda

```bash
aws s3api put-bucket-notification-configuration \
  --bucket my-app-assets \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}}
    }]
  }'
```
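On the Lambda side, the function receives the documented S3 notification event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`). A minimal hedged handler sketch, noting that keys arrive URL-encoded in the event:

```python
import urllib.parse

def handler(event, context=None):
    """Extract (bucket, key) pairs from an s3:ObjectCreated:* notification event."""
    objects = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event payload (e.g. spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```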
### Multipart Upload for Large Files

```python
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # 8 MB
    max_concurrency=10,
    multipart_chunksize=8 * 1024 * 1024,
)
s3_client.upload_file("large-file.zip", "my-app-assets", "backups/large.zip", Config=config)
```
## Best Practices

- **Always block public access** unless explicitly needed (e.g., static website hosting). Use presigned URLs for temporary access.
- **Enable versioning** on buckets containing critical data to protect against accidental deletes.
- **Use lifecycle policies** to transition infrequently accessed data to cheaper storage classes (Intelligent-Tiering, Glacier).
- **Encrypt at rest** using SSE-S3 (default), SSE-KMS for audit trails, or SSE-C for client-managed keys.
- **Use S3 Transfer Acceleration** for cross-region uploads from end users.
- **Randomize key prefixes** if making thousands of requests per second (though S3 now auto-partitions, this helps with very high throughput).
- **Enable server access logging** or use CloudTrail data events for audit.
- **Use bucket policies and IAM** for access control rather than ACLs (ACLs are legacy).
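The key-prefix advice above can be made concrete. One common approach (the helper below is a hypothetical sketch, not part of any SDK) is to prepend a short, deterministic hash of the key, so lexically sequential keys spread across partitions:

```python
import hashlib

def prefixed_key(key: str) -> str:
    """Prepend a deterministic hash-derived shard prefix, e.g. 'a3/logs/2024/x.json'."""
    shard = hashlib.md5(key.encode("utf-8")).hexdigest()[:2]
    return f"{shard}/{key}"
```

The prefix is derived from the key itself, so readers can recompute it without a lookup table.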
## Common Pitfalls

- **Forgetting `LocationConstraint`** when creating buckets outside `us-east-1` causes errors.
- **Not handling pagination when listing objects:** `list_objects_v2` returns max 1000 keys per call; always check `IsTruncated` and use `ContinuationToken`.
- **Presigned URL region mismatch:** The S3 client region must match the bucket region, or presigned URLs will return `AuthorizationHeaderMalformed`.
- **Eventual consistency confusion:** S3 now provides strong read-after-write consistency, but older documentation may reference eventual consistency.
- **Large file uploads without multipart:** Uploads over 5 GB require multipart. Use `TransferConfig` or the CLI `aws s3 cp`, which handles this automatically.
- **Bucket name global uniqueness:** Bucket names are globally unique across all AWS accounts. Deleted bucket names are not immediately reusable.
- **Cost surprise from GET requests:** S3 charges per request. Scanning millions of objects with `list_objects` can be expensive; consider S3 Inventory for large-scale analysis.
## Related Skills

- **API Gateway** -- AWS API Gateway for building, deploying, and managing RESTful and WebSocket APIs
- **CloudFormation** -- AWS CloudFormation infrastructure-as-code for provisioning and managing AWS resources declaratively
- **Cognito** -- AWS Cognito user authentication and authorization for web and mobile applications
- **DynamoDB** -- AWS DynamoDB NoSQL database for high-performance key-value and document workloads
- **ECS & Fargate** -- AWS ECS and Fargate for running containerized applications without managing servers
- **RDS & Aurora** -- AWS RDS and Aurora managed relational databases for production SQL workloads