
# S3

AWS S3 object storage service for scalable, durable file and data storage

## Quick Summary
You are an expert in Amazon S3 (Simple Storage Service) for cloud object storage, static hosting, and data lake architectures.

## Key Points

- **Leaving public access enabled on buckets** -- Unless you are hosting a static website, public access is a security risk. Use the account-level and bucket-level public access blocks.
- **Uploading large files without multipart** -- Files over 100 MB should use multipart upload for reliability. Files over 5 GB require it. The SDK's transfer utilities handle this automatically.
- **Listing entire buckets without prefix filtering** -- Calling `list_objects_v2` on a bucket with millions of objects is slow, expensive, and usually unnecessary. Always filter with a prefix.
- **Always block public access** unless explicitly needed (e.g., static website hosting). Use presigned URLs for temporary access.
- **Enable versioning** on buckets containing critical data to protect against accidental deletes.
- **Use lifecycle policies** to transition infrequently accessed data to cheaper storage classes (Intelligent-Tiering, Glacier).
- **Encrypt at rest** using SSE-S3 (default), SSE-KMS for audit trails, or SSE-C for client-managed keys.
- **Use S3 Transfer Acceleration** for cross-region uploads from end users.
- **Randomize key prefixes** if making thousands of requests per second (though S3 now auto-partitions, this helps with very high throughput).
- **Enable server access logging** or use CloudTrail data events for audit.
- **Use bucket policies and IAM** for access control rather than ACLs (ACLs are legacy).
- **Forgetting `LocationConstraint`** when creating buckets outside `us-east-1` causes errors.

## Quick Example

```bash
aws s3api put-public-access-block \
  --bucket my-app-assets \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```

```bash
aws s3api put-bucket-versioning \
  --bucket my-app-assets \
  --versioning-configuration Status=Enabled
```

# AWS S3 — Cloud Services

You are an expert in Amazon S3 (Simple Storage Service) for cloud object storage, static hosting, and data lake architectures.

## Core Philosophy

S3 is the default storage layer for nearly everything in AWS -- application assets, backups, data lake files, logs, and static websites. Its durability is legendary (11 nines), but durability is not the same as security. Every bucket should start locked down: block all public access, enable versioning for critical data, encrypt at rest, and grant access through IAM policies rather than legacy ACLs. A bucket that is secure by default can always be opened selectively with presigned URLs; a bucket that starts open is a data breach waiting to happen.

Think in terms of object lifecycle, not just storage. Data that is hot today will be warm next month and cold next year. Lifecycle policies that transition objects through storage classes (Standard to Intelligent-Tiering to Glacier) and eventually expire them are not cost optimizations -- they are architectural hygiene. Without lifecycle management, S3 costs grow linearly forever while the business value of old objects decays.

Access should be temporary and scoped. Use presigned URLs for granting time-limited read or upload access to individual objects rather than making buckets public. Use IAM roles and bucket policies for service-to-service access. Avoid embedding AWS credentials in client-side code; instead, let your backend generate presigned URLs and hand them to the frontend.

## Anti-Patterns

- **Leaving public access enabled on buckets** -- Unless you are hosting a static website, public access is a security risk. Use the account-level and bucket-level public access blocks.
- **Skipping versioning on buckets with irreplaceable data** -- Without versioning, an accidental delete or overwrite is permanent. Versioning adds negligible cost but provides complete recovery capability.
- **Using ACLs instead of bucket policies and IAM** -- ACLs are a legacy access control mechanism that is harder to audit and reason about. AWS recommends disabling ACLs entirely with Object Ownership set to BucketOwnerEnforced.
- **Uploading large files without multipart** -- Files over 100 MB should use multipart upload for reliability. Files over 5 GB require it. The SDK's transfer utilities handle this automatically.
- **Listing entire buckets without prefix filtering** -- Calling `list_objects_v2` on a bucket with millions of objects is slow, expensive, and usually unnecessary. Always filter with a prefix.
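To disable ACLs as recommended above, set Object Ownership on the bucket (bucket name illustrative):

```bash
aws s3api put-bucket-ownership-controls \
  --bucket my-app-assets \
  --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
```

With BucketOwnerEnforced, all ACLs are ignored and access is governed solely by policies and IAM.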

## Overview

Amazon S3 provides virtually unlimited object storage with 99.999999999% (11 nines) durability. Objects are stored in buckets within regions, addressable via keys. S3 supports multiple storage classes (Standard, Intelligent-Tiering, Glacier, etc.), versioning, lifecycle policies, and event notifications.

## Setup & Configuration

### Create a Bucket (AWS CLI)

```bash
# Create a bucket in a specific region
aws s3api create-bucket \
  --bucket my-app-assets \
  --region us-east-1

# Create a bucket in a region other than us-east-1 (requires LocationConstraint)
aws s3api create-bucket \
  --bucket my-app-assets \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
```

### Block Public Access (default and recommended)

```bash
aws s3api put-public-access-block \
  --bucket my-app-assets \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```

### Enable Versioning

```bash
aws s3api put-bucket-versioning \
  --bucket my-app-assets \
  --versioning-configuration Status=Enabled
```

### SDK Setup (Node.js v3)

```javascript
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });
```

### SDK Setup (Python boto3)

```python
import boto3

s3_client = boto3.client("s3", region_name="us-east-1")
s3_resource = boto3.resource("s3")
```

## Core Patterns

### Upload Objects

```python
import json

# Upload a file
s3_client.upload_file("local-file.png", "my-app-assets", "images/photo.png")

# Upload bytes with metadata
s3_client.put_object(
    Bucket="my-app-assets",
    Key="data/report.json",
    Body=json.dumps(report).encode("utf-8"),
    ContentType="application/json",
    Metadata={"generated-by": "pipeline-v2"},
)
```

```javascript
// Node.js v3
await s3.send(new PutObjectCommand({
  Bucket: "my-app-assets",
  Key: "data/report.json",
  Body: JSON.stringify(report),
  ContentType: "application/json",
}));
```

### Download and Read Objects

```python
response = s3_client.get_object(Bucket="my-app-assets", Key="data/report.json")
body = response["Body"].read().decode("utf-8")
data = json.loads(body)
```

### Generate Presigned URLs

```python
url = s3_client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-app-assets", "Key": "images/photo.png"},
    ExpiresIn=3600,  # 1 hour
)
```

```javascript
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const url = await getSignedUrl(s3, new GetObjectCommand({
  Bucket: "my-app-assets",
  Key: "images/photo.png",
}), { expiresIn: 3600 });
```

### Lifecycle Rules

```bash
# Transition to Glacier after 90 days, delete after 365
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-app-assets \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'
```

### S3 Event Notifications to Lambda

```bash
aws s3api put-bucket-notification-configuration \
  --bucket my-app-assets \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}}
    }]
  }'
```
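On the Lambda side, the event payload carries the bucket name and a URL-encoded object key for each record. A minimal handler sketch (the handler body is an assumption, only the event shape is standard):

```python
from urllib.parse import unquote_plus

def handler(event, context):
    processed = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded: spaces become "+", etc.
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append(f"s3://{bucket}/{key}")
    return processed
```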

### Multipart Upload for Large Files

```python
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # 8 MB
    max_concurrency=10,
    multipart_chunksize=8 * 1024 * 1024,
)
s3_client.upload_file("large-file.zip", "my-app-assets", "backups/large.zip", Config=config)
```

## Best Practices

- **Always block public access** unless explicitly needed (e.g., static website hosting). Use presigned URLs for temporary access.
- **Enable versioning** on buckets containing critical data to protect against accidental deletes.
- **Use lifecycle policies** to transition infrequently accessed data to cheaper storage classes (Intelligent-Tiering, Glacier).
- **Encrypt at rest** using SSE-S3 (default), SSE-KMS for audit trails, or SSE-C for client-managed keys.
- **Use S3 Transfer Acceleration** for cross-region uploads from end users.
- **Randomize key prefixes** if making thousands of requests per second (though S3 now auto-partitions, this helps with very high throughput).
- **Enable server access logging** or use CloudTrail data events for audit.
- **Use bucket policies and IAM** for access control rather than ACLs (ACLs are legacy).
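For the SSE-KMS recommendation above, default bucket encryption can be configured like this (the key alias is a placeholder; `BucketKeyEnabled` reduces per-object KMS request costs):

```bash
aws s3api put-bucket-encryption \
  --bucket my-app-assets \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "alias/my-app-key"
      },
      "BucketKeyEnabled": true
    }]
  }'
```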

## Common Pitfalls

- **Forgetting `LocationConstraint`** when creating buckets outside `us-east-1` -- the request fails with an `IllegalLocationConstraintException`.
- **Not handling pagination when listing objects** -- `list_objects_v2` returns at most 1,000 keys per call; always check `IsTruncated` and pass `ContinuationToken`, or use a paginator.
- **Presigned URL region mismatch** -- the S3 client region must match the bucket region, or presigned URLs will fail with errors such as `AuthorizationHeaderMalformed`.
- **Eventual consistency confusion** -- S3 now provides strong read-after-write consistency, but older documentation may still reference eventual consistency.
- **Large file uploads without multipart** -- a single PUT is limited to 5 GB; larger uploads require multipart. Use `TransferConfig` or the CLI's `aws s3 cp`, which handles this automatically.
- **Bucket name global uniqueness** -- bucket names are globally unique across all AWS accounts, and a deleted bucket's name is not immediately reusable.
- **Cost surprises from request charges** -- S3 charges per request. Scanning millions of objects with `list_objects_v2` can be expensive; consider S3 Inventory for large-scale analysis.

Install this skill directly: skilldb add aws-services-skills
