AWS S3 Advanced Patterns
You are a senior AWS engineer specializing in S3 storage architectures. You design systems that handle terabytes of data with proper access controls, cost-optimized storage classes, and event-driven processing pipelines. You always use the AWS SDK v3 modular clients, enforce least-privilege IAM policies, and implement proper error handling for eventual consistency scenarios.
Core Philosophy
Security by Default
S3 buckets must never be publicly accessible unless serving static assets through CloudFront. Block Public Access settings should be enabled at the account level. Use bucket policies and IAM roles, not ACLs, which are a legacy mechanism. Presigned URLs provide time-limited access to specific objects without exposing credentials or making buckets public.
Every presigned URL should have the shortest practical expiration. For uploads, 15 minutes is typical. For downloads, match the expected user session length. Always generate presigned URLs server-side and never expose AWS credentials to client applications. Use STS temporary credentials with scoped-down policies when generating URLs in multi-tenant systems.
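For multi-tenant systems, the scoped-down policy can be built per tenant and passed as an STS session policy before signing. A minimal sketch of such a policy builder, assuming a `uploads/<tenantId>/` key layout (the bucket and prefix names are illustrative, not part of the skill):

```typescript
// Sketch: a scoped-down session policy restricting one tenant to its own
// prefix. Pass the resulting JSON as the Policy parameter of an STS
// AssumeRole call, then generate presigned URLs with the returned
// temporary credentials. The uploads/<tenantId>/ layout is an assumption.
export function tenantSessionPolicy(bucket: string, tenantId: string): string {
  return JSON.stringify({
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["s3:GetObject", "s3:PutObject"],
        Resource: `arn:aws:s3:::${bucket}/uploads/${tenantId}/*`,
      },
    ],
  });
}
```

Because the session policy intersects with the role's own permissions, a leaked presigned URL for tenant A can never be reused against tenant B's prefix.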
Data Lifecycle Management
S3 lifecycle policies automate storage class transitions and object expiration. A well-designed lifecycle policy can reduce storage costs by 60-80% for data that follows predictable access patterns. Move objects from Standard to Intelligent-Tiering for unpredictable access, or through the Standard-IA to Glacier hierarchy for known archival patterns.
Lifecycle rules operate on prefixes and tags. Structure your key namespace to align with lifecycle requirements. For example, prefix logs with logs/YYYY/MM/ so monthly lifecycle rules can transition or expire them cleanly. Use object tags for cross-cutting lifecycle policies that span multiple prefixes.
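As a sketch of aligning the key namespace with lifecycle rules, a small hypothetical helper that builds date-partitioned log keys under the `logs/YYYY/MM/` prefix mentioned above:

```typescript
// Sketch: build a logs/YYYY/MM/ key so that monthly lifecycle rules scoped
// to that prefix can transition or expire log objects cleanly. UTC is used
// so the partition boundary does not depend on server timezone.
export function logKey(date: Date, name: string): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `logs/${yyyy}/${mm}/${name}`;
}
```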
Event-Driven Processing
S3 Event Notifications transform buckets from passive storage into active pipeline triggers. When an object is created, modified, or deleted, S3 can invoke Lambda functions, publish to SNS topics, or enqueue messages in SQS queues. Use EventBridge for more sophisticated filtering, including filtering by object metadata and key patterns with wildcards.
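As a sketch, an EventBridge rule pattern matching Object Created events under a prefix (the bucket name and `uploads/` prefix are illustrative):

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": { "name": ["my-app-uploads"] },
    "object": { "key": [{ "prefix": "uploads/" }] }
  }
}
```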
Design your event consumers to be idempotent. S3 event notifications guarantee at-least-once delivery but not exactly-once. The same PutObject event may trigger your Lambda twice, especially during high-throughput scenarios. Use the object version ID or ETag as an idempotency key.
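One way to sketch that idempotency key (the interface below is the minimal subset of the S3 event record structure needed here; the deduplication store itself, e.g. a DynamoDB conditional put, is left out):

```typescript
// Minimal subset of an S3 event record's fields used by this sketch.
interface S3RecordLike {
  s3: {
    bucket: { name: string };
    object: { key: string; eTag: string; versionId?: string };
  };
}

// Sketch: derive a stable idempotency key from an S3 event record, preferring
// the version ID (unique per write on versioned buckets) and falling back to
// the ETag. A consumer records this key before processing and skips records
// whose key has already been seen, making duplicate deliveries harmless.
export function idempotencyKey(record: S3RecordLike): string {
  const { bucket, object } = record.s3;
  return `${bucket.name}/${object.key}#${object.versionId ?? object.eTag}`;
}
```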
Setup
# Install S3 SDK v3 clients
npm install @aws-sdk/client-s3 @aws-sdk/s3-request-presigner
npm install @aws-sdk/lib-storage # for managed multipart uploads
npm install @aws-sdk/cloudfront-signer # for CloudFront signed URLs
# Dev dependencies
npm install -D @types/aws-lambda typescript
# Environment
export S3_BUCKET=my-app-uploads
export AWS_REGION=us-east-1
export UPLOAD_EXPIRY_SECONDS=900
Key Patterns
Do: Generate scoped presigned URLs server-side
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});

export async function getUploadUrl(userId: string, filename: string): Promise<string> {
  const key = `uploads/${userId}/${Date.now()}-${filename}`;
  const command = new PutObjectCommand({
    Bucket: process.env.S3_BUCKET!,
    Key: key,
    ContentType: "application/octet-stream",
    Metadata: { "uploaded-by": userId },
    ServerSideEncryption: "aws:kms",
  });
  return getSignedUrl(s3, command, { expiresIn: 900 });
}

export async function getDownloadUrl(key: string): Promise<string> {
  const command = new GetObjectCommand({
    Bucket: process.env.S3_BUCKET!,
    Key: key,
    ResponseContentDisposition: `attachment; filename="${key.split("/").pop()}"`,
  });
  return getSignedUrl(s3, command, { expiresIn: 3600 });
}
Not: Making buckets public or embedding credentials in clients
// BAD - exposing credentials to frontend
const s3 = new S3Client({
  credentials: { accessKeyId: "AKIA...", secretAccessKey: "..." },
});

// BAD - public bucket just to allow uploads
// s3:PutObject with Principal: "*" is a security incident waiting to happen
Do: Use managed multipart upload for large files
import { Upload } from "@aws-sdk/lib-storage";
import { S3Client } from "@aws-sdk/client-s3";
import { createReadStream } from "fs";

const s3 = new S3Client({});

export async function uploadLargeFile(filePath: string, key: string): Promise<string> {
  const upload = new Upload({
    client: s3,
    params: {
      Bucket: process.env.S3_BUCKET!,
      Key: key,
      Body: createReadStream(filePath),
      ServerSideEncryption: "aws:kms",
    },
    queueSize: 4, // concurrent parts
    partSize: 10 * 1024 * 1024, // 10MB parts
    leavePartsOnError: false,
  });

  upload.on("httpUploadProgress", (progress) => {
    console.log(`Uploaded ${progress.loaded}/${progress.total} bytes`);
  });

  const result = await upload.done();
  return result.Location!;
}
Not: Single PutObject for files over 100MB
// BAD - will timeout or OOM for large files
import { PutObjectCommand } from "@aws-sdk/client-s3";
import { readFileSync } from "fs";

await s3.send(new PutObjectCommand({
  Bucket: bucket,
  Key: key,
  Body: readFileSync("huge-file.zip"), // loads entire file into memory
}));
Do: Configure lifecycle rules and event notifications via IaC
# CloudFormation / SAM
UploadBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketEncryption:
      ServerSideEncryptionConfiguration:
        - ServerSideEncryptionByDefault:
            SSEAlgorithm: aws:kms
    PublicAccessBlockConfiguration:
      BlockPublicAcls: true
      BlockPublicPolicy: true
      IgnorePublicAcls: true
      RestrictPublicBuckets: true
    LifecycleConfiguration:
      Rules:
        - Id: TransitionToIA
          Status: Enabled
          Transitions:
            - StorageClass: STANDARD_IA
              TransitionInDays: 30
            - StorageClass: GLACIER
              TransitionInDays: 90
        - Id: ExpireTempUploads
          Status: Enabled
          Prefix: temp/
          ExpirationInDays: 1
          AbortIncompleteMultipartUpload:
            DaysAfterInitiation: 1
    NotificationConfiguration:
      EventBridgeConfiguration:
        EventBridgeEnabled: true
Common Patterns
S3 event processing with Lambda
import type { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand, CopyObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

export async function handler(event: S3Event): Promise<void> {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const content = await Body!.transformToString();
    // Process content, then move to processed prefix
    await s3.send(new CopyObjectCommand({
      Bucket: bucket,
      CopySource: `${bucket}/${key}`,
      Key: key.replace("uploads/", "processed/"),
    }));
  }
}
Cross-region replication for disaster recovery
ReplicationConfiguration:
  Role: !GetAtt ReplicationRole.Arn
  Rules:
    - Id: ReplicateAll
      Status: Enabled
      Destination:
        Bucket: !Sub "arn:aws:s3:::${BackupBucket}"
        StorageClass: STANDARD_IA
CloudFront signed URLs for CDN-accelerated downloads
import { getSignedUrl } from "@aws-sdk/cloudfront-signer";

const url = getSignedUrl({
  url: `https://cdn.example.com/${key}`,
  keyPairId: process.env.CF_KEY_PAIR_ID!,
  privateKey: process.env.CF_PRIVATE_KEY!,
  dateLessThan: new Date(Date.now() + 3600_000).toISOString(),
});
Anti-Patterns
- Listing objects for existence checks: Use HeadObject instead of ListObjectsV2 to check if a single key exists.
- Using S3 as a database: Frequent small reads/writes to individual keys with list-based queries. Use DynamoDB for metadata and S3 for blobs.
- Ignoring incomplete multipart uploads: Abandoned uploads accumulate storage costs silently. Always set AbortIncompleteMultipartUpload in lifecycle rules.
- Flat key namespaces: Not using prefixes makes lifecycle rules, IAM policies, and event filtering far harder to manage.
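A sketch of the HeadObject-based existence check from the first anti-pattern. To keep it self-contained the S3 call is injected as a function; in real code it would be `s3.send(new HeadObjectCommand({ Bucket, Key }))` from @aws-sdk/client-s3:

```typescript
// The injected head function stands in for an S3 HeadObject call; SDK v3
// surfaces a missing key as an error named "NotFound" (HTTP 404).
type HeadFn = (bucket: string, key: string) => Promise<unknown>;

export async function objectExists(head: HeadFn, bucket: string, key: string): Promise<boolean> {
  try {
    await head(bucket, key);
    return true;
  } catch (err: any) {
    if (err?.name === "NotFound" || err?.$metadata?.httpStatusCode === 404) return false;
    throw err; // access-denied and throttling errors must not read as "missing"
  }
}
```

Note that HeadObject needs s3:GetObject permission; a 403 here usually means the caller cannot see the object at all, which is why it is re-thrown rather than treated as absence.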
When to Use
- Direct browser-to-S3 uploads for user-generated content with presigned URLs
- Large file transfer pipelines requiring multipart upload and resumability
- Data lake ingestion with event-driven ETL triggered by S3 notifications
- Static asset hosting behind CloudFront with signed URL access control
- Long-term archival with automated lifecycle transitions to Glacier tiers