
DynamoDB

AWS DynamoDB NoSQL database for high-performance key-value and document workloads

Quick Summary

You are an expert in Amazon DynamoDB for designing and operating NoSQL databases with single-digit millisecond latency at any scale.

## Key Points

- **Ignoring pagination in query results** -- Query and Scan return at most 1 MB per call. Failing to check `LastEvaluatedKey` and loop silently drops data beyond the first page.
- **Storing large blobs directly in items** -- The 400 KB item size limit and per-item RCU cost make DynamoDB a poor fit for large payloads. Store blobs in S3 and reference them by key.
- **Design for access patterns first.** Model your table around queries, not entities. Use single-table design when entities share access patterns.
- **Use on-demand billing** for unpredictable workloads and provisioned capacity with auto scaling for steady-state traffic.
- **Keep items small**, well under the 400 KB item size limit. Store large blobs in S3 and reference them by key.
- **Use sparse indexes**: GSI items only appear if the GSI key attributes exist, so omit them to exclude items from the index.
- **Use `ProjectionExpression`** to retrieve only needed attributes, reducing read costs and latency.
- **Enable Point-in-Time Recovery (PITR)** for production tables.
- **Use TTL** for automatically expiring temporary data (sessions, caches) at no extra cost.
- **Hot partitions**: A single partition key receiving disproportionate traffic throttles that partition. Distribute writes across partition keys.
- **Scan is expensive**: `scan` reads every item in the table. Always prefer `query` with a key condition. If you need scan, use parallel scan with `Segment`/`TotalSegments`.
- **GSI throttling propagates**: If a GSI is throttled (provisioned mode), writes to the base table are also throttled. Ensure GSI capacity matches write patterns.

## Quick Example

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Orders")
```

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({ region: "us-east-1" });
const ddb = DynamoDBDocumentClient.from(client);
```


AWS DynamoDB — Cloud Services

## Core Philosophy

DynamoDB demands you think about access patterns before you write a single line of code. Unlike relational databases where you normalize data and add queries later, DynamoDB requires you to model your table around how your application reads and writes data. Start with a list of access patterns, then design your partition key, sort key, and GSIs to serve those patterns efficiently. The schema follows the queries, not the other way around.

Single-table design is the default approach for applications where entities share access patterns. Storing users, orders, and products in one table with composite keys (e.g., PK=USER#alice, SK=ORDER#2024-01-15) enables fetching related data in a single query with no joins. This reduces the number of tables to manage, minimizes round trips, and keeps costs low. Multi-table design is appropriate when entities have completely independent access patterns or vastly different throughput requirements.

Every write should assume failure. Use conditional expressions for optimistic concurrency control, idempotency keys for retryable operations, and transactions when multiple items must be updated atomically. DynamoDB's default eventually consistent reads are sufficient for most use cases. Use strongly consistent reads when you need read-after-write guarantees, but only on the base table or an LSI: GSIs do not support them.

## Anti-Patterns

- **Designing tables around entities instead of access patterns** -- Normalizing data into separate tables, as in a relational database, leads to slow, expensive Scan operations and cross-table lookups that DynamoDB is not built for.
- **Using Scan as a query mechanism** -- Scan reads every item in the table, so its cost grows with table size. Prefer Query with a key condition. A recurring need for Scan is a sign that the data model needs rethinking.
- **Choosing low-cardinality partition keys** -- Keys like `status` or `country` concentrate traffic on a few partitions, causing throttling. Partition keys should have high cardinality and even distribution.
- **Ignoring pagination in query results** -- Query and Scan return at most 1 MB per call. Failing to check `LastEvaluatedKey` and loop silently drops data beyond the first page.
- **Storing large blobs directly in items** -- The 400 KB item size limit and per-item RCU cost make DynamoDB a poor fit for large payloads. Store blobs in S3 and reference them by key.

## Overview

DynamoDB is a fully managed NoSQL key-value and document database. Tables have a primary key (partition key, or partition key + sort key). Data access patterns must be designed upfront; secondary indexes (GSI/LSI) provide alternative query paths. DynamoDB supports on-demand and provisioned capacity modes, DynamoDB Streams for change data capture, and transactions.

## Setup & Configuration

### Create a Table (AWS CLI)

```bash
aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions \
    AttributeName=PK,AttributeType=S \
    AttributeName=SK,AttributeType=S \
  --key-schema \
    AttributeName=PK,KeyType=HASH \
    AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```

### SDK Setup (Python boto3)

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Orders")
```

### SDK Setup (Node.js v3)

```javascript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({ region: "us-east-1" });
const ddb = DynamoDBDocumentClient.from(client);
```

## Core Patterns

### Single-Table Design

```python
from decimal import Decimal

# Store multiple entity types in one table using PK/SK patterns
# User entity
table.put_item(Item={
    "PK": "USER#alice",
    "SK": "PROFILE",
    "name": "Alice",
    "email": "alice@example.com",
})

# Order entity belonging to user
table.put_item(Item={
    "PK": "USER#alice",
    "SK": "ORDER#2024-01-15#ord-001",
    # The boto3 resource API rejects Python floats; numbers must be Decimal
    "total": Decimal("59.99"),
    "status": "shipped",
})

# Query all orders for a user (returns only the first page, up to 1 MB)
response = table.query(
    KeyConditionExpression=Key("PK").eq("USER#alice") & Key("SK").begins_with("ORDER#"),
)
orders = response["Items"]
```
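
Because a single Query call returns at most 1 MB of data, the `orders` list above may hold only the first page of results. The following is a minimal sketch of a paging loop, not a definitive implementation. `query_all_pages` is a hypothetical helper name, and the sketch assumes only that the table object exposes a boto3-style `query(**kwargs)` method.

```python
from typing import Any, Dict, Iterator


def query_all_pages(table: Any, **query_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item matching a Query, following LastEvaluatedKey."""
    kwargs = dict(query_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = table.query(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no LastEvaluatedKey means this was the final page
        # Resume the next call where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key
```

With the `table` and `Key` objects from the setup section, a call might look like `orders = list(query_all_pages(table, KeyConditionExpression=Key("PK").eq("USER#alice")))`. The same loop shape applies to Scan.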

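### Batch Operations

```python
# `items` is assumed to be an iterable of item dicts prepared elsewhere
with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)
# batch_writer buffers puts into 25-item batches and resends unprocessed items
```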

### Transactions

```python
# The low-level client takes DynamoDB JSON, so every value carries a type tag
dynamodb_client = boto3.client("dynamodb")
dynamodb_client.transact_write_items(
    TransactItems=[
        {
            # Move the order to "shipped" only if it is still "processing"
            "Update": {
                "TableName": "Orders",
                "Key": {"PK": {"S": "USER#alice"}, "SK": {"S": "ORDER#ord-001"}},
                "UpdateExpression": "SET #s = :new_status",
                "ConditionExpression": "#s = :expected",
                "ExpressionAttributeNames": {"#s": "status"},
                "ExpressionAttributeValues": {
                    ":new_status": {"S": "shipped"},
                    ":expected": {"S": "processing"},
                },
            }
        },
        {
            # Create the shipment record in the same atomic operation
            "Put": {
                "TableName": "Orders",
                "Item": {
                    "PK": {"S": "SHIPMENT#ship-099"},
                    "SK": {"S": "ORDER#ord-001"},
                    "carrier": {"S": "UPS"},
                },
            }
        },
    ]
)
```
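
If any action's condition fails, DynamoDB cancels the whole transaction and boto3 raises `TransactionCanceledException`. Its response carries a `CancellationReasons` list with one entry per action, in request order, where actions that did not fail report the code `"None"`. The sketch below shows one way to find the failing actions; `failed_actions` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def failed_actions(error_response: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (action index, reason code) for each action that failed."""
    reasons = error_response.get("CancellationReasons", [])
    return [
        (index, reason["Code"])
        for index, reason in enumerate(reasons)
        if reason.get("Code", "None") != "None"
    ]


# Usage sketch (needs boto3 and AWS credentials, so it is left commented out):
# try:
#     dynamodb_client.transact_write_items(TransactItems=transact_items)
# except dynamodb_client.exceptions.TransactionCanceledException as exc:
#     print(failed_actions(exc.response))
```

For the transaction above, a stale status on the order would surface as index 0 with the code `ConditionalCheckFailed`.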

### Global Secondary Index

```bash
aws dynamodb update-table \
  --table-name Orders \
  --attribute-definitions AttributeName=GSI1PK,AttributeType=S AttributeName=GSI1SK,AttributeType=S \
  --global-secondary-index-updates '[{
    "Create": {
      "IndexName": "GSI1",
      "KeySchema": [
        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
        {"AttributeName": "GSI1SK", "KeyType": "RANGE"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    }
  }]'
```

```python
# Query the GSI
response = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("STATUS#shipped") & Key("GSI1SK").begins_with("2024-01"),
)
```
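
A GSI contains only the items that carry its key attributes, so writing `GSI1PK` and `GSI1SK` on a subset of items keeps the index sparse. The sketch below illustrates the idea under stated assumptions: `order_item` and `INDEXED_STATUSES` are hypothetical names, and the `STATUS#...` and date-prefixed key layout is inferred from the query above.

```python
from typing import Dict

# Assumption: only shipped orders need to be reachable through GSI1
INDEXED_STATUSES = {"shipped"}


def order_item(user_id: str, order_id: str, order_date: str, status: str) -> Dict[str, str]:
    """Build an order item, adding GSI1 keys only for indexed statuses."""
    item = {
        "PK": f"USER#{user_id}",
        "SK": f"ORDER#{order_date}#{order_id}",
        "status": status,
    }
    if status in INDEXED_STATUSES:
        # Items without these attributes never appear in GSI1
        item["GSI1PK"] = f"STATUS#{status}"
        item["GSI1SK"] = f"{order_date}#{order_id}"
    return item
```

An item built with `status="processing"` is stored in the base table only, so it adds no write or storage cost to the index.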

### DynamoDB Streams + Lambda

```bash
aws lambda create-event-source-mapping \
  --function-name process-order-changes \
  --event-source-arn arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2024-01-01T00:00:00.000 \
  --starting-position LATEST \
  --batch-size 100
```
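
On the Lambda side, each invocation receives a batch of stream records whose images are encoded as DynamoDB JSON. The following is a minimal sketch of what a function such as `process-order-changes` might do. It assumes the stream view type is `NEW_AND_OLD_IMAGES`, and the "order became shipped" rule is a hypothetical example rather than anything the table requires.

```python
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Collect the sort keys of orders whose status changed to 'shipped'."""
    shipped: List[str] = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # ignore INSERT and REMOVE events in this sketch
        change = record.get("dynamodb", {})
        # Images use DynamoDB JSON, e.g. {"status": {"S": "shipped"}}
        old = change.get("OldImage", {}).get("status", {}).get("S")
        new = change.get("NewImage", {}).get("status", {}).get("S")
        if old != "shipped" and new == "shipped":
            shipped.append(change["Keys"]["SK"]["S"])
    return {"shipped": shipped}
```

Lambda can redeliver a batch after a failure, so real processing logic should be idempotent.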

### Conditional Writes (Optimistic Locking)

```python
table.update_item(
    Key={"PK": "PRODUCT#sku-100", "SK": "INVENTORY"},
    UpdateExpression="SET quantity = quantity - :dec",
    ConditionExpression="quantity >= :dec",
    ExpressionAttributeValues={":dec": 1},
)
```
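
The update above guards a single attribute. A common variant of optimistic locking instead keeps a version counter on the item and only writes if the counter still matches what the caller last read. The sketch below builds the arguments for such an update. The `version` attribute and the `versioned_update_kwargs` helper are assumptions for illustration, not part of the table defined earlier.

```python
from typing import Any, Dict


def versioned_update_kwargs(key: Dict[str, str], new_quantity: int, expected_version: int) -> Dict[str, Any]:
    """Build update_item arguments that succeed only if the version is unchanged."""
    return {
        "Key": key,
        # Write the new value and bump the version in the same update
        "UpdateExpression": "SET quantity = :qty, #v = :next",
        # Fail if another writer has changed the item since it was read
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#v": "version"},
        "ExpressionAttributeValues": {
            ":qty": new_quantity,
            ":next": expected_version + 1,
            ":expected": expected_version,
        },
    }
```

A caller would pass the result as `table.update_item(**kwargs)`. When the condition fails, boto3 raises a `ClientError` with the code `ConditionalCheckFailedException`, and the usual response is to re-read the item and retry.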

## Best Practices

- **Design for access patterns first.** Model your table around queries, not entities. Use single-table design when entities share access patterns.
- **Use on-demand billing** for unpredictable workloads and provisioned capacity with auto scaling for steady-state traffic.
- **Keep items small**, well under the 400 KB item size limit. Store large blobs in S3 and reference them by key.
- **Use sparse indexes**: GSI items only appear if the GSI key attributes exist, so omit them to exclude items from the index.
- **Use `ProjectionExpression`** to retrieve only needed attributes, reducing read costs and latency.
- **Enable Point-in-Time Recovery (PITR)** for production tables.
- **Use TTL** for automatically expiring temporary data (sessions, caches) at no extra cost.
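
TTL works off a Number attribute holding an expiry time in Unix epoch seconds, and it must be enabled on the table for that attribute. The sketch below shows one way to stamp session items. The attribute name `expires_at`, the `SESSION#` key layout, and both helper names are assumptions for illustration.

```python
import time
from typing import Any, Dict, Optional


def session_item(session_id: str, lifetime_seconds: int, now: Optional[float] = None) -> Dict[str, Any]:
    """Build a session item whose TTL attribute is in epoch seconds."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "PK": f"SESSION#{session_id}",
        "SK": "SESSION",
        "expires_at": issued_at + lifetime_seconds,  # hypothetical TTL attribute
    }


def is_expired(item: Dict[str, Any], now: Optional[float] = None) -> bool:
    """Treat items past their expiry as gone, even if not yet deleted."""
    current = int(time.time()) if now is None else int(now)
    return int(item["expires_at"]) <= current
```

TTL deletion runs in the background and is not immediate, which is why the sketch includes a read-side check such as `is_expired`.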

## Common Pitfalls

- **Hot partitions**: A single partition key receiving disproportionate traffic throttles that partition. Distribute writes across partition keys.
- **Scan is expensive**: `scan` reads every item in the table. Prefer `query` with a key condition. If you need scan, use parallel scan with `Segment`/`TotalSegments`.
- **GSI throttling propagates**: If a GSI is throttled (provisioned mode), writes to the base table are also throttled. Ensure GSI capacity matches write patterns.
- **Forgetting pagination**: `query` and `scan` return at most 1 MB per call. Always check for `LastEvaluatedKey` and loop.
- **Reserved words in expressions**: Attribute names such as `name`, `status`, and `data` collide with DynamoDB reserved words. Alias them with `ExpressionAttributeNames` (e.g., `#s` for `status`).
- **Transaction limits**: A transaction supports at most 100 actions and 4 MB of aggregate item size. All items in a transaction must be in the same AWS account and Region.
- **Misunderstanding eventually consistent reads**: By default, reads are eventually consistent. Use `ConsistentRead=True` for strong consistency (2x read cost, not available on GSIs).
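
One common way to relieve a hot partition is write sharding: append a deterministic suffix to the partition key so that writes spread across several key values, then query every suffix when reading. The sketch below is illustrative only. The helper names and the default shard count of 10 are assumptions, and the right count depends on the workload.

```python
import hashlib
from typing import List


def sharded_partition_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Derive a stable shard suffix from the item id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shard_count: int = 10) -> List[str]:
    """List every partition key a reader must query to see all items."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]
```

The trade-off is on the read side: fetching everything under `STATUS#shipped` now takes one Query per shard key, with the results merged by the caller.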
