
SQL to NoSQL — Migration Patterns

You are an expert in migrating data from relational SQL databases to NoSQL stores (document, key-value, wide-column) for flexible schemas, horizontal scaling, and optimized read patterns.

Core Philosophy

Overview

Moving from SQL to NoSQL is not a drop-in replacement. It requires rethinking data models around access patterns rather than normalization rules. The migration works best when you identify specific workloads that benefit from NoSQL characteristics (high write throughput, flexible schemas, horizontal scaling) and migrate those selectively, rather than replacing the entire relational database.

Migration Strategy

  1. Identify Access Patterns — list every query the application makes and group by read/write frequency.
  2. Design Document Models — denormalize data to match the most common access patterns.
  3. Dual-Write Phase — write to both SQL and NoSQL simultaneously; read from SQL.
  4. Shadow Read Phase — read from NoSQL, compare results with SQL, log discrepancies.
  5. Cut Over — switch reads to NoSQL, stop writing to SQL, decommission old tables.
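These phases can be driven by a single configuration value, so each transition is a config change rather than a deploy. A minimal sketch (the phase names and repository interface are illustrative, not part of any particular library):

```typescript
// Hypothetical phase flag controlling reads and writes during the migration.
type MigrationPhase = 'sql-only' | 'dual-write' | 'shadow-read' | 'cut-over';

interface UserRepo {
  getUser(id: string): Promise<unknown>;
}

// Reads stay on SQL until cut-over; the shadow-read phase compares both
// stores but still serves SQL results.
function chooseReadRepo(phase: MigrationPhase, sqlRepo: UserRepo, mongoRepo: UserRepo): UserRepo {
  return phase === 'cut-over' ? mongoRepo : sqlRepo;
}

// NoSQL receives writes from the dual-write phase onward.
function shouldWriteMongo(phase: MigrationPhase): boolean {
  return phase !== 'sql-only';
}
```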

Step-by-Step Guide

1. Analyze access patterns

```sql
-- Identify the top queries by frequency (PostgreSQL example)
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
```

Common pattern: "Get user with their recent orders" — this involves a JOIN in SQL but maps naturally to an embedded document in NoSQL.

2. Design the document model

SQL (normalized):

```sql
-- users table
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255) UNIQUE
);

-- orders table
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT REFERENCES users(id),
  total DECIMAL(10,2),
  status VARCHAR(20),
  created_at TIMESTAMP DEFAULT NOW()
);

-- order_items table
CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_name VARCHAR(200),
  quantity INT,
  price DECIMAL(10,2)
);
```

NoSQL (denormalized document in MongoDB):

```js
// users collection
{
  _id: ObjectId("..."),
  name: "Jane Smith",
  email: "jane@example.com",
  recentOrders: [
    {
      orderId: "ord_001",
      total: 79.98,
      status: "delivered",
      createdAt: ISODate("2025-11-15T10:30:00Z"),
      items: [
        { productName: "Keyboard", quantity: 1, price: 49.99 },
        { productName: "Mouse Pad", quantity: 1, price: 29.99 }
      ]
    }
  ]
}
```
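With this model, "get user with their recent orders" becomes a single `findOne()` instead of a three-table JOIN. A sketch of building that read, assuming the collection layout above (the helper name and projection choice are illustrative):

```typescript
// Hypothetical helper: the single-document read that replaces the
// users/orders/order_items JOIN under the embedded model.
function recentOrdersQuery(email: string) {
  return {
    filter: { email },
    options: {
      // Project only the summary fields most callers need.
      projection: {
        name: 1,
        'recentOrders.orderId': 1,
        'recentOrders.total': 1,
        'recentOrders.status': 1,
      },
    },
  };
}

// Usage (illustrative):
//   const { filter, options } = recentOrdersQuery('jane@example.com');
//   const user = await usersCol.findOne(filter, options);
```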

3. Write the migration script

```typescript
import { Pool } from 'pg';
import { MongoClient } from 'mongodb';

const pg = new Pool({ connectionString: process.env.PG_URL });
const mongo = new MongoClient(process.env.MONGO_URL!);

async function migrateUsers() {
  await mongo.connect();
  const usersCol = mongo.db('app').collection('users');

  const { rows: users } = await pg.query('SELECT * FROM users');

  for (const user of users) {
    const { rows: orders } = await pg.query(
      `SELECT o.*, json_agg(json_build_object(
          'productName', oi.product_name,
          'quantity', oi.quantity,
          'price', oi.price
        )) AS items
       FROM orders o
       JOIN order_items oi ON oi.order_id = o.id
       WHERE o.user_id = $1
       GROUP BY o.id
       ORDER BY o.created_at DESC
       LIMIT 10`,
      [user.id]
    );

    await usersCol.insertOne({
      _id: user.id,
      name: user.name,
      email: user.email,
      recentOrders: orders.map(o => ({
        orderId: `ord_${o.id}`,
        total: parseFloat(o.total),
        status: o.status,
        createdAt: o.created_at,
        items: o.items,
      })),
    });
  }

  console.log(`Migrated ${users.length} users`);
}

// Run the migration and release both connection pools when done.
migrateUsers()
  .catch(err => { console.error(err); process.exitCode = 1; })
  .finally(() => Promise.all([mongo.close(), pg.end()]));
```
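`SELECT * FROM users` loads the whole table into memory, which only works for small tables. For larger ones, keyset pagination keeps the working set bounded. A sketch of the loop, with the row fetcher injected so the pattern stands on its own (the fetcher would wrap something like `SELECT * FROM users WHERE id > $1 ORDER BY id LIMIT $2`; the function name and batch size are illustrative):

```typescript
interface Row { id: number }

// Keyset pagination: each batch resumes after the last id seen, so memory
// stays bounded regardless of table size and no OFFSET scan is needed.
async function forEachUserBatch(
  fetchBatch: (afterId: number, limit: number) => Promise<Row[]>,
  handle: (batch: Row[]) => Promise<void>,
  limit = 500
): Promise<number> {
  let afterId = 0;
  let total = 0;
  for (;;) {
    const batch = await fetchBatch(afterId, limit);
    if (batch.length === 0) return total; // table exhausted
    await handle(batch);
    total += batch.length;
    afterId = batch[batch.length - 1].id; // keyset cursor
  }
}
```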

4. Implement dual-write in the application

```typescript
class OrderService {
  async createOrder(userId: string, items: OrderItem[]): Promise<Order> {
    // Write to SQL (source of truth during migration)
    const order = await this.sqlRepo.createOrder(userId, items);

    // Write to NoSQL (async, non-blocking)
    this.mongoRepo.appendOrder(userId, order).catch(err => {
      logger.error('NoSQL dual-write failed', { userId, orderId: order.id, err });
      this.syncQueue.enqueue({ userId, orderId: order.id }); // retry later
    });

    return order;
  }
}
```
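The syncQueue above needs a consumer. A minimal in-process sketch with a retry cap (illustrative: a production system would typically use a durable queue, and the repair function would re-read the order from SQL and upsert it into Mongo):

```typescript
interface SyncTask { userId: string; orderId: string; attempts?: number }

// Drain failed dual-writes with a retry cap, so one poisoned record cannot
// loop forever; tasks that exhaust their retries land in a dead-letter list.
async function drainSyncQueue(
  queue: SyncTask[],
  repair: (t: SyncTask) => Promise<void>,
  maxAttempts = 3
): Promise<SyncTask[]> {
  const deadLetter: SyncTask[] = [];
  while (queue.length > 0) {
    const task = queue.shift()!;
    try {
      await repair(task);
    } catch {
      const attempts = (task.attempts ?? 0) + 1;
      if (attempts >= maxAttempts) deadLetter.push({ ...task, attempts });
      else queue.push({ ...task, attempts }); // re-enqueue for another try
    }
  }
  return deadLetter;
}
```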

5. Shadow reads with comparison

```typescript
async function getUserWithOrders(userId: string) {
  const sqlResult = await sqlRepo.getUserWithOrders(userId);
  const mongoResult = await mongoRepo.getUserWithOrders(userId);

  if (!deepEqual(sqlResult, mongoResult)) {
    logger.warn('Data mismatch', { userId, sql: sqlResult, mongo: mongoResult });
    metrics.increment('migration.mismatch');
  }

  return sqlResult; // still serving from SQL during shadow phase
}
```
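A naive deepEqual tends to produce false mismatches from representation-only differences: key order, or a `Date` on the SQL side versus an already-serialized ISO string on the Mongo side. One way to keep the mismatch metric meaningful is to normalize both results before comparing (a sketch; the set of normalizations is an assumption about your data):

```typescript
// Normalize representation-only differences so shadow-read comparisons
// only flag real divergence: sort object keys, stringify Dates.
function normalize(value: unknown): unknown {
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) return value.map(normalize);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, normalize(v)])
    );
  }
  return value;
}

// Compare via stable JSON so key order never produces a false mismatch.
function sameRecord(a: unknown, b: unknown): boolean {
  return JSON.stringify(normalize(a)) === JSON.stringify(normalize(b));
}
```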

Best Practices

  • Model data around queries, not entities. Ask "how will this data be read?" before designing documents.
  • Embed data that is always read together; reference data that is read independently or is very large.
  • Set document size budgets — MongoDB has a 16 MB document limit; unbounded arrays will eventually hit it.
  • Use change data capture (Debezium, DynamoDB Streams) for real-time sync during the dual-write phase.
  • Keep the SQL database as a fallback until NoSQL has been validated in production for at least one full business cycle.
  • Index the fields you query on. NoSQL without indexes is slower than SQL for filtered reads.
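For the document model in step 2, "index the fields you query on" might look like the specs below; `createIndex` is idempotent, so applying them can run at every deploy. (The status index assumes a secondary admin access pattern, which is an assumption for illustration, not something the skill prescribes.)

```typescript
// Index specs for the users collection, derived from its access patterns.
// The email index backs point lookups; the embedded-field index backs
// filtering users by the status of an embedded order (assumed pattern).
const userIndexes: { keys: Record<string, 1 | -1>; options: { unique?: boolean } }[] = [
  { keys: { email: 1 }, options: { unique: true } },
  { keys: { 'recentOrders.status': 1 }, options: {} },
];

// Applying them with the official driver (illustrative):
//   for (const { keys, options } of userIndexes) {
//     await db.collection('users').createIndex(keys, options);
//   }
```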

Common Pitfalls

  • Lifting and shifting the relational model — storing one document per SQL row with foreign key references defeats the purpose. Denormalize intentionally.
  • Unbounded embedded arrays — embedding all orders inside a user document works for 10 orders but breaks at 10,000. Cap embedded arrays and use a separate collection for overflow.
  • Losing transactional guarantees — SQL ACID transactions do not exist in the same way in most NoSQL systems. Design for eventual consistency or use multi-document transactions where supported.
  • Migrating everything — not all workloads benefit from NoSQL. Reporting, analytics, and complex joins are often better served by SQL. Use the right tool for each workload.
  • Ignoring secondary access patterns — optimizing for the primary read path but forgetting that admin dashboards need different queries leads to expensive table scans.
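For the unbounded-array pitfall above, MongoDB's `$push` with `$each`, `$position`, and `$slice` gives a built-in cap: the newest entry goes to the front and anything past the cap is dropped, while the full history lives in a separate orders collection. A sketch (the cap of 10 mirrors the `LIMIT 10` in the migration script; the helper name is illustrative):

```typescript
// Build an update that prepends a new order summary and trims the embedded
// array to `cap` entries, keeping the user document's size bounded.
function appendRecentOrder(order: object, cap = 10) {
  return {
    $push: {
      recentOrders: { $each: [order], $position: 0, $slice: cap },
    },
  };
}

// Usage (illustrative):
//   await usersCol.updateOne({ _id: userId }, appendRecentOrder(orderSummary));
```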

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
