Database Query Optimization

Optimize database queries through indexing, query planning, N+1 elimination, connection pooling, and schema design.

# Database Query Optimization — Performance Optimization

You are an expert in database query optimization, focused on improving application performance.

## Core Philosophy

### Overview

Database queries are frequently the slowest component in a request lifecycle. A single unoptimized query can take seconds and consume disproportionate server resources. Query optimization involves analyzing execution plans, designing proper indexes, restructuring queries, and managing connections to achieve consistent sub-100ms response times.

### Core Concepts

- **Execution plan:** The database engine's strategy for resolving a query. Use `EXPLAIN ANALYZE` (PostgreSQL, MySQL 8.0+) or `EXPLAIN` to see whether the engine performs sequential scans, index scans, or hash joins.
- **Indexing:** B-tree indexes reduce lookup time from O(n) to O(log n). A missing index on a filtered or joined column is the most common cause of slow queries.
- **N+1 problem:** Executing one query to fetch a list, then one query per item to fetch related data. 100 items = 101 queries.
- **Connection pooling:** Reusing database connections instead of opening a new one per request. Connection creation overhead is typically 20-50ms.
- **Query selectivity:** The ratio of rows returned to total rows. High selectivity (few rows matched) benefits most from indexing.
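Selectivity can be measured directly with two counts. A minimal sketch using Python's built-in `sqlite3` driver (the `events` table and its row distribution are illustrative, not from the original):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER)")
# 10,000 rows spread evenly across 100 users
conn.executemany(
    "INSERT INTO events (user_id) VALUES (?)",
    [(i % 100,) for i in range(10_000)],
)

total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
matched = conn.execute(
    "SELECT COUNT(*) FROM events WHERE user_id = ?", (42,)
).fetchone()[0]

# A small ratio means the predicate is highly selective: a good index candidate
print(f"selectivity: {matched / total:.2%}")  # selectivity: 1.00%
```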

## Implementation Patterns

### Identify Slow Queries with EXPLAIN ANALYZE

```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id;
```

Look for: `Seq Scan` on large tables, high actual time, and row estimates that differ significantly from actual counts.
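SQLite exposes the same idea through `EXPLAIN QUERY PLAN`, which makes the scan-versus-index-search distinction easy to demonstrate in a self-contained script (the schema is illustrative; exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")

# Without an index on created_at, the filter forces a full table scan
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE created_at > '2025-01-01'"
).fetchall()
print(plan[0][3])  # e.g. "SCAN users"

conn.execute("CREATE INDEX idx_users_created ON users (created_at)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE created_at > '2025-01-01'"
).fetchall()
print(plan[0][3])  # e.g. "SEARCH users USING INDEX idx_users_created (created_at>?)"
```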

### Add Targeted Indexes

Before (sequential scan on 10M rows):

```sql
SELECT * FROM events WHERE user_id = 42 AND created_at > '2025-06-01';
-- Seq Scan: 3200ms
```

After (composite index):

```sql
CREATE INDEX idx_events_user_created ON events (user_id, created_at);
-- Index Scan: 2ms
```
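Column order in a composite index matters: the index above serves queries that filter on `user_id` alone, but not queries that filter only on `created_at`. A hedged sketch using `sqlite3` (the `events` schema here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER,"
    " created_at TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_events_user_created ON events (user_id, created_at)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# The leading column alone can use the composite index...
print(plan("SELECT * FROM events WHERE user_id = 42"))
# ...but the trailing column alone falls back to a full scan
print(plan("SELECT * FROM events WHERE created_at > '2025-06-01'"))
```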

### Eliminate N+1 Queries

Before:

```python
users = db.query("SELECT * FROM users LIMIT 100")
for user in users:
    orders = db.query("SELECT * FROM orders WHERE user_id = %s", [user.id])
    # 101 queries total
```

After:

```python
from collections import defaultdict

users = db.query("SELECT * FROM users LIMIT 100")
user_ids = [u.id for u in users]
orders = db.query("SELECT * FROM orders WHERE user_id = ANY(%s)", [user_ids])
# 2 queries total

orders_by_user = defaultdict(list)
for order in orders:
    orders_by_user[order.user_id].append(order)
```
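The batched pattern can be run end to end against SQLite, which takes `IN` with expanded placeholders instead of PostgreSQL's `ANY`. A self-contained sketch (tables and rows are illustrative):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]  # query 1

# Placeholders are generated from the id count, never from user input
placeholders = ",".join("?" * len(user_ids))
orders = conn.execute(                                               # query 2
    f"SELECT id, user_id, total FROM orders WHERE user_id IN ({placeholders})",
    user_ids,
).fetchall()

orders_by_user = defaultdict(list)
for order_id, user_id, total in orders:
    orders_by_user[user_id].append((order_id, total))

print(dict(orders_by_user))  # {1: [(1, 9.5), (2, 3.0)], 2: [(3, 7.25)]}
```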

### Connection Pooling

```javascript
// Node.js with pg (pg-pool)
const { Pool } = require('pg');
const pool = new Pool({
  max: 20,                       // max concurrent connections
  idleTimeoutMillis: 30000,      // close idle connections after 30s
  connectionTimeoutMillis: 2000, // fail fast if no connection is available
});

async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // return the connection to the pool
  }
}
```
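The mechanism stripped to its core: a fixed set of connections created once, handed out on demand, and returned after use. A minimal Python sketch (a real application would use its driver's built-in pool rather than this hand-rolled class):

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal fixed-size pool: connections are created once and reused."""

    def __init__(self, size, factory):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=2.0):
        # Blocks if the pool is exhausted; raises queue.Empty after timeout
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)  # return the connection for reuse

# Note: sqlite3 connections are single-thread by default; this demo
# stays on one thread, and queue.Queue itself is thread-safe.
pool = ConnectionPool(size=5, factory=lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
try:
    conn.execute("SELECT 1")
finally:
    pool.release(conn)
```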

### Pagination with a Cursor-Based Approach

Before (offset pagination degrades with large offsets):

```sql
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 100000;
-- Scans and discards 100,000 rows
```

After (cursor-based, where 100000 is the last id seen on the previous page):

```sql
SELECT * FROM products WHERE id > 100000 ORDER BY id LIMIT 20;
-- Seeks directly to the cursor position via the primary-key index
```
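Cursor (keyset) pagination threads the last-seen id from one page into the next request. A runnable sketch with `sqlite3` (the `products` table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [(f"p{i}",) for i in range(1, 101)],
)

def page(after_id, limit=20):
    # Seek past the cursor via the primary-key index; no rows are discarded
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()

first = page(after_id=0)
second = page(after_id=first[-1][0])  # cursor = last id of the previous page
print(first[0][0], first[-1][0], second[0][0])  # 1 20 21
```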

## Measurement & Monitoring

- **Slow query log:** Enable PostgreSQL's `log_min_duration_statement = 100` to log any query taking over 100ms.
- **pg_stat_statements:** Tracks cumulative query statistics — total execution time, call count, average duration. Identifies the queries consuming the most total server time.
- **Connection pool metrics:** Monitor active connections, queue depth, and wait time. If the queue grows, increase the pool size or optimize long-running queries.
- **Application-level tracing:** Use APM tools (Datadog, New Relic) to trace database time as a percentage of total request time.
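A typical top-offenders query against `pg_stat_statements` looks like the sketch below (column names assume PostgreSQL 13+, where `total_time` was renamed `total_exec_time`; the extension must be in `shared_preload_libraries` and created in the database):

```sql
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by `total_exec_time` surfaces the queries consuming the most cumulative server time, which is often a cheap frequent query rather than the slowest single one.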

## Best Practices

- Index columns used in `WHERE`, `JOIN ON`, and `ORDER BY` clauses on tables with more than 10,000 rows.
- Select only the columns you need instead of `SELECT *` to reduce I/O and memory usage.
- Batch related queries into a single round-trip using `ANY()`, `IN()`, or CTEs to eliminate N+1 patterns.

## Common Pitfalls

- Adding indexes without considering write overhead — each index slows down `INSERT`, `UPDATE`, and `DELETE` operations. Aim for no more than 5-7 indexes per table.
- Using `OFFSET` for pagination on large datasets, which forces the database to scan and discard rows linearly, degrading performance as page numbers increase.

## Anti-Patterns

- **Over-engineering for hypothetical scale.** Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
- **Ignoring the existing ecosystem.** Reinventing functionality that mature libraries already provide wastes time and introduces unnecessary risk.
- **Premature abstraction.** Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
- **Neglecting error handling at boundaries.** Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
- **Skipping documentation for obvious code.** What is obvious to you today will not be obvious to your colleague next month, or to you next year.
