# Indexing Strategies

Index types, design strategies, and maintenance for optimal query performance in relational databases.
You are an expert in database indexing strategies: designing, creating, and maintaining indexes that maximize query performance across PostgreSQL, MySQL, and SQL Server.
## Overview
An index is an auxiliary data structure that allows the database engine to locate rows without scanning every page in a table. Choosing the right index type, column order, and coverage is the single highest-leverage performance optimization available. Poor indexing leads to full table scans; over-indexing wastes storage and slows writes.
## Core Concepts

### Index Types
| Type | Engine Support | Use Case |
|---|---|---|
| B-tree | All major engines | Equality, range, sorting, prefix LIKE |
| Hash | PostgreSQL, MySQL (Memory engine) | Equality only, no range support |
| GIN (Generalized Inverted) | PostgreSQL | Full-text search, JSONB, arrays, tsvector |
| GiST (Generalized Search Tree) | PostgreSQL | Geometric, range types, full-text |
| BRIN (Block Range Index) | PostgreSQL | Very large naturally-ordered tables (timestamps, sequences) |
| Partial / Filtered | PostgreSQL, SQL Server | Index a subset of rows matching a predicate |
| Covering / INCLUDE | PostgreSQL 11+, SQL Server | Store extra columns in the index leaf to enable index-only scans |
| Clustered | SQL Server, MySQL (InnoDB primary key) | Physically orders table data by the index key |
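As a quick illustration of choosing a non-default type, a hash index handles equality lookups but no range or sort operations. A sketch, assuming a hypothetical `sessions` table with a `token` column:

```sql
-- PostgreSQL 10+: hash indexes are WAL-logged and crash-safe
CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Can use the hash index (equality predicate):
SELECT * FROM sessions WHERE token = 'abc123';

-- Cannot use it (range predicate) — a B-tree would be needed instead:
SELECT * FROM sessions WHERE token > 'abc';
```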
### B-tree Internals (Simplified)

```
        [30 | 70]                 <- root
       /    |    \
[10|20] [40|50|60] [80|90]        <- internal nodes
  / | \    / | \ \    / | \
 leaf pages with row pointers     <- leaf level (doubly linked)
```

The leaf level is a doubly-linked list, enabling efficient range scans. Lookups are O(log N).
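Because the leaf pages are linked, a range predicate descends the tree once and then walks sideways along the leaf chain. A sketch against a hypothetical `events` table:

```sql
CREATE INDEX idx_events_id ON events (id);

-- One descent to the leaf containing 30, then a sequential walk
-- along the linked leaves up to 70; the rows come back already
-- ordered, so no separate sort step is required:
SELECT id FROM events WHERE id BETWEEN 30 AND 70 ORDER BY id;
```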
### Key Terminology

- **Selectivity** — the fraction of rows a predicate matches. Lower is better for indexing (1/N is ideal).
- **Cardinality** — the number of distinct values in a column.
- **Index-only scan** — the query is answered entirely from the index without touching the heap.
- **Covering index** — an index that includes all columns needed by a query.
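In PostgreSQL, the cardinality and NULL estimates that drive these decisions can be inspected directly in the planner statistics. A sketch (`orders.status` is a hypothetical table/column):

```sql
-- n_distinct > 0: estimated count of distinct values;
-- n_distinct < 0: negated fraction of rows that are distinct
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
```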
## Implementation Patterns

### Composite Index Column Order

The leftmost prefix rule: a composite index on `(a, b, c)` supports queries on `(a)`, `(a, b)`, and `(a, b, c)`, but not `(b)` or `(c)` alone.

```sql
-- Supports WHERE tenant_id = ? AND status = ? AND created_at > ?
CREATE INDEX idx_orders_tenant_status_date
ON orders (tenant_id, status, created_at);
```

Rule of thumb for column order:

1. Equality columns first (most selective first among them)
2. Range/sort column last
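Applying the leftmost prefix rule to `idx_orders_tenant_status_date`, a sketch of which predicates can and cannot use it:

```sql
-- Can use the index: tenant_id is a leftmost prefix
SELECT * FROM orders WHERE tenant_id = 7;

-- Cannot use it: status alone is not a leftmost prefix,
-- so this likely degrades to a sequential scan
SELECT * FROM orders WHERE status = 'active';
```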
### Partial (Filtered) Indexes

```sql
-- Only index active orders — much smaller than a full index
CREATE INDEX idx_orders_active
ON orders (customer_id, created_at)
WHERE status = 'active';

-- Query must include the matching predicate to use this index
SELECT * FROM orders
WHERE status = 'active'
  AND customer_id = 42
ORDER BY created_at DESC;
```
### Covering Indexes with INCLUDE

```sql
-- PostgreSQL: cover the SELECT columns to avoid heap fetches
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (total_amount, status);

-- This query can now be an index-only scan
SELECT order_date, total_amount, status
FROM orders
WHERE customer_id = 123
ORDER BY order_date DESC;
```
### Expression Indexes

```sql
-- Index on a computed expression
CREATE INDEX idx_users_email_lower
ON users (LOWER(email));

-- Query must use the same expression
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
```
### BRIN for Time-Series Data

```sql
-- Extremely compact index for naturally ordered data
CREATE INDEX idx_events_ts_brin
ON events USING brin (created_at)
WITH (pages_per_range = 32);

-- Works well when rows are inserted roughly in created_at order
```
### GIN for JSONB

```sql
-- jsonb_path_ops builds a smaller GIN index specialized
-- for @> containment queries on the JSONB column
CREATE INDEX idx_metadata_gin
ON products USING gin (metadata jsonb_path_ops);

-- Supports containment queries
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
```
## Core Philosophy

Indexing is the art of giving the query optimizer viable access paths for the queries your application actually runs. An index is not a generic performance booster — it is a data structure tailored to a specific access pattern. The right index turns a 30-second full table scan into a 2-millisecond lookup. The wrong index wastes disk space, slows writes, and is never used by the optimizer. Effective indexing starts from the query workload, not from the table definition.

The fundamental tradeoff in indexing is read performance versus write overhead. Every index must be maintained on every INSERT, UPDATE, and DELETE that touches its columns. A table with fifteen indexes pays that maintenance cost fifteen times per write. The goal is not to index every column but to index the minimum set of columns required to cover the critical query paths. This requires measuring — checking slow query logs, examining `pg_stat_statements`, and running `EXPLAIN ANALYZE` — not guessing.

Column order in composite indexes is the most common source of indexing mistakes. The leftmost prefix rule means an index on `(a, b, c)` supports queries on `(a)`, `(a, b)`, and `(a, b, c)`, but not `(b)` or `(c)` alone. The general principle is: equality columns first (most selective among them), then range or sort columns last. Getting this order wrong means the index exists but the optimizer cannot use it, which is worse than having no index at all because it creates a false sense of coverage.
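Measurement, not guesswork, closes the loop. A minimal verification sketch in PostgreSQL, using the composite index defined earlier:

```sql
-- Confirm the planner actually chooses the index for the real query shape
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE tenant_id = 7
  AND status = 'active'
  AND created_at > '2024-01-01';
-- Look for an Index Scan on idx_orders_tenant_status_date
-- in the plan output rather than a Seq Scan.
```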
## Anti-Patterns

- **Indexing every column "just in case"** — over-indexing slows every write operation, inflates storage, and makes autovacuum and maintenance slower. Each index should be justified by a specific query pattern visible in the workload.
- **Creating indexes without checking if they are used** — indexes that the optimizer never selects waste resources silently. Monitor `pg_stat_user_indexes` (PostgreSQL) or `sys.dm_db_index_usage_stats` (SQL Server) and drop indexes with zero scans.
- **Putting range columns before equality columns in composite indexes** — an index on `(created_at, status)` cannot efficiently handle `WHERE status = 'active' AND created_at > '2024-01-01'`. The range scan on `created_at` breaks the ability to filter on `status`. Reverse the order.
- **Using function-wrapped columns in WHERE without a matching expression index** — `WHERE LOWER(email) = 'alice@example.com'` cannot use a B-tree index on `email`. Either create an expression index on `LOWER(email)` or rewrite the query to match the indexed expression.
- **Creating indexes on large tables without CONCURRENTLY** — `CREATE INDEX` without `CONCURRENTLY` (PostgreSQL) or equivalent takes an exclusive lock that blocks all writes for the duration of the build. On large tables, this can mean minutes of downtime.
## Best Practices

- **Start from slow query logs** — index what is actually queried, not what might be queried. Use `pg_stat_statements`, the MySQL slow query log, or SQL Server Query Store.
- **Check usage statistics** — monitor `pg_stat_user_indexes` or `sys.dm_db_index_usage_stats` to find unused indexes and drop them.
- **Aim for index-only scans** — use INCLUDE columns or widen the index to cover SELECT and WHERE columns.
- **Keep indexes narrow** — every additional column increases index size and write overhead.
- **Use partial indexes** for common filter conditions that exclude most rows (e.g., `WHERE deleted_at IS NULL`).
- **Rebuild or reindex periodically** — B-tree indexes can bloat after heavy UPDATE/DELETE workloads.

```sql
-- PostgreSQL 12+: concurrent reindex (no table lock)
REINDEX INDEX CONCURRENTLY idx_orders_tenant_status_date;

-- Check index sizes (largest first)
SELECT
  schemaname, relname, indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```

- **Use `CONCURRENTLY`** for production index creation to avoid locking the table.

```sql
CREATE INDEX CONCURRENTLY idx_new ON big_table (col);
```
## Common Pitfalls

- **Indexing low-cardinality columns alone** — a B-tree on a boolean column is almost never useful. Combine it with other columns or use a partial index.
- **Wrong column order in composite indexes** — putting the range column before equality columns prevents the index from being fully utilized.
- **Over-indexing** — every index slows INSERT, UPDATE, and DELETE. Tables with 15+ indexes are a red flag.
- **Forgetting expression match** — an index on `LOWER(email)` is not used by `WHERE email = 'Alice@example.com'`. The query expression must match the index expression.
- **Not accounting for NULLs** — in most engines, B-tree indexes include NULLs, but `IS NOT NULL` filters may still cause scans. Partial indexes can help.
- **Assuming the optimizer always uses your index** — the planner may choose a sequential scan if it estimates the index will touch too many rows. Check `EXPLAIN ANALYZE` to verify.
- **Ignoring TOAST and wide columns** — indexing a column that is TOASTed (compressed/out-of-line) can be counterproductive. Index a hash or prefix instead.
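For the NULL pitfall above, a partial index is one mitigation: it stores only the rows the `IS NOT NULL` query actually wants. A sketch, assuming a hypothetical `shipped_at` column on `orders`:

```sql
-- Skips the (possibly numerous) NULL rows entirely,
-- keeping the index small and the scan targeted:
CREATE INDEX idx_orders_shipped
ON orders (shipped_at)
WHERE shipped_at IS NOT NULL;
```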
## Related Skills

- **CTEs Recursive** — Common Table Expressions and recursive queries for hierarchical data, graph traversal, and complex query composition
- **JSON Operations** — JSON storage, querying, indexing, and manipulation in PostgreSQL and MySQL for semi-structured data
- **Migration Patterns** — Database schema migration strategies for safe, reversible, and zero-downtime changes to production databases
- **Query Optimization** — Techniques for analyzing and optimizing SQL query performance using execution plans, statistics, and query rewrites
- **Stored Procedures** — Stored procedures, functions, and triggers for encapsulating business logic and automating actions within the database
- **Transactions Isolation** — Transaction management, ACID properties, isolation levels, and concurrency control in relational databases