Technology & EngineeringPostgres224 lines

Postgres Replication

Logical and streaming replication in PostgreSQL for high availability and data distribution

Quick Summary32 lines

You are an expert in PostgreSQL replication strategies including streaming replication, logical replication, and high-availability configurations.

## Key Points

- Use replication slots to prevent WAL removal before standbys have consumed it, but monitor slot lag to avoid unbounded WAL growth.
- Set `max_slot_wal_keep_size` (v13+) to cap WAL retention from a stalled slot.
- For high availability, use synchronous replication with at least two standbys and `ANY 1` quorum to tolerate one standby failure without blocking writes.
- Monitor replication lag continuously and alert when it exceeds your RPO (Recovery Point Objective).
- Use logical replication for zero-downtime major version upgrades: replicate to the new version, then switch over.
- Set `wal_level = logical` even if you only need streaming replication today — it is a superset and enables logical replication later without a restart.
- Use `pg_basebackup` with `--wal-method=stream` to create consistent base backups without manual WAL archiving.
- **WAL bloat from inactive replication slots**: An inactive slot causes the primary to retain all WAL since the slot was last consumed. This can fill the disk. Always drop unused slots.
- **Logical replication and DDL**: Logical replication does not replicate DDL (schema changes). You must apply DDL manually on the subscriber before the publisher, or the replication will break.
- **Sequences not replicated**: Logical replication does not sync sequences. After failover, you must manually advance sequences on the new primary.
- **Large transactions and logical replication**: Very large transactions are decoded in memory. Set `logical_decoding_work_mem` to control memory usage.
- **Hot standby feedback**: Without `hot_standby_feedback = on`, long queries on a standby can conflict with WAL replay when the primary vacuums rows the standby query still needs.

## Quick Example

```
wal_level = replica -- minimum for streaming replication
max_wal_senders = 10 -- max concurrent replication connections
wal_keep_size = 1GB -- WAL retention for slow replicas
synchronous_standby_names = '' -- empty = async, set names for sync
```

```sql
-- Create a dedicated replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
```

skilldb get postgres-skills/Postgres ReplicationFull skill: 224 lines

Paste into your CLAUDE.md or agent config

Replication — PostgreSQL

You are an expert in PostgreSQL replication strategies including streaming replication, logical replication, and high-availability configurations.

Core Philosophy

Overview

PostgreSQL supports two primary replication mechanisms: streaming replication (physical, byte-for-byte WAL shipping) and logical replication (row-level change decoding). Streaming replication is used for high availability and read scaling with identical replicas. Logical replication enables selective table replication, cross-version upgrades, and data distribution to heterogeneous systems.

Core Concepts

Streaming Replication (Physical)

Streaming replication sends WAL (Write-Ahead Log) records from the primary to one or more standby servers, creating exact byte-level copies.

Primary server configuration (postgresql.conf):

wal_level = replica                 -- minimum for streaming replication
max_wal_senders = 10                -- max concurrent replication connections
wal_keep_size = 1GB                 -- WAL retention for slow replicas
synchronous_standby_names = ''      -- empty = async, set names for sync

Replication user and access (pg_hba.conf):

-- Create a dedicated replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

# pg_hba.conf
host    replication     replicator    10.0.0.0/24    scram-sha-256

Standby setup:

# Take a base backup from the primary
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data \
    --wal-method=stream --checkpoint=fast --progress

# Create standby signal file
touch /var/lib/postgresql/data/standby.signal

Standby configuration (postgresql.conf):

primary_conninfo = 'host=primary-host port=5432 user=replicator password=secure_password'
hot_standby = on                    -- allow read queries on the standby

Synchronous Replication

# On the primary: require at least 1 sync standby
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

# Or use ANY for quorum-based
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

-- Check replication status from the primary
SELECT client_addr, state, sync_state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

Logical Replication

Logical replication decodes WAL into logical row changes and applies them on the subscriber, allowing selective table replication and cross-version setups.

Publisher (source):

-- postgresql.conf: wal_level = logical

-- Create a publication
CREATE PUBLICATION my_pub FOR TABLE orders, customers;

-- Or publish all tables
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;

-- Publish only INSERT operations
CREATE PUBLICATION inserts_only FOR TABLE events
    WITH (publish = 'insert');

-- Publish with row filter (PostgreSQL 15+)
CREATE PUBLICATION filtered_pub FOR TABLE orders
    WHERE (region = 'us-east');

-- Publish specific columns (PostgreSQL 15+)
CREATE PUBLICATION partial_pub FOR TABLE users (id, name, email);

Subscriber (destination):

-- Create the target table (must exist with compatible schema)
CREATE TABLE orders (LIKE orders_on_publisher INCLUDING ALL);

-- Create the subscription
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher-host port=5432 dbname=mydb user=replicator password=secure_password'
    PUBLICATION my_pub;

-- Check subscription status
SELECT * FROM pg_stat_subscription;

Implementation Patterns

Promoting a Standby to Primary

# Using pg_ctl
pg_ctl promote -D /var/lib/postgresql/data

# Using SQL (PostgreSQL 12+)
SELECT pg_promote(wait => true, wait_seconds => 60);

Replication Slots

Replication slots ensure the primary retains WAL until it has been consumed by the standby:

-- Physical replication slot
SELECT pg_create_physical_replication_slot('standby1_slot');

-- Logical replication slot (created automatically by subscriptions)
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');

-- Monitor slot lag
SELECT slot_name, slot_type, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots;

-- Drop an inactive slot to prevent WAL bloat
SELECT pg_drop_replication_slot('old_standby_slot');

Monitoring Replication Lag

-- On the primary: check replication status
SELECT
    client_addr,
    state,
    sync_state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
    replay_lag  -- actual time lag (PostgreSQL 10+)
FROM pg_stat_replication;

-- On the standby: check how far behind
SELECT
    now() - pg_last_xact_replay_timestamp() AS replication_delay,
    pg_is_in_recovery() AS is_standby,
    pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn();

Cascading Replication

Primary -> Standby1 (sync) -> Standby2 (cascaded, async)

On Standby2's postgresql.conf:

primary_conninfo = 'host=standby1-host port=5432 user=replicator'

Adding Tables to Logical Replication

-- On publisher: add table to existing publication
ALTER PUBLICATION my_pub ADD TABLE new_table;

-- On subscriber: refresh subscription to pick up new tables
ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;

Best Practices

Use replication slots to prevent WAL removal before standbys have consumed it, but monitor slot lag to avoid unbounded WAL growth.
Set max_slot_wal_keep_size (v13+) to cap WAL retention from a stalled slot.
For high availability, use synchronous replication with at least two standbys and ANY 1 quorum to tolerate one standby failure without blocking writes.
Monitor replication lag continuously and alert when it exceeds your RPO (Recovery Point Objective).
Use logical replication for zero-downtime major version upgrades: replicate to the new version, then switch over.
Set wal_level = logical even if you only need streaming replication today — it is a superset and enables logical replication later without a restart.
Use pg_basebackup with --wal-method=stream to create consistent base backups without manual WAL archiving.

Common Pitfalls

WAL bloat from inactive replication slots: An inactive slot causes the primary to retain all WAL since the slot was last consumed. This can fill the disk. Always drop unused slots.
Logical replication and DDL: Logical replication does not replicate DDL (schema changes). You must apply DDL manually on the subscriber before the publisher, or the replication will break.
Sequences not replicated: Logical replication does not sync sequences. After failover, you must manually advance sequences on the new primary.
Large transactions and logical replication: Very large transactions are decoded in memory. Set logical_decoding_work_mem to control memory usage.
Hot standby feedback: Without hot_standby_feedback = on, long queries on a standby can conflict with WAL replay when the primary vacuums rows the standby query still needs.
Network partitions with synchronous replication: If all synchronous standbys become unreachable, writes on the primary will hang indefinitely. Plan for this with timeouts or automatic failover tools.
Subscriber applying changes slowly: If the subscriber cannot keep up, the publisher retains WAL and the lag grows. Monitor pg_stat_subscription and optimize the subscriber's performance.

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.

Install this skill directly: skilldb add postgres-skills

Get CLI access →