Postgres Replication
Logical and streaming replication in PostgreSQL for high availability and data distribution
You are an expert in PostgreSQL replication strategies including streaming replication, logical replication, and high-availability configurations. ## Key Points - Use replication slots to prevent WAL removal before standbys have consumed it, but monitor slot lag to avoid unbounded WAL growth. - Set `max_slot_wal_keep_size` (v13+) to cap WAL retention from a stalled slot. - For high availability, use synchronous replication with at least two standbys and `ANY 1` quorum to tolerate one standby failure without blocking writes. - Monitor replication lag continuously and alert when it exceeds your RPO (Recovery Point Objective). - Use logical replication for zero-downtime major version upgrades: replicate to the new version, then switch over. - Set `wal_level = logical` even if you only need streaming replication today — it is a superset and enables logical replication later without a restart. - Use `pg_basebackup` with `--wal-method=stream` to create consistent base backups without manual WAL archiving. - **WAL bloat from inactive replication slots**: An inactive slot causes the primary to retain all WAL since the slot was last consumed. This can fill the disk. Always drop unused slots. - **Logical replication and DDL**: Logical replication does not replicate DDL (schema changes). You must apply DDL manually on the subscriber before the publisher, or the replication will break. - **Sequences not replicated**: Logical replication does not sync sequences. After failover, you must manually advance sequences on the new primary. - **Large transactions and logical replication**: Very large transactions are decoded in memory. Set `logical_decoding_work_mem` to control memory usage. - **Hot standby feedback**: Without `hot_standby_feedback = on`, long queries on a standby can conflict with WAL replay when the primary vacuums rows the standby query still needs. ## Quick Example ``` wal_level = replica -- minimum for streaming replication max_wal_senders = 10 -- max concurrent replication connections wal_keep_size = 1GB -- WAL retention for slow replicas synchronous_standby_names = '' -- empty = async, set names for sync ``` ```sql -- Create a dedicated replication user CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password'; ```
skilldb get postgres-skills/Postgres ReplicationFull skill: 224 linesReplication — PostgreSQL
You are an expert in PostgreSQL replication strategies including streaming replication, logical replication, and high-availability configurations.
Core Philosophy
Overview
PostgreSQL supports two primary replication mechanisms: streaming replication (physical, byte-for-byte WAL shipping) and logical replication (row-level change decoding). Streaming replication is used for high availability and read scaling with identical replicas. Logical replication enables selective table replication, cross-version upgrades, and data distribution to heterogeneous systems.
Core Concepts
Streaming Replication (Physical)
Streaming replication sends WAL (Write-Ahead Log) records from the primary to one or more standby servers, creating exact byte-level copies.
Primary server configuration (postgresql.conf):
wal_level = replica -- minimum for streaming replication
max_wal_senders = 10 -- max concurrent replication connections
wal_keep_size = 1GB -- WAL retention for slow replicas
synchronous_standby_names = '' -- empty = async, set names for sync
Replication user and access (pg_hba.conf):
-- Create a dedicated replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
# pg_hba.conf
host replication replicator 10.0.0.0/24 scram-sha-256
Standby setup:
# Take a base backup from the primary
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data \
--wal-method=stream --checkpoint=fast --progress
# Create standby signal file
touch /var/lib/postgresql/data/standby.signal
Standby configuration (postgresql.conf):
primary_conninfo = 'host=primary-host port=5432 user=replicator password=secure_password'
hot_standby = on -- allow read queries on the standby
Synchronous Replication
# On the primary: require at least 1 sync standby
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
# Or use ANY for quorum-based
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'
-- Check replication status from the primary
SELECT client_addr, state, sync_state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
Logical Replication
Logical replication decodes WAL into logical row changes and applies them on the subscriber, allowing selective table replication and cross-version setups.
Publisher (source):
-- postgresql.conf: wal_level = logical
-- Create a publication
CREATE PUBLICATION my_pub FOR TABLE orders, customers;
-- Or publish all tables
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;
-- Publish only INSERT operations
CREATE PUBLICATION inserts_only FOR TABLE events
WITH (publish = 'insert');
-- Publish with row filter (PostgreSQL 15+)
CREATE PUBLICATION filtered_pub FOR TABLE orders
WHERE (region = 'us-east');
-- Publish specific columns (PostgreSQL 15+)
CREATE PUBLICATION partial_pub FOR TABLE users (id, name, email);
Subscriber (destination):
-- Create the target table (must exist with compatible schema)
CREATE TABLE orders (LIKE orders_on_publisher INCLUDING ALL);
-- Create the subscription
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=publisher-host port=5432 dbname=mydb user=replicator password=secure_password'
PUBLICATION my_pub;
-- Check subscription status
SELECT * FROM pg_stat_subscription;
Implementation Patterns
Promoting a Standby to Primary
# Using pg_ctl
pg_ctl promote -D /var/lib/postgresql/data
# Using SQL (PostgreSQL 12+)
SELECT pg_promote(wait => true, wait_seconds => 60);
Replication Slots
Replication slots ensure the primary retains WAL until it has been consumed by the standby:
-- Physical replication slot
SELECT pg_create_physical_replication_slot('standby1_slot');
-- Logical replication slot (created automatically by subscriptions)
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');
-- Monitor slot lag
SELECT slot_name, slot_type, active,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots;
-- Drop an inactive slot to prevent WAL bloat
SELECT pg_drop_replication_slot('old_standby_slot');
Monitoring Replication Lag
-- On the primary: check replication status
SELECT
client_addr,
state,
sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
replay_lag -- actual time lag (PostgreSQL 10+)
FROM pg_stat_replication;
-- On the standby: check how far behind
SELECT
now() - pg_last_xact_replay_timestamp() AS replication_delay,
pg_is_in_recovery() AS is_standby,
pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn();
Cascading Replication
Primary -> Standby1 (sync) -> Standby2 (cascaded, async)
On Standby2's postgresql.conf:
primary_conninfo = 'host=standby1-host port=5432 user=replicator'
Adding Tables to Logical Replication
-- On publisher: add table to existing publication
ALTER PUBLICATION my_pub ADD TABLE new_table;
-- On subscriber: refresh subscription to pick up new tables
ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;
Best Practices
- Use replication slots to prevent WAL removal before standbys have consumed it, but monitor slot lag to avoid unbounded WAL growth.
- Set
max_slot_wal_keep_size(v13+) to cap WAL retention from a stalled slot. - For high availability, use synchronous replication with at least two standbys and
ANY 1quorum to tolerate one standby failure without blocking writes. - Monitor replication lag continuously and alert when it exceeds your RPO (Recovery Point Objective).
- Use logical replication for zero-downtime major version upgrades: replicate to the new version, then switch over.
- Set
wal_level = logicaleven if you only need streaming replication today — it is a superset and enables logical replication later without a restart. - Use
pg_basebackupwith--wal-method=streamto create consistent base backups without manual WAL archiving.
Common Pitfalls
- WAL bloat from inactive replication slots: An inactive slot causes the primary to retain all WAL since the slot was last consumed. This can fill the disk. Always drop unused slots.
- Logical replication and DDL: Logical replication does not replicate DDL (schema changes). You must apply DDL manually on the subscriber before the publisher, or the replication will break.
- Sequences not replicated: Logical replication does not sync sequences. After failover, you must manually advance sequences on the new primary.
- Large transactions and logical replication: Very large transactions are decoded in memory. Set
logical_decoding_work_memto control memory usage. - Hot standby feedback: Without
hot_standby_feedback = on, long queries on a standby can conflict with WAL replay when the primary vacuums rows the standby query still needs. - Network partitions with synchronous replication: If all synchronous standbys become unreachable, writes on the primary will hang indefinitely. Plan for this with timeouts or automatic failover tools.
- Subscriber applying changes slowly: If the subscriber cannot keep up, the publisher retains WAL and the lag grows. Monitor
pg_stat_subscriptionand optimize the subscriber's performance.
Anti-Patterns
Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.
Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
Install this skill directly: skilldb add postgres-skills
Related Skills
Postgres Extensions
Key PostgreSQL extensions including pgvector, PostGIS, pg_cron, and other essential add-ons
Postgres Full Text Search
Full-text search in PostgreSQL using tsvector, tsquery, ranking, and GIN indexes
Postgres Partitioning
Table partitioning strategies in PostgreSQL including range, list, and hash partitioning
Postgres Row Level Security
Row-level security policies in PostgreSQL for fine-grained access control on table rows
Postgres Triggers
Triggers and PL/pgSQL functions in PostgreSQL for automated data processing and integrity enforcement
Adversarial Code Review
Adversarial implementation review methodology that validates code completeness against requirements with fresh objectivity. Uses a coach-player dialectical loop to catch real gaps in security, logic, and data flow.