Replication
How PostgreSQL copies data across servers in real-time — the backbone of high availability, disaster recovery, and read scaling in production systems.
WAL — Write Ahead Log
The foundation everything else is built on
What is WAL?
Before PostgreSQL writes anything to the actual data files on disk, it first writes a record of what it's about to do in a special log file. That log is called the Write Ahead Log (WAL). It's the "paper trail" of every change.
WAL serves two major purposes:
WAL Segments
WAL is not one giant file — it's divided into fixed-size files called WAL segments. Each segment is 16 MB by default.
The filename encodes three things: timeline, log file number, and segment number. You don't need to decode it manually — PostgreSQL manages these automatically.
Checkpoints
A checkpoint is when PostgreSQL flushes all dirty (modified) pages from memory to the actual data files on disk, and records a checkpoint LSN in WAL. After a checkpoint, WAL segments older than that LSN can safely be recycled.
Streaming Replication
Primary → Replica in near real-time
How Streaming Replication Works
In streaming replication, the replica server connects to the primary over a TCP connection and continuously receives WAL records as they are generated — just like a live stream. The replica replays those records to keep its data identical to the primary.
Primary Configuration
On the primary server, you need to enable WAL streaming and create a replication user.
Replica Configuration
The replica needs to know where the primary is. This is done via a connection string.
recovery.conf.Monitoring Replication
Always monitor replication lag — if the replica is falling behind, it might not be safe to failover to it.
Synchronous vs Asynchronous Replication
By default, streaming replication is asynchronous — the primary confirms a transaction commit without waiting for the replica. This is fast but means a small window of potential data loss on failover.
-- Primary WAITS for replica to confirm -- before returning success to client. synchronous_standby_names = 'replica1' Pros: Zero data loss on failover Cons: Latency increases (network RTT added) Use: Banking, payment systems
-- Primary does NOT wait for replica.
-- Returns success immediately.
synchronous_standby_names = '' (default)
Pros: No latency added
Cons: Small risk of losing last few
transactions on failover
Use: Most web apps, read scaling
Replication Slots
A replication slot ensures PostgreSQL doesn't delete WAL segments until the replica has consumed them. This is crucial for slow replicas that might otherwise be "left behind" when old WAL is recycled.
Logical Replication
Replicate individual tables — not the whole cluster
Streaming vs Logical — What's the Difference?
| Feature | Streaming Replication | Logical Replication |
|---|---|---|
| What it replicates | Entire PostgreSQL cluster (all databases) | Selected tables within a database |
| Replica must be same PG version | Yes | No (can be different PG version) |
| Replica can have extra indexes/tables | No | Yes |
| Replica can be written to | No (read-only) | Yes (other tables) |
| Use case | HA failover, read replicas | Cross-version migrations, analytics, partial replication |
| Performance overhead | Low | Slightly higher (decoding WAL) |
Publications
A publication is defined on the publisher (source) side. It specifies which tables and operations (INSERT, UPDATE, DELETE) to replicate.
Subscriptions
A subscription is defined on the subscriber (destination) side. It connects to the publisher and pulls the changes.
Real-World Use Case: Zero-Downtime Major Version Upgrade
This is the most common practical use of logical replication — upgrading from PostgreSQL 14 → 16 with zero downtime.
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
pg_stat_subscription until the new server is fully caught up.
Replication Decision Tool
Which type of replication should you use?
Interactive Advisor
Answer a few questions and get a replication recommendation with the exact SQL to use.
🔧 Replication Advisor Interactive
WAL Level Quick Reference
| wal_level | Supports | When to use |
|---|---|---|
minimal |
Crash recovery only | Standalone server, no replication |
replica |
Streaming replication | HA setup, read replicas |
logical |
Streaming + Logical replication | Cross-version migration, CDC, partial replication |
wal_level = replica by default. Only switch to logical when you need logical replication or CDC (Change Data Capture) tools like Debezium.Full Setup Walkthrough
Complete primary + replica setup from scratch
Complete Streaming Replication Setup
Here is the full step-by-step to set up streaming replication between two servers: Primary: 10.0.0.1 and Replica: 10.0.0.2
Failover — Promoting a Replica to Primary
When the primary fails, you need to promote the replica to become the new primary.
pg_basebackup from the new primary, or use a tool like Patroni which manages this automatically.Production Tools for Replication Management
In real production systems, you rarely manage replication manually. These tools automate failover, leader election, and replica management:
Common Production Patterns
How real companies architect PostgreSQL replication
Pattern 1: Primary + Hot Standby (HA)
Most common pattern. One primary handles all writes. One or more hot standbys are ready to take over on failure.
Pattern 2: Read Scaling with Multiple Replicas
Distribute read-heavy queries (reports, dashboards) across replicas to offload the primary.
Pattern 3: Logical Replication for CDC / Analytics
Stream changes from your OLTP PostgreSQL into a data warehouse or Kafka, without impacting the primary.