PostgreSQL — Topic 11: Partitioning

11.1

Why Partition?

The problem that partitioning solves

▾

🐘 The Big Table Problem

Imagine your orders table at Swiggy has 2 billion rows — 5 years of order history. Every query slows down. Even a simple SELECT for today's orders has to scan millions of old rows it doesn't care about.

⚠️

Symptoms of a too-big table: Slow SELECT queries even with indexes, VACUUM takes hours, index bloat, backup takes forever, ALTER TABLE locks everything.

📦 Large Tables

When a table grows to hundreds of millions of rows, even B-tree indexes get slow because the index itself is huge. Partitioning physically splits data across separate storage files — PostgreSQL only opens the files it needs.

Without Partitioning — Full Table Scan

SELECT * FROM orders WHERE created_at = '2024-11-01';

🗄️  orders (2 billion rows — ONE giant file)
    ├─ 2020 data  (500M rows)  ← scanned but useless
    ├─ 2021 data  (400M rows)  ← scanned but useless
    ├─ 2022 data  (400M rows)  ← scanned but useless
    ├─ 2023 data  (350M rows)  ← scanned but useless
    └─ 2024 data  (350M rows)  ← only this matters!

Cost: scanning 1,650M rows you don't need 😭

With Partitioning — Partition Pruning

SELECT * FROM orders WHERE created_at = '2024-11-01';

🗄️  orders (parent — just a router)
    ├─ orders_2020  ← ❌ SKIPPED (pruned)
    ├─ orders_2021  ← ❌ SKIPPED (pruned)
    ├─ orders_2022  ← ❌ SKIPPED (pruned)
    ├─ orders_2023  ← ❌ SKIPPED (pruned)
    └─ orders_2024  ← ✅ Only this partition read!

Cost: scanning only 350M rows 🚀

🗃️ Archival Data

Old data (e.g., 2020 orders) is rarely touched. With partitioning you can move old partitions to cheaper, slower storage or even drop them entirely — instantly freeing space without slow DELETE operations.

❌ Deleting 1 year of old data

-- Takes hours, locks table,
-- bloats WAL, triggers VACUUM
DELETE FROM orders
WHERE created_at < '2021-01-01';

⏱ Time: ~3-8 hours on 500M rows
🔒 Locks: Yes
💾 WAL: Huge

✅ Dropping an old partition

-- Instant. No locking. No WAL bloat.
DROP TABLE orders_2020;

⏱ Time: < 1 second
🔒 Locks: None on other partitions
💾 WAL: Tiny metadata change

📊 Performance Wins Summary

Metric	Without Partitioning	With Partitioning
Query speed (date filter)	Slow — full scan	Fast — partition pruning
Index size	Huge single index	Small per-partition indexes
VACUUM	Hours — locks	Fast — per partition
Data archival	Slow DELETE	Instant DROP PARTITION
Parallel query	Limited	Each partition in parallel

11.2

Types of Partitioning

Range · List · Hash

▾

🗂️ Overview

PostgreSQL supports three partitioning strategies. You pick one based on how you query your data.

📅

Range Partitioning

Rows are divided by a continuous range of values — dates, IDs, timestamps.

✅ Best for: time-series, logs, orders by date, audit history

🏷️

List Partitioning

Rows are divided by explicit values from a list — country, status, region.

✅ Best for: country-based sharding, status categories, enums

🔢

Hash Partitioning

Rows are divided by hash of a column — evenly spread across N buckets.

✅ Best for: even load distribution, user_id sharding

📅 Range Partitioning

Each partition holds rows where the partition key falls within a specific range. Most common with dates. Here's a real-world orders table partitioned by year:

Range Partition Structure — orders by created_at

📦 orders (parent table — just metadata, no data)

▼ routes rows to ▼

Partition 1

orders_2022

2022-01-01 → 2022-12-31

Partition 2

orders_2023

2023-01-01 → 2023-12-31

Partition 3

orders_2024

2024-01-01 → 2024-12-31

Partition 4

orders_2025

2025-01-01 → 2025-12-31

-- Step 1: Create the parent table with PARTITION BY RANGE
CREATE TABLE orders (
    order_id    BIGSERIAL,
    user_id     BIGINT       NOT NULL,
    total       NUMERIC(10,2) NOT NULL,
    status      TEXT         NOT NULL,
    created_at  TIMESTAMPTZ  NOT NULL
) PARTITION BY RANGE (created_at);  -- ← partition key

-- Step 2: Create the actual partitions
CREATE TABLE orders_2022 PARTITION OF orders
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Step 3: Insert rows — PostgreSQL routes them automatically!
INSERT INTO orders (user_id, total, status, created_at)
VALUES (101, 599.00, 'completed', '2024-06-15 10:30:00+05:30');
-- ↑ automatically goes into orders_2024 partition

-- Step 4: Query the parent — PostgreSQL only reads the right partition
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
-- Only reads orders_2024! ✅

💡

Range is EXCLUSIVE on the upper bound. FROM '2024-01-01' TO '2025-01-01' means up to but not including 2025-01-01. So 2024-12-31 23:59:59 is in this partition but 2025-01-01 00:00:00 is not.

You can also do monthly partitions for even finer granularity — common for high-volume logs:

-- Monthly partitions for API logs
CREATE TABLE api_logs (
    id         BIGSERIAL,
    endpoint   TEXT,
    user_id    BIGINT,
    latency_ms INTEGER,
    logged_at  TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (logged_at);

CREATE TABLE api_logs_2024_01 PARTITION OF api_logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE api_logs_2024_02 PARTITION OF api_logs
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- ... and so on for each month

-- Drop a month instantly (archive/cleanup)
DROP TABLE api_logs_2024_01;  -- instant! ⚡

🏷️ List Partitioning

Each partition holds rows matching a specific set of values. Great for country codes, regions, or status fields where the values are known and finite.

List Partition Structure — users by country

📦 users (parent table)

▼ routes by country_code ▼

Partition 1

users_india

'IN'

Partition 2

users_us

'US'

Partition 3

users_eu

'DE', 'FR', 'GB'

Partition 4

users_others

DEFAULT

-- Parent table
CREATE TABLE users (
    user_id      BIGSERIAL,
    name         TEXT    NOT NULL,
    email        TEXT    NOT NULL,
    country_code CHAR(2) NOT NULL,
    created_at   TIMESTAMPTZ
) PARTITION BY LIST (country_code);

-- One partition per major country/region
CREATE TABLE users_india PARTITION OF users
    FOR VALUES IN ('IN');

CREATE TABLE users_us PARTITION OF users
    FOR VALUES IN ('US');

-- Multiple values in one partition
CREATE TABLE users_eu PARTITION OF users
    FOR VALUES IN ('DE', 'FR', 'GB', 'IT', 'ES');

-- DEFAULT catches everything else
CREATE TABLE users_others PARTITION OF users
    DEFAULT;

-- Insert — auto-routed to users_india
INSERT INTO users (name, email, country_code)
VALUES ('Ravi Kumar', 'ravi@example.com', 'IN');

-- Query — only scans users_india
SELECT * FROM users WHERE country_code = 'IN';

-- Useful for GDPR: EU user data stays in its own partition
SELECT * FROM users WHERE country_code IN ('DE', 'FR');
-- Only scans users_eu ✅

🔢 Hash Partitioning

PostgreSQL hashes the partition key and uses modulo arithmetic to assign rows to buckets. This spreads rows evenly — unlike range/list where some partitions can be much bigger. Ideal when you want even load distribution with no natural range or list.

Hash Partition Structure — messages by user_id (4 buckets)

📦 messages (parent table)

▼ hash(user_id) % 4 ▼

Bucket 0

messages_p0

MODULUS 4, REMAINDER 0

Bucket 1

messages_p1

MODULUS 4, REMAINDER 1

Bucket 2

messages_p2

MODULUS 4, REMAINDER 2

Bucket 3

messages_p3

MODULUS 4, REMAINDER 3

-- Parent table — AI chat system messages
CREATE TABLE messages (
    message_id      BIGSERIAL,
    conversation_id BIGINT  NOT NULL,
    user_id         BIGINT  NOT NULL,
    content         TEXT    NOT NULL,
    created_at      TIMESTAMPTZ
) PARTITION BY HASH (user_id);

-- 4 equal-sized partitions (MODULUS = total buckets)
CREATE TABLE messages_p0 PARTITION OF messages
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE messages_p1 PARTITION OF messages
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE messages_p2 PARTITION OF messages
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE messages_p3 PARTITION OF messages
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Insert — PostgreSQL decides which bucket automatically
INSERT INTO messages (conversation_id, user_id, content)
VALUES (5001, 1234, 'Hello, Claude!');
-- Goes to messages_p(hash(1234) % 4)

-- Query — if you filter by user_id, pruning works!
SELECT * FROM messages WHERE user_id = 1234;
-- Only scans one partition ✅

⚠️

Hash partitioning trade-off: You can't easily add more partitions later (changing MODULUS requires rewriting all data). Plan your bucket count upfront. 4 or 8 buckets is typical for most use cases.

📊 Choosing the Right Type

Type	Use When	Partition Pruning	Common Use Case
RANGE	Data has natural ordering (dates, IDs)	Excellent	Orders, logs, events, audit tables
LIST	Data belongs to finite categories	Excellent	Country, region, status, tenant_id
HASH	No natural range/list, need even spread	Good (on key)	User data, message tables, general sharding

11.3

Partition Pruning

How PostgreSQL skips irrelevant partitions automatically

▾

🔍 What Is Pruning?

Partition pruning is the query planner's ability to skip entire partitions that can't possibly contain matching rows — before even reading a single byte of data. It's the #1 reason partitioning is fast.

✅

Pruning is automatic. Your query doesn't change — you still SELECT FROM orders. PostgreSQL reads the partition bounds and skips the ones that don't match your WHERE clause.

🔬 Seeing Pruning with EXPLAIN

Use EXPLAIN to verify that pruning is happening. You should see only the relevant partitions listed, with others marked as never executed or not shown at all.

-- Check which partitions a query uses
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT order_id, total
FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';

-- EXPLAIN output will show:
-- Append (cost=...)
--   -> Seq Scan on orders_2024 (cost=...)
--        Filter: ((created_at >= '2024-01-01') AND ...)
-- 
-- orders_2020, orders_2021, orders_2022, orders_2023 
-- are NOT in the plan — PRUNED! ✅

-- Pruning works with IN lists too
EXPLAIN SELECT * FROM users
WHERE country_code IN ('IN', 'US');
-- Only users_india and users_us scanned ✅

-- Pruning works with = on hash partition key
EXPLAIN SELECT * FROM messages
WHERE user_id = 1234;
-- Only the one matching hash bucket scanned ✅

-- ⚠️ Pruning does NOT work without the partition key in WHERE
EXPLAIN SELECT * FROM orders WHERE total > 500;
-- ALL partitions scanned — no partition key used ⚠️

⚡ Runtime Pruning

PostgreSQL also supports runtime pruning — skipping partitions even when the filter value is only known at execution time (e.g., from a parameter or subquery).

-- Enable runtime pruning (on by default in PG 11+)
SET enable_partition_pruning = on;

-- This also gets pruned at runtime:
PREPARE get_orders(date) AS
  SELECT * FROM orders WHERE created_at::date = $1;

EXECUTE get_orders('2024-06-15');
-- PostgreSQL prunes at execution time — only orders_2024 read ✅

📋 Pruning Rules — Quick Reference

Scenario	Pruning?	Why
`WHERE created_at = '2024-01-15'`	Yes	Exact match on partition key
`WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'`	Yes	Range matches partition bounds
`WHERE country_code = 'IN'`	Yes	List partition key exact match
`WHERE user_id = 1234`	Yes	Hash key resolves to one bucket
`WHERE total > 500` (no partition key)	No	Not the partition key column
`WHERE EXTRACT(year FROM created_at) = 2024`	No	Function wrapping prevents pruning

🚨

Avoid wrapping the partition key in functions! WHERE YEAR(created_at) = 2024 defeats pruning. Use WHERE created_at BETWEEN '2024-01-01' AND '2025-01-01' instead.

11.4

Local Indexes on Partitions

Indexes per partition for blazing fast queries

▾

📌 Global vs Local Indexes

In PostgreSQL's declarative partitioning, indexes are local — each partition has its own index. When you create an index on the parent table, PostgreSQL automatically creates matching indexes on all partitions.

💡

Local indexes = smaller, faster. Instead of one massive B-tree index over 2 billion rows, you have many small B-tree indexes over 500M rows each. Each is much more cache-friendly.

🛠️ Creating Indexes on Partitioned Tables

-- Create index on parent — propagates to ALL partitions automatically
CREATE INDEX idx_orders_user_id
    ON orders (user_id);
-- ↑ Creates idx_orders_2022_user_id, idx_orders_2023_user_id, etc.

-- Create index on just one partition (if needed)
CREATE INDEX idx_orders_2024_status
    ON orders_2024 (status);

-- Partial index on a partition — only active orders in 2024
CREATE INDEX idx_orders_2024_active
    ON orders_2024 (user_id)
    WHERE status = 'active';

-- Covering index — include total to avoid heap access
CREATE INDEX idx_orders_user_created
    ON orders (user_id, created_at)
    INCLUDE (total, status);
-- Propagates to all partitions ✅

-- View all indexes for partitioned table
SELECT
    schemaname,
    tablename,
    indexname
FROM pg_indexes
WHERE tablename LIKE 'orders%'
ORDER BY tablename, indexname;

🚀 CONCURRENTLY on Partitions

Building indexes on partitioned tables without blocking requires CONCURRENTLY — but with a twist:

-- ⚠️ CREATE INDEX CONCURRENTLY does NOT work on the parent directly
-- Do it partition by partition:

CREATE INDEX CONCURRENTLY idx_orders_2022_user
    ON orders_2022 (user_id);

CREATE INDEX CONCURRENTLY idx_orders_2023_user
    ON orders_2023 (user_id);

CREATE INDEX CONCURRENTLY idx_orders_2024_user
    ON orders_2024 (user_id);

-- This way production is never blocked ✅

🔑 Primary Keys on Partitions

The primary key must include the partition key in PostgreSQL (as of PG 11+). This is because uniqueness is enforced locally per partition:

-- ✅ CORRECT — partition key (created_at) included in PK
CREATE TABLE orders (
    order_id    BIGSERIAL,
    user_id     BIGINT      NOT NULL,
    total       NUMERIC(10,2),
    created_at  TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (order_id, created_at)  -- partition key included!
) PARTITION BY RANGE (created_at);

-- ❌ WRONG — order_id alone is not allowed as PK on partitioned table
-- PRIMARY KEY (order_id)  ← ERROR in PostgreSQL

-- Workaround: use BIGSERIAL + application-level unique IDs (UUID)
CREATE TABLE orders (
    order_id    UUID DEFAULT gen_random_uuid(),
    user_id     BIGINT,
    created_at  TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (order_id, created_at)
) PARTITION BY RANGE (created_at);

📊 Index Size Benefit

Single index (2B rows)Huge — slow cache fill

4 local indexes (500M rows each)4× smaller — fits in memory

Active partition index only (recent data)Tiny — always in cache

11.5

Interactive: Partition Planner

See which SQL to write for your use case

▾

🧪 Try It: Generate Partition DDL

Fill in your table details and see the exact SQL to create a partitioned table.

🗂️ Partition DDL Generator

Table name:

Partition type:

Start year: End year:

Click "Generate SQL" to see the DDL →

🔍 Partition Pruning Checker

Check if your WHERE clause will trigger partition pruning.

🔬 Will My Query Prune?

Partition type:

WHERE clause:

Enter a WHERE clause and click Check →

11.6

Real-World Partitioning Patterns

How production systems use partitioning

▾

📦 E-Commerce: Orders by Month

Swiggy/Zomato style — partition orders by month, auto-create next month's partition, drop old ones for archival.

-- Monthly partitioned orders table
CREATE TABLE orders (
    order_id   UUID DEFAULT gen_random_uuid(),
    user_id    BIGINT      NOT NULL,
    restaurant_id BIGINT   NOT NULL,
    total      NUMERIC(8,2) NOT NULL,
    status     TEXT        NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (order_id, created_at)
) PARTITION BY RANGE (created_at);

-- Create partitions per month
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE orders_2025_02 PARTITION OF orders
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Index each partition on user_id for fast user history
CREATE INDEX ON orders_2025_01 (user_id, created_at);
CREATE INDEX ON orders_2025_02 (user_id, created_at);

-- Typical queries (fast due to pruning)
SELECT * FROM orders
WHERE user_id = 42
  AND created_at >= '2025-01-01'
  AND created_at < '2025-03-01';
-- Reads only orders_2025_01 and orders_2025_02 ✅

-- Drop old partition (instant archival)
DROP TABLE orders_2024_01;  -- instant ⚡

🤖 AI SaaS: Conversations by Tenant

Multi-tenant AI platform — list partition by org_id so each customer's data is physically isolated. Combine with Row Level Security.

-- Conversations partitioned by organization
CREATE TABLE conversations (
    convo_id   UUID DEFAULT gen_random_uuid(),
    org_id     BIGINT      NOT NULL,
    user_id    BIGINT      NOT NULL,
    title      TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (convo_id, org_id)
) PARTITION BY LIST (org_id);

-- Each major org gets its own partition
CREATE TABLE conversations_org_1001 PARTITION OF conversations
    FOR VALUES IN (1001);

CREATE TABLE conversations_org_1002 PARTITION OF conversations
    FOR VALUES IN (1002);

-- Small orgs share a default partition
CREATE TABLE conversations_default PARTITION OF conversations
    DEFAULT;

-- Query for one org — only its partition is scanned
SELECT * FROM conversations
WHERE org_id = 1001
ORDER BY created_at DESC;
-- Only reads conversations_org_1001 ✅

📡 IoT / Telemetry: Hybrid Partitioning

You can combine partitioning with sub-partitioning — partition first by year, then by device type within each year:

-- Parent: partitioned by year (RANGE)
CREATE TABLE sensor_readings (
    reading_id  BIGSERIAL,
    device_id   BIGINT      NOT NULL,
    device_type TEXT        NOT NULL,
    value       DOUBLE PRECISION,
    recorded_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (reading_id, device_type, recorded_at)
) PARTITION BY RANGE (recorded_at);

-- 2024 partition, sub-partitioned by device_type (LIST)
CREATE TABLE sensor_readings_2024
    PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY LIST (device_type);

-- Sub-partitions within 2024
CREATE TABLE sensor_readings_2024_temperature
    PARTITION OF sensor_readings_2024
    FOR VALUES IN ('temperature');

CREATE TABLE sensor_readings_2024_humidity
    PARTITION OF sensor_readings_2024
    FOR VALUES IN ('humidity');

-- Query uses both pruning levels!
SELECT * FROM sensor_readings
WHERE recorded_at >= '2024-06-01'
  AND device_type = 'temperature';
-- Only reads sensor_readings_2024_temperature ✅✅

💡 Production Best Practices

1

Always include the partition key in WHERE clauses

No partition key = no pruning = full scan across all partitions. The whole point is wasted.

2

Pre-create future partitions

Inserting a row with no matching partition throws an error. Create next month's partition at the start of the month (or use a cron job).

3

Default partition catches unmapped data

Always have a DEFAULT partition so unexpected values don't cause insert errors.

4

Don't over-partition

Too many partitions (1000+) slow down the query planner. Monthly partitions for 10 years = 120 — that's fine. Daily for 10 years = 3650 — that's too many.

5

VACUUM runs per partition

Autovacuum works on each partition independently and in parallel — much faster than a single huge table.