PostgreSQL Data Types
Every column needs a type. Choosing the right type saves storage, enables powerful operators, and unlocks index superpowers you cannot get elsewhere.
Numeric Types
Integers, decimals, and floating-point numbers
Integer Types
PostgreSQL offers three integer sizes. Pick the smallest one that fits your data range — smaller types mean faster scans and less storage.
| Type | Storage | Min Value | Max Value | Best Use |
|---|---|---|---|---|
smallint |
2 bytes | -32,768 | 32,767 | Status codes, small counters |
integer / int |
4 bytes | -2.1 billion | 2.1 billion | Most IDs, counts |
bigint |
8 bytes | -9.2 quintillion | 9.2 quintillion | User IDs, large counters, timestamps in ms |
integer for most IDs. Switch to bigint when a table grows beyond 2 billion rows, or when you store Twitter-style snowflake IDs / epoch milliseconds.
-- Table showing when to use each integer size CREATE TABLE demo_integers ( rating SMALLINT, -- e.g. 1-5 star rating product_id INTEGER, -- typical surrogate key event_id BIGINT -- high-volume event log ); -- Integer arithmetic SELECT 10::INTEGER / 3; -- → 3 (integer division!) SELECT 10::NUMERIC / 3; -- → 3.333… (use NUMERIC for exact fractions)
Exact Decimals — NUMERIC / DECIMAL
NUMERIC(precision, scale) stores exact decimal values. It is mandatory for money, tax, scientific measurements — anything where rounding errors are unacceptable.
e.g.
NUMERIC(10,2) → up to 10 digitse.g.
NUMERIC(10,2) → 2 decimal placesCREATE TABLE payments ( amount NUMERIC(12, 2), -- e.g. 9999999999.99 tax_rate NUMERIC(5, 4) -- e.g. 0.1875 (18.75%) ); INSERT INTO payments VALUES (1999.99, 0.1800); -- NEVER use FLOAT for money! SELECT 0.1::FLOAT + 0.2::FLOAT; -- → 0.30000000000000004 ❌ SELECT 0.1::NUMERIC + 0.2::NUMERIC; -- → 0.3 ✅
Floating-Point — REAL & DOUBLE PRECISION
These are IEEE 754 floating-point numbers. Fast but inexact. Use for scientific calculations, ML feature scores, or statistics where tiny rounding errors don't matter.
| Type | Storage | Precision | Use When |
|---|---|---|---|
real |
4 bytes | ~6 decimal digits | Sensor data, low-precision scores |
double precision |
8 bytes | ~15 decimal digits | Scientific data, ML embeddings, coordinates |
CREATE TABLE sensor_readings ( temperature REAL, -- 24.7°C (close enough) ml_score DOUBLE PRECISION -- 0.9237481920... ); -- Special float values SELECT 'Infinity'::FLOAT; SELECT 'NaN'::FLOAT; -- Not a Number
Interactive: Pick the Right Numeric Type
🧠 What are you storing?
Character Types
char, varchar, and text — and why text wins
Overview
PostgreSQL has three text storage types. In practice, text is almost always the best choice — but knowing the differences matters for legacy code and interviews.
| Type | Length | Padding? | Use |
|---|---|---|---|
char(n) |
Exactly n chars | Yes – space padded | Fixed codes (country codes 'IN', 'US') |
varchar(n) |
Up to n chars | No | When you want an enforced max length |
text |
Unlimited | No | Everything else — emails, names, content |
varchar and text use the exact same storage. The only difference is the optional length constraint. Use text by default and add a CHECK constraint if you need a length limit.
CREATE TABLE users ( country_code CHAR(2) NOT NULL, -- 'IN', 'US', 'GB' email TEXT NOT NULL, -- no length needed username VARCHAR(50) NOT NULL, -- explicit cap bio TEXT -- unlimited, nullable ); -- char pads with spaces — be careful with comparisons SELECT 'IN'::CHAR(5) = 'IN '; -- → true (padded) SELECT 'IN'::TEXT = 'IN '; -- → false
Storage
PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to transparently compress and store large text values out-of-line. You don't manage this — it just happens.
Common String Operators
-- Concatenation SELECT 'Hello' || ' ' || 'World'; -- Hello World SELECT CONCAT('Hello', ' ', 'World'); -- Hello World -- Case SELECT UPPER('hello'); -- HELLO SELECT LOWER('HELLO'); -- hello -- Search SELECT email FROM users WHERE email LIKE '%@gmail.com'; -- ends with @gmail.com SELECT email FROM users WHERE email ILIKE '%@GMAIL.COM'; -- case-insensitive LIKE -- Length, trim SELECT LENGTH('hello'); -- 5 SELECT TRIM(' hello '); -- 'hello' -- Substring SELECT SUBSTRING('PostgreSQL', 1, 4); -- 'Post'
Indexing Text
Standard BTree indexes work well for exact match and prefix search. For suffix/contains search, use special index types.
-- BTree: fast for equality and prefix CREATE INDEX idx_users_email ON users(email); SELECT * FROM users WHERE email = 'alice@example.com'; -- uses index SELECT * FROM users WHERE email LIKE 'alice%'; -- uses index SELECT * FROM users WHERE email LIKE '%gmail%'; -- ❌ full scan -- pg_trgm extension: enables contains/suffix indexing CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX idx_users_email_trgm ON users USING gin(email gin_trgm_ops); SELECT * FROM users WHERE email LIKE '%gmail%'; -- ✅ now uses index
Date & Time Types
date, time, timestamp, timestamptz, interval
The Five Types
| Type | Stores | Size | Example |
|---|---|---|---|
date |
Year, month, day only | 4 bytes | 2025-01-15 |
time |
Time of day (no date) | 8 bytes | 14:30:00 |
timestamp |
Date + time, no timezone | 8 bytes | 2025-01-15 14:30:00 |
timestamptz |
Date + time with timezone | 8 bytes | 2025-01-15 14:30:00+05:30 |
interval |
A duration (not a point in time) | 16 bytes | 3 days 2 hours |
timestamptz for production systems unless you have a specific reason not to. timestamp stores local time with no timezone context — a recipe for bugs when your app or DB server's timezone changes.
How timestamptz Works
PostgreSQL always stores timestamptz in UTC internally. When you insert with a timezone offset, it converts to UTC. When you query, it converts back to the session's timezone. Zero data loss.
-- Set session timezone SET timezone = 'Asia/Kolkata'; CREATE TABLE orders ( order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- Insert in IST, stored as UTC internally INSERT INTO orders DEFAULT VALUES; -- Query returns in current session timezone (IST) SELECT created_at FROM orders; -- → 2025-01-15 20:00:00+05:30 -- Convert to UTC explicitly SELECT created_at AT TIME ZONE 'UTC' FROM orders; -- → 2025-01-15 14:30:00
Date / Time Operators & Functions
-- Current date and time SELECT NOW(); -- timestamptz of now SELECT CURRENT_DATE; -- just the date SELECT CURRENT_TIME; -- just the time -- Arithmetic with intervals SELECT NOW() + '7 days'::INTERVAL; -- 7 days from now SELECT NOW() - '1 hour'::INTERVAL; -- 1 hour ago -- Extract parts SELECT EXTRACT(YEAR FROM NOW()); -- 2025 SELECT EXTRACT(MONTH FROM NOW()); -- 1 SELECT DATE_PART('hour', NOW()); -- 14 (hour of day) -- Truncate SELECT DATE_TRUNC('month', NOW()); -- first moment of this month SELECT DATE_TRUNC('day', NOW()); -- midnight today -- Diff between two timestamps (→ interval) SELECT '2025-12-31'::DATE - CURRENT_DATE; -- days until new year
Interval — Duration Type
An INTERVAL represents a span of time, not a point. Useful for calculating deadlines, subscription periods, or time-to-live values.
CREATE TABLE subscriptions ( user_id INTEGER, started_at TIMESTAMPTZ, duration INTERVAL -- '1 month', '1 year', '14 days' ); -- Computed expiry SELECT user_id, started_at + duration AS expires_at FROM subscriptions WHERE started_at + duration > NOW(); -- active subs -- Interval literals SELECT '30 days'::INTERVAL; SELECT '2 years 3 months 5 days'::INTERVAL;
Indexing Date Columns
-- Always index created_at / updated_at on high-volume tables CREATE INDEX idx_orders_created ON orders(created_at); -- Range queries use the index efficiently SELECT * FROM orders WHERE created_at >= DATE_TRUNC('month', NOW()); -- orders this month -- ⚠️ Wrapping in a function defeats the index SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2025; -- ❌ full scan -- Fix: use a range instead SELECT * FROM orders WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31 23:59:59'; -- ✅
Boolean
True / False — and the sneaky NULL
Boolean Basics
The BOOLEAN type stores TRUE, FALSE, or NULL. PostgreSQL accepts many literals for true/false.
CREATE TABLE products ( id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, is_active BOOLEAN NOT NULL DEFAULT TRUE, is_deleted BOOLEAN NOT NULL DEFAULT FALSE, is_featured BOOLEAN -- nullable: unknown / unset ); -- Accepted literals for TRUE SELECT TRUE, 'true'::BOOLEAN, 'yes'::BOOLEAN, 'on'::BOOLEAN, '1'::BOOLEAN; -- Querying booleans SELECT * FROM products WHERE is_active; -- WHERE is_active = TRUE SELECT * FROM products WHERE NOT is_deleted; -- active products -- NULL boolean needs IS NULL / IS NOT NULL SELECT * FROM products WHERE is_featured IS NULL; -- not yet set
NULL AND TRUE → NULL. Never use = NULL to check for null — always use IS NULL.
Real-World Pattern: Soft Deletes with Boolean
-- OLTP soft delete pattern (from Topic 2) ALTER TABLE products ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE, ADD COLUMN deleted_at TIMESTAMPTZ; -- Soft delete UPDATE products SET is_deleted = TRUE, deleted_at = NOW() WHERE id = 42; -- Partial index: only index active products CREATE INDEX idx_active_products ON products(id) WHERE is_deleted = FALSE; -- tiny, fast index!
UUID
Universally Unique Identifiers for distributed systems
What is UUID?
A UUID is a 128-bit identifier, stored as 16 bytes internally, displayed as 550e8400-e29b-41d4-a716-446655440000. PostgreSQL has a native UUID type — store it as UUID, not TEXT.
Generating UUIDs
-- Enable uuid-ossp or pgcrypto extension CREATE EXTENSION IF NOT EXISTS "pgcrypto"; CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- v4 email TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); INSERT INTO users(email) VALUES ('alice@example.com'); SELECT * FROM users; -- id → 'a3f1b2c4-...' (auto-generated) -- Manual UUID SELECT gen_random_uuid();
UUID vs BIGINT — When to Use Each
| Criterion | BIGINT IDENTITY | UUID |
|---|---|---|
| Storage | 8 bytes | 16 bytes |
| Sequential inserts | Excellent (B-tree friendly) | Random → index bloat |
| Distributed / microservices | Needs central sequence | Generate anywhere safely |
| Exposing in URLs/APIs | Leaks row count | Opaque, safe |
| Join performance | Faster | Slightly slower |
BIGINT GENERATED ALWAYS AS IDENTITY as the internal primary key (for performance), plus a UUID column as the public-facing identifier. Best of both worlds.
CREATE TABLE users ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, public_id UUID NOT NULL UNIQUE DEFAULT gen_random_uuid(), email TEXT NOT NULL UNIQUE ); -- Internal joins use BIGINT id (fast) -- External API returns public_id (safe, opaque)
JSON & JSONB
Flexible semi-structured data — storage, operators, indexing
JSON vs JSONB — Which to Use?
| Feature | JSON | JSONB |
|---|---|---|
| Storage | Exact text copy | Decomposed binary |
| Duplicate keys | Kept (last wins on read) | Removed |
| Key order | Preserved | Not preserved |
| Indexing | Not supported | GIN, BTREE |
| Query speed | Slower (re-parse each time) | Faster |
| Write speed | Slightly faster | Slightly slower (parse on write) |
JSONB. The only reason to use JSON is if you need exact byte-for-byte round-trip preservation (very rare). JSONB is what every production system uses.
JSONB Operators
CREATE TABLE products ( id INTEGER PRIMARY KEY, meta JSONB ); INSERT INTO products VALUES (1, '{"name":"iPhone","price":999,"brand":"Apple"}'), (2, '{"name":"Pixel","price":799,"brand":"Google"}'); -- -> returns JSONB value SELECT meta->'name' FROM products; -- → "iPhone" (quoted) -- ->> returns TEXT value (unquoted) SELECT meta->>'name' FROM products; -- → iPhone -- Filter by JSON field SELECT * FROM products WHERE (meta->>'brand') = 'Apple';
INSERT INTO products VALUES (3, '{ "name": "MacBook", "specs": { "ram": 16, "storage": {"size": 512, "unit": "GB"} }, "tags": ["laptop","apple","m2"] }'); -- #> returns nested JSONB SELECT meta #> '{specs,storage}' FROM products WHERE id=3; -- → {"size": 512, "unit": "GB"} -- #>> returns nested TEXT SELECT meta #>> '{specs,storage,size}' FROM products WHERE id=3; -- → 512 -- Array element access (0-indexed) SELECT meta -> 'tags' -> 0 FROM products WHERE id=3; -- → "laptop"
-- ? key exists SELECT * FROM products WHERE meta ? 'specs'; -- has 'specs' key -- ?| any of these keys exist SELECT * FROM products WHERE meta ?| ARRAY['specs','brand']; -- ?& all keys must exist SELECT * FROM products WHERE meta ?& ARRAY['name','price']; -- @> contains (subset check) SELECT * FROM products WHERE meta @> '{"brand":"Apple"}'::JSONB; -- brand = Apple
Indexing JSONB
-- GIN index: makes all @> and ? operators fast CREATE INDEX idx_products_meta ON products USING gin(meta); -- Expression index: index a single JSON field CREATE INDEX idx_products_brand ON products((meta->>'brand')); -- Now this query uses the index SELECT * FROM products WHERE meta->>'brand' = 'Apple';
Modifying JSONB
-- Add / update a key UPDATE products SET meta = meta || '{"discount": 0.1}'::JSONB WHERE id = 1; -- Remove a key UPDATE products SET meta = meta - 'discount' WHERE id = 1; -- jsonb_set: set nested path UPDATE products SET meta = jsonb_set(meta, '{specs,ram}', '32') WHERE id = 3;
Arrays
Native 1D & multi-dimensional arrays — types, operators, indexing
Array Types
Any PostgreSQL type can become an array by appending []. Arrays are useful for tags, permissions, phone numbers — small, bounded, variable-length lists.
CREATE TABLE articles ( id INTEGER PRIMARY KEY, title TEXT NOT NULL, tags TEXT[], -- array of text tags related_ids INTEGER[] -- array of related article IDs ); INSERT INTO articles VALUES (1, 'PostgreSQL Arrays', ARRAY['postgres','database','sql'], ARRAY[2,3]), (2, 'JSONB Deep Dive', ARRAY['postgres','json'], ARRAY[1]), (3, 'Indexing Strategies',ARRAY['indexes','performance'], ARRAY[1,2]);
Array Operators
-- Element access (1-indexed!) SELECT tags[1] FROM articles WHERE id = 1; -- 'postgres' -- Length SELECT array_length(tags, 1) FROM articles; -- dimension 1 length -- @> contains (left contains all elements of right) SELECT * FROM articles WHERE tags @> ARRAY['postgres']; -- articles 1 and 2 (both contain 'postgres') -- && overlap (any element in common) SELECT * FROM articles WHERE tags && ARRAY['json', 'performance']; -- articles 2 and 3 -- ANY / ALL SELECT * FROM articles WHERE 'postgres' = ANY(tags); SELECT * FROM articles WHERE 'public' = ALL(tags); -- all elements match -- Append / prepend / concatenate UPDATE articles SET tags = tags || ARRAY['featured'] WHERE id = 1; -- Unnest: expand array to rows SELECT id, UNNEST(tags) AS tag FROM articles; -- Returns one row per tag per article
Indexing Arrays
-- GIN index on array column — makes @> and && fast CREATE INDEX idx_articles_tags ON articles USING gin(tags); -- Now fast even on millions of rows SELECT * FROM articles WHERE tags @> ARRAY['postgres'];
ENUM
Type-safe, storage-efficient fixed value sets
Creating ENUMs
An ENUM is a custom type with a fixed ordered list of labels. Internally stored as an integer — very compact and fast to compare.
-- Define enum types CREATE TYPE order_status AS ENUM ( 'pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled' ); CREATE TYPE user_role AS ENUM ( 'guest', 'user', 'moderator', 'admin', 'superadmin' ); CREATE TABLE orders ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, status order_status NOT NULL DEFAULT 'pending', created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- Insert INSERT INTO orders(status) VALUES ('confirmed'); -- Fails! 'lost' is not a valid enum value INSERT INTO orders(status) VALUES ('lost'); -- ❌ ERROR
Enum Ordering
Enums are ordered by their definition sequence — useful for ORDER BY and range comparisons.
-- Ordering uses the defined sequence SELECT * FROM orders ORDER BY status; -- → pending, confirmed, processing, shipped, delivered, cancelled -- Range comparison SELECT * FROM orders WHERE status > 'confirmed'; -- processing, shipped, delivered, cancelled -- Inspect all enum values SELECT unnest(enum_range(NULL::order_status)); -- → pending, confirmed, processing, ..., cancelled
Modifying ENUMs
You can add values (but not remove or reorder without recreation). Adding is safe and doesn't lock the table.
-- Add a new value ALTER TYPE order_status ADD VALUE 'refunded' AFTER 'delivered'; -- Rename a value (PG 10+) ALTER TYPE order_status RENAME VALUE 'cancelled' TO 'canceled'; -- ❌ CANNOT remove a value — must drop & recreate type -- Plan your enum values carefully upfront!
Quick Reference — Type Chooser
Which type should I use? All decisions in one place.
Data Type Decision Table
| Scenario | Recommended Type | Why |
|---|---|---|
| Internal row ID | BIGINT GENERATED ALWAYS AS IDENTITY | Sequential, small, fast joins |
| Public-facing ID (API) | UUID | Opaque, no row count leak |
| Money / prices | NUMERIC(12,2) | Exact; never use FLOAT |
| 1–5 rating, status code | SMALLINT | Tiny storage |
| Names, emails, content | TEXT | Flexible, same perf as VARCHAR |
| Fixed-length codes (ISO) | CHAR(2) | Truly fixed length |
| Date only (birthday) | DATE | No time needed |
| Event timestamp | TIMESTAMPTZ | Timezone-aware, stored as UTC |
| Duration / TTL | INTERVAL | Arithmetic with timestamps |
| On/off flag | BOOLEAN | Explicit, readable |
| Semi-structured data | JSONB | Indexable, queryable |
| Tags, permissions list | TEXT[] | GIN-indexable, easy operators |
| Finite state values | ENUM | Type-safe, ordered |
| ML vector / embeddings | DOUBLE PRECISION[] or vector (pgvector) | Float precision needed |