3.1

Numeric Types

Integers, decimals, and floating-point numbers

Integer Types

PostgreSQL offers three integer sizes. Pick the smallest one that fits your data range — smaller types mean faster scans and less storage.

Type Storage Min Value Max Value Best Use
smallint 2 bytes -32,768 32,767 Status codes, small counters
integer / int 4 bytes -2.1 billion 2.1 billion Most IDs, counts
bigint 8 bytes -9.2 quintillion 9.2 quintillion User IDs, large counters, timestamps in ms
💡
Rule of thumb: Default to integer for most IDs. Switch to bigint when a table grows beyond 2 billion rows, or when you store Twitter-style snowflake IDs / epoch milliseconds.
-- Table showing when to use each integer size
CREATE TABLE demo_integers (
  rating       SMALLINT,        -- e.g. 1-5 star rating
  product_id   INTEGER,         -- typical surrogate key
  event_id     BIGINT           -- high-volume event log
);

-- Integer arithmetic
SELECT 10::INTEGER / 3;          -- → 3  (integer division!)
SELECT 10::NUMERIC / 3;          -- → 3.333… (use NUMERIC for exact fractions)

Exact Decimals — NUMERIC / DECIMAL

NUMERIC(precision, scale) stores exact decimal values. It is mandatory for money, tax, scientific measurements — anything where rounding errors are unacceptable.

precision
Total significant digits.
e.g. NUMERIC(10,2) → up to 10 digits
scale
Digits after decimal point.
e.g. NUMERIC(10,2) → 2 decimal places
CREATE TABLE payments (
  amount       NUMERIC(12, 2),   -- e.g. 9999999999.99
  tax_rate     NUMERIC(5, 4)    -- e.g. 0.1875 (18.75%)
);

INSERT INTO payments VALUES (1999.99, 0.1800);

-- NEVER use FLOAT for money!
SELECT 0.1::FLOAT + 0.2::FLOAT;     -- → 0.30000000000000004  ❌
SELECT 0.1::NUMERIC + 0.2::NUMERIC; -- → 0.3  ✅

Floating-Point — REAL & DOUBLE PRECISION

These are IEEE 754 floating-point numbers. Fast but inexact. Use for scientific calculations, ML feature scores, or statistics where tiny rounding errors don't matter.

TypeStoragePrecisionUse When
real 4 bytes ~6 decimal digits Sensor data, low-precision scores
double precision 8 bytes ~15 decimal digits Scientific data, ML embeddings, coordinates
CREATE TABLE sensor_readings (
  temperature  REAL,              -- 24.7°C (close enough)
  ml_score     DOUBLE PRECISION   -- 0.9237481920...
);

-- Special float values
SELECT 'Infinity'::FLOAT;
SELECT 'NaN'::FLOAT;            -- Not a Number

Interactive: Pick the Right Numeric Type

🧠 What are you storing?
← Click a scenario to see the recommended type
3.2

Character Types

char, varchar, and text — and why text wins

Overview

PostgreSQL has three text storage types. In practice, text is almost always the best choice — but knowing the differences matters for legacy code and interviews.

TypeLengthPadding?Use
char(n) Exactly n chars Yes – space padded Fixed codes (country codes 'IN', 'US')
varchar(n) Up to n chars No When you want an enforced max length
text Unlimited No Everything else — emails, names, content
PostgreSQL tip: Internally varchar and text use the exact same storage. The only difference is the optional length constraint. Use text by default and add a CHECK constraint if you need a length limit.
CREATE TABLE users (
  country_code  CHAR(2)          NOT NULL,  -- 'IN', 'US', 'GB'
  email         TEXT             NOT NULL,  -- no length needed
  username      VARCHAR(50)      NOT NULL,  -- explicit cap
  bio           TEXT                        -- unlimited, nullable
);

-- char pads with spaces — be careful with comparisons
SELECT 'IN'::CHAR(5) = 'IN   ';  -- → true (padded)
SELECT 'IN'::TEXT   = 'IN   ';  -- → false

Storage

PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to transparently compress and store large text values out-of-line. You don't manage this — it just happens.

📦
TOAST threshold: Values > ~2 KB are automatically compressed or stored out-of-line. Querying small columns on wide-text tables is still fast because TOAST only fetches what you SELECT.

Common String Operators

-- Concatenation
SELECT 'Hello' || ' ' || 'World';           -- Hello World
SELECT CONCAT('Hello', ' ', 'World');      -- Hello World

-- Case
SELECT UPPER('hello');                    -- HELLO
SELECT LOWER('HELLO');                    -- hello

-- Search
SELECT email FROM users
  WHERE email LIKE '%@gmail.com';         -- ends with @gmail.com

SELECT email FROM users
  WHERE email ILIKE '%@GMAIL.COM';       -- case-insensitive LIKE

-- Length, trim
SELECT LENGTH('hello');                   -- 5
SELECT TRIM('  hello  ');               -- 'hello'

-- Substring
SELECT SUBSTRING('PostgreSQL', 1, 4);    -- 'Post'

Indexing Text

Standard BTree indexes work well for exact match and prefix search. For suffix/contains search, use special index types.

-- BTree: fast for equality and prefix
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'alice@example.com'; -- uses index
SELECT * FROM users WHERE email LIKE 'alice%';         -- uses index
SELECT * FROM users WHERE email LIKE '%gmail%';       -- ❌ full scan

-- pg_trgm extension: enables contains/suffix indexing
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users
  USING gin(email gin_trgm_ops);
SELECT * FROM users WHERE email LIKE '%gmail%';       -- ✅ now uses index
3.3

Date & Time Types

date, time, timestamp, timestamptz, interval

The Five Types

TypeStoresSizeExample
date Year, month, day only 4 bytes 2025-01-15
time Time of day (no date) 8 bytes 14:30:00
timestamp Date + time, no timezone 8 bytes 2025-01-15 14:30:00
timestamptz Date + time with timezone 8 bytes 2025-01-15 14:30:00+05:30
interval A duration (not a point in time) 16 bytes 3 days 2 hours
⚠️
Always use timestamptz for production systems unless you have a specific reason not to. timestamp stores local time with no timezone context — a recipe for bugs when your app or DB server's timezone changes.

How timestamptz Works

PostgreSQL always stores timestamptz in UTC internally. When you insert with a timezone offset, it converts to UTC. When you query, it converts back to the session's timezone. Zero data loss.

-- Set session timezone
SET timezone = 'Asia/Kolkata';

CREATE TABLE orders (
  order_id    BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Insert in IST, stored as UTC internally
INSERT INTO orders DEFAULT VALUES;

-- Query returns in current session timezone (IST)
SELECT created_at FROM orders;
-- → 2025-01-15 20:00:00+05:30

-- Convert to UTC explicitly
SELECT created_at AT TIME ZONE 'UTC' FROM orders;
-- → 2025-01-15 14:30:00

Date / Time Operators & Functions

-- Current date and time
SELECT NOW();              -- timestamptz of now
SELECT CURRENT_DATE;      -- just the date
SELECT CURRENT_TIME;      -- just the time

-- Arithmetic with intervals
SELECT NOW() + '7 days'::INTERVAL;        -- 7 days from now
SELECT NOW() - '1 hour'::INTERVAL;        -- 1 hour ago

-- Extract parts
SELECT EXTRACT(YEAR  FROM NOW());          -- 2025
SELECT EXTRACT(MONTH FROM NOW());          -- 1
SELECT DATE_PART('hour', NOW());           -- 14 (hour of day)

-- Truncate
SELECT DATE_TRUNC('month', NOW());         -- first moment of this month
SELECT DATE_TRUNC('day',   NOW());         -- midnight today

-- Diff between two timestamps (→ interval)
SELECT '2025-12-31'::DATE - CURRENT_DATE;  -- days until new year

Interval — Duration Type

An INTERVAL represents a span of time, not a point. Useful for calculating deadlines, subscription periods, or time-to-live values.

CREATE TABLE subscriptions (
  user_id    INTEGER,
  started_at TIMESTAMPTZ,
  duration   INTERVAL           -- '1 month', '1 year', '14 days'
);

-- Computed expiry
SELECT user_id,
       started_at + duration AS expires_at
FROM   subscriptions
WHERE  started_at + duration > NOW();   -- active subs

-- Interval literals
SELECT '30 days'::INTERVAL;
SELECT '2 years 3 months 5 days'::INTERVAL;

Indexing Date Columns

-- Always index created_at / updated_at on high-volume tables
CREATE INDEX idx_orders_created ON orders(created_at);

-- Range queries use the index efficiently
SELECT * FROM orders
WHERE created_at >= DATE_TRUNC('month', NOW());  -- orders this month

-- ⚠️ Wrapping in a function defeats the index
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM created_at) = 2025;    -- ❌ full scan

-- Fix: use a range instead
SELECT * FROM orders
WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31 23:59:59'; -- ✅
3.4

Boolean

True / False — and the sneaky NULL

Boolean Basics

The BOOLEAN type stores TRUE, FALSE, or NULL. PostgreSQL accepts many literals for true/false.

CREATE TABLE products (
  id          INTEGER    GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  is_active   BOOLEAN    NOT NULL DEFAULT TRUE,
  is_deleted  BOOLEAN    NOT NULL DEFAULT FALSE,
  is_featured BOOLEAN              -- nullable: unknown / unset
);

-- Accepted literals for TRUE
SELECT TRUE, 'true'::BOOLEAN, 'yes'::BOOLEAN, 'on'::BOOLEAN, '1'::BOOLEAN;

-- Querying booleans
SELECT * FROM products WHERE is_active;          -- WHERE is_active = TRUE
SELECT * FROM products WHERE NOT is_deleted;     -- active products

-- NULL boolean needs IS NULL / IS NOT NULL
SELECT * FROM products WHERE is_featured IS NULL; -- not yet set
⚠️
Three-valued logic: NULL AND TRUE → NULL. Never use = NULL to check for null — always use IS NULL.

Real-World Pattern: Soft Deletes with Boolean

-- OLTP soft delete pattern (from Topic 2)
ALTER TABLE products
  ADD COLUMN is_deleted  BOOLEAN     NOT NULL DEFAULT FALSE,
  ADD COLUMN deleted_at  TIMESTAMPTZ;

-- Soft delete
UPDATE products
SET    is_deleted = TRUE, deleted_at = NOW()
WHERE  id = 42;

-- Partial index: only index active products
CREATE INDEX idx_active_products
  ON products(id)
  WHERE is_deleted = FALSE;   -- tiny, fast index!
3.5

UUID

Universally Unique Identifiers for distributed systems

What is UUID?

A UUID is a 128-bit identifier, stored as 16 bytes internally, displayed as 550e8400-e29b-41d4-a716-446655440000. PostgreSQL has a native UUID type — store it as UUID, not TEXT.

UUID as TEXT
36 bytes per value. No native UUID operations. Slow comparisons.
UUID as UUID type
16 bytes per value. Efficient indexing. Proper UUID operators.

Generating UUIDs

-- Enable uuid-ossp or pgcrypto extension
CREATE EXTENSION IF NOT EXISTS "pgcrypto";

CREATE TABLE users (
  id         UUID        PRIMARY KEY DEFAULT gen_random_uuid(), -- v4
  email      TEXT        NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

INSERT INTO users(email) VALUES ('alice@example.com');
SELECT * FROM users;
-- id → 'a3f1b2c4-...'  (auto-generated)

-- Manual UUID
SELECT gen_random_uuid();

UUID vs BIGINT — When to Use Each

CriterionBIGINT IDENTITYUUID
Storage 8 bytes 16 bytes
Sequential inserts Excellent (B-tree friendly) Random → index bloat
Distributed / microservices Needs central sequence Generate anywhere safely
Exposing in URLs/APIs Leaks row count Opaque, safe
Join performance Faster Slightly slower
🔑
Modern pattern: Use BIGINT GENERATED ALWAYS AS IDENTITY as the internal primary key (for performance), plus a UUID column as the public-facing identifier. Best of both worlds.
CREATE TABLE users (
  id         BIGINT      GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  public_id  UUID        NOT NULL UNIQUE DEFAULT gen_random_uuid(),
  email      TEXT        NOT NULL UNIQUE
);

-- Internal joins use BIGINT id (fast)
-- External API returns public_id (safe, opaque)
3.6

JSON & JSONB

Flexible semi-structured data — storage, operators, indexing

JSON vs JSONB — Which to Use?

FeatureJSONJSONB
Storage Exact text copy Decomposed binary
Duplicate keys Kept (last wins on read) Removed
Key order Preserved Not preserved
Indexing Not supported GIN, BTREE
Query speed Slower (re-parse each time) Faster
Write speed Slightly faster Slightly slower (parse on write)
Always use JSONB. The only reason to use JSON is if you need exact byte-for-byte round-trip preservation (very rare). JSONB is what every production system uses.

JSONB Operators

CREATE TABLE products (
  id    INTEGER PRIMARY KEY,
  meta  JSONB
);

INSERT INTO products VALUES
  (1, '{"name":"iPhone","price":999,"brand":"Apple"}'),
  (2, '{"name":"Pixel","price":799,"brand":"Google"}');

-- -> returns JSONB value
SELECT meta->'name'   FROM products;  -- → "iPhone" (quoted)

-- ->> returns TEXT value (unquoted)
SELECT meta->>'name'  FROM products;  -- → iPhone

-- Filter by JSON field
SELECT * FROM products
WHERE  (meta->>'brand') = 'Apple';
INSERT INTO products VALUES (3, '{
  "name": "MacBook",
  "specs": {
    "ram": 16,
    "storage": {"size": 512, "unit": "GB"}
  },
  "tags": ["laptop","apple","m2"]
}');

-- #> returns nested JSONB
SELECT meta #> '{specs,storage}'     FROM products WHERE id=3;
-- → {"size": 512, "unit": "GB"}

-- #>> returns nested TEXT
SELECT meta #>> '{specs,storage,size}' FROM products WHERE id=3;
-- → 512

-- Array element access (0-indexed)
SELECT meta -> 'tags' -> 0            FROM products WHERE id=3;
-- → "laptop"
-- ? key exists
SELECT * FROM products WHERE meta ? 'specs';       -- has 'specs' key

-- ?| any of these keys exist
SELECT * FROM products WHERE meta ?| ARRAY['specs','brand'];

-- ?& all keys must exist
SELECT * FROM products WHERE meta ?& ARRAY['name','price'];

-- @> contains (subset check)
SELECT * FROM products
WHERE  meta @> '{"brand":"Apple"}'::JSONB;        -- brand = Apple

Indexing JSONB

-- GIN index: makes all @> and ? operators fast
CREATE INDEX idx_products_meta ON products USING gin(meta);

-- Expression index: index a single JSON field
CREATE INDEX idx_products_brand
  ON products((meta->>'brand'));

-- Now this query uses the index
SELECT * FROM products WHERE meta->>'brand' = 'Apple';

Modifying JSONB

-- Add / update a key
UPDATE products
SET    meta = meta || '{"discount": 0.1}'::JSONB
WHERE  id = 1;

-- Remove a key
UPDATE products
SET    meta = meta - 'discount'
WHERE  id = 1;

-- jsonb_set: set nested path
UPDATE products
SET    meta = jsonb_set(meta, '{specs,ram}', '32')
WHERE  id = 3;
3.7

Arrays

Native 1D & multi-dimensional arrays — types, operators, indexing

Array Types

Any PostgreSQL type can become an array by appending []. Arrays are useful for tags, permissions, phone numbers — small, bounded, variable-length lists.

INTEGER[] TEXT[] UUID[] BOOLEAN[] TIMESTAMPTZ[]
CREATE TABLE articles (
  id         INTEGER    PRIMARY KEY,
  title      TEXT       NOT NULL,
  tags       TEXT[],             -- array of text tags
  related_ids INTEGER[]          -- array of related article IDs
);

INSERT INTO articles VALUES
  (1, 'PostgreSQL Arrays',  ARRAY['postgres','database','sql'], ARRAY[2,3]),
  (2, 'JSONB Deep Dive',    ARRAY['postgres','json'],           ARRAY[1]),
  (3, 'Indexing Strategies',ARRAY['indexes','performance'],       ARRAY[1,2]);

Array Operators

-- Element access (1-indexed!)
SELECT tags[1] FROM articles WHERE id = 1;  -- 'postgres'

-- Length
SELECT array_length(tags, 1) FROM articles;  -- dimension 1 length

-- @> contains (left contains all elements of right)
SELECT * FROM articles WHERE tags @> ARRAY['postgres'];
-- articles 1 and 2 (both contain 'postgres')

-- && overlap (any element in common)
SELECT * FROM articles
WHERE  tags && ARRAY['json', 'performance'];  -- articles 2 and 3

-- ANY / ALL
SELECT * FROM articles WHERE 'postgres' = ANY(tags);
SELECT * FROM articles WHERE 'public'   = ALL(tags);  -- all elements match

-- Append / prepend / concatenate
UPDATE articles
SET    tags = tags || ARRAY['featured']
WHERE  id = 1;

-- Unnest: expand array to rows
SELECT id, UNNEST(tags) AS tag FROM articles;
-- Returns one row per tag per article

Indexing Arrays

-- GIN index on array column — makes @> and && fast
CREATE INDEX idx_articles_tags ON articles USING gin(tags);

-- Now fast even on millions of rows
SELECT * FROM articles WHERE tags @> ARRAY['postgres'];
⚠️
Arrays vs JOIN tables: If your array grows large (hundreds of items) or you need to query by the array elements heavily with ordering, consider a separate join table instead. Arrays shine for small, bounded lists (5–20 items).
3.8

ENUM

Type-safe, storage-efficient fixed value sets

Creating ENUMs

An ENUM is a custom type with a fixed ordered list of labels. Internally stored as an integer — very compact and fast to compare.

-- Define enum types
CREATE TYPE order_status AS ENUM (
  'pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled'
);

CREATE TYPE user_role AS ENUM (
  'guest', 'user', 'moderator', 'admin', 'superadmin'
);

CREATE TABLE orders (
  id         BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status     order_status NOT NULL DEFAULT 'pending',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Insert
INSERT INTO orders(status) VALUES ('confirmed');

-- Fails! 'lost' is not a valid enum value
INSERT INTO orders(status) VALUES ('lost');  -- ❌ ERROR

Enum Ordering

Enums are ordered by their definition sequence — useful for ORDER BY and range comparisons.

-- Ordering uses the defined sequence
SELECT * FROM orders ORDER BY status;
-- → pending, confirmed, processing, shipped, delivered, cancelled

-- Range comparison
SELECT * FROM orders
WHERE  status > 'confirmed';       -- processing, shipped, delivered, cancelled

-- Inspect all enum values
SELECT unnest(enum_range(NULL::order_status));
-- → pending, confirmed, processing, ..., cancelled

Modifying ENUMs

You can add values (but not remove or reorder without recreation). Adding is safe and doesn't lock the table.

-- Add a new value
ALTER TYPE order_status
  ADD VALUE 'refunded' AFTER 'delivered';

-- Rename a value (PG 10+)
ALTER TYPE order_status
  RENAME VALUE 'cancelled' TO 'canceled';

-- ❌ CANNOT remove a value — must drop & recreate type
-- Plan your enum values carefully upfront!
💡
ENUM vs TEXT + CHECK constraint: ENUMs are slightly faster and use less storage. TEXT + CHECK is more flexible (you can add/remove values easily). For stable, ordered states like order status — use ENUM. For evolving categories — use TEXT + CHECK or a lookup table.
🗒

Quick Reference — Type Chooser

Which type should I use? All decisions in one place.

Data Type Decision Table

ScenarioRecommended TypeWhy
Internal row IDBIGINT GENERATED ALWAYS AS IDENTITYSequential, small, fast joins
Public-facing ID (API)UUIDOpaque, no row count leak
Money / pricesNUMERIC(12,2)Exact; never use FLOAT
1–5 rating, status codeSMALLINTTiny storage
Names, emails, contentTEXTFlexible, same perf as VARCHAR
Fixed-length codes (ISO)CHAR(2)Truly fixed length
Date only (birthday)DATENo time needed
Event timestampTIMESTAMPTZTimezone-aware, stored as UTC
Duration / TTLINTERVALArithmetic with timestamps
On/off flagBOOLEANExplicit, readable
Semi-structured dataJSONBIndexable, queryable
Tags, permissions listTEXT[]GIN-indexable, easy operators
Finite state valuesENUMType-safe, ordered
ML vector / embeddingsDOUBLE PRECISION[] or vector (pgvector)Float precision needed

Quick Quiz

Q1: You're building a Swiggy-style app. What type should the amount column in the payments table be?
Q2: Your user table will have 5 billion rows across multiple microservices that need to generate IDs independently. Which primary key type is best?
Q3: You want to store an article's tags like ['postgres', 'indexing'] and query "find all articles tagged postgres". What type + index?