PostgreSQL — Topic 3: Data Types

3.1

Numeric Types

Integers, decimals, and floating-point numbers

▾

Integer Types

PostgreSQL offers three integer sizes. Pick the smallest one that fits your data range — smaller types mean faster scans and less storage.

Type	Storage	Min Value	Max Value	Best Use
`smallint`	2 bytes	-32,768	32,767	Status codes, small counters
`integer` / `int`	4 bytes	-2.1 billion	2.1 billion	Most IDs, counts
`bigint`	8 bytes	-9.2 quintillion	9.2 quintillion	User IDs, large counters, timestamps in ms

💡

Rule of thumb: Default to integer for most IDs. Switch to bigint when a table grows beyond 2 billion rows, or when you store Twitter-style snowflake IDs / epoch milliseconds.

-- Table showing when to use each integer size
CREATE TABLE demo_integers (
  rating       SMALLINT,        -- e.g. 1-5 star rating
  product_id   INTEGER,         -- typical surrogate key
  event_id     BIGINT           -- high-volume event log
);

-- Integer arithmetic
SELECT 10::INTEGER / 3;          -- → 3  (integer division!)
SELECT 10::NUMERIC / 3;          -- → 3.333… (use NUMERIC for exact fractions)

Exact Decimals — NUMERIC / DECIMAL

NUMERIC(precision, scale) stores exact decimal values. It is mandatory for money, tax, scientific measurements — anything where rounding errors are unacceptable.

precision

Total significant digits.
e.g. NUMERIC(10,2) → up to 10 digits

scale

Digits after decimal point.
e.g. NUMERIC(10,2) → 2 decimal places

CREATE TABLE payments (
  amount       NUMERIC(12, 2),   -- e.g. 9999999999.99
  tax_rate     NUMERIC(5, 4)    -- e.g. 0.1875 (18.75%)
);

INSERT INTO payments VALUES (1999.99, 0.1800);

-- NEVER use FLOAT for money!
SELECT 0.1::FLOAT + 0.2::FLOAT;     -- → 0.30000000000000004  ❌
SELECT 0.1::NUMERIC + 0.2::NUMERIC; -- → 0.3  ✅

Floating-Point — REAL & DOUBLE PRECISION

These are IEEE 754 floating-point numbers. Fast but inexact. Use for scientific calculations, ML feature scores, or statistics where tiny rounding errors don't matter.

Type	Storage	Precision	Use When
`real`	4 bytes	~6 decimal digits	Sensor data, low-precision scores
`double precision`	8 bytes	~15 decimal digits	Scientific data, ML embeddings, coordinates

CREATE TABLE sensor_readings (
  temperature  REAL,              -- 24.7°C (close enough)
  ml_score     DOUBLE PRECISION   -- 0.9237481920...
);

-- Special float values
SELECT 'Infinity'::FLOAT;
SELECT 'NaN'::FLOAT;            -- Not a Number

Interactive: Pick the Right Numeric Type

🧠 What are you storing?

← Click a scenario to see the recommended type

3.2

Character Types

char, varchar, and text — and why text wins

▾

Overview

PostgreSQL has three text storage types. In practice, text is almost always the best choice — but knowing the differences matters for legacy code and interviews.

Type	Length	Padding?	Use
`char(n)`	Exactly n chars	Yes – space padded	Fixed codes (country codes 'IN', 'US')
`varchar(n)`	Up to n chars	No	When you want an enforced max length
`text`	Unlimited	No	Everything else — emails, names, content

✅

PostgreSQL tip: Internally varchar and text use the exact same storage. The only difference is the optional length constraint. Use text by default and add a CHECK constraint if you need a length limit.

CREATE TABLE users (
  country_code  CHAR(2)          NOT NULL,  -- 'IN', 'US', 'GB'
  email         TEXT             NOT NULL,  -- no length needed
  username      VARCHAR(50)      NOT NULL,  -- explicit cap
  bio           TEXT                        -- unlimited, nullable
);

-- char pads with spaces — be careful with comparisons
SELECT 'IN'::CHAR(5) = 'IN   ';  -- → true (padded)
SELECT 'IN'::TEXT   = 'IN   ';  -- → false

Storage

PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to transparently compress and store large text values out-of-line. You don't manage this — it just happens.

📦

TOAST threshold: Values > ~2 KB are automatically compressed or stored out-of-line. Querying small columns on wide-text tables is still fast because TOAST only fetches what you SELECT.

Common String Operators

-- Concatenation
SELECT 'Hello' || ' ' || 'World';           -- Hello World
SELECT CONCAT('Hello', ' ', 'World');      -- Hello World

-- Case
SELECT UPPER('hello');                    -- HELLO
SELECT LOWER('HELLO');                    -- hello

-- Search
SELECT email FROM users
  WHERE email LIKE '%@gmail.com';         -- ends with @gmail.com

SELECT email FROM users
  WHERE email ILIKE '%@GMAIL.COM';       -- case-insensitive LIKE

-- Length, trim
SELECT LENGTH('hello');                   -- 5
SELECT TRIM('  hello  ');               -- 'hello'

-- Substring
SELECT SUBSTRING('PostgreSQL', 1, 4);    -- 'Post'

Indexing Text

Standard BTree indexes work well for exact match and prefix search. For suffix/contains search, use special index types.

-- BTree: fast for equality and prefix
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'alice@example.com'; -- uses index
SELECT * FROM users WHERE email LIKE 'alice%';         -- uses index
SELECT * FROM users WHERE email LIKE '%gmail%';       -- ❌ full scan

-- pg_trgm extension: enables contains/suffix indexing
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users
  USING gin(email gin_trgm_ops);
SELECT * FROM users WHERE email LIKE '%gmail%';       -- ✅ now uses index

3.3

Date & Time Types

date, time, timestamp, timestamptz, interval

▾

The Five Types

Type	Stores	Size	Example
`date`	Year, month, day only	4 bytes	`2025-01-15`
`time`	Time of day (no date)	8 bytes	`14:30:00`
`timestamp`	Date + time, no timezone	8 bytes	`2025-01-15 14:30:00`
`timestamptz`	Date + time with timezone	8 bytes	`2025-01-15 14:30:00+05:30`
`interval`	A duration (not a point in time)	16 bytes	`3 days 2 hours`

⚠️

Always use timestamptz for production systems unless you have a specific reason not to. timestamp stores local time with no timezone context — a recipe for bugs when your app or DB server's timezone changes.

How `timestamptz` Works

PostgreSQL always stores timestamptz in UTC internally. When you insert with a timezone offset, it converts to UTC. When you query, it converts back to the session's timezone. Zero data loss.

-- Set session timezone
SET timezone = 'Asia/Kolkata';

CREATE TABLE orders (
  order_id    BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Insert in IST, stored as UTC internally
INSERT INTO orders DEFAULT VALUES;

-- Query returns in current session timezone (IST)
SELECT created_at FROM orders;
-- → 2025-01-15 20:00:00+05:30

-- Convert to UTC explicitly
SELECT created_at AT TIME ZONE 'UTC' FROM orders;
-- → 2025-01-15 14:30:00

Date / Time Operators & Functions

-- Current date and time
SELECT NOW();              -- timestamptz of now
SELECT CURRENT_DATE;      -- just the date
SELECT CURRENT_TIME;      -- just the time

-- Arithmetic with intervals
SELECT NOW() + '7 days'::INTERVAL;        -- 7 days from now
SELECT NOW() - '1 hour'::INTERVAL;        -- 1 hour ago

-- Extract parts
SELECT EXTRACT(YEAR  FROM NOW());          -- 2025
SELECT EXTRACT(MONTH FROM NOW());          -- 1
SELECT DATE_PART('hour', NOW());           -- 14 (hour of day)

-- Truncate
SELECT DATE_TRUNC('month', NOW());         -- first moment of this month
SELECT DATE_TRUNC('day',   NOW());         -- midnight today

-- Diff between two timestamps (→ interval)
SELECT '2025-12-31'::DATE - CURRENT_DATE;  -- days until new year

Interval — Duration Type

An INTERVAL represents a span of time, not a point. Useful for calculating deadlines, subscription periods, or time-to-live values.

CREATE TABLE subscriptions (
  user_id    INTEGER,
  started_at TIMESTAMPTZ,
  duration   INTERVAL           -- '1 month', '1 year', '14 days'
);

-- Computed expiry
SELECT user_id,
       started_at + duration AS expires_at
FROM   subscriptions
WHERE  started_at + duration > NOW();   -- active subs

-- Interval literals
SELECT '30 days'::INTERVAL;
SELECT '2 years 3 months 5 days'::INTERVAL;

Indexing Date Columns

-- Always index created_at / updated_at on high-volume tables
CREATE INDEX idx_orders_created ON orders(created_at);

-- Range queries use the index efficiently
SELECT * FROM orders
WHERE created_at >= DATE_TRUNC('month', NOW());  -- orders this month

-- ⚠️ Wrapping in a function defeats the index
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM created_at) = 2025;    -- ❌ full scan

-- Fix: use a range instead
SELECT * FROM orders
WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31 23:59:59'; -- ✅

3.4

Boolean

True / False — and the sneaky NULL

▾

Boolean Basics

The BOOLEAN type stores TRUE, FALSE, or NULL. PostgreSQL accepts many literals for true/false.

CREATE TABLE products (
  id          INTEGER    GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  is_active   BOOLEAN    NOT NULL DEFAULT TRUE,
  is_deleted  BOOLEAN    NOT NULL DEFAULT FALSE,
  is_featured BOOLEAN              -- nullable: unknown / unset
);

-- Accepted literals for TRUE
SELECT TRUE, 'true'::BOOLEAN, 'yes'::BOOLEAN, 'on'::BOOLEAN, '1'::BOOLEAN;

-- Querying booleans
SELECT * FROM products WHERE is_active;          -- WHERE is_active = TRUE
SELECT * FROM products WHERE NOT is_deleted;     -- active products

-- NULL boolean needs IS NULL / IS NOT NULL
SELECT * FROM products WHERE is_featured IS NULL; -- not yet set

⚠️

Three-valued logic: NULL AND TRUE → NULL. Never use = NULL to check for null — always use IS NULL.

Real-World Pattern: Soft Deletes with Boolean

-- OLTP soft delete pattern (from Topic 2)
ALTER TABLE products
  ADD COLUMN is_deleted  BOOLEAN     NOT NULL DEFAULT FALSE,
  ADD COLUMN deleted_at  TIMESTAMPTZ;

-- Soft delete
UPDATE products
SET    is_deleted = TRUE, deleted_at = NOW()
WHERE  id = 42;

-- Partial index: only index active products
CREATE INDEX idx_active_products
  ON products(id)
  WHERE is_deleted = FALSE;   -- tiny, fast index!

3.5

UUID

Universally Unique Identifiers for distributed systems

▾

What is UUID?

A UUID is a 128-bit identifier, stored as 16 bytes internally, displayed as 550e8400-e29b-41d4-a716-446655440000. PostgreSQL has a native UUID type — store it as UUID, not TEXT.

UUID as TEXT

36 bytes per value. No native UUID operations. Slow comparisons.

UUID as UUID type

16 bytes per value. Efficient indexing. Proper UUID operators.

Generating UUIDs

-- Enable uuid-ossp or pgcrypto extension
CREATE EXTENSION IF NOT EXISTS "pgcrypto";

CREATE TABLE users (
  id         UUID        PRIMARY KEY DEFAULT gen_random_uuid(), -- v4
  email      TEXT        NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

INSERT INTO users(email) VALUES ('alice@example.com');
SELECT * FROM users;
-- id → 'a3f1b2c4-...'  (auto-generated)

-- Manual UUID
SELECT gen_random_uuid();

UUID vs BIGINT — When to Use Each

Criterion	BIGINT IDENTITY	UUID
Storage	8 bytes	16 bytes
Sequential inserts	Excellent (B-tree friendly)	Random → index bloat
Distributed / microservices	Needs central sequence	Generate anywhere safely
Exposing in URLs/APIs	Leaks row count	Opaque, safe
Join performance	Faster	Slightly slower

🔑

Modern pattern: Use BIGINT GENERATED ALWAYS AS IDENTITY as the internal primary key (for performance), plus a UUID column as the public-facing identifier. Best of both worlds.

CREATE TABLE users (
  id         BIGINT      GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  public_id  UUID        NOT NULL UNIQUE DEFAULT gen_random_uuid(),
  email      TEXT        NOT NULL UNIQUE
);

-- Internal joins use BIGINT id (fast)
-- External API returns public_id (safe, opaque)

3.6

JSON & JSONB

Flexible semi-structured data — storage, operators, indexing

▾

JSON vs JSONB — Which to Use?

Feature	JSON	JSONB
Storage	Exact text copy	Decomposed binary
Duplicate keys	Kept (last wins on read)	Removed
Key order	Preserved	Not preserved
Indexing	Not supported	GIN, BTREE
Query speed	Slower (re-parse each time)	Faster
Write speed	Slightly faster	Slightly slower (parse on write)

✅

Always use JSONB. The only reason to use JSON is if you need exact byte-for-byte round-trip preservation (very rare). JSONB is what every production system uses.

JSONB Operators

CREATE TABLE products (
  id    INTEGER PRIMARY KEY,
  meta  JSONB
);

INSERT INTO products VALUES
  (1, '{"name":"iPhone","price":999,"brand":"Apple"}'),
  (2, '{"name":"Pixel","price":799,"brand":"Google"}');

-- -> returns JSONB value
SELECT meta->'name'   FROM products;  -- → "iPhone" (quoted)

-- ->> returns TEXT value (unquoted)
SELECT meta->>'name'  FROM products;  -- → iPhone

-- Filter by JSON field
SELECT * FROM products
WHERE  (meta->>'brand') = 'Apple';

INSERT INTO products VALUES (3, '{
  "name": "MacBook",
  "specs": {
    "ram": 16,
    "storage": {"size": 512, "unit": "GB"}
  },
  "tags": ["laptop","apple","m2"]
}');

-- #> returns nested JSONB
SELECT meta #> '{specs,storage}'     FROM products WHERE id=3;
-- → {"size": 512, "unit": "GB"}

-- #>> returns nested TEXT
SELECT meta #>> '{specs,storage,size}' FROM products WHERE id=3;
-- → 512

-- Array element access (0-indexed)
SELECT meta -> 'tags' -> 0            FROM products WHERE id=3;
-- → "laptop"

-- ? key exists
SELECT * FROM products WHERE meta ? 'specs';       -- has 'specs' key

-- ?| any of these keys exist
SELECT * FROM products WHERE meta ?| ARRAY['specs','brand'];

-- ?& all keys must exist
SELECT * FROM products WHERE meta ?& ARRAY['name','price'];

-- @> contains (subset check)
SELECT * FROM products
WHERE  meta @> '{"brand":"Apple"}'::JSONB;        -- brand = Apple

Indexing JSONB

-- GIN index: makes all @> and ? operators fast
CREATE INDEX idx_products_meta ON products USING gin(meta);

-- Expression index: index a single JSON field
CREATE INDEX idx_products_brand
  ON products((meta->>'brand'));

-- Now this query uses the index
SELECT * FROM products WHERE meta->>'brand' = 'Apple';

Modifying JSONB

-- Add / update a key
UPDATE products
SET    meta = meta || '{"discount": 0.1}'::JSONB
WHERE  id = 1;

-- Remove a key
UPDATE products
SET    meta = meta - 'discount'
WHERE  id = 1;

-- jsonb_set: set nested path
UPDATE products
SET    meta = jsonb_set(meta, '{specs,ram}', '32')
WHERE  id = 3;

3.7

Arrays

Native 1D & multi-dimensional arrays — types, operators, indexing

▾

Array Types

Any PostgreSQL type can become an array by appending []. Arrays are useful for tags, permissions, phone numbers — small, bounded, variable-length lists.

INTEGER[] TEXT[] UUID[] BOOLEAN[] TIMESTAMPTZ[]

CREATE TABLE articles (
  id         INTEGER    PRIMARY KEY,
  title      TEXT       NOT NULL,
  tags       TEXT[],             -- array of text tags
  related_ids INTEGER[]          -- array of related article IDs
);

INSERT INTO articles VALUES
  (1, 'PostgreSQL Arrays',  ARRAY['postgres','database','sql'], ARRAY[2,3]),
  (2, 'JSONB Deep Dive',    ARRAY['postgres','json'],           ARRAY[1]),
  (3, 'Indexing Strategies',ARRAY['indexes','performance'],       ARRAY[1,2]);

Array Operators

-- Element access (1-indexed!)
SELECT tags[1] FROM articles WHERE id = 1;  -- 'postgres'

-- Length
SELECT array_length(tags, 1) FROM articles;  -- dimension 1 length

-- @> contains (left contains all elements of right)
SELECT * FROM articles WHERE tags @> ARRAY['postgres'];
-- articles 1 and 2 (both contain 'postgres')

-- && overlap (any element in common)
SELECT * FROM articles
WHERE  tags && ARRAY['json', 'performance'];  -- articles 2 and 3

-- ANY / ALL
SELECT * FROM articles WHERE 'postgres' = ANY(tags);
SELECT * FROM articles WHERE 'public'   = ALL(tags);  -- all elements match

-- Append / prepend / concatenate
UPDATE articles
SET    tags = tags || ARRAY['featured']
WHERE  id = 1;

-- Unnest: expand array to rows
SELECT id, UNNEST(tags) AS tag FROM articles;
-- Returns one row per tag per article

Indexing Arrays

-- GIN index on array column — makes @> and && fast
CREATE INDEX idx_articles_tags ON articles USING gin(tags);

-- Now fast even on millions of rows
SELECT * FROM articles WHERE tags @> ARRAY['postgres'];

⚠️

Arrays vs JOIN tables: If your array grows large (hundreds of items) or you need to query by the array elements heavily with ordering, consider a separate join table instead. Arrays shine for small, bounded lists (5–20 items).

3.8

ENUM

Type-safe, storage-efficient fixed value sets

▾

Creating ENUMs

An ENUM is a custom type with a fixed ordered list of labels. Internally stored as an integer — very compact and fast to compare.

-- Define enum types
CREATE TYPE order_status AS ENUM (
  'pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled'
);

CREATE TYPE user_role AS ENUM (
  'guest', 'user', 'moderator', 'admin', 'superadmin'
);

CREATE TABLE orders (
  id         BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status     order_status NOT NULL DEFAULT 'pending',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Insert
INSERT INTO orders(status) VALUES ('confirmed');

-- Fails! 'lost' is not a valid enum value
INSERT INTO orders(status) VALUES ('lost');  -- ❌ ERROR

Enum Ordering

Enums are ordered by their definition sequence — useful for ORDER BY and range comparisons.

-- Ordering uses the defined sequence
SELECT * FROM orders ORDER BY status;
-- → pending, confirmed, processing, shipped, delivered, cancelled

-- Range comparison
SELECT * FROM orders
WHERE  status > 'confirmed';       -- processing, shipped, delivered, cancelled

-- Inspect all enum values
SELECT unnest(enum_range(NULL::order_status));
-- → pending, confirmed, processing, ..., cancelled

Modifying ENUMs

You can add values (but not remove or reorder without recreation). Adding is safe and doesn't lock the table.

-- Add a new value
ALTER TYPE order_status
  ADD VALUE 'refunded' AFTER 'delivered';

-- Rename a value (PG 10+)
ALTER TYPE order_status
  RENAME VALUE 'cancelled' TO 'canceled';

-- ❌ CANNOT remove a value — must drop & recreate type
-- Plan your enum values carefully upfront!

💡

ENUM vs TEXT + CHECK constraint: ENUMs are slightly faster and use less storage. TEXT + CHECK is more flexible (you can add/remove values easily). For stable, ordered states like order status — use ENUM. For evolving categories — use TEXT + CHECK or a lookup table.

🗒

Quick Reference — Type Chooser

Which type should I use? All decisions in one place.

▾

Data Type Decision Table

Scenario	Recommended Type	Why
Internal row ID	`BIGINT GENERATED ALWAYS AS IDENTITY`	Sequential, small, fast joins
Public-facing ID (API)	`UUID`	Opaque, no row count leak
Money / prices	`NUMERIC(12,2)`	Exact; never use FLOAT
1–5 rating, status code	`SMALLINT`	Tiny storage
Names, emails, content	`TEXT`	Flexible, same perf as VARCHAR
Fixed-length codes (ISO)	`CHAR(2)`	Truly fixed length
Date only (birthday)	`DATE`	No time needed
Event timestamp	`TIMESTAMPTZ`	Timezone-aware, stored as UTC
Duration / TTL	`INTERVAL`	Arithmetic with timestamps
On/off flag	`BOOLEAN`	Explicit, readable
Semi-structured data	`JSONB`	Indexable, queryable
Tags, permissions list	`TEXT[]`	GIN-indexable, easy operators
Finite state values	`ENUM`	Type-safe, ordered
ML vector / embeddings	`DOUBLE PRECISION[]` or `vector` (pgvector)	Float precision needed

PostgreSQL Data Types

Numeric Types

Integer Types

Exact Decimals — NUMERIC / DECIMAL

Floating-Point — REAL & DOUBLE PRECISION

Interactive: Pick the Right Numeric Type

🧠 What are you storing?

Character Types

Overview

Storage

Common String Operators

Indexing Text

Date & Time Types

The Five Types

How timestamptz Works

Date / Time Operators & Functions

Interval — Duration Type

Indexing Date Columns

Boolean

Boolean Basics

Real-World Pattern: Soft Deletes with Boolean

UUID

What is UUID?

Generating UUIDs

UUID vs BIGINT — When to Use Each

JSON & JSONB

JSON vs JSONB — Which to Use?

JSONB Operators

Indexing JSONB

Modifying JSONB

Arrays

Array Types

Array Operators

Indexing Arrays

ENUM

Creating ENUMs

Enum Ordering

Modifying ENUMs

Quick Reference — Type Chooser

Data Type Decision Table

Quick Quiz

Q1: You're building a Swiggy-style app. What type should the amount column in the payments table be?

Q2: Your user table will have 5 billion rows across multiple microservices that need to generate IDs independently. Which primary key type is best?

Q3: You want to store an article's tags like ['postgres', 'indexing'] and query "find all articles tagged postgres". What type + index?

How `timestamptz` Works

Q1: You're building a Swiggy-style app. What type should the `amount` column in the `payments` table be?