Why Does PostgreSQL Skip My Index?

...because your vendor's Cost Model Is from 2002

Summary

PostgreSQL's default random_page_cost = 4.0 was calibrated in 2002 for spinning disks with 90% cache hit assumptions. It has not been changed in roughly 25 years. On modern SSD-backed cloud Postgres with a warm cache, the planner systematically prefers Parallel Seq Scan over index lookups when both are valid, because the cost model thinks random I/O is 4x more expensive than sequential I/O. At 200K customers and 2M orders with the obvious indexes already in place, a single SET random_page_cost = 1.1 cuts execution time from 125.98ms to 68.69ms (1.83x) with zero schema changes. A covering index closes the rest of the gap to 41.16ms (3.06x), at the cost of write amplification. Measured on PostgreSQL 17 using ExoBench across plans verified with EXPLAIN (ANALYZE, BUFFERS).

You've Already Written This

Here's a query. You've seen this query. I've seen this query. Every e-commerce backend, CRM, and SaaS admin panel has run a version of it. "Find all orders for high-value customers in the northeast":

SELECT c.email, o.id, o.total, o.status, o.shipped_at
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'northeast' AND c.lifetime_value > 5000

The schema is the obvious one:

CREATE TABLE customers (
  id INT PRIMARY KEY,
  email TEXT,
  region TEXT,
  lifetime_value NUMERIC,
  created_at TIMESTAMPTZ
);

CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT,
  total NUMERIC,
  status TEXT,
  shipped_at TIMESTAMPTZ
);

CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_customers_region_ltv ON customers(region, lifetime_value);

Region is equality, lifetime_value is range, composite index leads with equality. Textbook. The orders side is indexed on customer_id. That's the join key. Anyone reading this in code review would nod and move on.

The query is slow at scale. Not catastrophic. Not 30 seconds. Just slow enough that the dashboard takes a beat to load and nobody on the team can quite tell you why. The DBA shrugs. You stare at EXPLAIN. The plan says "Parallel Seq Scan on orders" and you think, but I have an index on orders. Why isn't it using it?

The answer is that your vendor is running a cost model from before iPods existed.

How I Measured This

I didn't write any of these benchmarks by hand. I described the query and schema to my AI assistant in plain English, the assistant called ExoBench (an MCP server) on my behalf, and ExoBench spun up a PostgreSQL instance, generated synthetic data at the scale I asked for (200K customers, 2M orders, ~10% high-value, even region split), and ran EXPLAIN (ANALYZE, BUFFERS). I watched. The numbers in this post are real plans against real data. I just didn't have to write the scaffolding.

ExoBench does not connect to your database. Each benchmark is a fresh Postgres instance with synthetic data matching whatever schema and distribution you specified. Real plans. Real timings. Real buffer counts. Your production data never leaves your systems.

I'll quote my actual prompts inline so you can see the shape of the asks that drove this analysis. I never wrote a CREATE TABLE, never ran a VACUUM ANALYZE, never hand-crafted a single benchmark query. The AI did. ExoBench measured. I read the output.

If you want the architecture in full, here's how ExoBench works, and here's how it compares to chatbots and other tools.

What I Asked First

The originating prompt:

*I have a query [pastes the SELECT above]. This is "find all orders for high-value customers in the northeast." Every e-commerce backend, CRM, and SaaS admin panel runs this pattern. Roughly speaking my schema [pastes the schema]. Use ExoBench to benchmark and test. Explain the schema to me with an emotive query plan.*

The AI ran the baseline. At 200K customers and 2M orders, the query took 125.98 ms. Here's the plan ExoBench handed back. The full EXPLAIN ANALYZE output was 24 lines. I'll show the annotated form:

๐Ÿ˜ Gather (Workers Planned: 2)
  +- ๐Ÿค Parallel Hash Join on customer_id
๐Ÿ”ด   +- **Parallel Seq Scan on orders โ†’ 666,667 rows/loop in 41.5ms**
๐Ÿ”ด   |     **๐Ÿ“š 16,667 buffer hits per worker, every page of orders read**
๐Ÿ”ด   |     **idx_orders_customer_id EXISTS but planner did not pick it**
     +- ๐Ÿ˜Ž Parallel Hash (build side, ~1,660 rows/worker, 416kB, one batch)
        +- ๐Ÿ˜‡ Bitmap Heap Scan on customers (4,980 rows matched)
              ๐ŸŽฏ idx_customers_region_ltv used for both predicates

A note on the markers. I'm using emotive plans throughout: ๐Ÿ˜‡ good, ๐Ÿ˜Ž clever win, ๐Ÿค neutral structural, ๐Ÿ˜ necessary cost, ๐Ÿ˜ข missed optimization, ๐Ÿคฎ avoidable cost, ๐Ÿ˜ฑ catastrophic. ๐Ÿ”ด highlights the operator the diagnosis is about, ๐Ÿ’ก marks a concrete fix. The original operator names stay on every line. Skim the emojis. Read the operators when you want to verify.

The diagnosis the AI came back with: pulling four columns from orders (total, status, shipped_at, plus id) means even if the planner used idx_orders_customer_id, it would still need a heap fetch for every matching row. ~50K result rows ร— random heap I/O = the planner does the math, concludes sequential scan is cheaper, and reads all 2M rows. Recommended fix: a covering index.

CREATE INDEX ON orders(customer_id) INCLUDE (id, total, status, shipped_at);

ExoBench ran that version. 41.16 ms. 3.06x faster. Heap Fetches: 0. The plan flipped to Nested Loop + Index Only Scan. Tidy. Clean. The textbook answer.

I almost moved on.

The Nag Before the Storm

The thing that bugged me was a footnote in the AI's recommendation: "the planner correctly judged that a sequential scan would be cheaper than 50K random heap fetches." That phrasing.

Correctly judged.

Wait. Did it though? The AI just told me the planner picked the wrong plan, and then explained why the planner was right to pick it. Both can't be true. So I pushed back:

Hold up! WHY DIDN'T YOU USE THE orders index??? Use the index!

The AI ran the same benchmark again with SET enable_seqscan = off; SET enable_bitmapscan = off;, which forces the planner away from the Seq Scan and into whatever it can find with indexes. Result:

74.44 ms. The query the planner refused to consider runs in roughly 60% of the time of the plan that the planner chose. With no schema change. With the indexes you already had.

๐Ÿ˜ข Gather (Nested Loop, planner-rejected plan)
  +- ๐Ÿค Nested Loop
๐Ÿ”ด   +- **Parallel Index Scan using customers_pkey on customers**
๐Ÿ”ด   |     **Filter: lifetime_value > 5000 AND region = 'northeast'**
๐Ÿ”ด   |     **Rows Removed by Filter: 97,475 per worker (95% wasted)**
     +- ๐Ÿ˜‡ Index Scan using idx_orders_customer_id on orders
           ๐ŸŽฏ customer_id = c.id, ~10 rows per probe

The forced plan is faster than the planner's pick, but it's not the good plan. Look at the customers side. The planner fell back to scanning the entire primary key and filtering row by row, because I'd disabled bitmap scans wholesale and there was nothing left. The right composite index was sitting right there and went unused. The forced plan is winning despite doing the wrong thing on the customers side, because the planner's original choice was so bad that even a stupid pkey-scan-with-filter beats it.

The AI's first explanation appealed to "the cost model assumed random_page_cost = 4.0." but that's an answer made of more questions. What is random_page_cost? Why is it 4? Is 4 right? What should it be?

I made the AI go look it up.

Wait, A Number From 2002?

A "more conservative random_page_cost"?? What does that even mean!? Why is this metric wrong and WHAT is it supposed to be? Look this up ON THE INTERNET! Don't just make stuff up!

What the AI came back with, after actually reading the Postgres docs and the pgsql-hackers mailing list:

  • random_page_cost is a unitless ratio. Not milliseconds. Not bytes. It tells the planner how much more expensive a random page fetch is compared to seq_page_cost = 1.0. Lower it, index scans look cheaper. Raise it, sequential scans look cheaper.

  • The default is 4.0. It has been 4.0 for roughly 25 years.

  • The historical Postgres docs (versions 9.3 through 17) justified the 4.0 by explaining that random access to mechanical disk storage is even MORE expensive than four times sequential access, but they LOWERED it to 4.0 because 90% of random accesses to disk should be in cache. The default random access can be 40 times slower!

  • Tomas Vondra (Postgres committer) wrote on pgsql-hackers in October 2025 that the GUC and its 4.0 default were introduced in ~2000. The abstract for his upcoming POSETTE 2026 talk on this exact parameter says 2002. Here's the full quote because this is so good you can't make it up! "The GUC (and the 4.0 default) was introduced in ~2000 [1], so ~25 years ago. During that time the world went from rotational drives through multiple generations of flash / network-attached storage. I find it hard to believe those changes wouldn't affect random_page_cost."

Read that twice. The default bakes in two assumptions: spinning disks, and a 90% cache hit ratio. From a time when those assumptions were reasonable.

The current standard recommendations from pgTune, Cybertec, Frederik Himpe's tuning guide, and roughly every Postgres consultancy:

  • random_page_cost = 1.1 for SSD-backed deployments
  • random_page_cost = 1.0 when the entire working set fits in RAM

Is this universally agreed? No. There is an active pgsql-hackers thread from October 2025 where Tomas Vondra (committer) posted direct-I/O benchmark results measuring random_page_cost values of 20 to 50 when the cache is cold, even on modern NVMe SSDs. His point: 1.1 is right when pages are mostly cached, but the uncached SSD ratio is much higher than people assume. The right value depends on your cache hit rate. For a fully-cached OLTP workload (most real workloads of this shape), 1.1 is right. For a workload that misses cache constantly, 1.1 will over-favor indexes and you'll get bad plans in the other direction.

The thread is worth reading if you want to see committers argue about a 25-year-old default in real time. The TLDR: nobody is happy with 4.0, but nobody agrees what to change it to either.

For my benchmark, every page was in shared_buffers. Look at the buffer counts from earlier: shared hit=18678, read=27. 99.9% cache hit. The 4.0 default is wildly wrong for this case.

The One-Line Fix

So you're saying my vendor IS USING A METRIC MADE FOR HDDs!!?? Adjust it for SSDs and rerun the benchmarks (with otherwise normal settings!)

ExoBench ran the original schema. Same indexes. Same data. The only change: SET random_page_cost = 1.1 before the query. Result: 68.69 ms. 1.83x faster than the default-configured baseline. Zero schema changes. The planner switched plans on its own.

๐Ÿ˜Ž Gather (planner picked the right plan once the cost model matched reality)
  +- ๐Ÿค Nested Loop (5,144 customers ร— ~10 orders each)
     +- ๐Ÿ˜‡ Parallel Bitmap Heap Scan on customers
     |     ๐ŸŽฏ idx_customers_region_ltv used for region + lifetime_value
     +- ๐Ÿ˜Ž Index Scan using idx_orders_customer_id on orders
           ๐ŸŽฏ customer_id = c.id (5,144 probes, ~0.01ms each)

Compare the customers side to the forced-index plan from earlier. The SSD-tuned plan uses idx_customers_region_ltv. The forced plan uses customers_pkey and filters 95% of rows. Same query, same data, same indexes available. The only difference is which approach the planner was allowed to choose. Force the planner with enable_seqscan = off and you get a hack plan that wins by accident. Fix the cost model and you get the correct plan.

This is why "just force the index" is the wrong mental model. enable_seqscan = off and friends are diagnostic tools. They prove that the planner could have picked a better plan. They are not a tuning strategy. You don't ship them. You fix the reason the planner didn't pick the right plan in the first place.

In this case the reason was a 2002 default. One configuration value. Every query in the database.

The Whole Story in One Grid

Same workload, same indexes (unless noted), 200K customers, 2M orders, PostgreSQL 17, warm cache.

Configurationrandom_page_costPlan chosenExecution TimeSpeedup
Default (vendor-shipped)4.0Parallel Hash Join + Seq Scan orders125.98 ms1.0x (baseline)
Forced index (seqscan off, bitmapscan off)4.0Nested Loop + Index Scan via pkey74.44 ms1.69x
SSD-tuned, no schema change1.1Nested Loop + Index Scan orders68.69 ms1.83x
Covering index added4.0Nested Loop + Index Only Scan41.16 ms3.06x

Read top to bottom. The default plan is the slowest. Forcing the index with planner overrides gets you most of the way, but for the wrong reason and with the wrong customers-side index. Fixing the cost model gets you to roughly the same wall-clock with the correct plan on both sides. The covering index closes the rest of the gap by eliminating heap I/O entirely, but at the cost of ~100MB of storage and write amplification on total, status, and shipped_at.

The row that should bother you is the third one. No schema change. No new index. No application change. One configuration value, ranked against the same query, and the planner makes a better decision across the board.

What This Doesn't Cover

random_page_cost = 1.1 is not universally correct. It is right for SSD-backed Postgres where the working set fits comfortably in shared_buffers and the OS page cache. If your database is much larger than RAM, with frequent cold reads, 1.1 will over-favor index scans and you may regress on queries that genuinely benefit from sequential I/O. Measure. Don't copy the heuristic blindly.

Your cache hit rate matters more than the default. EXPLAIN (ANALYZE, BUFFERS) shows you shared hit vs read counts. If your hot queries are 99% hits, 1.1 is safe. If they're 50/50 against disk, you want something higher. ExoBench can run the same benchmark with different random_page_cost values in preBenchmarkSql to find your inflection point.

Synthetic data is not your data. ExoBench generated 200K customers with 4 regions evenly split and ~10% above the $5K lifetime_value threshold. Your real data might have 20 regions, with northeast representing 35% of customers, and the $5K threshold matching 2% of them. The plan you get depends on selectivity. Run the benchmark with the cardinalities and distributions that match your production statistics.

The AI infers things and can be wrong. I gave the AI a schema sketch and it filled in the missing primary key on orders, picked a NUMERIC type for lifetime_value, and chose a 4-value region domain. If your production schema has triggers, partial indexes, or partitioning, the benchmark won't account for them automatically. Always verify with "show me the schema you used."

random_page_cost is one of many planner GUCs. effective_cache_size, effective_io_concurrency, work_mem, cpu_index_tuple_cost. Tuning the cost model is a whole discipline. This post fixed one parameter for one workload shape. It is not a comprehensive tuning guide.

Reads only. This post measured read performance. Adding a covering index pays a write penalty (HOT updates disabled when any indexed column changes, including INCLUDE columns). For a high-write orders table where status and shipped_at change frequently, you may prefer fixing random_page_cost without adding the covering index. The hierarchy of trade-offs depends on your write rate.

ExoBench limits. Max 5 scale points per benchmark request, 3M rows per scale point.

So What Do You Actually Do

First, check what your cluster is using:

SHOW random_page_cost;
SHOW seq_page_cost;

If random_page_cost returns 4, you have the 2002 default. RDS, Aurora, Azure Database for PostgreSQL, GCP Cloud SQL, most managed Neon: I'd expect them all to ship 4.0 unless someone tuned the parameter group. They run on SSDs. They are using a number calibrated for spinning disks.

If your working set fits in RAM, set it to 1.1 at the session, role, or cluster level. For most OLTP workloads this is the single highest-leverage one-line change you can make. It will not hurt cold queries that legitimately benefit from sequential scans (the planner will still pick Seq Scan when it's actually cheaper). It will stop the planner from rejecting good index plans because it thinks the orders table is on a 5400 RPM disk.

If you cannot change the cluster setting, you can set it per-session or per-role:

-- Cluster-wide (preferred):
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Per-role:
ALTER ROLE app_user SET random_page_cost = 1.1;

-- Per-session (diagnostic only):
SET random_page_cost = 1.1;

If you can't change it at any level (some managed services lock the parameter), the covering-index path from earlier still works. It's just more invasive.

Try It On Your Query

Find the worst query in your system:

SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

The top result is your candidate. Sort by total_exec_time because that's calls ร— cost. The query that runs every second at 50ms beats the query that runs once a day at 60 seconds.

Then point ExoBench at it. You already have Claude or Cursor or ChatGPT. Head to exobench.ai to connect it, then paste a prompt like the one I started this analysis with:

Here's a slow query and the schema [insert schema]. Run it at realistic scale. Show me the plan. Then try a few variations, including random_page_cost = 1.1, and tell me which one wins.

That's it. The AI translates that into ExoBench tool calls. You read the output. The query that's been bugging you for six months gets a real answer in ninety seconds.

Find your worst query. Point ExoBench at it. Find out what year your vendor's defaults are from.