How One Extra JOIN Reliably Breaks Postgres

Adding a 5-row lookup table to a 9-table billing query flips the plan from 0.3ms nested loops to 467ms hash joins. The cause is join_collapse_limit = 8, a default Postgres has shipped since 2005. SQL Server and MySQL don't have this cliff. Measured across three engines using ExoBench.

June 2, 2026

Summary

Adding a single 5-row lookup table (client_status) to a 9-table Postgres billing query changes the execution plan from nested loops (0.3ms) to hash joins (467ms at 50K merchants), a 1,500x regression caused entirely by join_collapse_limit = 8. At 8 tables, Postgres exhaustively searches all join orders and finds the optimal plan. At 9 tables, it falls back to a heuristic that builds hash joins across the entire dataset, then filters to 30 result rows at the end. SET join_collapse_limit = 12 restores the nested loop plan and sub-millisecond execution. SQL Server and MySQL were benchmarked with the identical 9-table query and do not exhibit this cliff. both chose nested loops at every scale point without tuning. Measured on PostgreSQL 17, SQL Server 2022, and MySQL 8 using ExoBench with EXPLAIN ANALYZE across five scale points from 500 to 50,000 merchants.

The Query I Almost Shipped

Here's a billing summary query. You've seen this query. I've seen this query. It pulls VIP merchant accounts, joins through the usual chain of lookup tables, and returns a handful of rows.

SELECT count(*)
FROM account_core a
JOIN account_balance ab  ON ab.number = a.number
JOIN account_status ast  ON ast.status_id = a.status_id
JOIN account_region ar   ON ar.region_id = a.region_id
JOIN dedicated_accounts ded ON ded.account_number = a.number
JOIN merchant_info m     ON m.alias = ded.client_alias
JOIN client_core c       ON c.alias = m.alias
JOIN reg_flat reg        ON reg.alias = m.alias
                         AND reg.market_code = 'VIP'
                         AND reg.type_code = 'M';

Eight tables. VIP filter on reg_flat narrows to six merchants. Everything else is PK lookups. The plan is nested loops from start to finish. Execution time: 0.3ms. Fast. Correct. Ships clean.

Then someone on the team adds a lookup to client_status so we can show whether each client is active or inactive. Five rows in the table. A single join on a primary key. The kind of change you approve in code review in thirty seconds:

JOIN client_status cs ON cs.status_id = c.status_id

CI flags a regression. 0.5ms → 5ms.

Here's where I tell you what I would have done. I would have called it a fluke. Or at worst, a 10x degradation on a query that runs in half a millisecond. Even if it got 10x worse in prod, that's 5ms. Even if it got 100x worse, that's 50ms. That's a backlog problem, not a pager problem. Call me lazy if you want, but given a baseline of 0.5ms, I would have shipped it.

I would have shipped a logarithmic bomb.

The regression is not 10x. It is not 100x. At 50,000 merchants, the query with client_status takes 467ms. At 75,000 merchants on a single connection without parallel workers, which is what your connection pool is actually serving, it takes 1.28 seconds. To return 30 rows. The query without client_status takes 0.3ms at every scale. Same data. Same indexes. Same everything except one 5-row lookup table. ExoBench would have caught this before it ever shipped, because nothing else in the normal workflow can. Not code review, not dev-scale tests, not staging. The failure is invisible at every scale you can cheaply test at. That is what makes it dangerous.

How I Measured This

I didn't write any of these benchmarks by hand. I described the query and schema to my AI assistant, the assistant called ExoBench on my behalf, and ExoBench spun up real database instances (Postgres, SQL Server, and MySQL), generated synthetic data at the scales I asked for, and ran real execution plans. I watched. The numbers in this post are real plans against real data. I just didn't have to write the scaffolding.

ExoBench does not connect to your database. Each benchmark is a fresh instance with synthetic data matching whatever schema and distribution you specified. Real plans. Real timings. Your production data never leaves your systems.

I'll quote my actual prompts inline so you can see the shape of the asks that drove this analysis.

If you want the architecture in full, here's how ExoBench works, and here's how it compares to chatbots and other tools.

What I Asked First

The originating prompt:

I have a billing summary query that pulls VIP merchant accounts. There are only a half-dozen VIP merchants in the system, so the result is small but perf important. Then we added a tiny lookup to client_status so we could show if each client is active or inactive. CI flagged a regression from 0.5ms to 5ms and sounds like a fluke!

CREATE TABLE client_status (status_id INT PRIMARY KEY, status_name TEXT);
CREATE TABLE client_core (alias INT PRIMARY KEY, name TEXT, status_id INT);
CREATE TABLE merchant_info (alias INT PRIMARY KEY, code TEXT, account_tag INT);
CREATE TABLE account_status (status_id INT PRIMARY KEY, status_name TEXT);
CREATE TABLE account_region (region_id INT PRIMARY KEY, region_name TEXT);
CREATE TABLE account_core (number INT PRIMARY KEY, type INT, status_id INT, region_id INT);
CREATE TABLE account_balance (number INT PRIMARY KEY, balance NUMERIC);
CREATE TABLE dedicated_accounts (account_number INT, client_alias INT);
CREATE TABLE reg_flat (alias INT, market_code TEXT, type_code TEXT);

CREATE INDEX idx_ded_client ON dedicated_accounts(client_alias);

The query is:

SELECT count(*)
FROM account_core a
JOIN account_balance ab  ON ab.number = a.number
JOIN account_status ast  ON ast.status_id = a.status_id
JOIN account_region ar   ON ar.region_id = a.region_id
JOIN dedicated_accounts ded ON ded.account_number = a.number
JOIN merchant_info m     ON m.alias = ded.client_alias
JOIN client_core c       ON c.alias = m.alias
JOIN client_status cs    ON cs.status_id = c.status_id
JOIN reg_flat reg        ON reg.alias = m.alias AND reg.market_code = 'VIP' AND reg.type_code = 'M';

Accounts is roughly 10x of merchants, dedicated links is roughly 5x of merchants. client_status, account_status are ~5 rows and account_region 50.

Is this regression even real?? Use ExoBench to figure it out.

The AI ran two benchmarks side by side. same query, same data, one with client_status and one without. At 500 merchants (5,000 accounts, 2,500 dedicated links, 6 VIP merchants), the without-client_status plan came back first:

😇 Aggregate (actual time=0.264)
  😇 Nested Loop
    😇 Nested Loop
      😇 Nested Loop
        😇 Nested Loop
          😇 Nested Loop
            😇 Nested Loop
              😇 Nested Loop
😇               Seq Scan on reg_flat → 6 VIP rows (filtered from 494)
😇               Index Only Scan on merchant_info_pkey → 1 row/loop, 6 loops
😇             Index Only Scan on client_core_pkey → 1 row/loop, 6 loops
😇           Index Scan on idx_ded_client → 5 rows/loop, 6 loops
😇         Index Only Scan on account_balance_pkey → 1 row/loop, 30 loops
😇       Index Scan on account_core_pkey → 1 row/loop, 30 loops
😇     Index Only Scan on account_status_pkey → 1 row/loop, 30 loops
😇   Index Only Scan on account_region_pkey → 1 row/loop, 30 loops
Execution Time: 0.307 ms

A note on the markers. I'm using emotive plans throughout this post: emoji-annotated query plans for the rest of us. 😇 good, 😎 clever win, 🤝 neutral structural, 😐 necessary cost, 😢 missed optimization, 🤮 avoidable cost, 😱 catastrophic. 🔴 highlights the operator the diagnosis is about. Skim the emojis. Read the operators when you want to verify.

That plan is beautiful. Postgres starts at reg_flat, finds 6 VIP rows, then nested-loops outward through primary keys. It touches 30 rows total regardless of how big the tables are.

Then the with-client_status plan:

😱 Aggregate (actual time=5.641)
🔴 Hash Join (m.alias = reg.alias) → VIP filter applied LAST
🤮   Hash Join (c.status_id = cs.status_id) → hashing ALL 2,500 merchants
🤮     Hash Join (m.alias = c.alias)
🤮       Hash Join (ded.client_alias = m.alias)
🤮         Hash Join (a.region_id = ar.region_id)
🤮           Hash Join (a.status_id = ast.status_id)
🤮             Hash Join (a.number = ab.number)
🤮               Hash Join (ded.account_number = a.number)
😱                 Seq Scan on dedicated_accounts → ALL 2,500 rows
😱                 Hash: Seq Scan on account_core → ALL 5,000 rows
😱               Hash: Seq Scan on account_balance → ALL 5,000 rows
🤝             Hash: Seq Scan on account_status → 5 rows
🤝           Hash: Seq Scan on account_region → 50 rows
🤮         Hash: Seq Scan on merchant_info → ALL 500 rows
🤮       Hash: Seq Scan on client_core → ALL 500 rows
🤝     Hash: Seq Scan on client_status → 5 rows
🔴   Seq Scan on reg_flat → 6 VIP rows (filtered LAST, after hashing everything)
Execution Time: 5.705 ms

The entire plan flipped. Instead of starting from 6 VIP rows and doing targeted lookups, Postgres is hash-joining the entire dataset. every account, every balance, every dedicated link, every merchant. and then, at the very last step, filtering to VIP. It builds hash tables for 5,000 accounts to produce 30 result rows.

At 500 merchants this is 5.7ms versus 0.3ms. "Only" a 19x regression. At this scale I might have called it acceptable.

At What Point Do I Have a Problem?

I'm not a believer in premature optimization so I pushed:

These numbers still look fast. At what point do I have a problem?

The AI ran the with-client_status query at 10,000 and 50,000 merchants. The without-client_status baseline was run at the same scales.

Merchants	Without client_status	With client_status	Regression
500	0.31 ms	5.7 ms	18x
1,000	0.30 ms	11.9 ms	40x
2,000	0.44 ms	23.5 ms	53x
10,000	0.77 ms	138.8 ms	180x
50,000	3.0 ms	466.6 ms	155x

The without-client_status column is essentially flat. It finds 6 VIP rows and does PK lookups. Whether the tables are 5,000 rows or 500,000, it touches 30 rows. The slight uptick at 50K is the reg_flat seq scan getting slower (filtering 50,000 rows to find 6).

The with-client_status column scales linearly. At 50,000 merchants, Postgres hash-joined 500,000 accounts, 250,000 dedicated links, and 50,000 merchants. It even launched parallel workers. All to produce 30 rows.

At whatever your production merchant count is, extrapolate linearly from the second column.

The Root Cause

Okay. Investigate how to fix it.

The AI's first hypothesis was an index on reg_flat. Created idx_reg_vip ON reg_flat(market_code, type_code) INCLUDE (alias). Ran the benchmark. Still hash joins everywhere. 132ms at 10K, 422ms at 50K. The index was used for the reg_flat scan itself, but the planner kept hash-joining the rest. The problem wasn't reg_flat visibility.

Then the AI hit on the real hypothesis: the query has 9 tables. Postgres's join_collapse_limit defaults to 8.

join_collapse_limit controls how many tables the Postgres planner will consider in its exhaustive join-order search. Below the limit, it evaluates every possible join permutation and picks the cheapest one. Above the limit, it falls back to a heuristic that walks the FROM clause in order and builds hash joins.

Eight tables: exhaustive search, finds the optimal nested loop plan from reg_flat inward. Nine tables: heuristic, produces a catastrophic hash-join-everything plan.

The AI set join_collapse_limit = 12 and ran the benchmark:

Merchants	Broken (jcl=8)	Fixed (jcl=12)
10,000	138.8 ms	0.85 ms
50,000	466.6 ms	3.19 ms

The nested loop plan came back. Sub-millisecond at 10K. The 3ms at 50K is just the reg_flat seq scan.

Adding the covering index on reg_flat alongside jcl=12 closed the last gap: 0.29ms at 10K, 0.34ms at 50K. Identical to the original 8-table query.

The fix is two lines:

SET LOCAL join_collapse_limit = 12;

Seriously?

Here's my "hot take" on the issue: the concept of a database planner that fails on one more table is absurd.

We are lectured constantly about building systems that degrade gracefully. Retry with backoff. Circuit breakers. Bulkheads. Timeouts. The entire discipline of reliability engineering exists because hard cliffs are unacceptable in production systems. Yet here is Postgres, the database we all praise for its query planner, with a hard cliff at exactly 8 tables that changes the optimizer's entire search strategy. Not a gradual increase in planning time. Not a slightly worse plan. A total plan flip, from optimal to catastrophic, triggered by adding a single 5-row lookup table.

The default of 8 was set in 2005. It has not changed. Every billing system, every reporting query, every star schema with a few dimension tables lives in the exact range where one more JOIN crosses the line.

So I had to know: does anyone else do this?

Does anyone else Have This Cliff?

You're saying join_collapse_limit causes Postgres to literally give up? Double check if SQL Server and MySQL have this kind of scaling cliff! Give me real numbers!

The AI called out to ExoBench and ran the identical 9-table query on all three engines (note that ExoBench does this literally in one prompt). Same schema shape. Same data distributions. Same scale points. Here are the results:

Engine	500	1,000	2,000	10,000	50,000
Postgres (default jcl=8)	5.7 ms	11.9 ms	23.5 ms	138.8 ms	466.6 ms
Postgres (jcl=12, fixed)	0.31 ms	0.30 ms	0.44 ms	0.85 ms	3.19 ms
SQL Server	0.5 ms	0.5 ms	1 ms	2 ms	10 ms
MySQL	0.55 ms	0.78 ms	1.31 ms	5.57 ms	26.6 ms

SQL Server chose nested loops at every scale point. At 10K merchants its plan metadata says StatementOptmEarlyAbortReason="GoodEnoughPlanFound". It found the nested loop plan quickly and stopped looking. At 50K it says "TimeOut", meaning it spent longer searching, but it still landed on nested loops. No cliff. No plan flip. No knob required.

MySQL's optimizer_search_depth defaults to 62. It exhaustively evaluated join orders for 9 tables without breaking a sweat. All nested loops, every scale point.

MySQL returns execution plans in a single text blob with actual-time annotations inline. SQL Server returns them in XML for some reason (last I checked it's 2026). Both of them found the right plan. Without being told.

The Chart That Should Annoy You

Here it is on a linear scale. Not log. Linear. Because log scale is kind to the broken line and I don't feel like being kind right now.

Postgres default: adding one 5-row lookup table turns a 0.3ms query into a 467ms query. SQL Server and MySQL stay flat.

Postgres default (jcl=8): the red line. Everything else: the lines you can't see because they're flat against the x-axis.

The three lines at the bottom are fixed Postgres, SQL Server, and MySQL. You can't tell them apart because the regression is so dominant it compresses everything else to zero.

Sure, Postgres is the fastest engine in the room when the plan is right. 0.3ms versus SQL Server's 2ms versus MySQL's 5.6ms. But I would rather have reliably graceful degradation any day of the week than fast queries with hidden scale-cliffs. SQL Server at 10ms is 50x slower than fixed Postgres at 0.2ms. I don't care. It found the right plan without me having to know a magic number. MySQL at 27ms is 90x slower than fixed Postgres. I don't care about that either. It found the right plan.

The worst case is not "my query is 10ms slower." The worst case is "my query is 1,500x slower and I cannot tell from dev, staging, code review, or a generic chatbot." That is the cliff. That is what breaks trust.

What This Doesn't Cover

join_collapse_limit is one planner GUC. This post fixed one threshold for one workload shape. It is not a comprehensive Postgres tuning guide.

ExoBench catches measurable, scale-dependent query behavior. Plan flips caused by table count thresholds. Cardinality misestimates under skewed data. Index-usage cliffs. Crossover points where the planner switches strategies. These are real, and this is the class the join_collapse_limit cliff belongs to.

ExoBench does not catch everything. Logic errors, wrong requirements, race conditions, application-layer failures, data-correctness issues, anything that does not show up in an execution plan at scale. ExoBench would not have caught a wrong JOIN condition that returns plausible-but-wrong rows. If the client_status join had been on the wrong column and returned incorrect active/inactive labels but happened to run fast, ExoBench would have waved it through. It benchmarks performance, not correctness of intent.

This would not have caught every way that feature could have hurt you. It would have caught this way, the invisible-at-scale way, which happens to be the way you can least afford and least easily test for.

Synthetic data is not production data. ExoBench generated merchants with deterministic VIP assignment and uniform account distribution. Your production data might have 200 VIP merchants instead of 6, different account-per-merchant ratios, or skewed region distributions. The plan you get depends on these distributions. Run the benchmark with cardinalities that match your production statistics.

The AI infers things and can be wrong. I gave the AI a schema sketch and it filled in realistic data generation patterns. If your production schema has triggers, partitioning, or partial indexes, the benchmark won't account for them automatically. Always verify with "show me the schema you used."

The cross-engine comparison used ExoBench's ephemeral instances. Postgres on Neon, SQL Server on Azure SQL Database Serverless, MySQL on Aurora Serverless v2. Production deployments on different hardware, with different memory configurations and buffer pool sizes, will produce different absolute timings. The plan shapes and the presence or absence of the cliff are the finding, not the specific millisecond values.

ExoBench limits. Max 5 scale points per benchmark request, 3M rows per scale point. The 50K-merchant benchmark used approximately 850K total rows (accounts + dedicated + merchants + lookups), well within limits.

So What Do You Actually Do

First, count your tables:

-- How many tables does your worst query join?
-- If the answer is 9 or more, check your join_collapse_limit.
SHOW join_collapse_limit;

If it returns 8 (the default), every query with 9+ tables is past the cliff. You're one lookup table away from a plan flip you cannot predict.

The fix is straightforward. Set join_collapse_limit to 12 at the session, role, or cluster level:

-- Cluster-wide (preferred):
ALTER SYSTEM SET join_collapse_limit = 12;
SELECT pg_reload_conf();

-- Per-role:
ALTER ROLE billing_user SET join_collapse_limit = 12;

-- Per-session (diagnostic):
SET LOCAL join_collapse_limit = 12;

Planning time increases slightly. at 50K merchants, the exhaustive search for 9 tables added roughly 2ms of planning time. That buys you a 1,000x improvement in execution time. Worth it.

Then find your worst query. You already have Claude or Cursor or ChatGPT. Head to exobench.ai to connect it, then paste a prompt like the one I started this analysis with:

Here's a billing query that joins 9 tables. It's fast in dev at low row counts. Run it at realistic scale (10K, 50K merchants with 10x accounts each). Show me the plan. Does the plan change shape as scale grows? If so, tell me why and what to do about it.

That's it. The AI translates that into ExoBench tool calls. You read the output. The query that passed review and passed staging gets a real answer at real scale in ninety seconds.

Find your worst query. Count the tables. Check the limit. Find out what year your vendor's defaults are from.