JSONB Paths or GIN or Columns?

Columns beat JSONB+GIN at everything except multi-key containment on PostgreSQL 17

May 7, 2026

Summary

For an orders table with ten known nullable properties, normalized columns beat JSONB+GIN at every workload except multi-key containment. At 1M rows, count(*) WHERE units_shipped > 950 runs in 5.72ms normalized (Index Only Scan, 43 buffers) versus 78.0ms GIN (parallel seq scan, 37,196 buffers). ORDER BY top-N drops from 155ms to sub-millisecond with a typed btree expression index. jsonb_path_ops cannot answer >, <, BETWEEN, or ORDER BY. Measured on PostgreSQL 17.8 using ExoBench from 10K to 1M rows.

The Slow Query You Inherited

Here's a table you might have had the misfortune of seeing:

CREATE TABLE orders (
  id           varchar PRIMARY KEY,
  purchaser_id bigint NOT NULL,
  created_at   timestamptz NOT NULL,
  total_cost   int NOT NULL,
  details      jsonb
);

The order-level facts live as proper columns. The variable per-order stuff lives in details. The team needed to ship something quickly, the schema for that variable stuff was "in flux," and JSONB was right there. Someone added a GIN index on details and called it done.

details has ten known fields. They are all nullable. Some are stable (country, currency), some are categorical (region, status, channel), and some have meanings that nobody quite trusts yet. units_shipped is one of those: does it count returns? do partial returns count? what about bundles, one item or N? Six product managers ago, someone shipped it as "best-effort" and it stuck. The instinct, when a field's definition is in flux, is to keep it loose. JSONB is loose. So in it went.

Two example rows so you can picture what's actually in there:

id           | ord_42
purchaser_id | 91827
created_at   | 2026-04-12 14:32:08+00
total_cost   | 1247
details      | {"region": "us-west", "status": "fulfilled", "channel": "web",
             |  "customer_tier": "gold", "units_shipped": 847,
             |  "payment_method": "card", "country": "US", "currency": "USD",
             |  "device": "desktop", "item_count": 3}

id           | ord_91
purchaser_id | 91827
created_at   | 2026-04-13 09:11:44+00
total_cost   | 312
details      | {"region": "eu-central", "status": "pending",
             |  "units_shipped": 42, "payment_method": "paypal",
             |  "country": "DE", "currency": "EUR"}

Some details fields filled. Some null and stripped. Normal JSONB stuff.

Six months later the queries are slow. Not all of them. Just the ones that filter on units_shipped > 950, or sort by units_shipped to get the top 100, or aggregate over a range. The eq-lookup queries are fine. The range queries are minute-long parallel seq scans. The DBA shrugs. The original author shrugs. You're staring at a 53-line EXPLAIN ANALYZE output and one of the lines says Parallel Seq Scan on orders (cost=0.00..42393.82 rows=138880 width=0) and you don't know what to do with it.

Welcome to life on the semi-structured spectrum.

How I Measured This

I didn't write any of this benchmark code by hand. I described the problem to my AI assistant in plain English, the assistant called ExoBench (an MCP server) on my behalf, and ExoBench spun up an ephemeral PostgreSQL instance, generated synthetic data at the scales I asked for, and ran EXPLAIN ANALYZE. I watched. The numbers in this post are real plans against real data. I just didn't have to write the scaffolding.

ExoBench does not connect to your database. Each scale-point benchmark is a fresh ephemeral Postgres instance with synthetic data matching whatever schema and distribution you specified. Real plans. Real timings. Real buffer counts. Your production data never leaves your systems.

It runs at multiple scale points in a single request. 10K, 100K, 1M (one million) rows in one shot. This is how you catch the moment the planner switches strategies, which is invisible in any single-scale benchmark.

The AI reads the results, forms a hypothesis, calls ExoBench again. Baseline → hypothesis → test → validate → iterate. You watch.

I'll quote a few of my actual prompts inline as we go, so you can see the shape of the asks that drove each piece of the analysis. I'm not pasting every prompt. The point is that I never wrote a CREATE TABLE for this article, never ran a VACUUM ANALYZE, never hand-crafted a single benchmark query. The AI did. ExoBench measured. I read the output.

What I Tried

Four attempts, in the order anyone reasonable would try them. Each is benchmarked against the same three workloads at the same scales:

Eq lookup: WHERE region = 'us-west' (~18K matching orders at 1M rows, the normal lookup-by-field query)
Range aggregate: count(*) WHERE units_shipped > 950 (~44K matches, the killer)
ORDER BY top-N: ORDER BY units_shipped DESC LIMIT 100 (give me the top 100 highest-value orders)

Attempt 1: The Default GIN Index

I kicked it off with the originating ask:

I want to actually see what's better, using jsonb fields with GIN indexes or extracting out the object into a table. Both jsonb fields have 10 properties, and all of them can be null. Use ExoBench to test at scale and figure out what performs best and what the delta is.

That single prompt is the one that produced Attempt 1 and Attempt 2 below. The AI ran the canonical GIN approach first, but the open-ended phrasing ("test at scale and figure out what performs best") gave it room to try the obvious next thing too. So it kept going. By the time it came back to me, it had both the GIN results and a first cut at a btree expression index on the JSONB key. I'm presenting them as two Attempts here because they fail for different reasons, but they came out of the same ask.

This is what GIN looked like. The first thing anyone does:

CREATE INDEX idx_orders_details_gin ON orders USING GIN (details jsonb_path_ops);

This is the recommended default. It's even what the Postgres docs lead with. For containment queries it's genuinely fast.

Rows	GIN
10K	0.13 ms
100K	0.76 ms
1M	7.9 ms

Eq lookup at 1M, the plan looks healthy. The query:

SELECT id FROM orders WHERE details @> '{"region":"us-west"}';

Here's the raw EXPLAIN (ANALYZE, BUFFERS) output ExoBench handed back:

Bitmap Heap Scan on orders  (cost=147.75..33140.51 rows=20204 width=10) (actual time=1.018..6.995 rows=17999 loops=1)
  Recheck Cond: (details @> '{"region": "us-west"}'::jsonb)
  Heap Blocks: exact=819
  Buffers: shared hit=826
  ->  Bitmap Index Scan on idx_orders_details_gin  (cost=0.00..142.70 rows=20204 width=0) (actual time=0.940..0.940 rows=17999 loops=1)
        Index Cond: (details @> '{"region": "us-west"}'::jsonb)
        Buffers: shared hit=7
Planning Time: 0.103 ms
Execution Time: 7.893 ms

That's the actual artifact. For the rest of this post I'll use a simplified, annotated version of the same plan that strips the cost estimates and tags each operator with an emoji verdict:

😇 Bitmap Heap Scan on orders → 17,999 rows in 7.0ms
   |  Recheck Cond: details @> '{"region":"us-west"}'
   |  📚 819 heap blocks
   +- 😇 Bitmap Index Scan on idx_orders_details_gin (0.9ms)
         🎯 Index Cond: details @> '{"region":"us-west"}'

That's the workload GIN was built for. If your access pattern is exclusively @> containment, you can stop reading. You're done.

A note on the markers above. The 😇 is the emotive-plan style I'm using throughout this post. Postgres EXPLAIN ANALYZE is dense and operator-heavy, so I'm tagging each operator with a one-emoji verdict: 😇 good, 😢 missed optimization, 🤮 avoidable cost, 😱 catastrophic. 🔴 highlights the line that owns the problem, 💡 is the concrete fix. The original operator names and numbers stay on every line. Skim the emojis. Read the operators only when you want to verify.

But the moment a different workload shows up, GIN reveals its limits. The range aggregate query:

SELECT count(*) FROM orders
WHERE (details->>'units_shipped')::int > 950;

🤝 Finalize Aggregate
   +- 🤝 Gather (Workers Planned: 2)
🔴    +- **Parallel Seq Scan on orders → 14,910 rows/loop in 69.7ms**
🔴       **Filter: ((details ->> 'units_shipped')::int) > 950**
🔴       **Rows Removed by Filter: 318,423 per worker**
🔴       **📚 37,196 buffers, entire table is being read**
🔴       **💡 FIX: typed btree expression index, see Attempt 3**

The plan is not subtle. Postgres scanned the entire 1M-row orders table three times in parallel because what the GIN index had was the wrong shape for the question being asked. jsonb_path_ops is a containment operator class. It does not support >, <, BETWEEN, or ORDER BY. The planner ignored the GIN index and fell back to a parallel seq scan.

ORDER BY is the same story:

SELECT id, (details->>'units_shipped')::int AS amt
FROM orders
ORDER BY (details->>'units_shipped')::int DESC LIMIT 100;

~155ms, parallel seq scan, top-N heapsort. Same reason. GIN can't sort.

Okay, GIN is the wrong shape for two of three workloads. Try the next obvious thing.

Attempt 2: The Path Index

Same prompt, same run. After GIN flopped on range and ORDER BY, the AI tried the next obvious thing on its own: a btree expression index on the JSONB key. The fix the internet usually suggests:

CREATE INDEX ON orders ((details->>'units_shipped'));

A btree expression index on the JSONB key. Looks fine. It's a btree, it's an expression index, it's even on the right field. People do this all the time.

Rows	Path index, untyped (range agg)
10K	~1.9 ms (seq scan)
100K	~12 ms (seq scan)
1M	78.0 ms (seq scan)

Wait, what.

Here's what's happening. The index stores the value as text, because ->> returns text. But the query has to write the cast:

SELECT count(*) FROM orders
WHERE (details->>'units_shipped')::int > 950;

The cast is required because '1000' < '950' lexicographically and you don't want wrong answers. The moment that cast appears in the predicate, the planner can no longer match it to the text-typed index.

🤝 Finalize Aggregate
   +- 🤝 Gather (Workers Planned: 2)
🔴    +- **Parallel Seq Scan on orders → ignores idx_units_text**
🔴       **Filter: ((details ->> 'units_shipped')::int) > 950**
🔴       **📚 37,196 buffers, same as no index at all**
🔴       **💡 FIX: rebuild the index with the matching type cast**

The index exists. The index is irrelevant. Same speed as Attempt 1.

You could "match" the index by writing (details->>'units_shipped') > '950' without the cast, getting a fast index scan, and getting the wrong answer (Postgres compares strings, '9' > '1000'). Don't.

The trap is invisible without measurement. The EXPLAIN plan shows a plain Parallel Seq Scan with no mention of why the index didn't help. Without a side-by-side comparison against the typed version, you'd never know what the fix was.

Note that for text fields like region, the untyped path index works fine. Eq lookup on details->>'region' = 'us-west' runs at 3.6ms because there's no type mismatch. The index returns text, the query compares text, planner is happy. Untyped is only a footgun for numeric fields.

What the AI handed back to me at this point was a table like this:

Workload	A: GIN	B: btree expr	C: normalized
eq lookup	~28 ms	~20 ms	~18 ms
range agg (`> 950`)	~72 ms	n/a	~6 ms
ORDER BY top-100	~132 ms	n/a	~78 ms

Notice the n/a cells. The AI had run the untyped path index (column B) on the eq lookup and gotten a number. But for the range and ORDER BY workloads, the untyped index couldn't be used by the planner, so the rows came back as seq scans against the underlying jsonb table. Rather than report those as B results, the AI marked them as not applicable. The implicit question: what does B look like when the index actually works?

That's the question that opened the next round.

Attempt 3: The Path Index (Typed Correctly)

I read the table, saw the n/a cells, and gave the AI a one-line follow-up:

*Yes! Rerun [the path-index version] with proper typed expression!*

Same idea, but cast the index expression to match the query:

CREATE INDEX ON orders (((details->>'units_shipped')::int));
CREATE INDEX ON orders (((details->>'units_shipped')::int) DESC NULLS LAST);

Now the planner can use it.

Rows	Range agg	ORDER BY
10K	~0.6 ms	~0.1 ms
100K	~4 ms	~0.1 ms
1M	35.2 ms	0.10 ms

The range aggregate, same query as before:

SELECT count(*) FROM orders
WHERE (details->>'units_shipped')::int > 950;

🤝 Aggregate
   +- 😇 Bitmap Heap Scan on orders → 44,532 rows in 32.4ms
      |  Recheck Cond: ((details ->> 'units_shipped')::int) > 950
      |  📚 26,353 heap blocks (still has to recheck against jsonb)
      +- 😇 Bitmap Index Scan on idx_units_int (2.8ms)
            🎯 Index Cond: ((details ->> 'units_shipped')::int) > 950

Index used. Bitmap heap scan still has to recheck against the heap, because the index expression isn't directly part of the row. Think of the typed expression index as a sticky note on the front of a folder saying "contains big number." The note speeds up finding candidate folders. You still have to open each one to confirm the number is actually big, because notes lie and folders get edited. Still 2x faster than GIN.

The ORDER BY win is dramatic. With a DESC NULLS LAST typed expression index, the planner walks the btree in reverse order, takes the first 100 entries, stops. The query:

SELECT id, ((details->>'units_shipped')::int) AS amt
FROM orders
ORDER BY ((details->>'units_shipped')::int) DESC NULLS LAST LIMIT 100;

🤝 Limit (100 rows in 0.10ms)
   +- 😎 Index Scan using idx_units_int_desc on orders
         📚 102 buffers total
         (walks the index in DESC order, takes the first 100, stops)

A 1500x speedup over GIN, from one index. But notice what made it work:

Index built as ((details->>'units_shipped')::int) DESC NULLS LAST
Query written as ORDER BY ((details->>'units_shipped')::int) DESC NULLS LAST LIMIT 100
No WHERE x IS NOT NULL predicate on a different expression

If the WHERE clause uses details->>'units_shipped' IS NOT NULL (text-typed extraction) and the index is on ((details->>'units_shipped')::int) (int-typed extraction), the planner can't prove they describe the same set of rows. Falls back to seq scan. The index sits unused. The query takes 150ms.

This is not a Postgres bug. It is the planner being correct about something the human (i.e. ME!) got subtly wrong. Without ExoBench, you find this by running the query, getting a slow result, staring at EXPLAIN, and eventually realizing the predicate and the index are not actually the same Venn diagram.

The catch: now you're maintaining N typed expression indexes, one per filterable JSONB key, each with the right cast baked in. Same indexing cost as columns, plus the JSONB row-width overhead. You've reinvented columns with extra steps.

There's also one structural ceiling Attempt 3 cannot break. Look at the buffer counts on the range aggregate: 26,353 heap blocks at 1M rows. That number cannot go to zero against a JSONB column. The planner has to recheck the JSON extraction against the actual row (remember the sticky note: every match opens the folder), which means a Bitmap Heap Scan with thousands of page fetches. An expression index gets you most of the way. It cannot get you to zero heap fetches.

That last 6x gap to "real" performance is what Attempt 4 closes.

Attempt 4: The Hybrid (Columns + Extra JSONB)

If you really want the best performance from traditional databases, you need to embrace the suck and promote your fields into actual columns. Yes, what your DBA has been telling you all along is actually true.

The AI had already been pushing me toward this design, in the recommendation paragraphs at the end of earlier rounds:

Personally I'd go normalized and add a single extra jsonb column for the rare unknown-shape case. Best of both, no GIN tax.

Recommendation but no numbers. I wanted numbers. So I pushed back:

*Make it a table with the known fields as columns and the DYNAMIC parts out into JSON. If you do not have performance numbers for this RUN ExoBench and FIND OUT what they are!!!*

The AI then promoted the ten known fields out of details and into real columns keeping one extra jsonb column for the genuinely dynamic stuff that's actually unknown:

CREATE TABLE orders (
  id              varchar PRIMARY KEY,
  purchaser_id    bigint NOT NULL,
  created_at      timestamptz NOT NULL,
  total_cost      int NOT NULL,
  -- promoted from details:
  region          varchar,
  status          varchar,
  channel         varchar,
  customer_tier   varchar,
  units_shipped    int,
  payment_method  varchar,
  country         varchar,
  currency        varchar,
  device          varchar,
  item_count      int,
  -- the actually-dynamic part:
  extra           jsonb
);
CREATE INDEX ON orders (region);
CREATE INDEX ON orders (units_shipped);
CREATE INDEX ON orders (units_shipped DESC NULLS LAST);

Two example rows (showing the columns most relevant to the queries below; seven other columns omitted for width):

id	purchaser_id	created_at	total_cost	region	status	units_shipped	extra
ord_42	91827	2026-04-12 14:32:08+00	1247	us-west	fulfilled	847	`{"gift_msg": "happy bday!"}`
ord_91	91827	2026-04-13 09:11:44+00	312	eu-central	pending	42	`{"affiliate_id": "xyz123"}`

Known fields are columns. The unknown stuff (gift messages, affiliate IDs, plugin-specific extension data, whatever the schema doesn't yet know about) lives in extra. The known queries hit columns. The exploratory queries hit extra. And yes, the promoted columns stay nullable (Postgres default), so rows that didn't have a value before don't suddenly need one. The 10% null rate from the JSONB version carries over directly.

Rows	Eq lookup	Range agg	ORDER BY
10K	0.06 ms	0.09 ms	0.08 ms
100K	0.36 ms	0.67 ms	0.08 ms
1M	3.62 ms	5.72 ms	0.09 ms

Range aggregate at 1M, the moment of payoff. The query collapses, no ->> extraction, no cast, just a column:

SELECT count(*) FROM orders WHERE units_shipped > 950;

😎 Aggregate (5.72ms total)
   +- 😎 Index Only Scan using idx_units_desc → 44,201 rows in 3.2ms
         🎯 Index Cond: units_shipped > 950
         **Heap Fetches: 0**
         📚 43 buffers total

Forty-three buffers. Not 26,353. Not 37,196. Forty-three.

The query never touches the heap. The btree on the int column is enough on its own to count the matching rows, because Postgres tracks row visibility in a separate map and the index has units_shipped directly. This is the Index Only Scan win, and it is structurally impossible against JSONB no matter how cleverly you index the path. Even with a typed expression index, Postgres still has to recheck the JSON extraction against the heap.

The eq lookup runs in 3.62ms, comparable to the typed path index but with cleaner mechanics. The extra jsonb column does add some row-width overhead, 807 buffers vs the ~570 a pure-normalized table without extra would use, but it's a small price for keeping the JSONB escape hatch.

ORDER BY top-N is sub-millisecond at every scale. The planner walks the btree, takes 100, stops. Nothing in the row width matters because nothing in the row is being read.

This is the answer. Promote what you know. Keep one column for what you don't.

Same Query, Two Schemas, Side by Side

The same question, paired with the schema-determined plan it produces. Annotations baked into the SQL so the eye binds query and plan in one scan.

Hybrid (Attempt 4):

SELECT count(*)
FROM orders                  /* 🤝 hybrid table */
WHERE units_shipped > 950;   /* 😎 43 buffers */

GIN (Attempt 1):

SELECT count(*)
FROM orders                                     /* 🤝 jsonb table */
WHERE (details->>'units_shipped')::int > 950;   /* 🔴 37,196 buffers */

Same question. Different table shape. ~865x difference in buffer reads. This isn't an index problem, it's a schema problem.

What This Doesn't Cover

Synthetic data is not your data. ExoBench generated 1M rows with 10% nulls per field, 50 distinct region values, and uniform random units_shipped between 0 and 1000. Your production data has different cardinality, different skew, and different correlations. The benchmark is as realistic as the spec you give it. If you tell ExoBench "90% of orders are status=fulfilled" it will generate that. If you don't tell it, it will assume uniform.

The AI infers things and can be wrong. I gave the AI my schema and it inferred reasonable index recommendations. If your real schema has triggers, generated columns, partial indexes, FK cascades, or partitioning, the AI may not account for them. Always verify: ask "show me the schema you used."

ExoBench optimizes what you give it. If your query has a fundamental design problem (a SELECT * FROM orders with no WHERE, an N+1 pattern hidden in application code), ExoBench will give you a fast version of the wrong query. It is not a redesign tool.

Reads only. This post measured read performance. Write performance is its own story. GIN indexes are notoriously expensive to maintain on writes due to the pending list flush behavior. Adding N typed expression indexes (Attempt 3) compounds write cost. ExoBench can benchmark writes. This post didn't.

Probabilistic, not optimal. Two ExoBench sessions on the same query may explore different optimization paths. Both may be correct. Neither is guaranteed globally optimal. Use it as a measurement tool, not as a proof system.

Ephemeral database limits. Max 5 scale points per benchmark request, 3M rows per scale point.

The recommendation in this post applies to a specific shape. Ten known properties, all nullable, mostly queried by individual fields. If your shape is "100 fields, 90 of which are null in any given row, schema changes monthly, queries are unknown until users make them up," JSONB earns its keep and Attempt 1 might be the right answer for you. The technical answer changes with the shape of the problem.

Baseline Plus Four Attempts, Side by Side

After the Attempts were done I went back and asked for the floor. What does this table look like with no indexes at all? The frame for everything else:

I want a "0: Baseline - No Indexes." It doesn't count as an Attempt but it should be there to frame everything else. Run that test now.

ExoBench measured. The whole story in one grid. 1 million rows, warm cache, PostgreSQL 17.8.

Attempt	Eq lookup	Range agg `> 950`	ORDER BY top-100
0: Baseline, no indexes	63.1 ms (seq scan)	76.8 ms (seq scan)	154.6 ms (seq scan)
1: GIN on the JSONB column	7.9 ms	78.0 ms (seq scan)	~155 ms (seq scan)
2: Path index, untyped	3.6 ms	78.0 ms (seq scan)	seq scan
3: Path index, typed	3.6 ms	35.2 ms	0.10 ms
4: Hybrid (columns + extra jsonb)	3.6 ms	5.72 ms	sub-ms

Bold is "the index doesn't help (or there isn't one), planner falls back to scanning the whole table."

Read the table top-down and the story tells itself. With nothing at all, every query scans the whole table. GIN buys you the eq lookup, ~9x faster than baseline, and does literally nothing for the other two. The untyped path index doesn't even buy you that on the numeric workloads. A typed path index closes most of the gap but never gets to zero heap fetches against jsonb. The hybrid wins on every workload that matters and only pays a small tax on eq lookups, in exchange for keeping a real escape hatch for genuinely unstructured data.

So What Do You Actually Do

If your fields are known and your queries filter or sort on them, go to Attempt 4. Promote the known fields to columns, keep one extra jsonb column for the genuinely unknown shape. Index Only Scans on the known fields, JSONB flexibility for the rest, no GIN tax, no per-field expression-index zoo.

If you already have JSONB+GIN in production and can't migrate, go to Attempt 3. Build typed btree expression indexes on the fields you actually filter or sort on. Match the type cast in the index to the type cast in the query. Do not use untyped text indexes for numeric fields. Verify the planner is using the index with EXPLAIN ANALYZE. If you see "Parallel Seq Scan" on a query that should be using an index, the predicate and the index are in different namespaces.

Try It On Your Own Query

Find the worst query in your system. Postgres has a built-in tool for this:

SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

The top result is your candidate. Sort by total_exec_time because that's calls × cost. The query that runs every second and takes 50ms beats the query that runs once a day and takes 60 seconds.

Then point ExoBench at it. You already have Claude or Cursor or ChatGPT. Head to exobench.ai to connect it, then paste a prompt like the one I started this analysis with:

Here's a slow query and the schema [insert schema here]. Run it at 10K, 100K, and 1M rows. Show me the plan. Then try a few index variations and tell me which one wins.

That's it. The AI translates that into ExoBench tool calls. You read the output.

Find your worst query. Point ExoBench at it. See what happens.