Getting Started

How ExoBench Works

MCP, the optimization loop, and data safety

ExoBench is an MCP-powered SQL benchmarking service. You connect it to your AI tool, describe your SQL performance problem in plain English, and the AI autonomously spins up real databases, generates test data, benchmarks your queries at different scales, and iterates on indexes and rewrites until it converges on the fastest solution.

You don't call ExoBench directly. You describe your problem to your AI assistant and it uses ExoBench behind the scenes.


MCP: How Your AI Tool Talks to ExoBench

MCP (Model Context Protocol) is a protocol that lets AI tools communicate with external servers. Your AI assistant (Claude, ChatGPT, Cursor, Gemini, Windsurf) is the client. ExoBench is the server. When you describe a SQL performance problem, the AI calls ExoBench's MCP tools to spin up databases, generate data, run benchmarks, and return results. You don't need to understand MCP to use ExoBench. You just talk to your AI assistant.
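Under the hood, MCP is JSON-RPC 2.0: the client invokes a server tool with a `tools/call` request. The sketch below shows the general shape of such a request; the tool name `run_benchmark` and its arguments are illustrative assumptions, not ExoBench's actual tool schema.

```python
import json

# MCP tool invocations are JSON-RPC 2.0 "tools/call" requests.
# The tool name and arguments here are hypothetical -- ExoBench's
# real tool schema may differ.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_benchmark",  # hypothetical tool name
        "arguments": {
            "query": "SELECT * FROM orders WHERE status = 'pending'",
            "schema": "CREATE TABLE orders (id INT, status TEXT)",
            "scales": [10_000, 100_000, 500_000],
        },
    },
}

print(json.dumps(request, indent=2))
```

Your AI assistant constructs and sends messages like this for you; you never write them by hand.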


The Optimization Loop

Here's what happens when you say "this query is slow at 500K rows":

  1. You describe the problem. Paste your query, describe your schema, mention your row counts and any data distributions that matter. The more you tell the AI, the less it guesses.

  2. The AI calls ExoBench. Behind the scenes, the AI sends your query and schema to ExoBench's MCP tools.

  3. ExoBench spins up a fresh Postgres instance. Not your database. A fresh, ephemeral instance that exists only for this benchmark.

  4. ExoBench generates synthetic data. At the scale points you specified (e.g., 10K, 100K, 500K rows), matching any data distributions you described (e.g., "90% of rows are status='completed'").

  5. ExoBench runs EXPLAIN ANALYZE. Real execution plans. Real timings. Real row counts. On real Postgres.

  6. The AI reads the results. It identifies the bottleneck: missing index, suboptimal join, bad access path.

  7. The AI calls ExoBench again. This time with a modification: a new index, a rewritten query, a different schema.

  8. ExoBench benchmarks the modified version. Same scales, same distributions. Returns comparison results.

  9. Repeat until convergence. The AI iterates until performance stops improving or it runs out of ideas.

  10. You get the results. The optimized query, the indexes to create, and the EXPLAIN ANALYZE proof at every scale point.

[Diagram: the optimization loop]
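The core of the loop, benchmark, modify, re-benchmark, can be sketched in a few lines. This is a local illustration using SQLite, not ExoBench's implementation (which runs real Postgres and lets the AI choose the modifications):

```python
import sqlite3
import time

# Minimal sketch of the benchmark -> modify -> re-benchmark loop.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
con.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("completed" if i % 10 else "failed",) for i in range(100_000)],
)

query = "SELECT COUNT(*) FROM orders WHERE status = 'failed'"

def plan(q: str) -> str:
    # EXPLAIN QUERY PLAN is SQLite's analogue of Postgres's EXPLAIN.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + q))

def timed(q: str) -> float:
    start = time.perf_counter()
    con.execute(q).fetchone()
    return time.perf_counter() - start

before_plan, before_s = plan(query), timed(query)

# Iteration step: try an index, then re-benchmark the same query.
con.execute("CREATE INDEX idx_orders_status ON orders (status)")
after_plan, after_s = plan(query), timed(query)

print("before:", before_plan)  # e.g. a full-table SCAN
print("after: ", after_plan)   # e.g. SEARCH using idx_orders_status
```

ExoBench runs this loop against real Postgres at multiple scales, with the AI reading each plan and deciding the next modification.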


Multi-Scale Testing

ExoBench tests your query at multiple data volumes in a single request. You can test at 10K, 100K, and 500K rows and see the results side by side.

This is ExoBench's most important capability, and the one you can't easily replicate by hand. It reveals:

  • Whether your optimization holds at scale. An index that helps at 10K rows might be ignored by the planner at 500K rows because the cost model changes.
  • Where the planner switches strategies. Postgres might use a nested loop at 10K rows, switch to a hash join at 100K, and go parallel at 500K. You see all three plans in one benchmark.
  • Whether execution time scales linearly or stays flat. A flat line across scale points means your index is working. A rising line means the planner is falling back to a scan.

Doing this manually requires generating synthetic data at each scale, loading it, running the query, capturing the plan, and comparing. That's minutes to hours of setup per scale point. Nobody does this for every query; most developers never do it for any query. ExoBench does it on every benchmark.
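The manual loop described above, generate data at a scale, run the query, record the time, repeat, looks roughly like this. It's a self-contained sketch using SQLite; ExoBench performs the equivalent against real Postgres in one request:

```python
import sqlite3
import time

# Hand-rolled multi-scale testing: the same query timed at several
# synthetic data volumes. The schema and query are illustrative.
def benchmark_at_scale(rows: int) -> float:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INT)")
    con.executemany(
        "INSERT INTO events (user_id) VALUES (?)",
        ((i % 1000,) for i in range(rows)),
    )
    start = time.perf_counter()
    con.execute("SELECT COUNT(*) FROM events WHERE user_id = 42").fetchone()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

results = {n: benchmark_at_scale(n) for n in (10_000, 100_000, 500_000)}
for rows, secs in results.items():
    print(f"{rows:>7} rows: {secs * 1000:.2f} ms")
```

Plotting the timings against the row counts is what reveals whether execution scales linearly or stays flat.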

Real example (verified on ExoBench):

Rows   | Without Index | With Composite Index | Speedup
10K    | 0.56 ms       | 0.08 ms              | 7x
50K    | 2.50 ms       | 0.10 ms              | 25x
100K   | 4.35 ms       | 0.07 ms              | 62x
250K   | 18.96 ms      | 0.07 ms              | 271x
500K   | 33.47 ms      | 0.08 ms              | 418x

The "without index" curve climbs linearly. At 250K, Postgres spins up parallel workers to compensate. The "with index" line stays flat. Five scale points. One request.


Data Distribution Control

Real databases have skewed data. Maybe 90% of your transactions are status='completed'. Maybe one user has 500K rows while everyone else has 100. These distributions directly affect which query plan Postgres picks.

ExoBench lets you specify these distributions. Tell your AI assistant "90% of rows are status='completed', 9.5% pending, 0.5% failed" and ExoBench generates data matching that skew. The results show how the planner responds to your actual cardinality, not a uniform random distribution that doesn't match production.
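Generating data that matches a described skew is straightforward to sketch. The spec below mirrors the "90% completed, 9.5% pending, 0.5% failed" example; this is a local illustration, not ExoBench's generator:

```python
import collections
import random

# Target distribution, as you'd describe it to your AI assistant.
spec = {"completed": 0.90, "pending": 0.095, "failed": 0.005}

random.seed(0)  # deterministic for the demo
statuses = random.choices(
    population=list(spec), weights=list(spec.values()), k=500_000
)

counts = collections.Counter(statuses)
for status, share in spec.items():
    actual = counts[status] / len(statuses)
    print(f"{status:>9}: target {share:6.1%}, generated {actual:6.1%}")
```

Loading skewed rows like these (instead of uniform random values) is what lets the planner's cost estimates, and therefore its plan choices, match production behavior.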

Real example (verified on ExoBench):

Filter                      | Matching rows | Plan                              | Time
status = 'failed' (~0.5%)   | ~2,500        | Index Scan (index used)           | 3.9 ms
status = 'completed' (~90%) | ~450,000      | Parallel Seq Scan (index ignored) | 280.5 ms

Same query. Same index. 71x performance difference. The planner ignores the index when 90% of rows match because scanning the whole table is cheaper than 450,000 index lookups. A chatbot would say "add an index on status." ExoBench shows you that the index only helps for the minority values.


What ExoBench Does NOT Do

  • Does not connect to your database. ExoBench spins up ephemeral Postgres instances with synthetic data. It never touches your production, staging, or development databases.
  • Does not access your production data. Your actual data never leaves your systems. ExoBench generates synthetic data matching the distributions you describe.
  • Does not apply changes to your database. ExoBench benchmarks optimizations in its ephemeral environment. You decide what to apply to your own systems.
  • Does not guarantee globally optimal results. The AI explores the optimization space and finds better points. Two runs may explore different paths and arrive at different solutions, both of them good.
  • Does not fix infrastructure issues. Network latency, memory pressure, connection pool exhaustion, deployment topology. ExoBench benchmarks query execution, not your infrastructure.
  • Does not replace understanding. ExoBench proves which optimization works and shows you the EXPLAIN ANALYZE plan. You should still understand why it works before deploying it.

Your Data Never Leaves Your Systems

ExoBench has a strong data safety story because of its ephemeral architecture.

What ExoBench sees:

  • The query text you want to optimize
  • The schema you provide (CREATE TABLE statements), or the schema the AI infers from your query
  • Scale point specifications (e.g., "test at 10K, 100K, 500K rows")
  • Distribution specifications (e.g., "90% of rows are status='completed'")

What ExoBench never sees:

  • Your actual production data
  • Your database credentials or connection strings
  • Any real row data from your tables

How benchmarks work: ExoBench creates ephemeral Postgres instances for each benchmark. Data is synthetically generated inside the ephemeral database based on the schema and distribution specs you provide. After the benchmark completes, the ephemeral database is destroyed. No data persists between benchmarks.
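The lifecycle above can be shown in miniature. In this sketch (SQLite standing in for ExoBench's ephemeral Postgres), the database exists only inside the function: it is created fresh, filled with synthetic data, queried, and destroyed, so nothing persists between calls.

```python
import sqlite3

# Ephemeral lifecycle in miniature: a fresh database per benchmark,
# synthetic data only, destroyed on close. Schema and query are illustrative.
def run_ephemeral_benchmark(create_sql: str, rows: int, query: str):
    con = sqlite3.connect(":memory:")  # fresh instance for this benchmark
    con.execute(create_sql)
    con.executemany("INSERT INTO t (v) VALUES (?)", ((i,) for i in range(rows)))
    result = con.execute(query).fetchone()
    con.close()  # instance destroyed; no data persists
    return result

# Two calls share nothing: each starts from an empty database.
first = run_ephemeral_benchmark("CREATE TABLE t (v INT)", 1000, "SELECT COUNT(*) FROM t")
second = run_ephemeral_benchmark("CREATE TABLE t (v INT)", 1000, "SELECT COUNT(*) FROM t")
print(first, second)  # (1000,) (1000,)
```

Because every run starts from an empty instance, there is nothing for one benchmark to leak into the next.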

Unlike tools that require connecting to your actual database, ExoBench never touches your production data. The optimization recommendations are based on real Postgres execution behavior but use zero production data.


Supported Databases

  • PostgreSQL — Full support. Uses generate_series() for fast data generation directly inside Postgres.
  • SQLite — Full support. Data is auto-generated based on column types.
  • MySQL — Coming soon.
  • SQL Server — Coming soon.
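For context on in-database data generation: Postgres emits rows with generate_series(), and SQLite can do the equivalent with a recursive CTE. A minimal sketch, run from Python against SQLite:

```python
import sqlite3

# SQLite analogue of Postgres's generate_series(): a recursive CTE
# that emits n integers, used to populate a table entirely inside
# the database (no row-by-row inserts from the client).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nums (n INT)")
con.execute("""
    WITH RECURSIVE series(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM series WHERE n < 10000
    )
    INSERT INTO nums SELECT n FROM series
""")

count, lo, hi = con.execute("SELECT COUNT(*), MIN(n), MAX(n) FROM nums").fetchone()
print(count, lo, hi)  # 10000 1 10000
```

Generating rows inside the database this way is what makes large scale points fast to set up.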

See Supported Databases for more details.


Get Started

ExoBench plugs into the AI tool you already use. No new application to install. No new interface to learn.

  • 30-second quickstart
  • How ExoBench compares to other approaches
  • Full documentation