Providing Context
Schema info, data characteristics, and effective prompts
ExoBench spins up real databases from the information you give it. You don't have to provide every detail upfront. If you give it just a query, the AI will infer a plausible schema on its own. That's often good enough to get useful initial results.
But here's the thing: the AI is guessing. It might add indexes you don't have, omit ones you do have, use different column types, or create a table when you actually have a view. The less you tell it, the more it guesses, and the more likely it is that the benchmark won't reflect your actual production behavior.
The recommended approach:
- Start with what you have. Paste your query and whatever schema info is convenient, even just the relevant table definitions. Let ExoBench run.
- Check if the results make sense. If the AI suggests an index and it doesn't help when you try it on your real database, something in the benchmark didn't match reality.
- Diagnose and refine. See When ExoBench Gets It Wrong for how to figure out what went wrong and give ExoBench better information.
What helps the most (best practice, not required)
Your table definitions. The query planner considers row width, column count, and statistics, not just the columns in your WHERE clause. A 3-column table benchmarks differently than a 30-column table with the same query. If you have your CREATE TABLE statements handy, paste them.
Your indexes. The planner might be using an index you didn't expect, or ignoring one you thought was critical. If you know your indexes, include them. If you don't, ExoBench will run without them and you can add them in follow-up benchmarks.
Your constraints. Foreign keys, NOT NULL, CHECK constraints, UNIQUE constraints. These affect planner decisions. Include them if you have them.
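Putting these three together, a useful schema paste is just your DDL as it actually exists: table definition, indexes, and constraints in one block. A hypothetical sketch (the table and index names here are invented for illustration):

```sql
-- Table definition: include all columns, not just the filtered ones,
-- because row width affects planner costs.
CREATE TABLE orders (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id  bigint NOT NULL REFERENCES customers (id),
    status       smallint NOT NULL,
    total_cents  integer NOT NULL CHECK (total_cents >= 0),
    created_at   timestamptz NOT NULL DEFAULT now()
);

-- Indexes: paste them even if you think they're irrelevant; the planner
-- may be using one you didn't expect.
CREATE INDEX idx_orders_customer ON orders (customer_id, created_at DESC);
```

If you don't have the original migration files, `pg_dump --schema-only` (or your database's equivalent) will produce exactly this.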
Non-obvious structures
If something isn't a plain table, SAY SO. ExoBench creates whatever you tell it to create. If you say "CREATE TABLE," it creates a table. It won't magically know your "table" is actually something else.
Things to call out explicitly:
- Views. "This is actually a view over orders JOIN customers."
- Partitioned tables. "This table is partitioned by created_at monthly."
- Materialized views. "This is a materialized view refreshed hourly."
- Inherited tables. (Postgres table inheritance)
- Generated columns. "This column is computed, not stored."
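Calling these out is easiest by pasting the real DDL rather than describing it in prose. Hypothetical sketches of two of the cases above, with invented names:

```sql
-- A "table" that is actually a view:
CREATE VIEW order_summary AS
SELECT o.id, o.created_at, c.name AS customer_name
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- A table partitioned monthly by created_at (Postgres declarative
-- partitioning); without this, ExoBench would build one flat table
-- and partition pruning would never appear in your plans:
CREATE TABLE events (
    id         bigint NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```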
Your data characteristics
The AI generates random data by default. If your data has specific distributions that matter (e.g., 90% of rows have the same status, or one customer has 500K orders while everyone else has 10), those distributions affect which query plan the optimizer picks. As the data distribution example showed, the same query on the same index can be 71x slower depending on how many rows match your filter. If your initial benchmark results look off, data distribution is usually the reason. See Step 2: Question the Data Generation for how to diagnose and fix this.
Bad vs. Good Prompts
Bad: "Optimize this query." No schema, no context, no constraints. The AI will guess everything. It might still give you useful results, but you're rolling the dice.
Good: "I have this query [paste] running against this schema [paste]. It takes 30 seconds at 2M rows. 90% of items have status=1. I think a covering index might help. Benchmark the current setup vs. adding CREATE INDEX idx_items_cover ON items(created_at DESC, id DESC) WHERE status = 1. Test at 100K, 500K, and 1M rows."
Bad: "Is this query fast?"
Good: "Will this query's plan change between 10K and 1M rows? Run EXPLAIN ANALYZE at 10K, 50K, 100K, 500K, and 1M. I expect it to use the idx_orders_customer index; tell me if it stops using it at any scale."
Bad: "Test my query."
Good: "Benchmark this query but make sure the test data reflects our actual distribution: we have ~30 categories (constant), ~5000 products (slow growth), and orders scale from 10K to 1M. Each order has 2-3 line items on average."
Bad: "Why is my database slow?"
Good: "Here are the 3 slowest queries from our pg_stat_statements. Benchmark each one and tell me which has the worst scaling behavior."
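One way to pull those three queries, assuming the pg_stat_statements extension is installed and enabled (column names are from Postgres 13+; older versions use mean_time and total_time):

```sql
-- Slowest queries by average execution time.
SELECT query,
       calls,
       mean_exec_time,
       total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 3;
```

Paste the query text (and ideally the calls and timing columns) into your prompt so ExoBench knows both the shape of the query and how hot it is.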
The Golden Rule
More context = better benchmarks. At minimum, give ExoBench your query and let it infer the rest. For high-confidence results, also provide your schema (tables, indexes, constraints) and your data characteristics (row counts, distributions, skew). The more you give, the less the AI guesses.