Your AI Doesn't Need More Tokens. It Needs Context.
Schema Context Is Eating Your Token Budget
Some of the largest costs in AI analytics start before the model answers a single question. To query enterprise data directly, LLMs first need the database explained to them. Every table, column, relationship, and schema detail becomes part of the prompt, adding token cost before the actual analysis begins. Even the labs building these models flag the cost:
"Every line is loaded into context on every request, so each one should be worth its cost." - Anthropic, the company behind Claude
This is the cost Mosaic is built to attack. Depending on the deployment, Mosaic reduces this cost sharply or removes it entirely.
With text-to-SQL, the model receives the raw database schema and uses that context to generate a query. This means table names, columns, data types, and foreign key relationships must be loaded into the model’s context on every request.
In a whitepaper benchmarking AI token cost and accuracy, Strategy's team ran 17 analytics questions against a real-world, 28-table insurance data model to measure how many tokens that schema context requires. The result was roughly 4,428 tokens per query before the model did any analysis. For a production schema of 100 to 500+ tables, schema context alone can grow to 100,000+ tokens per query. The cost builds quietly as schemas get larger and more users begin asking questions.
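To get a feel for how that overhead scales, here is a back-of-the-envelope sketch. It assumes schema tokens grow linearly with table count at the benchmark's average (4,428 tokens across 28 tables); that linear model and the function names are illustrative assumptions, not part of the whitepaper's methodology.

```python
# Figures taken from the benchmark described above.
BENCHMARK_TABLES = 28
BENCHMARK_SCHEMA_TOKENS = 4428


def schema_tokens_for(tables: int) -> float:
    """Estimate schema-context tokens per query, assuming linear growth per table."""
    per_table = BENCHMARK_SCHEMA_TOKENS / BENCHMARK_TABLES  # about 158 tokens per table
    return per_table * tables


def daily_schema_tokens(tables: int, queries_per_day: int) -> float:
    """Schema-context tokens spent per day, before any analysis happens."""
    return schema_tokens_for(tables) * queries_per_day


if __name__ == "__main__":
    for tables in (28, 100, 500):
        print(f"{tables:>4} tables: ~{round(schema_tokens_for(tables)):,} schema tokens per query")
```

Under this assumption a 100-table schema costs about 15,800 tokens per query and a 500-table schema about 79,000. Schemas beyond 500 tables, or tables wider than the benchmark's, would reach the 100,000+ range, and every additional user multiplies the total.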

How Mosaic Reduces Token Usage
Mosaic sits between the AI and the database, providing the business context that raw schema alone can't. The cost savings depend on which deployment path you choose.
With Mosaic MCP, the LLM still writes the SQL, but it works from compressed business context instead of your raw schema. Rather than receiving the full schema on every query, the model gets only the definitions relevant to the question being asked. In the benchmark, schema context dropped from 4,428 to 2,073 tokens per query, a 53% reduction. SQL token count dropped 24%. Together, estimated per-query cost came down 37%.
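The idea of sending only the relevant definitions can be illustrated with a toy sketch. This is not Mosaic's implementation or API: the glossary, the substring matching, and the crude token counter are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import re

# Hypothetical business definitions; terms and wording are illustrative only.
DEFINITIONS = {
    "loss ratio": "Loss ratio = total paid claims / total earned premium.",
    "earned premium": "Earned premium = the portion of written premium for coverage already provided.",
    "retention rate": "Retention rate = renewed policies / policies up for renewal.",
}


def relevant_definitions(question: str, definitions: dict) -> dict:
    """Keep only the definitions whose term appears in the question (naive substring match)."""
    q = question.lower()
    return {term: text for term, text in definitions.items() if term in q}


def approx_tokens(text: str) -> int:
    """Very rough token estimate: counts words and punctuation marks, not real model tokens."""
    return len(re.findall(r"\w+|[^\w\s]", text))


question = "What was the loss ratio by region last year?"
selected = relevant_definitions(question, DEFINITIONS)

full_context = " ".join(DEFINITIONS.values())
trimmed_context = " ".join(selected.values())

print(list(selected))
print(approx_tokens(trimmed_context), "of", approx_tokens(full_context), "approximate tokens sent")
```

A real semantic layer would resolve terms far more robustly than substring matching, but the principle is the same: the prompt carries the definitions the question needs rather than everything the database contains.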
Strategy AI Agents take this further by removing the LLM from query generation entirely. The LLM handles the natural language question while Mosaic does the rest: it looks up the relevant business definitions, generates the SQL, executes the query, and returns the result. Because no schema context reaches the LLM at all, token usage falls by roughly 98% against the PostgreSQL baseline.
Context Makes AI More Accurate
The business context that compresses token usage also prevents a more dangerous failure: a number that looks correct but is not. When direct text-to-SQL has to infer join logic from raw schema alone, it can choose a path that looks structurally valid but applies the wrong business logic. The database still returns a number, and nothing flags that the result is wrong.
Across the benchmark's 17 questions, direct text-to-SQL against PostgreSQL answered two incorrectly, while Mosaic answered all 17 correctly: 100% versus 88.2%. The difference comes from architecture. Mosaic defines canonical join paths, approved relationships, and fan-out protections once in the semantic model, then applies them consistently to every query. That makes accuracy a property of the architecture, not something you have to prompt your way toward on every query.
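A fan-out error is easy to reproduce. The sketch below uses an in-memory SQLite database with two hypothetical insurance tables (invented for illustration, not taken from the benchmark's data model) to show how a structurally valid join can silently inflate a total, and how pre-aggregating the many-side avoids it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policies (policy_id INTEGER PRIMARY KEY, premium REAL);
CREATE TABLE claims (
    claim_id  INTEGER PRIMARY KEY,
    policy_id INTEGER REFERENCES policies(policy_id),
    amount    REAL
);
INSERT INTO policies VALUES (1, 1000.0), (2, 500.0);
-- Policy 1 has two claims, so joining to claims repeats its premium row.
INSERT INTO claims VALUES (10, 1, 200.0), (11, 1, 300.0), (12, 2, 50.0);
""")

# Naive join: valid SQL, but the premium of policy 1 is counted once per claim.
naive_premium = conn.execute("""
    SELECT SUM(p.premium)
    FROM policies p
    JOIN claims c ON c.policy_id = p.policy_id
""").fetchone()[0]

# Fan-out-safe version: aggregate claims per policy first, then join one row per policy.
safe_premium, safe_claims = conn.execute("""
    SELECT SUM(p.premium), SUM(COALESCE(c.total_claims, 0))
    FROM policies p
    LEFT JOIN (
        SELECT policy_id, SUM(amount) AS total_claims
        FROM claims
        GROUP BY policy_id
    ) c ON c.policy_id = p.policy_id
""").fetchone()

print("naive premium total:", naive_premium)  # 2500.0, overstated by 1000
print("safe premium total: ", safe_premium)   # 1500.0
print("safe claims total:  ", safe_claims)    # 550.0
```

Both queries run without error, which is the point: nothing in the database signals that the first total is wrong. Encoding the safe join path once, rather than hoping the model picks it every time, is the kind of protection the semantic model is described as providing.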
Pilots Look Cheap, Production Isn't
AI analytics pilots are often misleading for exactly the reasons this benchmark illustrates. A pilot starts with a small schema and manageable questions, so direct querying looks both accurate and cheap. Neither advantage survives production, where schemas are larger, questions span more tables and time periods, and every query carries the full schema tax. The failures are quiet, with no obvious sign that anything went wrong: just a number that looks right but isn't, and a token bill that climbs with every table and every user you add.
Mosaic addresses both cost and accuracy before the query runs. The business logic is already defined and the join paths are already encoded, so the model never has to receive the schema or reconstruct the rules on its own. For organizations moving AI analytics into production, the real question is whether the architecture can scale without token costs rising every time the schema grows or another user starts asking questions.
If your team is evaluating AI analytics for production, we'd welcome the chance to show you how Mosaic performs against your own schema. Join the world's largest deployed semantic layer and give AI the business context it needs to deliver trusted answers at scale.
For the full methodology, detailed results, and cost analysis discussed in this blog, read Strategy's AI Token Cost and Accuracy Benchmark.
Frequently Asked Questions
Q: What was Strategy's AI token cost and accuracy benchmark? What did it reveal?
A: Strategy ran 17 analytics questions against a real-world, 28-table insurance data model to measure how AI analytics performs with and without a semantic layer. The benchmark found that direct text-to-SQL querying required roughly 4,428 tokens of schema context per query and answered 15 of 17 questions correctly (88.2%). Mosaic reduced schema context by 53% and answered all 17 questions correctly (100%).
Q: Why do LLMs need so many tokens to query a database directly?
A: To generate SQL against enterprise data, an LLM first needs to understand the database structure, including table names, columns, data types, and foreign key relationships. All of that schema detail has to be loaded into the model's context on every single query, adding token cost before any actual analysis happens. As schemas grow to 100-500+ tables, this overhead can climb to 100,000+ tokens per query.
Q: How does a semantic layer reduce AI token usage?
A: A semantic layer like Mosaic predefines business logic, join paths, and approved relationships once, so the model doesn't have to receive or reconstruct the schema on every request. With Mosaic MCP, the LLM gets only the business definitions relevant to the question, cutting schema context by 53% in the benchmark. With Strategy AI Agents, the LLM is removed from query generation entirely, cutting token usage by roughly 98% compared to the PostgreSQL baseline.
Q: Why are AI analytics pilots misleading about cost and accuracy?
A: Pilots typically run on small schemas with manageable questions, which makes direct querying look both accurate and cheap. Neither holds up in production, where schemas are larger, questions span more tables and time periods, and every query carries the full schema tax. The failures are quiet. There's no error message, just a number that looks right but isn't and a token bill that climbs with every table and every user added.
Q: What's the difference between Mosaic MCP and Strategy AI Agents for reducing token costs?
A: With Mosaic MCP, the LLM still writes the SQL, but it works from compressed business context instead of the raw schema, reducing per-query cost by about 37% in the benchmark. Strategy AI Agents go further by removing the LLM from query generation altogether. Mosaic looks up the relevant business definitions, generates the SQL, executes the query, and returns the result with no schema context reaching the LLM at all. That architecture is what drives the roughly 98% token reduction.
See how Mosaic performs against your own schema
Explore the complete methodology and results comparing Strategy Mosaic with direct text-to-SQL across 17 analytics questions, including query accuracy, failure modes, and token cost reduction.
