We analyzed every query across our customer base. Here's what we found.
Nearly half of all queries in production never touch Snowflake or Databricks. Here's what that means and what it costs companies that haven't figured this out yet.
Most ROI claims from data tooling are theoretical. Modeled numbers, analyst estimates, back-of-envelope math dressed up in a case study. We wanted to know what was actually happening across our customer base, so we looked.
We pulled query-level data from every cloud customer we have. Not a sample. Not a survey. Every query, across the entire fleet.
What we found was striking enough that we think every data leader running Snowflake or Databricks workloads should understand it.
The number: 48.9%
Across our entire customer base, Mosaic's in-memory cache deflects 48.9% of all queries. That means nearly half of all queries that would otherwise have hit Snowflake or Databricks, consuming compute and racking up costs, are instead served directly from cache.
They never touch the warehouse. The result arrives instantly. The bill doesn't move.
48.9%
fleet-wide query deflection rate
Across all cloud customers — real query data, not modeled estimates
This isn't a best-case scenario or a cherry-picked customer. It's the average across all of our cloud customers, from small analytics teams to large enterprise data organizations.
Why does this happen?
Most analytics workloads quietly split into two types of queries:
- Fresh queries — complex computations, new date ranges, ad hoc exploration, genuine “what if?” work. These need the warehouse.
- Repeat queries — the same metric, the same dashboard tile, the same KPI that 10 different people have loaded today. These don’t.
Think about the “Revenue by Region” tile on your executive dashboard. Finance opens it before close. Sales opens it before pipeline reviews. Leadership opens it before the board deck. It’s the same computation, on the same grain of data, over and over again.
Without a semantic layer, every one of those repeat queries hits the warehouse anyway. Same computation. Same cost. Every time.
Mosaic sits between your BI tools, AI applications, productivity tools, and your warehouse. When a query comes in, it checks the cache first. If the result is already there and still fresh, it returns immediately: no warehouse call, no compute charge. When the data genuinely needs to be fresh, it goes to Snowflake or Databricks and does so efficiently.
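The cache-first routing described above can be sketched in a few lines. This is an illustrative sketch only, with hypothetical names; Mosaic's actual implementation is not public, and a real router would also handle invalidation, normalization of equivalent queries, and concurrency.

```python
import time

# Hypothetical sketch of cache-first query routing. A result is served from
# cache while it is inside its freshness window; otherwise the query falls
# through to the warehouse and the fresh result is cached for later callers.

class QueryRouter:
    def __init__(self, run_on_warehouse, ttl_seconds=300):
        self.run_on_warehouse = run_on_warehouse  # callable: sql -> result
        self.ttl = ttl_seconds
        self.cache = {}          # sql -> (result, cached_at)
        self.deflected = 0       # queries served from cache
        self.executed = 0        # queries sent to the warehouse

    def query(self, sql):
        entry = self.cache.get(sql)
        if entry is not None:
            result, cached_at = entry
            if time.time() - cached_at < self.ttl:
                self.deflected += 1          # no warehouse call, no compute charge
                return result
        result = self.run_on_warehouse(sql)  # genuinely fresh work
        self.cache[sql] = (result, time.time())
        self.executed += 1
        return result

# Ten users loading the same dashboard tile: one warehouse call, nine deflections.
router = QueryRouter(run_on_warehouse=lambda sql: "revenue-by-region-result")
for _ in range(10):
    router.query("SELECT region, SUM(revenue) FROM sales GROUP BY region")
print(router.executed, router.deflected)  # 1 9
```

The "Revenue by Region" scenario above maps directly onto this: the first viewer pays for the computation, and everyone else that morning gets the cached result.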
The result is a data stack where Snowflake and Databricks do what they're actually great at: heavy computation on fresh, complex data. Not serving the same cached KPI for the hundredth time that day.
What 48.9% is worth
Query deflection has a direct dollar value. We segmented our customers by their Snowflake or Databricks consumption spend and calculated average savings across each tier. These are savings on the consumption and query layer specifically, not total platform spend.
$63K/yr
average annual savings — Small ($250K–$1M warehouse spend)
$609K/yr
average annual savings — Medium ($1M–$5M warehouse spend)
$4M/yr
average annual savings — Large ($5M–$25M warehouse spend)
The pattern is clear: the bigger the analytics workload, the bigger the savings. That's not a coincidence. Larger workloads have more repeat query patterns, more concurrent users hitting the same dashboards, and more accumulated inefficiency from treating every query the same way.
This is a design problem, not a spending problem
Most data teams approach warehouse cost as a procurement challenge. Negotiate better rates. Buy more reserved capacity. Optimize the biggest queries.
That's not wrong, but it misses the structural issue. The reason 48.9% of queries hit the warehouse when they don't need to is that most data architectures treat every query identically, regardless of whether the result already exists, is still valid, and could be returned in milliseconds from cache.
A semantic layer fixes this at the architectural level. Metrics and business logic are defined once. Results are cached intelligently. The warehouse gets called when it needs to be called, not by default.
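"Defined once" is the part that makes caching effective: when every tool requests a metric by name instead of hand-writing SQL, identical requests compile to identical queries and hit the same cache entry. A minimal sketch of that idea, using invented names rather than any real Mosaic API:

```python
# Hypothetical sketch of define-once metrics in a semantic layer. Each metric
# is declared a single time; every consumer (BI tool, board deck, AI app)
# requests it by name and gets the same generated SQL, which is what makes
# repeat requests cacheable.

METRICS = {
    "revenue_by_region": {
        "table": "sales",
        "measure": "SUM(revenue)",
        "dimension": "region",
    },
}

def compile_metric(name):
    m = METRICS[name]
    return (
        f"SELECT {m['dimension']}, {m['measure']} "
        f"FROM {m['table']} GROUP BY {m['dimension']}"
    )

# Two different tools asking for the same metric produce byte-identical SQL,
# so a result cache keyed on the query text serves both from one computation.
dashboard_sql = compile_metric("revenue_by_region")
board_deck_sql = compile_metric("revenue_by_region")
print(dashboard_sql == board_deck_sql)  # True
```

Hand-written SQL in each tool, by contrast, tends to differ in trivial ways (aliases, column order, whitespace), which defeats naive result caching even when the underlying question is the same.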
This is why the savings scale with workload size. A larger analytics operation doesn't just have higher warehouse bills. It has more repeated queries, more concurrent users, and more accumulated waste from an architecture that wasn't designed to distinguish between query types.
What this means if you're running Snowflake or Databricks
Snowflake and Databricks are exceptional platforms. The point isn't to use them less; it's to use them well.
If nearly half of your current warehouse queries are repeat patterns that could be served from cache, that compute is going toward answering questions you've already answered. That's not a great use of a powerful engine.
The question worth asking isn't "how do we cut Snowflake spend?" It's "are we using Snowflake for the things it's actually best at?"
Intelligent query routing, which serves cacheable queries from a semantic layer and sends genuinely fresh, complex work to the warehouse, is how data teams that have figured this out are getting more value from their entire stack, not less.
The takeaway
48.9% deflection isn't a product metric. It's a signal about how most analytics architectures are built and what's left on the table when you don't distinguish between query types.
These are live production workloads from paying customers, not modeled scenarios.
If you're spending between $250K and $25M on Snowflake or Databricks consumption, there's a meaningful conversation worth having about how much of that spend is doing necessary work and how much of it is answering questions you've already answered.