
The Semantic Layer Is Back. Here's What We're Doing About It.
There's a new wave of enthusiasm around semantic layers, and for once, I think the hype is justified.
Since the very early days of analytics, self-service has been the holy grail, yet it's still largely unsolved. We built increasingly powerful tools. We democratized SQL. We made dashboards beautiful and fast. And still, most business users couldn't reliably get the answers they needed without pinging an analyst.
Trust is a fragile thing, and the sheer availability and eagerness of AI exposes just how much context, structure, and guardrails the user/agent combo needs to succeed.
This is why semantic layers are back. Not as a vendor feature. As a necessity.
What a semantic layer actually is
The restaurant/kitchen analogy for data warehousing was first articulated by Ralph Kimball (back in 2004!), and I think it's worth revisiting because it really helps structure our thinking here. Thanks, Ralph.
Your data warehouse is a kitchen. Your tables and columns are ingredients. Your BI tool is the dining room.
Now, there are two ways to feed people from that kitchen:
The buffet. Customers walk up, see what's available, and assemble their own plates. Huge flexibility. Scales to hundreds of diners. Works great if people know what they want and how to combine ingredients without food poisoning. This is essentially what the modern data stack has been: powerful, fast, self-serve — if you know SQL and understand the data.
The menu. A curated list of dishes. The kitchen has figured out what combinations work, what's safe, what's consistent. You order the Caesar salad, you get the Caesar salad. Every time. This is what a semantic layer provides: a contract between data producers and data consumers.
For the past decade, we've been running a buffet. And honestly? Buffets are great for feeding data-hungry organizations at scale. People can build exactly what they want. Data teams can move fast, iterate, ship new ingredients without redesigning the whole menu.
But buffets have problems. Dishes go stale. Food poisoning happens. You have to actually get up and survey what's available — often multiple trips before you figure out what's actually good. And without guidance, people make questionable combinations: kids will absolutely put jello and lasagna on the same plate. (Honestly, some data teams aren't even running a good buffet — they're stuck in a competition cooking show, pulling random ingredients and hoping for the best. Results often look more like Nailed It than Top Chef.)
The buffet assumes a decent understanding of what's served. It assumes the person holding the plate knows what tartare is and what it combines nicely with.
Neither users nor AI agents necessarily have that expertise. Menus prevent a lot of undesirable combinations and provide a curated experience.
Why menus failed the first time
Semantic layers aren't new. Business Objects had one. MicroStrategy had one. SSAS had one. In many ways, the previous generation of BI tools was more sophisticated about semantics than what modern data practitioners have been exposed to.
So why did we abandon them?
The data team as bottleneck. Maintaining a semantic layer meant someone had to curate every metric, every dimension, every join path. Contention over what made it onto the menu was constant. It takes a long time for a restaurant to perfect a dish to the point where it belongs on the menu, and in data, the appetite for new dishes never stops.
The waiter getting in the way. In a traditional restaurant, the waiter helps you navigate the menu. In analytics, that was the analyst. But when you have hundreds of customers wanting slightly different things, all at once, the waiter becomes a chokepoint, not a help.
The data horizon problem. Here's the thing about data that makes it fundamentally different from restaurants: most of the interesting questions live at the frontier. People aren't just ordering the Caesar salad — they're foodies trying to discover insights that have never been served before. They want to push the horizon, explore new territory, find something new. A curated menu, by definition, only covers known territory. In data, most of the appetite is for discovery and experimentation, not repetition.
Tool lock-in. The semantic layer lived inside the BI tool. If you wanted to use a different visualization tool, or run a one-off query in a notebook, you were back to raw SQL. The semantics weren't portable.
Front-of-house vs. back-of-house drift. The data warehouse evolved. The semantic layer didn't keep up. Soon the menu was describing dishes the kitchen no longer made.
The menus didn't fail because the idea was wrong. They failed because they were owned by the wrong layer of the stack — locked inside BI tools, tightly coupled to vendors, and moving at the wrong velocity.
The shift to transform-layer semantics
So what happened? Data teams moved semantics into the transform layer.
dbt models. SQL views. Datasets. The logic that used to live in BI tools migrated upstream, into version-controlled, testable, CI/CD-driven code owned by data engineers and analytics engineers.
This was a huge win. Transformations became:
- Versioned and auditable
- Testable
- Owned by the people who understood the data
- Portable across tools
At Preset, we leaned hard into this with our dataset-centric approach. The dataset is a semantic interface. It defines what's queryable, what joins are valid, what metrics are pre-calculated. It scales because it lives in code, not in a GUI maintained by one architect.
But here's the thing: SQL models are not a semantic layer. They're cleaner buffet tables. They still require the consumer — human or AI — to know what to do with them.
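To make that concrete, here's roughly what a transform-layer model looks like. The table and column names below are hypothetical, for illustration only:

```sql
-- A typical transform-layer model: versioned, tested, reviewed in CI.
-- Hypothetical names, for illustration.
CREATE VIEW analytics.fct_orders AS
SELECT
    o.order_id,
    o.ordered_at,
    c.region,
    o.amount_usd - COALESCE(o.discount_usd, 0) AS net_amount_usd
FROM raw.orders AS o
JOIN raw.customers AS c
    ON c.customer_id = o.customer_id
WHERE o.status = 'completed';
```

Clean, reliable, and well lit, but still a buffet table. Nothing in that view tells a consumer that "revenue" means SUM(net_amount_usd), which dimensions it can safely be sliced by, or which joins are valid from here. That knowledge still lives in someone's head.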
Why semantic layers are back now
Three things changed:
AI agents are here, and they benefit enormously from structure. Humans tolerate ambiguity. If a column is named rev_ttm_adj_v2, an experienced analyst can probably figure out what it means. An LLM cannot. It needs explicit definitions, valid dimension combinations, and guardrails. It needs a menu when one is available; a sketch below shows the difference.
The warehouse vendors want to enable AI while delivering guardrails and trust. Snowflake has a semantic layer. Databricks has one. They're not doing this for fun; they're responding to real customer demand for governed, trustworthy, AI-ready data access.
A new generation of semantic tools emerged. dbt's semantic layer. Cube. Malloy. MetricFlow. Open standards like SDF. Projects like DataJunction. The ecosystem is exploding.
The common thread: semantics are moving outside the BI tool, into an open, composable, standards-based layer that can serve multiple consumers — dashboards, notebooks, APIs, and agents.
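To see why structure matters so much to an agent, go back to that rev_ttm_adj_v2 column. Below is an illustrative sketch, not any particular vendor's format: real semantic layers express this in their own YAML or DSL, but they all carry roughly this information.

```sql
-- What an agent sees at the buffet: a cryptic column it can only guess at.
-- (Hypothetical table and column, for illustration.)
SELECT rev_ttm_adj_v2 FROM finance.wide_metrics;

-- What a menu would provide alongside it (illustrative, not a real spec):
--   metric:       adjusted_trailing_12m_revenue
--   description:  trailing 12-month revenue, net of refunds, FX-adjusted
--   aggregation:  SUM over orders; never re-averaged across periods
--   valid dims:   region, product_line, fiscal_quarter
--   guardrail:    overlapping 12-month windows are not additive
```

An experienced analyst might reverse-engineer all of that from tribal knowledge and old Slack threads. An agent gets it as structured context it can actually act on.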
Building AI trust is probably the greatest challenge of this era. If you want to catch the AI wave early and well, you need mechanisms for people to trust their agents. Semantic layers are a big part of that answer.
The chicken-and-egg problem
Here's the issue: there's a logjam.
Customers want to implement a semantic layer. They see the value, especially with AI on the horizon. But they're stuck:
- Which one do I pick?
- Does that lock me into a database engine?
- Will my many BI and data-consumption tools support it?
- Can I manage it as code, atomically with my transform layer?
Meanwhile, BI tool vendors are waiting:
- Which semantic layer will win?
- Should we build our own?
- Let's see what customers actually adopt.
So nobody moves. Customers don't invest because tools don't support the options. Tools don't support because customers haven't committed. The market is frozen.
This is a classic coordination failure. And it's holding back real value.
Our bet: break the logjam
At Preset, we've decided to stop waiting.
Our position:
- Semantic layers are getting renewed interest for the right reasons. AI agents, governance, trust, consistency — these aren't buzzwords. They're real problems that semantic layers solve.
- Customers want to invest, but they're blocked. The tooling support isn't there. The standards haven't converged. The path forward isn't clear.
- Convergence is uncertain, but value is obvious. Will everyone standardize on one semantic layer? Maybe. Maybe not. Does it matter? Not really — what matters is that people can start getting value now.
So here's what we're doing: we're building Superset with deep support for creating and consuming any type of menu.
We're rolling out deep support for Snowflake, dbt, and Cube as a first wave. As we build out our compatibility foundation, stay tuned for announcements covering, hopefully, all the emerging and relevant semantic layers. And with tools like sup!, these semantic layers become even more powerful for automation and AI workflows.
Why? Because waiting helps no one. The semantic layer space needs demand signals. Customers need to be able to experiment and invest without fear of dead ends. If Superset supports all of them, customers can move. And when customers move, the market unfreezes.
What stays the same
Let me be clear about what we're not changing.
Superset is still a buffet. We still believe in dataset-centric visualization. We still think SQL matters. We still think self-service means letting people build what they need, not just consume what's been pre-built.
The buffet isn't going away. We're adding deep support for menus.
Want to create a Michelin-star experience for the finance team with fully governed metrics? Go for it. Want to stay buffet-first for the data science team so they can explore freely? Absolutely. The point is: you get to decide, per team, per domain, per use case.
This reflects something I've thought a lot about: the difference between data engineering and software engineering when it comes to rigor. In software, you generally want consistent, high standards everywhere. Ship tested code. Follow the patterns. Maintain quality uniformly.
In data, that's not realistic — and arguably not even desirable. You'll have different levels of maturity across different subject areas. Your core revenue metrics? Battle-tested, governed, menu-ready. That new product you just launched? Still in discovery mode. Operational reporting? Maybe somewhere in between.
If you force the same level of rigor everywhere, you move too slowly to explore new territory. Curation is great for mature areas. It's a burden for the frontier. The right architecture lets you run the full spectrum.
Where this is going
I don't know which semantic layer standard will win. I don't know if there will be one winner or five. I don't think it matters as much as people think.
What I do know:
- AI agents are here, and they benefit enormously from structure when it's available
- Self-service still matters, but it now includes agents
- The semantic layer belongs outside the BI tool
- Open standards will beat proprietary lock-in
- The market is ready to move — it just needs permission
We're giving it permission.
If you're evaluating semantic layers, stop waiting for Superset to pick one. We're supporting all of them. Go invest in the one that fits your stack, your team, and your use cases. We'll meet you there.
We're rolling out semantic layer integrations starting with Snowflake, dbt, and Cube. More coming soon — stay tuned for announcements.