
AI in BI: the Path to Full Self-Driving Analytics

Maxime Beauchemin

Learn even more about what's coming next, as Max presents Preset's very first AI-generated podcast! You just might be surprised at how good this stuff is getting.

There’s a lot of energy right now around bringing AI into analytics and business intelligence. The promise is appealing: ask a question in natural language, get back the right chart or metric — no SQL, no drag and drop required.

And to some extent, that’s already working in controlled environments, around 80–90% of the time. But in real-world analytics and critical decision-making — where precision matters and people rely on the output — 90% right simply doesn’t cut it.

A helpful analogy here is self-driving cars. We're not at full autonomy. Right now, it’s more like lane assist: AI can help, suggest, and automate parts of the flow, but it still needs someone paying attention, ready to take over at any time. And even when full autonomy becomes possible, many people will still prefer to drive. Same goes for BI — you need someone with the right knowledge and skill to review and validate what AI comes up with.

At Preset, we’ve been building AI features like text-to-SQL on top of Apache Superset, and we've learned a lot about where this technology shines — and where it doesn’t. This post shares some of those lessons, and outlines what we think matters most when bringing AI into analytics workflows: trust, visibility, and user choice.

Two broad categories of use cases

1) Where Accuracy Is Critical

Most questions in BI demand precision and accuracy. Anything involving key business metrics or dimensions — revenue, churn, ARR, user growth, retention, margin — needs to be 100% accurate. These numbers drive decisions, shape strategy, and often carry real business impact.

If the AI gets it wrong, even slightly, trust erodes fast. And once trust is broken, users won’t keep engaging — they’ll go back to their analysts, their spreadsheets, or whatever methods and people they trusted before.

That’s why these use cases require a human in the loop — someone who knows the data model, understands the business, and can validate what the AI is doing.

But here’s the key: if AI is going to be supervised, it needs to work in the same environment where that human operates. You can’t expect analysts to evaluate AI output in an unfamiliar interface, or worse, a chat-like black box that hides the details.

The validation needs to happen in context — in the actual BI UI the supervisor is familiar with. That means:

  • The AI’s output should land in the same interface users already use to explore and verify data
  • It should expose what dataset, metrics, dimensions, filters, or logic it applied
  • It should make it easy to intervene, adjust, and re-run with minimal friction

It’s like assisted driving: if the human needs to take over, the controls need to be right there. Not hidden. Not in another system. Seamless transition is key. In practice, the best AI workflows in BI would:

  • Show their reasoning and surface all relevant metadata
  • Let the user jump in midstream and take control
  • Support iterative back-and-forth without forcing users to start over

Getting to “trustable enough” doesn’t just depend on model quality — it depends on how well the system supports the humans supervising it.
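
To make this concrete, here is a minimal sketch of what a “transparent” AI result could look like as a data structure. Every name below is hypothetical, not Superset’s actual API; the point is that the output carries enough provenance (dataset, metrics, filters, the exact SQL) for the existing UI to render it in the familiar explore view and for a human to take over in one click.

```python
from dataclasses import dataclass, field

@dataclass
class AIChartProposal:
    """Everything a reviewer needs to validate, or take over, an AI-built chart."""
    dataset: str                          # e.g. "analytics.orders"
    metrics: list[str]                    # e.g. ["SUM(revenue)"]
    dimensions: list[str]                 # e.g. ["order_month"]
    filters: list[dict] = field(default_factory=list)
    generated_sql: str = ""               # the exact query, never hidden
    assumptions: list[str] = field(default_factory=list)  # guesses the model made

    def explore_url(self) -> str:
        # Hypothetical deep link into the same chart builder users already know,
        # pre-filled with this spec, so taking control is a seamless transition.
        return f"/explore/?dataset={self.dataset}"
```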

2) Where Suggestions Are Useful

Not every question in BI needs a perfect answer. In fact, there’s a whole category of use cases where AI doesn’t need to be right — it just needs to be helpful. Think along the lines of:

  • Figuring out the right questions to ask
  • Generating a few “suggested charts” from a simple result set
  • Suggesting a way to visualize a new metric
  • Writing a short description of a given chart or dashboard
  • Searching dashboards, metrics or datasets with a set of keywords

In these moments, users aren’t expecting a precise, board-ready number. They’re looking for inspiration, speed, and momentum. AI is great at this. It can surface patterns, highlight outliers, or summarize results in seconds — and even if it misses something, that’s okay. These are creative tasks, not mission-critical ones.

We’ve seen this play out in our own research at Preset. Features like “suggested charts” or chart descriptions don’t need 100% accuracy to be valuable. If they save a few clicks, help a user notice something interesting, or get them closer to the answer — that’s a win. These are also the kinds of features that tend to resonate more with casual users, who may not want to build everything from scratch but still want to explore the data.

And because the cost of a wrong suggestion is low, trust builds more easily. Users are more forgiving here — they’re not auditing the AI, they’re just using it as a shortcut.

In short:

  • These use cases don’t require deep metadata or perfect context
  • A “mostly right” answer is often good enough
  • The feedback loop is fast — users can quickly accept, ignore, or refine the suggestion

This is where AI can really shine in BI today — as a fast, low-friction assistant for discovery and exploration.

The Path to FSD (Full Self-Driving) BI

It’s tempting to imagine a future where AI handles the entire analytics workflow end-to-end — from question to insight, no human in the loop. That’s the Full Self-Driving (FSD) version of BI.

But just like in cars, getting to FSD in BI requires more than just a better model. The road is long, and the system needs to handle a wide range of conditions: outdated data, unclear logic, missing context, shifting definitions, and constantly evolving questions.

So what would it actually take to trust AI with truly autonomous BI?

  1. Richer Context and Metadata
    FSD requires context — and today, that context is often scattered, missing, or hard to retrieve. The full picture might include:

    • Basic metadata — tables, columns, types, descriptions, sample data
    • Extended metadata — lineage, ETL logic, column-level stats, freshness, owners
    • Usage patterns — how often datasets are used, in which dashboards, by whom
    • Soft knowledge — Notion pages, Slack threads, tribal context that never made it into a schema

    A model can’t answer well without knowing the definitions behind the data. That’s where semantic layers and metadata curation help — but stitching it all together is still a heavy lift. This knowledge lives across systems, docs, code, and human memory. That makes context retrieval — at prompt time or fine-tuning time — a major challenge.
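
As a rough illustration, here is a sketch of what assembling that context at prompt time could look like. The sources and the four-characters-per-token estimate are illustrative assumptions, not a real retrieval pipeline; the point is that context comes from many places, arrives in priority order, and has to fit a budget.

```python
def build_context(sources: list[tuple[str, str]], token_budget: int = 4000) -> str:
    """Pack (source_name, snippet) pairs into a prompt, highest priority first."""
    parts, used = [], 0
    for name, text in sources:
        cost = len(text) // 4              # crude token estimate: ~4 chars/token
        if used + cost > token_budget:     # stop before blowing the budget
            break
        parts.append(f"[{name}] {text}")
        used += cost
    return "\n".join(parts)

context = build_context([
    ("schema",  "orders(id INT, revenue NUMERIC, created_at TIMESTAMP)"),
    ("lineage", "orders is rebuilt nightly from raw_orders via dbt"),
    ("usage",   "orders powers 12 dashboards; most-queried table this month"),
    ("tribal",  "revenue excludes refunds (per a #data-eng Slack thread)"),
])
```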

  2. Seamless Human-AI Handoff
    In assisted-driving systems, the handoff between AI and human has to be instant and intuitive. Same in BI. Users need to be able to:

    • See what the AI actually did — which dataset, filters, and logic were used
    • Take over easily — edit the SQL, adjust the filters, re-run the query
    • Re-engage AI without losing context or progress

    If taking control means starting over, users won’t trust the system. The goal isn’t to avoid human intervention — it’s to make it feel natural.
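
One way to frame the handoff requirement is as shared, editable state: the AI and the human read and write the same session object, so control can pass back and forth without losing anything. A minimal sketch, with hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisSession:
    """Shared state that both the AI and the user operate on."""
    question: str
    sql: str = ""
    history: list[str] = field(default_factory=list)  # every step, human or AI

    def ai_step(self, generated_sql: str) -> None:
        self.sql = generated_sql
        self.history.append(f"ai: {generated_sql}")

    def human_edit(self, edited_sql: str) -> None:
        self.sql = edited_sql              # the user takes over in place
        self.history.append(f"human: {edited_sql}")
        # Re-engaging the AI later starts from self.sql and self.history,
        # not from a blank slate.
```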

  3. Stronger Feedback Loops
    AI doesn’t improve in a vacuum — it needs feedback. Some of that feedback can be explicit:

    • Thumbs up/down on results
    • Live metadata enrichment and annotations
    • Corrections to SQL or chart config
    • Prompts marked as “worked well” or “missed the mark”

    But much of it will be behavioral:

    • Did the user add the chart to a dashboard?
    • Did they edit it further or discard it?
    • Did they revisit the result later?

    Capturing and using this feedback — even passively — is key to moving from lane-assist to something closer to autonomy.
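
A sketch of what capturing both kinds of signal could look like, with a JSONL file standing in for whatever event pipeline you actually run (all names hypothetical):

```python
import json
import time

EXPLICIT = {"thumbs_up", "thumbs_down", "sql_corrected", "prompt_worked"}
BEHAVIORAL = {"added_to_dashboard", "edited_further", "discarded", "revisited"}

def log_feedback(user: str, chart_id: str, event: str) -> None:
    """Record one feedback signal; behavioral events count as much as explicit ones."""
    assert event in EXPLICIT | BEHAVIORAL, f"unknown event: {event}"
    record = {"ts": time.time(), "user": user, "chart": chart_id, "event": event}
    with open("ai_feedback.jsonl", "a") as f:  # placeholder sink; any event bus works
        f.write(json.dumps(record) + "\n")
```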

  4. Confidence Gating
    Today’s AI often answers confidently, even when it’s unsure. In BI, that’s dangerous. Sometimes, “I don’t know” is the right answer.

    We’d rather the system flag uncertainty — or offer clarifying questions — than risk quietly delivering a wrong number. It’s better to momentarily hand back control than to steer into a wall. Knowing when not to answer is a feature, not a flaw.
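
In code, the gate itself can be a simple threshold check; producing the confidence score (from log-probabilities, self-consistency across samples, or a separate verifier) is the hard part. A sketch, assuming such a score exists:

```python
def answer_or_clarify(question: str, sql: str, confidence: float,
                      threshold: float = 0.8) -> dict:
    """Only surface an answer when confidence clears the bar; otherwise ask."""
    if confidence >= threshold:
        return {"type": "answer", "sql": sql}
    # Below the bar, hand control back rather than quietly guessing.
    return {
        "type": "clarification",
        "message": (f"I'm not confident I understood '{question}'. "
                    "Which dataset and time range did you mean?"),
    }
```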

Together, these are some of the core building blocks on the path to FSD in BI. Not just smarter models — but better context, tighter feedback loops, and interfaces that support collaboration between humans and AI.

Lessons from Building Text-to-SQL at Preset

At Preset, we’ve been experimenting with and shipping text-to-SQL features built on top of Apache Superset’s SQL Lab. It’s one of the most exciting — and also one of the most complex — areas of AI in BI. Here’s what we’ve learned so far.

We’re not in the age of custom training (yet)

Training per-customer models sounds great in theory, but in practice it’s still out of reach. Fine-tuning at the tenant level is complex and costly, and retraining every time a new table or field is added just isn’t feasible. Even if the compute cost dropped, latency and deployment logistics would still be blockers. For now, some form of per-tenant segmentation of context is a must, and it comes with trade-offs around scale, speed, and quality.

RAG helps — until it doesn’t

We rely heavily on retrieval-augmented generation (RAG), and for the most part, it works well. But it’s far from perfect.

Finding the right datasets to use as context for a given prompt is a hard problem — especially in large environments with lots of tables and inconsistent metadata. If the retriever surfaces the wrong dataset, even a great model will give a bad answer. And if you feed in too much context, the LLM can get confused — generating hallucinations, mixing sources, or picking the wrong columns.

More context isn’t always better. Precision in retrieval matters more than scale.
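
A minimal sketch of what precision-first retrieval means in practice: cap the number of datasets fed to the model and apply a similarity floor, so a weak match gets dropped rather than included. Embeddings are assumed precomputed; the threshold values are illustrative.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def select_datasets(query_vec: list[float],
                    candidates: list[tuple[str, list[float]]],
                    k: int = 3, min_score: float = 0.75) -> list[str]:
    """Return at most k dataset names, and only those above the similarity floor."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # The floor matters more than k: a weak match in context actively hurts.
    return [name for name, score in scored[:k] if score >= min_score]
```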

Privacy concerns are real — and growing

Some customers are fine with their metadata flowing into a hosted LLM. Others aren’t. BYOLLM ("bring your own LLM") is becoming a must-have for many teams, and we’re starting to think about how to support that. This adds complexity — now we’re dealing with not just prompt quality, but model routing, local inference, and more nuanced permissioning.
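
A sketch of what tenant-level routing could look like; the provider names, model, and endpoint below are made up for illustration:

```python
# Hypothetical per-tenant routing table.
TENANT_LLM_CONFIG = {
    "acme":    {"provider": "hosted",   "model": "gpt-4o"},
    "initech": {"provider": "customer", "endpoint": "https://llm.initech.internal/v1"},
}

def resolve_llm(tenant: str) -> dict:
    """Route to the hosted default or to the tenant's own endpoint (BYOLLM)."""
    config = TENANT_LLM_CONFIG.get(tenant, {"provider": "hosted", "model": "gpt-4o"})
    if config["provider"] == "customer":
        # In this mode, prompts and metadata never leave the tenant's boundary.
        return {"base_url": config["endpoint"]}
    return {"model": config["model"]}
```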

Magic moments exist — but trust is fragile

When it works, it really works. Seeing a natural language prompt instantly turn into valid SQL is genuinely impressive. But it only takes a couple of wrong queries to lose a user’s trust entirely.

That’s the paradox: the people who can validate the SQL are the ones who need the least help. And the ones who most need help — non-technical users — are the ones who can’t catch when the AI gets it wrong.

So we end up serving the users who were already closer to self-serve, and falling short of the real goal: making analytics truly accessible to everyone.

Text-to-SQL isn’t a solved problem — but it’s not a dead end either. We’re learning where it fits, what support structures matter, and where the real ceiling is today. The next steps aren’t just about better prompts or smarter models — they’re about trust, context, and knowing when to ask for help.

Real-world data warehouses are messy

This is something we saw over and over again. In controlled environments — like academic benchmarks or internal testing datasets — everything is labeled cleanly, schemas are coherent, and structure makes sense.

But in real customer environments? Tables are massive, undocumented, and poorly named. Column names are cryptic. Some datasets are technically available but not actually trusted. Others are high quality but hard to discover.

This isn’t a minor edge case — it’s the norm. Even expert data scientists take time to orient themselves in these environments. Expecting an LLM to do better, without context, is unrealistic.

We used a homegrown toolkit called Promptimize to iterate and test prompts using Spider (a well-known benchmark dataset), and while that helped us tune some behaviors, Spider isn’t the real world. It’s neat, scoped, and predictable. Production data environments rarely are.

The lesson: No amount of clever prompting can overcome messy, ambiguous data structures. AI needs help — from metadata, from humans, from design.

Designing for Copilots

There’s a lot of talk in the industry about full automation — dashboards that build themselves, natural language questions that just “work,” insights that surface on their own. But based on everything we've seen, the real opportunity right now isn’t full autonomy — it’s collaboration.

In this current phase, the best version of AI in BI is a copilot: something that helps you move faster, gives you ideas, and saves you time, but still keeps you in control.

Different users, different needs

Analysts want control. They’re comfortable in SQL, they know the datasets, and they want to tweak and optimize. For them, AI can act as a jumpstart — turning a vague question into a decent first draft, or exploring options they hadn’t considered.

Stakeholders, on the other hand, often want speed and clarity. They may not know where the data lives or how to structure a query — but they do know what business question they’re trying to answer. For them, AI should reduce friction without hiding too much. Show the path taken, let them trace the steps, and give them confidence in what they’re seeing.

A one-size-fits-all AI experience won’t work. The system needs to adapt based on who’s using it, what they’re trying to do, and how confident they are in the results.

The best copilots are transparent

A good AI copilot doesn’t pretend to be invisible. It shows its work. It tells you what dataset it used, what filters were applied, what logic it followed. It gives you a chance to catch issues before they become problems. Even simple things help:

  • Showing the generated SQL
  • Highlighting assumptions or guesses
  • Letting the user make small corrections without starting over

It’s not about replacing trust with automation — it’s about earning trust through visibility.

Let people opt in — and out

Sometimes, users want help. Sometimes they don’t. The ideal experience lets people:

  • Ask for suggestions when they’re stuck
  • Take control when they know what they want
  • Seamlessly switch between manual and assisted modes

If AI tries to take over too much, too soon, it backfires. But if it’s there when needed — with just the right amount of context and confidence — it becomes a real asset.

AI won’t replace BI users. But it can absolutely support them — if we design the experience to empower, not override.

Beyond RAG: Unlocking BI through MCP

Over the past few years, everyone has retrofitted their apps with LLMs using prompt engineering and retrieval-augmented generation (RAG). And it works well — especially when the required context happens to be contained neatly inside a single app.

But reality is messier. Your data, insights, and context don't just live in one place. You might have metrics and dashboards in Superset, pipeline details in dbt Cloud, team notes in Notion, and project tasks in Jira. This scattered context leaves users stuck copying and pasting between a chatbot and their tools.

That's why MCP (Model Context Protocol) — a new open protocol recently announced by Anthropic — matters. MCP flips the script: instead of your app calling the LLM, the LLM calls your app — it's essentially an API built specifically for LLMs. It gives LLMs structured, typed, and safe ways to fetch information, perform actions, and interact on behalf of the user. Importantly, the LLM acts within boundaries defined by user permissions, OAuth scopes, and company policy, ensuring it never exceeds the user's own access level.
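
For a feel of what this looks like, here is a toy MCP server using the Python SDK’s FastMCP helper. The server name, tool, and canned response are hypothetical, not a Preset or Superset integration; a real tool would call the BI platform’s API with the user’s own credentials so results respect their permissions.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("superset-bi")   # hypothetical server exposing BI capabilities

@mcp.tool()
def search_datasets(keyword: str) -> list[dict]:
    """Search datasets the calling user can access, by name or description."""
    # Placeholder result; a real implementation would query the BI platform's
    # REST API, scoped to the user's permissions and OAuth grants.
    return [{"name": "orders", "description": "One row per customer order"}]

if __name__ == "__main__":
    mcp.run()   # serves the tool over stdio for an MCP-capable LLM client
```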

Semantics Beyond the Semantic Layer

Semantic layers promise a lot: consistent metrics, unified definitions, shared business logic — all in service of one big goal: enabling true self-serve analytics. The idea is that business users can explore data safely by picking from a curated set of metrics and dimensions, without having to understand schemas, joins, or SQL. When this works, it’s a huge win: faster answers, no middle-man, and more confident decision-making.

But in practice, that ideal is still rare — and the coverage is limited.

As I noted in The Case for Dataset-Centric Visualisation, there are a few reasons why semantic layers have struggled to fully deliver on their promises:

  • They’re heavy upfront. You often need to define everything before you can use anything. That slows teams down.
  • They’re hard to maintain. As datasets evolve, the semantic layer often drifts out of sync.
  • They assume alignment. A shared “source of truth” only works if everyone agrees on the truth — which isn’t always the case.
  • They don’t cover enough. Even in mature orgs, the semantic layer often represents just a small slice of the full data landscape.
  • There’s no universal standard. No open-source, widely adopted semantic layer spans the modern data stack today — most are proprietary and lead to vendor lock-in.

A helpful way to think about this: the semantic layer is like the menu at a restaurant.

It’s curated, clear, and easy to use. You sit comfortably in the front room and order à la carte, confident that the kitchen will deliver something consistent and well-prepared. But that menu only scratches the surface of what the kitchen can actually make.

In reality, there’s a fully stocked kitchen in the back: a warehouse full of ingredients (with varying levels of freshness…), tools, appliances, and skilled cooks. Some ingredients are documented, some aren’t. Some dishes are repeatable, others are one-offs. The point is: the kitchen can do way more than the menu suggests.

And that’s where most organizations live today — in and around the kitchen. Even for companies that have invested heavily in a semantic layer, it often covers just the most common use cases. There’s still a massive long tail of ad hoc analysis, experimental data, and questions that don’t fit neatly into predefined metrics.

Some orgs embrace the culinary arts of data with different operational models: a data buffet with lots of pre-made options, a counter deli where you can easily grab & go, a self-serve cafeteria where users can mix and match from semi-curated ingredients, or even micro-kitchens embedded in teams, where local experts serve up custom insights on demand. These models reflect the reality that not everyone wants — or needs — fine dining. The goal is to meet users where they are, not force them into a single rigid experience.

If we force users — or AIs — to stick strictly to what’s on the menu, we’re leaving a lot of value on the table. That’s why AI needs to operate both on-menu and off-menu. It should understand the curated layer when it exists, but also be capable of venturing deeper into the warehouse when needed — responsibly, transparently, and with human oversight.

Even better: AI can help refine the menu over time. It can surface commonly asked questions, suggest new metrics, flag inconsistencies, propose joins, or identify ambiguous field names. In that way, AI doesn’t just use semantics — it helps build and maintain them.

So yes, semantics matter. But they don’t have to live inside a rigid abstraction layer. What matters more is building systems that support semantic understanding across the entire spectrum — structured or not — and that give users (and models) the tools to navigate it all, menu and kitchen alike.

Side note: If you're curious about my thoughts on what a modern semantic layer could look like, I wrote up a design exercise a while back — you can find it here.

Conclusion

We’re still early in the journey of AI in BI — somewhere between cruise control and lane assist. The potential is real, and in many areas, already delivering value. But the road to Full Self-Driving BI is longer and bumpier than it may seem from the outside — and if it follows the same playbook as the car industry, expect it to be “just around the corner” for many years to come.

We’ve learned a lot building text-to-SQL and other AI-powered features at Preset. Most of the challenges aren’t about model quality — they’re about trust, context, and the messiness of real-world data. The systems that succeed won’t be the ones that try to remove humans from the loop. They’ll be the ones that support different levels of confidence, let users step in (or out) easily, and show their work every step of the way.

Some users want to drive. Some just want help navigating. The best tools will meet them where they are — and help them get further, faster, without taking away control.

And while a perfect semantic layer might help, what matters more is good data structure, naming, documentation, and a system that can learn, adapt, and support the evolution of the environment over time — with AI playing a role on both sides of the loop.

The future of BI isn’t hands-free. It’s collaborative.

The real milestone isn’t when AI can answer any question — it’s when people can trust the system enough to keep asking.

Preset is the easiest way to run Superset, any way you want, whether that's a free Starter plan, or a full Enterprise plan hosted wherever you need it. Reach out to learn more!
