Accelerating Apache Superset Dashboards With Materialized Views

Editorial Note: This post is not just the latest in our efforts to help optimize performance in Apache Superset (see our prior post); it's also a guest blog from our friends at CelerData. CelerData and Preset (or open-source Superset) work beautifully together, and the CelerData team has been kind enough to contribute to Superset's codebase, appear on our Preset Podcast, and speak at OSA Con. If you'd like to get involved in similar ways, please reach out!

Interactive analytics is a cornerstone of modern business intelligence (BI), enabling teams to make data-driven decisions in real time. Apache Superset is a leader in this space, providing powerful capabilities for building dynamic dashboards and exploring data. However, ensuring fast, consistent performance can be challenging as datasets grow in complexity and scale.

In this post, we'll explore the common hurdles users face when accelerating dashboards in Apache Superset, and how recent advances in materialized views can significantly improve the user experience.

Today's Pre-computation Pipelines Are Not Built For Modern BI

Pre-computation pipelines have traditionally been the preferred method to accelerate OLAP queries. These pipelines minimize expensive on-the-fly operations by building denormalized and pre-aggregated tables with tools like Apache Spark. However, this approach introduces several drawbacks that make it less than ideal for modern BI workflows:

Complexity for Data Consumers: Data users are forced to rewrite SQL queries or reconfigure dashboards to access pre-computed tables, adding friction to their workflow.
Heavy Engineering Overhead: Platform engineers must design and maintain pipelines before applications are built, often resulting in unused pre-computed tables and wasted resources.
Slower Development Cycles: Pre-computation pipelines significantly extend development timelines and inflate costs, delaying value delivery.

Accelerating Apache Superset Queries with StarRocks Materialized Views

The challenges with traditional pre-computation pipelines underscore a pressing need for a new approach to query acceleration—one that simplifies workflows, reduces engineering overhead, and adapts dynamically to the demands of modern BI tools like Apache Superset.

What is StarRocks?

This is where StarRocks comes in. StarRocks is a high-performance, fully vectorized MPP (Massively Parallel Processing) OLAP database designed for interactive analytics. It is built to deliver sub-second query performance for BI dashboards, even with complex datasets or high-concurrency workloads.

What is a StarRocks Materialized View?

StarRocks' materialized views are designed to accelerate queries on demand, offering high performance across both internal tables and external sources like Apache Hive, Apache Hudi, Apache Iceberg, and Delta Lake. What sets these materialized views apart is automatic query rewrite: queries continue to reference the original tables, so dashboards and workflows do not need to change.

Using StarRocks' cost-based optimizer (CBO), query rewrite identifies the most efficient applicable materialized view while the query is being planned and rewrites the query to use it. Users do not have to rewrite SQL manually or adjust dashboards to benefit. Once a materialized view exists, the performance gains apply automatically and fit into your existing BI workflows.
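To make the idea of rewrite concrete, here is a deliberately simplified Python model of how an optimizer might pick a materialized view for an aggregate query. It is not StarRocks' implementation: the `sales` table, its columns, the view names, and the row counts are all hypothetical, and "cost" is reduced to the number of rows scanned. The key rule it illustrates is that a view grouped at a finer grain can answer a coarser query by re-aggregating, because a SUM of partial SUMs equals the SUM over the raw rows.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AggQuery:
    """Toy query: SELECT <dims>, SUM(m) AS m ... FROM <table> GROUP BY <dims>."""
    table: str
    dims: frozenset
    measures: frozenset


@dataclass(frozen=True)
class MaterializedView:
    """Toy MV storing SUM(m) AS m for each measure, grouped by its dims."""
    name: str
    table: str
    dims: frozenset
    measures: frozenset
    row_count: int  # hypothetical statistic used as the cost estimate


def can_answer(mv: MaterializedView, q: AggQuery) -> bool:
    # A finer-grained MV can be rolled up to the query's grain because SUM is
    # re-aggregatable: summing partial sums equals summing the raw rows.
    return mv.table == q.table and q.dims <= mv.dims and q.measures <= mv.measures


def choose_mv(q: AggQuery, mvs: list, base_rows: int) -> Optional[MaterializedView]:
    # Toy cost model: fewer rows scanned is cheaper. Fall back to the base
    # table when no usable MV is smaller than it.
    usable = [mv for mv in mvs if can_answer(mv, q)]
    best = min(usable, key=lambda mv: mv.row_count, default=None)
    if best is None or best.row_count >= base_rows:
        return None
    return best


def rewrite(q: AggQuery, mv: Optional[MaterializedView]) -> str:
    # Re-aggregate the stored sums at the query's (coarser) grain.
    dims = sorted(q.dims)
    aggs = [f"SUM({m}) AS {m}" for m in sorted(q.measures)]
    sql = f"SELECT {', '.join(dims + aggs)} FROM {mv.name if mv else q.table}"
    return sql + (f" GROUP BY {', '.join(dims)}" if dims else "")


if __name__ == "__main__":
    mvs = [
        MaterializedView("mv_region_product", "sales",
                         frozenset({"region", "product"}), frozenset({"revenue"}), 5_000),
        MaterializedView("mv_region", "sales",
                         frozenset({"region"}), frozenset({"revenue"}), 10),
    ]
    q = AggQuery("sales", frozenset({"region"}), frozenset({"revenue"}))
    print(rewrite(q, choose_mv(q, mvs, base_rows=1_000_000_000)))
    # SELECT region, SUM(revenue) AS revenue FROM mv_region GROUP BY region
```

The dashboard's query never mentions `mv_region`; the substitution happens entirely inside the planner, which is the property the paragraph above describes.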

By addressing the limitations of traditional pre-computation pipelines, StarRocks materialized views enable:

Building Dashboards Directly on Raw Data: Users do not need to depend on pre-computed tables upfront.
On-Demand Query Optimization: Teams can add materialized views as needs arise and see performance improvements immediately.
Faster Development Cycles: Reduces the complexity and overhead of pipeline design, allowing teams to focus on delivering insights quickly.

Practical Solutions and Best Practices

In real-world Apache Superset scenarios, data is often pre-processed before it serves dashboards. This pre-processing might involve data cleaning or denormalization so that the data is dashboard-ready. Typically, this is managed in one of two ways:

Stored as Views: Logic and transformations are stored as views within the underlying engine or database, ensuring reusability and consistency.
Defined as Virtual Datasets: Superset allows users to define virtual datasets that act as an abstraction layer for query logic, making it easy to build dashboards without directly referencing raw tables.

Both approaches are commonly used in Superset workflows, and StarRocks enables query acceleration for either option without requiring changes to the architecture or dashboards.

View-Based Rewrite

For pre-processed data stored as views in the underlying database, StarRocks’ materialized views can seamlessly accelerate your dashboards:

Modular Design: Encapsulate your query logic as standard StarRocks views, ensuring your workflows remain flexible and maintainable.
Transparent Optimization: When you convert these views into StarRocks materialized views, the system's query rewrite functionality will automatically optimize queries executed on the views.
No Dashboard Changes: Dashboards referencing these views automatically benefit from the performance improvements, even when the view logic falls outside the SPJG (Select-Project-Join-GroupBy) pattern.
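Why treating a view as a unit helps can be modeled with a tiny plan tree. In this hypothetical Python sketch, `ranked_orders` is a view whose body contains a window function, which is not SPJG. A matcher that first inlines views into their bodies and only understands SPJG plans would give up on it; keeping the view reference opaque and swapping it for an MV that materializes the whole view sidesteps that, which is roughly the situation described above. The view, table, and MV names are illustrative, and this is a conceptual model rather than how StarRocks represents plans internally.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Scan:
    name: str  # a base table, view, or materialized view


@dataclass(frozen=True)
class Window:
    child: Plan  # e.g. ROW_NUMBER() OVER (...): outside the SPJG pattern


@dataclass(frozen=True)
class Aggregate:
    child: Plan
    dims: tuple


Plan = Union[Scan, Window, Aggregate]


def is_spjg(plan: Plan) -> bool:
    # In this toy, only scans and aggregates count as SPJG operators.
    if isinstance(plan, Scan):
        return True
    if isinstance(plan, Window):
        return False
    return is_spjg(plan.child)


def expand_views(plan: Plan, views: dict[str, Plan]) -> Plan:
    # Inline every view reference with the view's body (recursively).
    if isinstance(plan, Scan):
        return expand_views(views[plan.name], views) if plan.name in views else plan
    return replace(plan, child=expand_views(plan.child, views))


def view_based_rewrite(plan: Plan, mv_for_view: dict[str, str]) -> Plan:
    # Keep views unexpanded; swap a view scan for the MV that materializes it.
    if isinstance(plan, Scan):
        return Scan(mv_for_view.get(plan.name, plan.name))
    return replace(plan, child=view_based_rewrite(plan.child, mv_for_view))


views = {"ranked_orders": Window(Scan("orders"))}
query = Aggregate(Scan("ranked_orders"), ("region",))

print(is_spjg(expand_views(query, views)))  # False: an SPJG-only matcher gives up
print(view_based_rewrite(query, {"ranked_orders": "mv_ranked_orders"}))
# Aggregate(child=Scan(name='mv_ranked_orders'), dims=('region',))
```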

Text-Based Rewrite with Superset Virtual Datasets

When Apache Superset queries a virtual dataset, it embeds the dataset's SQL in the query it generates and passes that SQL directly to the underlying engine. For virtual datasets defined in Superset, text-based query rewrite can therefore provide equally seamless optimization:

Set It Up Once: Because Superset dashboards are built on top of these virtual datasets, the dataset SQL that reaches the engine typically stays the same, so a materialized view that matches it keeps applying.
Efficient Integration: StarRocks can use materialized views to accelerate the SQL defined in virtual datasets without modifying Superset workflows or user logic.
No Dashboard Changes: Just like with database views, dashboards referencing virtual datasets automatically benefit from accelerated query performance, ensuring a smooth user experience.
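A toy version of text matching shows why stable virtual-dataset SQL matters. The Python sketch below assumes Superset embeds the dataset's SQL as a parenthesized subquery aliased `virtual_table` (the exact wrapper may differ by Superset version). It normalizes whitespace and keyword case outside string literals, then replaces any subquery whose text equals an MV's defining query with the MV's name. The `sales` table, the `mv_sales_by_region_product` name, and the normalization rules are all assumptions; a production engine would compare parsed queries rather than raw strings.

```python
import re

# Single-quoted SQL string literals, including '' escapes (one capture group).
_STRING_LITERAL = re.compile(r"('(?:[^']|'')*')")


def normalize(sql: str) -> str:
    # Lowercase and collapse whitespace outside string literals; keep literals verbatim.
    parts = _STRING_LITERAL.split(sql)
    for i in range(0, len(parts), 2):  # even indices are outside literals
        text = re.sub(r"\s+", " ", parts[i].lower())
        text = re.sub(r"\(\s+", "(", text)
        parts[i] = re.sub(r"\s+\)", ")", text)
    return "".join(parts).strip().rstrip(";").strip()


def text_match_rewrite(query: str, mv_definitions: dict) -> str:
    # Replace any parenthesized subquery whose normalized text equals an MV's
    # normalized definition with the MV's name.
    rewritten = normalize(query)
    for mv_name, definition in mv_definitions.items():
        rewritten = rewritten.replace(f"({normalize(definition)})", mv_name)
    return rewritten


# Hypothetical virtual-dataset SQL and a chart query wrapping it.
dataset_sql = """SELECT region, product, SUM(revenue) AS revenue
FROM sales
WHERE status = 'Shipped'
GROUP BY region, product"""

chart_query = (
    "SELECT region, SUM(revenue) AS total\n"
    f"FROM ({dataset_sql}) AS virtual_table\n"
    "GROUP BY region\nLIMIT 1000"
)

print(text_match_rewrite(chart_query, {"mv_sales_by_region_product": dataset_sql + ";"}))
# select region, sum(revenue) as total from mv_sales_by_region_product as virtual_table group by region limit 1000
```

Note that literals are compared exactly: a dataset filtering on `'shipped'` would not match an MV defined with `'Shipped'`, which is why keeping the dataset SQL unchanged matters.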

Demo: Accelerating Dashboards with StarRocks and Apache Superset

To see these techniques in action, let’s explore a demo using CelerData Cloud BYOC (Powered by StarRocks) alongside Preset (Powered by Apache Superset).

Datasets:

TPC-H (100GB)
SSB, the Star Schema Benchmark (1TB)

Key Highlights:

On-Demand Dashboard Acceleration: Learn how to identify and resolve slow dashboard queries using materialized views.
View-Based Rewrite in Action: See how converting views into materialized views accelerates queries without SQL modifications, even for non-SPJG queries.
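One generic way to find the slow dashboard queries worth accelerating is to group query history by a literal-free fingerprint and rank patterns by total time spent. This Python sketch assumes you have already exported query history as `(sql, duration_ms)` pairs (for example from your database's audit log); the fingerprint rules and the sample records are hypothetical, and this is not necessarily the workflow shown in the demo.

```python
import re
from collections import defaultdict

# String literals ('...' with '' escapes) or standalone numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def fingerprint(sql: str) -> str:
    # Replace literals with "?" so runs of the same chart with different
    # filter values collapse into one pattern.
    without_literals = _LITERAL.sub("?", sql)
    return re.sub(r"\s+", " ", without_literals).strip().lower()


def slowest_patterns(log, top_n: int = 5):
    # Aggregate (count, total_ms) per fingerprint and rank by total time: a
    # cheap query that runs on every dashboard load can cost more in total
    # than one slow outlier.
    stats = defaultdict(lambda: [0, 0.0])
    for sql, duration_ms in log:
        entry = stats[fingerprint(sql)]
        entry[0] += 1
        entry[1] += duration_ms
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(fp, count, total) for fp, (count, total) in ranked[:top_n]]


# Hypothetical query history; durations are made up for illustration.
history = [
    ("SELECT region, SUM(revenue) FROM sales WHERE year = 2023 GROUP BY region", 4200.0),
    ("SELECT region, SUM(revenue) FROM sales WHERE year = 2024 GROUP BY region", 3900.0),
    ("SELECT COUNT(*) FROM customers WHERE name = 'Acme'", 120.0),
]
for fp, count, total in slowest_patterns(history, top_n=2):
    print(f"{total:>8.0f} ms  x{count}  {fp}")
```

The top-ranked fingerprints are natural candidates for a materialized view; once one is created, the same log can confirm whether the pattern got faster.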

Try StarRocks + Preset Today!

Interactive analytics shouldn’t be limited by slow query performance or cumbersome workflows. With StarRocks' materialized views and its integration with Apache Superset, you can unlock the full potential of your BI dashboards—without reconfiguring your architecture or rewriting SQL.
Want to get started? Join the conversation on Slack to connect with the StarRocks and Superset communities:

Join StarRocks Slack
Join Apache Superset Slack
Contact Preset Sales, or try Preset for free today!
