
What IBM’s acquisition of Confluent means for Apache Superset users
Yesterday, IBM announced that it will acquire Confluent — the open-source streaming data platform built around Apache Kafka — in a deal valued at roughly $11 billion.
Confluent’s infrastructure is all about “data in motion”: real-time ingestion, processing, event streaming, and the plumbing needed to power AI, analytics, and hybrid-cloud applications. Under IBM, Confluent is slated to play a key role in their hybrid-cloud + AI strategy, helping enterprises ensure clean, connected, real-time data flow across applications, clouds, APIs, and data centers.
For Superset — a flexible, open-source BI and data-visualization platform capable of connecting to nearly any SQL-compatible backend — that acquisition unlocks exciting potential for a new wave of users: enterprises adopting Confluent/Kafka for streaming will need dashboards that update in real time and that bring together event streams, historical data, and business logic.
We expect to see a growing community of Kafka/Confluent users who want to visualize their streaming data — and Superset (along with Preset’s managed service) is ready for them.
How Superset can integrate with Confluent / Kafka-based architectures
If you’re coming from the Kafka/Confluent world—or newly exploring it now that IBM is investing heavily in streaming data—there are several proven patterns for bringing real-time event streams into Superset dashboards. Whether you're using Confluent Platform, Confluent Cloud, or IBM Event Streams, the architectural patterns described here work identically — Superset connects to whatever SQL-friendly store you use downstream.
Direct ksqlDB / SQL-on-Kafka integration
Superset can query real-time data that has been transformed in ksqlDB, Confluent’s streaming database for defining SQL-based streams and materialized tables. While Superset does not (yet) connect directly to ksqlDB via a native SQLAlchemy dialect, you can use ksqlDB as the processing layer and then deliver its output into a SQL-friendly system that Superset can query, such as:
- ksqlDB → Kafka Connect Sink → ClickHouse, Apache Druid, Apache Pinot, Postgres, or BigQuery → Superset
- ksqlDB → Materialized Table → JDBC/ODBC-accessible store → Superset
This gives you continuously updated tables defined by streaming SQL, while Superset provides visualization, slicing, filtering, and dashboarding. Confluent’s Kafka Summit talk on Streaming Analytics with ksqlDB and Superset walks through a complete example, or you can find many other blog posts highlighting similar workflows.
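As a rough illustration of the first pattern, the sketch below builds a request for ksqlDB's REST API that defines a materialized table over a Kafka topic; a sink connector (not shown) would then land the result in your analytic store for Superset to query. The endpoint URL, stream name, and table definition here are all hypothetical placeholders, not part of any real deployment.

```python
# Hypothetical endpoint -- adjust for your ksqlDB cluster.
KSQLDB_URL = "http://localhost:8088/ksql"


def build_ksql_request(statement: str) -> dict:
    """Build the JSON body expected by ksqlDB's /ksql endpoint."""
    return {"ksql": statement, "streamsProperties": {}}


# A materialized table of per-minute error counts over a raw events stream.
# Stream and column names are made up for illustration.
CREATE_TABLE = """
CREATE TABLE errors_per_minute AS
  SELECT service,
         COUNT(*) AS error_count
  FROM events_stream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  WHERE level = 'ERROR'
  GROUP BY service
  EMIT CHANGES;
"""

payload = build_ksql_request(CREATE_TABLE)

# To actually submit it (requires a running ksqlDB server):
#   import requests
#   requests.post(KSQLDB_URL, json=payload)
```

Once the sink connector delivers `errors_per_minute` into, say, ClickHouse or Postgres, Superset sees it as an ordinary table that happens to update continuously.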
Many teams are also adopting Apache Flink or managed Flink services for advanced stream processing; the integration model remains the same: Flink SQL → analytic store → Superset dashboards.
Sink into an analytic datastore + visualize
Many teams prefer to land real-time events into a fast analytic store—whether for time-series workloads, OLAP, or near-real-time dashboards. Popular choices include ClickHouse, Apache Druid, and Pinot, all of which support Kafka ingestion natively or via Kafka Connect.
Once events land in these systems, Superset connects using standard SQL and can visualize streaming metrics alongside historical or reference data. Tools like SQL Flow can orchestrate these pipelines.
Superset supports a wide range of SQL backends commonly used in streaming architectures, including ClickHouse, Druid, Pinot, Trino, BigQuery, DuckDB, PostgreSQL, MySQL, and many more.
Check out the more comprehensive list in the Superset database documentation.
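To make the connection step concrete: Superset reaches each of these backends through a SQLAlchemy URI. The URIs below are illustrative sketches — hostnames, ports, and credentials are placeholders, and the exact dialect prefix depends on which driver package you install alongside Superset (see the database documentation linked above).

```python
# Example SQLAlchemy URIs for streaming-friendly backends.
# All hosts, ports, and credentials here are placeholders.
EXAMPLE_URIS = {
    "clickhouse": "clickhousedb://{user}:{password}@clickhouse-host:8123/default",
    "druid": "druid://druid-broker:8082/druid/v2/sql/",
    "pinot": "pinot://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/",
    "postgres": "postgresql+psycopg2://{user}:{password}@pg-host:5432/analytics",
}


def connection_uri(backend: str, user: str = "", password: str = "") -> str:
    """Fill credentials into one of the example URIs above."""
    return EXAMPLE_URIS[backend].format(user=user, password=password)
```

You paste the resulting URI into Superset's "Connect a database" dialog (or supply it via the API), and every table the backend exposes becomes available for charting.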
Hybrid batch + real-time analytics in one dashboard
Because Superset speaks SQL to virtually any backend, it’s straightforward to combine batch-loaded data (e.g., nightly ETLs or lakehouse tables) with continuously updated streaming tables. This gives users a unified view of trends, operations, and live activity without switching BI tools.
We’ve even been exploring lightweight streaming helpers like DuckStreams for teams interested in blending event streams with DuckDB-backed analytics—reach out if this is relevant to your use case.
Observability, operations, and real-time business metrics
Common use cases include operational monitoring (latency, throughput, error rates), user behavior analytics (clickstreams, events), fraud detection, IoT/sensor metrics, and other workloads that rely on rapid event ingestion. Superset’s auto-refresh dashboards, cross-filtering, time-series charts, and alerting/reporting (via Preset) make it a natural front end for these real-time systems.
What this means for existing and new Confluent / Kafka users
For existing Confluent / Kafka users: If you’ve already built streaming pipelines — now might be a great time to try Superset (or revisit it) and build dashboards on top of your streams. You don’t need to reshape your architecture: Superset plays nicely whether you consume Kafka via ksqlDB or through a downstream analytic store.
For new users, just starting with Confluent on IBM’s stack: As Confluent becomes more prominent under IBM’s umbrella, expect many organizations to begin with streaming-first deployments. Preset + Superset offers a low-friction, open-source-friendly way to add visualization to that strategy — no vendor lock-in, open tooling, and plenty of flexibility.
For the broader IBM community: This deal signals IBM’s commitment to data streaming as a first-class component of enterprise data architecture. If you’re building AI, ML, hybrid-cloud, or real-time applications on IBM platforms (especially around zSystems, legacy modernization, or hybrid cloud), Superset can serve as the visualization layer that bridges traditional data stores, streaming data, and real-time dashboards.
A word from Preset / Superset: We’re here, and excited to learn your use cases
To all current — and future — Confluent or Kafka users within the IBM ecosystem: we at Preset are excited about this new chapter. We believe Superset offers a powerful, open, and flexible visualization layer that complements a streaming-first data architecture, with a bunch of key features:
- Dashboard auto-refresh (per-chart or global)
- Low-latency rendering with async queries
- Time-series visualizations optimized for fast-changing data
- Cross-filters to explore event dimensions
- Alerts & Reports for threshold-based notifications via Slack or email
- Works with any SQL-speaking data store (no vendor lock-in)
If you’re exploring how to plug Kafka or Confluent into your analytics stack, or building real-time dashboards, we’d love to hear from you. Whether you’re already running Superset or just evaluating it, reach out: let’s talk about your use cases, challenges, and how we can help.
And if you’re planning to attend the next IBM TechXchange in Atlanta — look us up! Let’s chat about combining Confluent + Kafka + Superset for real-time insights. Or you can talk to the team today.
Final thoughts
IBM’s acquisition of Confluent marks a strong signal: streaming data is moving from “nice-to-have” to “core infrastructure,” especially in AI + hybrid-cloud contexts. For the Superset community, that means an opportunity: more streaming data, more demand for real-time BI, and more organizations needing a flexible, open-source-friendly way to visualize, explore, and monitor that data. We’re here for it — and we look forward to what you build.