
Apache Superset MCP Service: A Technical Deep Dive
This is the technical reference for Apache Superset's MCP service. Bookmark it. Come back when you're integrating, deploying, or extending.
If you want the narrative first — what MCP is, why we built it, and what it unlocks for analytics — start with Apache Superset MCP Service: A Library-First Architecture with FastMCP.
Four design decisions shape everything that follows:
Library-first architecture — The MCP service imports Superset as a Python library rather than proxying its web API. Independent scaling, clean async support, and unit tests without a full web context.
Preview-first workflow — generate_chart and update_chart default to non-destructive previews. LLM agents explore and iterate freely without polluting the database. Persistence is always explicit.
Zero privilege escalation — The MCP service cannot grant permissions that don't exist in Superset. Every tool call flows through the same DAOs and Security Manager as the web UI — RBAC, RLS, and column restrictions apply identically.
Token budget management — Responses are bounded at 25K tokens by default. The middleware warns at 80% capacity and blocks at the limit, keeping LLM context windows from being overwhelmed.
Everything else in this post is detail. If you have any questions, please reach out.
Framework: FastMCP 3.1+ · Transport: Streamable HTTP · Protocol: JSON-RPC 2.0 · Tools: 20 · Auth: JWT + RBAC · Python: 3.10+
How MCP Connects AI to Superset
The Model Context Protocol is an open standard released by Anthropic that defines how AI applications discover and invoke external tools. Think of it as a USB-C port for AI — a universal interface that lets any compliant LLM client talk to any compliant tool server.
The Superset MCP service exposes the full analytics capability — dashboards, charts, datasets, SQL execution — as 20 discrete tools that LLM agents can discover, understand, and invoke through structured JSON-RPC calls.
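Concretely, each invocation is a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The sketch below builds one for `list_charts`; the argument values are illustrative:

```python
import json

# A tools/call request per the MCP specification, invoking the
# list_charts tool from the catalog below (argument values illustrative).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_charts",
        "arguments": {"page": 1, "page_size": 10, "search": "revenue"},
    },
}

payload = json.dumps(request)
print(payload)
```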
System Architecture
The Library-First Decision
The most important architectural decision: Superset runs as an imported library, not as a Flask web app. The MCP service creates its own minimal Flask context and imports Superset's DAOs, models, and commands directly.
Original Approach
Embed MCP inside Flask via ASGI middleware.
- Tight coupling to request lifecycles
- Framework conflicts between sync and async
- Scaling required scaling the whole web app
- Testing required full web context
Current Approach
Standalone FastMCP service that imports Superset as a library.
- Independent process and scaling
- Clean async support via FastMCP
- Unit testing without web framework
- Separate deployment and monitoring
Full Architecture Diagram
Request Lifecycle
Every tool invocation follows this precise path through the system:
Complete Tool Catalog
| Tool | Domain | Purpose | Key Parameters |
|---|---|---|---|
| `list_charts` | Chart | List charts with filtering, search, pagination | page, page_size, filters, search, select_columns |
| `get_chart_info` | Chart | Get chart details by ID, UUID, or slug | identifier, form_data_key |
| `get_chart_data` | Chart | Retrieve chart query results as text | identifier, format, limit, use_cache |
| `get_chart_preview` | Chart | Generate visual PNG preview of chart | identifier, format, width, height |
| `generate_chart` | Chart | Create chart with optional save | dataset_id, config, save_chart, chart_name, generate_preview |
| `update_chart` | Chart | Modify existing chart configuration | identifier, config, chart_name, generate_preview |
| `update_chart_preview` | Chart | Preview changes without saving | form_data_key, dataset_id, config, generate_preview |
| `list_dashboards` | Dashboard | List dashboards with filtering and search | page, page_size, filters, search |
| `get_dashboard_info` | Dashboard | Get dashboard details and chart positions | identifier |
| `generate_dashboard` | Dashboard | Create new dashboard from chart IDs | dashboard_title, chart_ids |
| `add_chart_to_existing_dashboard` | Dashboard | Add a chart to an existing dashboard | dashboard_id, chart_id |
| `list_datasets` | Dataset | List datasets with metadata control | page, page_size, filters, search |
| `get_dataset_info` | Dataset | Get dataset schema, columns, metrics | identifier |
| `execute_sql` | SQL Lab | Execute SQL with RLS enforcement | database_id, sql, schema_name, limit |
| `open_sql_lab_with_context` | SQL Lab | Generate SQL Lab URL with pre-filled query | database_connection_id, sql, schema_name, title |
| `save_sql_query` | SQL Lab | Save a SQL query to Saved Queries | database_id, label, sql, description |
| `generate_explore_link` | Explore | Generate interactive Explore URL | dataset_id, config |
| `get_instance_info` | System | Instance metadata, user info, entity counts | none |
| `get_schema` | System | Discover filterable and sortable columns | model_type |
| `health_check` | System | Connectivity and version check | none |
Chart Tools — 7 tools
7 tools covering the complete chart lifecycle: discovery, retrieval, creation, modification, and preview.
The generate_chart tool implements a 4-layer validation pipeline:
Schema → Dataset → Runtime → Column Normalization
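The four layers chain into a single pass, each raising on failure before the next runs. The sketch below illustrates the shape of such a pipeline; the function bodies are hypothetical stand-ins, not Superset's actual validation code:

```python
# Illustrative 4-layer pipeline: Schema -> Dataset -> Runtime -> Column
# Normalization. Validator bodies are hypothetical stand-ins.
class ChartValidationError(ValueError):
    pass

def validate_schema(config: dict) -> dict:
    if "viz_type" not in config:
        raise ChartValidationError("config must include a viz_type")
    return config

def validate_dataset(config: dict, dataset_columns: set[str]) -> dict:
    # Compare case-insensitively; normalization rewrites the config later.
    missing = {c.strip().lower() for c in config.get("columns", [])} - dataset_columns
    if missing:
        raise ChartValidationError(f"unknown columns: {sorted(missing)}")
    return config

def validate_runtime(config: dict) -> dict:
    # Stand-in for compiling the query with a tiny row limit before saving.
    return config

def normalize_columns(config: dict) -> dict:
    config["columns"] = [c.strip().lower() for c in config.get("columns", [])]
    return config

def run_pipeline(config: dict, dataset_columns: set[str]) -> dict:
    config = validate_schema(config)
    config = validate_dataset(config, dataset_columns)
    config = validate_runtime(config)
    return normalize_columns(config)
```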
Key patterns:
- `get_chart_info` supports two identifier types: integer ID and UUID string
- `get_chart_data` returns CSV-like text optimized for LLM consumption, not raw JSON
- `get_chart_preview` uses a WebDriver pool for PNG screenshot generation
- `generate_chart` runs a compilation check with `row_limit=2` before saving
- `update_chart_preview` is non-destructive — original chart remains unchanged
Dashboard Tools — 4 tools
4 tools for dashboard discovery, inspection, creation, and modification.
- `get_dashboard_info` returns chart positions in the grid layout alongside chart metadata
- `generate_dashboard` auto-generates layout with proper grid positioning
- `add_chart_to_existing_dashboard` auto-calculates the new chart position in the grid
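To make the auto-placement idea concrete, here is a hypothetical sketch of how a new chart could be appended to a dashboard grid. The field names (`x`, `y`, `w`, `h`) and the 12-column grid width are illustrative assumptions, not Superset's actual layout schema:

```python
# Hypothetical sketch of auto-placing a new chart: scan existing chart
# positions and append beside or below the last occupied row. Assumes a
# 12-column grid; the position dict shape is illustrative only.
def next_position(existing, width=4, height=4, grid_columns=12):
    if not existing:
        return {"x": 0, "y": 0, "w": width, "h": height}
    bottom = max(p["y"] + p["h"] for p in existing)
    last_row = [p for p in existing if p["y"] + p["h"] == bottom]
    right_edge = max(p["x"] + p["w"] for p in last_row)
    if right_edge + width <= grid_columns:
        # Room left on the last row: place beside the rightmost chart.
        return {"x": right_edge, "y": bottom - height, "w": width, "h": height}
    # Otherwise start a new row below everything.
    return {"x": 0, "y": bottom, "w": width, "h": height}
```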
Dataset Tools — 2 tools
2 tools for dataset discovery and schema inspection.
- `list_datasets` includes cache control flags: `use_cache`, `refresh_metadata`
- `get_dataset_info` returns full column metadata including types, descriptions, and available metrics
SQL Lab and Explore Tools — 4 tools
4 tools bridging MCP and Superset's query engine.
- `execute_sql` enforces Row-Level Security automatically — the same RLS rules apply as in the web UI
- `execute_sql` supports Jinja2 template parameters for dynamic queries
- `open_sql_lab_with_context` generates URLs with pre-filled query state
- `save_sql_query` persists a query to Saved Queries with an optional label and description
- `generate_explore_link` creates interactive Explore URLs for visual chart building
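The template step can be pictured as a substitution pass over the SQL before execution. Superset uses Jinja2 for this; the stand-in regex substitution below only illustrates how parameters flow into the query text:

```python
import re

# Stand-in for Jinja2 rendering: replace {{ name }} placeholders with
# parameter values before the SQL is sent to the database.
def render_template(sql: str, params: dict) -> str:
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(params[m.group(1)]),
        sql,
    )

sql = "SELECT region, SUM(sales) FROM orders WHERE ds = '{{ ds }}' GROUP BY region"
rendered = render_template(sql, {"ds": "2024-01-01"})
print(rendered)
```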
Core Framework Patterns
The MCP service is built on four reusable core classes that standardize tool behavior and eliminate boilerplate.
ModelListCore — The Workhorse
The most-used core class, powering all list_* tools. A single instantiation handles:
- Pagination: 0-based internally, 1-based in requests
- Filtering: JSON filter objects or structured filter types
- Full-text search: Across configured columns
- Column selection: Return only requested columns
- Sorting: Any sortable column, asc or desc
- RBAC enforcement: Automatic via the DAO layer
# How list_dashboards uses ModelListCore
tool = ModelListCore(
    dao_class=DashboardDAO,
    output_schema=DashboardInfo,
    filter_type=DashboardFilter,
    list_field_name="result",
    search_columns=["dashboard_title", "slug", "uuid"],
    default_columns=["id", "dashboard_title", "slug", "uuid"],
    sortable_columns=["id", "dashboard_title", "slug", "changed_on"],
    item_serializer=serialize_dashboard_object,
    output_list_schema=DashboardList,
)
return tool.run_tool(page=request.page - 1, page_size=request.page_size)

App Factory Pattern
The MCP service uses a factory function to create FastMCP instances:
def create_mcp_app(
    name: str | None = None,
    instructions: str | None = None,
    auth: Any | None = None,
    middleware: list[Middleware] | None = None,
    config: dict | None = None,
    **kwargs,
) -> FastMCP:
    """Create FastMCP instance with custom auth, middleware, etc."""
    return FastMCP(**build_kwargs)

# Module-level singleton
mcp = create_mcp_app()

The factory enables:
- Custom auth providers for different environments
- Middleware composition for rate limiting, size guards, logging
- Configuration inheritance from Superset's `app.config`
Flask Singleton — Smart Initialization
The MCP service needs a Flask app context for database access but runs as a separate process. The singleton detects at startup whether a Flask app already exists and only creates one when needed.
This enables the MCP service to work correctly whether started:
- Standalone: Creates its own minimal Flask app
- Embedded: Reuses the existing Superset Flask app
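The detection logic can be sketched as a process-wide cache plus a fallback factory. This is an illustrative stand-in, not Superset's actual `flask_singleton.py` code; a real implementation would consult `flask.current_app` and build a genuine minimal Flask app:

```python
# Illustrative singleton sketch: reuse an app if one already exists in
# this process, otherwise build a minimal one. create_minimal_app() is a
# hypothetical stand-in for constructing a minimal Flask app.
_app = None

def create_minimal_app():
    # Stand-in for building a minimal Flask app with Superset's config.
    return object()

def get_flask_app(existing_app=None):
    global _app
    if _app is not None:          # already initialized in this process
        return _app
    if existing_app is not None:  # embedded: reuse Superset's app
        _app = existing_app
    else:                         # standalone: create a minimal app
        _app = create_minimal_app()
    return _app
```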
Authentication and Security
Authentication Flow
Authentication Modes
| Mode | Config | Use Case | Security |
|---|---|---|---|
| Development | `MCP_DEV_USERNAME = "admin"` | Local development, testing | Single user, no token |
| RS256 + JWKS | `MCP_JWKS_URI` · `MCP_JWT_ISSUER` | Production with IdP | Key rotation, standard OAuth2 |
| RS256 + Static Key | `MCP_JWT_PUBLIC_KEY` | Production with fixed key pair | Asymmetric, no shared secret |
| HS256 | `MCP_JWT_SECRET` | Simple deployments | Shared secret required |
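For the production JWKS mode, a configuration might look like the sketch below. The keys come from the configuration reference later in this post; the issuer, audience, and JWKS URL values are placeholders:

```python
# superset_config.py — illustrative RS256 + JWKS setup. The URLs and
# audience value are placeholders, not real endpoints.
MCP_AUTH_ENABLED = True
MCP_JWT_ALGORITHM = "RS256"
MCP_JWKS_URI = "https://idp.example.com/.well-known/jwks.json"
MCP_JWT_ISSUER = "https://idp.example.com/"
MCP_JWT_AUDIENCE = "superset-mcp"
```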
Middleware Architecture
The middleware pipeline processes every request in order. Each layer can inspect, modify, or reject requests before they reach the tool function. There are 7 layers in total.
| Middleware | Purpose | Key Behavior | Default |
|---|---|---|---|
| RateLimitMiddleware | Prevent abuse | Sliding window algorithm, Redis or in-memory | Disabled |
| ResponseSizeGuardMiddleware | Protect LLM context | Estimates token count, warns at 80%, blocks at 25K | Enabled |
| FieldPermissionsMiddleware | Column-level security | Strips fields the user cannot access | Enabled |
| LoggingMiddleware | Audit trail | Logs tool calls with user, params, duration | Enabled |
| GlobalErrorHandlerMiddleware | Consistent errors | Sanitizes sensitive data from error messages | Enabled |
| PrivateToolMiddleware | Hide internal tools | Blocks tools tagged as private | Enabled |
| StructuredContentStripperMiddleware | FastMCP 3.x compat | Strips structuredContent from responses for Claude.ai bridge compatibility | Enabled |
Error Sanitization
The GlobalErrorHandlerMiddleware automatically redacts sensitive information before it reaches the LLM client:
- Database connection strings — replaced with generic messages
- API keys and tokens — redacted from error traces
- File system paths — stripped to prevent information disclosure
- IP addresses — removed from error context
All regex patterns are bounded to prevent ReDoS attacks.
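A redaction pass of this kind can be sketched as an ordered list of compiled patterns. The patterns below are simplified examples, not the middleware's actual rules; note the explicit `{1,N}` quantifier bounds, which is what keeps matching time predictable on adversarial inputs:

```python
import re

# Illustrative redaction rules (simplified examples, not the real ones).
# Every quantifier is bounded to avoid ReDoS.
REDACTIONS = [
    (re.compile(r"postgresql://\S{1,200}"), "[redacted connection string]"),
    (re.compile(r"(?i)(api[_-]?key|token)[=:]\s?[\w\-]{1,100}"), r"\1=[redacted]"),
    (re.compile(r"/[\w./\-]{1,200}\.py"), "[redacted path]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[redacted ip]"),
]

def sanitize(message: str) -> str:
    for pattern, replacement in REDACTIONS:
        message = pattern.sub(replacement, message)
    return message
```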
Token Budget Management
The ResponseSizeGuardMiddleware prevents overwhelming LLM context windows:
# Configuration in superset_config.py
MCP_RESPONSE_SIZE_CONFIG = {
    "enabled": True,
    "token_limit": 25000,
    "warn_threshold_pct": 80,
    "excluded_tools": [
        "health_check",
        "get_chart_preview",
        "generate_explore_link",
        "open_sql_lab_with_context",
    ],
}

When a response exceeds the limit, the middleware returns a structured error suggesting the client use pagination or column selection to reduce the response size.
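The guard's decision logic reduces to a three-way check. The ~4-characters-per-token heuristic below is a common rough estimate, not necessarily the middleware's exact counting method:

```python
# Sketch of the size guard's decision logic. Token counting here uses a
# rough 4-chars-per-token heuristic (an assumption, not the real method).
TOKEN_LIMIT = 25_000
WARN_PCT = 80

def check_response(text: str) -> str:
    tokens = len(text) // 4
    if tokens >= TOKEN_LIMIT:
        return "block"   # structured error: paginate or select fewer columns
    if tokens >= TOKEN_LIMIT * WARN_PCT // 100:
        return "warn"    # response passes through with a size warning
    return "ok"
```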
Preview-First Workflow
A critical design pattern that respects the iterative, conversational nature of AI interactions.
A traditional API expects precise requirements upfront, persists every call to the database, and assumes minimal iteration. LLM conversations are exploratory — preview first, refine, then save, with no database clutter from exploration.
How it works in practice:
# Step 1: Explore (no database write)
response = generate_chart(dataset_id=42, config={...}, save_chart=False)
# Returns: explore URL + form_data_key for iteration

# Step 2: Iterate (still no database write)
response = update_chart_preview(form_data_key, config={...})
# See changes instantly without committing

# Step 3: Save (explicit persistence)
response = generate_chart(dataset_id=42, config={...}, save_chart=True)
# NOW the chart is persisted to the database

Configuration Reference
| Parameter | Default | Description | Required |
|---|---|---|---|
| `MCP_SERVICE_HOST` | localhost | Host to bind the MCP server | No |
| `MCP_SERVICE_PORT` | 5008 | Port for the MCP server | No |
| `MCP_DEV_USERNAME` | none | Username for development mode auth | Yes in dev |
| `MCP_DEBUG` | False | Enable debug logging | No |
| `MCP_AUTH_ENABLED` | False | Enable JWT authentication | No |
| `MCP_JWT_ALGORITHM` | RS256 | JWT signing algorithm | No |
| `MCP_JWKS_URI` | none | JWKS endpoint for key rotation | If RS256 |
| `MCP_JWT_PUBLIC_KEY` | none | Static RSA public key | If RS256 static |
| `MCP_JWT_SECRET` | none | HMAC shared secret | If HS256 |
| `MCP_JWT_ISSUER` | none | Expected JWT issuer claim | Production |
| `MCP_JWT_AUDIENCE` | none | Expected JWT audience claim | Production |
| `MCP_PARSE_REQUEST_ENABLED` | True | Enable flexible JSON/object input parsing | No |
| `MCP_RESPONSE_SIZE_CONFIG` | 25K tokens | Token limit for response size guard | No |
| `MCP_STORE_CONFIG` | disabled | Redis config for multi-pod sessions | Multi-pod |
| `MCP_CACHE_CONFIG` | disabled | Response caching with TTL | No |
Deployment Guide
Standalone Process
The simplest deployment — run alongside the Superset web server:
# Terminal 1: Superset web server
superset run -p 9001 --with-threads --reload
# Terminal 2: MCP service
superset mcp run --port 5008 --debug

Docker Compose
Add an MCP service to your Docker Compose setup:
# Add to your docker-compose.yml
mcp:
  image: apache/superset:latest
  command: ["superset", "mcp", "run", "--host", "0.0.0.0", "--port", "5008"]
  ports:
    - "5008:5008"
  environment:
    - PYTHONPATH=/app
    - SUPERSET_CONFIG_PATH=/app/superset_config.py

Multi-Pod with Redis
For high-availability deployments, Redis provides shared session state across pods:
# superset_config.py
MCP_STORE_CONFIG = {
    "enabled": True,
    "CACHE_REDIS_URL": "redis://redis:6379/0",
    "event_store_prefix": "mcp_events_",
    "event_store_max_events": 100,
    "event_store_ttl": 3600,
}

Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset-mcp
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: mcp-service
          image: apache/superset:latest
          command: ["superset", "mcp", "run", "--host", "0.0.0.0", "--port", "5008"]
          ports:
            - containerPort: 5008
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 5008
            initialDelaySeconds: 30

Extension System
Extensions can register custom MCP tools using the plugin decorator system — new tools automatically inherit the authentication, middleware, and security model.
from superset_core.mcp.decorators import tool
@tool(name="my_extension.custom_analytics")
def custom_analytics(dataset_id: int, metric: str) -> dict:
    """Run custom analytics on a dataset.

    Computes specialized metrics not available
    in standard Superset charts.
    """
    from superset.daos.dataset import DatasetDAO

    dataset = DatasetDAO.find_by_id(dataset_id)
    # Custom logic here...
    return {"result": computed_value}

Checklist for adding a new tool:
- Create tool file in `superset/mcp_service/{module}/tool/{name}.py`
- Add Apache license header
- Decorate with `@tool` from `superset_core.mcp.decorators`
- Create Pydantic request/response schemas
- Use `@parse_request` for flexible input handling
- Add dataset validation for chart-related tools
- Import the tool in `app.py` — tools won't register otherwise
- Write unit tests in `tests/unit_tests/mcp_service/`
Codebase Map
superset/mcp_service/
├── app.py # FastMCP factory + tool registration
├── server.py # Server runner (uvicorn)
├── auth.py # JWT auth + user resolution
├── mcp_core.py # Generic core classes
├── middleware.py # 7 middleware layers
├── flask_singleton.py # Smart Flask app initialization
├── mcp_config.py # Default configuration
├── jwt_verifier.py # JWT token validation
├── caching.py # Response caching
├── storage.py # Redis store management
│
├── chart/
│ ├── schemas.py # Pydantic models
│ ├── chart_utils.py # Validation helpers
│ ├── preview_utils.py # Screenshot generation
│ ├── validation/ # 4-layer validation pipeline
│ └── tool/ # 7 chart tools
│
├── dashboard/
│ ├── schemas.py
│ └── tool/ # 4 dashboard tools
│
├── dataset/
│ ├── schemas.py
│ └── tool/ # 2 dataset tools
│
├── sql_lab/
│ ├── schemas.py
│ └── tool/ # 3 SQL Lab tools
│
├── explore/
│ └── tool/ # 1 explore tool
│
├── system/
│ ├── schemas.py
│ └── tool/ # 3 system tools
│
├── common/ # Shared error schemas
└── utils/              # Schema parsing, permissions, caching

Performance and Observability
Connection Pooling
Each MCP service process maintains a SQLAlchemy connection pool. For multi-worker deployments, peak database connections = workers × (pool size + max overflow).
# superset_config.py
SQLALCHEMY_POOL_SIZE = 5
SQLALCHEMY_MAX_OVERFLOW = 10
SQLALCHEMY_POOL_TIMEOUT = 30
SQLALCHEMY_POOL_RECYCLE = 3600

Response Caching
Optional response caching reduces database load for repeated queries:
- In-memory: Default backend, single-process only
- Redis: For multi-pod deployments
- TTL: 300 seconds default, configurable per operation
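A caching configuration might look like the sketch below. The exact keys accepted by `MCP_CACHE_CONFIG` are an assumption here, modeled on the `MCP_STORE_CONFIG` shape shown in the deployment section:

```python
# superset_config.py — illustrative shape for response caching; the key
# names are assumptions, not a verified schema.
MCP_CACHE_CONFIG = {
    "enabled": True,
    "CACHE_REDIS_URL": "redis://redis:6379/1",
    "default_ttl": 300,  # seconds
}
```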
Tool Search Optimization
For context-constrained LLM clients, tools can be loaded in deferred categories:
- `core` — Always loaded: list_charts, list_dashboards, list_datasets, get_instance_info, health_check
- `discovery` — Load on demand: get info, get schema
- `data` — Load on demand: previews, chart data
- `mutate` — Load on demand: generate, update, execute
- `explore` — Load on demand: URL generators
This provides up to 85% token savings in the initial tool discovery phase, reducing from 15–20K tokens down to 4–5K.
For the full narrative on architectural decisions, see the companion article: Apache Superset MCP Service: A Library-First Architecture with FastMCP
Running Preset? See Preset MCP: From Open Source to Enterprise — how Preset extends this architecture with multi-tenant isolation, OAuth 2.0, and built-in Chatbot.
Source code: github.com/apache/superset · MCP Protocol: modelcontextprotocol.io · FastMCP: gofastmcp.com