Apache Iceberg Support in OpenMetadata
Apache Iceberg is an open table format that makes it easier to store and query large amounts of data in a data lake. If you’re running analytics at any meaningful scale, Iceberg is likely somewhere in your stack, whether you placed it there intentionally or inherited it through a platform like Snowflake or Databricks. OpenMetadata’s approach to Iceberg is worth explaining, because the design decision isn’t intuitive at first glance: there’s no standalone Iceberg connector. Instead, Iceberg tables are surfaced through the native query engine connectors you’re already using. A companion video walks through this design and includes a live demo.
Background: Early Iceberg Catalog Support
In the early days of Iceberg adoption, teams needed special tools just to read Iceberg tables. Here’s how the story evolved:
- Early access via query engines: Tools like Trino were among the first to let teams query Iceberg data without working directly with the raw files.
- Cloud catalogs joined in: AWS Glue became a popular way to manage Iceberg tables for teams on AWS.
- A standard emerged: The Iceberg REST Catalog API became the common way for engines to connect to Iceberg, and platforms like Athena, Dremio, Snowflake, and BigQuery all added their own support.
The Design Shift: Use the Compute Engine, Not the Format
The shift came from a simple observation: nobody works directly with raw Iceberg data. You’re always going through a query engine (Snowflake, Trino, Databricks, Athena, ClickHouse, Doris, StarRocks, or another platform) that handles all the complexity of reading Iceberg tables for you. Those engines already know how to:
- Connect to your Iceberg catalog
- Read table and column details
- Filter and scan data efficiently
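OpenMetadata therefore ingests Iceberg tables through the engine’s own connector and, where supported, marks them with the Iceberg type in the catalog.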
| Connector | Iceberg Tables Ingested | Auto-classified as Iceberg |
|---|---|---|
| Snowflake | ✅ | ✅ |
| Trino | ✅ | ✅ |
| BigQuery | ✅ | ✅ |
| Athena | ✅ | ✅ |
| Glue | ✅ | ✅ |
| Presto | ✅ | ✅ |
| StarRocks | ✅ | ✅ |
| Doris | ✅ | ✅ |
| Databricks | ✅ | ✅ |
| ClickHouse | ✅ | Not yet supported |
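The support matrix above can also be written as a small lookup, which may be handy when scripting checks across several services. This is an illustrative sketch only: `ICEBERG_SUPPORT`, `auto_classifies_iceberg`, and `connectors_without_auto_classification` are hypothetical names, not part of OpenMetadata, and the values simply mirror the table.

```python
from __future__ import annotations

# Hypothetical lookup mirroring the support table above; not an OpenMetadata API.
# Each connector maps to (iceberg_tables_ingested, auto_classified_as_iceberg).
ICEBERG_SUPPORT: dict[str, tuple[bool, bool]] = {
    "Snowflake": (True, True),
    "Trino": (True, True),
    "BigQuery": (True, True),
    "Athena": (True, True),
    "Glue": (True, True),
    "Presto": (True, True),
    "StarRocks": (True, True),
    "Doris": (True, True),
    "Databricks": (True, True),
    "ClickHouse": (True, False),  # ingested, but not yet auto-classified
}


def auto_classifies_iceberg(connector: str) -> bool:
    """Return True if the connector ingests Iceberg tables and tags them as Iceberg."""
    try:
        ingested, classified = ICEBERG_SUPPORT[connector]
    except KeyError:
        raise ValueError(f"Unknown connector: {connector!r}") from None
    return ingested and classified


def connectors_without_auto_classification() -> list[str]:
    """List connectors whose Iceberg tables are ingested without the Iceberg type."""
    return sorted(
        name for name, (_, classified) in ICEBERG_SUPPORT.items() if not classified
    )
```

For a ClickHouse service, a check like this is a reminder that Iceberg tables will still show up after ingestion but will not carry the Iceberg type automatically.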
What You See After Ingestion
Once your query engine connector is configured and ingestion has run, Iceberg tables appear on the OpenMetadata Explore page alongside all other data assets for that connection. Tables are tagged with their table type, so it’s easy to spot which ones are Iceberg-backed. From there, every standard OpenMetadata feature is available:
- Data observability and freshness monitoring
- Data profiling delegated to the query engine
- Data quality tests running against live Iceberg data
- Lineage tracking across schemas and downstream assets
- Usage statistics from query logs
- Schema-level ER diagrams with manually defined relationships
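Because ingested tables carry a table type, picking out the Iceberg-backed ones from a list of table records comes down to a simple filter. The sketch below assumes each record is a plain dictionary with `fullyQualifiedName` and `tableType` fields, and that Iceberg tables carry the value `"Iceberg"`. The sample records and the `iceberg_tables` helper are made up for illustration, so verify the field names and values against your own OpenMetadata instance.

```python
from __future__ import annotations

from typing import Iterable

# Made-up sample records; real table metadata would come from your OpenMetadata instance.
SAMPLE_TABLES = [
    {"fullyQualifiedName": "trino_prod.lake.sales.orders", "tableType": "Iceberg"},
    {"fullyQualifiedName": "trino_prod.lake.sales.customers", "tableType": "Regular"},
    {"fullyQualifiedName": "trino_prod.lake.sales.daily_summary", "tableType": "View"},
    {"fullyQualifiedName": "trino_prod.lake.events.clicks", "tableType": "Iceberg"},
]


def iceberg_tables(tables: Iterable[dict]) -> list[str]:
    """Return the fully qualified names of records whose table type is Iceberg."""
    return [
        table["fullyQualifiedName"]
        for table in tables
        # Compare case-insensitively in case the type is reported with different casing.
        if str(table.get("tableType", "")).lower() == "iceberg"
    ]
```

Calling `iceberg_tables(SAMPLE_TABLES)` keeps only the two records tagged as Iceberg and preserves their original order.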
What This Means in Practice
If you have Iceberg tables in your environment and you’re already using one of OpenMetadata’s connectors, you likely have Iceberg support without any additional configuration. Just:
- Connect your query engine using the standard connector setup.
- Run metadata ingestion.
- Find your Iceberg tables in Explore, ready to use.
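The first two steps usually come down to a metadata ingestion workflow definition for the engine you already use. The sketch below builds such a definition as a Python dictionary for a hypothetical Trino service. The key names follow the general shape of OpenMetadata ingestion workflow files, but treat every key, value, host, and the `build_metadata_workflow` helper as assumptions to check against the connector documentation for your version. Nothing in it is specific to Iceberg, which is the point of the design.

```python
from __future__ import annotations

import json


def build_metadata_workflow(
    service_type: str,
    service_name: str,
    connection: dict,
    server_url: str,
    jwt_token: str,
) -> dict:
    """Assemble a metadata ingestion workflow definition (assumed layout)."""
    return {
        "source": {
            "type": service_type.lower(),
            "serviceName": service_name,
            # Engine-specific connection settings; no Iceberg-specific options here.
            "serviceConnection": {"config": {"type": service_type, **connection}},
            "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
        },
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            "openMetadataServerConfig": {
                "hostPort": server_url,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": jwt_token},
            }
        },
    }


# Placeholder values for illustration only.
workflow = build_metadata_workflow(
    service_type="Trino",
    service_name="trino_prod",
    connection={"hostPort": "trino.example.internal:8080", "username": "ingestion_bot"},
    server_url="http://localhost:8585/api",
    jwt_token="<your-ingestion-bot-token>",
)

# Serialize the definition so it can be saved and reviewed before running ingestion.
workflow_json = json.dumps(workflow, indent=2)
```

A definition like this would normally be saved to a file and handed to the ingestion workflow. After it runs, the Iceberg tables exposed by the engine should appear in Explore with the rest of that service’s assets.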
Next Steps
- Connect your query engine to start ingesting Iceberg tables
- Set up data quality tests on your Iceberg-backed tables
- Configure lineage workflows across your data lake
- Try the Product Sandbox with demo data