Metadata Ingestion: Incremental Extraction — Unity Catalog

This page describes how OpenMetadata implements incremental extraction for Unity Catalog, including the required permissions and the queries used to detect changed and deleted tables.

Approach

At the start of incremental processing for each catalog, OpenMetadata runs two queries, each filtered by a watermark derived from the timestamp of the previous successful run:
  • Changed tables: information_schema.tables is filtered by last_altered to find tables modified since the last successful run. Each changed table is then fetched individually via the Databricks SDK.
  • Deleted tables: system.access.audit is queried for deleteTable events scoped to the configured catalog. If the system.access schema is unavailable, delete detection is skipped with a warning and only changed-table detection runs.
A table that appears in both the changed and deleted sets is kept, not deleted. This covers tables that were dropped and recreated within the window, since information_schema only lists currently existing tables.
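
The reconciliation rule above can be illustrated with a minimal Python sketch. The `reconcile` function and the `(schema, table)` tuple representation are hypothetical names chosen for this example; they are not taken from the OpenMetadata source.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of a table: (schema_name, table_name).
TableKey = Tuple[str, str]


def reconcile(
    changed: Iterable[TableKey], deleted: Iterable[TableKey]
) -> Tuple[Set[TableKey], Set[TableKey]]:
    """Return (tables to re-fetch, tables to mark as deleted)."""
    changed_set = set(changed)
    # A table present in both sets was dropped and recreated within the
    # window, so it still exists and must not be treated as deleted.
    deleted_set = set(deleted) - changed_set
    return changed_set, deleted_set


changed, deleted = reconcile(
    changed=[("sales", "orders"), ("sales", "customers")],
    deleted=[("sales", "orders"), ("sales", "legacy_orders")],
)
```

Here `("sales", "orders")` appears in both inputs, so it stays in the changed set and is removed from the deleted set; only `("sales", "legacy_orders")` remains marked for deletion.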

Prerequisites

Before enabling incremental extraction, ensure the ingestion user has the necessary permissions on the Unity Catalog system tables. The following grants are the minimum required for each detection mode.

Changed table detection

The ingestion user must have SELECT on the catalog’s information_schema.tables:
GRANT SELECT ON `<catalog_name>`.information_schema.tables TO `<user>`;

Deleted table detection (optional)

To detect dropped tables, the ingestion user also needs access to system.access.audit:
GRANT USE CATALOG ON CATALOG system TO `<user>`;
GRANT USE SCHEMA ON SCHEMA system.access TO `<user>`;
GRANT SELECT ON system.access.audit TO `<user>`;
If these grants are absent, delete detection is skipped automatically and a warning is logged.
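
As a rough illustration of this fallback, the sketch below wraps the audit query in a guard that logs a warning and returns an empty result on failure. The function names, the logger name, and the choice to catch a broad `Exception` are assumptions made for the example; the exception actually raised depends on the SQL client in use.

```python
import logging
from typing import Callable, Set

logger = logging.getLogger("incremental_extraction_sketch")  # hypothetical name


def detect_deleted_tables(run_audit_query: Callable[[], Set[str]]) -> Set[str]:
    """Run the audit query; fall back to 'no deletions' if it fails."""
    try:
        return run_audit_query()
    except Exception as exc:  # assumption: any failure means the schema is unavailable
        logger.warning("Skipping delete detection: %s", exc)
        return set()


def _denied() -> Set[str]:
    # Stand-in for a query that fails because the grants are missing.
    raise PermissionError("no SELECT on system.access.audit")


fallback_result = detect_deleted_tables(_denied)
```

With the failing stand-in, `fallback_result` is an empty set, so changed-table detection can proceed on its own.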

Queries

The following queries run when OpenMetadata prepares the changed and deleted table maps for incremental processing. OpenMetadata replaces {start_timestamp} with the millisecond epoch watermark derived from the previous successful run timestamp (for example, 1718000000000).
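
The substitution can be sketched in Python as follows. This assumes the previous run timestamp is available as a `datetime` and that naive values are in UTC; the helper names `to_epoch_millis` and `render_changed_tables_query` are hypothetical, and the template mirrors the changed-tables query on this page.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

CHANGED_TABLES_QUERY = """
SELECT
    table_schema,
    table_name
FROM `{catalog}`.information_schema.tables
WHERE last_altered >= timestamp_millis({start_timestamp})
"""


def to_epoch_millis(ts: datetime) -> int:
    """Convert a datetime to whole milliseconds since the Unix epoch."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    # Integer division of timedeltas avoids floating-point rounding.
    return (ts - EPOCH) // timedelta(milliseconds=1)


def render_changed_tables_query(catalog: str, last_success: datetime) -> str:
    """Fill the {catalog} and {start_timestamp} placeholders."""
    return CHANGED_TABLES_QUERY.format(
        catalog=catalog, start_timestamp=to_epoch_millis(last_success)
    )


rendered = render_changed_tables_query(
    "main", datetime(2024, 6, 10, 6, 13, 20, tzinfo=timezone.utc)
)
```

For a previous run at 2024-06-10 06:13:20 UTC, the watermark is 1718000000000, so the rendered filter reads `last_altered >= timestamp_millis(1718000000000)`.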

Changed tables

OpenMetadata filters information_schema.tables by the last_altered column to identify tables that were created or modified within the ingestion window.
SELECT
    table_schema,
    table_name
FROM `{catalog}`.information_schema.tables
WHERE last_altered >= timestamp_millis({start_timestamp})

Deleted tables

OpenMetadata queries the Unity Catalog audit log for deleteTable events to identify tables that were dropped within the ingestion window.
SELECT DISTINCT request_params.full_name_arg AS table_full_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
    AND action_name = 'deleteTable'
    AND event_date >= date(timestamp_millis({start_timestamp}))
    AND event_time >= timestamp_millis({start_timestamp})
    AND substring_index(request_params.full_name_arg, '.', 1) = '{catalog}'
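
The query returns fully qualified names, so a consumer has to split them before comparing against the changed set. The sketch below shows one way to do that, assuming three-part `catalog.schema.table` names with no dots inside identifiers; the function name `parse_deleted` is hypothetical. The catalog check repeats the `substring_index` filter on the client side as a safeguard.

```python
from typing import Iterable, Set, Tuple


def parse_deleted(full_names: Iterable[str], catalog: str) -> Set[Tuple[str, str]]:
    """Turn 'catalog.schema.table' strings into (schema, table) pairs."""
    result: Set[Tuple[str, str]] = set()
    for full_name in full_names:
        parts = full_name.split(".")
        if len(parts) != 3:
            # Assumption: names that are not exactly three parts are skipped.
            continue
        name_catalog, schema, table = parts
        if name_catalog == catalog:
            result.add((schema, table))
    return result


deleted_pairs = parse_deleted(
    ["main.sales.orders", "other.sales.orders", "main.incomplete"], "main"
)
```

Only `main.sales.orders` survives: the second name belongs to a different catalog, and the third does not have three parts.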