connectors

No menu items for this category

Database Filter Patterns

One can configure the metadata ingestion filter for database source using four configuration fields which are Database Filter Pattern, Schema Filter Pattern, Table Filter Pattern & Use FQN For Filtering. In this document we will learn about each field in detail along with many examples.

Filters can be configured in UI while adding an ingestion pipeline through Add Metadata Ingestion page.

Database Filter Pattern Fields

Database Filter Pattern Fields

Filters can be configured in CLI in connection configuration within source.sourceConfig.config field as described below.

This flag set when you want to apply the filter on fully qualified name (e.g service_name.db_name.schema_name.table_name) instead of applying the filter to raw name of entity (e.g table_name). This Flag is useful in scenario when you have schema with same name in different databases or table with same name in different schemas and you want to filter out one of them. This will be explained further in detail in this document.

Database filter patterns to control whether or not to include database as part of metadata ingestion.

  • Include: Explicitly include databases by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all databases with names matching one or more of the supplied regular expressions. All other databases will be excluded.
  • Exclude: Explicitly exclude databases by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all databases with names matching one or more of the supplied regular expressions. All other databases will be included.

Let's say we want to ingest metadata from a snowflake instance which contains multiple databases as described above. In this example we want to ingest all databases which contains SNOWFLAKE in name, then the filter pattern applied would be .*SNOWFLAKE.* in the include field. This will result in ingestion of database SNOWFLAKE, SNOWFLAKE_SAMPLE_DATA and TEST_SNOWFLAKEDB.

Database Filter Pattern Example 1

Database Filter Pattern Example 1

In this example we want to ingest all databases which starts with SNOWFLAKE in name, then the filter pattern applied would be ^SNOWFLAKE.* in the include field. This will result in ingestion of database SNOWFLAKE & SNOWFLAKE_SAMPLE_DATA.

Database Filter Pattern Example 2

Database Filter Pattern Example 2

In this example we want to ingest all databases for which the name starts with SNOWFLAKE OR ends with DB , then the filter pattern applied would be ^SNOWFLAKE & DB$ in the include field. This will result in ingestion of database SNOWFLAKE, SNOWFLAKE_SAMPLE_DATA, TEST_SNOWFLAKEDB & DUMMY_DB.

Database Filter Pattern Example 3

Database Filter Pattern Example 3

In this example we want to ingest only the SNOWFLAKE database then the filter pattern applied would be ^SNOWFLAKE$.

Database Filter Pattern Example 4

Database Filter Pattern Example 4

Schema filter patterns are used to control whether or not to include schemas as part of metadata ingestion.

  • Include: Explicitly include schemas by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all schemas with names matching one or more of the supplied regular expressions. All other schemas will be excluded.
  • Exclude: Explicitly exclude schemas by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all schemas with names matching one or more of the supplied regular expressions. All other schemas will be included.

In this example we want to ingest all schema within any database with name PUBLIC, then the schema filter pattern applied would be ^PUBLIC$ in the include field. This will result in ingestion of schemas SNOWFLAKE.PUBLIC & SNOWFLAKE_SAMPLE_DATA.PUBLIC

Schema Filter Pattern Example 1

Schema Filter Pattern Example 1

In this example we want to ingest all schema within any database except schema with name PUBLIC available in SNOWFLAKE_SAMPLE_DATA. Notice that we have two schemas available with name PUBLIC one is available in database SNOWFLAKE_SAMPLE_DATA.PUBLIC and other is SNOWFLAKE.PUBLIC. As per the constraint of this example all the schemas including SNOWFLAKE.PUBLIC but we need to skip SNOWFLAKE_SAMPLE_DATA.PUBLIC. to do that we will need to set useFqnForFiltering flag to true by doing this the filter pattern will be applied to fully qualified name instead of raw table name. A fully qualified name(FQN) of schema is combination of service name, database name & schema name joined with .. In this example fully qualified name of the SNOWFLAKE_SAMPLE_DATA.PUBLIC schema will be Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC, so we will need to apply a exclude filter pattern ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$ and set useFqnForFiltering to true.

Schema Filter Pattern Example 2

Schema Filter Pattern Example 2

In this example we want to ingest SNOWFLAKE.PUBLIC & all the schemas in SNOWFLAKE_SAMPLE_DATA that starts with TPCH_ i.e SNOWFLAKE_SAMPLE_DATA.TPCH_1, SNOWFLAKE_SAMPLE_DATA.TPCH_10 & SNOWFLAKE_SAMPLE_DATA.TPCH_100. To achieve this an include schema filter will be applied with pattern ^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$ & ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*, we need to set useFqnForFiltering as true as we want to apply filter on FQN.

Schema Filter Pattern Example 3

Schema Filter Pattern Example 3

Table filter patterns are used to control whether or not to include tables as part of metadata ingestion.

  • Include: Explicitly include tables by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded.
  • Exclude: Explicitly exclude tables by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included.

In this example we want to ingest table with name CUSTOMER within any schema and database. In this case we need to apply include table filter pattern ^CUSTOMER$. This will result in ingestion of tables Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER, Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER & Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER

Table Filter Pattern Example 1

Table Filter Pattern Example 1

In this example we want to ingest table with name CUSTOMER within PUBLIC schema of any database. In this case we need to apply include table filter pattern .*\.PUBLIC\.CUSTOMER$ this will also require to set the useFqnForFiltering flag as true as we want to apply filter on FQN. This will result in ingestion of tables Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER & Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER

Table Filter Pattern Example 2

Table Filter Pattern Example 2