
Great Expectations

For data quality tests, the open-source Python package Great Expectations stands out from the crowd. For those of you who don't know, Great Expectations is a shared, open standard for data quality: it helps data teams eliminate pipeline debt through data testing, documentation, and profiling. Learn more about the product in their documentation. In this tutorial, we show you how to configure Great Expectations to integrate with OpenMetadata and ingest your test results into your table service page.

You will need to have OpenMetadata version 0.12 or later.

To deploy OpenMetadata, follow the procedure to Try OpenMetadata in Docker.

Before ingesting your test results from Great Expectations, you will need to have your table metadata ingested into OpenMetadata. Follow the instructions in the Connectors section to learn more.

We support Python versions 3.8-3.11.

You will need to install our Great Expectations submodule.
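For example, assuming you are installing from PyPI (the extra name shown is an assumption; check the install instructions for your OpenMetadata version):

```shell
# Install the OpenMetadata ingestion package with the
# Great Expectations extra (extra name is an assumption).
pip install "openmetadata-ingestion[great-expectations]"
```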

The Great Expectations integration leverages custom actions. To execute custom actions, you will need to run a checkpoint file.

In your checkpoint YAML file, you will need to add an entry for the OpenMetadata custom action to the action_list section.

Properties:

  • module_name: the name of the OpenMetadata submodule that implements the custom action
  • class_name: the name of the class that will be used to execute the custom action
  • config_file_path: the path to the config.yaml file that holds the configuration of your OpenMetadata server
  • database_service_name: [Optional] the name of the OpenMetadata database service. If not specified and two tables share the same name across two different OpenMetadata services, the custom action will fail
  • database_name: [Optional] only required for RuntimeDataBatchSpec execution (e.g. running GX against a dataframe)
  • schema_name: [Optional] only required for RuntimeDataBatchSpec execution (e.g. running GX against a dataframe)
  • table_name: [Optional] only required for RuntimeDataBatchSpec execution (e.g. running GX against a dataframe)
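As a sketch, an action_list entry wiring these properties together might look like the following (the module and class names shown are assumptions; check the OpenMetadata documentation for the exact values to use with your version):

```yaml
action_list:
  # ...your existing actions (e.g. store_validation_result)...
  - name: open_metadata_validation
    action:
      module_name: metadata.great_expectations.action
      class_name: OpenMetadataValidationAction
      # Directory containing your config.yaml (see below)
      config_file_path: /path/to/ometa/config/
      # Optional: disambiguates tables with the same name
      # across different OpenMetadata services
      database_service_name: my_database_service
```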

Note

If you are using a Great Expectations DataContext instance in Python to run your tests, you can use the run_checkpoint method as follows:

To ingest Great Expectations results into OpenMetadata, you will need to specify your OpenMetadata security configuration for the REST endpoint. This configuration file needs to be located inside the config_file_path referenced in step 2 and named config.yaml.

You can use environment variables in your configuration file by simply using {{ env('<MY_ENV_VAR>') }}. These will be parsed and rendered at runtime, allowing you to create your configuration securely and commit it to your favorite version control tool. As we support multiple security configurations, check out the Enable Security section for details on how to set the securityConfig part of the YAML file.
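As an illustration, a config.yaml using a JWT token pulled from an environment variable might look like this (the exact keys under securityConfig depend on your security setup; the host, auth provider, and variable name shown are assumptions):

```yaml
# config.yaml — OpenMetadata server connection for the custom action
hostPort: http://localhost:8585/api
authProvider: openmetadata
securityConfig:
  # Rendered at runtime from the environment, so the token
  # never needs to be committed to version control
  jwtToken: "{{ env('OPENMETADATA_JWT_TOKEN') }}"
```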

Great Expectations config file

With everything set up, it is now time to run your checkpoint file.

Run Great Expectations checkpoint

We currently support only a subset of Great Expectations tests. The full list can be found in the Tests section.

If a test is not supported, there is no need to worry about the execution of your Great Expectations suite: we will simply skip the unsupported tests and continue executing the rest of your test suite.