Airflow Lineage Operator

Another approach to extract Airflow metadata only for the DAGs you want is to use the OpenMetadataLineageOperator.

When the task executes, it will ingest:

  • The Pipeline Service, if it does not exist.
  • The DAG as a Pipeline, if it does not exist.
  • The status of the tasks. We recommend running this Operator as the last task of the DAG if you want up-to-date statuses.
  • The lineage from inlets and outlets.

The Lineage Operator is part of the usual OpenMetadata Python distribution, so it can be installed directly on the Airflow instance with pip by installing the openmetadata-ingestion package pinned to version x.y.z.

Here x.y.z is the version of your OpenMetadata server, e.g., 0.13.1. The server and client versions must match.

The operator requires version 0.13.1 or higher.

An example DAG looks as follows:
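A minimal sketch of such a DAG, assuming Airflow 2.x and openmetadata-ingestion 0.13.1 or higher are installed. The import paths and operator arguments are written as recalled from the 0.13.x release and should be checked against your installed version. The host, token, service name, and table names are placeholders, and the factory function build_lineage_dag is a hypothetical helper introduced only for this illustration.

```python
from datetime import datetime, timedelta

# Placeholder values (assumptions): replace with your own server, token and tables.
OPENMETADATA_HOST_PORT = "http://localhost:8585/api"
JWT_TOKEN = "<token>"
SOURCE_TABLE = "sample_data.ecommerce_db.shopify.dim_customer"
TARGET_TABLE = "sample_data.ecommerce_db.shopify.dim_address"


def build_lineage_dag(dag_id="lineage_tutorial_operator"):
    """Build a small DAG whose last task sends lineage and statuses to OpenMetadata."""
    # Imported inside the factory so the module can be loaded for inspection
    # even where Airflow and the OpenMetadata client are not installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow_provider_openmetadata.lineage.operator import (
        OpenMetadataLineageOperator,
    )
    from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
        OpenMetadataConnection,
    )
    from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
        OpenMetadataJWTClientConfig,
    )

    with DAG(
        dag_id,
        default_args={"retries": 3, "retry_delay": timedelta(seconds=10)},
        description="A simple tutorial DAG",
        start_date=datetime(2021, 1, 1),
        catchup=False,
    ) as dag:
        # Inlets and outlets list fully qualified table names used for lineage.
        read_task = BashOperator(
            task_id="print_date",
            bash_command="date",
            inlets={"tables": [SOURCE_TABLE]},
        )
        write_task = BashOperator(
            task_id="sleep",
            bash_command="sleep 1",
            outlets={"tables": [TARGET_TABLE]},
        )

        # Connection to the OpenMetadata server, built manually here.
        server_config = OpenMetadataConnection(
            hostPort=OPENMETADATA_HOST_PORT,
            authProvider="openmetadata",
            securityConfig=OpenMetadataJWTClientConfig(jwtToken=JWT_TOKEN),
        )

        lineage_task = OpenMetadataLineageOperator(
            task_id="lineage_op",
            depends_on_past=False,
            server_config=server_config,
            service_name="airflow_lineage_op_service",
            only_keep_dag_lineage=True,
        )

        # The lineage operator runs last so the reported statuses are current.
        read_task >> write_task >> lineage_task

    return dag
```

In a real DAG file, assign the result at module level (for example, dag = build_lineage_dag()) so that the Airflow scheduler can discover it.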

In 0.13.1 we have also added an OpenMetadataHook, which can be configured from the UI to safely store the parameters to connect to OpenMetadata.

In the Airflow UI, go to Admin > Connections and create a new OpenMetadata connection as follows:

[Screenshot: Airflow Connection]

Testing the connection will validate that the server is reachable and the installed client can be instantiated properly.

Once the connection is configured, you can use it in your DAGs without creating the OpenMetadataConnection manually.
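A minimal sketch of using the hook, assuming it is importable from airflow_provider_openmetadata.hooks.openmetadata and exposes get_conn() as recalled from the 0.13.1 release. The connection ID om_id, the service name, and the helper add_lineage_task are hypothetical names chosen for this illustration.

```python
def add_lineage_task(dag, conn_id="om_id"):
    """Attach a lineage operator to `dag`, configured from a stored Airflow connection."""
    # Imported inside the helper so the module loads even without Airflow installed.
    from airflow_provider_openmetadata.hooks.openmetadata import OpenMetadataHook
    from airflow_provider_openmetadata.lineage.operator import (
        OpenMetadataLineageOperator,
    )

    # The hook turns the stored Airflow connection into an OpenMetadataConnection,
    # so no host or token appears in the DAG code.
    server_config = OpenMetadataHook(openmetadata_conn_id=conn_id).get_conn()

    return OpenMetadataLineageOperator(
        task_id="lineage_op",
        depends_on_past=False,
        server_config=server_config,
        service_name="airflow_lineage_op_service",
        only_keep_dag_lineage=True,
        dag=dag,
    )
```

The returned task can then be placed downstream of the other tasks in the DAG, so that it runs last.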

If the connection to the OpenMetadata server must go through HTTPS, set the Schema field of the connection to https.

For SSL parameters we have two options:

To create a connection that ignores SSL verification, set the Extra field of the connection to a JSON value that selects the ignore option.
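A small sketch that builds this Extra value, so it can be printed and pasted into the connection form. The key name verifySSL is an assumption based on the OpenMetadata connection settings; confirm it against your installed version.

```python
import json


def ignore_ssl_extra():
    """Return the Extra JSON string that tells the client to ignore SSL verification."""
    # Assumption: the hook reads the SSL mode from a "verifySSL" key.
    return json.dumps({"verifySSL": "ignore"})


print(ignore_ssl_extra())
```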

Otherwise, use the validate value and add the path to the certificate. The certificate file must be readable from the local filesystem of your Airflow instance.
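A small sketch of the corresponding Extra value. The key names verifySSL and sslConfig are assumptions based on the OpenMetadata connection settings, and the certificate path is a placeholder to replace with your own.

```python
import json


def validate_ssl_extra(certificate_path):
    """Return the Extra JSON string that validates SSL against a local certificate."""
    # Assumption: the hook reads the mode from "verifySSL" and the
    # certificate location from "sslConfig".
    extra = {"verifySSL": "validate", "sslConfig": certificate_path}
    return json.dumps(extra, indent=2)


# Placeholder path: point this at a file reachable from the Airflow instance.
print(validate_ssl_extra("/path/to/certificate"))
```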