Extract MWAA Metadata

To extract MWAA metadata, the ingestion must run from within MWAA, because the underlying database lives in a private network.

To learn how to run connectors from MWAA, take a look at this doc. This guide explains how to configure the MWAA ingestion for each of the three supported approaches:

  1. Install the openmetadata-ingestion package as a requirement in the Airflow environment, then run the process using a PythonOperator.
  2. Configure an ECS cluster and run the ingestion with the ECS Operator.
  3. Install a plugin and run the ingestion with the PythonVirtualenvOperator.

As the ingestion process will be happening locally in MWAA, we can prepare a DAG with the following YAML configuration:
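A minimal sketch of such a DAG, with the configuration written as the Python dict equivalent of the YAML. The service name, host values, DAG id, and task id are placeholders. The field names and the `MetadataWorkflow` import path are assumptions that depend on your OpenMetadata version, so check them against the Airflow connector reference before use.

```python
from datetime import datetime

# Assumed layout of the workflow configuration; all values are placeholders.
WORKFLOW_CONFIG = {
    "source": {
        "type": "airflow",
        "serviceName": "airflow_mwaa",  # hypothetical service name
        "serviceConnection": {
            "config": {
                "type": "Airflow",
                "hostPort": "http://localhost:8080",  # placeholder Airflow URL
                "numberOfStatus": 10,
                # Backend: read from the metadata database of the Airflow
                # instance that runs this DAG.
                "connection": {"type": "Backend"},
            }
        },
        "sourceConfig": {"config": {"type": "PipelineMetadata"}},
    },
    "sink": {"type": "metadata-rest", "config": {}},
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "<OpenMetadata host and port>",
            "authProvider": "<OpenMetadata auth provider>",
        }
    },
}


def metadata_ingestion_workflow():
    # Imported lazily so the DAG file still parses if the package is missing.
    # Assumption: this import path matches your openmetadata-ingestion release.
    from metadata.workflow.metadata import MetadataWorkflow

    workflow = MetadataWorkflow.create(WORKFLOW_CONFIG)
    workflow.execute()
    workflow.raise_from_status()
    workflow.stop()


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # allows importing this file outside Airflow
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="mwaa_metadata_ingestion",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="ingest_mwaa_metadata",
            python_callable=metadata_ingestion_workflow,
        )
```

With `schedule=None` the DAG only runs when triggered manually; set a schedule once the first run succeeds.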

After setting up the ECS cluster, you'll first need to check the MWAA database connection.

To extract MWAA information, we need to take a couple of points into consideration:

  1. How to get the underlying database connection info, and
  2. How to make sure we can reach that database.

The happy path would be going to the Airflow UI > Admin > Configurations and finding the sql_alchemy_conn parameter.

However, MWAA does not expose this information. Instead, we need to create a DAG that retrieves the connection details once; the DAG can be deleted afterwards. We want to use a PythonOperator that retrieves Airflow's Session data:
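A minimal sketch of such a one-off DAG, assuming Airflow 2.x, where `airflow.settings.Session` is bound to the metadata database engine. The DAG id, task id, and the `describe_url` helper are hypothetical names chosen for illustration.

```python
import logging
from datetime import datetime


def describe_url(url):
    """Collect the parts of a SQLAlchemy URL-like object into a dict."""
    return {
        "drivername": url.drivername,
        "host": url.host,
        "port": url.port,
        "username": url.username,
        "password": url.password,
        "database": url.database,
    }


def log_connection_details():
    # Imported lazily so this file can be imported outside an Airflow worker.
    from airflow.settings import Session

    session = Session()
    try:
        # session.bind is the engine (or connection) behind Airflow's session.
        details = describe_url(session.bind.engine.url)
    finally:
        session.close()
    # The details are only logged, not returned, so they are not pushed to XCom.
    for key, value in details.items():
        logging.info("%s: %s", key, value)


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # allows importing this file outside Airflow
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="mwaa_connection_details",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="log_connection_details",
            python_callable=log_connection_details,
        )
```

The task log then contains the host, port, user, password, and database name. Since these are credentials, treat the log as sensitive.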

After running the DAG, we can store the connection details and remove the DAG file from S3.

Note that trying to log the conf.get("core", "sql_alchemy_conn", fallback=None) details might result in either:

  1. An empty string, depending on the Airflow version. If that's the case, you can update the line to conf.get("database", "sql_alchemy_conn", fallback=None).
  2. The password masked as ****. If that's the case, you can use sqlalchemy_conn = list(conf.get("core", "sql_alchemy_conn", fallback=None)), which splits the string into individual characters, so the value is logged as comma-separated characters.
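Both workarounds can be combined into a small helper. This is a sketch, not part of Airflow: `read_sql_alchemy_conn` and `split_for_logging` are hypothetical names, and the `conf` argument is expected to be the object imported from `airflow.configuration`.

```python
def read_sql_alchemy_conn(conf):
    """Return the first non-empty sql_alchemy_conn, trying both sections."""
    for section in ("database", "core"):
        value = conf.get(section, "sql_alchemy_conn", fallback=None)
        if value:
            return value
    return None


def split_for_logging(value):
    """Split a string into single characters.

    Logging the resulting list prints the characters separated by commas,
    which is the list() workaround described above.
    """
    return list(value)


# Inside a task (requires Airflow):
#   from airflow.configuration import conf
#   logging.info(split_for_logging(read_sql_alchemy_conn(conf)))
```

Join the logged characters back together to recover the full connection string.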

Then, prepare the YAML config with the information you retrieved above. For example:
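As a sketch, the retrieved details could be mapped into the connection block of the config as follows, shown as the Python dict equivalent of the YAML. It assumes a PostgreSQL metadata database, and the key names (`authType`, `hostPort`, `database`) are assumptions to verify against the connection schema of your OpenMetadata version, since some releases may expect a different layout for the password. The `details` dict and `build_connection_block` are hypothetical.

```python
def build_connection_block(details):
    """Map retrieved database details to an assumed Postgres connection block."""
    return {
        "type": "Postgres",
        "username": details["username"],
        "authType": {"password": details["password"]},
        "hostPort": f"{details['host']}:{details['port']}",
        "database": details["database"],
    }


# Example with made-up values; use the details retrieved from MWAA instead.
example = build_connection_block(
    {
        "username": "adminuser",
        "password": "<password>",
        "host": "<database host>",
        "port": 5432,
        "database": "<database name>",
    }
)
```

The resulting dict would replace the `connection` entry under `serviceConnection.config` in the workflow configuration.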

This is similar to the first step, where you only need the simple Backend connection YAML: