Extract MWAA Metadata
To extract MWAA Metadata we need to run the ingestion from MWAA, since the underlying database lives in a private network.
To learn how to run connectors from MWAA, you can take a look at this doc. In this guide, we'll explain how to configure the MWAA ingestion using each of the three supported approaches:
- Install the openmetadata-ingestion package as a requirement in the Airflow environment. We will then run the process using a PythonOperator.
- Configure an ECS cluster and run the ingestion as an ECS Operator.
- Install a plugin and run the ingestion with the PythonVirtualenvOperator.
1. Extracting MWAA Metadata with the PythonOperator
As the ingestion process will be happening locally in MWAA, we can prepare a DAG with the following YAML configuration:
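A minimal sketch of such a configuration, written here as a Python dictionary that the DAG's callable hands to the ingestion workflow. The field names follow the OpenMetadata Airflow connector as commonly documented and should be checked against the version you have installed. The service name, host values, auth provider, and the `metadata.workflow.metadata` import path are assumptions for illustration.

```python
import json


def build_backend_config(service_name: str, airflow_host: str,
                         om_host_port: str, auth_provider: str) -> dict:
    """Build an ingestion config that reads Airflow's own metadata database.

    The "Backend" connection type tells the connector to reuse the database
    session of the Airflow environment it is running in.
    """
    return {
        "source": {
            "type": "airflow",
            "serviceName": service_name,
            "serviceConnection": {
                "config": {
                    "type": "Airflow",
                    "hostPort": airflow_host,
                    "connection": {"type": "Backend"},
                }
            },
            "sourceConfig": {"config": {"type": "PipelineMetadata"}},
        },
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            # Authentication settings depend on your OpenMetadata deployment
            # and are deliberately left out of this sketch.
            "openMetadataServerConfig": {
                "hostPort": om_host_port,
                "authProvider": auth_provider,
            }
        },
    }


def metadata_ingestion_workflow(config: dict) -> None:
    """Callable for the PythonOperator; runs inside the MWAA worker."""
    # Assumed import path; it has moved between openmetadata-ingestion
    # releases. Imported lazily so the DAG file parses without the package.
    from metadata.workflow.metadata import MetadataWorkflow

    workflow = MetadataWorkflow.create(config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.stop()


# Hypothetical values, for illustration only.
config = build_backend_config(
    service_name="airflow_mwaa",
    airflow_host="http://localhost:8080",
    om_host_port="http://openmetadata.example.internal:8585/api",
    auth_provider="openmetadata",
)
config_json = json.dumps(config, indent=2)
```

In the DAG, `metadata_ingestion_workflow` would be passed as the `python_callable` of a PythonOperator, with the dictionary supplied through `op_kwargs`.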
2. Extracting MWAA Metadata with the ECS Operator
After setting up the ECS cluster, you first need to retrieve the MWAA database connection details.
Getting MWAA database connection
To extract MWAA information we need to take a couple of points into consideration:
- How to get the underlying database connection info, and
- How to make sure we can reach that database.
The happy path would be going to the Airflow UI > Admin > Configurations and finding the sql_alchemy_conn parameter.
However, MWAA does not expose this information. Instead, we need to create a DAG that retrieves the connection details once. The DAG can be deleted afterwards. We want to use a PythonOperator that retrieves Airflow's Session data:
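A minimal sketch of the callable such a one-off DAG could run. It assumes an Airflow 2.x environment where `airflow.settings.Session` is available to tasks. The function names are hypothetical. The attributes read from the URL (`drivername`, `username`, `password`, `host`, `port`, `database`) are those of a SQLAlchemy URL object.

```python
import logging
from types import SimpleNamespace  # only used by the offline example below


def connection_details(url) -> dict:
    """Collect the individual fields of a SQLAlchemy URL object.

    Reading the attributes one by one avoids str(url), which hides the
    password in recent SQLAlchemy versions.
    """
    return {
        "driver": url.drivername,
        "username": url.username,
        "password": url.password,
        "host": url.host,
        "port": url.port,
        "database": url.database,
    }


def log_airflow_db_connection() -> dict:
    """Callable for a PythonOperator task; only meaningful inside MWAA."""
    # Imported lazily so this file can be parsed where Airflow is not installed.
    from airflow.settings import Session

    session = Session()
    try:
        details = connection_details(session.get_bind().engine.url)
    finally:
        session.close()
    logging.info("Airflow metadata database connection: %s", details)
    return details


# Offline illustration with made-up values, showing the shape of the result.
example = connection_details(
    SimpleNamespace(
        drivername="postgresql+psycopg2",
        username="example_user",
        password="example_password",
        host="db.example.internal",
        port=5432,
        database="AirflowMetadata",
    )
)
```

Airflow may still mask the password in the task log. If it does, the character-splitting workaround described in the note further down applies.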
After running the DAG, we can store the connection details and remove the DAG file from S3.
Note that trying to log the conf.get("core", "sql_alchemy_conn", fallback=None) details might result in either:
- An empty string, depending on the Airflow version. If that's the case, you can update the line to be conf.get("database", "sql_alchemy_conn", fallback=None).
- The password masked as ****. If that's the case, you can use sqlalchemy_conn = list(conf.get("core", "sql_alchemy_conn", fallback=None)), which splits the value into individual characters, so it is logged as a comma-separated list.
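The character-splitting workaround leaves you with a long list of single characters in the task log. The sketch below shows what that logged value looks like and how it can be joined back into the original connection string. Both function names are hypothetical, and the URL is a made-up example.

```python
import ast


def as_logged(value: str) -> str:
    """Return the text that appears in the log when list(value) is logged."""
    return str(list(value))


def rejoin_logged_characters(logged: str) -> str:
    """Rebuild the original string from a logged list of characters."""
    # literal_eval safely parses the printed list back into a Python list.
    characters = ast.literal_eval(logged)
    return "".join(characters)


# Made-up connection string, for illustration only.
original = "postgresql+psycopg2://example_user:example_password@db.example.internal:5432/AirflowMetadata"
logged = as_logged(original)
recovered = rejoin_logged_characters(logged)
```

Copy the logged list from the MWAA task log into `rejoin_logged_characters` on your own machine to recover the connection string.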
Preparing the metadata extraction
Then, prepare the YAML config with the information you retrieved above. For example:
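As a sketch of how the retrieved details map into that configuration, the helper below builds the database connection block that would replace the Backend connection. It assumes the MWAA metadata database is PostgreSQL; the driver name in the retrieved URL will confirm this. The field names (`username`, `authType`, `hostPort`, `database`) are assumptions based on recent OpenMetadata Postgres connection schemas, and older releases may expect a top-level `password` instead. All values shown are placeholders.

```python
def build_postgres_connection(username: str, password: str,
                              host: str, port: int, database: str) -> dict:
    """Build the `connection` block from the details logged by the helper DAG."""
    return {
        "type": "Postgres",
        "username": username,
        # Assumed layout; check the schema of your OpenMetadata version.
        "authType": {"password": password},
        "hostPort": f"{host}:{port}",
        "database": database,
    }


def build_service_connection(airflow_host: str, connection: dict) -> dict:
    """Wrap the database connection in the Airflow service connection config."""
    return {
        "config": {
            "type": "Airflow",
            "hostPort": airflow_host,
            "connection": connection,
        }
    }


# Placeholder values, for illustration only.
service_connection = build_service_connection(
    airflow_host="http://localhost:8080",
    connection=build_postgres_connection(
        username="example_user",
        password="example_password",
        host="db.example.internal",
        port=5432,
        database="AirflowMetadata",
    ),
)
```

The resulting dictionary corresponds to the `serviceConnection` section of the ingestion config. The rest of the file (source type, sink, and workflow config) stays the same as for the other approaches.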
3. Extracting MWAA Metadata with the Python Virtualenv Operator
This is similar to the first approach: you only need the simple Backend connection YAML: