Airflow Troubleshooting & Advanced
This page covers installation validation, Git Sync guidance, SSL configuration, and troubleshooting for Airflow-based ingestion pipelines. For setup and configuration, see the OpenMetadata Ingestion Overview.
Validating the installation
What we need to verify here is that the OpenMetadata server can reach the Airflow API endpoints (wherever they live: bare metal, containers, k8s pods…). One way to ensure that is to connect to the deployment hosting your OpenMetadata server and run a query against the /health endpoint. For example:
curl -XGET ${PIPELINE_SERVICE_CLIENT_ENDPOINT}/api/v1/openmetadata/health
Run the command as written, using the PIPELINE_SERVICE_CLIENT_ENDPOINT variable instead of hardcoding the URL, and allow the environment to do the substitution for you. That’s the only way we can be sure that the setup is correct.
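The same check can be wrapped in a small script that fails loudly when the endpoint does not answer with HTTP 200. This is a minimal sketch assuming a POSIX shell with curl installed; the function names health_url and check_airflow_health are hypothetical, and only the /api/v1/openmetadata/health path and the PIPELINE_SERVICE_CLIENT_ENDPOINT variable come from this page.

```shell
#!/bin/sh
# Hypothetical helpers: query the OpenMetadata Airflow plugin health endpoint.
# Assumes PIPELINE_SERVICE_CLIENT_ENDPOINT holds the Airflow base URL,
# exactly as configured for the OpenMetadata server.

health_url() {
  # Drop a trailing slash so we never build ".../api" as "//api".
  base=${1%/}
  printf '%s/api/v1/openmetadata/health\n' "$base"
}

check_airflow_health() {
  # Use the argument if given, otherwise the environment variable.
  endpoint=${1:-$PIPELINE_SERVICE_CLIENT_ENDPOINT}
  if [ -z "$endpoint" ]; then
    echo "PIPELINE_SERVICE_CLIENT_ENDPOINT is not set" >&2
    return 2
  fi
  url=$(health_url "$endpoint")
  # -s hides progress, -o discards the body, -w prints only the status code.
  # curl prints 000 when no HTTP response was received at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  echo "GET $url -> HTTP $code"
  [ "$code" = "200" ]
}
```

Calling check_airflow_health with no argument from the OpenMetadata server's host or container keeps the "let the environment substitute the value" property described above.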
More validations in the installation
If you have an existing DAG in Airflow, you can further test your setup by running the following, assuming that:
- There is an Airflow instance running at localhost:8080,
- There is a user admin with password admin,
- There is a DAG named example_bash_operator.
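As an extra sanity check, and assuming Airflow 2.x with its stable REST API enabled and basic auth allowed (for example `auth_backends = airflow.api.auth.backend.basic_auth` in the `[api]` section), you can fetch the DAG from the list above through Airflow's own `GET /api/v1/dags/{dag_id}` endpoint. Note that this is Airflow's REST API, not the OpenMetadata plugin, and the helper names below are hypothetical; the defaults simply mirror the assumptions listed above.

```shell
#!/bin/sh
# Hypothetical helpers: confirm a DAG is visible through Airflow 2.x's
# stable REST API (GET /api/v1/dags/{dag_id}) using basic auth.

dag_url() {
  # $1 = Airflow base URL, $2 = DAG id
  printf '%s/api/v1/dags/%s\n' "${1%/}" "$2"
}

check_dag() {
  # Defaults match the example setup: localhost:8080, admin/admin.
  host=${1:-http://localhost:8080}
  user=${2:-admin}
  password=${3:-admin}
  dag_id=${4:-example_bash_operator}
  code=$(curl -s -o /dev/null -w '%{http_code}' -u "$user:$password" "$(dag_url "$host" "$dag_id")")
  case $code in
    200) echo "DAG $dag_id found" ;;
    401|403) echo "Credentials rejected for user $user (HTTP $code)" ;;
    404) echo "DAG $dag_id not found (HTTP 404)" ;;
    *) echo "Unexpected response: HTTP $code" ;;
  esac
  [ "$code" = "200" ]
}
```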
Git Sync?
A recurring question when setting up Airflow is whether git-sync can be used to manage the ingestion DAGs. Let’s highlight the differences between git-sync and what we want to achieve by installing our custom API plugins:
- git-sync will use Git as the source of truth for your DAGs, meaning that any DAG you have in Git will eventually be picked up and scheduled in Airflow.
- With the openmetadata-managed-apis, we are using the OpenMetadata server as the source of truth. We are enabling dynamic DAG creation from OpenMetadata into your Airflow instance every time you create a new Ingestion Workflow.
Should I use git-sync?
- If you have an existing Airflow instance and you want to build and maintain your own ingestion DAGs, then you can go for it. Check a DAG example here.
- If, instead, you want to use the full deployment process from OpenMetadata, git-sync would not be the right tool, since the DAGs won’t be backed up by Git but rather created from OpenMetadata. Note that if anything were to happen, such as losing the Airflow volumes, you can simply redeploy the DAGs from OpenMetadata.
SSL
If you want to learn how to set up Airflow using SSL, you can learn more here: Airflow SSL
Troubleshooting
Ingestion Pipeline deployment issues
Airflow APIs Not Found
Validate the installation, making sure that from the OpenMetadata server you can reach the Airflow host and that the call to /health gives the proper response. Also make sure that the version of your OpenMetadata server matches the openmetadata-ingestion client version installed in Airflow.
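When the /health call fails, the HTTP status code usually points at the layer to investigate. The mapping below is a rough heuristic under stated assumptions, not an exhaustive list: curl reports 000 when no HTTP response came back at all (network, DNS, firewall, or TLS), while a 404 from a reachable host suggests that the openmetadata-managed-apis routes are not registered in that Airflow webserver or that the endpoint URL is wrong. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn the status code of the /health call into a hint.
# $1 = status code as printed by: curl -s -o /dev/null -w '%{http_code}' URL
diagnose_health_status() {
  case $1 in
    200) echo "OK: the OpenMetadata APIs answered" ;;
    000) echo "No HTTP response: check network, DNS, firewall or TLS between OpenMetadata and Airflow" ;;
    401|403) echo "Reached Airflow, but the request was not authorized" ;;
    404) echo "Reached a web server, but the route is missing: is openmetadata-managed-apis installed, and is the endpoint URL correct?" ;;
    5??) echo "Airflow returned a server error: check the webserver logs" ;;
    *) echo "Unexpected status $1" ;;
  esac
}
```

Feed it the code printed by `curl -s -o /dev/null -w '%{http_code}' "${PIPELINE_SERVICE_CLIENT_ENDPOINT}/api/v1/openmetadata/health"`, run from the OpenMetadata server's host.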
GetServiceException: Could not get service from type XYZ
In this case, the OpenMetadata client running on the Airflow host had issues fetching the service you are trying to deploy from the API. Note that once pipelines are deployed, authentication happens via the ingestion-bot. There are a couple of points to validate:
- The JWT of the ingestion bot is valid. You can check services such as https://jwt.io/ to help you review if the token is expired or if there are any configuration issues.
- The ingestion-bot has the proper role. If you go to <openmetadata-server>/bots/ingestion-bot, the bot should present the Ingestion bot role. You can also validate the role policies to make sure they were not updated and that the bot can indeed view and access services from the API.
- Run an API call for your service to verify the issue. An example trying to get a database service would look as follows:
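A minimal way to script the token and API checks from a shell is sketched below. It decodes the JWT payload locally, so the token never leaves your machine, and reads its exp claim; it then queries a database service by name. It assumes a POSIX shell with curl, sed and `base64 -d` (GNU coreutils or BusyBox). The path /api/v1/services/databaseServices/name/<name> is assumed to be the OpenMetadata REST endpoint for fetching a database service by name, so check it against your server's API documentation. All function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for debugging GetServiceException.

jwt_payload() {
  # A JWT is header.payload.signature: take the payload and map
  # base64url back to standard base64 ('_' -> '/', '-' -> '+').
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url strips the '=' padding; restore it before decoding.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

jwt_exp() {
  # Print the numeric "exp" claim (seconds since the epoch), if present.
  jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

jwt_is_expired() {
  # Returns 0 if expired, 1 if still valid, 2 if there is no numeric exp claim.
  exp=$(jwt_exp "$1")
  if [ -z "$exp" ]; then
    echo "no numeric exp claim in token" >&2
    return 2
  fi
  [ "$exp" -le "$(date +%s)" ]
}

db_service_url() {
  # $1 = OpenMetadata server URL, $2 = database service name
  printf '%s/api/v1/services/databaseServices/name/%s\n' "${1%/}" "$2"
}

get_db_service() {
  # $1 = server URL, $2 = service name, $3 = ingestion-bot JWT
  curl -s -H "Authorization: Bearer $3" -H 'Accept: application/json' \
    "$(db_service_url "$1" "$2")"
}
```

For example, `jwt_is_expired "$BOT_JWT" && echo "token expired"`, followed by `get_db_service <openmetadata-server> <service name> "$BOT_JWT"`. Service names containing special characters would need URL encoding, which this sketch does not do.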
If, for example, you have an issue with the roles you would be getting a message similar to:
AirflowException: Dag ‘XYZ’ could not be found
If you’re seeing a similar error to:
ClientInitializationError
The main root cause here is a version mismatch between the server and the client. Make sure that the openmetadata-ingestion Python package you installed on the Airflow host has the same version as the OpenMetadata server. For example, to set up OpenMetadata server 0.13.2 you will need to install openmetadata-ingestion~=0.13.2.0, a compatible-release specifier that keeps x.y.z fixed while still allowing later bug-fix releases such as 0.13.2.1. Note that only the x.y.z part of the version is validated: any difference after the PATCH version is not taken into account, as those releases are usually small bug fixes on existing functionality.
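The x.y.z rule above can be checked with a couple of shell functions. This is a sketch under the assumption that both versions are plain dot-separated numbers (pre-release suffixes are not handled); the function names are hypothetical, and installed_client_version relies on the "Version:" line printed by `pip show`, so run it with the same pip that Airflow's Python environment uses.

```shell
#!/bin/sh
# Hypothetical helpers: compare the OpenMetadata server version with the
# openmetadata-ingestion version on the Airflow host, on x.y.z only.

version_core() {
  # Keep only the first three dot-separated fields: 0.13.2.1 -> 0.13.2
  printf '%s\n' "$1" | cut -d. -f1-3
}

versions_compatible() {
  # $1 = server version, $2 = client version
  [ "$(version_core "$1")" = "$(version_core "$2")" ]
}

installed_client_version() {
  # Parse the "Version: ..." line from pip's package metadata.
  pip show openmetadata-ingestion 2>/dev/null | sed -n 's/^Version: //p'
}
```

For example, `versions_compatible 0.13.2 "$(installed_client_version)" || echo "version mismatch"`.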
401 Unauthorized
If you get this response during a Test Connection or Deploy, make sure that the credentials set in AIRFLOW_USERNAME and AIRFLOW_PASSWORD allow you to authenticate to the Airflow instance.
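To test those credentials outside OpenMetadata, you can reuse the same environment variables from the OpenMetadata server's deployment. The sketch below assumes Airflow 2.x with basic auth enabled for the stable REST API and uses `GET /api/v1/dags?limit=1` simply as a cheap authenticated read; the function name is hypothetical, and it assumes the plugin endpoints are protected by the same Airflow API authentication.

```shell
#!/bin/sh
# Hypothetical helper: test the Airflow credentials the OpenMetadata server
# would use, taken from the same environment variables.
# Assumes Airflow 2.x with basic auth enabled for the stable REST API.

check_airflow_credentials() {
  base=${1:-$PIPELINE_SERVICE_CLIENT_ENDPOINT}
  if [ -z "$base" ] || [ -z "$AIRFLOW_USERNAME" ] || [ -z "$AIRFLOW_PASSWORD" ]; then
    echo "Set PIPELINE_SERVICE_CLIENT_ENDPOINT, AIRFLOW_USERNAME and AIRFLOW_PASSWORD first" >&2
    return 2
  fi
  # Any authenticated read works; asking for one DAG keeps the response small.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -u "$AIRFLOW_USERNAME:$AIRFLOW_PASSWORD" "${base%/}/api/v1/dags?limit=1")
  if [ "$code" = "401" ]; then
    echo "Airflow rejected the credentials for $AIRFLOW_USERNAME"
  else
    echo "HTTP $code"
  fi
  [ "$code" = "200" ]
}
```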
CentOS / Debian - The name ‘template_blueprint’ is already registered
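If you are using a CentOS / Debian system to install the openmetadata-managed-apis, you might encounter an error stating that the name ‘template_blueprint’ is already registered when starting Airflow. To fix it, go to the root of your venv and remove the lib64 symlink: rm lib64.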
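A slightly safer variant of that fix only deletes lib64 when it really is a symlink, so a real lib64 directory is never removed by mistake. This is a minimal sketch; the function name is hypothetical and the venv path is whatever directory your Airflow virtual environment lives in.

```shell
#!/bin/sh
# Hypothetical helper: remove the lib64 symlink from a virtualenv,
# leaving it alone if it is a real directory.
remove_venv_lib64() {
  venv=$1
  if [ -z "$venv" ] || [ ! -d "$venv" ]; then
    echo "usage: remove_venv_lib64 <path-to-venv>" >&2
    return 2
  fi
  if [ -L "$venv/lib64" ]; then
    # rm on a symlink removes the link itself, not the lib directory.
    rm "$venv/lib64"
    echo "removed symlink $venv/lib64"
  else
    echo "$venv/lib64 is not a symlink; nothing to do"
  fi
}
```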