OpenMetadata & Prefect
This page provides instructions on how to install OpenMetadata and Prefect on your local machine.

Requirements (OSX and Linux)

Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing OpenMetadata.

Installation Process

This page walks you through configuring OpenMetadata and Prefect 2.0. It is intended as a minimum viable setup to get you started using both platforms together. When you are ready to move to a production-ready deployment, see the last two sections of this tutorial.

1. Clone the prefect-openmetadata repository

First, clone the latest version of the prefect-openmetadata Prefect Collection.
Then, navigate to the directory openmetadata-docker containing the docker-compose.yml file with the minimal requirements to get started with OpenMetadata.

2. Start OpenMetadata containers

You can start the containers with OpenMetadata components using:
    docker compose up -d
This will create a docker network and containers with the following services:
  • openmetadata_mysql - Metadata store that serves as a persistence layer holding your metadata.
  • openmetadata_elasticsearch - Indexing service to search the metadata catalog.
  • openmetadata_server - The OpenMetadata UI and API server allowing you to discover insights and interact with your metadata.
Wait a couple of minutes until the setup is finished.
To check the status of all services, run the docker compose ps command. The output should look similar to the following:
    NAME                         COMMAND                  SERVICE               STATUS              PORTS
    openmetadata_elasticsearch   "/tini -- /usr/local…"   elasticsearch         running             0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
    openmetadata_mysql           "/entrypoint.sh mysq…"   mysql                 running (healthy)   33060-33061/tcp
    openmetadata_server          "./openmetadata-star…"   openmetadata-server   running             0.0.0.0:8585->8585/tcp

3. Confirm you can access the OpenMetadata UI

Visit the following URL to confirm that you can access the UI and start exploring OpenMetadata:
    http://localhost:8585
You should see the landing page of the OpenMetadata UI.
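If you prefer to check from a script rather than a browser, the following is a minimal sketch that uses only the Python standard library. The helper name is_reachable is hypothetical and not part of OpenMetadata; the sketch only tests whether something answers HTTP requests at the URL, not whether the UI has finished initializing.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    # URL taken from this tutorial; adjust it if you changed the port mapping.
    print(is_reachable("http://localhost:8585"))
```

Because the containers can take a couple of minutes to start, a False result right after docker compose up -d does not necessarily indicate a problem.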

4. Install prefect-openmetadata

Before running the commands below to install Python libraries, we recommend creating a virtual environment with a Python virtual environment manager such as pipenv, conda or virtualenv.
You can install the Prefect OpenMetadata package using a single command:
    pip install prefect-openmetadata
This also installs Prefect 2.0: both the client library and an embedded API server and UI, which you can optionally start using:
    prefect orion start
Navigate to the following URL to access the locally running Prefect Orion UI:
    http://localhost:4200
Apart from Prefect, prefect-openmetadata comes prepackaged with the openmetadata-ingestion[docker] library for metadata ingestion. This library contains everything you need to turn your JSON ingestion specifications into workflows that will:
  • scan your source systems,
  • figure out which metadata needs to be ingested,
  • load the requested metadata into your OpenMetadata backend.

5. Prepare your metadata ingestion spec

If you followed the first step of this tutorial, then you cloned the prefect-openmetadata repository. This repository contains a directory example-data which you can use to ingest sample data into your OpenMetadata backend using Prefect.
This documentation page contains an example configuration you can use in your flow to ingest that sample data.
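Before passing a configuration to a flow, it can help to confirm that the string is valid JSON. The sketch below does that with the standard library. The top-level keys it looks for (source, sink, workflowConfig) are an assumption about the layout of an ingestion spec and are not stated on this page, so compare them against the linked example configuration. The helper name missing_keys is hypothetical.

```python
import json
from typing import List

# Assumed top-level keys of an ingestion spec (an assumption, not taken
# from this page; verify against the linked example configuration).
ASSUMED_TOP_LEVEL_KEYS = ("source", "sink", "workflowConfig")


def missing_keys(config: str) -> List[str]:
    """Parse `config` as JSON and return the assumed top-level keys it lacks."""
    spec = json.loads(config)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(spec, dict):
        raise ValueError("the ingestion spec must be a JSON object")
    return [key for key in ASSUMED_TOP_LEVEL_KEYS if key not in spec]


# Placeholder spec with empty sections, for illustration only.
placeholder = '{"source": {}, "sink": {}, "workflowConfig": {}}'
print(missing_keys(placeholder))
```

An empty list means every assumed key is present. It does not mean the spec is valid for your source system.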

6. Run ingestion workflow locally

Now you can paste the config from above as a string into your flow definition and run it. This documentation page explains in detail how that works.
In short, we only have to:
  1. Import the flow function,
  2. Pass the config as a string.
You can run the workflow as any Python function. No DAGs and no boilerplate.
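The two steps above can be sketched as follows. The import path prefect_openmetadata.flows and the flow name ingest_metadata are assumptions based on the prefect-openmetadata collection's README, so check them against the version you installed. The wrapper run_ingestion is a hypothetical helper added for illustration.

```python
def run_ingestion(config: str) -> None:
    """Run the metadata ingestion flow with `config`, a JSON spec as a string."""
    # Step 1: import the flow function. The import is done lazily so that
    # this module still loads where prefect-openmetadata is not installed.
    # The import path is an assumption; verify it for your installed version.
    from prefect_openmetadata.flows import ingest_metadata

    # Step 2: pass the config as a string.
    ingest_metadata(config)
```

To use it, call run_ingestion with the example configuration from the linked documentation page while the OpenMetadata containers are running.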
After running your flow, you should see new users, datasets, dashboards, and other metadata in your OpenMetadata UI. Also, your Prefect UI will display the workflow run and will show the logs with details on which source system has been scanned and which data has been ingested.
If you haven't started the Prefect Orion UI yet, you can do that from your CLI:
    prefect orion start
If you navigate to the URL http://localhost:4200, you’ll be able to:
  • access a locally running Prefect Orion UI
  • see all previously triggered ingestion workflow runs.
Congratulations on building your first metadata ingestion workflow with OpenMetadata and Prefect! In the next section, we'll look at how you can run this flow on a schedule.

7. Schedule and deploy your metadata ingestion flows with Prefect

Ingesting your data via manually executed scripts is great for initial exploration, but in order to build a reliable metadata platform, you need to run those workflows on a regular cadence. That’s where you can leverage Prefect schedules and deployments.
This documentation page demonstrates how you can configure a DeploymentSpec to deploy your flow and ensure that your metadata gets refreshed on schedule.

8. Deploy the execution layer to run your flows

So far, we’ve looked at how you can create and schedule your workflow, but where does this code actually run? This is where the concepts of storage, work queues, and agents become important. But don’t worry: all you need to get started is one CLI command for each of those concepts.
1) Storage
Storage is used to tell Prefect where your workflow code lives. To configure storage, run:
    prefect storage create
The CLI will guide you through selecting the storage of your choice. To get started, you can select Local Storage and choose a path in your file system. You can then set it as your default storage.
2) Work Queue
Work queues collect scheduled runs and agents pick those up from the queue. To create a default work queue, run:
    prefect work-queue create default
3) Agent
Agents are lightweight processes that poll their work queues for scheduled runs and execute workflows on the infrastructure you specified in the DeploymentSpec’s flow_runner. To start an agent for the default work queue, run:
    prefect agent start default
That’s all you need! Once you have executed those three commands, your deployments (such as the one defined using ingestion_flow.py) will run on their schedule, and Prefect will ensure that your metadata stays up-to-date.
You can observe the state of your metadata ingestion workflows from the Prefect Orion UI. The UI will also include detailed logs showing which metadata got updated to ensure your data platform remains healthy and observable.

9. Using Prefect 2.0 in the Cloud

If you want to move beyond this local installation, you can deploy Prefect 2.0 to run your OpenMetadata ingestion workflows by:
For various deployment options of OpenMetadata, check the “Deploy” section of this documentation.

10. Questions about using OpenMetadata with Prefect

If you have any questions about configuring Prefect, post your question on Prefect Discourse or in the Prefect Community Slack.
And if you need support for OpenMetadata, get in touch on OpenMetadata Slack.

Troubleshooting

Could not find a version that satisfies the requirement

    ERROR: Could not find a version that satisfies the requirement openmetadata-ingestion[docker] (from versions: none)
    ERROR: No matching distribution found for openmetadata-ingestion[docker]
If you see the above when attempting to install prefect-openmetadata, this can be due to using an older version of Python and pip. Please check the Requirements section above and confirm that you have supported versions installed.

Next Steps

  1. Visit the Features overview page and explore the OpenMetadata UI.
  2. Visit the Connectors documentation to see what services you can integrate with OpenMetadata.
  3. Visit the API documentation and explore the OpenMetadata APIs.