Prefect

This page provides instructions on how to install OpenMetadata and Prefect on your local machine.

Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing OpenMetadata.

  • OS X and Linux
  • Windows

This documentation page will walk you through the process of configuring OpenMetadata and Prefect 2.0. It is intended as a minimal viable setup to get you started using both platforms together. Once you want to move to a production-ready deployment, check the last two sections of this tutorial.

First, clone the latest version of the prefect-openmetadata Prefect Collection.
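For reference, a sketch of this step, assuming the collection is hosted under the PrefectHQ organization on GitHub (check the Prefect Collections catalog for the canonical URL):

git clone https://github.com/PrefectHQ/prefect-openmetadata.git
cd prefect-openmetadata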

Then, navigate to the openmetadata-docker directory, which contains a docker-compose.yml file with the minimal set of services required to get started with OpenMetadata.

You can start the containers with OpenMetadata components using:

docker compose up -d

This will create a Docker network and containers running the following services:

  • openmetadata_mysql - Metadata store that serves as a persistence layer holding your metadata.
  • openmetadata_elasticsearch - Indexing service to search the metadata catalog.
  • openmetadata_server - The OpenMetadata UI and API server allowing you to discover insights and interact with your metadata.

Wait a couple of minutes until the setup is finished.

To check the status of all services, run the docker compose ps command:

NAME                         COMMAND                  SERVICE               STATUS              PORTS
openmetadata_elasticsearch   "/tini -- /usr/local…"   elasticsearch         running             0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
openmetadata_mysql           "/entrypoint.sh mysq…"   mysql                 running (healthy)   33060-33061/tcp
openmetadata_server          "./openmetadata-star…"   openmetadata-server   running             0.0.0.0:8585->8585/tcp

Visit the following URL to confirm that you can access the UI and start exploring OpenMetadata:

http://localhost:8585

You should see a page similar to the following as the landing page for the OpenMetadata UI.

Landing page of OpenMetadata UI

Before running the commands below to install Python libraries, we recommend creating a virtual environment with a Python virtual environment manager such as pipenv, conda or virtualenv.
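For example, using Python's built-in venv module instead of the managers listed above (the environment name openmetadata-env is just an example):

python -m venv openmetadata-env
source openmetadata-env/bin/activate   # on Windows: openmetadata-env\Scripts\activate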

You can install the Prefect OpenMetadata package using a single command:

pip install prefect-openmetadata

This already includes Prefect 2.0: both the client library and an embedded API server and UI, which you can optionally start using:

prefect orion start

If you navigate to the following URL, you’ll be able to access a locally running Prefect Orion UI:

http://localhost:4200

Apart from Prefect, prefect-openmetadata comes prepackaged with the openmetadata-ingestion[docker] library for metadata ingestion. This library contains everything you need to turn your JSON ingestion specifications into workflows that will:

  • scan your source systems,
  • figure out which metadata needs to be ingested,
  • load the requested metadata into your OpenMetadata backend.

If you followed the first step of this tutorial, then you cloned the prefect-openmetadata repository. This repository contains a directory example-data which you can use to ingest sample data into your OpenMetadata backend using Prefect.

This documentation page contains an example configuration you can use in your flow to ingest that sample data.

Now you can paste that config as a string into your flow definition and run it. This documentation page explains in detail how that works. In short, you only have to:

  1. Import the flow function,
  2. Pass the config as a string.

You can run the workflow as any Python function. No DAGs and no boilerplate.
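As a minimal sketch (assuming the flow function is exposed as ingest_metadata in prefect_openmetadata.flows; check the linked documentation page for the exact import path and the full example config), the script could look like this:

from prefect_openmetadata.flows import ingest_metadata  # assumed import path

# Paste the example ingestion config from the documentation page here as a string
config = """
...
"""

if __name__ == "__main__":
    ingest_metadata(config)  # runs like any Python function: no DAGs, no boilerplate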

After running your flow, you should see new users, datasets, dashboards, and other metadata in your OpenMetadata UI. Also, your Prefect UI will display the workflow run and will show the logs with details on which source system has been scanned and which data has been ingested.

If you haven't started the Prefect Orion UI yet, you can do that from your CLI:

prefect orion start

If you navigate to the URL http://localhost:4200, you’ll be able to:

  • access a locally running Prefect Orion UI
  • see all previously triggered ingestion workflow runs.

Congratulations on building your first metadata ingestion workflow with OpenMetadata and Prefect! In the next section, we'll look at how you can run this flow on schedule.

Ingesting your data via manually executed scripts is great for initial exploration, but in order to build a reliable metadata platform, you need to run those workflows on a regular cadence. That’s where you can leverage Prefect schedules and deployments.

This documentation page demonstrates how you can configure a DeploymentSpec to deploy your flow and ensure that your metadata gets refreshed on schedule.
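As a rough sketch only (this uses the Prefect 2.0 beta deployment API, which may differ in later releases; the file name ingestion_flow.py, the deployment name, and the hourly cadence are assumptions), a DeploymentSpec could look like this:

from datetime import timedelta

from prefect.deployments import DeploymentSpec              # Prefect 2.0 beta API
from prefect.orion.schemas.schedules import IntervalSchedule

DeploymentSpec(
    name="metadata-ingestion",                               # example deployment name
    flow_location="./ingestion_flow.py",                     # script containing the ingestion flow
    schedule=IntervalSchedule(interval=timedelta(hours=1)),  # example cadence: refresh every hour
)

On the 2.0 beta, such a spec is typically registered with the prefect deployment create CLI command; consult the linked page for the exact steps for your Prefect version.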

So far, we’ve looked at how you can create and schedule your workflow; but where does this code actually run? This is where the concepts of storage, work queues, and agents become important. But don’t worry: all you need to do to get started is run one CLI command for each of those concepts.

1) Storage

Storage is used to tell Prefect where your workflow code lives. To configure storage, run:

prefect storage create

The CLI will guide you through the process of selecting the storage of your choice. To get started, you can select Local Storage, choose a path on your file system, and then set it as your default storage.

2) Work Queue

Work queues collect scheduled runs, and agents pick them up from the queue. To create a default work queue, run:

prefect work-queue create default
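If you want to confirm the queue was created, you should be able to list your work queues (assuming your Prefect version provides this subcommand):

prefect work-queue ls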

3) Agent

Agents are lightweight processes that poll their work queues for scheduled runs and execute workflows on the infrastructure you specified in the DeploymentSpec’s flow_runner. To start an agent for the default work queue, run:

prefect agent start default

That’s all you need! Once you have executed those three commands, your deployments (such as the one we defined in ingestion_flow.py above) are scheduled, and Prefect will ensure that your metadata stays up to date. You can observe the state of your metadata ingestion workflows from the Prefect Orion UI. The UI also includes detailed logs showing which metadata was updated, helping your data platform remain healthy and observable.

If you want to move beyond this local installation, check the Prefect documentation for the available options for deploying Prefect 2.0 to run your OpenMetadata ingestion workflows in production.

For various deployment options of OpenMetadata, check the Deployment section.

If you have any questions about configuring Prefect, post your question on Prefect Discourse or in the Prefect Community Slack. And if you need support for OpenMetadata, get in touch on OpenMetadata Slack.

Troubleshooting

Could not find a version that satisfied the requirement

ERROR: Could not find a version that satisfies the requirement openmetadata-ingestion[docker] (from versions: none)
ERROR: No matching distribution found for openmetadata-ingestion[docker]

If you see the above error when attempting to install prefect-openmetadata, it may be caused by an older version of Python or pip. Please check the Requirements section above and confirm that you have supported versions installed.
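For example, you can verify which interpreter and pip version your environment is using, and upgrade pip, before retrying the installation:

python --version
pip --version
pip install --upgrade pip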

Next steps

  1. Visit the overview page and explore the OpenMetadata UI.
  2. Visit the documentation to see which services you can integrate with OpenMetadata.
  3. Visit the documentation and explore the OpenMetadata APIs.

Still have questions?

You can take a look at our Q&A or reach out to us on Slack.
