Prefect

This page provides instructions on how to install OpenMetadata and Prefect on your local machine.

Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing OpenMetadata.

To check which version of Python you have, run python3 --version (or python --version, depending on how Python is installed on your system).
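The same information is available from inside the interpreter, which can be handy in a setup script. The following is a minimal sketch; the function names are illustrative, and any minimum version you pass in is your own choice rather than a requirement stated on this page.

```python
import platform
import sys


def python_version() -> str:
    """Return the running interpreter's version, e.g. '3.10.4'."""
    return platform.python_version()


def meets_minimum(minimum: tuple) -> bool:
    """Compare the running interpreter against a (major, minor, ...) tuple."""
    return tuple(sys.version_info[: len(minimum)]) >= tuple(minimum)


# Example usage (the minimum below is a placeholder, not a documented requirement):
#   print("Python", python_version())
#   print("New enough:", meets_minimum((3, 8)))
```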

Docker is an open source platform for developing, shipping, and running applications. It enables you to separate your applications from your infrastructure, so you can deliver software quickly. Docker uses OS-level virtualization to deliver software in packages called containers.

To check which version of Docker you have, run docker --version.

If you need to install Docker, please visit Get Docker.

Note: You must allocate at least 6GB of memory to Docker in order to run OpenMetadata. To change the memory allocation in Docker Desktop, open Preferences -> Resources -> Advanced.
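To see how much memory Docker currently has available, you can ask the daemon directly. The sketch below assumes the docker CLI is on your PATH and uses docker info with a format template to read the MemTotal field; the helper names are illustrative. The daemon may report slightly less than the configured allocation, so treat the comparison with 6GB as approximate.

```python
import shutil
import subprocess


def to_gib(total_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return total_bytes / 1024**3


def docker_memory_bytes():
    """Return the memory visible to the Docker daemon in bytes, or None if unknown."""
    if shutil.which("docker") is None:
        return None  # docker CLI not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{.MemTotal}}"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = result.stdout.strip()
    if result.returncode != 0 or not output.isdigit():
        return None  # daemon not running, or unexpected output
    return int(output)


# Example usage:
#   total = docker_memory_bytes()
#   if total is not None:
#       print(f"Docker reports {to_gib(total):.1f} GiB of memory")
```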

compose command for Docker (version v2.1.1 or greater)

The Docker compose package enables you to define and run multi-container Docker applications. The compose command integrates compose functions into the Docker platform, making them available from the Docker command-line interface (CLI). The Python packages you will install in the procedure below use compose to deploy OpenMetadata.

To verify that the docker compose command is installed and accessible on your system, run docker compose version.

Upon running this command you should see output similar to the following.

Note: In previous releases of Docker, compose functions were delivered with the docker-compose tool. OpenMetadata uses Compose V2. Please see the paragraphs above for instructions on installing Compose V2.
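The version requirement stated above (v2.1.1 or greater) can also be checked programmatically. This is a minimal sketch that parses the text printed by the compose version command and compares it with that minimum; the function names and the sample strings in the usage comment are illustrative, not captured output.

```python
import re
import shutil
import subprocess

MINIMUM = (2, 1, 1)  # from "version v2.1.1 or greater" in the requirements


def parse_version(text: str) -> tuple:
    """Extract the first 'X.Y.Z' version number found in text."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def compose_is_supported(version_output: str, minimum: tuple = MINIMUM) -> bool:
    """Return True if the reported Compose version is at least the minimum."""
    return parse_version(version_output) >= minimum


def installed_compose_version():
    """Return the output of 'docker compose version', or None if unavailable."""
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


# Example usage:
#   output = installed_compose_version()
#   if output is not None:
#       print(output, "-> supported:", compose_is_supported(output))
```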

Install Docker Compose V2 on Linux

Follow the instructions here to install Docker Compose V2 (version v2.1.1 or greater, as required above).

  1. Run the following command to download the current stable release of Docker Compose

This command installs Compose V2 for the active user under the $HOME directory. To install Docker Compose for all users on your system, replace ~/.docker/cli-plugins with /usr/local/lib/docker/cli-plugins.

  2. Apply executable permissions to the binary
  3. Test your installation

On Windows:

  1. Install WSL2
  2. Install Ubuntu 20.04
  3. Install Docker for Windows

Follow the instructions.

This documentation page walks you through the process of configuring OpenMetadata and Prefect 2.0. It is intended as a minimal viable setup to get you started using both platforms together. When you are ready to move to a production-ready deployment, check the last two sections of this tutorial.

First, clone the latest version of the prefect-openmetadata Prefect Collection.

Then, navigate to the directory openmetadata-docker containing the docker-compose.yml file with the minimal requirements to get started with OpenMetadata.

You can start the containers with OpenMetadata components using docker compose up -d (the -d flag runs the containers in the background).

This will create a docker network and containers with the following services:

  • openmetadata_mysql - Metadata store that serves as a persistence layer holding your metadata.
  • openmetadata_elasticsearch - Indexing service to search the metadata catalog.
  • openmetadata_server - The OpenMetadata UI and API server allowing you to discover insights and interact with your metadata.

Wait a couple of minutes until the setup is finished.
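Rather than waiting a fixed amount of time, a script can poll the server until it responds. The sketch below uses only the Python standard library; the URL http://localhost:8585 is assumed to be the default address of the OpenMetadata server in this local setup, and the function names and timing values are illustrative.

```python
import http.client
import time
import urllib.error
import urllib.request

OPENMETADATA_URL = "http://localhost:8585"  # assumed default for a local setup


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to url succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # not up yet, or it answered with an error status


def wait_for_url(url: str, max_wait: float = 300.0, interval: float = 5.0) -> bool:
    """Poll url until it is reachable or max_wait seconds have passed."""
    deadline = time.monotonic() + max_wait
    while True:
        if is_reachable(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example usage:
#   if wait_for_url(OPENMETADATA_URL):
#       print("OpenMetadata is up")
#   else:
#       print("Still not reachable; check the container status")
```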

To check the status of all services, run the docker compose ps command, which lists the state of every container.

Visit http://localhost:8585 to confirm that you can access the UI and start exploring OpenMetadata.

You should see a page similar to the following as the landing page for the OpenMetadata UI.

Landing page of OpenMetadata UI

Before running the commands below to install Python libraries, we recommend creating a virtual environment with a Python virtual environment manager such as pipenv, conda or virtualenv.
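If you prefer not to install a separate tool, Python's built-in venv module is another option alongside the managers listed above. A minimal sketch, assuming a hypothetical environment directory named .venv:

```python
import venv
from pathlib import Path


def create_environment(path, with_pip: bool = True) -> Path:
    """Create a virtual environment at path and return its location."""
    target = Path(path)
    # with_pip=True bootstraps pip into the new environment so that
    # packages can be installed into it afterwards.
    venv.create(target, with_pip=with_pip)
    return target


# Example usage (".venv" is a hypothetical directory name):
#   env_dir = create_environment(".venv")
#   print("Created", env_dir.resolve())
```

After creating the environment, activate it in your shell before installing packages so they land inside it rather than in your system Python.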

You can install the Prefect OpenMetadata package using a single command, pip install prefect-openmetadata.

This will already include Prefect 2.0: both the client library and an embedded API server and UI, which can optionally be started using prefect orion start.

If you navigate to http://localhost:4200, you’ll be able to access a locally running Prefect Orion UI.

Apart from Prefect, prefect-openmetadata comes prepackaged with the openmetadata-ingestion[docker] library for metadata ingestion. This library contains everything you need to turn your JSON ingestion specifications into workflows that will:

  • scan your source systems,
  • figure out which metadata needs to be ingested,
  • load the requested metadata into your OpenMetadata backend.

If you followed the first step of this tutorial, then you cloned the prefect-openmetadata repository. This repository contains a directory example-data which you can use to ingest sample data into your OpenMetadata backend using Prefect.

This documentation page contains an example configuration you can use in your flow to ingest that sample data.

Now you can paste the config from above as a string into your flow definition and run it. This documentation page explains in detail how that works. In short, we only have to:

  1. Import the flow function,
  2. Pass the config as a string.

You can run the workflow as any Python function. No DAGs and no boilerplate.
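The two steps above can be sketched as follows. This is an illustration under assumptions, not the definitive usage: the import path prefect_openmetadata.flows.ingest_metadata reflects the collection's documented usage as best understood here and should be verified against the version you installed, and the file name ingestion_config.json and both helper functions are hypothetical. The config is kept in a separate file purely for readability; pasting it inline as a string works the same way.

```python
from pathlib import Path


def load_config(path) -> str:
    """Read an ingestion spec from disk and return it as a string."""
    config = Path(path).read_text(encoding="utf-8")
    if not config.strip():
        raise ValueError(f"{path} is empty")
    return config


def run_ingestion(config: str) -> None:
    """Run the ingestion flow with the given config string."""
    # Imported lazily so this module can be loaded even when the
    # prefect-openmetadata package is not installed.
    # Assumption: the flow is exposed at this import path.
    from prefect_openmetadata.flows import ingest_metadata

    ingest_metadata(config)


# Example usage ("ingestion_config.json" is a hypothetical file name):
#   run_ingestion(load_config("ingestion_config.json"))
```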

After running your flow, you should see new users, datasets, dashboards, and other metadata in your OpenMetadata UI. Also, your Prefect UI will display the workflow run and will show the logs with details on which source system has been scanned and which data has been ingested.

If you haven't started the Prefect Orion UI yet, you can do that from your CLI by running prefect orion start.

If you navigate to the URL http://localhost:4200, you’ll be able to:

  • access a locally running Prefect Orion UI
  • see all previously triggered ingestion workflow runs.

Congratulations on building your first metadata ingestion workflow with OpenMetadata and Prefect! In the next section, we'll look at how you can run this flow on a schedule.

Ingesting your data via manually executed scripts is great for initial exploration, but in order to build a reliable metadata platform, you need to run those workflows on a regular cadence. That’s where you can leverage Prefect schedules and deployments.

This documentation page demonstrates how you can configure a DeploymentSpec to deploy your flow and ensure that your metadata gets refreshed on schedule.

So far, we’ve looked at how you can create and schedule your workflow, but where does this code actually run? This is where the concepts of storage, work queues, and agents become important. But don’t worry - all you need to get started is one CLI command for each of those concepts.

1) Storage

Storage is used to tell Prefect where your workflow code lives. To configure storage, run:

The CLI guides you through selecting the storage of your choice. To get started, you can select Local Storage and choose a path on your file system. You can then set it as your default storage.

2) Work Queue

Work queues collect scheduled runs and agents pick those up from the queue. To create a default work queue, run:

3) Agent

Agents are lightweight processes that poll their work queues for scheduled runs and execute workflows on the infrastructure you specified on the DeploymentSpec’s flow_runner. To create an agent corresponding to the default work queue, run:

That’s all you need! Once you have executed those three commands, your deployments (such as the one we defined using ingestion_flow.py above) run on their schedules, and Prefect will ensure that your metadata stays up-to-date. You can observe the state of your metadata ingestion workflows from the Prefect Orion UI. The UI also includes detailed logs showing which metadata was updated, helping your data platform remain healthy and observable.
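To make the relationship between work queues and agents concrete, here is a toy model in plain Python. It is a conceptual illustration only: it does not use Prefect's API or reflect its implementation, and every class and function name in it is hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueue:
    """Collects runs that are waiting to be executed."""
    name: str
    runs: deque = field(default_factory=deque)

    def submit(self, flow, **parameters):
        self.runs.append((flow, parameters))


@dataclass
class Agent:
    """Polls one work queue and executes whatever it finds there."""
    queue: WorkQueue
    results: list = field(default_factory=list)

    def poll_once(self) -> int:
        """Execute every run currently waiting; return how many ran."""
        executed = 0
        while self.queue.runs:
            flow, parameters = self.queue.runs.popleft()
            self.results.append(flow(**parameters))
            executed += 1
        return executed


def fake_ingestion(source: str) -> str:
    """Stand-in for a real ingestion flow."""
    return f"ingested metadata from {source}"


# Example usage:
#   queue = WorkQueue("default")
#   queue.submit(fake_ingestion, source="sample-data")
#   agent = Agent(queue)
#   agent.poll_once()
```

The point of the split is that scheduling (putting runs on a queue) and execution (an agent picking them up) are decoupled, so the process that runs your flows can live on whatever infrastructure suits you.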

If you want to move beyond this local installation, you can deploy Prefect 2.0 to run your OpenMetadata ingestion workflows by:

For various deployment options of OpenMetadata, check the Deployment section.

If you have any questions about configuring Prefect, post your question on Prefect Discourse or in the Prefect Community Slack. And if you need support for OpenMetadata, get in touch on OpenMetadata Slack.

Could not find a version that satisfies the requirement

If you see the above error when attempting to install prefect-openmetadata, it may be because you are using an older version of Python or pip. Please check the Requirements section above and confirm that you have supported versions installed.

  1. Visit the overview page and explore the OpenMetadata UI.
  2. Visit the documentation to see what services you can integrate with OpenMetadata.
  3. Visit the documentation and explore the OpenMetadata APIs.