Configure dbt workflow from OpenMetadata UI

Learn how to configure the dbt workflow from the UI to ingest dbt data from your data sources.

Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add the dbt information.

This will populate the dbt tab on the Table Entity Page.

dbt

We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.

From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add dbt Ingestion.

Add dbt Ingestion

Here you can enter the configuration that OpenMetadata needs to fetch the dbt files (manifest.json, catalog.json and run_results.json) used to extract the dbt metadata. Select one of the sources described below (AWS S3, GCS, Azure Storage, local storage, file server or dbt Cloud) from which the dbt files can be fetched.

Only the manifest.json file is compulsory for dbt ingestion.
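Before configuring the ingestion, it can help to confirm that the compulsory file is actually present in the folder you plan to point OpenMetadata at. The following is a minimal sketch, not part of OpenMetadata: the helper name `check_dbt_artifacts` and the idea of passing a local folder are assumptions made for illustration.

```python
from pathlib import Path

# manifest.json is compulsory; the other two files are optional.
REQUIRED = ("manifest.json",)
OPTIONAL = ("catalog.json", "run_results.json")


def check_dbt_artifacts(target_dir):
    """Hypothetical helper: report which dbt files exist in target_dir.

    Returns (found, missing_optional) and raises FileNotFoundError
    when a compulsory file is absent.
    """
    target = Path(target_dir)
    missing_required = [n for n in REQUIRED if not (target / n).is_file()]
    if missing_required:
        raise FileNotFoundError(
            f"Missing compulsory dbt file(s) in {target}: {missing_required}"
        )
    found = [n for n in REQUIRED + OPTIONAL if (target / n).is_file()]
    missing_optional = [n for n in OPTIONAL if not (target / n).is_file()]
    return found, missing_optional
```

Calling `check_dbt_artifacts("target")` on a dbt project's output folder would list the files found and flag any optional ones that are absent.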

OpenMetadata connects to the AWS S3 bucket using the credentials provided and scans the S3 buckets for the manifest.json, catalog.json and run_results.json files.

You can provide the name of the S3 bucket and the prefix path of the folder in which the dbt files are stored. If these parameters are not provided, all the buckets are scanned for the files.
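The effect of the prefix path can be illustrated without any cloud access. The sketch below is an assumption-laden illustration, not OpenMetadata's implementation: `find_dbt_files` is a hypothetical helper that takes a plain list of object keys (as a bucket listing might return) and groups the dbt files by folder, keeping only folders that contain the compulsory manifest.json.

```python
DBT_FILES = {"manifest.json", "catalog.json", "run_results.json"}


def find_dbt_files(keys, prefix=None):
    """Hypothetical helper: group dbt file keys by their parent folder.

    keys   -- iterable of object keys, e.g. "dbt/prod/manifest.json"
    prefix -- optional prefix path; when omitted, every key is considered
    """
    projects = {}
    for key in keys:
        if prefix and not key.startswith(prefix):
            continue  # outside the configured prefix path
        folder, _, name = key.rpartition("/")
        if name in DBT_FILES:
            projects.setdefault(folder, {})[name] = key
    # A folder is only usable if it holds the compulsory manifest.json.
    return {f: files for f, files in projects.items() if "manifest.json" in files}
```

With a prefix, only matching folders are inspected; without one, every key in the listing is checked, which mirrors the "scan everything" behaviour described above.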

Follow the link here for instructions on setting up multiple dbt projects.

AWS S3 Bucket Config

OpenMetadata connects to the GCS bucket using the credentials provided and scans the GCS buckets for the manifest.json, catalog.json and run_results.json files.

You can provide the name of the GCS bucket and the prefix path of the folder in which the dbt files are stored. If these parameters are not provided, all the buckets are scanned for the files.

GCS credentials can be provided in two ways:

1. Entering the credentials directly into the form

Follow the link here for instructions on setting up multiple dbt projects.

GCS Bucket config

2. Entering the path of the file in which the GCS bucket credentials are stored.

GCS Bucket Path Config

For more information on Google Cloud Storage authentication click here.

OpenMetadata connects to the Azure Storage service using the credentials provided and scans the Azure Storage containers for the manifest.json, catalog.json and run_results.json files.

You can provide the name of the container and the prefix path of the folder in which the dbt files are stored. If these parameters are not provided, all the containers are scanned for the files.

Follow the link here for instructions on setting up multiple dbt projects.

Azure Storage Config

The paths of the manifest.json, catalog.json and run_results.json files can be provided directly when the files are stored on the local system or in the container in which the OpenMetadata server is running.

Local Storage Config

The file server paths of the manifest.json, catalog.json and run_results.json files can be provided directly when the files are stored on a file server.

File Server Config

If you have not already done so, follow the link here to get started with dbt Cloud account setup. The APIs must be authenticated using an authentication token. Follow the link here to generate an authentication token for your dbt Cloud account.

The Account Viewer permission is the minimum requirement for the dbt Cloud token.

The dbt Cloud workflow leverages the dbt Cloud v2 APIs to retrieve dbt run artifacts (manifest.json, catalog.json, and run_results.json) and ingest the dbt metadata.

It uses the /runs API to obtain the most recent successful dbt run, filtering by account_id, project_id and job_id if specified. The artifacts from this run are then collected using the /artifacts API.
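The two calls described above can be sketched as plain URL construction. This is an illustration only, not the OpenMetadata code: the host `cloud.getdbt.com`, the query parameter names (`job_definition_id`, `order_by`, `status`, `limit`) and the use of status code 10 for a successful run are assumptions based on the public dbt Cloud v2 API and should be checked against its reference before use. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: default multi-tenant dbt Cloud host; other deployments differ.
BASE_URL = "https://cloud.getdbt.com/api/v2"


def latest_run_url(account_id, project_id=None, job_id=None):
    """Build a /runs URL asking for the most recent successful run."""
    params = {
        "order_by": "-finished_at",  # newest first (assumed parameter)
        "status": 10,                # assumed code for "success"
        "limit": 1,
    }
    if project_id is not None:
        params["project_id"] = project_id
    if job_id is not None:
        params["job_definition_id"] = job_id
    return f"{BASE_URL}/accounts/{account_id}/runs/?{urlencode(params)}"


def artifact_url(account_id, run_id, artifact):
    """Build the /artifacts URL for one file of a given run."""
    return f"{BASE_URL}/accounts/{account_id}/runs/{run_id}/artifacts/{artifact}"
```

A client would request the first URL with the authentication token in the `Authorization` header, read the run id from the response, and then fetch manifest.json, catalog.json and run_results.json through the second URL.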

Refer to the code here

dbt Cloud config

The fields for Dbt Cloud Account Id, Dbt Cloud Project Id and Dbt Cloud Job Id should be numeric values.

To learn how to get the values for the Dbt Cloud Account Id, Dbt Cloud Project Id and Dbt Cloud Job Id fields, check here.
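If these values are collected by a script before being entered into the form, a small check can catch non-numeric input early. The helper below is a hypothetical sketch, not something OpenMetadata provides.

```python
def parse_dbt_cloud_id(value, field_name):
    """Hypothetical helper: return value as an int or raise ValueError."""
    text = str(value).strip()
    # isascii() excludes characters such as superscript digits, which
    # isdigit() accepts but int() cannot parse.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"{field_name} must be numeric, got {value!r}")
    return int(text)
```

For example, `parse_dbt_cloud_id("12345", "Dbt Cloud Account Id")` returns the integer, while a value such as `"abc"` raises a `ValueError` naming the offending field.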

After clicking Next, you will be redirected to the Scheduling form, which is the same as for the Metadata Ingestion. Select your desired schedule and click on Deploy; the dbt pipeline will then appear in the Service Ingestions.

Schedule dbt ingestion pipeline