Amundsen

On this page, you will learn how to use the metadata CLI to run a one-time ingestion.

To run the ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which ships with custom Airflow plugins to handle the workflow deployment.

To run the Amundsen ingestion, you will need to install:

pip3 install "openmetadata-ingestion[amundsen]"

Make sure you are running openmetadata-ingestion version 0.10.2 or above.
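The version requirement above can be checked from Python before running anything. The following is a minimal sketch using only the standard library; the helper names parse_version and meets_minimum are illustrative and not part of openmetadata-ingestion, and the parser deliberately ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 10, 2)  # minimum version stated on this page


def parse_version(text):
    """Turn '0.10.2' into (0, 10, 2), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """Return True when the installed version string is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = version("openmetadata-ingestion")
    except PackageNotFoundError:
        print("openmetadata-ingestion is not installed")
    else:
        print(installed, "OK" if meets_minimum(installed) else "too old")
```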

You need to create database services before ingesting the metadata from Amundsen. In the example below, we have 5 tables from 3 data sources (hive, dynamo, and delta), so in OpenMetadata we have to create database services with the same names as the sources.

Amundsen dashboard
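To see which database services an Amundsen instance calls for, one option is to collect the source prefix of each table key. The sketch below assumes table keys of the form database://cluster.schema/table; the sample keys are made up to match the 5 tables and 3 sources in this example, and required_services is a hypothetical helper. It illustrates the naming rule only and is not how the connector itself resolves services.

```python
def required_services(table_keys):
    """Return the sorted, de-duplicated source prefixes found in the table keys."""
    services = set()
    for key in table_keys:
        prefix, separator, _rest = key.partition("://")
        if separator and prefix:
            services.add(prefix)
    return sorted(services)


# Hypothetical keys: 5 tables spread over 3 sources.
sample_keys = [
    "hive://gold.test_schema/test_table1",
    "hive://gold.test_schema/test_table2",
    "dynamo://gold.test_schema/test_table3",
    "delta://gold.delta_schema/test_table4",
    "delta://gold.delta_schema/test_table5",
]

print(required_services(sample_keys))  # ['delta', 'dynamo', 'hive']
```

Each name in the result is a database service that would have to exist in OpenMetadata before the ingestion runs.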

To create a database service, follow these steps:

The first step is ingesting the metadata from your sources. Under Settings, you will find Services, which link an external source system to OpenMetadata. Once a service is created, it can be used to configure metadata, usage, and profiler workflows. To visit the Services page, select Services from the Settings menu.

Navigate to Settings >> Services

Click on the Add New Service button to start the Service creation.

Add a New Service from the Database Services Page

Select each service type that is present in Amundsen and create the services one by one. In this example, we need to create services for Hive, DynamoDB, and Delta Lake. Possible service names are athena, bigquery, db2, druid, delta, salesforce, oracle, glue, snowflake, or hive.

Note: Adding an ingestion in this step is optional, because we will fetch the metadata from Amundsen. Once all the database services have been created, they are listed on the Services page, and we are ready to start the Amundsen ingestion via the CLI.

All connectors are now defined as JSON Schemas. Here you can find the structure to create a connection to Amundsen.

In order to create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following JSON Schema.

This is a sample config for Amundsen:

source:
  type: amundsen
  serviceName: local_amundsen
  serviceConnection:
    config:
      type: Amundsen
      username: <username>
      password: <password>
      hostPort: bolt://localhost:7687
      maxConnectionLifeTime: <time in secs.>
      validateSSL: <true or false>
      encrypted: <true or false>
      modelClass: <modelclass>
  sourceConfig:
    config:
      enableDataProfiler: false
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth

You can find all the definitions and types for the serviceConnection here.

  • username: Username of your Amundsen user. The specified user should be authorized to read all databases you want to include in the metadata ingestion workflow.
  • password: Password of your Amundsen user.
  • hostPort: Host and port of the Amundsen Neo4j connection.
  • maxConnectionLifeTime (optional): Maximum connection lifetime, in seconds, for the Amundsen Neo4j connection.
  • validateSSL (optional): Enable SSL validation for the Amundsen Neo4j connection.
  • encrypted (optional): Enable encryption for the Amundsen Neo4j connection.
  • modelClass (optional): Model class for the Amundsen Neo4j connection.
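The field list above can be turned into a quick pre-flight check on the serviceConnection config before the workflow runs. This is a rough sketch, not the connector's own validation (which is driven by the JSON Schema). It assumes that the fields not marked optional are required, and check_connection_config is a hypothetical helper.

```python
REQUIRED = ("username", "password", "hostPort")
OPTIONAL = ("maxConnectionLifeTime", "validateSSL", "encrypted", "modelClass")


def check_connection_config(config):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    # Assumption: every field not marked optional on this page is required.
    for key in REQUIRED:
        if not config.get(key):
            problems.append(f"missing required field: {key}")
    # Flag misspelled or unexpected keys ("type" is part of the sample config).
    known = set(REQUIRED) | set(OPTIONAL) | {"type"}
    for key in config:
        if key not in known:
            problems.append(f"unknown field: {key}")
    # The two flags are documented as true/false values.
    for key in ("validateSSL", "encrypted"):
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} should be true or false")
    return problems


example = {
    "type": "Amundsen",
    "username": "neo4j_user",        # placeholder value
    "password": "neo4j_password",    # placeholder value
    "hostPort": "bolt://localhost:7687",
    "encrypted": False,
}
print(check_connection_config(example))  # []
```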

To send the metadata to OpenMetadata, the sink type must be set to metadata-rest.

The main property of workflowConfig is openMetadataServerConfig, where you can define the host and security provider of your OpenMetadata installation. For a simple, local installation using our Docker containers, this looks like:

workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth

We support different security providers. You can find their definitions here. An example of an Auth0 configuration would be the following:

workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: auth0
    securityConfig:
      clientId: <client ID>
      secretKey: <secret key>
      domain: <domain>
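The two workflowConfig variants above differ only in authProvider and the presence of securityConfig. The sketch below builds that structure as a plain Python dictionary, which can be handy when configs are generated by a script. build_workflow_config is a hypothetical helper, and the rule that every provider other than no-auth needs a securityConfig is an assumption drawn from the Auth0 example, not a documented guarantee.

```python
import json


def build_workflow_config(host_port, auth_provider="no-auth", security_config=None):
    """Assemble the workflowConfig section using only the keys shown on this page."""
    server = {"hostPort": host_port, "authProvider": auth_provider}
    if auth_provider != "no-auth":
        # Assumption: providers other than no-auth carry a securityConfig block.
        if not security_config:
            raise ValueError(f"authProvider {auth_provider!r} needs a securityConfig")
        server["securityConfig"] = dict(security_config)
    return {"workflowConfig": {"openMetadataServerConfig": server}}


local = build_workflow_config("http://localhost:8585/api")
auth0 = build_workflow_config(
    "http://localhost:8585/api",
    auth_provider="auth0",
    security_config={
        "clientId": "<client ID>",
        "secretKey": "<secret key>",
        "domain": "<domain>",
    },
)
print(json.dumps(auth0, indent=2))
```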

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:

metadata ingest -c <path-to-yaml>
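If the ingestion needs to be triggered from a script instead of an interactive shell, the same command can be wrapped with the standard library. This is a minimal sketch: build_ingest_command and run_ingestion are illustrative names, and it assumes the metadata executable installed with openmetadata-ingestion is on the PATH.

```python
import subprocess
from pathlib import Path


def build_ingest_command(config_path):
    """Return the argument list for: metadata ingest -c <path-to-yaml>."""
    return ["metadata", "ingest", "-c", str(config_path)]


def run_ingestion(config_path):
    """Run the CLI for the given workflow file and return its exit code."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"workflow config not found: {path}")
    # check=False lets the caller decide how to react to a non-zero exit code.
    completed = subprocess.run(build_ingest_command(path), check=False)
    return completed.returncode
```

Calling run_ingestion("amundsen.yaml") would then run the workflow saved under that hypothetical file name.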

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.
