# External Ingestion Workflow Examples

> Code examples for running OpenMetadata ingestion workflows externally — metadata, lineage, usage, profiler, and data quality.

This page contains code examples for running each workflow type externally. For framework setup, configuration, and JWT token handling, see the [External Ingestion Overview](/v1.12.x/deployment/ingestion/external).

The examples below show how to write a function that runs each workflow. Because it is plain Python, you can call
that function from a DAG, a GitHub Action, or a standalone Python script.

### Testing

You can test every YAML configuration using the `metadata` CLI from the Ingestion Framework.
To install it, get the `openmetadata-ingestion` package from [PyPI](https://pypi.org/project/openmetadata-ingestion/).

In each of the examples below, we'll showcase how to run the CLI, assuming you have a YAML file that contains
the workflow configuration.
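
As a quick reference, the CLI subcommands used in the tips on this page can be collected in one place. The sketch below only builds the command line; the helper name `cli_command` and the workflow keys are hypothetical, while the subcommands themselves (`ingest`, `usage`, `profile`, `test`) are the ones shown further down this page.

```python
import shlex

# Subcommands as they appear in the tips on this page.
# The dictionary keys are illustrative labels, not OpenMetadata identifiers.
CLI_SUBCOMMANDS = {
    "metadata": "ingest",
    "lineage": "ingest",
    "usage": "usage",
    "profiler": "profile",
    "data_quality": "test",
}


def cli_command(workflow_kind: str, config_path: str) -> list[str]:
    """Return the argv list for the `metadata` CLI (hypothetical helper)."""
    subcommand = CLI_SUBCOMMANDS[workflow_kind]
    return ["metadata", subcommand, "-c", config_path]


if __name__ == "__main__":
    # Print the command as a shell string for inspection.
    print(shlex.join(cli_command("profiler", "profiler.yaml")))
```

To actually run the command, you could pass the returned list to `subprocess.run(..., check=True)` in an environment where the `metadata` CLI is installed.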

### Metadata Workflow

This is the first workflow you have to configure and run. It will take care of fetching the metadata from your sources,
be it Database Services, Dashboard Services, Pipelines, etc.

The rest of the workflows (Lineage, Profiler, etc.) run on top of the metadata already available in the platform.

<Steps>
  <Step title="Adding the imports">
    **Adding the imports**

    The first step is to import the `MetadataWorkflow` class, which will take care of the full ingestion logic. We'll
    also import `yaml` to parse the configuration.
  </Step>

  <Step title="Defining the YAML">
    **Defining the YAML**

    Then, we need to pass the YAML configuration. For this simple example we are defining a variable, but you can
    read from a file, parse secrets from your environment, or any other approach you'd need. In the end, it's just
    Python code.

    <Tip>
      You can find complete YAMLs in each connector [docs](/v1.12.x/connectors) and find more information about the available
      configurations.
    </Tip>
  </Step>

  <Step title="Preparing the Workflow">
    **Preparing the Workflow**

    Finally, we'll prepare a function that we can execute anywhere.

    It will take care of instantiating the workflow, executing it and giving us the results.
  </Step>
</Steps>

```python theme={null}
import yaml
from metadata.workflow.metadata import MetadataWorkflow

CONFIG = """
source:
  type: snowflake
  serviceName: <service name>
  serviceConnection:
    config:
      type: Snowflake
      ...
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true
      includeTables: true
      ...
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
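"""


def run():
    # Parse the YAML string into a dict and build the workflow from it
    workflow = MetadataWorkflow.create(yaml.safe_load(CONFIG))
    workflow.execute()
    # Raise an exception if the workflow reported failures
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()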
```

<Tip>
  You can test the workflow via `metadata ingest -c <path-to-yaml>`.
</Tip>
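
One way to fill in the `{bot_jwt_token}` placeholder from the environment is a plain string replacement before parsing the YAML. This is a minimal sketch, not part of the Ingestion Framework: the helper names and the `OM_JWT_TOKEN` variable name are assumptions made for illustration.

```python
import os

PLACEHOLDER = "{bot_jwt_token}"


def render_config(template: str, token: str) -> str:
    """Replace the JWT placeholder in a YAML template (hypothetical helper)."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template does not contain {PLACEHOLDER}")
    # Plain replacement on purpose: str.format() would trip over the
    # literal `{}` in lines such as `config: {}`.
    return template.replace(PLACEHOLDER, token)


def token_from_env(var_name: str = "OM_JWT_TOKEN") -> str:
    """Read the bot token from the environment; the variable name is an assumption."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return token
```

With these helpers, the call in `run()` would become `yaml.safe_load(render_config(CONFIG, token_from_env()))`, which keeps the token out of the source file.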

### Lineage Workflow

This workflow will take care of scanning your query history and defining lineage relationships between your tables.

You can find more information about this workflow [here](/v1.12.x/connectors/ingestion/lineage).

<Steps>
  <Step title="Adding the imports">
    **Adding the imports**

    The first step is to import the `MetadataWorkflow` class, which will take care of the full ingestion logic. We'll
    also import `yaml` to parse the configuration.

    Note that we are using the same class as in the Metadata Workflow.
  </Step>

  <Step title="Defining the YAML">
    **Defining the YAML**

    Then, we need to pass the YAML configuration. For this simple example we are defining a variable, but you can
    read from a file, parse secrets from your environment, or any other approach you'd need.

    Note that we have not added the `serviceConnection` here. Since the service was already created during the
    metadata ingestion, we can let the Ingestion Framework dynamically fetch the Service Connection information.

    If, however, you are configuring the workflow with `storeServiceConnection: false`, you'll need to explicitly
    define the `serviceConnection`.

    <Tip>
      You can find complete YAMLs in each connector [docs](/v1.12.x/connectors) and find more information about the available
      configurations.
    </Tip>
  </Step>

  <Step title="Preparing the Workflow">
    **Preparing the Workflow**

    Finally, we'll prepare a function that we can execute anywhere.

    It will take care of instantiating the workflow, executing it and giving us the results.
  </Step>
</Steps>

```python theme={null}
import yaml
from metadata.workflow.metadata import MetadataWorkflow

CONFIG = """
source:
  type: snowflake-lineage
  serviceName: <service name>
  sourceConfig:
    config:
      type: DatabaseLineage
      queryLogDuration: 1
      parsingTimeoutLimit: 300
      ...
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
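"""


def run():
    # Parse the YAML string into a dict and build the workflow from it
    workflow = MetadataWorkflow.create(yaml.safe_load(CONFIG))
    workflow.execute()
    # Raise an exception if the workflow reported failures
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()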
```

<Tip>
  You can test the workflow via `metadata ingest -c <path-to-yaml>`.
</Tip>
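
Since the `run()` functions on this page differ only in the workflow class they use, the shared steps can be factored into one helper. The sketch below is an illustration rather than framework code: `run_workflow` is a hypothetical name, and wrapping the calls in `try`/`finally` so that `stop()` always runs is a design choice of this sketch, not something the examples on this page do.

```python
from typing import Any


def run_workflow(workflow_cls: Any, config: dict) -> None:
    """Run any workflow class exposing the methods used on this page."""
    workflow = workflow_cls.create(config)
    try:
        workflow.execute()
        workflow.raise_from_status()
        workflow.print_status()
    finally:
        # Always call stop(), even if execution or the status check raised.
        workflow.stop()
```

You would call it as `run_workflow(MetadataWorkflow, yaml.safe_load(CONFIG))`, passing whichever workflow class the section uses.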

### Usage Workflow

As with the lineage workflow, this workflow scans the query history for any DML statements. The goal is to ingest
queries into the platform and to work out how relevant your assets are and which tables are frequently joined.

<Steps>
  <Step title="Adding the imports">
    **Adding the imports**

    The first step is to import the `UsageWorkflow` class, which will take care of the full ingestion logic. We'll
    also import `yaml` to parse the configuration.
  </Step>

  <Step title="Defining the YAML">
    **Defining the YAML**

    Then, we need to pass the YAML configuration. For this simple example we are defining a variable, but you can
    read from a file, parse secrets from your environment, or any other approach you'd need.

    Note that we have not added the `serviceConnection` here. Since the service was already created during the
    metadata ingestion, we can let the Ingestion Framework dynamically fetch the Service Connection information.

    If, however, you are configuring the workflow with `storeServiceConnection: false`, you'll need to explicitly
    define the `serviceConnection`.

    <Tip>
      You can find complete YAMLs in each connector [docs](/v1.12.x/connectors) and find more information about the available
      configurations.
    </Tip>
  </Step>

  <Step title="Preparing the Workflow">
    **Preparing the Workflow**

    Finally, we'll prepare a function that we can execute anywhere.

    It will take care of instantiating the workflow, executing it and giving us the results.
  </Step>
</Steps>

```python theme={null}
import yaml
from metadata.workflow.usage import UsageWorkflow

CONFIG = """
source:
  type: snowflake-usage
  serviceName: <service name>
  sourceConfig:
    config:
      type: DatabaseUsage
      queryLogDuration: 1
      parsingTimeoutLimit: 300
      ...
processor:
  type: query-parser
  config: {}
stage:
  type: table-usage
  config:
    filename: "/tmp/snowflake_usage"
bulkSink:
  type: metadata-usage
  config:
    filename: "/tmp/snowflake_usage"
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
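"""


def run():
    # Parse the YAML string into a dict and build the workflow from it
    workflow = UsageWorkflow.create(yaml.safe_load(CONFIG))
    workflow.execute()
    # Raise an exception if the workflow reported failures
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()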
```

<Tip>
  You can test the workflow via `metadata usage -c <path-to-yaml>`.
</Tip>
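
In the example above, `stage` and `bulkSink` use the same `filename`. Assuming the two values need to stay in sync, one option is to build that part of the configuration in Python from a single variable. This is a sketch under that assumption; `usage_config` is a hypothetical helper, and the resulting dictionary still needs the `workflowConfig` section shown above before it can be passed to `UsageWorkflow.create`.

```python
import os
import tempfile


def usage_config(service_name: str, source_type: str, staging_file: str) -> dict:
    """Build the source/processor/stage/bulkSink sections (hypothetical helper)."""
    file_config = {"filename": staging_file}
    return {
        "source": {
            "type": source_type,
            "serviceName": service_name,
            "sourceConfig": {"config": {"type": "DatabaseUsage"}},
        },
        "processor": {"type": "query-parser", "config": {}},
        # Both sections get their own copy of the same file setting.
        "stage": {"type": "table-usage", "config": dict(file_config)},
        "bulkSink": {"type": "metadata-usage", "config": dict(file_config)},
    }


# Use the platform's temp directory instead of hard-coding /tmp.
staging = os.path.join(tempfile.gettempdir(), "snowflake_usage")
config = usage_config("<service name>", "snowflake-usage", staging)
```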

### Profiler Workflow

This workflow will execute queries against your database and send the results into OpenMetadata. The goal is to compute
metrics about your data and give you a high-level view of its shape, together with the sample data.

Running it is a useful first step before creating Data Quality Workflows.

You can find more information about this workflow [here](/v1.12.x/how-to-guides/data-quality-observability/profiler/profiler-workflow).

<Steps>
  <Step title="Adding the imports">
    **Adding the imports**

    The first step is to import the `ProfilerWorkflow` class, which will take care of the full ingestion logic. We'll
    also import `yaml` to parse the configuration.
  </Step>

  <Step title="Defining the YAML">
    **Defining the YAML**

    Then, we need to pass the YAML configuration. For this simple example we are defining a variable, but you can
    read from a file, parse secrets from your environment, or any other approach you'd need.

    Note that we have not added the `serviceConnection` here. Since the service was already created during the
    metadata ingestion, we can let the Ingestion Framework dynamically fetch the Service Connection information.

    If, however, you are configuring the workflow with `storeServiceConnection: false`, you'll need to explicitly
    define the `serviceConnection`.

    <Tip>
      You can find complete YAMLs in each connector [docs](/v1.12.x/connectors) and find more information about the available
      configurations.
    </Tip>
  </Step>

  <Step title="Preparing the Workflow">
    **Preparing the Workflow**

    Finally, we'll prepare a function that we can execute anywhere.

    It will take care of instantiating the workflow, executing it and giving us the results.
  </Step>
</Steps>

```python theme={null}
import yaml
from metadata.workflow.profiler import ProfilerWorkflow

CONFIG = """
source:
  type: snowflake
  serviceName: <service name>
  sourceConfig:
    config:
      type: Profiler
      generateSampleData: true
      ...
processor:
  type: orm-profiler
  config: {}
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
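"""


def run():
    # Parse the YAML string into a dict and build the workflow from it
    workflow = ProfilerWorkflow.create(yaml.safe_load(CONFIG))
    workflow.execute()
    # Raise an exception if the workflow reported failures
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()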
```

<Tip>
  You can test the workflow via `metadata profile -c <path-to-yaml>`.
</Tip>

### Data Quality Workflow

This workflow will run the tests configured against your tables and send the results into OpenMetadata. The goal is to
validate your data against the test cases you have defined for it.

You can find more information about this workflow [here](/v1.12.x/how-to-guides/data-quality-observability/quality/configure).

<Steps>
  <Step title="Adding the imports">
    **Adding the imports**

    The first step is to import the `TestSuiteWorkflow` class, which will take care of the full ingestion logic. We'll
    also import `yaml` to parse the configuration.
  </Step>

  <Step title="Defining the YAML">
    **Defining the YAML**

    Then, we need to pass the YAML configuration. For this simple example we are defining a variable, but you can
    read from a file, parse secrets from your environment, or any other approach you'd need.

    Note that we have not added the `serviceConnection` here. Since the service was already created during the
    metadata ingestion, we can let the Ingestion Framework dynamically fetch the Service Connection information.

    If, however, you are configuring the workflow with `storeServiceConnection: false`, you'll need to explicitly
    define the `serviceConnection`.

    Also note that we are not configuring any tests in the `processor`. You can [define them there](/v1.12.x/how-to-guides/data-quality-observability/quality/configure#full-yaml-config-example),
    but even if nothing is defined in the YAML, the workflow will execute all the tests configured against the table.

    <Tip>
      You can find complete YAMLs in each connector [docs](/v1.12.x/connectors) and find more information about the available
      configurations.
    </Tip>
  </Step>

  <Step title="Preparing the Workflow">
    **Preparing the Workflow**

    Finally, we'll prepare a function that we can execute anywhere.

    It will take care of instantiating the workflow, executing it and giving us the results.
  </Step>
</Steps>

```python theme={null}
import yaml
from metadata.workflow.data_quality import TestSuiteWorkflow

CONFIG = """
source:
  type: TestSuite
  serviceName: <service name>
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: <Table FQN, e.g., `service.database.schema.table`>
processor:
  type: orm-test-runner
  config: {}
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
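"""


def run():
    # Parse the YAML string into a dict and build the workflow from it
    workflow = TestSuiteWorkflow.create(yaml.safe_load(CONFIG))
    workflow.execute()
    # Raise an exception if the workflow reported failures
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()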
```

<Tip>
  You can test the workflow via `metadata test -c <path-to-yaml>`.
</Tip>
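
Because the configuration targets a single table through `entityFullyQualifiedName`, running tests for several tables means producing one configuration per table. The sketch below shows one way to do that from a shared base dictionary; `config_for_table` and the table names in the usage note are hypothetical, and the `workflowConfig` section from the example above is left out for brevity.

```python
import copy

# Mirrors the source/processor/sink sections of the YAML above.
# `workflowConfig` still has to be added before creating the workflow.
BASE_CONFIG = {
    "source": {
        "type": "TestSuite",
        "serviceName": "<service name>",
        "sourceConfig": {
            "config": {"type": "TestSuite", "entityFullyQualifiedName": None}
        },
    },
    "processor": {"type": "orm-test-runner", "config": {}},
    "sink": {"type": "metadata-rest", "config": {}},
}


def config_for_table(table_fqn: str) -> dict:
    """Return a copy of the base config pointing at one table (hypothetical helper)."""
    # Deep copy so that nested dictionaries are not shared between tables.
    config = copy.deepcopy(BASE_CONFIG)
    config["source"]["sourceConfig"]["config"]["entityFullyQualifiedName"] = table_fqn
    return config
```

You could then loop over a list of table FQNs, for example `["service.database.schema.orders", "service.database.schema.customers"]`, and pass each resulting dictionary to `TestSuiteWorkflow.create`.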
