# Lineage Workflow | OpenMetadata Data Lineage Guide

> Discover how to set up data lineage workflows in OpenMetadata. Learn to track data flow, configure connectors, and visualize dependencies across your data pipeline.

export const connector_0 = "bigquery"

# Lineage Workflow

Learn how to configure the Lineage workflow from the UI to ingest Lineage data from your data sources.

<Tip>
  Check out the documentation of the connector you are using to see whether it supports the automated lineage workflow.

  If your database service is not yet supported, you can use this same workflow by providing a Query Log file!

  Learn how to do so 👇

  <CardGroup cols={1}>
    <Card title="Lineage Workflow through Query Logs" href="/v1.12.x/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs">
      Configure the lineage workflow by providing a Query Log file.
    </Card>
  </CardGroup>
</Tip>

## UI Configuration

Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Entity Lineage information.

This will populate the Lineage tab on the Table Entity Page.

<img src="https://mintcdn.com/openmetadata/cpYhk0oyurO_-Qc1/public/images/features/ingestion/workflows/lineage/table-entity-page.png?fit=max&auto=format&n=cpYhk0oyurO_-Qc1&q=85&s=23c0863d906bb723138d481adaa85283" alt="table-entity-page" width="4044" height="1944" data-path="public/images/features/ingestion/workflows/lineage/table-entity-page.png" />

We can create a Lineage Ingestion workflow that obtains the query log and table creation information from the underlying database and feeds it to OpenMetadata.

### 1. Add a Lineage Ingestion

From the Service Page, go to the Ingestions tab and click Add Lineage Ingestion to add a new ingestion.

<img src="https://mintcdn.com/openmetadata/cpYhk0oyurO_-Qc1/public/images/features/ingestion/workflows/lineage/add-ingestion.png?fit=max&auto=format&n=cpYhk0oyurO_-Qc1&q=85&s=f88a4935815566f7bd0736789ff9d88f" alt="add-ingestion" width="4044" height="1944" data-path="public/images/features/ingestion/workflows/lineage/add-ingestion.png" />

### 2. Configure the Lineage Ingestion

Here you can enter the Lineage Ingestion details:

<img src="https://mintcdn.com/openmetadata/cpYhk0oyurO_-Qc1/public/images/features/ingestion/workflows/lineage/configure-lineage-ingestion.png?fit=max&auto=format&n=cpYhk0oyurO_-Qc1&q=85&s=2ca45456b98fb5e590434a2f9a571f31" alt="configure-lineage-ingestion" width="2600" height="2496" data-path="public/images/features/ingestion/workflows/lineage/configure-lineage-ingestion.png" />

### Lineage Options

**Query Log Duration**

Specify how many days back the workflow should look in the query logs when capturing lineage. For example, a value of 2 captures lineage from the queries run in the 48 hours before the ingestion workflow runs.

**Result Limit**

Set the maximum number of query log results to process at a time.
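When the workflow is defined in YAML instead of the UI, these two options correspond to the `queryLogDuration` and `resultLimit` keys of the lineage `sourceConfig`, both of which appear in the sample config in the YAML Configuration section. A minimal fragment with a two-day window, reusing the limit of 1000 from that sample:

```yaml
sourceConfig:
  config:
    type: DatabaseLineage
    # Look back 2 days (48 hours) in the query logs
    queryLogDuration: 2
    # Maximum number of query log results to process
    resultLimit: 1000
```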

### 3. Schedule and Deploy

After clicking Next, you will be redirected to the Scheduling form, which is the same as the one used for Metadata Ingestion. Select your desired schedule and click Deploy; the lineage pipeline is then added to the Service Ingestions.

<img src="https://mintcdn.com/openmetadata/cpYhk0oyurO_-Qc1/public/images/features/ingestion/workflows/lineage/scheule-and-deploy.png?fit=max&auto=format&n=cpYhk0oyurO_-Qc1&q=85&s=5ae414dbff4c37657246803afb70b535" alt="schedule-and-deploy" width="4044" height="1944" data-path="public/images/features/ingestion/workflows/lineage/scheule-and-deploy.png" />

## YAML Configuration

In the [connectors](/v1.12.x/connectors) section we showcase how to run the metadata ingestion from a JSON/YAML file using the Airflow SDK or the CLI via `metadata ingest`. A lineage workflow can also be run from a JSON/YAML configuration file.

This is a good option if you want to execute your workflow via the Airflow SDK or the CLI. With the CLI, a lineage workflow can be triggered with the command `metadata ingest -c FILENAME.yaml`. The `serviceConnection` config is specific to your connector (see the [connectors](/v1.12.x/connectors) section for details), while the `sourceConfig` for lineage is similar across all connectors.

## Lineage

After running a Metadata Ingestion workflow, we can run the Lineage workflow.
Use the same `serviceName` that was used in the Metadata Ingestion, so that the ingestion bot can get the `serviceConnection` details from the server.
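To illustrate, the sketch below pairs a metadata ingestion file with its lineage counterpart for BigQuery. The service name `my_bigquery_service` is hypothetical and the connection details are left out; the point is that the lineage file repeats the same `serviceName` and has no `serviceConnection` block, since the bot resolves it from the server.

```yaml
# metadata ingestion file (already run); service name is hypothetical
source:
  type: bigquery
  serviceName: my_bigquery_service
  serviceConnection:
    config:
      type: BigQuery
      # add the connection details for your project here
  sourceConfig:
    config:
      type: DatabaseMetadata
---
# lineage file: same serviceName, no serviceConnection
source:
  type: bigquery-lineage
  serviceName: my_bigquery_service
  sourceConfig:
    config:
      type: DatabaseLineage
```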

### 1. Define the YAML Config

This is a sample config for {connector_0} Lineage:

<CodePreview>
  <ContentPanel>
    <ContentSection id={1} title="Source Configuration" lines="4">
      Configure the source type and service name for your lineage workflow.

      You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json).
    </ContentSection>

    <ContentSection id={2} title="Lineage Config Type" lines="6">
      **type**: Set to `DatabaseLineage` for database lineage ingestion.
    </ContentSection>

    <ContentSection id={3} title="Query Log Duration" lines="7-8">
      **queryLogDuration**: Number of days to look back in the query logs when processing lineage data.
    </ContentSection>

    <ContentSection id={4} title="Parsing Timeout Limit" lines="9">
      **parsingTimeoutLimit**: Configuration to set the timeout for parsing the query in seconds.
    </ContentSection>

    <ContentSection id={5} title="Filter Condition" lines="10">
      **filterCondition**: SQL condition used to filter the query history, for example to exclude queries issued by a BI tool as in the commented-out example.
    </ContentSection>

    <ContentSection id={6} title="Result Limit" lines="11">
      **resultLimit**: Maximum number of query log results to process.
    </ContentSection>

    <ContentSection id={7} title="Query Log File Path" lines="12-13">
      **queryLogFilePath**: Path to a file containing the queries, used instead of fetching the query logs from the database.
    </ContentSection>

    <ContentSection id={8} title="Database Filter Pattern" lines="14-19">
      **databaseFilterPattern**: Regex to only fetch databases that match the pattern.
    </ContentSection>

    <ContentSection id={9} title="Schema Filter Pattern" lines="20-25">
      **schemaFilterPattern**: Regex to only fetch schemas that match the pattern.
    </ContentSection>

    <ContentSection id={10} title="Table Filter Pattern" lines="26-32">
      **tableFilterPattern**: Regex to only fetch tables that match the pattern.
    </ContentSection>

    <ContentSection id={11} title="Override View Lineage" lines="33">
      **overrideViewLineage**: Set the 'Override View Lineage' toggle to control whether to override the existing view lineage.
    </ContentSection>

    <ContentSection id={12} title="Process View Lineage" lines="34">
      **processViewLineage**: Set the 'Process View Lineage' toggle to control whether to process view lineage.
    </ContentSection>

    <ContentSection id={13} title="Process Query Lineage" lines="35">
      **processQueryLineage**: Set the 'Process Query Lineage' toggle to control whether to process query lineage.
    </ContentSection>

    <ContentSection id={14} title="Process Stored Procedure Lineage" lines="36">
      **processStoredProcedureLineage**: Set the 'Process Stored Procedure Lineage' toggle to control whether to process stored procedure lineage.
    </ContentSection>

    <ContentSection id={15} title="Threads" lines="37">
      **threads**: Number of threads to use to parallelize lineage ingestion.
    </ContentSection>

    <ContentSection id={16} title="Sink Configuration" lines="38-40">
      To send the lineage to OpenMetadata, set the sink to `type: metadata-rest`.
    </ContentSection>
  </ContentPanel>

  <CodePanel fileName="{connector}_lineage.yaml">
    ```yaml theme={null}
    source:
      type: {connector}-lineage
      serviceName: {connector}
      sourceConfig:
        config:
          type: DatabaseLineage
          # Number of days to look back
          queryLogDuration: 1
          parsingTimeoutLimit: 300
          # filterCondition: query_text not ilike '--- metabase query %'
          resultLimit: 1000
          # If instead of getting the query logs from the database we want to pass a file with the queries
          # queryLogFilePath: /tmp/query_log/file_path
          # databaseFilterPattern:
          #   includes:
          #     - database1
          #     - database2
          #   excludes:
          #     - database3
          # schemaFilterPattern:
          #   includes:
          #     - schema1
          #     - schema2
          #   excludes:
          #     - schema3
          # tableFilterPattern:
          #   includes:
          #     - table1
          #     - table2
          #   excludes:
          #     - table3
          #     - table4
          overrideViewLineage: false
          processViewLineage: true
          processQueryLineage: true
          processStoredProcedureLineage: true
          threads: 1
    sink:
      type: metadata-rest
      config: {}
    ```
  </CodePanel>
</CodePreview>
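As an illustration of the filter patterns, the fragment below restricts lineage to two schemas and skips temporary tables. The schema names `sales` and `marketing` and the `tmp_` prefix are hypothetical. Since the entries under `includes` and `excludes` are regular expressions, anchoring them with `^` and `$` helps avoid accidental partial matches.

```yaml
sourceConfig:
  config:
    type: DatabaseLineage
    schemaFilterPattern:
      includes:
        - "^sales$"       # hypothetical schema
        - "^marketing$"   # hypothetical schema
    tableFilterPattern:
      excludes:
        - "^tmp_.*"       # skip tables whose names start with tmp_
```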

* You can learn more about how to configure and run the Lineage Workflow to extract lineage data [here](/connectors/ingestion/workflows/lineage).
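The processing toggles can also be combined. For instance, if you only want view lineage, a sketch like the following skips lineage from query logs and stored procedures; `overrideViewLineage: true` is shown here as an assumption for the case where you want the existing view lineage overridden rather than kept.

```yaml
sourceConfig:
  config:
    type: DatabaseLineage
    processViewLineage: true
    # Skip lineage parsed from query logs and stored procedures
    processQueryLineage: false
    processStoredProcedureLineage: false
    # Override the view lineage already stored in OpenMetadata
    overrideViewLineage: true
```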

### 2. Run with the CLI

After saving the YAML config, run the command the same way as for the metadata ingestion:

```bash theme={null}
metadata ingest -c <path-to-yaml>
```
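`metadata ingest` also needs to know which OpenMetadata server to send the results to. The sample config above stops at the `sink`, so if your file does not already include it, add a `workflowConfig` section such as the sketch below. The local URL and the token placeholder are assumptions for a local deployment using JWT bot authentication; replace them with your server's API endpoint and the ingestion bot's token.

```yaml
workflowConfig:
  openMetadataServerConfig:
    # Assumed local server; replace with your OpenMetadata API endpoint
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      # JWT token of the ingestion bot (placeholder)
      jwtToken: "<bot-jwt-token>"
```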
