Usage Workflow Through Query Logs
The following database connectors support the usage workflow in OpenMetadata:
If you are using any other database connector, direct execution of the usage workflow is not possible, because those connectors do not maintain the query execution logs that the usage workflow requires. This documentation shows how to execute the usage workflow using a query log file for any database connector.

Query Log File

A query log file is a CSV file that contains the following information.
  • query: This field contains the literal query that was executed in the database.
  • user_name (optional): Enter the database user name that executed this query.
  • start_time (optional): Enter the query execution start time in YYYY-MM-DD HH:MM:SS format.
  • end_time (optional): Enter the query execution end time in YYYY-MM-DD HH:MM:SS format.
  • aborted (optional): This field accepts true or false and indicates whether the query was aborted during execution.
  • database_name (optional): Enter the database name on which the query was executed.
  • schema_name (optional): Enter the schema name to which the query is associated.
Check out a sample query log file here.
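As an illustration, a minimal query log file could look like the sketch below. The user, database, schema, and table names are hypothetical; note that queries containing commas must be quoted, as is usual in CSV.

```csv
query,user_name,start_time,end_time,aborted,database_name,schema_name
"SELECT id, name FROM customers",openmetadata_user,2022-06-01 10:00:00,2022-06-01 10:00:02,false,shop,public
"SELECT * FROM orders",openmetadata_user,2022-06-01 10:05:00,2022-06-01 10:05:01,false,shop,public
```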

Usage Workflow

In order to run a Usage Workflow, we need to make sure that the Metadata Ingestion Workflow for the corresponding service has already been executed. We will follow the steps to create a YAML configuration able to collect the query log file and execute the usage workflow.

1. Create a configuration file using template YAML

Create a new file called query_log_usage.yaml in the current directory. Note that the current directory should be the openmetadata directory.
Copy and paste the configuration template below into the query_log_usage.yaml file you created.
query_log_usage.yaml

```yaml
source:
  type: query-log-usage
  serviceName: local_mysql
  serviceConnection:
    config:
      type: Mysql
      username: openmetadata_user
      password: openmetadata_password
      hostPort: localhost:3306
      connectionOptions: {}
      connectionArguments: {}
  sourceConfig:
    config:
      queryLogFilePath: <path to query log file>
processor:
  type: query-parser
  config:
    filter: ''
stage:
  type: table-usage
  config:
    filename: /tmp/query_log_usage
bulkSink:
  type: metadata-usage
  config:
    filename: /tmp/query_log_usage
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```
The serviceName and serviceConnection used in the above config have to be the same as those used during Metadata Ingestion.
The sourceConfig is defined here.
  • queryLogFilePath: Enter the file path of the query log CSV file.
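For example, if the query log file were saved at /tmp/query_log.csv (a hypothetical path), the sourceConfig section of the YAML above would read:

```yaml
sourceConfig:
  config:
    # Hypothetical location; point this at your actual query log CSV file
    queryLogFilePath: /tmp/query_log.csv
```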

2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
```bash
metadata ingest -c <path-to-yaml>
```
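For instance, with the configuration file created above in the current directory, this becomes:

```bash
metadata ingest -c query_log_usage.yaml
```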
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.