Auto Ingest dbt-core
Learn how to automatically ingest dbt-core artifacts into OpenMetadata using the simplified metadata ingest-dbt CLI command, which reads configuration directly from your dbt_project.yml file.
This feature eliminates the need for separate YAML configuration files. All configuration is done directly in your existing dbt_project.yml file.
Overview
The metadata ingest-dbt command provides a streamlined way to ingest dbt artifacts into OpenMetadata by:
- Reading configuration directly from your dbt_project.yml file
- Automatically discovering dbt artifacts (manifest.json, catalog.json, run_results.json)
- Supporting comprehensive filtering and configuration options
Prerequisites
- dbt project setup: You must have a dbt project with a valid dbt_project.yml file
- dbt artifacts: Run dbt compile or dbt run to generate the required artifacts in the target/ directory
- OpenMetadata service: Your database service must already be configured in OpenMetadata
- OpenMetadata Python package: Install the OpenMetadata ingestion package
Dependencies: The package includes python-dotenv>=0.19.0 for automatic .env file support, so no additional setup is required for environment variable functionality.
Quick Start
1. Configure your dbt_project.yml
Add the following variables to the vars section of your dbt_project.yml file:
Environment Variables: For security, you can use environment variables instead of hardcoding sensitive values. See the Environment Variables section below for supported patterns.
2. Generate dbt artifacts
3. Run the ingestion
If you're already in your dbt project directory:
Or if you're in a different directory:
Environment Variables
For security and flexibility, you can use environment variables in your dbt_project.yml configuration instead of hardcoding sensitive values like JWT tokens. The system supports three different environment variable patterns:
Supported Patterns
Pattern | Description | Example |
---|---|---|
${VAR} | Shell-style variable substitution | "${OPENMETADATA_TOKEN}" |
{{ env_var("VAR") }} | dbt-style without default | "{{ env_var('OPENMETADATA_HOST') }}" |
{{ env_var("VAR", "default") }} | dbt-style with default value | "{{ env_var('SERVICE_NAME', 'default-service') }}" |
Environment Variables Example
Then set your environment variables:
Alternative: Using .env Files
For local development, you can create a .env file in your dbt project directory:
Note: The system automatically loads environment variables from .env files in both the dbt project directory and the current working directory. Environment variables set in the shell take precedence over .env file values.
Error Handling: If a required environment variable is not set and no default is provided, the ingestion will fail with a clear error message indicating which variable is missing.
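To make the three patterns and the failure case concrete, here is a minimal Python sketch of how such substitution could work. It is an illustration only, not the OpenMetadata implementation: the function name resolve_env_vars and both regular expressions are assumptions made for this example.

```python
import os
import re

# Hypothetical helper for illustration; not the OpenMetadata implementation.
# Matches the shell-style pattern: ${VAR}
_SHELL_PATTERN = re.compile(r"\$\{(\w+)\}")
# Matches the dbt-style patterns: {{ env_var("VAR") }} and {{ env_var("VAR", "default") }}
_DBT_PATTERN = re.compile(
    r"""\{\{\s*env_var\(\s*['"](\w+)['"]\s*(?:,\s*['"]([^'"]*)['"]\s*)?\)\s*\}\}"""
)


def resolve_env_vars(value, env=None):
    """Replace supported environment variable patterns in a config string."""
    env = os.environ if env is None else env

    def lookup(name, default=None):
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Mirrors the documented behavior: fail and name the missing variable.
        raise ValueError(f"Environment variable '{name}' is not set")

    value = _SHELL_PATTERN.sub(lambda m: lookup(m.group(1)), value)
    return _DBT_PATTERN.sub(lambda m: lookup(m.group(1), m.group(2)), value)
```

Passing a plain dictionary as env, for example resolve_env_vars("${OPENMETADATA_TOKEN}", {"OPENMETADATA_TOKEN": "abc"}), makes the behavior easy to check without touching the real environment.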
Configuration Options
Required Parameters
Parameter | Description |
---|---|
openmetadata_host_port | OpenMetadata server URL (must start with https://) |
openmetadata_jwt_token | JWT token for authentication |
openmetadata_service_name | Name of the database service in OpenMetadata |
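A pre-flight check for the required parameters might look like the following sketch. It assumes the vars section of dbt_project.yml has already been loaded into a Python dictionary; the function name check_required_vars and the message wording are hypothetical, and the https:// rule is taken from the table above rather than from the tool's source.

```python
# Hypothetical pre-flight check; names and messages are illustrative only.
REQUIRED_VARS = (
    "openmetadata_host_port",
    "openmetadata_jwt_token",
    "openmetadata_service_name",
)


def check_required_vars(dbt_vars):
    """Return a list of problems found in the `vars` mapping (empty if none)."""
    problems = [
        f"Required configuration not found: {name}"
        for name in REQUIRED_VARS
        if not dbt_vars.get(name)
    ]
    host = dbt_vars.get("openmetadata_host_port") or ""
    # Assumption from the table above: the URL must start with https://
    if host and not host.startswith("https://"):
        problems.append(
            "Invalid URL format: openmetadata_host_port must start with https://"
        )
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every missing parameter in one pass.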
Optional Parameters
Parameter | Default | Description |
---|---|---|
openmetadata_dbt_update_descriptions | true | Update table/column descriptions from dbt |
openmetadata_dbt_update_owners | true | Update model owners from dbt |
openmetadata_include_tags | true | Include dbt tags as OpenMetadata tags |
openmetadata_search_across_databases | false | Search for tables across multiple databases |
openmetadata_dbt_classification_name | null | Custom classification name for dbt tags |
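The defaults in the table above can be read as a base mapping that explicit settings override. A small sketch of that merge, assuming the vars section is available as a Python dictionary (the helper name with_defaults is hypothetical):

```python
# Defaults copied from the table above; the helper itself is illustrative.
OPTIONAL_DEFAULTS = {
    "openmetadata_dbt_update_descriptions": True,
    "openmetadata_dbt_update_owners": True,
    "openmetadata_include_tags": True,
    "openmetadata_search_across_databases": False,
    "openmetadata_dbt_classification_name": None,
}


def with_defaults(dbt_vars):
    """Return a new mapping where explicit settings override the defaults."""
    return {**OPTIONAL_DEFAULTS, **dbt_vars}
```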
Filter Patterns
Control which databases, schemas, and tables to include or exclude:
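As a rough mental model of include/exclude filtering, the sketch below treats each pattern as a regular expression, lets excludes win over includes, and selects everything when no includes are given. These semantics, the use of re.search, and the function name is_selected are assumptions for illustration; check the OpenMetadata filter pattern documentation for the exact matching rules.

```python
import re


def is_selected(name, includes=(), excludes=()):
    """Illustrative include/exclude check; not the OpenMetadata implementation."""
    # Assumption: any matching exclude pattern rejects the name outright.
    if any(re.search(pattern, name) for pattern in excludes):
        return False
    # Assumption: with no include patterns, everything not excluded is kept.
    return not includes or any(re.search(pattern, name) for pattern in includes)
```

With includes of ["^stg_"] and excludes of ["_tmp$"], a table named stg_orders is kept while stg_orders_tmp and raw_events are skipped.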
Complete Example
Command Options
Note: Global options like --version, --log-level, and --debug are available at the main metadata command level:
Artifacts Discovery
The command automatically discovers artifacts from your dbt project's target/ directory:
Artifact | Required | Description |
---|---|---|
manifest.json | ✅ Yes | Model definitions, relationships, and metadata |
catalog.json | ❌ Optional | Table and column statistics from dbt docs generate |
run_results.json | ❌ Optional | Test results from dbt test |
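The required/optional split in the table can be sketched as a small discovery routine. This is an illustration under stated assumptions, not the command's actual code: the function name discover_artifacts is hypothetical, and it assumes the default target/ directory rather than a custom dbt target path.

```python
from pathlib import Path

# Artifact name -> whether it is required, as listed in the table above.
ARTIFACTS = {
    "manifest.json": True,
    "catalog.json": False,
    "run_results.json": False,
}


def discover_artifacts(project_dir):
    """Return {artifact name: path} for the artifacts present in target/."""
    target = Path(project_dir) / "target"
    found = {}
    for name, required in ARTIFACTS.items():
        path = target / name
        if path.is_file():
            found[name] = path
        elif required:
            # Only the manifest is mandatory; optional files are simply skipped.
            raise FileNotFoundError(
                f"{name} not found in {target}; run dbt compile or dbt run first"
            )
    return found
```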
Generate All Artifacts
What Gets Ingested
- Model Definitions: Queries, configurations, and relationships
- Lineage: Table-to-table and column-level lineage
- Documentation: Model and column descriptions
- Data Quality: dbt test definitions and results
- Tags & Classification: Model and column tags
- Ownership: Model owners and team assignments
Error Handling & Troubleshooting
Common Issues
Issue | Solution |
---|---|
dbt_project.yml not found | Ensure you're in a valid dbt project directory |
Required configuration not found | Add openmetadata_* variables to your dbt_project.yml |
manifest.json not found | Run dbt compile or dbt run first |
Invalid URL format | Ensure openmetadata_host_port includes the protocol (https://) |
Environment variable 'VAR' is not set | Set the required environment variable or provide a default value |
Environment variable not set and no default | Either set the environment variable or use the {{ env_var('VAR', 'default') }} pattern |
Debug Mode
Enable detailed logging:
Best Practices
Security
- Always use environment variables for sensitive data like JWT tokens
- Multiple patterns are supported for flexibility: ${VAR}, {{ env_var("VAR") }}, and {{ env_var("VAR", "default") }}
- Never commit sensitive values directly to version control
Filtering
- Use specific patterns to exclude temporary/test tables
- Filter based on your organization's naming conventions
- Exclude system schemas and databases
Automation
- Integrate into CI/CD pipelines
- Run after successful dbt builds
- Set up scheduled ingestion for regular updates
CI/CD Integration
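One possible shape for a pipeline step is a small Python wrapper that runs the dbt commands and then the ingestion from inside the project directory. This is a sketch under assumptions: the helper names are hypothetical, the command order is one reasonable choice (dbt test runs last among the dbt steps so that run_results.json reflects test results), and it assumes the OpenMetadata credentials are already provided through environment variables in the CI environment.

```python
import subprocess


def build_commands():
    """Commands to run, in order, from inside the dbt project directory."""
    return [
        ["dbt", "run"],               # builds models and writes manifest.json
        ["dbt", "docs", "generate"],  # writes catalog.json
        ["dbt", "test"],              # writes run_results.json with test results
        ["metadata", "ingest-dbt"],   # reads dbt_project.yml in the working directory
    ]


def run_pipeline(project_dir, runner=subprocess.run):
    """Run each command in project_dir, stopping at the first failure."""
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so ingestion only happens after the dbt steps succeed.
        runner(command, cwd=project_dir, check=True)
```

Calling run_pipeline(".") from a CI job executes the steps in order. The runner parameter exists only so the sequence can be exercised in tests without invoking dbt.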
Next Steps
After successful ingestion:
- Explore your data in the OpenMetadata UI
- Configure additional dbt features like tags, tiers, and glossary
- Set up data governance policies and workflows
- Schedule regular ingestion to keep metadata up to date
For additional troubleshooting, refer to the dbt Troubleshooting Guide.