TestRunner - Running Table-Level Tests

The TestRunner class provides a fluent API for executing data quality tests against tables cataloged in OpenMetadata. It automatically fetches table metadata and service connections, allowing you to run tests with minimal configuration.

⚠️ If you're using Collate Cloud to run OpenMetadata, see the Using External Secrets Managers section below.

Overview

TestRunner enables you to:

Execute tests defined in code against cataloged tables
Run tests previously configured in the OpenMetadata UI
Load test definitions from YAML workflow files
Validate data at the table and column levels
Get detailed test results for programmatic handling

Basic Usage

Creating a TestRunner

Create a runner for a specific table using its fully qualified name (FQN):

The table FQN format is: {service}.{database}.{schema}.{table}
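As a quick illustration of the format, the sketch below splits an FQN into its four parts. `split_table_fqn` and `TableFqn` are hypothetical helpers, not part of the SDK, and the example FQN is made up. The sketch assumes that no part contains a dot; names that do contain dots need a proper FQN parser.

```python
from typing import NamedTuple


class TableFqn(NamedTuple):
    service: str
    database: str
    schema: str
    table: str


def split_table_fqn(fqn: str) -> TableFqn:
    """Split '{service}.{database}.{schema}.{table}' into named parts.

    Hypothetical helper for illustration; assumes no part contains a dot.
    """
    parts = fqn.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"Expected service.database.schema.table, got: {fqn!r}")
    return TableFqn(*parts)


parts = split_table_fqn("MySQL.default.ecommerce.customers")
print(parts.service, parts.table)
```

Validating the shape of the FQN up front tends to surface typos earlier than a failed lookup against the catalog would.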

Adding Tests

Add test definitions to the runner:

Adding Multiple Tests

Use add_tests() to add several tests at once:

Running Tests

Execute all configured tests:

Complete Example

Here's a complete example of testing a customer table:

Running Tests from OpenMetadata UI

Instead of defining tests in code, you can run tests that data stewards have configured in the OpenMetadata UI. This enables a collaborative workflow where:

Data stewards define and maintain test criteria in the UI
Engineers execute those tests automatically in pipelines

This approach ensures:

Test definitions stay synchronized with business requirements
Engineers don't need to modify code when test criteria change
All stakeholders own data quality

Customizing Test Metadata

You can customize test names, display names, and descriptions:

Or pass values directly to the constructor:

Configuring Row Count Computation

Some tests support computing the number and percentage of rows that passed or failed:

This provides detailed metrics about test failures, useful for:

Identifying the scope of data quality issues
Prioritizing remediation efforts
Tracking data quality trends over time
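To make these metrics concrete, here is a minimal sketch of a passed/failed row count and percentage for a not-null check. It works on an in-memory list and does not call the SDK; `row_level_summary` and the keys it returns are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, Sequence


def row_level_summary(
    values: Sequence[Any], passes: Callable[[Any], bool]
) -> Dict[str, float]:
    """Count rows that pass or fail a per-row check, with percentages."""
    total = len(values)
    passed = sum(1 for value in values if passes(value))
    failed = total - passed
    # Guard against division by zero on an empty column.
    passed_pct = 100.0 * passed / total if total else 0.0
    failed_pct = 100.0 * failed / total if total else 0.0
    return {
        "passed_rows": passed,
        "failed_rows": failed,
        "passed_pct": passed_pct,
        "failed_pct": failed_pct,
    }


emails = ["a@example.com", None, "b@example.com", "c@example.com"]
summary = row_level_summary(emails, lambda value: value is not None)
print(summary)
```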

Test Runner Configuration

Customize the test runner behavior using the setup() method:

Configuration Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `force_test_update` | `bool` | `False` | Force update even if tests already exist |
| `log_level` | `LogLevels` | `INFO` | Logging level (DEBUG, INFO, WARN, ERROR) |
| `raise_on_error` | `bool` | `False` | Raise exceptions if test data already exists |
| `success_threshold` | `int` | `90` | Percentage threshold for overall success |
| `enable_streamable_logs` | `bool` | `False` | Enable streamable log output |
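One plausible reading of `success_threshold` is that a run counts as successful when the share of passing tests reaches the threshold. The sketch below shows only that arithmetic. It is an assumption about the semantics, not the SDK's implementation, and `meets_threshold` is a hypothetical helper.

```python
def meets_threshold(passed: int, total: int, success_threshold: int = 90) -> bool:
    """Return True when the pass percentage reaches the threshold."""
    if total == 0:
        # Nothing ran, so nothing failed (a choice made for this sketch).
        return True
    return 100.0 * passed / total >= success_threshold


print(meets_threshold(9, 10))
print(meets_threshold(8, 10))
```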

Understanding Test Results

Test results contain detailed information about test execution:

Test Status Values

Success: Test passed all validation criteria
Failed: Test did not meet validation criteria
Aborted: Test execution was interrupted or could not complete
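When handling results in code, it often helps to tally outcomes by status first. The sketch below uses plain strings matching the three status names above. How the status is read from a real result object depends on the SDK's result model, which is not shown here; `tally_statuses` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable

KNOWN_STATUSES = ("Success", "Failed", "Aborted")


def tally_statuses(statuses: Iterable[str]) -> Counter:
    """Count results per status, rejecting values outside the known set."""
    counts = Counter({status: 0 for status in KNOWN_STATUSES})
    for status in statuses:
        if status not in KNOWN_STATUSES:
            raise ValueError(f"Unknown test status: {status!r}")
        counts[status] += 1
    return counts


counts = tally_statuses(["Success", "Failed", "Success", "Aborted"])
print(dict(counts))
```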

Integration with ETL Workflows

Integrate TestRunner into your ETL pipelines:
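A common pattern is to gate the load step on the test outcome. The sketch below shows the control flow only: `run_tests` is an injected callable standing in for whatever runs your TestRunner and returns status strings, while `DataQualityError` and `load_if_clean` are hypothetical names, not SDK symbols.

```python
from typing import Callable, Iterable


class DataQualityError(RuntimeError):
    """Raised when at least one test did not succeed."""


def load_if_clean(
    run_tests: Callable[[], Iterable[str]], load: Callable[[], None]
) -> None:
    """Run the tests, then load only if every status is 'Success'."""
    statuses = list(run_tests())
    not_ok = [status for status in statuses if status != "Success"]
    if not_ok:
        raise DataQualityError(
            f"{len(not_ok)} of {len(statuses)} tests did not succeed"
        )
    load()


loaded = []
load_if_clean(lambda: ["Success", "Success"], lambda: loaded.append("customers"))
print(loaded)
```

Injecting both callables keeps the gate independent of how tests are executed, so it can be exercised without a live OpenMetadata instance.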

Error Handling

Handle potential errors gracefully:

Best Practices

Use descriptive test names: Make test failures easy to understand
Leverage UI-defined tests: Let data stewards define test criteria
Handle results programmatically: Don't just print - take action
Use appropriate thresholds: Set realistic min/max values based on data patterns
Combine table and column tests: Ensure both structural and content quality

Using External Secrets Managers

Important Note

If your OpenMetadata instance uses database-stored credentials (the default configuration), you do not need to follow this guide. The SDK will automatically retrieve and decrypt credentials.

This guide is only necessary when your organization uses an external secrets manager for credential storage.

Why This is Required

The TestRunner API executes data quality tests directly from your Python code (e.g., within your ETL pipelines). To connect to your data sources, it needs to:

Retrieve the service connection configuration from OpenMetadata
Decrypt the credentials stored in your secrets manager
Establish a connection to the data source
Execute the test cases

Without proper secrets manager configuration, the SDK cannot decrypt credentials and will fail to connect to your data sources.

General Setup Steps

1. Contact your OpenMetadata/Collate administrator to obtain:
   - The secrets manager type (AWS, Azure, GCP, etc.)
   - The secrets manager loader configuration
   - Required environment variables or configuration files
   - Any additional setup (IAM roles, service principals, etc.)
2. Install the required dependencies for your secrets manager provider
3. Configure environment variables with access credentials
4. Initialize the SecretsManagerFactory before using TestRunner
5. Configure the SDK and run your tests

Example using AWS Secrets Manager

Required Dependencies: the aws ingestion extra (pip install 'openmetadata-ingestion[aws]')

Example Configuration:

Configuration by Provider

AWS Secrets Manager and AWS Systems Manager Parameter Store

OpenMetadata's ingestion extras: aws (e.g., pip install 'openmetadata-ingestion[aws]')

SecretsManagerProvider (one of):

SecretsManagerProvider.aws
SecretsManagerProvider.managed_aws
SecretsManagerProvider.aws_ssm
SecretsManagerProvider.managed_aws_ssm

Environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION

Azure Key Vault

OpenMetadata's ingestion extras: azure (e.g., pip install 'openmetadata-ingestion[azure]')

SecretsManagerProvider (one of):

SecretsManagerProvider.azure_kv
SecretsManagerProvider.managed_azure_kv

Environment variables:

AZURE_CLIENT_ID
AZURE_CLIENT_SECRET
AZURE_TENANT_ID
AZURE_KEY_VAULT_NAME

Google Cloud Secret Manager

OpenMetadata's ingestion extras: gcp (e.g., pip install 'openmetadata-ingestion[gcp]')

SecretsManagerProvider: SecretsManagerProvider.gcp

Environment variables:

GOOGLE_APPLICATION_CREDENTIALS: path to the credentials JSON file
GCP_PROJECT_ID
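Before initializing the secrets manager, it can save debugging time to confirm that the variables listed above are actually set. The sketch below is a hypothetical pre-flight check, not an SDK feature. The provider keys (`aws`, `azure`, `gcp`) are labels chosen for this example, and the variable names are copied from the lists above.

```python
import os
from typing import Dict, List, Mapping, Optional

# Variable names taken from the provider sections above.
REQUIRED_ENV_VARS: Dict[str, List[str]] = {
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"],
    "azure": [
        "AZURE_CLIENT_ID",
        "AZURE_CLIENT_SECRET",
        "AZURE_TENANT_ID",
        "AZURE_KEY_VAULT_NAME",
    ],
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID"],
}


def missing_env_vars(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]


print(missing_env_vars("gcp", {"GCP_PROJECT_ID": "demo-project"}))
```

If your environment authenticates through an IAM role or a similar mechanism, some of these variables may legitimately be absent, so treat the result as a hint and not as a hard failure.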

Troubleshooting

Error: "Cannot decrypt service connection"

Cause: Secrets manager not initialized or misconfigured

Solution: Ensure SecretsManagerFactory is initialized before calling configure() or creating the TestRunner

Error: "Access Denied" or "Unauthorized"

Cause: Insufficient permissions to access secrets

Solution:

Verify IAM role/service principal has correct permissions
Check credentials are valid and not expired
Ensure correct region/vault name is specified

Error: "Module not found" for secrets manager

Cause: Missing dependencies for your secrets manager

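Solution: Install the extras for your provider, for example pip install 'openmetadata-ingestion[aws]', pip install 'openmetadata-ingestion[azure]', or pip install 'openmetadata-ingestion[gcp]'.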

Tests Fail with Connection Errors

Cause: Credentials not properly decrypted or secrets manager misconfigured

Solution:

Verify secrets manager provider matches your OpenMetadata backend configuration
Test credential access independently (e.g., using AWS CLI, Azure CLI, gcloud)
Check network connectivity to secrets manager service
Enable debug logging to see detailed error messages:
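One generic way to do this, independent of the SDK's own `log_level` setting, is to raise the verbosity of Python's standard `logging` module before running your tests. This sketch uses only the standard library; whether every SDK message flows through the root logger is an assumption.

```python
import logging

# Send log records to stderr with timestamps and logger names.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
# Lower the root logger threshold so DEBUG messages are emitted.
logging.getLogger().setLevel(logging.DEBUG)
```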

Contact Your Administrator

If you're unsure about:

Which secrets manager your organization uses
Required environment variables or configuration
Access credentials or IAM roles
Permissions needed

Contact your OpenMetadata or Collate administrator for the specific configuration required in your environment.

Next Steps

Learn about DataFrame Validation for validating transformations
Review the Test Definitions Reference for all available tests
Explore Advanced Usage including YAML workflows
