TestRunner - Running Table-Level Tests
The TestRunner class provides a fluent API for executing data quality tests against tables cataloged in OpenMetadata. It automatically fetches table metadata and service connections, allowing you to run tests with minimal configuration.
Table of contents
- Overview
- Basic Usage
- Complete Example
- Running Tests from OpenMetadata's UI
- Customizing Test Metadata
- Configuring Row Count Computation
- Test Runner Configuration
- Understanding Test Results
- Integration with ETL Workflows
- Error Handling
- Best Practices
- Using External Secrets Managers
- Next Steps
⚠️ If you're using Collate Cloud to run OpenMetadata, please refer to the section about External Secrets Managers
Overview
TestRunner enables you to:
- Execute tests defined in code against cataloged tables
- Run tests previously configured in the OpenMetadata UI
- Load test definitions from YAML workflow files
- Validate data at the table and column levels
- Get detailed test results for programmatic handling
Basic Usage
Creating a TestRunner
Create a runner for a specific table using its fully qualified name (FQN):
The table FQN format is: {service}.{database}.{schema}.{table}
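To make the FQN format concrete, the helpers below build and split a table FQN in plain Python, without calling the SDK. The function names are hypothetical. The quoting rule (a part that itself contains a dot is wrapped in double quotes) is an assumption about how dotted names are escaped; confirm it against your OpenMetadata instance before relying on it.

```python
from typing import List


def build_table_fqn(service: str, database: str, schema: str, table: str) -> str:
    """Join the parts as {service}.{database}.{schema}.{table}."""
    parts = [service, database, schema, table]
    if any(not part for part in parts):
        raise ValueError("every FQN part must be non-empty")
    # Assumption: a part containing a dot is wrapped in double quotes so the
    # dot is not read as a separator.
    return ".".join(f'"{part}"' if "." in part else part for part in parts)


def split_table_fqn(fqn: str) -> List[str]:
    """Inverse of build_table_fqn: split on dots that are outside double quotes."""
    parts: List[str] = []
    current = ""
    in_quotes = False
    for ch in fqn:
        if ch == '"':
            in_quotes = not in_quotes  # quotes delimit a part; they are not kept
        elif ch == "." and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)} in {fqn!r}")
    return parts
```

For example, `build_table_fqn("mysql_prod", "ecommerce", "public", "customers")` (hypothetical service and table names) returns `"mysql_prod.ecommerce.public.customers"`, the kind of string the runner expects as a table FQN.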
Adding Tests
Add test definitions to the runner:
Adding Multiple Tests
Use add_tests() to add several tests at once:
Running Tests
Execute all configured tests:
Complete Example
Here's a complete example of testing a customer table:
Running Tests from OpenMetadata's UI
Instead of defining tests in code, you can run tests that data stewards have configured in the OpenMetadata UI. This enables a collaborative workflow where:
- Data stewards define and maintain test criteria in the UI
- Engineers execute those tests automatically in pipelines
This approach ensures:
- Test definitions stay synchronized with business requirements
- Engineers don't need to modify code when test criteria change
- Ownership of data quality is shared by all stakeholders
Customizing Test Metadata
You can customize test names, display names, and descriptions:
Or pass values directly to the constructor:
Configuring Row Count Computation
Some tests support computing the number and percentage of rows that passed or failed:
This provides detailed metrics about test failures, useful for:
- Identifying the scope of data quality issues
- Prioritizing remediation efforts
- Tracking data quality trends over time
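The arithmetic behind these metrics is simple. The sketch below uses a hypothetical `RowCounts` record (not an SDK type) to derive the percentages from raw passed/failed counts and to rank tests by failure percentage, which is one way to prioritize remediation. Map the fields from whatever the runner actually returns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RowCounts:
    """Hypothetical record of one test's passed/failed row counts."""
    passed_rows: int
    failed_rows: int

    @property
    def total_rows(self) -> int:
        return self.passed_rows + self.failed_rows

    @property
    def passed_pct(self) -> float:
        # Guard against empty tables to avoid division by zero.
        return 100.0 * self.passed_rows / self.total_rows if self.total_rows else 0.0

    @property
    def failed_pct(self) -> float:
        return 100.0 * self.failed_rows / self.total_rows if self.total_rows else 0.0


def rank_by_failure(counts: Dict[str, RowCounts]) -> List[str]:
    """Test names ordered from the highest failed-row percentage to the lowest."""
    return sorted(counts, key=lambda name: counts[name].failed_pct, reverse=True)
```

Storing these percentages per run, rather than only pass/fail, is what makes trend tracking possible later.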
Test Runner Configuration
Customize the test runner behavior using the setup() method:
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| force_test_update | bool | False | Force update even if tests already exist |
| log_level | LogLevels | INFO | Logging level (DEBUG, INFO, WARN, ERROR) |
| raise_on_error | bool | False | Raise exceptions if test data already exists |
| success_threshold | int | 90 | Percentage threshold for overall success |
| enable_streamable_logs | bool | False | Enable streamable log output |
Understanding Test Results
Test results contain detailed information about test execution:
Test Status Values
- Success: Test passed all validation criteria
- Failed: Test did not meet validation criteria
- Aborted: Test execution was interrupted or could not complete
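When handling results programmatically, it helps to keep the three statuses apart: Failed means the data broke a rule, while Aborted means the check never produced a verdict. The sketch below buckets results by status using hypothetical stand-in types (`Status`, `Result`); the SDK's own result objects carry more fields and may name them differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Status(Enum):
    """Hypothetical stand-in for the three status values listed above."""
    SUCCESS = "Success"
    FAILED = "Failed"
    ABORTED = "Aborted"


@dataclass
class Result:
    """Hypothetical, simplified result: just a test name and its status."""
    test_name: str
    status: Status


def group_by_status(results: Iterable[Result]) -> Dict[Status, List[str]]:
    """Return test names bucketed by status; every status key is present."""
    groups: Dict[Status, List[str]] = {status: [] for status in Status}
    for result in results:
        groups[result.status].append(result.test_name)
    return groups


def needs_attention(groups: Dict[Status, List[str]]) -> bool:
    """Failed and Aborted both deserve follow-up, for different reasons."""
    return bool(groups[Status.FAILED] or groups[Status.ABORTED])
```

A status string such as `"Failed"` converts to the enum with `Status("Failed")`.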
Integration with ETL Workflows
Integrate TestRunner into your ETL pipelines:
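One common pattern is a quality gate: load new data into a staging area, run the tests, and publish only if enough of them pass. The sketch below wires that up with plain callables. `Outcome`, `DataQualityError`, `quality_gate`, `run_etl` and `min_pass_pct` are hypothetical names; in a real pipeline, `run_checks` would wrap the TestRunner call and convert its results into `Outcome` records. The pipeline-side `min_pass_pct` is independent of the `success_threshold` option of `setup()`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Outcome:
    """Hypothetical, simplified view of one test result."""
    test_name: str
    status: str  # "Success", "Failed" or "Aborted"


class DataQualityError(RuntimeError):
    """Raised to stop the pipeline before bad data is published."""


def quality_gate(outcomes: Sequence[Outcome], min_pass_pct: float = 100.0) -> None:
    """Raise DataQualityError unless at least min_pass_pct % of tests succeeded."""
    if not outcomes:
        # No results usually means the tests never ran; treat that as a failure.
        raise DataQualityError("no test results were returned")
    passed = sum(1 for o in outcomes if o.status == "Success")
    pass_pct = 100.0 * passed / len(outcomes)
    if pass_pct < min_pass_pct:
        failing = [o.test_name for o in outcomes if o.status != "Success"]
        raise DataQualityError(
            f"{pass_pct:.1f}% of tests passed (required {min_pass_pct}%); "
            f"not passing: {failing}"
        )


def run_etl(
    load_staging: Callable[[], None],
    run_checks: Callable[[], Sequence[Outcome]],
    publish: Callable[[], None],
    min_pass_pct: float = 100.0,
) -> None:
    """Load, test, then publish; publish is skipped if the gate raises."""
    load_staging()                            # write new data where the tests can see it
    quality_gate(run_checks(), min_pass_pct)  # raises DataQualityError on failure
    publish()                                 # only reached when the gate passes
```

Raising instead of returning a flag means orchestrators such as Airflow mark the task as failed without extra code.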
Error Handling
Handle potential errors gracefully:
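A test that runs and fails is a result, not an error; exceptions come from problems such as an unreachable data source. The sketch below retries only such transient errors with exponential backoff. The SDK's own exception types are not listed on this page, so it assumes Python's built-in `ConnectionError` and `TimeoutError` as the retryable set; `run_with_retries` is a hypothetical helper, and passing it something like `runner.run` assumes the runner exposes a zero-argument run method.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def run_with_retries(
    run: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run(); on a retryable error wait 2s, 4s, ... and try again.

    Non-retryable exceptions (e.g. configuration errors) propagate at once,
    and the last retryable error is re-raised when attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except retry_on as exc:
            if attempt == attempts:
                raise
            delay = base_delay_s * 2 ** (attempt - 1)
            logger.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                           attempt, attempts, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The injectable `sleep` parameter exists so the backoff can be tested without actually waiting.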
Best Practices
- Use descriptive test names: Make test failures easy to understand
- Leverage UI-defined tests: Let data stewards define test criteria
- Handle results programmatically: Don't just print results; act on them
- Use appropriate thresholds: Set realistic min/max values based on observed data patterns
- Combine table and column tests: Ensure both structural and content quality
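One way to pick realistic thresholds is to derive them from history. The hypothetical helper below suggests row-count bounds as the mean plus or minus k sample standard deviations of past observations; this is just one heuristic, and the resulting pair would be supplied as the min/max of a row-count test, whose exact parameter names are not shown here.

```python
import math
import statistics
from typing import Sequence, Tuple


def suggest_row_count_bounds(history: Sequence[float], k: float = 3.0) -> Tuple[int, int]:
    """Suggest (min, max) row-count bounds as mean +/- k sample standard deviations.

    history: row counts observed on past runs (at least two values).
    k: how many standard deviations of slack to allow on each side.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    if k < 0:
        raise ValueError("k must be non-negative")
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)           # sample standard deviation
    low = max(0, math.floor(mean - k * stdev))  # row counts cannot be negative
    high = math.ceil(mean + k * stdev)
    return low, high
```

This assumes roughly stable volumes; for tables that grow steadily, compute the bounds over a recent window of runs instead of the full history.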
Using External Secrets Managers
Important Note
If your OpenMetadata instance uses database-stored credentials (the default configuration), you do not need to follow this guide. The SDK will automatically retrieve and decrypt credentials.
This guide is only necessary when your organization uses an external secrets manager for credential storage.
Why This is Required
The TestRunner API executes data quality tests directly from your Python code (e.g., within your ETL pipelines). To connect to your data sources, it needs to:
- Retrieve the service connection configuration from OpenMetadata
- Decrypt the credentials stored in your secrets manager
- Establish a connection to the data source
- Execute the test cases
Without proper secrets manager configuration, the SDK cannot decrypt credentials and will fail to connect to your data sources.
General Setup Steps
1. Contact your OpenMetadata/Collate administrator to obtain:
   - The secrets manager type (AWS, Azure, GCP, etc.)
   - The secrets manager loader configuration
   - Required environment variables or configuration files
   - Any additional setup (IAM roles, service principals, etc.)
2. Install the required dependencies for your secrets manager provider
3. Configure environment variables with access credentials
4. Initialize the SecretsManagerFactory before using TestRunner
5. Configure the SDK and run your tests
Example using AWS Secrets Manager
Required Dependencies:
Example Configuration:
Configuration by Provider
AWS Secrets Manager and AWS Parameter Store
OpenMetadata's ingestion extras: aws (e.g., pip install 'openmetadata-ingestion[aws]')
SecretsManagerProvider (one of):
- SecretsManagerProvider.aws
- SecretsManagerProvider.managed_aws
- SecretsManagerProvider.aws_ssm
- SecretsManagerProvider.managed_aws_ssm
Environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
Azure Key Vault
OpenMetadata's ingestion extras: azure (e.g., pip install 'openmetadata-ingestion[azure]')
SecretsManagerProvider (one of):
- SecretsManagerProvider.azure_kv
- SecretsManagerProvider.managed_azure_kv
Environment variables:
- AZURE_CLIENT_ID
- AZURE_CLIENT_SECRET
- AZURE_TENANT_ID
- AZURE_KEY_VAULT_NAME
Google Cloud Secret Manager
OpenMetadata's ingestion extras: gcp (e.g., pip install 'openmetadata-ingestion[gcp]')
SecretsManagerProvider: SecretsManagerProvider.gcp
Environment variables:
- GOOGLE_APPLICATION_CREDENTIALS: path to the credentials JSON file
- GCP_PROJECT_ID
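Before initializing the secrets manager, a quick preflight check can catch missing environment variables early. The sketch below checks the variables listed above for each provider. The keys `aws`, `azure` and `gcp` are hypothetical labels, not SecretsManagerProvider values, and the check ignores credentials supplied by other means (IAM roles, managed identities), so a reported missing variable is not necessarily an error in those setups.

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variables listed above for each provider family
# (the aws_ssm and managed_* variants use the same variables).
REQUIRED_ENV: Dict[str, List[str]] = {
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"],
    "azure": ["AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID", "AZURE_KEY_VAULT_NAME"],
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID"],
}


def missing_env_vars(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for `provider`.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    try:
        required = REQUIRED_ENV[provider]
    except KeyError:
        raise ValueError(f"unknown provider family {provider!r}") from None
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars("aws")` at pipeline start and failing fast on a non-empty list turns a vague "Cannot decrypt service connection" later on into an explicit message about which variable is absent.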
Troubleshooting
Error: "Cannot decrypt service connection"
Cause: Secrets manager not initialized or misconfigured
Solution: Ensure SecretsManagerFactory is initialized before calling configure() or creating the TestRunner
Error: "Access Denied" or "Unauthorized"
Cause: Insufficient permissions to access secrets
Solution:
- Verify IAM role/service principal has correct permissions
- Check credentials are valid and not expired
- Ensure correct region/vault name is specified
Error: "Module not found" for secrets manager
Cause: Missing dependencies for your secrets manager
Solution: Install required extras:
Tests Fail with Connection Errors
Cause: Credentials not properly decrypted or secrets manager misconfigured
Solution:
- Verify secrets manager provider matches your OpenMetadata backend configuration
- Test credential access independently (e.g., using AWS CLI, Azure CLI, gcloud)
- Check network connectivity to secrets manager service
- Enable debug logging to see detailed error messages:
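Besides the `log_level` option of `setup()` listed in the configuration table, standard Python logging can be turned up directly. This is a minimal stdlib sketch, assuming the SDK logs through loggers whose names start with `metadata` (its package name); the helper name is hypothetical, and child loggers that set their own level explicitly are not affected.

```python
import logging


def enable_debug_logging(logger_name: str = "metadata") -> logging.Logger:
    """Lower `logger_name` (and children without their own level) to DEBUG.

    Also makes sure some handler exists so the records are actually printed.
    """
    # basicConfig adds a stderr handler to the root logger, but is a no-op
    # if the root logger already has handlers configured.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger
```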
Contact Your Administrator
If you're unsure about:
- Which secrets manager your organization uses
- Required environment variables or configuration
- Access credentials or IAM roles
- Permissions needed
Contact your OpenMetadata or Collate administrator for the specific configuration required in your environment.
Next Steps
- Learn about DataFrame Validation for validating transformations
- Review the Test Definitions Reference for all available tests
- Explore Advanced Usage including YAML workflows