OpenMetadata Documentation

Getting Started with Data Quality as Code

This guide will help you install the OpenMetadata Python SDK and configure authentication to start running data quality tests programmatically.

Before you begin, ensure you have:

  • Python 3.10 or higher installed
  • pip package manager
  • Access to an OpenMetadata instance (version 1.11.0 or later)
  • A JWT token for authentication (see Authentication below)
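The first two prerequisites can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `pip_is_available` are illustrative and are not part of the OpenMetadata SDK.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the prerequisites


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pip_is_available() -> bool:
    """Return True when this interpreter can locate the pip package."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("pip available:", pip_is_available())
```

Access to the OpenMetadata instance and the JWT token still have to be verified separately.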

Install the openmetadata-ingestion package with the necessary extras for your use case:

Install additional dependencies based on the databases you'll be testing:

If you plan to use DataFrame validation features:

Combine multiple extras as needed:
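As a rough illustration of how extras combine into a single requirement, the sketch below builds a `pip install` argument list for the `openmetadata-ingestion` package. The helper `pip_install_command` is hypothetical, and the extras names used in the example (`mysql`, `postgres`) are assumptions; check the package documentation for the extras that match your databases and for the DataFrame validation extra.

```python
import sys

PACKAGE = "openmetadata-ingestion"


def pip_install_command(extras=(), min_version=None):
    """Build a pip argument list for the package plus optional extras."""
    requirement = PACKAGE
    if extras:
        # Extras are comma-separated inside square brackets.
        requirement += "[" + ",".join(extras) + "]"
    if min_version:
        requirement += f">={min_version}"
    # Using the current interpreter installs into the active environment.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # Extras names below are illustrative assumptions.
    print(pip_install_command())
    print(pip_install_command(["mysql", "postgres"]))
```

Passing the returned list to `subprocess.run(..., check=True)` avoids shell quoting problems with the square brackets.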

Data Quality as Code requires authentication with your OpenMetadata instance. The SDK supports JWT token authentication.

You can obtain a JWT token in two ways:

OpenMetadata provides pre-configured bots like the ingestion-bot:

  1. Log in to your OpenMetadata instance
  2. Navigate to Settings > Bots
  3. Find the ingestion-bot (or create a new bot)
  4. Copy the JWT token

Obtain Bot JWT Token from OpenMetadata UI

For production use, create a dedicated bot with specific permissions:

  1. Go to Settings > Bots
  2. Click Add Bot
  3. Provide a name and description
  4. Assign appropriate roles (typically DefaultBotPolicy and Ingestion Bot Policy)
  5. Copy the generated JWT token

Once you have a JWT token, configure the SDK in your Python code:
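A minimal sketch of that call is shown here, wrapped in a hypothetical helper named `configure_openmetadata` that rejects empty values first. The `host` and `jwt_token` parameter names come from this guide; the import path `metadata.sdk` is an assumption and should be checked against your installed SDK version.

```python
def configure_openmetadata(host: str, jwt_token: str) -> None:
    """Validate the inputs, then pass them to the SDK's configure()."""
    if not host or not jwt_token:
        raise ValueError("both host and jwt_token must be non-empty")
    # Assumption: configure() is importable from metadata.sdk.
    # The import is deferred so the validation above works even
    # when the SDK is not installed yet.
    from metadata.sdk import configure

    configure(host=host, jwt_token=jwt_token)
```

A call would look like `configure_openmetadata("http://localhost:8585/api", "your-jwt-token")`, where the token string is a placeholder for the value copied from the bot page.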

For better security, keep credentials out of your source code and let configure() read the host and token from environment variables:

Set the environment variables before running your script:

The configure() function accepts the following parameters:

| Parameter | Type | Required | Description | Environment Variable |
| --- | --- | --- | --- | --- |
| host | str | No | OpenMetadata API URL (e.g., http://localhost:8585/api) | OPENMETADATA_HOST |
| jwt_token | str | No | JWT authentication token | OPENMETADATA_JWT_TOKEN |
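When relying on environment variables, a small pre-flight check can report which ones are missing before the script reaches the SDK. This sketch uses only the standard library and the two variable names listed above; the helper `missing_env_vars` is illustrative and not part of the SDK.

```python
import os

# Parameter name -> environment variable, as listed in the table above.
ENV_VARS = {
    "host": "OPENMETADATA_HOST",
    "jwt_token": "OPENMETADATA_JWT_TOKEN",
}


def missing_env_vars(environ=os.environ):
    """Return the expected variable names that are unset or empty."""
    return [name for name in ENV_VARS.values() if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Set these before running your script:", ", ".join(missing))
    else:
        print("Environment variables are set.")
```

Accepting the mapping as an argument keeps the check easy to exercise with a plain dictionary instead of the real environment.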

Create a simple test to verify your setup:

Replace "your_service.database.schema.table" with the fully qualified name of an actual table in your OpenMetadata instance.
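The placeholder follows a four-part pattern: service, database, schema, and table. The sketch below checks that shape before the name is used. It is a simplification that assumes no part contains a dot itself, and `parse_table_fqn` is a hypothetical helper, not an SDK function.

```python
from typing import NamedTuple


class TableFqn(NamedTuple):
    service: str
    database: str
    schema: str
    table: str


def parse_table_fqn(fqn: str) -> TableFqn:
    """Split a table name of the form service.database.schema.table."""
    parts = fqn.split(".")
    # Simplification: names that contain dots are not handled here.
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected service.database.schema.table, got {fqn!r}"
        )
    return TableFqn(*parts)


if __name__ == "__main__":
    print(parse_table_fqn("your_service.database.schema.table"))
```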

Now that you're set up, let's run your first data quality test:

If you experience connection timeouts, verify:

  1. OpenMetadata instance is running and accessible
  2. API URL is correct (should end with /api)
  3. Network connectivity between your script and OpenMetadata
  4. Firewall rules allow the connection
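Some of the checks above can be scripted. The following sketch, using only the standard library, validates the shape of the API URL and attempts a plain TCP connection to the host and port. The helper names are illustrative, and a successful TCP connection only shows that the port is reachable, not that OpenMetadata is healthy.

```python
import socket
from urllib.parse import urlparse


def api_url_problems(url: str) -> list:
    """Return human-readable problems found in the API URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL should start with http:// or https://")
    elif not parsed.hostname:
        problems.append("URL has no host name")
    if not parsed.path.rstrip("/").endswith("/api"):
        problems.append("URL should end with /api")
    return problems


def can_reach(url: str, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if not parsed.hostname:
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    url = "http://localhost:8585/api"  # example URL from this guide
    print("URL problems:", api_url_problems(url))
    print("Reachable:", can_reach(url))
```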

If you encounter import errors:

Verify the package is installed correctly:
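One way to do this from Python is with the standard library's `importlib.metadata`, which reports the installed version of a distribution. This is a minimal sketch; the helper name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(distribution: str = "openmetadata-ingestion"):
    """Return the installed version string, or None when not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("openmetadata-ingestion is not installed in this environment")
    else:
        print("openmetadata-ingestion", version)
```

Run it with the same interpreter that runs your tests, since each virtual environment has its own set of installed packages.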

If not listed, reinstall:

Now that you have the SDK installed and configured: