Test Definitions Reference

This page provides a complete reference for all data quality test definitions available in the OpenMetadata Python SDK. Tests are organized into two categories: Table-Level Tests and Column-Level Tests.

All test definitions are available from the metadata.sdk.data_quality module:
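
For example, importing one table-level and one column-level test (the class names below, and in the examples throughout this page, are illustrative names derived from the OpenMetadata test definition names; check the module's actual exports for the exact spelling):

```python
# Illustrative imports -- substitute the exact names exported by the SDK.
from metadata.sdk.data_quality import (
    TableRowCountToBeBetween,  # a table-level test
    ColumnValuesToBeNotNull,   # a column-level test
)
```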

All test definitions support these optional parameters:

  • name (str): Unique identifier for the test case
  • display_name (str): Human-readable name shown in the UI
  • description (str): Detailed description of what the test validates

Column tests additionally require:

  • column (str, required): Name of the column to test

Table-level tests validate properties of entire tables, such as row counts, column counts, or custom SQL queries.

Validates that the number of rows in a table falls within a specified range.

Parameters:

  • min_count (int, optional): Minimum acceptable number of rows
  • max_count (int, optional): Maximum acceptable number of rows

Example:
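
A minimal sketch using the illustrative TableRowCountToBeBetween class name:

```python
from metadata.sdk.data_quality import TableRowCountToBeBetween  # illustrative name

# Expect the orders table to hold between 1,000 and 1,000,000 rows.
test = TableRowCountToBeBetween(
    name="orders_row_count",
    min_count=1_000,
    max_count=1_000_000,
)
```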

Use Cases:

  • Monitor data volume and detect data loss
  • Validate expected data growth patterns
  • Detect unexpected data surges

Validates that the table has an exact number of rows.

Parameters:

  • row_count (int, required): Expected number of rows

Example:
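
A minimal sketch using the illustrative TableRowCountToEqual class name:

```python
from metadata.sdk.data_quality import TableRowCountToEqual  # illustrative name

# A country reference table should always contain exactly 249 rows.
test = TableRowCountToEqual(
    name="countries_exact_row_count",
    row_count=249,
)
```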

Use Cases:

  • Validate fixed-size reference tables
  • Ensure complete dimension table loads
  • Verify static lookup tables

Validates that the number of columns in a table falls within a specified range.

Parameters:

  • min_count (int, optional): Minimum acceptable number of columns
  • max_count (int, optional): Maximum acceptable number of columns

Example:
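
A minimal sketch using the illustrative TableColumnCountToBeBetween class name:

```python
from metadata.sdk.data_quality import TableColumnCountToBeBetween  # illustrative name

# The table schema should expose between 10 and 20 columns.
test = TableColumnCountToBeBetween(
    name="orders_column_count_range",
    min_count=10,
    max_count=20,
)
```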

Use Cases:

  • Schema validation
  • Detect unexpected column additions or removals
  • Monitor schema evolution

Validates that the table has an exact number of columns.

Parameters:

  • column_count (int, required): Expected number of columns

Example:
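
A minimal sketch using the illustrative TableColumnCountToEqual class name:

```python
from metadata.sdk.data_quality import TableColumnCountToEqual  # illustrative name

# The table schema should contain exactly 12 columns.
test = TableColumnCountToEqual(
    name="orders_exact_column_count",
    column_count=12,
)
```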

Use Cases:

  • Strict schema validation
  • Ensure schema stability
  • Prevent schema drift

Validates that a specific column exists in the table schema.

Parameters:

  • column_name (str, required): Name of the column that must exist

Example:
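
A minimal sketch using the illustrative TableColumnNameToExist class name:

```python
from metadata.sdk.data_quality import TableColumnNameToExist  # illustrative name

# The customer_id column must be present in the schema.
test = TableColumnNameToExist(
    name="customer_id_exists",
    column_name="customer_id",
)
```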

Use Cases:

  • Verify required columns are present
  • Ensure critical columns aren't dropped
  • Validate schema migrations

Validates that table columns match an expected set of column names.

Parameters:

  • column_names (list[str], required): List of expected column names
  • ordered (bool, optional): If True, column order must match exactly (default: False)

Example:
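
A minimal sketch using the illustrative TableColumnToMatchSet class name:

```python
from metadata.sdk.data_quality import TableColumnToMatchSet  # illustrative name

# The schema must contain exactly these columns, in this order.
test = TableColumnToMatchSet(
    name="orders_schema_check",
    column_names=["order_id", "customer_id", "order_date", "total_amount"],
    ordered=True,
)
```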

Use Cases:

  • Validate complete schema structure
  • Ensure schema consistency across environments
  • Detect unexpected schema changes

Validates that the number of rows inserted during a recent time window falls within a specified range.

Parameters:

  • min_count (int, optional): Minimum acceptable number of inserted rows
  • max_count (int, optional): Maximum acceptable number of inserted rows
  • range_type (str, optional): Time unit ("HOUR", "DAY", "WEEK", "MONTH") (default: "DAY")
  • range_interval (int, optional): Number of time units to look back (default: 1)

Example:
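
A minimal sketch using the illustrative TableRowInsertedCountToBeBetween class name:

```python
from metadata.sdk.data_quality import TableRowInsertedCountToBeBetween  # illustrative name

# Expect between 10,000 and 500,000 rows inserted over the last 1 DAY.
test = TableRowInsertedCountToBeBetween(
    name="daily_ingestion_volume",
    min_count=10_000,
    max_count=500_000,
    range_type="DAY",
    range_interval=1,
)
```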

Use Cases:

  • Monitor data ingestion rates
  • Detect ingestion pipeline failures
  • Validate ETL job completions

Validates data using a custom SQL query expression.

Parameters:

  • sql_expression (str, required): SQL query to execute
  • strategy (str, optional): "ROWS" (each row returned by the query counts as a failure) or "COUNT" (the query returns the number of failing rows) (default: "ROWS")

Example:
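
A minimal sketch using the illustrative TableCustomSQLQuery class name:

```python
from metadata.sdk.data_quality import TableCustomSQLQuery  # illustrative name

# With the ROWS strategy, any row returned by the query counts as a failure.
test = TableCustomSQLQuery(
    name="orders_without_customers",
    sql_expression="""
        SELECT o.order_id
        FROM orders o
        LEFT JOIN customers c ON o.customer_id = c.customer_id
        WHERE c.customer_id IS NULL
    """,
    strategy="ROWS",
)
```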

Use Cases:

  • Implement custom business logic validation
  • Validate referential integrity
  • Check complex data relationships

Compares two tables and identifies differences in their data.

Parameters:

  • table2 (str, required): Fully qualified name of the comparison table
  • key_columns (list[str], optional): Columns to use as join keys
  • table2_key_columns (list[str], optional): Join key columns from table 2
  • use_columns (list[str], optional): Specific columns to compare
  • extra_columns (list[str], optional): Additional columns to include in output
  • table2_extra_columns (list[str], optional): Additional columns from table 2

Example:
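
A minimal sketch using the illustrative TableDiff class name:

```python
from metadata.sdk.data_quality import TableDiff  # illustrative name

# Compare the current table against a staging copy, joining on order_id.
test = TableDiff(
    name="orders_prod_vs_staging",
    table2="staging_service.ecommerce.public.orders",  # hypothetical fully qualified name
    key_columns=["order_id"],
    use_columns=["status", "total_amount"],
)
```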

Use Cases:

  • Validate data migrations
  • Verify data replication
  • Compare production vs staging data

Column-level tests validate properties of specific columns.

Validates that a column contains no null or missing values.

Parameters:

  • column (str, required): Name of the column to validate

Example:
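
A minimal sketch using the illustrative ColumnValuesToBeNotNull class name:

```python
from metadata.sdk.data_quality import ColumnValuesToBeNotNull  # illustrative name

# The email column must never contain nulls.
test = ColumnValuesToBeNotNull(
    name="email_not_null",
    column="email",
)
```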

Use Cases:

  • Ensure required fields are populated
  • Validate data completeness
  • Enforce NOT NULL constraints

Validates that all values in a column are unique with no duplicates.

Parameters:

  • column (str, required): Name of the column to validate

Example:
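
A minimal sketch using the illustrative ColumnValuesToBeUnique class name:

```python
from metadata.sdk.data_quality import ColumnValuesToBeUnique  # illustrative name

# order_id acts as a primary key and must be unique.
test = ColumnValuesToBeUnique(
    name="order_id_unique",
    column="order_id",
)
```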

Use Cases:

  • Validate primary keys
  • Ensure unique identifiers
  • Detect duplicate records

Validates that all values in a column belong to a specified set.

Parameters:

  • column (str, required): Name of the column to validate
  • allowed_values (list[str], required): List of acceptable values

Example:
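
A minimal sketch using the illustrative ColumnValuesToBeInSet class name:

```python
from metadata.sdk.data_quality import ColumnValuesToBeInSet  # illustrative name

# status must be one of the allowed order states.
test = ColumnValuesToBeInSet(
    name="status_allowed_values",
    column="status",
    allowed_values=["pending", "shipped", "delivered", "cancelled"],
)
```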

Use Cases:

  • Validate enum values
  • Enforce categorical constraints
  • Validate lookup values

Validates that column values do not contain any forbidden values.

Parameters:

  • column (str, required): Name of the column to validate
  • forbidden_values (list[str], required): List of values that must not appear

Example:
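
A minimal sketch using the illustrative ColumnValuesToBeNotInSet class name:

```python
from metadata.sdk.data_quality import ColumnValuesToBeNotInSet  # illustrative name

# Placeholder and test values must never appear in the email column.
test = ColumnValuesToBeNotInSet(
    name="email_no_placeholders",
    column="email",
    forbidden_values=["test@example.com", "N/A", "unknown"],
)
```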

Use Cases:

  • Detect test data in production
  • Blacklist invalid values
  • Filter out placeholder values

Validates that column values match a specified regular expression pattern.

Parameters:

  • column (str, required): Name of the column to validate
  • regex (str, required): Regular expression pattern

Example:
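
A minimal sketch using the illustrative ColumnValuesToMatchRegex class name:

```python
from metadata.sdk.data_quality import ColumnValuesToMatchRegex  # illustrative name

# Every email value must match a simple address pattern.
test = ColumnValuesToMatchRegex(
    name="email_format",
    column="email",
    regex=r"^[\w.+-]+@[\w-]+\.[\w.]+$",
)
```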

Use Cases:

  • Validate data format consistency
  • Ensure pattern compliance
  • Detect malformed data

Validates that column values do not match a forbidden regular expression pattern.

Parameters:

  • column (str, required): Name of the column to validate
  • regex (str, required): Regular expression pattern that values must NOT match

Example:
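
A minimal sketch using the illustrative ColumnValuesToNotMatchRegex class name:

```python
from metadata.sdk.data_quality import ColumnValuesToNotMatchRegex  # illustrative name

# Reject values that look like internal test accounts.
test = ColumnValuesToNotMatchRegex(
    name="email_no_test_accounts",
    column="email",
    regex=r"^test.*@",
)
```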

Use Cases:

  • Detect test data patterns
  • Prevent specific formats
  • Identify security risks

Validates that all values in a column fall within a specified numeric range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable value
  • max_value (float, optional): Maximum acceptable value

Example:
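
A minimal sketch using the illustrative ColumnValuesToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValuesToBeBetween  # illustrative name

# Order totals must stay between 0 and 100,000.
test = ColumnValuesToBeBetween(
    name="total_amount_range",
    column="total_amount",
    min_value=0,
    max_value=100_000,
)
```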

Use Cases:

  • Validate numeric constraints
  • Detect outliers
  • Ensure value ranges

Validates that the maximum value in a column falls within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable maximum value
  • max_value (float, optional): Maximum acceptable maximum value

Example:
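
A minimal sketch using the illustrative ColumnValueMaxToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValueMaxToBeBetween  # illustrative name

# The largest discount should land between 50 and 90 percent.
test = ColumnValueMaxToBeBetween(
    name="discount_max_bounds",
    column="discount_pct",
    min_value=50,
    max_value=90,
)
```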

Use Cases:

  • Monitor data ranges
  • Detect upper outliers
  • Validate maximum constraints

Validates that the minimum value in a column falls within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable minimum value
  • max_value (float, optional): Maximum acceptable minimum value

Example:
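
A minimal sketch using the illustrative ColumnValueMinToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValueMinToBeBetween  # illustrative name

# The smallest order total should never fall below 0 or exceed 10.
test = ColumnValueMinToBeBetween(
    name="total_amount_min_bounds",
    column="total_amount",
    min_value=0,
    max_value=10,
)
```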

Use Cases:

  • Monitor lower bounds
  • Detect lower outliers
  • Validate minimum constraints

Validates that the mean (average) value falls within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable mean value
  • max_value (float, optional): Maximum acceptable mean value

Example:
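
A minimal sketch using the illustrative ColumnValueMeanToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValueMeanToBeBetween  # illustrative name

# The average order total is expected to stay between 40 and 80.
test = ColumnValueMeanToBeBetween(
    name="total_amount_mean",
    column="total_amount",
    min_value=40,
    max_value=80,
)
```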

Use Cases:

  • Statistical validation
  • Detect data drift
  • Monitor averages

Validates that the median value falls within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable median value
  • max_value (float, optional): Maximum acceptable median value

Example:
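
A minimal sketch using the illustrative ColumnValueMedianToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValueMedianToBeBetween  # illustrative name

# The median order total is expected to stay between 30 and 60.
test = ColumnValueMedianToBeBetween(
    name="total_amount_median",
    column="total_amount",
    min_value=30,
    max_value=60,
)
```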

Use Cases:

  • Robust central tendency checks
  • Detect skewed distributions
  • Monitor typical values

Validates that the standard deviation falls within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable standard deviation
  • max_value (float, optional): Maximum acceptable standard deviation

Example:
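
A minimal sketch using the illustrative ColumnValueStdDevToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValueStdDevToBeBetween  # illustrative name

# Variability of order totals should stay within an expected band.
test = ColumnValueStdDevToBeBetween(
    name="total_amount_stddev",
    column="total_amount",
    min_value=5,
    max_value=25,
)
```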

Use Cases:

  • Detect unexpected variability
  • Monitor data consistency
  • Validate distribution stability

Validates that the sum of all values falls within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_value (float, optional): Minimum acceptable sum
  • max_value (float, optional): Maximum acceptable sum

Example:
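
A minimal sketch using the illustrative ColumnValuesSumToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValuesSumToBeBetween  # illustrative name

# Total revenue (sum of order totals) should land between 100K and 5M.
test = ColumnValuesSumToBeBetween(
    name="revenue_total_bounds",
    column="total_amount",
    min_value=100_000,
    max_value=5_000_000,
)
```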

Use Cases:

  • Validate totals
  • Monitor aggregates
  • Detect unexpected volumes

Validates that the number of missing or null values matches an expected count.

Parameters:

  • column (str, required): Name of the column to validate
  • missing_count_value (int, optional): Expected number of missing values
  • missing_value_match (list[str], optional): Additional strings to treat as missing

Example:
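
A minimal sketch using the illustrative ColumnValuesMissingCount class name:

```python
from metadata.sdk.data_quality import ColumnValuesMissingCount  # illustrative name

# Allow no missing phone numbers, treating "N/A" and empty strings as missing too.
test = ColumnValuesMissingCount(
    name="phone_missing_count",
    column="phone_number",
    missing_count_value=0,
    missing_value_match=["N/A", ""],
)
```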

Use Cases:

  • Monitor data completeness
  • Track missing data patterns
  • Validate optional fields

Validates that string lengths fall within a specified range.

Parameters:

  • column (str, required): Name of the column to validate
  • min_length (int, optional): Minimum acceptable string length
  • max_length (int, optional): Maximum acceptable string length

Example:
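
A minimal sketch using the illustrative ColumnValueLengthsToBeBetween class name:

```python
from metadata.sdk.data_quality import ColumnValueLengthsToBeBetween  # illustrative name

# Country codes must be 2 characters; allow up to 3 for legacy values.
test = ColumnValueLengthsToBeBetween(
    name="country_code_length",
    column="country_code",
    min_length=2,
    max_length=3,
)
```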

Use Cases:

  • Validate string constraints
  • Prevent truncation
  • Ensure format compliance

Validates that a specific value appears at an expected row position.

Parameters:

  • column (str, required): Name of the column to validate
  • expected_value (str, required): The exact value expected
  • row_index (int, optional): Zero-based row position (default: 0)

Example:
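
A minimal sketch; ColumnValueToBeAtExpectedRow is a placeholder name for the SDK's row-position test:

```python
from metadata.sdk.data_quality import ColumnValueToBeAtExpectedRow  # placeholder name

# In a ranking table sorted by score, the top row should be the "gold" tier.
test = ColumnValueToBeAtExpectedRow(
    name="top_rank_is_gold",
    column="tier",
    expected_value="gold",
    row_index=0,
)
```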

Use Cases:

  • Validate sorted data
  • Check ordered results
  • Verify specific positions

All tests support customization through fluent methods:
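
A sketch of the fluent style; the with_* method names below are placeholders for whatever builder methods the SDK actually exposes:

```python
from metadata.sdk.data_quality import TableRowCountToBeBetween  # illustrative name

# Hypothetical builder-style configuration; substitute the SDK's real fluent methods.
test = (
    TableRowCountToBeBetween(min_count=1_000, max_count=1_000_000)
    .with_name("orders_row_count")
    .with_description("Orders table should stay between 1K and 1M rows")
)
```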

Or pass values directly to the constructor:
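
The equivalent constructor form, again with an illustrative class name:

```python
from metadata.sdk.data_quality import TableRowCountToBeBetween  # illustrative name

# All values supplied up front as keyword arguments.
test = TableRowCountToBeBetween(
    name="orders_row_count",
    description="Orders table should stay between 1K and 1M rows",
    min_count=1_000,
    max_count=1_000_000,
)
```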