OpenMetadata Documentation

DataFrame validation API.

Classes:

Attributes:

Facade for DataFrame data quality validation.

Provides a simple interface to configure and execute data quality tests on pandas DataFrames using OpenMetadata test definitions.

Examples:

    validator = DataFrameValidator()
    validator.add_test(ColumnValuesToBeNotNull(column="email"))
    validator.add_test(ColumnValuesToBeUnique(column="customer_id"))

    result = validator.validate(df, mode=FailureMode.ShortCircuit)
    if not result.success:
        print(f"Validation failed: {result.failures}")

Functions:

Add a single test definition to be executed.

Parameters:

  • test (BaseTest) – Test definition (e.g., ColumnValuesToBeNotNull)

Add multiple test definitions at once.

Parameters:

  • *tests (BaseTest) – Variable number of test definitions
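The relationship between the two registration calls can be pictured with a minimal stand-in. `ToyValidator` below is hypothetical and is not the SDK class; the method name `add_tests` is an assumption inferred from the `*tests` parameter, and plain strings stand in for test definitions. The sketch only shows a variadic method delegating to the single-test one.

```python
class ToyValidator:
    """Hypothetical stand-in, not the SDK's DataFrameValidator."""

    def __init__(self):
        self.tests = []  # test definitions, kept in registration order

    def add_test(self, test):
        # Register one test definition.
        self.tests.append(test)

    def add_tests(self, *tests):
        # Variadic form (assumed name): register each positional argument in order.
        for test in tests:
            self.add_test(test)


validator = ToyValidator()
validator.add_test("not_null:email")
validator.add_tests("unique:customer_id", "not_null:customer_id")
registered = list(validator.tests)
```

Under these assumptions, tests end up in the order they were registered, regardless of which method added them.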

Execute all configured tests on each DataFrame in an iterable and invoke the callbacks.

Useful for running validation chunk by chunk.

Parameters:

  • data (Iterable) – An iterable of pandas DataFrames
  • on_success (ValidatorCallback) – Callback to execute after successful validation
  • on_failure (ValidatorCallback) – Callback to execute after failed validation
  • mode (FailureMode) – Validation mode (FailureMode.ShortCircuit stops on first failure)

Returns:

  • ValidationResult – Merged result aggregating the validations of all batches
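The chunked flow described above can be sketched without the SDK. Everything in this block is a hypothetical stand-in: `ToyResult` mimics only the `success` and `failures` fields seen in the earlier example, lists of dicts stand in for DataFrames so the sketch runs without pandas, the `(chunk, result)` callback signature is an assumption, and so is the choice to stop consuming chunks after the first failing one in short-circuit mode.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResult:
    """Hypothetical stand-in for ValidationResult."""
    success: bool
    failures: list = field(default_factory=list)


def not_null(column):
    # Build a check that returns a failure message, or None when the chunk passes.
    def check(rows):
        missing = sum(1 for row in rows if row.get(column) is None)
        return f"{column}: {missing} null value(s)" if missing else None
    return check


def run_chunks(chunks, checks, on_success, on_failure, short_circuit=True):
    merged = ToyResult(success=True)
    for chunk in chunks:
        failures = [msg for msg in (check(chunk) for check in checks) if msg]
        result = ToyResult(success=not failures, failures=failures)
        # One callback per chunk, chosen by that chunk's outcome.
        (on_success if result.success else on_failure)(chunk, result)
        # Fold the per-chunk result into the merged one.
        merged.success = merged.success and result.success
        merged.failures.extend(result.failures)
        if short_circuit and not result.success:
            break  # assumed behavior: later chunks are not validated
    return merged


chunks = [
    [{"email": "a@example.com"}, {"email": "b@example.com"}],
    [{"email": None}],
    [{"email": "c@example.com"}],
]
events = []
merged = run_chunks(
    chunks,
    checks=[not_null("email")],
    on_success=lambda chunk, result: events.append("ok"),
    on_failure=lambda chunk, result: events.append("failed"),
)
```

Because `chunks` can be any iterable, a generator that yields batches lazily would work the same way, which is the main appeal of validating chunk by chunk.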

Execute all configured tests on the DataFrame.

Parameters:

  • df (DataFrame) – DataFrame to validate
  • mode (FailureMode) – Validation mode (FailureMode.ShortCircuit stops on first failure)

Returns:

  • ValidationResult – Result with the outcomes of all tests
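To make the effect of the `mode` parameter concrete, here is a rough sketch of short-circuit evaluation. It does not use the SDK: `ToyFailureMode` and `validate` are hypothetical stand-ins, `RunAll` is an invented member included only for contrast (the documentation above names only `FailureMode.ShortCircuit`), and lists of dicts stand in for a DataFrame.

```python
from enum import Enum


class ToyFailureMode(Enum):
    ShortCircuit = "short_circuit"  # mirrors the mode named in the docs
    RunAll = "run_all"              # hypothetical, for contrast only


def validate(rows, checks, mode):
    """Return (success, failures, executed) for a list of row dicts."""
    failures, executed = [], []
    for name, check in checks:
        executed.append(name)
        if not check(rows):
            failures.append(name)
            if mode is ToyFailureMode.ShortCircuit:
                break  # remaining tests are skipped
    return (not failures, failures, executed)


rows = [
    {"customer_id": 1, "email": None},
    {"customer_id": 1, "email": "a@example.com"},
]
checks = [
    ("email_not_null", lambda rs: all(r["email"] is not None for r in rs)),
    ("customer_id_unique", lambda rs: len({r["customer_id"] for r in rs}) == len(rs)),
]

short = validate(rows, checks, ToyFailureMode.ShortCircuit)
full = validate(rows, checks, ToyFailureMode.RunAll)
```

In this sketch the short-circuit run reports only the first failing test and never executes the uniqueness check, while the contrasting mode executes and reports both.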