how-to-guides

No menu items for this category

Tests in the YAML Config

Here you can see all the supported tests definitions and how to configure them in the YAML config file.

A Test Definition is a generic definition of a test. This Test Definition then gets specified in a Test Case. This Test Case is where the parameter(s) of a Test Definition are specified.

In this section, you will learn what tests we currently support and how to configure them in the YAML/JSON config file.

Tests applied on top of a Table. Here is the list of all table tests:

Validate the total row count in the table is equal to the given value.

Dimension: Integrity

Properties:

  • value: Expected number of rows.

Behavior

ConditionStatus
value match the number of rows in the tableSuccess ✅
value does not match the number of rows in the tableFailed ❌

YAML Config

JSON Config

Validate the total row count is within a given range of values.

Dimension: Integrity

Properties:

  • minValue: Lower bound of the interval. If informed, the number of rows should be bigger than this number.
  • maxValue: Upper bound of the interval. If informed, the number of rows should be lower than this number.

Any of those two need to be informed.

Behavior

ConditionStatus
The number of rows in the table is between minValue and maxValueSuccess ✅
The number of rows in the table is not between minValue and maxValueFailed ❌

YAML Config

JSON Config

Validate that the number of columns in a table is equal to a given value.

Dimension: Integrity

Properties

  • columnCount: Expected number of columns.

Behavior

ConditionStatus
columnCount matches the number of column in the tableSuccess ✅
columnCount does not matches the number of column in the tableFailed ❌

YAML Config

JSON Config

Validate the number of columns in a table is between the given value

Dimension: Integrity

Properties

  • minColValue: lower bound
  • maxColValue: upper bound

Behavior

ConditionStatus
The number of columns in the table is between minColValue and maxColValueSuccess ✅
The number of columns in the table is not between minColValue and maxColValueFailed ❌

YAML Config

JSON Config

Validate a column name is present in the table

Dimension: Integrity

Properties

  • columnName: the name of the column to check for

Behavior

ConditionStatus
columnName exists in the set of column name for the tableSuccess ✅
columnName does not exists in the set of column name for the tableFailed ❌

YAML Config

JSON Config

Validate a list of table column name matches an expected set of columns

Dimension: Integrity

Properties

  • columnNames: comma separated string of column name
  • ordered: whether the test should check for column ordering. Default to False

Behavior

ConditionStatus
[ordered=False] columnNames matches the list of column names in the table regardless of the orderSuccess ✅
[ordered=True] columnNames matches the list of column names in the table in the corresponding order (e.g. ["a","b"] == ["a","b"]Success ✅
[ordered=FALSE] columnNames does no match the list of column names in the table regardless of the orderFailed ❌
[ordered=True] columnNames does no match the list of column names in the table and/or the corresponding order (e.g. ["a","b"] != ["b","a"]Failed ❌

YAML Config

JSON Config

Write you own SQL test. When writting your query you can use 2 strategies:

  • ROWS (default): expects the query to be written as SELECT <field>, <field> FROM <foo> WHERE <condition>. Note if your query returns a large amount of rows it might cause an "Out Of Memeory" error. In this case we recomend you to use the COUNT strategy.
  • COUNT: expects the query to be written as SELECT COUNT(<field>) FROM <foo> WHERE <condition>.

How to use the Threshold Parameter? The threshold allows you to define a limit for which you test should pass or fail - by defaut this number is 0. For example if my custom SQL query test returns 10 rows (or a COUNT value of 10) and my threshold is 5 the test will fail. If I update my threshold to 11 on my next run my test will pass.

Properties

  • sqlExpression: SQL expression
  • strategy: one of ROWS or COUNT
  • threshold: an integer defining the threshold above which the test should fail (default to 0 if not specified)

Behavior

ConditionStatus
sqlExpression returns row <= threshold (default to 0)Success ✅
sqlExpression returns row > threshold (default to 0)Failed ❌

Example

YAML Config

JSON Config

Validate the number of rows inserted for the defined period is between the expected range

Dimension: Integrity

Properties

  • Min Row Count: Lower bound
  • Max Row Count: Upper bound
  • Column Name: The name of the column used to apply the range filter
  • Range Type: One of HOUR, DAY, MONTH, YEAR
  • Interval: The range interval (e.g. 1,2,3,4,5, etc)

Behavior

ConditionStatus
Number of rows is between Min Row Count and Max Row CountSuccess ✅
Number of rows is not between Min Row Count and `Max Row CountFailed ❌

YAML Config

JSON Config

Compare 2 tables for differences. Allows a user to check for integrity. Supports comparing tables across different services. For example, you can compare a table in Snowflake with a table in Redshift.

Supported connectors:

  • Snowflake
  • BigQuery
  • Athena
  • Redshift
  • Postgres
  • MySQL
  • MSSQL
  • Oracle
  • Trino
  • SAP Hana

Dimension: Consistency

Properties

  • keyColumns: The key column to use as the key for the comparison. Resolves to the primary key (if defined) if not set
  • useColumns: The columns against which the comparison will done. If not provided it will use all the columns
  • table2: The table against which the comparison will be done. Must be the fully qualified name as defined in OpenMetadata
  • threshold: The threshold of different rows above which the test should fail -- default to 0
  • where: Any where clause to pass
  • caseSensitiveColumns: Whether the column comparison should be case sensitive or not. Default to false.

Behavior

ConditionStatus
Number of rows is greater than the threshold (default to 0)Failed ❌
Number of rows is less than or equal to the thresholdSuccess ✅

YAML Config

JSON Config

Validate the freshness of a table's data.

Dimension: Accuracy

Properties

  • column: the colummn that will be used to chech the table freshness
  • timeSinceUpdate: (in seconds) The data is expected to be updated within this number of seconds. If the time since the last update is greater than this value, the test will fail.

Behavior

ConditionStatus
Time since update is greater than timeSinceUpdateFailed ❌
Time since update is less than or equal to timeSinceUpdateSuccess ✅

YAML Config

JSON Config

Tests applied on top of Column metrics. Here is the list of all column tests:

Makes sure that there are no duplicate values in a given column.

Dimension: Uniqueness

Behavior

ConditionStatus
column values are uniqueSuccess ✅
column values are not uniqueFailed ❌

Properties

  • columnValuesToBeUnique: To be set as true. This is required for proper JSON parsing in the profiler module.

YAML Config

JSON Config

Validates that there are no null values in the column.

Dimension: Completeness

Properties

  • columnValuesToBeNotNull: To be set as true. This is required for proper JSON parsing in the profiler module.

Behavior

ConditionStatus
No NULL values are present in the columnSuccess ✅
1 or more NULL values are present in the columnFailed ❌

YAML Config

JSON Config

This test allows us to specify how many values in a column we expect that will match a certain regex expression. Please note that for certain databases we will fall back to SQL LIKE expression. The databases supporting regex pattern as of 0.13.2 are:

  • redshift
  • postgres
  • oracle
  • mysql
  • mariaDB
  • sqlite
  • clickhouse
  • snowflake

The other databases will fall back to the LIKE expression

Dimension: Validity

Properties

  • regex: expression to match a regex pattern. E.g., [a-zA-Z0-9]{5}.

Behavior

ConditionStatus
All column values match regexSuccess ✅
1 or more column values do not match regexFailed ❌

YAML Config

JSON Config

This test allows us to specify values in a column we expect that will not match a certain regex expression. If the test find values matching the forbiddenRegex the test will fail. Please note that for certain databases we will fall back to SQL LIKE expression. The databases supporting regex pattern as of 0.13.2 are:

  • redshift
  • postgres
  • oracle
  • mysql
  • mariaDB
  • sqlite
  • clickhouse
  • snowflake

The other databases will fall back to the LIKE expression

Dimension: Validity

Properties

  • regex: expression to match a regex pattern. E.g., [a-zA-Z0-9]{5}.

Behavior

ConditionStatus
0 column value match regexSuccess ✅
1 or more column values match regexFailed ❌

YAML Config

JSON Config

Validate values form a set are present in a column.

Dimension: Validity

Properties

  • allowedValues: List of allowed strings or numbers.

Behavior

ConditionStatus
matchEnum is false and 1 or more values from allowedValues is found in the columnSuccess ✅
matchEnum is true and all columns have a value from allowedValuesSuccess ✅
matchEnum is false 0 value from allowedValues is found in the columnFailed ❌
matchEnum is true and 1 or more columns does not have a vluae from allowedValuesFailed ❌

YAML Config

JSON Config

JSON Config

Validate that there are no values in a column in a set of forbidden values.

Dimension: Validity

Properties

  • forbiddenValues: List of forbidden strings or numbers.

Behavior

ConditionStatus
0 value from forbiddenValues is found in the columnSuccess ✅
1 or more values from forbiddenValues is found in the columnFailed ❌

YAML Config

JSON Config

Validate that the values of a column are within a given range.

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValue: Lower bound of the interval. If informed, the column values should be bigger than this number.
  • maxValue: Upper bound of the interval. If informed, the column values should be lower than this number.

Any of those two need to be informed.

Behavior

ConditionStatus
value is between minValue and maxValueSuccess ✅
value is greater than minValue if only minValue is specifiedSuccess ✅
value is less then maxValue if only maxValue is specifiedSuccess ✅
value is not between minValue and maxValueFailed ❌
value is less than minValue if only minValue is specifiedFailed ❌
value is greater then maxValue if only maxValue is specifiedFailed ❌

YAML Config

JSON Config

Validates that the number of missing values matches a given number. Missing values are the sum of nulls, plus the sum of values in a given list which we need to consider as missing data. A clear example of that would be NA or N/A.

Dimension: Completeness

Properties

  • missingCountValue: The number of missing values needs to be equal to this. This field is mandatory.
  • missingValueMatch (Optional): A list of strings to consider as missing values.

Behavior

ConditionStatus
Number of missing value is equal to missingCountValueSuccess ✅
Number of missing value is not equal to missingCountValueFailed ❌

YAML Config

JSON Config

JSON Config

Validates that the lengths of the strings in a column are within a given range.

Only supports concatenable types.

Dimension: Accuracy

Properties

  • minLength: Lower bound of the interval. If informed, the string length should be bigger than this number.
  • maxLength: Upper bound of the interval. If informed, the string length should be lower than this number.

Any of those two need to be informed.

Behavior

ConditionStatus
value length is between minLength and maxLengthSuccess ✅
value length is greater than minLength if only minLength is specifiedSuccess ✅
value length is less then maxLength if only maxLength is specifiedSuccess ✅
value length is not between minLength and maxLengthFailed ❌
value length is less than minLength if only minLength is specifiedFailed ❌
value length is greater then maxLength if only maxLength is specifiedFailed ❌

YAML Config

JSON Config

Validate the maximum value of a column is between a specific range

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValueForMaxInCol: lower bound
  • maxValueForMaxInCol: upper bound

Behavior

ConditionStatus
column max value is between minValueForMaxInCol and maxValueForMaxInColSuccess ✅
column max value is greater than minValueForMaxInCol if only minValueForMaxInCol is specifiedSuccess ✅
column max value is less then maxValueForMaxInCol if only maxValueForMaxInCol is specifiedSuccess ✅
column max value is not between minValueForMaxInCol and maxValueForMaxInColFailed ❌
column max value is less than minValueForMaxInCol if only minValueForMaxInCol is specifiedFailed ❌
column max value is greater then maxValueForMaxInCol if only maxValueForMaxInCol is specifiedFailed ❌

YAML Config

JSON Config

Validate the minimum value of a column is between a specific range

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValueForMinInCol: lower bound
  • maxValueForMinInCol: upper bound

Behavior

ConditionStatus
column min value is between minValueForMinInCol and maxValueForMinInColSuccess ✅
column min value is greater than minValueForMinInCol if only minValueForMinInCol is specifiedSuccess ✅
column min value is less then maxValueForMinInCol if only maxValueForMinInCol is specifiedSuccess ✅
column min value is not between minValueForMinInCol and maxValueForMinInColFailed ❌
column min value is less than minValueForMinInCol if only minValueForMinInCol is specifiedFailed ❌
column min value is greater then maxValueForMinInCol if only maxValueForMinInCol is specifiedFailed ❌

YAML Config

JSON Config

Validate the mean of a column is between a specific range

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValueForMeanInCol: lower bound
  • maxValueForMeanInCol: upper bound

Behavior

ConditionStatus
column mean value is between minValueForMeanInCol and maxValueForMeanInColSuccess ✅
column mean value is greater than minValueForMeanInCol if only minValueForMeanInCol is specifiedSuccess ✅
column mean value is less then maxValueForMeanInCol if only maxValueForMeanInCol is specifiedSuccess ✅
column mean value is not between minValueForMeanInCol and maxValueForMeanInColFailed ❌
column mean value is less than minValueForMeanInCol if only minValueForMeanInCol is specifiedFailed ❌
column mean value is greater then maxValueForMeanInCol if only maxValueForMeanInCol is specifiedFailed ❌

YAML Config

JSON Config

Validate the median of a column is between a specific range

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValueForMedianInCol: lower bound
  • maxValueForMedianInCol: upper bound

Behavior

ConditionStatus
column median value is between minValueForMedianInCol and maxValueForMedianInColSuccess ✅
column median value is greater than minValueForMedianInCol if only minValueForMedianInCol is specifiedSuccess ✅
column median value is less then maxValueForMedianInCol if only maxValueForMedianInCol is specifiedSuccess ✅
column median value is not between minValueForMedianInCol and maxValueForMedianInColFailed ❌
column median value is less than minValueForMedianInCol if only minValueForMedianInCol is specifiedFailed ❌
column median value is greater then maxValueForMedianInCol if only maxValueForMedianInCol is specifiedFailed ❌

YAML Config

JSON Config

Validate the sum of a column is between a specific range

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValueForColSum: lower bound
  • maxValueForColSum: upper bound

Behavior

ConditionStatus
Sum of the column values is between minValueForColSum and maxValueForColSumSuccess ✅
Sum of the column values is greater than minValueForColSum if only minValueForColSum is specifiedSuccess ✅
Sum of the column values is less then maxValueForColSum if only maxValueForColSum is specifiedSuccess ✅
Sum of the column values is not between minValueForColSum and maxValueForColSumFailed ❌
Sum of the column values is less than minValueForColSum if only minValueForColSum is specifiedFailed ❌
Sum of the column values is greater then maxValueForColSum if only maxValueForColSum is specifiedFailed ❌

YAML Config

JSON Config

Validate the standard deviation of a column is between a specific range

Only supports numerical types.

Dimension: Accuracy

Properties

  • minValueForStdDevInCol: lower bound
  • minValueForStdDevInCol: upper bound

Behavior

ConditionStatus
column values standard deviation is between minValueForStdDevInCol and minValueForStdDevInColSuccess ✅
column values standard deviation is greater than minValueForStdDevInCol if only minValueForStdDevInCol is specifiedSuccess ✅
column values standard deviation is less then minValueForStdDevInCol if only minValueForStdDevInCol is specifiedSuccess ✅
column values standard deviation is not between minValueForStdDevInCol and minValueForStdDevInColFailed ❌
column values standard deviation is less than minValueForStdDevInCol if only minValueForStdDevInCol is specifiedFailed ❌
column values standard deviation is greater then minValueForStdDevInCol if only minValueForStdDevInCol is specifiedFailed ❌

YAML Config

JSON Config

Validate the reference value for a column is a the expected geographic location

Data will be temporarely stored in memory while the test case is running to validate the location. Not data will be permanently stored. France is the only supported location at this time. To add any additional location please reach out to the team in our slack support channel

Dimension: Accuracy

Properties

  • locationReferenceType: the type of location refernce CITY or POSTAL_CODE
  • longitudeColumnName: longitude column name
  • latitudeColumnName: latitude column name
  • radius: radius in meter from which the location can be from the expected lat/long -- acts as a buffer

Behavior

ConditionStatus
column values lat/long is within the polygon of the column reference (+/- radius)Success ✅
column values lat/long is outside the polygon of the column reference (+/- radius)Failed ❌

YAML Config

JSON Config