Documentation Index
Fetch the complete documentation index at: https://docs.open-metadata.org/llms.txt
Use this file to discover all available pages before exploring further.
Column Tests
Tests applied on top of Column metrics. Here is the list of all column tests:
Column Values to Be Unique
Makes sure that there are no duplicate values in a given column.
Dimension:
Uniqueness
Behavior
| Condition | Status |
|---|
| column values are unique | Success ✅ |
| column values are not unique | Failed ❌ |
Properties
columnValuesToBeUnique: To be set as true. This is required for proper JSON parsing in the profiler module.
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeUnique
computePassedFailedRowCount: <true or false>
parameterValues:
- name: columnNames
value: true
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeUnique",
"parameterValues": [
{
"name": "columnNames",
"value": true
}
]
}
Column Values to Be Not Null
Validates that there are no null values in the column.
Dimension:
Completeness
Properties
columnValuesToBeNotNull: To be set as true. This is required for proper JSON parsing in the profiler module.
Behavior
| Condition | Status |
|---|
No NULL values are present in the column | Success ✅ |
1 or more NULL values are present in the column | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeNotNull
computePassedFailedRowCount: <true or false>
parameterValues:
- name: columnValuesToBeNotNull
value: true
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeNotNull",
"parameterValues": [
{
"name": "columnValuesToBeNotNull",
"value": true
}
]
}
Column Values to Match Regex
This test allows us to specify how many values in a column we expect that will match a certain regex expression. Please note that for certain databases we will fall back to SQL LIKE expression. The databases supporting regex pattern as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
The other databases will fall back to the LIKE expression
Dimension:
Validity
Properties
regex: expression to match a regex pattern. E.g., [a-zA-Z0-9]{5}.
Behavior
| Condition | Status |
|---|
All column values match regex | Success ✅ |
1 or more column values do not match regex | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToMatchRegex
computePassedFailedRowCount: <true or false>
parameterValues:
- name: regex
value: "%something%"
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "regex",
"value": "%something%"
}
]
}
Column Values to not Match Regex
This test allows us to specify values in a column we expect that will not match a certain regex expression. If the test find values matching the forbiddenRegex the test will fail. Please note that for certain databases we will fall back to SQL LIKE expression. The databases supporting regex pattern as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
The other databases will fall back to the LIKE expression
Dimension:
Validity
Properties
regex: expression to match a regex pattern. E.g., [a-zA-Z0-9]{5}.
Behavior
| Condition | Status |
|---|
0 column value match regex | Success ✅ |
1 or more column values match regex | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToMatchRegex
computePassedFailedRowCount: <true or false>
parameterValues:
- name: forbiddenRegex
value: "%something%"
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "forbiddenRegex",
"value": "%something%"
}
]
}
Column Values to Be in Set
Validate values form a set are present in a column.
Dimension:
Validity
Properties
allowedValues: List of allowed strings or numbers.
Behavior
| Condition | Status |
|---|
matchEnum is false and 1 or more values from allowedValues is found in the column | Success ✅ |
matchEnum is true and all columns have a value from allowedValues | Success ✅ |
matchEnum is false 0 value from allowedValues is found in the column | Failed ❌ |
matchEnum is true and 1 or more columns does not have a vluae from allowedValues | Failed ❌ |
YAML Config
- name: myTestName
testDefinitionName: columnValuesToBeInSet
columnName: columnName
computePassedFailedRowCount: <true or false>
parameterValues:
- name: allowedValues
value: '["forbidden1", "forbidden2"]'
- name: matchEnum
value: "" # or true
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeInSet",
"parameterValues": [
{
"name": "allowedValues",
"value": [
"forbidden1",
"forbidden2"
]
},
{
"name": "matchEnum",
"value": ""
}
]
}
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeInSet",
"parameterValues": [
{
"name": "allowedValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
Column Values to Be Not In Set
Validate that there are no values in a column in a set of forbidden values.
Dimension:
Validity
Properties
forbiddenValues: List of forbidden strings or numbers.
Behavior
| Condition | Status |
|---|
0 value from forbiddenValues is found in the column | Success ✅ |
1 or more values from forbiddenValues is found in the column | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeNotInSet
computePassedFailedRowCount: <true or false>
parameterValues:
- name: forbiddenValues
value: ["forbidden1", "forbidden2"]
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeNotInSet",
"parameterValues": [
{
"name": "forbiddenValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
Column Values to Be Between
Validate that the values of a column are within a given range.
Only supports numerical types.
Dimension:
Accuracy
Properties
minValue: Lower bound of the interval. If informed, the column values should be bigger than this number.
maxValue: Upper bound of the interval. If informed, the column values should be lower than this number.
Any of those two need to be informed.
Behavior
| Condition | Status |
|---|
value is between minValue and maxValue | Success ✅ |
value is greater than minValue if only minValue is specified | Success ✅ |
value is less then maxValue if only maxValue is specified | Success ✅ |
value is not between minValue and maxValue | Failed ❌ |
value is less than minValue if only minValue is specified | Failed ❌ |
value is greater then maxValue if only maxValue is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeBetween
computePassedFailedRowCount: <true or false>
parameterValues:
- name: minValue
value: ["forbidden1", "forbidden2"]
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeBetween",
"parameterValues": [
{
"name": "minValue",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
Column Values Missing Count to Be Equal
Validates that the number of missing values matches a given number. Missing values are the sum of nulls, plus the sum of values in a given list which we need to consider as missing data. A clear example of that would be NA or N/A.
Dimension:
Completeness
Properties
missingCountValue: The number of missing values needs to be equal to this. This field is mandatory.
missingValueMatch (Optional): A list of strings to consider as missing values.
Behavior
| Condition | Status |
|---|
Number of missing value is equal to missingCountValue | Success ✅ |
Number of missing value is not equal to missingCountValue | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesMissingCountToBeEqual
parameterValues:
- name: missingValueMatch
value: ["NA", "N/A"]
- name: missingCountValue
value: 100
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesMissingCountToBeEqual",
"parameterValues": [
{
"name": "missingValueMatch",
"value": [
"NA",
"N/A"
]
},
{
"name": "missingCountValue",
"value": 100
}
]
}
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesMissingCountToBeEqual",
"parameterValues": [
{
"name": "missingValueMatch",
"value": [
"NA",
"N/A"
]
},
{
"name": "missingCountValue",
"value": 100
}
]
}
Column Values Lengths to Be Between
Validates that the lengths of the strings in a column are within a given range.
Only supports concatenable types.
Dimension:
Accuracy
Properties
minLength: Lower bound of the interval. If informed, the string length should be bigger than this number.
maxLength: Upper bound of the interval. If informed, the string length should be lower than this number.
Any of those two need to be informed.
Behavior
| Condition | Status |
|---|
value length is between minLength and maxLength | Success ✅ |
value length is greater than minLength if only minLength is specified | Success ✅ |
value length is less then maxLength if only maxLength is specified | Success ✅ |
value length is not between minLength and maxLength | Failed ❌ |
value length is less than minLength if only minLength is specified | Failed ❌ |
value length is greater then maxLength if only maxLength is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueLengthsToBeBetween
computePassedFailedRowCount: <true or false>
parameterValues:
- name: minLength
value: 50
- name: maxLength
value: 100
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueLengthsToBeBetween",
"parameterValues": [
{
"name": "minLength",
"value": 50
},
{
"name": "maxLength",
"value": 100
}
]
}
Column Value Max to Be Between
Validate the maximum value of a column is between a specific range
Only supports numerical types.
Dimension:
Accuracy
Properties
minValueForMaxInCol: lower bound
maxValueForMaxInCol: upper bound
Behavior
| Condition | Status |
|---|
column max value is between minValueForMaxInCol and maxValueForMaxInCol | Success ✅ |
column max value is greater than minValueForMaxInCol if only minValueForMaxInCol is specified | Success ✅ |
column max value is less then maxValueForMaxInCol if only maxValueForMaxInCol is specified | Success ✅ |
column max value is not between minValueForMaxInCol and maxValueForMaxInCol | Failed ❌ |
column max value is less than minValueForMaxInCol if only minValueForMaxInCol is specified | Failed ❌ |
column max value is greater then maxValueForMaxInCol if only maxValueForMaxInCol is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueMaxToBeBetween
parameterValues:
- name: minValueForMaxInCol
value: 50
- name: maxValueForMaxInCol
value: 100
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueMaxToBeBetween",
"parameterValues": [
{
"name": "minValueForMaxInCol",
"value": 50
},
{
"name": "maxValueForMaxInCol",
"value": 100
}
]
}
Column Value Min to Be Between
Validate the minimum value of a column is between a specific range
Only supports numerical types.
Dimension:
Accuracy
Properties
minValueForMinInCol: lower bound
maxValueForMinInCol: upper bound
Behavior
| Condition | Status |
|---|
column min value is between minValueForMinInCol and maxValueForMinInCol | Success ✅ |
column min value is greater than minValueForMinInCol if only minValueForMinInCol is specified | Success ✅ |
column min value is less then maxValueForMinInCol if only maxValueForMinInCol is specified | Success ✅ |
column min value is not between minValueForMinInCol and maxValueForMinInCol | Failed ❌ |
column min value is less than minValueForMinInCol if only minValueForMinInCol is specified | Failed ❌ |
column min value is greater then maxValueForMinInCol if only maxValueForMinInCol is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueMinToBeBetween
parameterValues:
- name: minValueForMinInCol
value: 10
- name: maxValueForMinInCol
value: 50
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueMinToBeBetween",
"parameterValues": [
{
"name": "minValueForMinInCol",
"value": 10
},
{
"name": "maxValueForMinInCol",
"value": 50
}
]
}
Column Value Mean to Be Between
Validate the mean of a column is between a specific range
Only supports numerical types.
Dimension:
Accuracy
Properties
minValueForMeanInCol: lower bound
maxValueForMeanInCol: upper bound
Behavior
| Condition | Status |
|---|
column mean value is between minValueForMeanInCol and maxValueForMeanInCol | Success ✅ |
column mean value is greater than minValueForMeanInCol if only minValueForMeanInCol is specified | Success ✅ |
column mean value is less then maxValueForMeanInCol if only maxValueForMeanInCol is specified | Success ✅ |
column mean value is not between minValueForMeanInCol and maxValueForMeanInCol | Failed ❌ |
column mean value is less than minValueForMeanInCol if only minValueForMeanInCol is specified | Failed ❌ |
column mean value is greater then maxValueForMeanInCol if only maxValueForMeanInCol is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueMeanToBeBetween
parameterValues:
- name: minValueForMeanInCol
value: 5
- name: maxValueForMeanInCol
value: 10
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueMeanToBeBetween",
"parameterValues": [
{
"name": "minValueForMeanInCol",
"value": 5
},
{
"name": "maxValueForMeanInCol",
"value": 10
}
]
}
Validate the median of a column is between a specific range
Only supports numerical types.
Dimension:
Accuracy
Properties
minValueForMedianInCol: lower bound
maxValueForMedianInCol: upper bound
Behavior
| Condition | Status |
|---|
column median value is between minValueForMedianInCol and maxValueForMedianInCol | Success ✅ |
column median value is greater than minValueForMedianInCol if only minValueForMedianInCol is specified | Success ✅ |
column median value is less then maxValueForMedianInCol if only maxValueForMedianInCol is specified | Success ✅ |
column median value is not between minValueForMedianInCol and maxValueForMedianInCol | Failed ❌ |
column median value is less than minValueForMedianInCol if only minValueForMedianInCol is specified | Failed ❌ |
column median value is greater then maxValueForMedianInCol if only maxValueForMedianInCol is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueMedianToBeBetween
parameterValues:
- name: minValueForMedianInCol
value: 5
- name: maxValueForMedianInCol
value: 10
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueMedianToBeBetween",
"parameterValues": [
{
"name": "minValueForMedianInCol",
"value": 5
},
{
"name": "maxValueForMedianInCol",
"value": 10
}
]
}
Column Values Sum to Be Between
Validate the sum of a column is between a specific range
Only supports numerical types.
Dimension:
Accuracy
Properties
minValueForColSum: lower bound
maxValueForColSum: upper bound
Behavior
| Condition | Status |
|---|
Sum of the column values is between minValueForColSum and maxValueForColSum | Success ✅ |
Sum of the column values is greater than minValueForColSum if only minValueForColSum is specified | Success ✅ |
Sum of the column values is less then maxValueForColSum if only maxValueForColSum is specified | Success ✅ |
Sum of the column values is not between minValueForColSum and maxValueForColSum | Failed ❌ |
Sum of the column values is less than minValueForColSum if only minValueForColSum is specified | Failed ❌ |
Sum of the column values is greater then maxValueForColSum if only maxValueForColSum is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueMedianToBeBetween
parameterValues:
- name: minValueForMedianInCol
value: 5
- name: maxValueForMedianInCol
value: 10
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueMedianToBeBetween",
"parameterValues": [
{
"name": "minValueForMedianInCol",
"value": 5
},
{
"name": "maxValueForMedianInCol",
"value": 10
}
]
}
Column Values Standard Deviation to Be Between
Validate the standard deviation of a column is between a specific range
Only supports numerical types.
Dimension:
Accuracy
Properties
minValueForStdDevInCol: lower bound
minValueForStdDevInCol: upper bound
Behavior
| Condition | Status |
|---|
column values standard deviation is between minValueForStdDevInCol and minValueForStdDevInCol | Success ✅ |
column values standard deviation is greater than minValueForStdDevInCol if only minValueForStdDevInCol is specified | Success ✅ |
column values standard deviation is less then minValueForStdDevInCol if only minValueForStdDevInCol is specified | Success ✅ |
column values standard deviation is not between minValueForStdDevInCol and minValueForStdDevInCol | Failed ❌ |
column values standard deviation is less than minValueForStdDevInCol if only minValueForStdDevInCol is specified | Failed ❌ |
column values standard deviation is greater then minValueForStdDevInCol if only minValueForStdDevInCol is specified | Failed ❌ |
YAML Config
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValueStdDevToBeBetween
parameterValues:
- name: minValueForStdDevInCol
value: 5
- name: maxValueForStdDevInCol
value: 10
JSON Config
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValueStdDevToBeBetween",
"parameterValues": [
{
"name": "minValueForStdDevInCol",
"value": 5
},
{
"name": "maxValueForStdDevInCol",
"value": 10
}
]
}
Column Values To Be At Expected Location
Validate the reference value for a column is a the expected geographic location
Data will be temporarely stored in memory while the test case is running to validate the location. Not data will be permanently stored.
France is the only supported location at this time. To add any additional location please reach out to the team in our slack support channel
Dimension:
Accuracy
Properties
locationReferenceType: the type of location refernce CITY or POSTAL_CODE
longitudeColumnName: longitude column name
latitudeColumnName: latitude column name
radius: radius in meter from which the location can be from the expected lat/long — acts as a buffer
Behavior
| Condition | Status |
|---|
| column values lat/long is within the polygon of the column reference (+/- radius) | Success ✅ |
| column values lat/long is outside the polygon of the column reference (+/- radius) | Failed ❌ |
YAML Config
- name: ExpectedGeoLocation
testDefinitionName: ColumnValuesToBeAtExpectedLocation
columnName: "Code Insee"
parameterValues:
- name: locationReferenceType
value: POSTAL_CODE
- name: longitudeColumnName
value: "Coordonnée Y"
- name: latitudeColumnName
value: "Coordonnée X"
- name: radius
value: "1000"
JSON Config
{
"name": "ExpectedGeoLocation",
"testDefinitionName": "ColumnValuesToBeAtExpectedLocation",
"columnName": "Code Insee",
"parameterValues": [
{
"name": "locationReferenceType",
"value": "POSTAL_CODE"
},
{
"name": "longitudeColumnName",
"value": "Coordonnée Y"
},
{
"name": "latitudeColumnName",
"value": "Coordonnée X"
},
{
"name": "radius",
"value": "1000"
}
]
}