# Columns Assertion
An assertion of type `columns` allows you to verify schema properties, like specific types of columns or the presence/absence of individual columns.
## Example
```yaml
kind: columns
description: "Verify correctness of column names and types"
mapping: facts_all
expected:
  - "network IS PRESENT"
  - "xyz IS ABSENT"
  - "campaign IS OF TYPE (int,BIGINT)"
  - "lineitem IS OF TYPE float"
```
A more complete example (with the required top-level entities) could look as follows:
```yaml
targets:
  verify_output:
    kind: verify
    assertions:
      assert_facts_columns:
        kind: columns
        description: "Verify correctness of column names and types"
        mapping: facts_all
        expected:
          - network IS PRESENT
          - xyz IS ABSENT
          - campaign IS OF TYPE (int,BIGINT)
          - lineitem IS OF TYPE float
```
Another example using the assertion inside a test:
```yaml
tests:
  test_pricing:
    assertions:
      assert_pricing_columns:
        kind: columns
        description: "Assert correctness of column names and types"
        mapping: cube_pricing
        expected:
          - campaign IS OF TYPE (int,long)
          - lineitem IS OF TYPE string
          - imps IS OF TYPE long
          - price IS OF TYPE float
```
## Fields
* `kind` **(mandatory)** *(type: string)*: `columns`
* `description` **(optional)** *(type: string)*: A textual description of the assertion
* `mapping` **(optional)** *(type: string)*: The name of the mapping which is to be tested.
* `expected` **(optional)** *(type: list:string)*: A list of column schema expressions.
## Column Schema Expressions
In order to verify type properties of columns, Flowman provides a small expression language. Currently, the following four expressions are supported:
* `<column_name> IS PRESENT`: Verifies that a column called `column_name` is present in the schema. The check is case-insensitive.
* `<column_name> IS ABSENT`: Verifies that a column called `column_name` is not present in the schema. The check is case-insensitive.
* `<column_name> IS OF TYPE <typename>`: Verifies that a column called `column_name` is present in the schema and that it is of type `typename`. All Spark SQL types are allowed.
* `<column_name> IS OF TYPE (<typename_1>, <typename_2>, ...)`: Verifies that a column called `column_name` is present in the schema and that it is of one of the types `typename_1`, `typename_2`, …. All Spark SQL types are allowed.