Observe Mapping¶
Available since Flowman 0.30.0. Requires Spark 3.0 or later.
The observe mapping collects data-dependent metrics from within the data flow. Its
execution is relatively cheap, although the mapping acts as an optimization barrier for Spark.
Typical use cases are counting records or examining the minimum and maximum values of a column.
Example¶
mappings:
  measurements:
    kind: relation
    relation: measurements

  inspect_measurements:
    kind: observe
    input: measurements
    measures:
      record_count: "count(*)"
      min_temperature: "min(air_temperature)"
      max_temperature: "max(air_temperature)"
Fields¶
kind (mandatory) (type: string): observe
broadcast (optional) (type: boolean) (default: false): Hint for broadcasting the result of this mapping for map-side joins.
cache (optional) (type: string) (default: NONE): Cache mode for the results of this mapping. Supported values are NONE, DISK_ONLY, MEMORY_ONLY, MEMORY_ONLY_SER, MEMORY_AND_DISK, MEMORY_AND_DISK_SER.
input (optional) (type: string): Specifies the name of the input mapping, which is observed with the measures defined in the measures section.
measures (optional) (type: map:array) (default: empty): Defines the set of measures to be taken from the input mapping. The keys of the map are used as the names of the corresponding metrics, while the values need to be valid SQL aggregate expressions. Each measure must either be a literal (e.g. 42) or contain one or more aggregate functions (e.g. sum(a) or sum(a + b) + avg(c) - 1). Expressions that reference columns of the input mapping must always wrap them in an aggregate function.
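The rules for measure expressions can be illustrated with a minimal sketch. It reuses the measurements mapping and the air_temperature column from the example above; the metric names schema_version and temperature_range are hypothetical and chosen only for illustration.

```yaml
mappings:
  inspect_measurements:
    kind: observe
    input: measurements
    measures:
      # A plain literal is allowed (hypothetical metric name)
      schema_version: "42"
      # A single aggregate function
      record_count: "count(*)"
      # Several aggregate functions combined into one expression
      # (hypothetical metric name)
      temperature_range: "max(air_temperature) - min(air_temperature)"
      # Not valid: a column reference outside of an aggregate function
      # raw_temperature: "air_temperature"
```

Each key becomes the name of one metric, so the keys should be unique and descriptive.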
Outputs¶
The observe mapping creates one main output, which matches the input.
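Because the output matches the input, a downstream mapping can presumably read from the observe mapping instead of from the original input, which places the observation inside the data flow. The following sketch assumes a filter mapping with input and condition properties; the mapping name valid_measurements and the filter condition are hypothetical.

```yaml
mappings:
  inspect_measurements:
    kind: observe
    input: measurements
    measures:
      record_count: "count(*)"

  # Hypothetical downstream mapping: it consumes the observe mapping's
  # main output, which has the same records and schema as "measurements"
  valid_measurements:
    kind: filter
    input: inspect_measurements
    condition: "air_temperature IS NOT NULL"
```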
Provided Metrics¶
All measures defined are exported as metrics with the same name and with the following labels:
name - The name of the mapping (i.e. inspect_measurements above)
category - Always set to mapping
kind - Always set to observe
namespace - Name of the namespace (typically default)
project - Name of the project
version - Version of the project