Documenting Mappings#
As with other entities, Flowman tries to automatically infer a meaningful documentation for mappings, especially for the schema of all mappings outputs. But of course this is not always possible especially when mappings perform complex transformations such that a single output column depends on multiple input columns. Probably the most complex example is the SQL mapping which allows implementing most complex transformations.
In order to mitigate this issue, you can explicitly provide additional documentation for mappings via the
documentation
tag, which is supported by all mappings.
Example#
mappings:
# Extract multiple columns from the raw measurements data using SQL SUBSTR functions
measurements_extracted:
kind: select
input: measurements_raw
columns:
usaf: "SUBSTR(raw_data,5,6)"
wban: "SUBSTR(raw_data,11,5)"
date: "TO_DATE(SUBSTR(raw_data,16,8), 'yyyyMMdd')"
time: "SUBSTR(raw_data,24,4)"
air_temperature: "CAST(SUBSTR(raw_data,88,5) AS FLOAT)/10"
air_temperature_qual: "SUBSTR(raw_data,93,1)"
# Explicit documentation section for annotating columns from above
documentation:
columns:
# You can document any column you like, you don't have to provide a description for all of them
- name: usaf
description: "The USAF (US Air Force) id of the weather station"
- name: wban
description: "The WBAN id of the weather station"
- name: date
description: "The date when the measurement was made"
- name: time
description: "The time when the measurement was made"
- name: report_type
description: "The report type of the measurement"
- name: air_temperature
description: "The air temperature in degree Celsius"
- name: air_temperature_qual
description: "The quality indicator of the air temperature. 1 means trustworthy quality."
Fields#
description
(optional) (type: string): A description of the mappingcolumns
(optional) (type: schema): A documentation of the output schema. Note that Flowman will inspect the schema of the mapping itself and only overlay the provided documentation. Only fields found in the original output schema will be documented, so you cannot add fields to the documentation which actually do not exist.