Aggregate Mapping#
Example#
mappings:
cube_weather:
kind: aggregate
input: weather_details
dimensions:
- country
- EXTRACT(YEAR FROM ts) AS year
- EXTRACT(MONTH FROM ts) AS month
aggregations:
avg_temp: "AVG(temperature)"
min_temp: "MIN(temperature)"
max_temp: "MAX(temperature)"
Fields#
kind
(mandatory) (type: string):aggregate
broadcast
(optional) (type: boolean) (default: false): Hint for broadcasting the result of this mapping for map-side joins.cache
(optional) (type: string) (default: NONE): Cache mode for the results of this mapping. Supported values areNONE
- Disables caching of teh results of this mappingDISK_ONLY
- Caches the results on diskMEMORY_ONLY
- Caches the results in memory. If not enough memory is available, records will be uncached.MEMORY_ONLY_SER
- Caches the results in memory in a serialized format. If not enough memory is available, records will be uncached.MEMORY_AND_DISK
- Caches the results first in memory and then spills to disk.MEMORY_AND_DISK_SER
- Caches the results first in memory in a serialized format and then spills to disk.
input
(mandatory) (type: string): Specifies the name of the input mapping to be aggregateddimensions
(mandatory) (type: list:string): Specifies the list of dimensions to aggregate on. Since Flowman 0.18.1 you can also specify valid SQL expressions as dimensions.aggregations
(mandatory) (type: map:string): Specifies the list of aggregations to perform. Each aggregation has a name (the key in the map) and an aggregation expression. The name corresponds to the outgoing column name.filter
(optional) (type: string) (default: empty): An optional SQL filter expression that is applied after aggregation, which corresponds to aHAVING
filter in classical SQL.
Outputs#
main
- the only output of the aggregate mapping
Description#
Essentially the aggregate
mapping performs a SQL SELECT ... GROUP BY ...
operations. The
example above would be equivalent to the following SQL statement:
SELECT
country,
EXTRACT(YEAR FROM ts) AS year,
EXTRACT(MONTH FROM ts) AS month,
AVG(temperature) AS avg_temp,
MIN(temperature) AS min_temp,
MAX(temperature) AS max_temp
FROM weather_details
GROUP BY country, EXTRACT(YEAR FROM ts), EXTRACT(MONTH FROM ts)