Assemble Mapping#
The “assemble” mapping is used to reassemble data with a possibly nested schema. For example, it is possible to extract some nested sub-entities, remove specific columns or subtrees or move subtrees to the top level.
Examples#
mappings:
crm_updates:
kind: assemble
input: crm_events
columns:
- kind: rename
columns:
data_operation: dataOperation
- kind: columns
path: data
drop:
- faConfig.feeList
- ebillingConfig
- ebillingConfig.transmissionChannels
- feeItems
- negotiatedPrices
mappings:
transmission_channel_explode:
kind: assemble
input: card_events_latest
columns:
- # Only keep some important fields from parent entity
kind: rename
columns:
data_operation: dataOperation
card_id: data.id
- # Now extract and explode nested child entities
kind: explode
name: channel
path: data.transmissionChannels
transmission_channel_updates:
kind: assemble
input: transmission_channel_explode
columns:
- # Take everything except for channel
kind: append
drop: channel
- # Now unpack all fields channel except maybe some
kind: append
path: channel
drop: recipients
Fields#
kind
(mandatory) (string):assemble
broadcast
(optional) (type: boolean) (default: false): Hint for broadcasting the result of this mapping for map-side joins.cache
(optional) (type: string) (default: NONE): Cache mode for the results of this mapping. Supported values areNONE
- Disables caching of teh results of this mappingDISK_ONLY
- Caches the results on diskMEMORY_ONLY
- Caches the results in memory. If not enough memory is available, records will be uncached.MEMORY_ONLY_SER
- Caches the results in memory in a serialized format. If not enough memory is available, records will be uncached.MEMORY_AND_DISK
- Caches the results first in memory and then spills to disk.MEMORY_AND_DISK_SER
- Caches the results first in memory in a serialized format and then spills to disk.
input
(mandatory) (string): The name of the input mappingfilter
(optional) (type: string) (default: empty): An optional SQL filter expression that is applied after assembling.columns
(mandatory) (objects): This part contains the assembling specification. It consists of an array with the following possible sub elements:append
- Appends one or more columns from the input to the resultkind
(mandatory) (string):append
path
(optional) (string): Specify a common subpath in the incoming data, from where columns should be extractedkeep
(optional) (list: string): Specifies a list of columns to be kept. The columns are relative to the optional path. Only the columns mentioned in thekeep
element will be appended, all other columns will be removed.drop
(optional) (list: string): Specified a list of columns to be removed. The columns are relative to the optional path. All columns not mentioned in the list ofdrop
columns will be appended.
flatten
- Flattens and appends one or more columns from the input to the result. Flattening means that nested columns will be resolved to non-nested columns.kind
(mandatory) (string):flatten
path
(optional) (string): Specify a common subpath in the incoming data, from where columns should be extractedprefix
(optioal) (string): Optional prefix to prepend to every resulting columnnaming
(optioal) (string) (default:snakeCase
): Naming convention to use, eithersnakeCase
orcamelCase
.keep
(optional) (list: string): Specifies a list of columns to be kept. The columns are relative to the optional path. Only the columns mentioned in thekeep
element will be appended, all other columns will be removed.drop
(optional) (list: string): Specified a list of columns to be removed. The columns are relative to the optional path. All columns not mentioned in the list ofdrop
columns will be appended.
explode
- Explodes a single array column into multiple output records. Note that you can only have a singleexplode
entry per assemble mapping. If more explodes are required, you need to create a chain of severalassemble
mappings with correspondingexplode
entries.kind
(mandatory) (string):explode
name
(mandatory) (string): Specifies the name of the exploded column.path
(optional) (string): Specifies the path to be exploded. The path has to be an array (or a child of an array)
rename
- Renames some input columns. Note that it is not an error to try to rename non-existing columns. In this case, no column will be appended.kind
(mandatory) (string):rename
path
(optional) (string): Specify a common subpath in the incoming data, from where columns should be extracted and renamedcolumns
(optional) (map: string): A map containing rename information. The key is the new column name and the value the incoming column name.
lift
- Extracts specific columns of a nested path and appends these as simple columns. Note that it is not an error to specify non-existing columns. Such column will simply be ignored.kind
(mandatory) (string):lift
path
(mandatory) (string): Specify the path in the incoming data which should be lifted to the top levelcolumns
(mandatory) (list: string): Specify all columns relative to the path of columns which should be lifted to the top level
nest
- Collects some columns and nests these into a new sub structure. Note that it is not an error to specify non-existing columns. Such column will simply be ignored.kind
(mandatory) (string):nest
name
(mandatory) (string): Specify the name of the new nested structurepath
(optional) (string): Specify the path in the incoming data which should be lifted to the top levelkeep
(optional) (list: string): Specifies a list of columns to be kept. The columns are relative to the optional path. Only the columns mentioned in thekeep
element will be appended, all other columns will be removed.drop
(optional) (list: string): Specified a list of columns to be removed. The columns are relative to the optional path. All columns not mentioned in the list ofdrop
columns will be appended.
struct
- Collects some columns and nests these into a new sub structurekind
(mandatory) (string):struct
name
(mandatory) (string): Specify the name of the new nested structurecolumns
(mandatory) (map: string): Specify all columns in the incoming data which should be put together as a new structure
Outputs#
main
- the only output of the aggregate mapping
Description#
The assemble
mapping will recreate a new structure from the incoming mapping. This is useful for working with nested
data, where only some subtrees are required, or the structure should be changed otherwise. Note that currently modifying
elements in arrays is not supported, you need to explode
these arrays into multiple records first.