Cast Mapping#
The `cast` mapping is a simple way to change the data type of individual columns without needing to specify all columns. In many cases, you could also use the `extend` mapping with SQL `CAST` expressions to achieve the same result, but the latter does not support the `VARCHAR(n)` and `CHAR(n)` data types.
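For comparison, the `extend` alternative mentioned above might look roughly like the following sketch. This is an assumption about the `extend` mapping's layout (a map of column names to SQL expressions); check its own documentation for the exact fields, and note that assigning to an existing column name may behave differently from adding a new one:

```yaml
# Hypothetical sketch of the extend-based alternative - field layout assumed
kind: extend
input: some_mapping
columns:
  amount: "CAST(amount AS DECIMAL(16,3))"
```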
Example#
```yaml
kind: cast
input: some_mapping
columns:
  id: "CHAR(12)"
  amount: "DECIMAL(16,3)"
```
Fields#
- `kind` (mandatory) (type: string): `cast`
- `broadcast` (optional) (type: boolean) (default: `false`): Hint for broadcasting the result of this mapping for map-side joins.
- `cache` (optional) (type: string) (default: `NONE`): Cache mode for the results of this mapping. Supported values are
  - `NONE` - Disables caching of the results of this mapping
  - `DISK_ONLY` - Caches the results on disk
  - `MEMORY_ONLY` - Caches the results in memory. If not enough memory is available, records will be uncached.
  - `MEMORY_ONLY_SER` - Caches the results in memory in a serialized format. If not enough memory is available, records will be uncached.
  - `MEMORY_AND_DISK` - Caches the results first in memory and then spills to disk.
  - `MEMORY_AND_DISK_SER` - Caches the results first in memory in a serialized format and then spills to disk.
- `input` (mandatory) (type: string): The name of the input mapping
- `columns` (mandatory) (type: map of data type): Map of column names to desired data types. All data types supported by Apache Spark can be used here.
- `filter` (optional) (type: string) (default: empty): An optional SQL filter expression that is applied after the transformation itself.
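A fuller example using the optional fields above could look like the following sketch (the field names come from the list above; the concrete values, including the filter expression, are made up for illustration):

```yaml
kind: cast
input: some_mapping
cache: MEMORY_AND_DISK
broadcast: false
filter: "amount IS NOT NULL"   # applied after the cast itself
columns:
  id: "CHAR(12)"
  amount: "DECIMAL(16,3)"
```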
Outputs#
- `main` - the only output of the mapping
Remarks#
In contrast to the `project` mapping, you only need to specify those columns whose data type you want to change. All other columns will be passed through unchanged. Also note that the column order of the output is the same as that of the input mapping.
The differences to the `project` mapping and the `schema` mapping are as follows:

- The `project` mapping only performs simple type conversions and will only emit the columns specified in the `columns` list. It will drop all columns not specified in the list and cannot add new columns.
- The `schema` mapping allows specifying arbitrarily complex data types, including nested types. It will make sure that the result precisely matches the specified schema, i.e. it will add and/or drop columns as required.
- The `cast` mapping will only change the data type of the specified columns and will keep all other columns unchanged. This means that it will not add or drop any of the incoming columns.
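The contrast between `cast` and `project` can be sketched in plain Python. This is an illustration of the semantics only, not Flowman code; the converter functions stand in for Spark casts, and the row dicts stand in for DataFrame rows:

```python
def cast_rows(rows, casts):
    """Like the cast mapping: convert only the listed columns,
    pass all other columns through unchanged, preserving order."""
    return [{name: casts.get(name, lambda v: v)(value)
             for name, value in row.items()}
            for row in rows]

def project_rows(rows, casts):
    """Like the project mapping: emit only the listed columns,
    dropping everything else."""
    return [{name: cast(row[name]) for name, cast in casts.items()}
            for row in rows]

rows = [{"id": 12, "amount": "3.5", "comment": "ok"}]
print(cast_rows(rows, {"id": str}))     # 'amount' and 'comment' survive
print(project_rows(rows, {"id": str}))  # only 'id' survives
```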
Supported data types#
The following simple data types are supported by Apache Spark and Flowman:
- `string`, `text` - text and strings of arbitrary length
- `binary` - binary data of arbitrary length
- `tinyint`, `byte` - 8-bit signed numbers
- `smallint`, `short` - 16-bit signed numbers
- `int`, `integer` - 32-bit signed numbers
- `bigint`, `long` - 64-bit signed numbers
- `boolean` - true or false
- `float` - 32-bit floating point numbers
- `double` - 64-bit floating point numbers
- `decimal(a,b)` - fixed-point decimal numbers with precision `a` and scale `b`
- `varchar(n)` - text with up to `n` characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark and therefore Flowman convert these columns to a `string` column of arbitrary length.
- `char(n)` - text with exactly `n` characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark and therefore Flowman convert these columns to a `string` column of arbitrary length.
- `date` - date type
- `timestamp` - timestamp type (date and time)
- `duration` - duration type
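Putting several of these types together, a `columns` map might look like the following sketch (column names are invented for illustration; the `VARCHAR(64)` column will internally become a `string` of arbitrary length, as noted above):

```yaml
kind: cast
input: some_mapping
columns:
  name: "VARCHAR(64)"      # internally stored as string
  active: "BOOLEAN"
  created: "TIMESTAMP"
  price: "DECIMAL(10,2)"   # precision 10, scale 2
```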