Inline Schema
The inline schema is, as the name suggests, embedded directly in the corresponding YAML file.
Example
relations:
  input:
    kind: csv
    location: "${logdir}"
    options:
      delimiter: "\t"
      quote: "\""
      escape: "\\"
    schema:
      kind: inline
      fields:
        - name: UnixDateTime
          type: Long
        - name: Impression_Uuid
          type: String
        - name: Event_Type
          type: Integer
        - name: User_Uuid
          type: String
Fields
kind (mandatory) (type: string): inline
fields (mandatory) (type: list:field): Contains all fields
Field properties
name (mandatory) (type: string): Specifies the name of the column
type (mandatory) (type: data type): Specifies the data type of the column
nullable (optional) (type: boolean) (default: true)
description (optional) (type: string)
default (optional) (type: string): Specifies a default value
format (optional) (type: string): Some relations or file formats may support different formats, for example for storing dates
charset (optional) (type: string): Specifies the character set of a column. Useful for MySQL / MariaDB tables.
collation (optional) (type: string): Specifies the collation of a column. Useful for SQL tables.
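As a rough illustration of how the optional field properties combine with the mandatory ones, a schema might look like the sketch below. The property names are taken from the list above. The column names, description text, default value and date pattern are hypothetical, and which format strings are actually accepted depends on the relation or file format in use.

```yaml
schema:
  kind: inline
  fields:
    # Mandatory properties only
    - name: user_id
      type: string
    # Optional properties added; all values are illustrative assumptions
    - name: signup_date
      type: date
      nullable: false
      description: "Date on which the user registered"
      format: "yyyy-MM-dd"   # assumed pattern, not confirmed for every relation
    - name: country
      type: string
      default: "unknown"
```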
Data Types
The following simple data types are supported by Flowman:
string, text - text and strings of arbitrary length
binary - binary data of arbitrary length
tinyint, byte - 8-bit signed numbers
smallint, short - 16-bit signed numbers
int, integer - 32-bit signed numbers
bigint, long - 64-bit signed numbers
boolean - true or false
float - 32-bit floating point number
double - 64-bit floating point number
decimal(a,b)
varchar(n) - text with up to n characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark (and therefore Flowman) converts these columns to a string column of arbitrary length.
char(n) - text with exactly n characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark (and therefore Flowman) converts these columns to a string column of arbitrary length.
date - date type
timestamp - timestamp type (date and time)
duration - duration type
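The simple types listed above are used as the type of a field. The following is a minimal sketch with hypothetical column names. The reading of decimal(10,2) as precision 10 and scale 2 is an assumption based on Spark's decimal type, since the list above does not describe the two parameters.

```yaml
schema:
  kind: inline
  fields:
    - name: order_id
      type: bigint          # 64-bit signed number, same as long
    - name: customer_code
      type: varchar(32)     # handled as a string of arbitrary length internally
    - name: amount
      type: decimal(10,2)   # assumed: precision 10, scale 2
    - name: is_paid
      type: boolean
    - name: created_at
      type: timestamp
```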
In addition to those simple data types, the following complex types are supported:

struct for creating nested data types

name: some_struct
type:
  kind: struct
  fields:
    - name: some_field
      type: int
    - name: some_other_field
      type: string

map for storing key-value pairs

name: keyValue
type:
  kind: map
  keyType: string
  valueType: int

array for storing arrays of sub elements

name: names
type:
  kind: array
  elementType: string
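
The complex type snippets above are individual field definitions. Placed inside an inline schema, they would presumably appear as entries of the fields list, as in this sketch, which only reuses the names and types shown above:

```yaml
schema:
  kind: inline
  fields:
    # Nested structure with its own field list
    - name: some_struct
      type:
        kind: struct
        fields:
          - name: some_field
            type: int
          - name: some_other_field
            type: string
    # Map from string keys to int values
    - name: keyValue
      type:
        kind: map
        keyType: string
        valueType: int
    # Array of strings
    - name: names
      type:
        kind: array
        elementType: string
```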