Inline Schema#

The inline schema is (as the name suggests) embedded directly into the corresponding YAML file.

Example#

relations:
  input:
    kind: csv
    location: "${logdir}"
    options:
      delimiter: "\t"
      quote: "\""
      escape: "\\"
    schema:
      kind: inline
      fields:
        - name: UnixDateTime
          type: Long
        - name: Impression_Uuid
          type: String
        - name: Event_Type
          type: Integer
        - name: User_Uuid
          type: String

Fields#

  • kind (mandatory) (type: string): inline

  • fields (mandatory) (type: list:field): Contains the list of all field definitions, one entry per column

Field properties#

  • name (mandatory) (type: string): specifies the name of the column

  • type (mandatory) (type: data type): specifies the data type of the column

  • nullable (optional) (type: boolean) (default: true): Specifies whether the column may contain NULL values

  • description (optional) (type: string): An optional textual description of the column

  • default (optional) (type: string): Specifies a default value for the column

  • format (optional) (type: string): Some relations or file formats may support different formats, for example for storing dates

  • charset (optional) (type: string): Specifies the character set of a column. This is mainly useful for MySQL / MariaDB tables.

  • collation (optional) (type: string): Specifies the collation of a column. This is mainly useful for SQL tables.
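
As an illustrative sketch (column names and property values are made up, and whether a given property is honored depends on the relation type), a field entry combining several of these optional properties could look like this:

fields:
  - name: Purchase_Date
    type: date
    nullable: false
    description: "Calendar date of the purchase"
    format: "yyyy-MM-dd"
  - name: Customer_Name
    type: varchar(120)
    default: "unknown"
    charset: "utf8mb4"
    collation: "utf8mb4_general_ci"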

Data Types#

The following simple data types are supported by Flowman:

  • string, text - text and strings of arbitrary length

  • binary - binary data of arbitrary length

  • tinyint, byte - 8-bit signed numbers

  • smallint, short - 16-bit signed numbers

  • int, integer - 32-bit signed numbers

  • bigint, long - 64-bit signed numbers

  • boolean - true or false

  • float - 32-bit floating point number

  • double - 64-bit floating point number

  • decimal(a,b) - fixed-point decimal number with precision a and scale b

  • varchar(n) - text with up to n characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark (and therefore Flowman) converts these columns to a string column of arbitrary length.

  • char(n) - text with exactly n characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark (and therefore Flowman) converts these columns to a string column of arbitrary length.

  • date - date type

  • timestamp - timestamp type (date and time)

  • duration - duration type
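
For example, a hypothetical inline schema combining several of these simple types could look as follows (all column names are made up):

schema:
  kind: inline
  fields:
    - name: Order_Id
      type: long
    - name: Price
      type: decimal(10,2)
    - name: Currency_Code
      type: char(3)
    - name: Created_At
      type: timestamp
    - name: Comment
      type: text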

In addition to those simple data types, the following complex types are supported:

  • struct for creating nested data types

name: some_struct
type:
  kind: struct
  fields:
    - name: some_field
      type: int
    - name: some_other_field
      type: string

  • map for storing key/value mappings

name: keyValue
type:
  kind: map
  keyType: string
  valueType: int

  • array for storing arrays of sub-elements

name: names
type:
  kind: array
  elementType: string
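
Since these type definitions map onto Spark's type system, complex types can typically be nested. As a purely illustrative sketch (names are hypothetical), an array of structs might be declared like this:

name: line_items
type:
  kind: array
  elementType:
    kind: struct
    fields:
      - name: sku
        type: string
      - name: quantity
        type: int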