Inline Schema

The inline schema is, as the name suggests, embedded directly into the corresponding YAML file.

Example

relations:
  input:
    kind: csv
    location: "${logdir}"
    options:
      delimiter: "\t"
      quote: "\""
      escape: "\\"
    schema:
      kind: inline
      fields:
        - name: UnixDateTime
          type: Long
        - name: Impression_Uuid
          type: String
        - name: Event_Type
          type: Integer
        - name: User_Uuid
          type: String

Fields

  • kind (mandatory) (type: string): inline
  • fields (mandatory) (type: list:field): contains all field definitions (see Field properties below)

Field properties

  • name (mandatory) (type: string): specifies the name of the column
  • type (mandatory) (type: data type): specifies the data type of the column
  • nullable (optional) (type: boolean) (default: true): specifies whether the column may contain NULL values
  • description (optional) (type: string): provides a textual description of the column
  • default (optional) (type: string): specifies a default value for the column
  • format (optional) (type: string): specifies a storage format; some relations or file formats support different formats, for example for storing dates
  • charset (optional) (type: string): specifies the character set of the column. Useful for MySQL / MariaDB tables.
  • collation (optional) (type: string): specifies the collation of the column. Useful for SQL tables.
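
As a sketch of how these properties fit together, a field definition using several of the optional properties might look as follows. The column names, values, format string, and the MySQL charset/collation values are purely illustrative; which format string is actually honored depends on the relation or file format, as noted above.

fields:
  - name: Purchase_Date
    type: date
    nullable: false
    description: "Calendar date of the purchase"
    default: "1970-01-01"
    format: "yyyy-MM-dd"
  - name: Customer_Name
    type: varchar(64)
    charset: "utf8mb4"
    collation: "utf8mb4_general_ci"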

Data Types

The following simple data types are supported by Flowman:

  • string, text - text and strings of arbitrary length
  • binary - binary data of arbitrary length
  • tinyint, byte - 8 bit signed numbers
  • smallint, short - 16 bit signed numbers
  • int, integer - 32 bit signed numbers
  • bigint, long - 64 bit signed numbers
  • boolean - true or false
  • float - 32 bit floating point number
  • double - 64 bit floating point number
  • decimal(a,b) - fixed-point decimal number with precision a (total number of digits) and scale b (digits after the decimal point)
  • varchar(n) - text with up to n characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark (and therefore Flowman) converts these columns to string columns of arbitrary length.
  • char(n) - text with exactly n characters. Note that this data type is only supported for specifying input or output data types. Internally, Spark (and therefore Flowman) converts these columns to string columns of arbitrary length.
  • date - date type
  • timestamp - timestamp type (date and time)
  • duration - duration type
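
These simple types are used directly as the value of a field's type property. A minimal sketch with hypothetical column names:

fields:
  - name: Amount
    type: decimal(10,2)
  - name: Currency
    type: varchar(3)
  - name: Created_At
    type: timestamp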

In addition to those simple data types, the following complex types are supported:

  • struct - for creating nested data types

name: some_struct
type:
  kind: struct
  fields:
    - name: some_field
      type: int
    - name: some_other_field
      type: string

  • map - for storing key/value pairs

name: keyValue
type:
  kind: map
  keyType: string
  valueType: int

  • array - for storing arrays of sub-elements

name: names
type:
  kind: array
  elementType: string
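
Complex types can also be combined with each other. The following sketch, with purely illustrative field names, declares an array whose elements are structs; it assumes that elementType accepts a nested type definition in the same way as the examples above, mirroring the corresponding Spark types.

name: lineItems
type:
  kind: array
  elementType:
    kind: struct
    fields:
      - name: sku
        type: string
      - name: quantity
        type: int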