Local Relations#
In addition to working with file relations backed up by Hadoop compatible file systems (HDFS, S3, …), Flowman also supports the local file system as backend for working with files. The implementation is independent of the normal Apache Spark data sources, therefore only a very limited set of file formats are supported.
Example#
relations:
local:
kind: local
location: $outputPath
pattern: data.csv
format: csv
schema:
kind: inline
fields:
- name: str_col
type: string
- name: int_col
type: integer
Fields#
kind(mandatory) (string):localschema(optional) (schema) (default: empty): Explicitly specifies the schema of the local relation.description(optional) (string) (default: empty): A description of the relation. This is purely for informational purpose.options(optional) (map:string) (default: empty): All options are passed directly to the reader/writer backend and are specific to each supported format.format(optional) (string) (default:csv): This specifies the file format to use. Since the implementation of local file formats is separate from Apache Spark, only a limited number of formats are currently supported.location(mandatory) (string): This field specifies the storage location in the local file system. If the data source is partitioned, this should specify only the root location below which partition directories are created.partitions(optional) (list:partition) (default: empty): In order to use partitioned file based data sources, you need to define the partitioning columns. Each partitioning column has a name and a type and optionally a granularity.pattern(optional) (string) (default: empty): This field specifies the directory and/or file name pattern to access specific partitions. Please see the section Partitioning below.
Description#
When using local relations as data sinks in a relation target, then Flowman will manage the
whole lifecycle of the directory for you. This means that
The directory specified in
locationwill be created duringcreatephaseThe directory specified in
locationwill be populated with records or partitioning subdirectories will be added duringbuildphaseThe directory specified in
locationwill be truncated, or individual partitions will be dropped duringcleanphaseThe directory specified in
locationtables will be removed duringdestroyphase
Automatic Migrations#
The local relation does not support any automatic migration like adding/removing columns.
Supported File Format#
The local relation only supports a very limited set of file formats (currently only CSV files)
CSV#
Partitioning#
The local relation also supports partitioning by storing different partitions in separate files or subdirectories.
You need to explicitly specify a partition pattern via the pattern field.