File Target
A target for writing files into a shared file system like HDFS or S3. In most cases you should prefer using a File Relation together with a Relation Target instead of using a file target.
Example:
targets:
  csv_export:
    kind: file
    mapping: some_mapping
    format: "csv"
    location: "${export_dir}"
    mode: overwrite
    parallelism: 32
    rebalance: true
    options:
      delimiter: ","
      quote: "\""
      escape: "\\"
      header: "true"
      compression: "gzip"
Fields
kind (mandatory) (type: string): file
description (optional) (type: string): Optional descriptive text of the build target
mapping (optional) (type: string): Specifies the name of the input mapping to be written
mode (optional) (type: string) (default=overwrite): Specifies the behavior when the data, table or partition already exists. Options include:
  overwrite: overwrite the existing data.
  append: append the data.
  ignore: ignore the operation (i.e. no-op).
  error or errorifexists: throw an exception at runtime.
  The default value is controlled by the Flowman config variable flowman.default.target.outputMode.
partition (optional) (type: map:string) (default=empty):
parallelism (optional) (type: integer) (default=16): Specifies the parallelism to be used when writing data. The parallelism equals the number of files generated in the HDFS output and also equals the maximum number of threads used in total across all Spark executors to produce the output. If parallelism is set to zero or to a negative number, Flowman will not coalesce any partitions and will generate as many files as there are Spark partitions. The default value is controlled by the Flowman config variable flowman.default.target.parallelism.
rebalance (optional) (type: boolean) (default=false): Enables rebalancing the size of all partitions by introducing an additional internal shuffle operation. Each partition and output file will then contain approximately the same number of records. The default value is controlled by the Flowman config variable flowman.default.target.rebalance.
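As a sketch of how mode, parallelism and rebalance work together, the following hypothetical target appends to an existing export and leaves the number of output files to Spark. The target name csv_append is made up for illustration, and some_mapping and ${export_dir} are the same placeholders used in the example at the top of this page.

```yaml
targets:
  # Hypothetical target name, used for illustration only
  csv_append:
    kind: file
    mapping: some_mapping
    format: "csv"
    location: "${export_dir}"
    # Keep the existing data and append the new records
    mode: append
    # Zero (or a negative number) disables coalescing,
    # so one file is written per Spark partition
    parallelism: 0
    # No additional shuffle to even out partition sizes
    rebalance: false
```

With these settings the number and size of output files depend entirely on how the input mapping is partitioned. Setting a positive parallelism together with rebalance: true trades an extra shuffle for a fixed number of similarly sized files.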
Supported Phases
CREATE - creates the target directory
BUILD - builds the target files containing records
VERIFY - verifies that the target file exists
TRUNCATE - removes the target file, but keeps the directory
DESTROY - recursively removes the target directory and all files inside
Read more about execution phases.
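Phases are executed for targets that belong to a job. As a minimal sketch, assuming the usual Flowman job specification with a targets list, a job that includes the csv_export target from the example above might look as follows. The job name main is a placeholder.

```yaml
jobs:
  # Hypothetical job name
  main:
    targets:
      # Running the BUILD phase of this job would write the
      # files of the csv_export target
      - csv_export
```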
Provided Metrics
The file target also provides a metric containing the number of records written:
Metric target_records with the following set of attributes:
name - The name of the target
category - Always set to target
kind - Always set to file
namespace - Name of the namespace (typically default)
project - Name of the project
version - Version of the project
See Execution Metrics for more information on how to use these metrics.