Projects

All relations, data transformations, and build targets are specified within Flowman projects. Each project has a top-level project descriptor, which mainly contains meta information such as the project name and version, plus a list of subdirectories containing the entity definitions.

Project Specification

Flowman always requires a top-level project file containing general information (such as the name and version of the project) and the directories in which to look for specifications. The project file should be named project.yml; that way, flowexec and flowshell pick it up directly when only the directory is given on the command line.

A typical project.yml file looks as follows:

name: "example-project"
version: "1.0"
description: "My first example project"

modules:
  # List of subdirectories containing project files (*.yml, *.yaml)
  - config
  - model
  - mapping
  - target
  - job

imports:
  # Import another project with no (or default) parameters
  - project: other_project

  # Import a project with specified job context
  - project: commons
    # Optionally specify the location
    location: ${project.basedir}/../commons
    # Specify values for job parameters
    arguments:
      processing_date: $processing_date

Fields

Each project supports the following fields:

  • name (mandatory) (string) The name of the overall project. This field is used by Flowman for sharing mappings and relations between different projects.

  • version (optional) (string) The version is currently not used by Flowman, but it can help end users keep track of which version of a project is in use.

  • description (optional) (string) A description of the overall project. This can be any text; Flowman does not use it otherwise.

  • modules (mandatory) (list:string) The modules section contains a list of subdirectories or filenames where Flowman should search for more YAML specification files. This helps to organize complex projects into different modules and/or aspects. The directory and file names are relative to the project file itself.

  • imports (optional) (list:import) Within the imports section you can specify other projects to be imported, which makes their entities available for referencing. Please read sharing entities for more details.
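
As a minimal sketch of the field list above, a project file that sets only the two mandatory fields could look like the following. The project name and the choice of module entries are illustrative assumptions, not taken from a real project; the example only shows that a modules entry may be either a subdirectory or a single file, both relative to the project file.

```yaml
# Minimal sketch: only the mandatory fields "name" and "modules" are set.
name: "minimal-project"   # hypothetical project name

modules:
  # A subdirectory, resolved relative to this project file
  - mapping
  # A single specification file (illustrative path)
  - config/environment.yml
```

The optional fields version, description, and imports are simply omitted here.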

Proposed Directory Layout

Best practice is to organize the specification files of a project into subdirectories. Depending on the project, two slightly different approaches have proven useful: either separating models and mappings, or putting them together. The layout below keeps relations and mappings in separate directories.

root
 ├── config
 │   ├── environment.yml
 │   ├── connections.yml
 │   └── profiles.yml
 ├── job
 │   ├── job.yml
 │   ├── target-1.yml
 │   │   ...
 │   └── target-n.yml
 ├── schema
 │   ├── schema-1.yml
 │   │   ...
 │   └── schema-n.yml
 ├── macros
 │   ├── macro-1.yml
 │   │   ...
 │   └── macro-n.yml
 ├── relation
 │   ├── relation-1.yml
 │   │   ...
 │   └── relation-n.yml
 ├── mapping
 │   ├── mapping-1.yml
 │   │   ...
 │   └── mapping-n.yml
 └── project.yml
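
A project.yml matching the layout above might list its subdirectories as modules, as sketched below. This assumes that every subdirectory shown holds specification files that Flowman should load; a directory containing only auxiliary files that are referenced by path would presumably be left out of the list.

```yaml
name: "example-project"
version: "1.0"

modules:
  # One entry per subdirectory of the layout above (assumption: all of
  # them contain Flowman specification files)
  - config
  - job
  - schema
  - macros
  - relation
  - mapping
```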