Sharing Entities between Projects
In larger setups, it makes sense to split data transformations into separate Flowman projects, so they can be maintained independently, possibly by different teams. A classic example is one Flowman project per source system (for example your CRM system, your financial transaction processing system, and so on). In a data lake environment, you would typically implement independent Flowman projects that perform the first technical transformations for each of these source systems. In the next layer, you then create a more complex, integrated data model built on top of these independent models.
In such scenarios, you want to share some common entity definitions between these projects. For example, the Flowman project that builds the integrated data model may want to reuse the relations from the other projects.
Flowman supports these scenarios through the concept of imports.
Example
First, you define a project which exports entities. In fact, you may not need to do anything, since importing a project makes all of its entities available to the importing side. However, your project may also require some variables to be set, like the processing date. Typically, you would provide such variables as job parameters:
# Project A, which contains shared resources
jobs:
  # Define a base job with common environment variables
  base:
    parameters:
      - name: processing_datetime
        type: timestamp
        description: "Specifies the datetime in yyyy-MM-ddTHH:mm:ss.Z for which the result will be generated"
      - name: processing_duration
        type: duration
        description: "Specifies the processing duration (either P1D or PT1H)"
    environment:
      - start_ts=$processing_datetime
      - end_ts=${Timestamp.add(${processing_datetime}, ${processing_duration})}
      - start_unixtime=${Timestamp.parse($start_ts).toEpochSeconds()}
      - end_unixtime=${Timestamp.parse($end_ts).toEpochSeconds()}

  # Define a specific job for daily processing
  daily:
    extends: base
    parameters:
      - name: processing_datetime
        type: timestamp
    environment:
      - processing_duration=P1D

  # Define a specific job for hourly processing
  hourly:
    extends: base
    parameters:
      - name: processing_datetime
    environment:
      - processing_duration=PT1H
Another project may want to access resources from project A, but within the context of one of its jobs (for example, daily). This can be achieved by declaring the dependency in an imports section within the project manifest:
# project.yml of another project
name: raw-exporter

imports:
  # Import with no job and no (or default) parameters
  - project: project_b

  # Import project with specified job context
  - project: project_a
    # The job may even be a variable, so different job context can be imported
    job: $period
    arguments:
      processing_datetime: $processing_datetime
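If the importing project only ever needs a single job context, the job can presumably be given as a literal name instead of a variable. The following is a minimal sketch under that assumption. It reuses only the keys shown in the manifest above and the hourly job defined in project A.

```yaml
# project.yml of another project (sketch with a fixed job context)
name: raw-exporter

imports:
  - project: project_a
    # Assumption: a literal job name is used in place of the $period variable
    job: hourly
    arguments:
      # Forward the importing job's parameter to the imported job
      processing_datetime: $processing_datetime
```

With a fixed job name, every job of the importing project sees project_a in the same context. The variable form is more flexible when different jobs need different contexts.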
Then you can easily access entities from project_a and project_b as follows:
mappings:
  # You can access all entities from different projects by using the project name followed by a slash ("/")
  sap_transactions:
    kind: filter
    input: project_b/transactions
    condition: "transaction_code = 8100"

relations:
  ad_impressions:
    kind: alias
    input: project_a/ad_impressions

jobs:
  main:
    parameters:
      - name: processing_datetime
        type: timestamp
    environment:
      # Set the variable $period, so it will be used to import the correct job
      - period=daily
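Because the imported job is selected through the $period variable, a second job in the importing project could choose a different context for project_a. The sketch below illustrates this under stated assumptions: the job name main_hourly is hypothetical, and the fragment uses only the job keys already shown in this example.

```yaml
jobs:
  # Job from the example above: imports project_a in the context of its "daily" job
  main:
    parameters:
      - name: processing_datetime
        type: timestamp
    environment:
      - period=daily

  # Hypothetical second job (the name is an assumption): imports project_a
  # in the context of its "hourly" job, so processing_duration becomes PT1H there
  main_hourly:
    parameters:
      - name: processing_datetime
        type: timestamp
    environment:
      - period=hourly
```

In this arrangement the mappings and relations stay unchanged. Only the job environment decides whether the variables derived in project A (such as start_ts and end_ts) span a day or an hour.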