Mock Mapping
The mock
mapping works similar to the null
mapping in that it creates an empty output. But instead of
explicitly specifying a schema of the empty output, the mock
mapping will fetch the schema from a different mapping.
This mapping is most useful to be used in tests. In addition it is also possible to manually specify records to be
returned, which makes this mapping even more convenient for mocking.
Example
mappings:
# This will mock `some_mapping`
some_mapping:
kind: mock
mappings:
some_other_mapping:
kind: mock
mapping: some_mapping
records:
- [1,2,"some_string",""]
- [2,null,"cat","black"]
mappings:
some_mapping:
kind: mock
mapping: some_mapping
records:
- Campaign ID: DIR_36919
LineItemID ID: DIR_260390
SiteID ID: 23374
CreativeID ID: 292668
PlacementID ID: 108460
- Campaign ID: DIR_36919
LineItemID ID: DIR_260390
SiteID ID: 23374
CreativeID ID: 292668
PlacementID ID: 108460
Fields
kind
(mandatory) (type: string):mock
broadcast
(optional) (type: boolean) (default: false): Hint for broadcasting the result of this mapping for map-side joins.cache
(optional) (type: string) (default: NONE): Cache mode for the results of this mapping. Supported values areNONE
DISK_ONLY
MEMORY_ONLY
MEMORY_ONLY_SER
MEMORY_AND_DISK
MEMORY_AND_DISK_SER
mapping
(optional) (type: string): Specifies the name of the mapping to be mocked. If no name is given, then a mapping with the same name will be mocked. Note that this will only work when used as an override mapping in test cases, otherwise an infinite loop would be created by referencing to itself.records
(optional) (type: list:array) (default: empty): An optional list of records to be returned.
Outputs
The mock
mapping provides all outputs as the mocked mapping.
Remarks
Mocking mappings is very useful for creating meaningful tests. But you need to take into account one important fact:
Mocking for testing only works well if Flowman doesn’t need to read real data. Access to real data might occur when
Flowman doesn’t have all schema information in the specification and therefore falls back to Spark doing schema
inference. This is something you always should avoid. The best way to avoid automatic schema inference with Spark
is to explicitly specify schema definitions in all relations and/or to explicitly specify all columns with data types
in the readRelation
mapping.