A mock relation works similarly to an empty relation in the sense that it only returns empty data.
The main difference is that a mock relation picks up its schema from a different relation. Its main use case is
within test cases where you want to replace physical data sources with empty mocked data sources with minimal effort.
    relations:
      mocked_relation:
        kind: mock
        relation: real_relation

    relations:
      some_relation:
        kind: mock
        relation: some_relation
        records:
          - [1,2,"some_string",""]
          - [2,null,"cat","black"]

    relations:
      data_raw:
        kind: mock
        records:
          - Campaign ID: DIR_36919
            LineItemID ID: DIR_260390
            SiteID ID: 23374
            CreativeID ID: 292668
            PlacementID ID: 108460
          - Campaign ID: DIR_36919
            LineItemID ID: DIR_260390
            SiteID ID: 23374
            CreativeID ID: 292668
            PlacementID ID: 108460

relation (optional) (type: string) (default: empty): Specify the base relation to be mocked. If no relation is specified, a relation with the same name will be mocked. This cannot work at project level within the same project, because the mock and the mocked relation would share one name, but it works well when the mock relation is created inside a test.
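
As a minimal sketch of the "same name" behaviour described above, the following fragment mocks a relation from inside a test. It assumes that the Flowman test specification accepts an `overrideRelations` section, and the names `test_customers` and `customers_raw` are hypothetical, chosen only for illustration.

```yaml
tests:
  # Hypothetical test name
  test_customers:
    # Assumption: relations listed here replace project relations of the same name
    overrideRelations:
      # No "relation" property is given, so the mock is expected to take its
      # schema from the project relation that is also called "customers_raw"
      customers_raw:
        kind: mock
```

With this layout the test never has to name the real relation twice, which keeps the mock in step with the project when the relation is renamed.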
records (optional) (type: list:array) (default: empty): An optional list of records to be returned. Note that this list needs to include values for any partition columns of the mocked relation. The partition values need to be appended at the end.
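
The ordering rule for partition columns can be illustrated with a small sketch. All relation names, column names, and the location below are hypothetical, and the `inline` schema kind and the `partitions` list are assumed to be available for the mocked file relation.

```yaml
relations:
  # Hypothetical physical relation with two data columns and one partition column
  measurements:
    kind: file
    format: parquet
    location: "/data/example/measurements"
    schema:
      kind: inline
      fields:
        - name: station_id
          type: string
        - name: temperature
          type: double
    partitions:
      - name: year
        type: integer

  # Mock that borrows the schema above and returns two records
  measurements_mock:
    kind: mock
    relation: measurements
    records:
      # Order: station_id, temperature, then the partition column "year" last
      - ["ST001", 12.5, 2020]
      - ["ST002", null, 2021]
```

Each record carries three values although the schema lists only two fields, because the value for the partition column `year` is appended at the end.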
The mock relation supports all output modes, each of which simply discards all records.
Mocking relations is very useful for creating meaningful tests, but there is one important caveat: mocking only works well if Flowman doesn’t need to read real data. Access to real data can occur when Flowman doesn’t have all schema information in the specification and therefore falls back to schema inference by Spark. You should always avoid this. The best way to avoid automatic schema inference with Spark is to explicitly specify schema definitions in all relations.
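
A relation with an explicit schema might look like the following sketch. The relation name, location, and fields are hypothetical, and the example assumes that an `inline` schema kind is available. CSV is used here because it is a format where Spark would otherwise have to read the files to infer column types.

```yaml
relations:
  # Hypothetical CSV source; without the schema block Spark would need to
  # inspect the files at the given location to determine the columns
  customers_raw:
    kind: file
    format: csv
    location: "/data/example/customers"
    schema:
      kind: inline
      fields:
        - name: customer_id
          type: long
        - name: name
          type: string
        - name: country
          type: string
```

Because every column is declared in the specification, a mock of `customers_raw` can obtain its schema without touching the files at the given location.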