Testing with Flowman

Testing data pipelines often turns out to be a difficult undertaking, since the pipeline relies on external data sources which need to be mocked. Fortunately, Flowman natively supports writing tests, which in turn support simple mocking of relations and/or mappings. This allows you to easily test the whole processing logic or even to test only some subsets of functionality represented by some mapping by mocking the output of its input mappings and just checking the results of the mapping under test.

Flowman Tests

Let’s have a look at the following example:

tests:
  test_facts:
    environment:
      - year=2013

    overrideMappings:
      measurements:
        kind: mock
        records:
          - year: $year
            date: $year-01-02
            time: 0100
            usaf: 999999
            wban: 63897
            wind_direction_qual: 9
            wind_speed_qual: 9
            air_temperature_qual: 9
          - year: $year
            date: $year-01-02
            time: 0100
            usaf: 99999
            wban: 63897
            wind_direction_qual: 9
            wind_speed_qual: 9
            air_temperature_qual: 9

      stations:
        kind: mock
        records:
          - usaf: 999999
            wban: 63897
            country: US
          - usaf: 999999
            wban: 1
            country: DE

    targets:
      - validate_stations_raw

    assertions:
      measurements_joined:
        kind: sql
        description: "Measurements are joined correctly"
        tests:
          - query: "SELECT year,usaf,wban,country FROM measurements_joined"
            expected:
              - [$year,999999,63897,US]
              - [$year,99999,63897,null]

In the example above, the mapping measurements_joined (not shown here) is tested inside the SELECT statement of the assertions block at the end. To be able to run this statement without accessing external data sources, some input relations (namely measurements and stations) have been mocked in the overrideMappings section.

Please find more details in the testing documentation

Running Tests

The easiest way to execute tests is to use the Flowman Shell, which provides a simple command test run, which will run all tests defined in your project.

Java/Scala Unittests

Flowman now also includes a flowman-testing library which allows one to write lightweight unittests using either Scala or Java. The library provides some simple test runner for executing tests and jobs specified as usual in YAML files.

Data Quality Tests

The testing framework above is meant for implementing unittests (i.e. self-contained tests without any dependency to external systems like databases for additional files). If you want to assess the data quality of either the source tables or the generated tables, you may want to have a look at documenting with Flowman and the validation and verify targets.