Execution Metrics
Flowman provides its own execution metrics in addition to those already provided by Spark. These include the execution time of individual build targets, the number of records written, and so on. The documentation notes wherever metrics are collected.
In order to actually use these metrics, you need to configure two things:
A metrics sink (like Prometheus)
A metrics board containing the configuration of which metrics should be published
Example
How metrics are selected and exported is configured as part of the job definition:
jobs:
  daily:
    description: "Process whole range of periods"
    parameters:
      - name: processing_datetime
        type: timestamp
        description: "Specifies the date in yyyy-MM-dd for which the result will be generated"
    targets:
      - my_target
      - my_other_target
    # The following section configures the metric board, which selects the Flowman metrics of interest and also
    # maps the Flowman metric names to possibly different names
    metrics:
      # Define labels which are attached to all published metrics below
      labels:
        force: ${force}
        status: ${status}
        phase: ${phase}
        datetime: ${processing_datetime}
      metrics:
        # Collect everything
        - selector:
            name: .*
          labels:
            category: ${category}
            kind: ${kind}
            name: ${name}
        # This metric contains the number of records per output. It will search all metrics called
        # `target_records` and export them as `flowman_output_records`. It will also label each metric with
        # the name of each Flowman build target (in case you have multiple targets)
        - name: flowman_output_records
          selector:
            name: target_records
            labels:
              category: target
          labels:
            cube: ${name}
        # This metric contains the processing time per output. Again the selector will search for all metrics
        # named `target_runtime` provided by a `target` category and will export these metrics as
        # `flowman_output_time` with a label called `output` containing the name of the Flowman build target
        - name: flowman_output_time
          selector:
            name: target_runtime
            labels:
              category: target
          labels:
            output: ${name}
        # This metric contains the overall processing time
        - name: flowman_processing_time
          selector:
            name: job_runtime
            labels:
              category: job
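To make the selection and renaming rules of the metric board above more concrete, the following Python sketch models them in a simplified form. It is not Flowman's implementation: the function `apply_board`, the dictionary layout of the metrics and the sample values are hypothetical, and it assumes that selector names and selector labels are matched as regular expressions and that `${...}` variables are resolved from the execution environment plus the labels of the matched metric.

```python
import re
from string import Template


def apply_board(board_labels, rules, metrics, env):
    """Simplified model of a metric board (hypothetical, not Flowman code).

    Returns a list of (published_name, labels, value) tuples.
    """
    published = []
    for metric in metrics:
        for rule in rules:
            selector = rule["selector"]
            # Assumption: the selector name is a regex on the whole metric name
            if not re.fullmatch(selector["name"], metric["name"]):
                continue
            # Assumption: every selector label must match the metric's label
            wanted = selector.get("labels", {})
            if any(
                not re.fullmatch(pattern, metric["labels"].get(key, ""))
                for key, pattern in wanted.items()
            ):
                continue
            # Variables come from the environment and the metric's own labels
            variables = {**env, **metric["labels"]}
            templates = {**board_labels, **rule.get("labels", {})}
            labels = {
                key: Template(value).substitute(variables)
                for key, value in templates.items()
            }
            # Without an explicit name, the original metric name is kept
            name = rule.get("name", metric["name"])
            published.append((name, labels, metric["value"]))
    return published


# Hypothetical input metrics, loosely shaped like the ones described above
metrics = [
    {"name": "target_records", "value": 1200,
     "labels": {"category": "target", "kind": "relation", "name": "my_target"}},
    {"name": "job_runtime", "value": 42.0,
     "labels": {"category": "job", "kind": "job", "name": "daily"}},
]
rules = [
    {"name": "flowman_output_records",
     "selector": {"name": "target_records", "labels": {"category": "target"}},
     "labels": {"cube": "${name}"}},
    {"name": "flowman_processing_time",
     "selector": {"name": "job_runtime", "labels": {"category": "job"}}},
]
board_labels = {"phase": "${phase}", "status": "${status}"}
env = {"phase": "BUILD", "status": "SUCCESS"}

published = apply_board(board_labels, rules, metrics, env)
for name, labels, value in published:
    print(name, labels, value)
```

In this model the `target_records` metric is published as `flowman_output_records` with a `cube` label holding the target name, while `job_runtime` is published as `flowman_processing_time` with only the board-level labels attached.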
Now you only need to provide a metric sink in the default-namespace.yml file:
metrics:
  kind: prometheus
  url: $System.getenv('URL_PROMETHEUS_PUSHGW')
  labels:
    job: "daily"
    instance: "default"
    namespace: $System.getenv('NAMESPACE')
With this configuration, Flowman pushes the execution metrics selected by the metric board to a Prometheus Pushgateway after every execution.
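A Prometheus Pushgateway groups pushed metrics by a key that is encoded in the URL path as `/metrics/job/<job>/<label>/<value>/...`. The following Python sketch shows how the sink labels from the example could translate into such a path. The helper `pushgateway_url`, the host name and the `namespace` value are hypothetical, and whether Flowman maps its sink labels to the grouping key in exactly this way is an assumption.

```python
from urllib.parse import quote


def pushgateway_url(base_url, labels):
    """Build a Pushgateway push URL from a set of grouping labels.

    Hypothetical helper for illustration only; it does not send any data.
    """
    remaining = dict(labels)
    # The Pushgateway path always starts with the job label
    job = remaining.pop("job")
    parts = ["metrics", "job", quote(job, safe="")]
    # All further labels are appended as /<name>/<value> pairs
    for name, value in remaining.items():
        parts += [quote(name, safe=""), quote(value, safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


url = pushgateway_url(
    "http://pushgateway.example:9091",  # stands in for URL_PROMETHEUS_PUSHGW
    {"job": "daily", "instance": "default", "namespace": "test"},
)
print(url)
```

Metrics pushed under the same grouping key replace each other, so labels such as `job` and `instance` decide whether consecutive runs overwrite or coexist on the gateway.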