Setup & Configuration
Installing Flowman is straightforward once you have a working Apache Spark environment, such as Cloudera CDP. Even without a Hadoop/Spark environment, you can use prebuilt Docker images to run Flowman in a non-distributed local mode, which is especially useful for development.
Managed Spark environments such as AWS EMR and Azure Synapse are also supported. You will find instructions below.
Supported Spark Environments
Flowman is available for many different Spark/Hadoop environments, with a separate package
for each environment to ensure a high degree of compatibility. Each variant is identified by a suffix
appended to the Flowman version, in the form <flowman-version>-<flowman-variant>. For example, the full version tag
of Flowman 1.2.0 for Cloudera CDP 7.1 and Spark 3.3 is 1.2.0-cdp7-spark3.3-hadoop3.1.
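The naming scheme can be illustrated with a few lines of Python. This is only a sketch of the string format described above, not part of Flowman itself: the helper names `full_version_tag` and `split_version_tag` are hypothetical, and splitting at the first hyphen assumes that the Flowman version number contains no hyphen of its own.

```python
from typing import Tuple


def full_version_tag(flowman_version: str, variant: str) -> str:
    """Build '<flowman-version>-<flowman-variant>' (hypothetical helper)."""
    return f"{flowman_version}-{variant}"


def split_version_tag(tag: str) -> Tuple[str, str]:
    """Split a full version tag into (version, variant).

    Assumption: the version part contains no hyphen, so the first
    hyphen in the tag separates the version from the variant.
    """
    version, _, variant = tag.partition("-")
    return version, variant


if __name__ == "__main__":
    tag = full_version_tag("1.2.0", "cdp7-spark3.3-hadoop3.1")
    print(tag)
    print(split_version_tag(tag))
```

Splitting at the first hyphen rather than the last matters because the variant itself contains hyphens.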
The following environments are officially supported with corresponding build variants:
| Distribution | Spark | Hadoop | Java | Scala | Variant |
|---|---|---|---|---|---|
| Open Source | 3.0.3 | 3.2 | 11 | 2.12 | oss-spark3.0-hadoop3.2 |
| Open Source | 3.1.2 | 3.2 | 11 | 2.12 | oss-spark3.1-hadoop3.2 |
| Open Source | 3.2.3 | 3.3 | 11 | 2.12 | oss-spark3.2-hadoop3.3 |
| Open Source | 3.3.2 | 3.3 | 11 | 2.12 | oss-spark3.3-hadoop3.3 |
| AWS EMR 6.10 | 3.3.1 | 3.3 | 1.8 | 2.12 | emr6.10-spark3.3-hadoop3.3 |
| Azure Synapse | 3.3.1 | 3.3 | 1.8 | 2.12 | synapse3.3-spark3.3-hadoop3.3 |
| Cloudera CDP 7.1 | 3.2.1 | 3.1 | 11 | 2.12 | cdp7-spark3.2-hadoop3.1 |
| Cloudera CDP 7.1 | 3.3.0 | 3.1 | 11 | 2.12 | cdp7-spark3.3-hadoop3.1 |
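
When scripting an installation, it can help to look up the variant from the table programmatically. The following is a minimal sketch under stated assumptions: the `VARIANTS` mapping simply copies the rows above, the function `find_variant` is hypothetical, and matching on the Spark major.minor version only is a simplification. The table lists specific patch versions, and whether other patch releases are compatible is not stated here.

```python
from typing import Optional

# (distribution, Spark major.minor) -> build variant, copied from the table above.
VARIANTS = {
    ("Open Source", "3.0"): "oss-spark3.0-hadoop3.2",
    ("Open Source", "3.1"): "oss-spark3.1-hadoop3.2",
    ("Open Source", "3.2"): "oss-spark3.2-hadoop3.3",
    ("Open Source", "3.3"): "oss-spark3.3-hadoop3.3",
    ("AWS EMR 6.10", "3.3"): "emr6.10-spark3.3-hadoop3.3",
    ("Azure Synapse", "3.3"): "synapse3.3-spark3.3-hadoop3.3",
    ("Cloudera CDP 7.1", "3.2"): "cdp7-spark3.2-hadoop3.1",
    ("Cloudera CDP 7.1", "3.3"): "cdp7-spark3.3-hadoop3.1",
}


def find_variant(distribution: str, spark_version: str) -> Optional[str]:
    """Return the build variant for a distribution and Spark version.

    Assumption: only the major.minor part of the Spark version is used
    for the lookup. Returns None if the combination is not in the table.
    """
    minor = ".".join(spark_version.split(".")[:2])
    return VARIANTS.get((distribution, minor))


if __name__ == "__main__":
    print(find_variant("Cloudera CDP 7.1", "3.3.0"))
    print(find_variant("Open Source", "2.4.8"))
```

Returning `None` for an unlisted combination keeps the caller responsible for deciding how to handle an unsupported environment.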