Setup & Configuration#
Installing Flowman is relatively simple once you have a working Apache Spark environment, like Cloudera CDP. But even without a Hadoop/Spark environment, you can resort to prebuilt Docker images for running Flowman in a non-distributed local mode which is especially useful for development.
Moreover, managed Spark environments like AWS EMR or Azure Synapse are supported, you will find instructions below:
Supported Spark Environments#
Flowman is available for many different Spark/Hadoop environments. Flowman provides different packages
for each of these environments to ensure a high degree of compatibility. Each variant is identified by its suffix
appended to the Flowman version, i.e. <flowman-version>-<flowman-variant>
. So for example, the full version tag
of Flowman 1.2.0 for Cloudera CDP 7.1 and Spark 3.3 would be 1.2.0-cdp7-spark3.3-hadoop3.1
.
The following environments are officially supported with corresponding build variants:
Distribution | Spark | Hadoop | Java | Scala | Variant |
---|---|---|---|---|---|
Open Source | 2.4.8 | 2.6 | 1.8 | 2.11 | oss-spark2.4-hadoop2.6 |
Open Source | 2.4.8 | 2.7 | 1.8 | 2.11 | oss-spark2.4-hadoop2.7 |
Open Source | 3.0.3 | 2.7 | 11 | 2.12 | oss-spark3.0-hadoop2.7 |
Open Source | 3.0.3 | 3.2 | 11 | 2.12 | oss-spark3.0-hadoop3.2 |
Open Source | 3.1.2 | 2.7 | 11 | 2.12 | oss-spark3.1-hadoop2.7 |
Open Source | 3.1.2 | 3.2 | 11 | 2.12 | oss-spark3.1-hadoop3.2 |
Open Source | 3.2.3 | 2.7 | 11 | 2.12 | oss-spark3.2-hadoop2.7 |
Open Source | 3.2.3 | 3.3 | 11 | 2.12 | oss-spark3.2-hadoop3.3 |
Open Source | 3.3.2 | 2.7 | 11 | 2.12 | oss-spark3.3-hadoop2.7 |
Open Source | 3.3.2 | 3.3 | 11 | 2.12 | oss-spark3.3-hadoop3.3 |
AWS EMR 6.10 | 3.3.1 | 3.3 | 1.8 | 2.12 | emr6.10-spark3.3-hadoop3.3 |
Azure Synapse | 3.3.1 | 3.3 | 1.8 | 2.12 | synapse3.3-spark3.3-hadoop3.3 |
Cloudera CDH 6.3 | 2.4.0 | 3.0 | 1.8 | 2.11 | cdh6-spark2.4-hadoop3.0 |
Cloudera CDP 7.1 | 2.4.8 | 3.1 | 1.8 | 2.11 | cdp7-spark2.4-hadoop3.1 |
Cloudera CDP 7.1 | 3.2.1 | 3.1 | 11 | 2.12 | cdp7-spark3.2-hadoop3.1 |
Cloudera CDP 7.1 | 3.3.0 | 3.1 | 11 | 2.12 | cdp7-spark3.3-hadoop3.1 |