# Override jar versions
A common problem with Spark, and specifically with many Hadoop environments (like Cloudera), is a mismatch between the jar versions an application ships and the jars provided by the runtime environment. Flowman is built with carefully chosen dependency versions that match those of each supported runtime environment, but sometimes this is not enough.
For example, Cloudera ships with a rather old JDBC driver for MS SQL Server / Azure SQL Server which is not compatible with the `sqlserver` relation type provided by the MS SQL Server plugin. This will result in `MethodNotFound` exceptions during execution. It is still possible to force Spark to use the newer JDBC driver by changing some config options.
You need to add the following lines to your custom `flowman-env.sh` file:
```shell
# Add MS SQL JDBC Driver. Normally this is handled by the plugin mechanism, but Cloudera already provides some
# old version of the JDBC driver, and this is the only place where we can force to use our JDBC driver
SPARK_JARS="$FLOWMAN_HOME/plugins/flowman-mssqlserver/mssql-jdbc-9.2.1.jre8.jar"
SPARK_OPTS="--conf spark.executor.extraClassPath=mssql-jdbc-9.2.1.jre8.jar --conf spark.driver.extraClassPath=$FLOWMAN_HOME/plugins/flowman-mssqlserver/mssql-jdbc-9.2.1.jre8.jar"
```
The first line explicitly adds the plugin jar to the list of jars passed to `spark-submit`. But this alone is still not enough: we also have to set `spark.executor.extraClassPath` and `spark.driver.extraClassPath`, which prepend the specified jars to the class path of the executors and of the driver, respectively. Note that the executor setting only needs the jar file name, since jars distributed via `spark-submit` end up in each executor's working directory, while the driver setting requires the full local path.
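With these settings in place, the `spark-submit` invocation assembled by Flowman's launcher scripts ends up roughly equivalent to the following sketch. This is only an illustration of how the two variables are consumed; the actual command, main class, and arguments depend on your installation.

```shell
# Illustrative sketch only -- the real command is built by Flowman's launcher scripts.
# --jars ships the driver jar to the executors; the extraClassPath options prepend it
# to the executor and driver class paths so it wins over Cloudera's bundled version.
spark-submit \
  --jars "$FLOWMAN_HOME/plugins/flowman-mssqlserver/mssql-jdbc-9.2.1.jre8.jar" \
  --conf spark.executor.extraClassPath=mssql-jdbc-9.2.1.jre8.jar \
  --conf spark.driver.extraClassPath="$FLOWMAN_HOME/plugins/flowman-mssqlserver/mssql-jdbc-9.2.1.jre8.jar"
  # ... followed by the usual Flowman driver class and project arguments
```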