Kerberos Authentication¶
Of course, you can also run Flowman in a Kerberos environment, as long as the components you use actually support Kerberos. This includes Spark, Hadoop and Kafka.
Configuring Kerberos¶
The simplest way to use Kerberos is to provide a customized flowman-env.sh
in the conf
directory. You simply
need to set the following variables and provide a Kerberos keytab at the correct location.
# flowman-env.sh
KRB_PRINCIPAL={{KRB_PRINCIPAL}}@MY-REALM.NET
KRB_KEYTAB=$FLOWMAN_CONF_DIR/{{KRB_PRINCIPAL}}.keytab
Of course this way, Flowman will always use the same Kerberos principal for all projects. Currently, there is no other
way, since Spark and Hadoop need to have the Kerberos principal set at startup. But you can simply use different
config directories and switch between them by setting the FLOWMAN_CONF_DIR
environment variable.
Impala Catalog Plugin¶
When you want to use the Impala Plugin with Kerberos authentication, then things get a little
bit more complicated, since you also need to specify a JAAS
file.
# system.yml
# We need to specify the impala plugin as a system plugin, since it is required to instantiate a namespace
plugins:
- flowman-impala
# default-namespace.yml
# Define the connection to Impala
connections:
impala:
kind: jdbc
url: jdbc:impala://$System.getenv('IMPALA_HOST'):21050
properties:
SocketTimeout: 0
AuthMech: 1
AuthType: 1
KrbRealm: MY-KERBEROS-REALM.NET
KrbHostFQDN: $System.getenv('IMPALA_HOST')
KrbServiceName: impala
AllowSelfSignedCerts: 1
CAIssuedCertsMismatch: 1
SSL: 1
# Setup Impala as an additional catalog besides Hive
catalog:
kind: impala
connection: impala
# flowman-env.sh
SPARK_DRIVER_JAVA_OPTS="-Djava.security.auth.login.config=jaas.conf"
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab=/path/to/{{KRB_PRINCIPAL}}.keytab
useTicketCache=true
principal={{KRB_PRINCIPAL}}@MY-REALM.NET
doNotPrompt=true
debug=false;
};