Impala is another “SQL on Hadoop” execution engine mainly developed and backed up by Cloudera. Impala allows you to access data stored in Hadoop and registered in the Hive metastore, just like Hive itself, but often at a significantly better performance. Unfortunately Impala requires that all changes to Hive tables are either performed directly via Impala itself or that the changed tables are synced from Hive to Impala after changes.
In order to better support environments which rely on Impala, Flowman also supports the second approach of automatically syncing all changes to Hive tables performed in Flowman. This feature is accomplished by the “Impala” plugin for Flowman. The plugin only needs to be enabled and properly configured, and then all changes to Hive will be automatically propagated to Impala
Configuration of Impala Plugin#
First you need to enable the Impala plugin. This needs to be done in the
system.yml configuration file in the
conf directory of Flowman:
# The system configuration loads plugin before namespaces are instantiated. The Impala plugin may already be required
# within a namespace to define an external catalog, therefore it needs to be loaded in advance.
Next you need to configure the Impala plugin as an “external catalog provider” within the namespace configuration file.
The default namespace is configured via
conf/default-namespace.yml. You need to add the following sections:
# Define the connection to Impala
# Setup Impala as an additional catalog besides Hive
Please consult the Cloudera Impala documentation for a comprehensive list of JDBC properties, which can be set under
properties in the connection above.
Using Impala with Kerberos#
When you are using Impala with Kerberos authentication, you also need to specify some additional details (please consult the Cloudera Impala documentation for a comprehensive list of JDBC properties):
# Define the connection to Impala with Kerberos enabled
In addition to the namespace configuration, you also need to provide valid Kerberos credentials stored in a keytab.
Impala then in turn requires a valid JAAS configuration file, which refers to that keytab. That file may look as
follows (of course you need to replace
Finally, you need to tell Flowman to read in this JAAS file. This can be done by specifying a Java command line option
conf/flowman-env.sh as follows: