Impala Plugin¶
The impala
plugin is responsible for executing REFRESH
and COMPUTE STATS
commands on Hive table that are created
or written to by Flowman.
Provided Entities¶
impala
catalog
Example¶
In order to be able to use the Impala catalog plugin, you have to add it to the system.yml
definition as follows:
# system.yml
# We need to specify the impala plugin as a system plugin, since it is required to instantiate a namespace
plugins:
- flowman-impala
Then you have to configure the catalog in default-namespace.yml
similar to the following code snippet, which also uses
Kerberos for authentication. Note that for using Kerberos with Impala, you actually also need a jass.conf
file. Other
authentication mechanisms will require different properties - please consult the Impala documentation for more details.
# default-namespace.yml
# Define the connection to Impala
connections:
impala:
kind: jdbc
url: jdbc:impala://$System.getenv('IMPALA_HOST'):21050
properties:
SocketTimeout: 0
AuthMech: 1
AuthType: 1
KrbRealm: MY-KERBEROS-REALM.NET
KrbHostFQDN: $System.getenv('IMPALA_HOST')
KrbServiceName: impala
AllowSelfSignedCerts: 1
CAIssuedCertsMismatch: 1
SSL: 1
# Setup Impala as an additional catalog besides Hive
catalog:
kind: impala
connection: impala
config:
# Enable COMPUTE STATS (already enabled by default)
- flowman.impala.computeStats=true
You can also directly embed the connection as follows:
# Setup Impala as an additional catalog besides Hive
catalog:
kind: impala
connection:
impala:
kind: jdbc
url: jdbc:impala://$System.getenv('IMPALA_HOST'):21050
properties:
SocketTimeout: 0
AuthMech: 1
AuthType: 1
KrbRealm: MY-KERBEROS-REALM.NET
KrbHostFQDN: $System.getenv('IMPALA_HOST')
KrbServiceName: impala
AllowSelfSignedCerts: 1
CAIssuedCertsMismatch: 1
SSL: 1
You can disable the statistics computation (COMPUTE STATS
) which is normally also performed by the plugin by
setting the following configuration variable:
flowman.impala.computeStats
(type: boolean) (default:true) If enabled (i.e. set totrue
), then Flowman will perform aCOMPUTE STATS
within the Impala Catalog plugin whenever a Hive table is updated. TheREFRESH
statements will always be executed by the plugin.