Configure Query Spotlight: Hive (RPM/DEB)
Prerequisites
Before you begin configuring Query Spotlight for Hive query monitoring, ensure that your system meets the required prerequisites.
- On every host where you are configuring Query Spotlight, Pepperdata must be installed (Installing Pepperdata) and Hadoop configured to use Pepperdata (Configure Hadoop to Use Pepperdata).
- Your cluster uses a supported combination of Hive version and platform; see the entries for Query Spotlight 7.1.x in the table of Supported Hive-Distro Combinations by Query Spotlight Version.
Task 1: Enable Fetching of Hive Query Data
To enable Pepperdata to fetch Hive query data, add the environment variables for the Hive query JAR, Pepperdata configuration, and the Pepperdata activation script to the Hive configuration on the Hive server hosts.
Procedure
1. Beginning with any Hive server host, add the required environment variables.
   a. Open the hive-env.sh file for editing.
   b. Add the environment variables in the following format.
      For the PD_HIVE_QUERY_JAR_NAME variable:
      - Be sure to replace the your-hive-query-jar-name placeholder with the actual name of the Hive Query JAR that you're using.
      - Choose the JAR whose first set of version numbers corresponds to the version of Hive that you're using.
        For all Hive 3.x versions, use the unified PepperdataHiveQuery-3.1.2.jar JAR; it supports all Hive 3.x-based distros. For example, if you are using CDP 7.1.3, which is built with Hive 3.1.3, use the PepperdataHiveQuery-3.1.2.jar JAR.
        For Hive 2.x, the distro version numbers in the JAR filenames refer to the earliest supported version. If your version is later than any of the distro designations in the JAR filenames, use the latest-named JAR. For example, if you are using Hive 2.1.1 with CDH 6.3.4, use the PepperdataHiveQuery-2.1.1-cdh6.3.3.jar JAR.
      - Specify only the JAR name; do not include the path; for example:
        export PD_HIVE_QUERY_JAR_NAME=PepperdataHiveQuery-3.1.2.jar
      # The following are the supported Hive Query jars
      # PepperdataHiveQuery-3.1.2.jar
      # PepperdataHiveQuery-2.3.6.jar
      # PepperdataHiveQuery-2.1.1-cdh6.3.3.jar
      # PepperdataHiveQuery-2.1.1-cdh6.2.1.jar
      export PD_HIVE_QUERY_JAR_NAME=your-hive-query-jar-name
      export PD_HOME=${PD_HOME:-/opt/pepperdata}
      export PD_INSTALL_DIR="${PD_HOME}/supervisor"
      PEPPERDATA_HIVE_ACTIVATE_SCRIPT_PATH="${PD_INSTALL_DIR}/lib/hive-activate.sh"
      if [ -e "$PEPPERDATA_HIVE_ACTIVATE_SCRIPT_PATH" ]; then
        . "$PEPPERDATA_HIVE_ACTIVATE_SCRIPT_PATH"
      fi
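      If you're not sure which Hive Query JARs are available on the host, you can list them before choosing one. A quick check, assuming the default /opt/pepperdata install root (adjust if you customized PD_HOME):
      # List the Pepperdata Hive Query JARs present on the host.
      find /opt/pepperdata -name "PepperdataHiveQuery-*.jar" 2>/dev/null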
   c. Save your changes and close the file.
2. Repeat step 1 on every Hive server host.
3. On every Hive server, restart the hiveserver2 service.
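After the restart, you can sanity-check a host's configuration by confirming that the variable was added to hive-env.sh and that the activation script exists at the default install location. A minimal check, assuming hive-env.sh is under /etc/hive/conf (the location varies by distro) and the default /opt/pepperdata install root:
# Confirm the Pepperdata variable was added to hive-env.sh ...
grep PD_HIVE_QUERY_JAR_NAME /etc/hive/conf/hive-env.sh
# ... and that the activation script referenced by the snippet exists.
ls -l /opt/pepperdata/supervisor/lib/hive-activate.sh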
Task 2: (Optional) Encrypt the Connect String for the Hive Metastore
If you want to encrypt the connect string for the Hive metastore, regardless of whether you’ll store it in the Pepperdata site file or an external file, use the Pepperdata password encryption script.
At a minimum, the unencrypted connect string must include the jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/ string. You can add as many connection properties/parameters as you need for your environment, separating them with a semicolon (;).
Example Connect Strings
- Without properties/parameters:
jdbc:hive2://localhost:10000/
- Add properties for authenticated environments:
jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
- Multiple properties/parameters:
jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>
Procedure
1. Run the Pepperdata encryption script.
   /opt/pepperdata/supervisor/encrypt_password.sh
2. At the Enter the password to encrypt: prompt, enter your connect string.
3. Copy (or make note of) the resulting encrypted connect string.
   For example, in the following output from the script, the encrypted connect string is W+ONY3ZcR6QLP5sqoRqcpA=2.
   Encrypted password is W+ONY3ZcR6QLP5sqoRqcpA=2
   Use this encrypted result as the value for pepperdata.jdbcfetch.hive.connect.string.encrypted (which you'll configure later), or store it in the external file specified by the pepperdata.jdbcfetch.hive.connect.string.encrypted.file property.
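If you plan to use the external-file option, store the encrypted result in a file that the PepAgent's user can read. A minimal sketch; the file path, the pepperdata owner, and the permission tightening are suggestions rather than Pepperdata requirements, so use whatever path you will set in pepperdata.jdbcfetch.hive.connect.string.encrypted.file and the account your PepAgent runs as:
# Write the encrypted connect string to a file and restrict its permissions.
echo 'W+ONY3ZcR6QLP5sqoRqcpA=2' | sudo tee /etc/pepperdata/hive-connect-string.enc > /dev/null
sudo chown pepperdata:pepperdata /etc/pepperdata/hive-connect-string.enc
sudo chmod 600 /etc/pepperdata/hive-connect-string.enc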
Task 3: Enable Fetching of Hive Databases and Tables’ Metadata
To enable Pepperdata to fetch data from the Hive metastore, add the required variables to the Pepperdata configuration.
Procedure
1. On any of the hosts that are configured to be a Hive client (and from which you launch Hive queries), open the Pepperdata site file, pepperdata-site.xml, for editing.
   It's sufficient to add the variables to a single Hive client host. But if you want to replicate the configuration on every host, perhaps to ease configuration management, that is okay, too.
   By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.
Add the property to configure the hostname.
Be sure to substitute your fully-qualified, canonical hostname for the
YOUR.CANONICAL.HOSTNAME
placeholder in the following code snippet.<property> <name>pepperdata.jdbcfetch.hive.pepagent.host</name> <value>YOUR.CANONICAL.HOSTNAME</value> <description>Host where the fetching should be enabled.</description> </property>
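   If you're not sure which name to use, running the following command on the PepAgent host prints its fully-qualified domain name:
   # Print this host's fully-qualified domain name.
   hostname -f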
3. Configure the connect string.
   Add one of the following properties, depending on your environment and security requirements. Be sure to substitute your information for the YOUR... placeholders.
   - Plain text connect string stored in the Pepperdata site file.
     At a minimum, the connect string must include the jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/ string. You can add as many connection properties/parameters as you need for your environment, separating them with a semicolon (;).
     Example Connect Strings
     - Without properties/parameters:
       jdbc:hive2://localhost:10000/
     - Add properties for authenticated environments:
       jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
     - Multiple properties/parameters:
       jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>
     <property>
       <name>pepperdata.jdbcfetch.hive.connect.string</name>
       <value>jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/${;OPTIONAL-ADDITIONAL-PROPERTY}</value>
       <description>JDBC Connect string to be used.</description>
     </property>
   - Plain text connect string stored in an external file:
     <property>
       <name>pepperdata.jdbcfetch.hive.connect.string.file</name>
       <value>YOUR-PATH-TO-JDBCSTRING-FILE</value>
       <description>Path to file containing JDBC Connect string.</description>
     </property>
   - Encrypted connect string (the result from encrypting the string earlier in the configuration procedure) stored in the Pepperdata site file:
     <property>
       <name>pepperdata.jdbcfetch.hive.connect.string.encrypted</name>
       <value>YOUR-ENCRYPTED-TEXT</value>
       <description>Encrypted JDBC Connect string to be used.</description>
     </property>
   - Encrypted connect string (the result from encrypting the string earlier in the configuration procedure) stored in an external file:
     <property>
       <name>pepperdata.jdbcfetch.hive.connect.string.encrypted.file</name>
       <value>YOUR-PATH-TO-JDBCSTRING-FILE</value>
       <description>Path to file containing encrypted JDBC Connect string.</description>
     </property>
4. (Kerberized Clusters) If the hiveserver2 service is Kerberized, add the properties for the Kerberos principal and keytab to the Pepperdata site file.
   a. Enable fetching from a Kerberized HiveServer2.
      <property>
        <name>pepperdata.jdbcfetch.hive.kerberos.enabled</name>
        <value>true</value>
        <description>Should kerberos be used when connecting to Hive?</description>
      </property>
   b. Configure the principal and keytab.
      If you already configured the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION environment variables during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication), you do not need to configure them again, and you should skip this substep.
      Be sure to substitute your information for the YOUR... placeholders.
      <property>
        <name>pepperdata.jdbcfetch.hive.kerberos.principal</name>
        <value>YOUR_PRINCIPAL/HOST@DOMAIN.COM</value>
        <description>The Kerberos principal to use to authenticate with the Hive client.</description>
      </property>
      <property>
        <name>pepperdata.jdbcfetch.hive.kerberos.keytab.location</name>
        <value>YOUR-PATH-TO-KEYTAB-FILE</value>
        <description>Path to the keytab file for the specified principal.</description>
      </property>
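      Before relying on these settings, you can confirm that the principal and keytab work together. A minimal check with kinit, using the same placeholders as above:
      # Obtain a ticket from the keytab, then list it. If this fails, fix the
      # keytab or principal before expecting the Pepperdata fetcher to authenticate.
      kinit -kt YOUR-PATH-TO-KEYTAB-FILE YOUR_PRINCIPAL/HOST@DOMAIN.COM
      klist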
5. Validate the XML snippets that you added.
   Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file; see the example below.
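   For example, to check the default site file (xmllint prints nothing and exits with status 0 when the XML is well formed; parse errors go to stderr):
   # Validate the Pepperdata site file; adjust the path if you customized PD_CONF_DIR.
   xmllint --noout /etc/pepperdata/pepperdata-site.xml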
6. Save your changes and close the file.
7. Add the hive-jdbc-*standalone.jar JAR file to the PepAgent's classpath on the host that you selected in step 1.
   a. Find the fully-qualified name of the JAR, which depends on the cluster's distro.
      - The filename pattern is hive-jdbc-*standalone.jar.
      - The location depends on the distro; for example, in Cloudera CDH/CDP Private Cloud Base RPM/DEB installations, the path is /usr/lib/hive/lib/.
      - You can use the find command to locate all available JAR files and output their names to the console; for example:
        find /opt/cloudera/parcels/CDH/jars/ /usr/lib/hive/lib/ /usr/lib/hive/jdbc/ -name "hive-jdbc-*standalone.jar" 2>/dev/null
        /usr/lib/hive/lib/hive-jdbc-standalone.jar
      Make a note of the JAR file to use. You'll need this information in the next substep, as the value for the YOUR-HIVE-JDBC-JAR placeholder.
   b. Open the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh, for editing.
   c. Add the following variable.
      Be sure to substitute the actual path and filename for the YOUR-HIVE-JDBC-JAR placeholder.
      export PD_EXTRA_CLASSPATH_ITEMS=YOUR-HIVE-JDBC-JAR
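      For example, using the JAR that the find command reported in the previous substep (your path may differ):
      # Example only; point this at the hive-jdbc-*standalone.jar located on your host.
      export PD_EXTRA_CLASSPATH_ITEMS=/usr/lib/hive/lib/hive-jdbc-standalone.jar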
   d. Save your changes and close the file.
8. Restart the PepAgent.
   You can use either the service command (if provided by your OS) or the systemctl command:
   sudo service pepagentd restart
   sudo systemctl restart pepagentd
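To confirm that the PepAgent came back up cleanly, you can check the service status (shown here with systemctl; the service command form is analogous):
# Report whether the pepagentd service is active and show its recent log output.
sudo systemctl status pepagentd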