Configure Query Spotlight: Impala (RPM/DEB)

Prerequisites

Before you begin configuring Query Spotlight for Impala query monitoring, ensure that your system meets the required prerequisites.

Task 1: Enable Fetching of Impala Query Data

To enable Pepperdata to fetch Impala query data, add the required properties to the Pepperdata configuration.

If you already added Impala query metrics for Charts, you do not need to add them again, and you should skip this task.

Procedure

  1. On any coordinator—a host on which the Impala impalad daemon is running—open the host’s Pepperdata site file, pepperdata-site.xml, for editing.

    By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.

  2. Add the properties to enable query monitoring, and (optionally) to configure a non-default location for where to read the query profiles.

    By default, the PepAgent reads profiles of completed queries from /var/log/impalad/profiles/.

    • To use the default location, omit the pepperdata.impala.query.queryLogDir property.

    • To use a different location, add the pepperdata.impala.query.queryLogDir property, and be sure to substitute your location for the your-impalad-profiles-location placeholder.

    <property>
      <name>pepperdata.impala.query.monitoring.enabled</name>
      <value>true</value>
    </property>
    
    <property>
      <name>pepperdata.impala.query.queryLogDir</name>
      <value>your-impalad-profiles-location</value>
    </property>
    
  3. (HTTPS impalad daemon endpoints) If your impalad daemon is configured for HTTPS instead of HTTP, add the pepperdata.agent.genericJsonFetch.impala.httpsEnabled property so that the fetcher for information about Impala queries in flight uses the HTTPS endpoint instead of the default HTTP endpoint (http://LOCALHOST:25000/queries?json).

    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
      <value>true</value>
    </property>
    
  4. (Digest authentication for the Impala Web UI for debugging) If the impalad daemon for your Impala Web UI for debugging is secured by digest authentication, add the authentication credentials.

    Be sure to substitute your username and password for the your-username and your-password placeholders in the following code snippet.

    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.http.authentication.type</name>
      <value>digest</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.auth.username</name>
      <value>your-username</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.auth.password</name>
      <value>your-password</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
      <value>true</value>
    </property>
    
    Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.

  5. (Kerberos for the Impala Web UI for debugging) If the impalad daemon for your Impala Web UI for debugging is Kerberized, add the authentication credentials.

    Be sure to substitute your Kerberos principal and the path of the corresponding keytab file for the your-kerberos-principal and your-kerberos-keytab-pathname placeholders in the following code snippet.

    If you already configured the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION environment variables during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication), you do not need to add the principal and keytab properties here, except to override the cluster-level assignments.

    The fetcher properties (pepperdata.agent.genericJsonFetch.impala.kerberos.principal and pepperdata.agent.genericJsonFetch.impala.keytab.location) are inherited from the properties that were automatically assigned when you installed Pepperdata in the cluster.

    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.http.authentication.type</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.kerberos.principal</name>
      <value>your-kerberos-principal</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.keytab.location</name>
      <value>your-kerberos-keytab-pathname</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
      <value>true</value>
    </property>
    
    Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.

  6. Save your changes and close the file.

  7. Restart the PepAgent.

    You can use either the service (if provided by your OS) or systemctl command:

    • sudo service pepagentd restart
    • sudo systemctl restart pepagentd

    If any of the process’s startup checks fail, an explanatory message appears and the process does not start. Address the issues and try again to start the process.

  8. Repeat steps 1–7 on every coordinator host in your cluster.

  9. Contact Pepperdata Support to request that Impala query metrics be activated for your Pepperdata dashboard.
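The xmllint check recommended in steps 4 and 5 can be scripted so that it runs before each PepAgent restart. The following is a minimal sketch, not a Pepperdata-provided tool. The function names pd_site_file and check_site_xml are hypothetical, and the sketch assumes that PD_CONF_DIR names the directory that contains pepperdata-site.xml and is set in the shell where you run it. It checks only that the file is well-formed XML; it does not check property names or values.

```shell
# Print the path of the Pepperdata site file.
# Assumption: PD_CONF_DIR, when set, is the directory that holds the file;
# otherwise the default location, /etc/pepperdata, is used.
pd_site_file() {
  printf '%s/pepperdata-site.xml\n' "${PD_CONF_DIR:-/etc/pepperdata}"
}

# Return 0 if the given file is well-formed XML, nonzero otherwise.
# --noout suppresses the parsed document so that only errors are printed.
check_site_xml() {
  xmllint --noout "$1"
}

# Example (not run here): restart only if the site file parses cleanly.
#   check_site_xml "$(pd_site_file)" && sudo systemctl restart pepagentd
```

If xmllint reports an error, fix the file before you restart the PepAgent.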

Task 2: (Optional) Encrypt the Connect String for the Hive Metastore

If you want to encrypt the connect string for the Hive metastore, regardless of whether you’ll store it in the Pepperdata site file or an external file, use the Pepperdata password encryption script.

At a minimum, the unencrypted connect string must include the jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/ string. You can add as many connection properties/parameters as you need for your environment, separating them with a semicolon, ;.

Example Connect Strings

  • Without properties/parameters: jdbc:hive2://localhost:10000/
  • Add properties for authenticated environments: jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
  • Multiple properties/parameters: jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>
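The semicolon-separated format shown in these examples can be assembled with a small helper, which avoids misplaced separators when you add several properties. This is an illustrative sketch only; the function name hive_connect_string is hypothetical and is not part of Pepperdata or Hive.

```shell
# Build a Hive JDBC connect string from a host, a port, and zero or more
# key=value properties. Each property is appended after a semicolon.
hive_connect_string() {
  if [ "$#" -lt 2 ]; then
    echo "usage: hive_connect_string HOST PORT [KEY=VALUE ...]" >&2
    return 2
  fi
  host=$1
  port=$2
  shift 2
  result="jdbc:hive2://${host}:${port}/"
  for prop in "$@"; do
    result="${result};${prop}"
  done
  printf '%s\n' "$result"
}

# hive_connect_string localhost 10000
#   prints jdbc:hive2://localhost:10000/
# hive_connect_string localhost 10000 user=YOUR-USERNAME password=YOUR-PASSWORD
#   prints jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
```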

Procedure

  1. Run the Pepperdata encryption script.

    /opt/pepperdata/supervisor/encrypt_password.sh

  2. At the Enter the password to encrypt: prompt, enter your connect string.

  3. Copy (or make note of) the resulting encrypted connect string.

    For example, in the following output from the script, the encrypted connect string is the string W+ONY3ZcR6QLP5sqoRqcpA=2.

    Encrypted password is W+ONY3ZcR6QLP5sqoRqcpA=2

Use this encrypted result as the value for pepperdata.jdbcfetch.hive.connect.string.encrypted (which you’ll configure later), or store it in the external file specified by the pepperdata.jdbcfetch.hive.connect.string.encrypted.file property.
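If you save the script's output to a file or capture it in a terminal log, a short filter can pull out just the encrypted string for pasting. The sketch below assumes the output line has exactly the form shown in step 3 (Encrypted password is ...); the function name extract_encrypted is hypothetical.

```shell
# Read the encryption script's output on stdin and print only the encrypted
# string, that is, the text that follows "Encrypted password is ".
# Lines that do not start with that prefix are ignored.
extract_encrypted() {
  sed -n 's/^Encrypted password is //p'
}

# printf '%s\n' 'Encrypted password is W+ONY3ZcR6QLP5sqoRqcpA=2' | extract_encrypted
#   prints W+ONY3ZcR6QLP5sqoRqcpA=2
```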

Task 3: Enable Fetching of Hive Databases and Tables’ Metadata

To enable Pepperdata to fetch data from the Hive metastore, add the required variables to the Pepperdata configuration.

Procedure

  1. On any of the hosts that are configured to be a Hive client (and from which you launch Hive queries), open the Pepperdata site file, pepperdata-site.xml, for editing.

    It’s sufficient to add the variables to a single Hive client host. But if you want to replicate the configuration on every host—perhaps to ease configuration management—that is okay, too.

    By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.

  2. Add the property to configure the hostname.

    Be sure to substitute your fully-qualified, canonical hostname for the YOUR.CANONICAL.HOSTNAME placeholder in the following code snippet.

    <property>
      <name>pepperdata.jdbcfetch.hive.pepagent.host</name>
      <value>YOUR.CANONICAL.HOSTNAME</value>
      <description>Host where the fetching should be enabled.</description>
    </property>
    
  3. Configure the connect string.

    Add one of the following properties, depending on your environment and security requirements.

    Be sure to substitute your information for the YOUR... placeholders.

    • Plain text connect string stored in the Pepperdata site file.

      At a minimum, the connect string must include the jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/ string. You can add as many connection properties/parameters as you need for your environment, separating them with a semicolon, ;.

      Example Connect Strings

      • Without properties/parameters: jdbc:hive2://localhost:10000/
      • Add properties for authenticated environments: jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
      • Multiple properties/parameters: jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>

      <property>
        <name>pepperdata.jdbcfetch.hive.connect.string</name>
        <value>jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/${;OPTIONAL-ADDITIONAL-PROPERTY}</value>
        <description>JDBC Connect string to be used.</description>
      </property>
      
    • Plain text connect string stored in an external file:

      <property>
        <name>pepperdata.jdbcfetch.hive.connect.string.file</name>
        <value>YOUR-PATH-TO-JDBCSTRING-FILE</value>
        <description>Path to file containing JDBC Connect string.</description>
      </property>
      
    • Encrypted connect string—the result from encrypting the string earlier in the configuration procedure—stored in the Pepperdata site file:

      <property>
        <name>pepperdata.jdbcfetch.hive.connect.string.encrypted</name>
        <value>YOUR-ENCRYPTED-TEXT</value>
        <description>Encrypted JDBC Connect string to be used.</description>
      </property>
      
    • Encrypted connect string—the result from encrypting the string earlier in the configuration procedure—stored in an external file:

      <property>
        <name>pepperdata.jdbcfetch.hive.connect.string.encrypted.file</name>
        <value>YOUR-PATH-TO-JDBCSTRING-FILE</value>
        <description>Path to file containing encrypted JDBC Connect string.</description>
      </property>
      
  4. (Kerberized Clusters) If the hiveserver2 service is Kerberized, add the properties for the Kerberos principal and keytab to the Pepperdata site file.

    1. Enable fetching from a Kerberized HiveServer2.

      <property>
        <name>pepperdata.jdbcfetch.hive.kerberos.enabled</name>
        <value>true</value>
        <description>Should kerberos be used when connecting to Hive?</description>
      </property>
      
    2. Configure the principal and keytab.

      If you already configured the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION environment variables during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication), you do not need to configure them again, and you should skip this substep.

      Be sure to substitute your information for the YOUR... placeholders.

      <property>
        <name>pepperdata.jdbcfetch.hive.kerberos.principal</name>
        <value>YOUR_PRINCIPAL/HOST@DOMAIN.COM</value>
        <description>The Kerberos principal to use to authenticate with the Hive client.</description>
      </property>
      
      <property>
        <name>pepperdata.jdbcfetch.hive.kerberos.keytab.location</name>
        <value>YOUR-PATH-TO-KEYTAB-FILE</value>
        <description>Path to the keytab file for the specified principal.</description>
      </property>
      
  5. Validate the XML snippets that you added.

    Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.

  6. Save your changes and close the file.

  7. Add the hive-jdbc-*standalone.jar JAR file to the PepAgent’s classpath on the host that you selected in step 1.

    1. Find the fully-qualified name of the JAR, which depends on the cluster’s distro.

      • The filename pattern is hive-jdbc-*standalone.jar.

      • The location depends on the distro; for example, in Cloudera CDH/CDP Private Cloud Base, RPM/DEB installations, the path is /usr/lib/hive/lib/

      • You can use the find command to locate all available JAR files, and output their names to the console; for example:

        find /opt/cloudera/parcels/CDH/jars/ /usr/lib/hive/lib/ /usr/lib/hive/jdbc/ -name "hive-jdbc-*standalone.jar" 2>/dev/null
        /usr/lib/hive/lib/hive-jdbc-standalone.jar
        

      Make a note of the JAR file to use. You’ll need this information in the next substep, as the value for the YOUR-HIVE-JDBC-JAR placeholder.

    2. Open the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh, for editing.

    3. Add the following variable.

      Be sure to substitute the actual path and filename for the YOUR-HIVE-JDBC-JAR placeholder.

      export PD_EXTRA_CLASSPATH_ITEMS=YOUR-HIVE-JDBC-JAR
      
    4. Save your changes and close the file.

  8. Restart the PepAgent.

    You can use either the service (if provided by your OS) or systemctl command:

    • sudo service pepagentd restart
    • sudo systemctl restart pepagentd
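The JAR lookup and classpath setting in step 7 can be combined into a short script, which is convenient if you repeat the configuration on several hosts. This is a sketch under stated assumptions: the function names find_hive_jdbc_jar and classpath_export_line are hypothetical, the directories you pass in depend on your distro, and if more than one JAR matches, the script takes the first one that find reports, which may not be the one you want.

```shell
# Print the first hive-jdbc-*standalone.jar found under the given directories.
# Returns nonzero if no directories are given or no matching file exists.
find_hive_jdbc_jar() {
  [ "$#" -gt 0 ] || return 2
  jar=$(find "$@" -name 'hive-jdbc-*standalone.jar' 2>/dev/null | head -n 1)
  [ -n "$jar" ] && printf '%s\n' "$jar"
}

# Print the line to add to pepperdata-config.sh for the given JAR path.
classpath_export_line() {
  printf 'export PD_EXTRA_CLASSPATH_ITEMS=%s\n' "$1"
}

# Example (not run here):
#   jar=$(find_hive_jdbc_jar /usr/lib/hive/lib/ /usr/lib/hive/jdbc/) &&
#     classpath_export_line "$jar"
```

Review the printed line before you add it to /etc/pepperdata/pepperdata-config.sh, and then restart the PepAgent as described in step 8.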