Adding Apache® Impala Query Metrics (RPM/DEB)

Supported Versions of Impala: Versions supported by any Cloudera CDH/CDP distribution that Pepperdata supports; see Pepperdata-Platform Support

Cluster administrators often need precise resource usage information, down to the level of individual queries, to create accurate chargeback reports. If you use Apache Impala, you can enable Pepperdata to collect Impala query metrics for CPU and memory usage. After queries finish, Pepperdata reads the Impala query profiles to calculate their resource usage.

For more information about Apache® Impala query metrics, see Monitoring Apache® Impala Query Metrics.

Pepperdata surfaces Apache® Impala query metrics for two distinct purposes: Charts and Query Spotlight.

• When you enable Impala query monitoring, the metrics appear in the Charts page.

• When you enable Impala query monitoring and Query Spotlight, the metrics appear in both the Charts page and Query Spotlight.

• If you enable Query Spotlight but do not enable Impala query monitoring, Impala metrics do not appear in Query Spotlight. Query Spotlight still monitors any other query types that you’ve configured Pepperdata to gather.

Prerequisites

Before you enable Impala query monitoring, ensure that your system meets the required prerequisites.

  • Pepperdata PepAgent (pepagentd) and PepCollector (pepcollectd) must be installed and running on all hosts on which the Impala impalad daemon is running.
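
A quick way to check this prerequisite on a single host is sketched below. It is a minimal example, not a Pepperdata tool: it assumes a systemd-based host, assumes that the service unit names match the daemon names pepagentd and pepcollectd, and assumes that the Impala daemon's process name is impalad. The function name report_pd_services is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Pepperdata services and impalad
# are running on this host.
# Assumption: the systemd unit names are pepagentd and pepcollectd.
report_pd_services() {
  for svc in pepagentd pepcollectd; do
    # is-active exits with status 0 only when the unit is active.
    if systemctl is-active --quiet "$svc" 2>/dev/null; then
      echo "$svc: running"
    else
      echo "$svc: NOT running"
    fi
  done
  # Assumption: the Impala daemon's process name is exactly "impalad".
  if pgrep -x impalad >/dev/null 2>&1; then
    echo "impalad: running"
  else
    echo "impalad: NOT running"
  fi
}

report_pd_services
```

Run the check on each host where impalad runs. Any "NOT running" line for a Pepperdata service indicates that the prerequisite is not yet met on that host.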

Procedure

If you already added Impala query metrics for Query Spotlight, you do not need to add them again, and you should skip this procedure.
  1. On any coordinator—a host on which the Impala impalad daemon is running—open the host’s Pepperdata site file, pepperdata-site.xml, for editing.

    By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file's directory is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.

  2. Add the properties to enable query monitoring and, optionally, to configure a non-default location from which to read the query profiles.

    By default, the PepAgent reads profiles of completed queries from /var/log/impalad/profiles/.

    • To use the default location, omit the pepperdata.impala.query.queryLogDir property.

    • To use a different location, add the pepperdata.impala.query.queryLogDir property, and be sure to substitute your location for the your-impalad-profiles-location placeholder.

    <property>
      <name>pepperdata.impala.query.monitoring.enabled</name>
      <value>true</value>
    </property>
    
    <property>
      <name>pepperdata.impala.query.queryLogDir</name>
      <value>your-impalad-profiles-location</value>
    </property>
    
  3. (HTTPS impalad daemon endpoints) If your impalad daemon is configured for HTTPS instead of HTTP, add the pepperdata.agent.genericJsonFetch.impala.httpsEnabled property so that the fetcher for information about Impala queries in flight uses the HTTPS endpoint instead of the default HTTP endpoint (http://LOCALHOST:25000/queries?json).

    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
      <value>true</value>
    </property>
    
  4. (Digest authentication for the Impala Web UI for debugging) If the Impala Web UI for debugging on your impalad daemon is secured by digest authentication, add the authentication credentials.

    Be sure to substitute your username and password for the your-username and your-password placeholders in the following code snippet.

    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.http.authentication.type</name>
      <value>digest</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.auth.username</name>
      <value>your-username</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.auth.password</name>
      <value>your-password</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
      <value>true</value>
    </property>
    
    Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
  5. (Kerberos for the Impala Web UI for debugging) If the Impala Web UI for debugging on your impalad daemon is Kerberized, add the authentication credentials.

    Be sure to substitute your Kerberos principal and the path of the corresponding keytab file for the your-kerberos-principal and your-kerberos-keytab-pathname placeholders in the following code snippet.

    If you already configured the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION environment variables during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication), you do not need to add the principal and keytab properties, except to override the cluster-level assignments. In that case, the fetcher properties (pepperdata.agent.genericJsonFetch.impala.kerberos.principal and pepperdata.agent.genericJsonFetch.impala.keytab.location) are inherited from the values that were automatically assigned when you installed Pepperdata in the cluster.
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.http.authentication.type</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.kerberos.principal</name>
      <value>your-kerberos-principal</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.keytab.location</name>
      <value>your-kerberos-keytab-pathname</value>
    </property>
    <property>
      <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
      <value>true</value>
    </property>
    
  6. Save your changes and close the file.

  7. Restart the PepAgent.

    You can use either the service (if provided by your OS) or systemctl command:

    • sudo service pepagentd restart
    • sudo systemctl restart pepagentd

    If any of the process’s startup checks fail, an explanatory message appears and the process does not start. Address the issues, and then try starting the process again.

  8. Repeat steps 1–7 on every coordinator host in your cluster.

  9. Contact Pepperdata Support to request that Impala query metrics be activated for your Pepperdata dashboard.
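
After editing the site file on a host, a short check can confirm that the file is still well-formed and that the monitoring property is set. The following is a minimal sketch, not a Pepperdata tool. It assumes that PD_CONF_DIR, when set, names the directory that contains pepperdata-site.xml, and that xmllint (from libxml2) is installed with XPath support. The function names pd_site_file and pd_property are hypothetical helpers introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the Pepperdata site file.
# Assumption: PD_CONF_DIR, when set, is the directory that holds the file;
# otherwise the default /etc/pepperdata is used.
pd_site_file() {
  printf '%s/pepperdata-site.xml\n' "${PD_CONF_DIR:-/etc/pepperdata}"
}

# Hypothetical helper: print the <value> of the property named $2 in file $1.
# Prints an empty string if the property is absent.
pd_property() {
  xmllint --xpath "string(//property[name='$2']/value)" "$1"
}

site_file=$(pd_site_file)
if [ -f "$site_file" ]; then
  # --noout suppresses the parsed document, so only parse errors are printed.
  if xmllint --noout "$site_file"; then
    echo "well-formed: $site_file"
    echo "monitoring enabled: $(pd_property "$site_file" pepperdata.impala.query.monitoring.enabled)"
  fi
else
  echo "site file not found: $site_file" >&2
fi
```

If the site file is readable only by a privileged user, run the check with sufficient permissions. An empty value after "monitoring enabled:" suggests that the property from step 2 was not added to this host's file.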