Monitoring Apache® Impala Query Metrics (YARN)

Supported Versions of Impala: Versions supported by any Cloudera CDH/CDP distribution that Pepperdata supports; see Pepperdata-Platform Support

Monitoring of Apache® Impala Query Metrics is applicable only for YARN clusters.

Cluster administrators often need precise resource usage information, even down to the query level of detail, in order to create accurate chargeback reports. If you’re using Apache Impala, you can enable Pepperdata to collect Impala query metrics for CPU and memory usage. When the queries are finished, Pepperdata reads the Impala query profiles to calculate the resource usage.

Typical Use Cases

The Pepperdata dashboard shows charts and tables of data about your Impala queries. You can:

  • Determine which Impala queries are resource hogs.

  • Learn how a query impacts the system by filtering the metrics per query, database, connection user, fragment, and so on.

  • Determine which database or connection user has the biggest impact on the system as a whole over time by monitoring the Chargeback metrics (aggregate CPU and memory usage numbers per database or connection user).
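
The chargeback idea in the last bullet can be sketched in a few lines of Python. This is only an illustration of per-database and per-user aggregation, not Pepperdata's implementation: the record layout and the field names (`db`, `conn_user`, `cpu_secs`, `mem_gb_hours`) are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical per-query usage records; the field names are illustrative
# and do not reflect any Pepperdata export format.
queries = [
    {"query_id": "q1", "db": "sales", "conn_user": "hue", "cpu_secs": 120.0, "mem_gb_hours": 1.5},
    {"query_id": "q2", "db": "sales", "conn_user": "etl", "cpu_secs": 30.0, "mem_gb_hours": 0.5},
    {"query_id": "q3", "db": "logs", "conn_user": "hue", "cpu_secs": 60.0, "mem_gb_hours": 2.0},
]


def aggregate_usage(records, key):
    """Sum CPU and memory usage per distinct value of `key` (for example "db")."""
    totals = defaultdict(lambda: {"cpu_secs": 0.0, "mem_gb_hours": 0.0})
    for rec in records:
        group = totals[rec[key]]
        group["cpu_secs"] += rec["cpu_secs"]
        group["mem_gb_hours"] += rec["mem_gb_hours"]
    return dict(totals)


def rank_by(totals, field):
    """Return group names ordered from the largest to the smallest total of `field`."""
    return sorted(totals, key=lambda name: totals[name][field], reverse=True)


by_db = aggregate_usage(queries, "db")
by_user = aggregate_usage(queries, "conn_user")
```

Calling `rank_by(by_db, "cpu_secs")` or `rank_by(by_user, "mem_gb_hours")` then orders databases or connection users by their aggregate usage, which is the kind of ranking a chargeback report needs.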

Metrics for Impala Queries In Flight

Pepperdata collects metrics for Impala queries in flight, including each query’s state (CREATED, INITIALIZED, COMPILED, RUNNING, FINISHED, or EXCEPTION). In-flight queries are the currently running queries as reported on the queries page of the Impala impalad daemon’s debug web UI at http://impalaserverhostname:25000/queries. These metrics enable you to create alarms and alerts for conditions such as a query that has been in the RUNNING state for more than a given amount of time, or too many queries in the EXCEPTION state during the last 10 minutes.

The following table describes the metrics for queries in flight.

Metrics for Impala Queries In Flight
Metric Name Description
impala.in_flight_queries.duration-secs Current duration: difference between now and when the query began.
impala.in_flight_queries.progress-percent Progress through the Impala SQL statement. For SQL queries, this represents how many of the target rows of the table(s) have already been processed.
impala.in_flight_queries.rows-fetched Number of rows in the query result set.
impala.in_flight_queries.state Enumeration of possible states of a query:
  • CREATED = 1
  • INITIALIZED = 2
  • COMPILED = 3
  • RUNNING = 4
  • FINISHED = 5
  • EXCEPTION = 6
impala.in_flight_queries.waiting User-controlled flag that indicates that a query's execution is finished and that the query is waiting for manual inspection and resource cleanup.
impala.in_flight_queries.waiting-time-secs Length of time that the query's impala.in_flight_queries.waiting flag has been true.
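
The alarm conditions mentioned above (a query in the RUNNING state for too long, too many queries in the EXCEPTION state, or a query left waiting) can be sketched as follows. The state numbers come from the table. The sample record layout, the field names, and the thresholds are assumptions for illustration only, and the samples are assumed to be already limited to the time window of interest (for example, the last 10 minutes).

```python
from enum import IntEnum


class QueryState(IntEnum):
    """Numeric values of impala.in_flight_queries.state, as listed in the table."""
    CREATED = 1
    INITIALIZED = 2
    COMPILED = 3
    RUNNING = 4
    FINISHED = 5
    EXCEPTION = 6


# Hypothetical metric samples, one per in-flight query; field names are illustrative.
samples = [
    {"query": "q1", "state": 4, "duration_secs": 5400},
    {"query": "q2", "state": 4, "duration_secs": 120},
    {"query": "q3", "state": 6, "duration_secs": 15},
    {"query": "q4", "state": 6, "duration_secs": 8},
    {"query": "q5", "state": 5, "duration_secs": 300, "waiting": True, "waiting_time_secs": 900},
]


def long_running(samples, max_secs):
    """Queries that are RUNNING and have exceeded max_secs of duration."""
    return [
        s["query"]
        for s in samples
        if QueryState(s["state"]) is QueryState.RUNNING and s["duration_secs"] > max_secs
    ]


def too_many_exceptions(samples, max_count):
    """True if more than max_count queries are in the EXCEPTION state."""
    count = sum(1 for s in samples if QueryState(s["state"]) is QueryState.EXCEPTION)
    return count > max_count


def stuck_waiting(samples, max_secs):
    """Queries whose waiting flag has been true for longer than max_secs."""
    return [
        s["query"]
        for s in samples
        if s.get("waiting", False) and s.get("waiting_time_secs", 0) > max_secs
    ]
```

For example, `long_running(samples, 3600)` flags queries that have been running for more than an hour, and `too_many_exceptions(samples, 1)` reports whether more than one query in the window ended up in the EXCEPTION state.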

Related Information

  • For information about creating alarms from the applicable metrics’ charts, see Create Alarms From a Chart View.

  • For information about the Impala impalad daemon’s debug web UI, refer to Queries Page or the comparable page for your version of Impala.

  • For one approach to using the impala.in_flight_queries.waiting and impala.in_flight_queries.waiting-time-secs metrics, refer to the impala-user mailing list archives message, queries “waiting to be closed”.

Show Chart View of Impala Query Metrics Data

To display a group of Impala query metrics, navigate to the dashboard’s Charts page, and use the Metrics filter bar to search for “Impala”. To show a single metric, select it. To show all the metrics in a group, select the All… checkbox for the group. After you show the metrics, you can proceed as usual to optionally select breakdowns and apply filters.

Procedure

  1. In the left-nav menu, select Charts.

  2. Choose the metric(s) that you want.

    1. In the filter bar, click Metrics.

    2. In the search box, clear any previously selected metrics, and enter the search term, “impala”.

    3. Select the metric(s) that you want to see.

  3. Select the Impala container type.

    1. In the filter bar, click Breakdown By.

    2. Select the Container Type series breakdown, and filter it for Impala by clicking the drop-down list and clearing all the container types except Impala.

  4. (Optional) Select additional breakdowns and apply additional filters.

  5. Click Apply.

Show Tabular View of Impala Query Metrics Data

To show tabular data of Impala metrics that are grouped by Impala database or Impala query, first show the charts, and then switch to the table view. To highlight issues, such as queries that consumed a lot of resources, sort the tables by memory usage or CPU runtime. You can also sort the tables by database or query, which lets you focus on specific databases or queries of interest.

Procedure

  1. Show the charts of Impala query metrics, and filter as you want; see Show Chart View of Impala Query Metrics Data.

  2. In the upper-right, click View as Table.

Filters and Breakdowns for Impala Query Charts

As with other metrics charts that you view on the Pepperdata dashboard, you can filter Impala query charts for specific series of interest, such as host or user; exclude a series you’re not interested in; or explicitly include a series that is filtered out by default. Likewise, you can filter Impala query charts by Impala-specific criteria, such as query, query state, Impala database on which the query was run, and so on. You can specify regular text for exact matching or use regular expressions to match patterns.
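
The difference between exact matching and regular-expression matching can be illustrated with a short Python sketch. It uses Python's `re` module and whole-string matching; the dashboard's own regular expression dialect and anchoring rules may differ, and the database names are made up for the example.

```python
import re

# Hypothetical Impala database names used only to demonstrate the two filter styles.
databases = ["sales", "sales_eu", "sales_us", "logs"]


def exact_match(values, text):
    """Keep only the values that equal the filter text exactly."""
    return [v for v in values if v == text]


def regex_match(values, pattern):
    """Keep the values that the pattern matches in full (Python `re` syntax)."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v)]
```

With these helpers, `exact_match(databases, "sales")` keeps only the `sales` database, whereas `regex_match(databases, r"sales_.*")` keeps every regional variant.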

For detailed instructions on applying filters, see Filter the Charts & Tables by Dimensions: Hosts, Users, Etc.

The table describes the Impala-specific series breakdowns.

Breakdown Description
Query State Final state of the query (when it finishes): FINISHED, UNKNOWN, or EXCEPTION
Impala DB Database on which the query was run
Impala ConnUser Connected user; if the query is run from an external client (for example, Apache Hue), the connected user could be different from the user who submitted the query
Impala Query ID of the query
Impala Fragment ID of the fragment, which is a smaller unit of work that is distributed across the cluster
Impala Instance ID of the subtask of a fragment