Application Spotlight Overviews & Reports

Application Spotlight shows an application-centric view of everything that’s happening in the cluster, for the selected range and filters.

Procedure

  • To show an Application Spotlight overview or report, use the “left-nav” menu, expand the App Spotlight sub-menu, and click the overview you want.

    Screenshot of App Spotlight left-nav menu

    The selected overview page appears.

Applications Overview

The Applications Overview highlights which apps are the biggest contributors to excess resource usage and poor cluster performance.

Highlight Tiles

The tiles in the top part of the page vary by environment, as shown in the table.

Tile YARN Kubernetes
Top CPU-Wasting Apps
Top CPU-Cost Apps
Top Memory-Wasting Apps
Top Memory-Cost Apps
Top Small-Files Apps
Top CPU-Requesting Apps
Top Memory-Requesting Apps
Top GPU-Allocating Apps
Top GPU-Wasting Apps

To understand a tile and what its data represents, mouse over the tooltip icon (info) in the upper-right corner of the tile. For example, the tooltip for the Top CPU-Wasting Apps (YARN) tile explains the chart at the top of the tile and which apps are in the Top 5 list.

Screenshot of the tooltip for the Top CPU-Wasting Apps tile

To show a table of all apps (not only the worst offenders), with the same breakdowns as a tile, click the tile’s title.

Matched Apps Table

Below the tiles on the overview page, a table shows apps that match the selected range and filters. Much of the data is the same for YARN and Kubernetes apps (such as basic app statistics and Pepperdata recommendations), but there are a few differences: for example, peak resource usage appears for YARN, and namespace and GPU information appears for Kubernetes. To find problem apps, look for the color-highlighted outlier values, or sort by the columns for areas of concern (such as duration and peak memory).

If you apply a long time range filter, it’s common for the table header text to say “Showing Top 1000 Items (XXXX Total) By Peak Memory” instead of “XXX Items Found”. This happens because Pepperdata limits the query results to 1,000 items to improve response time and save memory.

If there are more than 1,000 matches, the 1,000 with the largest maximum Peak Memory values are shown. The XXXX Total represents the number of applications that reported Peak Memory values during the applied time range.
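
The truncation rule above can be illustrated with a minimal Python sketch. It is not Pepperdata's implementation; the `App` type, the `matched_apps` function, and the field names are hypothetical and exist only to show how a 1,000-item cap ordered by Peak Memory produces the two header texts.

```python
from typing import List, NamedTuple, Tuple


class App(NamedTuple):
    """Hypothetical record for one matched app (names are illustrative)."""
    app_id: str
    peak_memory_bytes: int


RESULT_LIMIT = 1000  # the query-result cap described in the text


def matched_apps(apps: List[App], limit: int = RESULT_LIMIT) -> Tuple[List[App], str]:
    """Return the rows to display and the table header text."""
    total = len(apps)
    if total <= limit:
        # Everything fits, so the plain "Items Found" header applies.
        return list(apps), f"{total} Items Found"
    # Too many matches: keep only the apps with the largest Peak Memory.
    top = sorted(apps, key=lambda a: a.peak_memory_bytes, reverse=True)[:limit]
    return top, f"Showing Top {limit} Items ({total} Total) By Peak Memory"
```

For example, 1,500 matching apps would yield 1,000 rows and a header that reports 1500 as the total, which is why narrowing the time range or adding filters is the way to see apps that fall below the cutoff.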

(Kubernetes) A Spark app’s status encompasses both its behavior (state) and its final status. This two-part approach is needed because sometimes when an app finishes, its Pod terminates before Pepperdata can collect the Pod status phase metric.

  • State: Either “Running” or “Finished”.

  • Status: If the Spark driver Pod terminated before Pepperdata collected the status phase metric, the app’s status appears as “—”; otherwise the status is either “Succeeded” or “Failed”.
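
The two-part status can be sketched in Python as follows. This is only an illustration of the rules in the list above, not Pepperdata's code; the function name and parameters are hypothetical, and returning the placeholder for an app that is still running is an assumption the text does not confirm.

```python
from typing import Optional, Tuple

# Placeholder shown when the driver Pod's status phase was never collected.
# Written as an escape sequence; it renders as a long dash.
UNKNOWN_STATUS = "\u2014"


def spark_app_status(is_running: bool, driver_pod_phase: Optional[str]) -> Tuple[str, str]:
    """Return (state, status) for a Spark app on Kubernetes.

    driver_pod_phase is the collected status phase of the driver Pod,
    or None if the Pod terminated before the metric was collected.
    """
    state = "Running" if is_running else "Finished"
    if driver_pod_phase in ("Succeeded", "Failed"):
        status = driver_pod_phase
    else:
        # Assumption: any other or missing phase is shown as the placeholder.
        status = UNKNOWN_STATUS
    return state, status
```

A finished app whose driver Pod phase was missed would therefore appear with the state “Finished” and the placeholder status, rather than being reported as failed.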

Workflows Overview

The Workflows Overview table shows cluster resource consumption (metrics that vary by environment) broken down by workflow. A workflow is a group of applications that serve a single purpose and share a single workflow Id.

  • In YARN clusters, the Oozie and Hive workflow schedulers automatically assign their own workflow Ids. However, to enable Pepperdata workflow-related functionality (chargeback reporting, series breakdowns in charts and tables, and grouping data in the Workflows Overview), you must manually configure a Pepperdata workflow Id; see Pepperdata Workflow Id: YARN Clusters (or the comparable page for a Supervisor version other than the latest).

    The Workflows Overview data includes CPU, memory, disk I/O (local and HDFS), and shuffle data.

  • In Kubernetes clusters, workflows launched by Apache Airflow are supported. To configure workflow Ids for Pepperdata monitoring, see Pepperdata Workflow Id: Kubernetes Clusters (or the comparable page for a Supervisor version other than the latest).

    The Workflows Overview data includes the workflow DAG (Directed Acyclic Graph), CPU, and memory.

Application Profiler Report

The Pepperdata Application Profiler uses heuristics—rules and triggering/firing thresholds against which Pepperdata compares the actual metrics values for your applications—to generate recommendations. The Application Profiler report shows every instance of a triggered/fired heuristic; each instance is identified as an incident. Incidents are assigned a severity—critical, severe, moderate, or low—depending on the difference between your application’s metrics values and the heuristic’s firing threshold.

There is not a 1:1 correspondence between heuristics and recommendations. For example, a single heuristic might have a low and a high threshold, from which Pepperdata can provide distinct recommendations such as “Too long average task runtime” and “Too short average task runtime”.
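
As a rough illustration of how one heuristic with a low and a high threshold can yield distinct recommendations at different severities, consider the following Python sketch. The threshold values, function names, and the `Incident` type are hypothetical; the actual heuristics and firing thresholds are the ones shown in the report's full explanation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Severity names from the report, in ascending order.
SEVERITIES = ("low", "moderate", "severe", "critical")

# Hypothetical firing thresholds in seconds, one per severity level.
LONG_THRESHOLDS = (600, 1200, 2400, 4800)  # fires when the value is at or above
SHORT_THRESHOLDS = (10, 5, 2, 1)           # fires when the value is at or below


@dataclass
class Incident:
    recommendation: str
    severity: str


def severity_for(value: float, thresholds: Sequence[float]) -> Optional[str]:
    """Return the highest severity whose threshold the value reaches, if any."""
    fired = None
    for name, limit in zip(SEVERITIES, thresholds):
        if value >= limit:
            fired = name
    return fired


def check_task_runtime(avg_runtime_s: float) -> Optional[Incident]:
    """One heuristic, two possible recommendations."""
    sev = severity_for(avg_runtime_s, LONG_THRESHOLDS)
    if sev:
        return Incident("Too long average task runtime", sev)
    # For the low side, smaller is worse, so compare negated values.
    sev = severity_for(-avg_runtime_s, [-t for t in SHORT_THRESHOLDS])
    if sev:
        return Incident("Too short average task runtime", sev)
    return None  # within the normal range: no incident
```

Under these made-up thresholds, an average task runtime of 1,500 seconds fires a moderate “Too long” incident, 0.5 seconds fires a critical “Too short” incident, and 60 seconds fires nothing.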

Screenshot of Application Profiler report with callouts of its features

Filter bars for customizing the time range, filtering by application type (or showing all types), and grouping by user or queue.
Heuristics that were triggered/fired for your application. To expand/collapse the details of a heuristic, click anywhere in its row.
An expanded heuristic.
  • The number of incidents at each severity is shown.
  • To see the details of the heuristic's associated metrics and the firing thresholds for each severity level, click See full explanation....
  • The Items Found table shows which group-by element—user or queue—was responsible for the incident, as well as other incident details.

Application Status Report

The Application Status Report is applicable only for YARN clusters.

The Application Status Report shows how many applications ran during the selected time range, with counts of how many apps failed, were still running, succeeded, or were killed. The running count reflects the moment when you selected the report; the display is static, not dynamically updated. Use this report to understand trends in application status over time; for example, whether application failures generally cluster around certain times or days.

Screenshot of the Application Status Report with callouts of its features

To select more or fewer applications, use the page-level time range filter.
Toggle the chart display between absolute counts (stacked) and percentages.
Export the chart as a PNG image, or send an email message (from "reports@pepperdata.com") containing a screenshot of the chart, to up to 50 people.
Bar charts of the number or percentage of total apps for each app status: failed, running, succeeded, and killed. To show a popup with time and count/percentage detail numbers, mouse over the bars.
Chart legend.
  • To highlight the chart bars for a given status, mouse over the status label in the legend.
  • To toggle the chart bars off/on for a given status, click the status label in the legend.
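
The relationship between the report's two chart modes (absolute counts and percentages) can be sketched in Python. This is an illustrative example only; the input shape, the bucket labels, and the function names are assumptions, not part of the product.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# The four app statuses charted by the report.
STATUSES = ("failed", "running", "succeeded", "killed")


def status_counts(apps: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Group (time_bucket, status) pairs into per-bucket status counts."""
    buckets: Dict[str, Counter] = {}
    for bucket, status in apps:
        buckets.setdefault(bucket, Counter())[status] += 1
    return buckets


def as_percentages(counts: Counter) -> Dict[str, float]:
    """Convert one bucket's absolute counts to percentages of its total."""
    total = sum(counts.values())
    return {s: (100.0 * counts[s] / total if total else 0.0) for s in STATUSES}
```

With four apps in one bucket (one failed, two succeeded, one killed), the stacked view would show a bar of height 4, and the percentage view would show 25%, 50%, and 25% for the same bar.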

Reference: Elements of an Overview Page

Overview pages share a common format for displaying information, and provide similar navigation controls.

Screenshot of a Spotlight overview page, with callouts for navigation elements

Title of the overview page. The title matches the name of the Spotlight menu item that is currently selected in the left-nav menu; for example, selecting App Spotlight > Applications shows the Applications Overview.
Hide/show filter bars and, for pages with highlight tiles, hide/show highlight tiles.
Filter Bars; for details, see Filter Bars.
Highlight tiles (not in all overviews), filtered per the current filter bar settings.
Table of items that meet the applied filters' criteria; for details about table elements and filtering, see Tables.