If you’re just starting to troubleshoot because things seem unusually slow, it may be more useful to identify resource bottlenecks or find problem hosts before you look for problem users. But if you already know, for example, that a YARN cluster is running out of CPU or memory, you can use the Users table to identify which users have a high CPU usage rate.
The Users Overview highlights which users consume the most resources by showing cluster resource consumption broken down by user: used and requested metrics for CPU and memory and, for YARN clusters, disk I/O (local and HDFS) and shuffle data.
When viewing the data, be sure to note the units, which often differ between YARN and Kubernetes clusters.
For example, in YARN, user CPU is expressed as the percentage of the monitored components' CPU time that was consumed by the given user.
But in Kubernetes, CPU used and CPU requests are expressed in Kubernetes CPU units: normalized, aggregated values representing the portion of CPU requested across all cores. For example:
0.1 means one-tenth of a single CPU core,
1 means 100% of a single core, and
10 means the equivalent of 10 CPU cores.
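To make the unit difference concrete, here is a minimal Python sketch for comparing raw numbers outside the UI. The function names are hypothetical and not part of any Pepperdata API. The Kubernetes parser handles only plain decimal values and the standard millicpu suffix (for example, `250m` is 0.25 CPU), and the YARN helper simply restates the percentage definition given for YARN user CPU, using made-up CPU-time totals.

```python
def parse_k8s_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m", "0.5", or "2" to CPU units (cores).

    Only plain decimals and the "m" (millicpu) suffix are handled in this sketch.
    """
    q = quantity.strip()
    if q.endswith("m"):
        return float(q[:-1]) / 1000  # 1000m == 1 CPU unit == one core
    return float(q)


def cpu_units_as_percent_of_one_core(units: float) -> float:
    """Express CPU units relative to one core: 0.1 is 10%, 1 is 100%, 10 is ten cores (1000%)."""
    return units * 100


def yarn_user_cpu_percent(user_cpu_time: float, monitored_cpu_time: float) -> float:
    """Share of the monitored components' CPU time consumed by one user, as a percentage.

    Both arguments are hypothetical inputs and must use the same time unit.
    """
    if monitored_cpu_time <= 0:
        raise ValueError("monitored_cpu_time must be positive")
    return 100 * user_cpu_time / monitored_cpu_time


if __name__ == "__main__":
    for q in ("100m", "1", "10"):
        units = parse_k8s_cpu(q)
        print(f"{q!r}: {units} CPU units = {cpu_units_as_percent_of_one_core(units):.0f}% of one core")
    # A hypothetical user who consumed 30 of 120 monitored CPU-seconds:
    print(f"YARN user CPU: {yarn_user_cpu_percent(30, 120):.1f}%")
```

The point of the comparison: a Kubernetes value of 2 means two full cores, while a YARN value of 2 means 2% of the monitored CPU time, so the same number can signal very different loads.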
(Kubernetes) In Kubernetes clusters, you can assign or change the label that Pepperdata uses as an abstraction for attributes such as an application’s user; see Configure Labels (or the comparable page for a Supervisor version other than the latest).
From the “left-nav” menu, select Platform Spotlight > Users.
Click each sub-column heading in turn to sort by that metric and see which users have the greatest resource usage.
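Clicking each sub-column heading in turn amounts to ranking the users once per metric. If you have per-user values outside the UI (for example, numbers copied from the table), the same check can be sketched in Python. The table layout and metric names below are hypothetical, not a Pepperdata export format.

```python
def top_users_by_column(table: dict[str, dict[str, float]], n: int = 3) -> dict[str, list[str]]:
    """For each metric (sub-column), list the n users with the highest values, highest first.

    `table` maps username -> {metric name -> value}; metric names are hypothetical.
    """
    metrics = sorted({m for row in table.values() for m in row})
    top = {}
    for metric in metrics:
        # Only rank users that actually report this metric.
        users = [u for u in table if metric in table[u]]
        users.sort(key=lambda u: table[u][metric], reverse=True)
        top[metric] = users[:n]
    return top


if __name__ == "__main__":
    table = {
        "alice": {"cpu_used": 0.5, "memory_used_gb": 8},
        "bob": {"cpu_used": 2.0, "memory_used_gb": 4},
        "carol": {"cpu_used": 1.0, "memory_used_gb": 16},
    }
    for metric, users in top_users_by_column(table, n=2).items():
        print(metric, users)
```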
Color highlights on outlier values indicate the distance, in standard deviations, between the value and the column’s average value. The darker (closer to red) the color, the greater the difference between the value and the column’s average.
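The outlier measure is a standard score: (value − column average) / standard deviation. A minimal Python sketch follows. It assumes the population standard deviation; the exact statistic and the color thresholds Pepperdata uses are not documented here, so `highlight_level` and its cutoffs of 1, 2, and 3 standard deviations are purely hypothetical.

```python
from statistics import mean, pstdev


def std_devs_from_mean(column: dict[str, float]) -> dict[str, float]:
    """For each user, how many standard deviations their value lies from the column average.

    Uses the population standard deviation (an assumption).
    """
    values = list(column.values())
    avg = mean(values)
    sd = pstdev(values)
    if sd == 0:  # every user has the same value, so nothing is an outlier
        return {user: 0.0 for user in column}
    return {user: (value - avg) / sd for user, value in column.items()}


def highlight_level(z: float, thresholds: tuple[float, ...] = (1.0, 2.0, 3.0)) -> int:
    """Hypothetical color bucket: 0 = no highlight, higher = darker (closer to red)."""
    return sum(abs(z) >= t for t in thresholds)


if __name__ == "__main__":
    # Made-up per-user values for one sub-column (for example, memory used in GB).
    memory_used = {"u1": 2, "u2": 4, "u3": 4, "u4": 4, "u5": 5, "u6": 5, "u7": 7, "u8": 9}
    scores = std_devs_from_mean(memory_used)
    for user, z in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
        print(f"{user}: {z:+.2f} std devs, highlight level {highlight_level(z)}")
```

Because the distance is taken as an absolute value, unusually low values would also be flagged in this sketch; whether the UI highlights low outliers is not stated in the text.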
For each high-usage user in turn, click the View as Chart control next to the username.
The resulting Charts page shows the same metrics as the Users Overview table, broken down and filtered by the selected user (YARN clusters) or by Pod and the selected user (Kubernetes clusters).
With this information, you can not only identify the problem users but also give them detailed metrics so that they can make changes to reduce their system load.