The Queues Overview shows which queues currently have the best (shortest wait) and worst (longest wait) times to run apps. For YARN clusters, highlight tiles show the number of running and backlogged (pending) containers (tasks) for the largest and most backlogged queues, and the current and requested memory by the top five apps. Tabular data shows metrics that are applicable for the type of cluster.
In YARN, these are queue scheduler metrics such as queue duration, active tasks, and resource usage.
In Kubernetes, these are usage metrics for the associated apps, such as CPU, memory, and GPU used and allocated/requested.
Queues are fundamental to the operation of multi-tenant clusters, and define how resources are apportioned among users.
YARN schedules applications to run on the cluster based on the resource allocation definitions for the queues.
Kubernetes queues function more as virtual queue groups, which can be selected as breakdowns in many dashboard Overview pages and the Chargeback Report. (For information about configuring queues in Kubernetes, see Configure Labels (or the comparable page for a Supervisor version other than the latest).)
From the “left-nav” menu, select Platform Spotlight > Queues.
Click each sub-column heading in turn to sort its data to show which queues have the greatest resource usage.
Color highlights for outlier values indicate the distance (in number of standard deviations) from the column’s average value. The darker (closer to red) the color, the greater the difference between the value and the column’s average value.
Click the View as Chart control next to the name of each high-usage queue, in turn.
The Charts page appears, with metrics dependent on the environment:
(YARN clusters) The same metrics as the original Queues Overview table (the basic metrics group), plus job current queue duration seconds and queue info absolute used capacity, all broken down and filtered by the selected queue.
(Kubernetes clusters) The same metrics as the original Queues Overview table: selected metrics from the App Basic Metrics, App State Metrics, and App GPU Metrics groups.
With this information, you can not only identify the problem queues, but also provide detailed metrics about them so that admins can make changes to reduce queue wait times and loads.
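The outlier color highlighting described earlier measures how far a value is from its column's average, in standard deviations. The following is a minimal sketch of that idea, not the product's implementation. The function names, the sample queue names and values, the use of the population standard deviation, and the 1/2/3 thresholds for highlight levels are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def outlier_distances(values):
    """Return each value's distance from the column mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Every value equals the mean, so nothing is an outlier.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def highlight_level(distance, thresholds=(1.0, 2.0, 3.0)):
    """Map a distance to a hypothetical highlight level (0 = none, 3 = darkest)."""
    return sum(abs(distance) >= t for t in thresholds)


# Hypothetical column of pending containers per queue.
pending_containers = {
    "root.default": 4,
    "root.etl": 6,
    "root.adhoc": 5,
    "root.ml": 45,
}

distances = outlier_distances(list(pending_containers.values()))
levels = {
    queue: highlight_level(d)
    for queue, d in zip(pending_containers, distances)
}

for queue, level in levels.items():
    print(f"{queue}: highlight level {level}")
```

In this sample column, only the queue whose value sits far from the average receives a nonzero level, which mirrors how a darker cell draws attention to the queue worth charting first.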