Reports

This section describes the reports that Pepperdata provides, and explains how to configure and interpret the Chargeback and Capacity Planning reports.

  • Reports: Administrative
    Administrative reports help you with cluster administration tasks by providing details about host upload status and daily usage patterns.

  • Reports: Application Spotlight
    Application Spotlight reports help you to understand who is using a cluster, why, and how efficiently they’re using the cluster resources.

  • Reports: Platform Spotlight
    Platform Spotlight reports are system summaries that enable you to make informed decisions (for example, “do I need more hardware for a cluster?”).

  • Configuring Usage/Chargeback Costs
    Accurate Chargeback reports and cost-related dashboard tiles rely on accurately configured usage costs for physical memory (GB-hour), CPU (core-hour), and, when applicable, GPU (GPU-hour). Before using the cost data from a Chargeback report or cost-related tile, ensure that the costs have been changed from their default values so that they are accurate for your deployment. Changed cost values remain in effect for all users on the cluster until they are changed again.

  • Assigning Workflow Ids for Grouping and Chargeback Reporting
    A workflow Id identifies all the applications/jobs that function together for a single purpose. Grouping or filtering metrics by workflow Id enables chargeback reporting, filtering charts, viewing resource consumption by workflow, and associating Apache® Impala queries with given workflows. Some workflow schedulers automatically assign workflow Ids, but for other schedulers, you must manually configure the pepperdata.workflow.id key to enable workflow Id-related functionality.

  • Interpreting the Capacity Planning Report
    The Capacity Planning report shows CPU, memory, I/O (disk bandwidth), and disk (storage) usage trends over time across your cluster. You can use this information to plan future capacity and to spot possible scheduling problems.
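
To illustrate why the configured usage costs matter for Chargeback reporting, the following is a minimal sketch of how per-unit rates might combine with resource usage grouped by workflow Id. It is not Pepperdata's implementation: the rate values, the record field names, the linear cost formula, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical per-unit rates in an arbitrary currency. These are NOT
# Pepperdata defaults; substitute the values configured for your deployment.
RATES = {
    "memory_gb_hours": 0.01,  # cost per GB-hour of physical memory
    "cpu_core_hours": 0.05,   # cost per core-hour of CPU
    "gpu_hours": 0.50,        # cost per GPU-hour, when applicable
}


def usage_cost(usage, rates=RATES):
    """Return the sum of usage * rate; a resource missing from usage counts as zero."""
    return sum(usage.get(resource, 0.0) * rate for resource, rate in rates.items())


def cost_by_workflow(records, rates=RATES):
    """Total the cost of all usage records that share a workflow Id."""
    totals = defaultdict(float)
    for record in records:
        totals[record["workflow_id"]] += usage_cost(record, rates)
    return dict(totals)


# Hypothetical usage records; the field names are illustrative only.
records = [
    {"workflow_id": "nightly-etl", "memory_gb_hours": 200.0, "cpu_core_hours": 40.0},
    {"workflow_id": "nightly-etl", "memory_gb_hours": 100.0, "cpu_core_hours": 20.0},
    {"workflow_id": "model-training", "memory_gb_hours": 50.0,
     "cpu_core_hours": 10.0, "gpu_hours": 4.0},
]

if __name__ == "__main__":
    for workflow_id, cost in sorted(cost_by_workflow(records).items()):
        print(f"{workflow_id}: {cost:.2f}")
```

Because each total scales directly with the rates in this sketch, leaving the rates at placeholder values would skew every workflow's cost by the same proportion, which is why the costs should be set for your deployment before the numbers are used.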