What’s New

This What’s New document describes recently added Pepperdata features as seen from the dashboard perspective, significant documentation updates, major changes to existing functionality, and important changes to your workflow such as revised menus and navigation in the Pepperdata dashboard.

For Supervisor release-specific details—enhancements and bug fixes—see the Pepperdata Release Notes.

October 28, 2022

  • General

Adds a global cluster filter to the Pepperdata dashboard. The filter is sticky across all the pages in EMR and Dataproc realms – you don’t need to set it each time you change pages. This makes it easy to view the dashboard for a specific cluster.

Highlights of this feature:

  • Renames the existing Cluster filter to Realm.
  • Adds a new Cluster filter to the top navigation to enable filtering on the cluster name or cluster ID.
  • Cluster filter is sticky across all pages.
  • Adds a Clusters Overview page to Platform Spotlight.
  • Adds a date range filter to the dashboard landing page.

The following pages do not support the new cluster filtering: Query Spotlight, Application Profiler, Application Status, Alarms, and a few reports.

  • Platform Spotlight

Fixes a bug occurring when you create alarms in encrypted realms.

  • Supervisor

Supervisor 7.1.12 provides the following (see Pepperdata Release Notes for details):

  • Calculates Pepperdata resources more accurately when Capacity Optimizer is enabled on Kubernetes clusters.

  • Fixes bugs concerning license expiration, Helm support, and mutating pods.

September 30, 2022

  • Application Spotlight

The Kubernetes dashboard now displays both the Spark application name and the workload name on the Applications Overview and Details pages.

Application search now works properly on encrypted dashboards.

  • Capacity Optimizer

Adds the ability to compare configurations and download the comparison as a table in the cluster comparison tab of the Capacity Optimizer page.

Adds autoscaling-related configurations to the Charts page of the Kubernetes dashboard.

  • Query Spotlight

The Hive explain plan now shows the stage duration for each stage, rather than the job duration.

The Hive on MapReduce query details page now displays the interstage delay.

Deprecates Impala memory bytes-seconds and adds the Impala peak memory (max downsample) metric.

  • Platform Spotlight

The filter bar on the charts page is now sticky. It no longer rolls off the page when you scroll down.

  • Supervisor

New versions 7.1.10 and 7.1.11 provide the following (see Pepperdata Release Notes for details):

  • Optimizing autoscaling on Microsoft Azure AKS clusters.

  • Simpler port use by PepAgents.

  • Additional cloud instance metadata for EMR.

  • Bug fixes.

August 26, 2022

  • Capacity Optimizer

Adds a cluster comparison feature on the Capacity Optimizer page to compare the optimization metrics for two time ranges of a cluster or two clusters in a cloud realm.

  • Application Spotlight

Fixes an issue where some configuration values were still displayed as encrypted on the application details page even though the encryption key had been applied to the dashboard.

Adds search capability in the configuration section of the application details page.

  • Platform Spotlight

Makes each metric’s section of the Metric Definitions page collapsible for easy navigation.

  • Supervisor

New minor version 7.1.8 and update 7.1.9 provide the following (see Pepperdata Release Notes for details):

  • Support for autoscaling for Amazon EKS clusters.

  • Support for optimizing more types of workloads on Amazon EKS clusters.

  • Several bug fixes.

July 29, 2022

  • Query Spotlight

Adds more resource usage graphs to the Hive query details page.

  • Application Spotlight

Adds HBase metric definitions to the Metric Definitions page.

  • Capacity Optimizer

(Amazon EKS clusters) Adds the Instance Hours tile to the Capacity Optimizer Overview page, to provide the number of (raw) instance-hours used.

Fixes a bug that caused the Capacity Optimizer page to display the status of Capacity Optimizer incorrectly.

Moves optimization-related metrics into a new metric category, Optimization, in the Metrics selection dropdown on the Charts page, and renames the following metrics:

  • Is Capacity Optimizer: Nodes with Capacity Optimization enabled.
  • Capacity Optimizer base size in bytes: Base Pepperdata memory allocations capacity.
  • Capacity Optimizer base cores: Base Pepperdata core allocations capacity.
  • Capacity Optimizer current size in bytes: Revised current Pepperdata memory allocations capacity.
  • Capacity Optimizer current cores: Revised current Pepperdata cpu allocations capacity.
  • Capacity Optimizer used size in bytes: App-claimed Pepperdata memory allocations.
  • Capacity Optimizer used cores: App-claimed Pepperdata core allocations.
  • Capacity Optimizer percentage of memory free: Free memory percent as seen by Capacity Optimizer.
  • Capacity Optimizer CPU idle time permil: Idle CPU time permil as seen by Capacity Optimizer.
  • Capacity Optimizer disk idle time permil: Idle disk time permil as seen by Capacity Optimizer.
  • Is pessimistic mode enabled: Pessimistic mode enabled.
  • Pepperdata created containers: Estimated count of extra containers using memory allocations added by Pepperdata.
  • Capacity Optimizer removed: Estimated number of container removal attempts by Pepperdata.

  • Supervisor v7.0.13 release

Supervisor v7.0.13 provides usability and downscaling optimization enhancements for Amazon EMR clusters, and fixes two issues related to autoscaling optimization in Capacity Optimizer.

June 24, 2022

  • Autoscaling optimization in Capacity Optimizer (YARN)

    • (YARN clusters) Adds metrics to the dashboard for autoscaling optimization statistics.

      To view the applicable metrics, navigate to the Charts page, and search for “autoscale”. The new metrics are in the Optimization metrics category > AutoScale metrics sub-category > AutoScale Configs metrics group:

      • Is Pepperdata autoscaling enabled
      • Majority node percentage
      • Cluster max nodes threshold ratio
      • Max allocation percentage allowed
      • Inaction timeout
      • First batch number of task nodes
      • Scaleup factor
      • Is partial instance group support enabled
      • Is Pepperdata autoscaling active
      • Resource threshold percentage
    • Consolidates the previously-separate configuration procedures for Capacity Optimizer, which were for environments with and without automatic scaling enabled, into a single procedure that is applicable for both environments; see Configure Capacity Optimizer (Cloud) (or corresponding page for Pepperdata Supervisor versions newer than v7.0).

  • Supervisor v7.0.12 release

    New Supervisor v7.0.12 adds Query Spotlight support for Kerberized impalad daemons for the Impala Web UI for debugging. For cloud environments, it changes Capacity Optimizer’s default for pessimistic mode from enabled to disabled, and fixes an edge-case bug that caused premature triggering of pessimistic mode operation.

  • Documentation

    Adds the procedure for enabling Pepperdata to find a custom Certificate of Authority (CA) file that contains custom or non-standard certificates/chains (such as self-signed certificates) that are not included in the set of standard certificates typically included in internet browsers.

    See Configure a Custom Certificate of Authority (or corresponding page for Pepperdata Supervisor versions newer than v7.0).

May 27, 2022

  • Capacity Optimizer

    (Amazon EMR and Google Dataproc clusters) Adds the Instance Hours tile to the Capacity Optimizer Overview page, to provide the number of (raw) instance-hours used.

  • Spark on Kubernetes applications

    Adds observability of all services/workloads that are running on a Kubernetes cluster.

    To view the applicable metrics, navigate to the Charts page, and search for “Controller”.

    Screenshot of the Kubernetes controller metrics in the Chart search results

  • Supervisor v7.0.11 release

    New Supervisor v7.0.11 provides usability and Pepperdata debugging enhancements for Amazon EMR clusters; fixes edge case bugs in Capacity Optimizer; and fixes a Query Spotlight configuration issue for specifying Presto authentication.

  • Documentation

    Aligns terminology with industry standard usage:

    • Globally changes references from “HDaaS (Hadoop as a Service)” to “cloud”.
    • Changes references from “Pepperdata managed autoscaling in Capacity Optimizer” to “autoscaling optimization.”
    • Globally changes references from “real-time” to “near real-time”.

    Although page titles have changed, the URLs remain the same.

April 29, 2022

  • Platform and System Support

    • Qualifies Query Spotlight v7.0.x support for Trino Release 370 on Cloudera CDP Private Cloud Base 7.2.4.

    • New Supervisor v7.0.8 release adds support for:

      • Apache Spark 3.2.x
      • Amazon EMR 6.5.x
      • Amazon EMR 5.34.x
  • Capacity Optimizer

    • (YARN, multi-cluster realms) Adds the Filter By filter to the Capacity Optimizer Overview page, which enables you to show data for any one cluster in the multi-cluster realm.

    • New Supervisor v7.0.9 release:

      • (Kubernetes clusters) Adds capability to enable honoring the Kubernetes Guaranteed Quality of Service (QoS) class.

      • (Pepperdata autoscaling optimization) By default, effectively disables the maximum calculation interval (inactionTimeout).

      • (Kubernetes clusters) Fixes an issue that prevented configuration changes from being picked up on a helm upgrade command on the PepAgent Helm chart.

  • Application Spotlight

    In the App Details page, adds top-level tabs for directly accessing the Spark Details Report (for Spark apps), and the Cluster Share Report (for apps of any type).

    Screenshot of Top-Level Tabs in the App Details Page

    You can still use the original navigation from an application’s App Details page:

    • Spark Details Report. From the lower-left set of tabs (the Issues section of the page), select the Spark tab, and click Analyze Spark App.

    • Cluster Share Report. From the right side of the page header, in the Cluster Share tab, click See Cluster Share Report.

  • Query Spotlight

    • In the Query Details page, adds start latency—the interval between the start time of a Hive query session and when the query spawns its first YARN job/stage—to the TIME section of the header.

    • Adds File bytes read/written and HDFS bytes read/written to the explain plan visualization (which you show by selecting the Explain tab in a Query Details page).
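
As a small worked example of the start latency defined above, the following Python sketch subtracts a query session's start time from the start time of its first YARN job or stage. The function name and the sample timestamps are hypothetical illustrations, not values or APIs taken from Pepperdata.

```python
from datetime import datetime, timedelta


def start_latency(session_start: datetime, first_job_start: datetime) -> timedelta:
    """Return the interval between the session start and the first YARN job/stage."""
    return first_job_start - session_start


# Hypothetical timestamps for a single Hive query session.
session_start = datetime(2022, 4, 29, 10, 0, 0)
first_job_start = datetime(2022, 4, 29, 10, 0, 42)
latency = start_latency(session_start, first_job_start)
print(latency.total_seconds())
```

A long start latency in this sense points at time spent before any YARN work begins, rather than at the jobs themselves.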

March 25, 2022

  • Platform Spotlight

    Qualifies Supervisor v7.0.x support for HPE Ezmeral Container Platform 5.3.x.

  • Application Spotlight

    • (Technical Preview) Adds the Application Status Report to the dashboard, to show how many applications ran during the selected time range, including counts of how many apps failed, are still running, succeeded, or were killed. The count of running apps reflects the moment when you selected the report; the display is static, not dynamically updated. Use this report to understand trends in applications’ status over time; for example, whether application failures are generally clustered around certain times or days.

      For details, see Application Status Report.

  • Capacity Optimizer

    (Technical Preview) Adds the Capacity Optimizer Overview page for Kubernetes to the dashboard. For details about this feature, see Capacity Optimizer Overview in the Pepperdata Dashboard (or corresponding page for Pepperdata Supervisor versions newer than v7.0).

  • Dashboard

    (Technical Preview) Adds a New Chargeback Report. Although similar to the Legacy Chargeback Report, the new report:

    • Merges allocated and used data for a given resource’s columns (memory or CPU) into a single column. If data for allocated metrics are available, that is what’s shown. Otherwise, the used data is what’s shown.

    • Adds data for all queries.

    • Moves Hive query data that was associated with a given Hive user to the actual, real (human, Pepperdata) user who ran the query. No breakdowns are shown for a user if that user did not run any queries or YARN apps; for example, if the user ran only standalone Spark apps (without YARN).

    To show the New Chargeback Report: use the left-nav menu; select Reports > Chargeback Report; and below the page’s filter bars (but above the Chargeback Report table), click New Chargeback Report.
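
The column-merging rule described above for the New Chargeback Report can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (memory_allocated, memory_used, and so on) are hypothetical, and missing data is assumed to be represented as None. The dashboard's actual implementation is not documented here.

```python
def merge_resource_columns(row, resources=("memory", "cpu")):
    """Collapse each resource's allocated/used pair into a single value.

    The allocated figure is preferred; the used figure is the fallback
    when no allocated data is available (assumed here to mean None).
    """
    merged = {}
    for resource in resources:
        allocated = row.get(f"{resource}_allocated")  # hypothetical field name
        used = row.get(f"{resource}_used")            # hypothetical field name
        merged[resource] = allocated if allocated is not None else used
    return merged


# Memory has allocated data, so it wins; CPU has only used data.
example_row = {"memory_allocated": 8, "memory_used": 5, "cpu_used": 2}
print(merge_resource_columns(example_row))
```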

  • Supervisor releases

    • New Supervisor v7.0.7 (1) adds EMR exponential backoff and jitter retry logic for describe-cluster commands; (2) fixes an issue that caused Pepperdata to overlook nodes when their fully-qualified hostnames are different in the ResourceManager and the AWS API server; and (3) fixes a Supervisor v7.0.5-introduced issue that caused PepAgents to crash in certain cloud-Hive environments.

    • New Supervisor v6.5.29 (1) adds EMR exponential backoff and jitter retry logic for describe-cluster commands; and (2) fixes an issue that caused Pepperdata to overlook nodes when their fully-qualified hostnames are different in the ResourceManager and the AWS API server.
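
For readers unfamiliar with the technique, exponential backoff with jitter generally works as sketched below in Python. This is a generic illustration, not Pepperdata's implementation: the function names, the base and cap values, and the "full jitter" variant (a random delay between zero and the capped exponential bound) are all assumptions.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Return a jittered delay in seconds for a zero-based retry attempt."""
    # The upper bound doubles with each attempt but never exceeds the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: scale the bound by a random factor from rng().
    return rng() * ceiling


def call_with_retries(operation, max_attempts=5, base=1.0, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(); on failure, wait a jittered backoff delay and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # a real client would catch only throttling errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap, rng))
```

The random factor spreads out retries from many callers so that they do not all hit a throttled API at the same moment, which is the usual motivation for adding jitter to plain exponential backoff.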

February 25, 2022

  • Dashboard

    • Moves the usage/Chargeback costs configuration from the Chargeback Report page to the Cluster Config page. For procedure details, see Configuring Usage/Chargeback Costs (version v7.1).

    • Adds allocation charge metrics to the App History tab in the App Details page.

  • Platform Spotlight

    Adds clickable links to alarm notification email messages for the detail pages of apps for which the alarm fired. This addition saves you from having to copy-and-paste the App Id from the email message into the Applications Overview filter bar and manually searching.

  • Query Spotlight

    Adds more statistics to the gathered Impala metrics. To chart these metrics, navigate to the Charts page, and search for the Impala Stats metrics group.

    • Impala Bytes Read DataNodeCache Name (impalaStats.bytesReadDataNodeCacheName): The total number of bytes read from data node cache

    • Impala Cache File Handles Hit Count (impalaStats.cacheFileHandlesHitCount): Total number of file handles opened where the file handle was present in the cache

    • Impala Cache File Handles Miss Count (impalaStats.cacheFileHandlesMissCount): Total number of file handles opened where the file handle was not in the cache

    • Impala Data Cache Hit Bytes (impalaStats.dataCacheHitBytes): Total bytes of data cache hit

    • Impala Data Cache Hit Count (impalaStats.dataCacheHitCount): Total count of data cache hit

    • Impala Data Cache Miss Bytes (impalaStats.dataCacheMissBytes): Total bytes of data cache miss

    • Impala Data Cache Miss Count (impalaStats.dataCacheMissCount): Total count of data cache miss

    • Impala Data Cache Partial Hit Count (impalaStats.dataCachePartialHitCount): Total count of data cache partial hit

  • Supervisor v7.0.x releases

    • New Supervisor v7.0.2 major release adds Capacity Optimizer on Kubernetes, adds support for ARM64 on Amazon EMR for Graviton/Graviton 2 instances, drops support for distributions that are no longer supported by the vendor and for distributions based on Hadoop 2.8.x and earlier, and drops support for components associated with now-unsupported distributions.

      This release also migrates Pepperdata software to internally use Python 3 instead of Python 2. The sunset date for Python 2 was January 1, 2020; it is no longer supported and no longer receives security updates. The migration created significant changes, which motivated the update to a new major Supervisor release.

    • Subsequent Supervisor v7.0.5 maintenance release adds the ability to tune Capacity Optimizer on Kubernetes.

  • Supervisor CSD (custom service descriptor) 3.1.2

    New CSD 3.1.2 minor release adds service configuration parameters that make it easy to:

    • Secure (via HTTPS) the PepAgent and Supervisor web interfaces in clusters that are enabled for Auto-TLS.

    • Configure JobHistory monitoring for Application Spotlight.

    • Enable debug logging for PepAgent fetchers (Kafka, HBase, and so on).

January 28, 2022

  • Platform Spotlight. Qualifies support for Cloudera Data Platform (CDP) Public Cloud 7.2.2–7.2.10 (inclusive).

  • Query Spotlight. (Technical Preview) Adds a Comparing Queries page to the dashboard. For details about this feature, see Comparing Queries.

  • Dashboard. For charts pages, replaces the “Charts” text in browser tabs and bookmarks with an explanatory string to highlight the applied breakdown(s), time range, and metrics.

    The image shows the before (impossible to distinguish one bookmark’s contents from another) and the after (meaningful, identifiable information about the Charts page contents).

    Screenshot of original bookmarks text

    Screenshot of improved bookmarks text
  • Tez Applications in Application Spotlight. Adds support for the ENDED Tez application state.

  • Capacity Optimizer. The Capacity Optimizer Overview page is out of Technical Preview status, and released for general availability (GA). For details, see Capacity Optimizer Overview in the Pepperdata Dashboard (or corresponding page for Pepperdata Supervisor versions newer than v6.5).

December 22, 2021

  • Spark Applications in Application Spotlight

    Improves performance of displaying the Spark App Details Report for applications that experienced a large number of failed stages.

    Instead of displaying all the stages at once, the 100 longest stages and stages with errors are initially shown. To show the remaining stages, 100 at a time, click Show 100 more stages.
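
The paging behavior described above can be sketched in Python as follows. This is a hypothetical illustration only: the stage fields (id, duration, failed), the function names, and the ordering of the remaining stages are assumptions, not details of the dashboard's implementation.

```python
def initial_stages(stages, page_size=100):
    """Return (shown, remaining) for the first view of a stage list.

    Shown: the page_size longest stages plus every stage with errors.
    Remaining: all other stages, kept in longest-first order (an assumption).
    """
    by_duration = sorted(stages, key=lambda s: s["duration"], reverse=True)
    longest_ids = {s["id"] for s in by_duration[:page_size]}
    shown = [s for s in by_duration if s["id"] in longest_ids or s["failed"]]
    remaining = [s for s in by_duration
                 if s["id"] not in longest_ids and not s["failed"]]
    return shown, remaining


def next_page(remaining, page_size=100):
    """Split off the batch revealed by one 'Show 100 more stages' click."""
    return remaining[:page_size], remaining[page_size:]
```

Rendering only a bounded number of stages up front is what keeps the initial page load fast for applications with very many stages.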

  • Query Spotlight

  • Platform Spotlight

    • Qualifies support for Amazon EMR 5.33.0, EMR 6.3.x, and EMR 6.4.x.

    • Qualifies support for Amazon EKS—Kubernetes 1.21 and Microsoft AKS—Kubernetes 1.17–1.21.

    • (EMR) Adds support for running Command Line Interface (CLI) commands through a proxy, and for custom AWS CLI API endpoints.

    • (HDFS Data Temperature Report) Adds Warm Files Reports and Hot Files Reports, in the same format as the Cold Files Reports.

November 24, 2021

  • Spark on Kubernetes applications

    • Adds waste data that can be used for chargeback reporting, which you can leverage to reduce cluster resource waste.

      • The Top CPU-Wasting Apps and Top Memory-Wasting Apps tiles are now applicable for Kubernetes clusters (in addition to YARN clusters).

      • To be consistent with the other GPU-related charts (and Kubernetes in general), changes the Top GPU-Wasting Apps tile to use absolute values instead of percentages.

    • CPU waste, memory waste, and GPU waste data is now included in the Chargeback report (as it already was for YARN clusters).

    • CPU waste, memory waste, and GPU waste are now included in the tabular data in the following overviews (as they already were for YARN clusters):

      • Applications Overview
      • Waste Overview
    • (GPU-using Spark applications) Adds GPU allocated and used data to the following Platform Spotlight pages:

      • Namespaces Overview
      • Pods Overview
  • Platform Spotlight

    • (YARN) In the Pepperdata dashboard Home page (the cluster view), renames the YARN Resources section to YARN and HDFS Resources, and adds a new tile for HDFS Usage.

      The HDFS Usage tile appears only if you are using Supervisor 6.5.8 or later.
    • (HBase 2.x) New Supervisor v6.5.24 maintenance release adds support for configuring independent HTTP service policies for HBase and YARN by configuring the new pepperdata.agent.hbase.http.policy property. For details, see Pepperdata Release Notes: v6.5.24.

  • Query Spotlight

    Adds Pepperdata recommendations for Hive on Tez queries:

    • Excessive GC duration
    • Imbalanced work across Mappers
    • Imbalanced work across Reducers

    For details, see Hive on Tez Recommendations.

October 29, 2021

  • Query Spotlight

    • Adds the Largest by Partition Count tile to the Databases Overview (which is displayed by selecting Query Spotlight > Databases) and the Database Details (which appears when you click a database link in a Databases Overview’s tile or its table of databases).

    • Reorganizes the Table Details page: table details and daily trends appear in the default top-level tab (Details), and information about the table’s queries is moved to a new, top-level Queries tab.

      • To navigate to the Details page, select Query Spotlight > Databases > a database link in a tile or the table of databases > a table link in a tile or the table of (database) tables.

        The page provides summary information about the database table, and daily trend charts for partition count, row count, file count, total size, and raw data size.

      • The Queries page is accessed by the Queries top-level tab that’s alongside the Details tab.

        The page shows the same information about the table’s queries that was previously shown on the Table Details page.

  • Application Spotlight

    • Adds the ability to filter the dashboard’s Applications Overview page by Running applications. For details about filtering the pages in the dashboard displays, see Breakdowns and Filters.

    • Improves the Resource Usage display in the App Details page by removing distracting, non-actionable details; adding clarifying text; and moving the link for the Cluster Share Report (formerly called Cluster Weather) to the header section’s Cluster Share tab so that it’s more generally accessible.

    • New Supervisor v6.5.23 adds Pepperdata recommendations for Streaming Spark applications.

      For details, see Pepperdata Release Notes: v6.5.23.

  • Platform Spotlight

    • Qualifies support for Google Dataproc 2.0.x-debian10.

    • (YARN clusters) Adds a YARN Resources section to the Pepperdata dashboard (home page), with tiles for CPU and Memory.

      • The tiles show YARN usage level relative to base allocation level. Both absolute and percentage values are shown.

      • When you click either tile’s title, a custom table appears with the associated resource allocation (CPU or memory) by queue.

    • (Technical Preview) For cloud (HDaaS) multi-cluster realms, adds the Cloud Consumption Report. It shows Core-Hours and GB-Hours usage data for all the clusters, which enables you to track usage throughout a billing period, and to learn which clusters are the heaviest resource users.

    • (Kubernetes clusters) Adds queues data for Spark on Kubernetes applications.

      • In the App Details page, adds the queue name to the profile information in the page header.

      • The Queues Overview in the dashboard is now applicable for both YARN and Kubernetes clusters; for details, see Queues Overview.

      • You can assign queues as labels (abstractions for attributes of the applications that you run); for details, see Configure Labels (or corresponding page for Pepperdata Supervisor versions newer than v6.5).

  • Supervisor v6.5.x releases

    New Supervisor maintenance releases (v6.5.22, v6.5.23) provide significant performance enhancements, recommendations for Spark streaming applications in YARN clusters, and (for Spark on Kubernetes) easier installation. For details, see Pepperdata Release Notes: v6.5.22 and Pepperdata Release Notes: v6.5.23.

  • Support and Troubleshooting

    (Kubernetes clusters) Adds Hadoop version and label configuration data to the Debug report. To generate the report, which you should include with any support requests, use the Help icon in the dashboard’s “top-nav” navigation, and click Download Debug Report.

September 24, 2021

  • Platform Spotlight

    Adds Role Based Access Control (RBAC) for the maximum number of alarms and the maximum number of custom dashboards that a user can create per realm. For details about which resources can be granted or restricted and how the content of the Pepperdata dashboard display changes based on whether or not a user is a restricted user (a user who has access to only a subset of resources), see Managing Users and Roles with RBAC (or corresponding page for Pepperdata Supervisor versions newer than v6.5).

  • Spark on Kubernetes Dashboard

    When you navigate to the Chargeback Report from a cost tile—Top CPU-Requesting Apps or Top Memory-Requesting Apps—the Chargeback Report now includes hyperlinked application Ids. To show the App Details for an application, click its Id link.

  • Streaming Spotlight

    In the Broker Details and Topic Details pages, moves the tables of Broker Topics and Topic Brokers, respectively, from their original location in the Details page to their own page that is accessed from a new top-level tab. This change greatly decreases the load time for the broker’s and topic’s metrics data, and enables better side-by-side viewing (via separate browser tabs or windows) of all a broker’s and topic’s information.

    Screenshot of Broker Topics top-level tab

    Screenshot of Topic Brokers top-level tab
  • Documentation

    Adds detailed references for the Pepperdata recommendations in Application Spotlight—MapReduce, Spark, and Tez apps—and the Impala recommendations in Query Spotlight.

    The entry page is Recommendations in Pepperdata, which is included in the Quick Links tile on the home page.

    Screenshot of the Quick Links tile showing the link to the Recommendations page

  • Supervisor v6.5.19–6.5.21 Releases

    Several patch releases contain minor enhancements and bug fixes. For details and links to related documentation, see the Pepperdata Release Notes.

August 27, 2021

  • Platform Support

    Qualifies/adds support for the following platforms:

    • Google Dataproc 2.0.11-debian10 (Debian 10, Hadoop 3.2, Spark 3.1) on Supervisor v6.2.38 and later.

    • (Query Spotlight) Adds Query Spotlight to Pepperdata v6.5.10 and later on EMR and Dataproc. No configuration is needed; support is enabled by default when you install Pepperdata in supported versions of EMR and Dataproc clusters (see Pepperdata-Platform Support).

    • (Spark on Kubernetes) Adds support for HPE Ezmeral Container Platform 5.2.x to Application Spotlight and Platform Spotlight, for Supervisor v6.5.15 and later.

  • Platform Spotlight

    Adds daily Analysis Reports to Role Based Access Control (RBAC).

    For details about which resources can be granted or restricted, how the combination of access settings for cluster users and queues affects what a Pepperdata user can see (has access to), and how the content of the Pepperdata dashboard display changes based on whether or not a user is a restricted user (a user who has access to only a subset of resources), see Managing Users and Roles with RBAC (or corresponding page for Pepperdata Supervisor versions newer than v6.5).

  • Dashboard

    • Adds waste data to the Applications Overview table. The following new columns are omitted by default; to show them, select the table’s Show/Hide Columns control, and select the waste columns that you want to see.

      • CPU Wasted
      • Memory Wasted
    • Adds an app’s recommendations to its App History page.

    • (YARN clusters) Enables customizing the threshold size for what’s considered a small-files app. The default size remains 1 MB. To customize the size to some other value for your realm, contact Pepperdata Support.

    • Adds a feature for emailing any analysis report to anyone, on-demand.

      From the left-nav, select Reports > Analysis Reports; in the row for the report of interest, click Send Email; enter the email address(es), a subject, and an optional message; and click Send.

July 30, 2021

  • Spark on Kubernetes

    • Qualifies Kubernetes 1.18, 1.19, 1.20 on Amazon Elastic Kubernetes Service (EKS).

    • (Application Spotlight) Adds a message to the Applications Overview page if the Pepperdata dashboard is unable to fetch data from the cluster due to incomplete or incorrect instrumentation, and provides a link to the installation documentation.

    • (Application Spotlight) Adds the Application Status tile to the Applications section of the Cluster View in the Pepperdata dashboard. (This tile was already available for YARN clusters.)

      Screenshot of the Application Status tile in a Kubernetes cluster

    • (Application Spotlight) Adds the Allocation Charge—the application’s cost—to the header section of the App Details page. (This information was already available for YARN clusters.)

      Screenshot of the Allocation Charge details in an App Details page for a Spark on Kubernetes app

  • Capacity Optimizer

    • (Technical Preview) Adds the Capacity Optimizer tile to the Cluster Health section of the Pepperdata dashboard home page, and adds the detailed Capacity Optimizer Overview page to the dashboard, to provide an at-a-glance view of the resources saved (absolute values and percentage uplift), charted savings over time, and peak core and memory added.

      Users whose access policies grant them access to all users and queues can view the page via the new left-nav menu item for Capacity Optimizer.

      For details, see Capacity Optimizer Overview in the Pepperdata Dashboard (or corresponding page for Pepperdata Supervisor versions newer than v6.5).

    • Adds display of the Capacity Optimizer configuration values to the Pepperdata dashboard. To see the current configuration for Capacity Optimizer, use the metrics charts, and navigate to the Capacity Optimizer Configs metrics group.

  • Supervisor v6.5.x releases. New Supervisor maintenance releases (v6.5.9, v6.5.10, and v6.5.13) provide significant enhancements to the HDFS Data Temperature Report (and change the report generation from daily to only after the FsImage file is successfully read), as well as general, minor bug fixes. For details and links to related documentation, see Pepperdata Release Notes: v6.5.9, Pepperdata Release Notes: v6.5.10, and Pepperdata Release Notes: v6.5.13.

  • Dashboard

    • Improves the page-level Time Range filter so that the time zone selector and the From/To option interact in a more intuitive manner.

      Now when you select the TZO (time zone), the From and To times are automatically updated to the new time zone (so that their UTC times remain constant). You can still use the From/To calendar pickers to customize the times as you want. The displayed times will always match the time zone shown in the Time Range filter. For more information, see Page-Level Time Range Filter.

    • (Custom Dashboards) Adds a New Dashboard option to the left-nav Dashboards menu to enable restricted users—users who do not have access to other users’ custom dashboards—to create their own custom dashboards.
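
The Time Range filter behavior described above, where changing the time zone updates the displayed From and To times while their UTC times stay constant, can be illustrated with a short Python sketch. The function name and the sample range are hypothetical; the sketch uses only the standard library's zoneinfo module and says nothing about how the dashboard itself is implemented.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def switch_time_zone(start, end, new_zone):
    """Re-express an aware (start, end) range in another time zone.

    astimezone() changes only the wall-clock representation; the
    underlying instants (and therefore the UTC times) do not move.
    """
    tz = ZoneInfo(new_zone)
    return start.astimezone(tz), end.astimezone(tz)


# Hypothetical range selected while the filter is set to Pacific time.
pacific = ZoneInfo("America/Los_Angeles")
start = datetime(2021, 7, 30, 9, 0, tzinfo=pacific)
end = datetime(2021, 7, 30, 17, 0, tzinfo=pacific)

new_start, new_end = switch_time_zone(start, end, "America/New_York")
print(new_start.isoformat(), new_end.isoformat())
```

In this sketch, the 09:00 to 17:00 Pacific range is displayed as 12:00 to 20:00 in New York, and both ranges compare equal because they refer to the same instants.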

  • CSD release for Pepperdata Supervisor. The new CSD v3.0.3 release fixes a dependency bug that prevented HDFS Tiering operations from running on HDFS NameNodes.

  • Documentation. For Kubernetes environments, adds information about Upgrading and Uninstalling Pepperdata; see Installing or Upgrading Pepperdata (Kubernetes).

June 25, 2021

  • Capacity Optimizer. In the Cluster Health section of the Pepperdata dashboard, replaces the Pepperdata Task Boost tile with the Pepperdata Container Boost tile.

    Screenshot of the Container Boost tile

    For more information about how Capacity Optimizer boosts your resources, see Quantifying the Benefits of Capacity Optimizer.

  • Application Spotlight. The Comparing Apps feature is out of Technical Preview status, and released for general availability (GA). For details about this feature, see Comparing Apps.

  • Pepperdata Dashboard and Documentation. Improves dashboard and helpsite security by adding the HTTP Strict-Transport-Security (HSTS) header.

  • Supervisor v6.5.x releases

    A new Supervisor minor release (v6.5.5) and subsequent maintenance release (v6.5.8) include significant enhancements to existing products, and roll up the enhancements and bug fixes from the Supervisor v6.4 patches (“dot-releases”).

    Highlights include:

    • Platform Spotlight
      • Adds support for Kubernetes on Amazon Elastic Kubernetes Service (Amazon EKS).
      • Adds collection of instance type for cloud (HDaaS) clusters—EMR, Dataproc, and Qubole.
    • Parcel Installations of Pepperdata
      • Changes the default user for Pepperdata from root to pepperdata.
      • Decouples (separates) the release packages for the Pepperdata Supervisor and the Parcel CSDs.
      • Adds native-CSD options for configuring the HDFS Data Temperature Report and Streaming Spotlight.
    • Application Spotlight. Adds collection/display of JVM Native Memory metrics: total reserved, total committed, heap reserved, heap committed, and so on.

    • Capacity Optimizer. Adds support for the custom automatic scaling policy in EMR.

    For details and links to related documentation, see Pepperdata Release Notes: v6.5.5 and Pepperdata Release Notes: v6.5.8.

May 28, 2021

  • Capacity Optimizer. (EMR/Dataproc) Supervisor release v6.4.16 fixes an issue that caused the ResourceManager to report an incorrectly high value for total memory, which could negatively affect autoscaling functionality.

  • Platform Spotlight

    • Adds custom dashboards to Role Based Access Control (RBAC). For details about which resources can be granted or restricted, how the combination of access settings for cluster users and queues affects what a Pepperdata user can see (has access to), and how the content of the Pepperdata dashboard display changes based on whether or not a user is a restricted user (a user who has access to only a subset of resources), see Managing Users and Roles with RBAC (or corresponding page for Pepperdata Supervisor versions newer than v6.5).

    • (EMR 6.1 and later) Supervisor release v6.4.16 fixes an issue that prevented the JobHistory Monitor from starting, which in turn caused PepAgent to repeatedly restart.

    • (HBase 2.x) Adds HBase Master and RegionServer metrics. To see these metrics in the Pepperdata dashboard, navigate to the Charts page in the dashboard, and search for HBase Master and HBase RegionServer.

    • (HBase 2.x) Supervisor release v6.4.17 adds support for independent authentication settings (Kerberos or not) for HBase and the remaining services in the cluster. This means that a cluster can be Kerberized for other services, but not for the HBase Web UI, or vice versa (Kerberized HBase Web UI but nothing else Kerberized in the cluster).

  • Streaming Spotlight

    • Supervisor release v6.4.17 adds SSL client authentication support for Kafka Admin monitoring, which is performed by the host that you choose and configure as your-kafka-admin-host when you enable Kafka monitoring (see Configure Streaming Spotlight (or corresponding page for Pepperdata Supervisor versions newer than v6.4)).

    • Supervisor release v6.4.17 enhances security by using encryption for all Kafka-related passwords that are specified in the Pepperdata configuration (see Configure Streaming Spotlight (or corresponding page for Pepperdata Supervisor versions newer than v6.4)).

  • Documentation

    Reorganizes the Downloads page by distro, which makes it much easier to find the installation packages that you need.

April 30, 2021

  • Capacity Optimizer

    Improves the Pepperdata Task Boost tile in the Cluster Health section of the dashboard. Instead of just a number of tasks, the tile shows the peak throughput increase of the last 24 hours, as well as a chart of the throughput increase over time for the last 24 hours.

    Screenshot of the improved Task Boost tile

  • Dashboard

  • Application Spotlight

    Adds an I/O chart to the Resource Usage tab of an application’s App Details page. The chart includes HDFS Read/Write and Shuffle Read rates, as well as (local) I/O Read and (local) I/O Write metrics that were previously in the Basic Metrics chart.

    As with all charts, you can show a popup of the series values by mousing over any data point.

    Screenshot of the I/O chart in the Resource Usage tab of an App Details page

  • Platform Spotlight

    Adds more granular alarms options to Role Based Access Control (RBAC). For details about RBAC, see Managing Users and Roles with RBAC (or corresponding page for Pepperdata Supervisor versions newer than v6.4).

    Screenshot of the Alarms section in the RBAC Create New Policy wizard

  • Query Spotlight

    • Adds query data processing stats—file bytes and HDFS bytes read and written—to the Hive tab of a query’s Query Details page. Also included in this section is data for rows read/written, which was previously in the Tables Accessed card (in the lower-right corner of the Hive tab).

      Screenshot of the Hive tab in a Query Details page

    • (Technical Preview) Query Profiler for Impala recommendations. A new tile in the Recommendations section of the Pepperdata Dashboard shows how many recommendations, and their severity levels, were made during the last 24 hours.

      • To view the tile, scroll to the Recommendations section. The new tile is located after any available application recommendation tiles.

        Screenshot of the Impala Recommendations tile

      • To see a table of all the queries that received recommendations at a given severity level, click the associated severity text; for example, “Critical: 455”.

      • Or to navigate from the tile to the Query Profiler report, click the tile’s title.

      The Query Profiler report shows counts of all the recommendations that fired for all the profiled Impala queries over the last 24 hours. (If a recommendation was not activated for a given heuristic, that heuristic is omitted from the report.)

      You can navigate to the Query Profiler report from any page by using the “left-nav” menu: Query Spotlight > Query Profiler.

      Query Profiler will remain in Technical Preview while we continue to improve its performance.

March 26, 2021

  • Pepperdata Cloud for Amazon EMR

    The new Pepperdata Cloud for Amazon EMR provides much easier and faster installations than the previously manual process of uploading templates, configuring Pepperdata components, and managing access via Identity and Access Management (IAM) roles.

  • Support and Troubleshooting

    If you have any issues with Pepperdata operation, the new Debug report option lets you easily collect general information such as your account name, permissions, versions of the dashboard and Supervisor, and browser console messages. Before you contact Pepperdata Support, generate a new report—the assigned filename is the current Unix time—and attach it to your support ticket.

    In the dashboard’s “top-nav” navigation, click the Help icon, and click Download Debug Report.

    Screenshot of the Download Debug Report control
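
Because the Debug report filename is the Unix time at which the report was generated, you can recover the generation time from the filename. The sketch below assumes the time is expressed in whole seconds and that the filename has some extension after the digits; both are assumptions, and the sample filename and the function name `report_time` are hypothetical.

```python
from datetime import datetime, timezone


def report_time(filename):
    """Interpret the digits before the first '.' as Unix time in seconds (UTC)."""
    stem = filename.split(".", 1)[0]
    return datetime.fromtimestamp(int(stem), tz=timezone.utc)


# Hypothetical filename; the real extension is not stated in this document.
generated = report_time("1616716800.txt")
print(generated.isoformat())
```

If the filename turns out to use milliseconds rather than seconds, divide the integer by 1000 before converting.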

  • Streaming Spotlight

    Adds Resource Usage charts to the Broker Details page.

    Screenshot of the Resource Usage charts in the Broker Details page

  • Platform Spotlight

    Adds support for Role Based Access Control (RBAC) for alarms.

  • Documentation

    Adds a Capacity Optimizer tile to the home/index pages.

    Screenshot of the Capacity Optimizer tile

February 26, 2021

  • Streaming Spotlight

    • Supervisor v6.4.11 adds password authentication support for Kafka server JMX connections. For details, see Configure Streaming Spotlight (or corresponding page for Pepperdata Supervisor versions newer than v6.4).

    • Adds a Config tab to the Broker Details and Topic Details pages in the dashboard. Values are collected at 6-hour intervals for configuration keys with per-broker or per-topic override capability, respectively.

  • Application Spotlight

    • Supervisor v6.4.11 adds SQL support for Spark 2.0. When you display the App Details page for a Spark 2.0 job that includes SQL statements, you can see the first 100 SQL statements/queries that are encountered in the code, as well as the Spark-generated query plans for those statements/queries.

  • Dashboard

    • (Technical Preview) Greatly improves the Comparing Apps page, which is accessed from the App History tab of an application’s App Details page. More data is compared, such as Key Performance Indicators, recommendations, and configuration differences, which can highlight why an app performed better in one run vs. another.

    • Adds a regex-like filter for application names (/app-name/) and a negation filter that excludes a given application name (!app-name). These filters are particularly useful for creating alarms for all future runs of a given application, or for all future runs of applications other than a given application. To create such alarms, use the filters in the Charts page, and from the resulting charts, create alarms as usual.

    • In Resource Usage charts, changes Other Memory to Native Memory. This change affects the charts in the App Details page and, for Hive apps, the Query Details page.
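
The application name filters described above (/app-name/ and !app-name) can be pictured with a rough sketch. The dashboard's exact matching rules are not specified here, so the behavior below is an assumption: a pattern between slashes is treated as a regular expression searched anywhere in the name, and a leading exclamation mark means "any name other than this exact one". The function name `matches` and the sample application names are hypothetical.

```python
import re


def matches(filter_text, app_name):
    """Return True if app_name passes the filter (assumed semantics)."""
    # /pattern/ : treat the text between the slashes as a regular expression.
    if len(filter_text) >= 2 and filter_text.startswith("/") and filter_text.endswith("/"):
        return re.search(filter_text[1:-1], app_name) is not None
    # !name : every application except the named one.
    if filter_text.startswith("!"):
        return app_name != filter_text[1:]
    # Plain text: exact match.
    return app_name == filter_text


apps = ["etl-daily", "etl-hourly", "report"]
print([a for a in apps if matches("/^etl-/", a)])
print([a for a in apps if matches("!etl-daily", a)])
```

An alarm built on a filter of this kind keeps applying to future runs because it selects by name pattern rather than by a specific run.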

January 29, 2021

  • Capacity Optimizer. Adds support for autoscaling optimization for EMR environments. For detailed configuration steps, see Configure Autoscaling Optimization: EMR (or corresponding page for Pepperdata Supervisor versions newer than v6.4).

  • Query Spotlight. Adds Kerberos support for fetching data from the Hive metastore.

  • Application Spotlight. Adds anonymization for sensitive data that is collected from the application’s history server.

  • Dashboard. Adds a user-configurable, persistent time zone setting.

  • Supervisor v6.4.x releases. A new Supervisor release (and subsequent maintenance release) includes significant enhancements to existing products, and rolls up the Supervisor v6.3 patches’ (“dot-releases”) enhancements and bug fixes.

    For details and links to related documentation, see Pepperdata Release Notes: v6.4.6 and Pepperdata Release Notes: v6.4.10.

November 20, 2020

We’re on the cusp of releasing Pepperdata v6.4, so there are just a few updates this month for Pepperdata v6.3.

  • Platform Spotlight

    • Adds support for RHEL 8.2/CentOS-8.2.x and RHEL 7.8/CentOS-7.8/OL-7.8 in all patch (“dot”) releases of Pepperdata v6.3 and Pepperdata v6.2.

    • (Requires Supervisor v6.3.17 or later) For ephemeral clusters in cloud environments, adds capability to filter the Overview pages, charts, and tables in the Pepperdata dashboard by Ephemeral Cluster ID and/or Ephemeral Cluster Name.

  • Streaming Spotlight

  • Documentation

October 30, 2020

This month sees many product enhancements.

  • Documentation

    Adds copy-to-clipboard functionality for code snippets. Particularly for long, scrolling lines of code or many-line snippets with critical whitespace requirements, this reduces copy-paste errors.

    Screenshot of a code snippet with the copy-to-clipboard icon

  • Capacity Optimizer

    • (Requires Supervisor v6.3.13 or later) Adds support for autoscaling optimization for Qubole-managed clusters. For the detailed configuration steps, see Configure Autoscaling Optimization: Qubole (or corresponding page for Pepperdata Supervisor versions newer than v6.3).

    • (Requires Supervisor v6.3.15 or later) For autoscaling optimization in Cloud clusters, adds consideration of a cluster’s configured max instances value when determining whether it’s appropriate to trigger autoscaling. This avoids over-allocating a cluster when the cluster is configured at its max instance level.

  • Application Spotlight

    • Adds a multi-select checkbox form for filtering the applications by recommendations. Instead of having to select a minimum severity, where the filtered results included the selection and every worse level, you can now select exactly the levels you want.

      Screenshot of the filter selections for recommendations

    • Adds Peak Cores data to the Applications Overview.

      By default, this data is not shown. To add this data to the table, navigate to the Applications Overview page in the dashboard, scroll to the Applications table, and in the table’s upper-right corner, click the Table Columns control. Select the Peak Cores column name, optionally drag-and-drop the column names to re-order them, and click Apply.

    • Adds the following recommendations for Resilient Distributed Datasets (RDDs) in Spark applications:

      • Too many recomputations for an RDD. To speed up your app, we recommend caching RDDs that are recomputed more than once.

      • Unnecessary caching of an RDD. To reduce storage memory waste, we recommend that applications not cache RDDs that are never recomputed.
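
The two RDD recommendations above amount to a simple decision rule, sketched below. This is only an illustration of the rule as worded in this document, not Pepperdata's heuristic; the function name, parameter names, and return strings are hypothetical, and `recomputations` is assumed to count how many times an RDD is (or would be) computed again after its first computation.

```python
def rdd_recommendation(recomputations, cached):
    """Return a suggested action for one RDD, or None if no change is suggested."""
    # Recomputed more than once and not cached: caching would save work.
    if not cached and recomputations > 1:
        return "cache this RDD"
    # Cached but never needed again: the cache only wastes storage memory.
    if cached and recomputations == 0:
        return "stop caching this RDD"
    return None


print(rdd_recommendation(recomputations=3, cached=False))
print(rdd_recommendation(recomputations=0, cached=True))
print(rdd_recommendation(recomputations=1, cached=False))
```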

  • Query Spotlight

    • Adds support for Role Based Access Control (RBAC) for databases.

    • (Requires Supervisor v6.3.16 or later) Adds Hive 3 support (and by extension, CDP Private Cloud Base 7.0.x and 7.1.x, which use Hive 3) to Query Spotlight. For supported combinations of Hive version and platform distro, see Query Spotlight in Pepperdata-Platform Support.

  • Platform Spotlight

    • Adds Java 11 support for all Hadoop distros that themselves support Java 11. For details, see JDK Support by Pepperdata Version in the System Requirements.

    • Adds search capability for users, for user management. This makes it easier to find a given user, especially when there are so many users that you previously had to look through many entries.

      For more information, see Managing Users and Roles with RBAC (or corresponding page for Pepperdata Supervisor versions newer than v6.3).

    • (Beta) Multi-cluster view is added to the dashboard to show consolidated data for all your clusters: overall health, application statistics, recommendations’ severities, and more. To obtain access to this beta feature, contact Pepperdata Support.

      Screenshot of the dashboard's multi-cluster view

September 25, 2020

This month sees significant enhancements to the Pepperdata Supervisor, Streaming Spotlight, and the documentation (this helpsite), as well as dashboard updates to support new Supervisor and Query Spotlight features.

  • Supervisor v6.3 adds:

    • Capacity Optimizer support for autoscaling optimization in Google Dataproc environments, and the following associated autoscale metrics in the dashboard’s charts page:

      • Scale up factor
      • Majority of nodes percent for action
      • Primary max instance count
      • Secondary max instance count
      • Max primary instances allowed
      • Max secondary instances allowed
      • Autoscale reason
      • Average cluster pending memory in MB
      • Average current allocation in MB
      • Number of nodes in the cluster
      • Number participating nodes
      • Number of fully utilized nodes
      • Primary nodes ratio
      • Current primary max instance count
      • Current secondary max instance count
      • Desired max primary instances
      • Desired max secondary instances
    • Application Spotlight support for Spark 3.

    • Option for Parcel installations to create the default Pepperdata user as a non-root user.

    • HDFS Data Temperature report.

    • Support for monitoring Spark 3 jobs, which require Java 11, on a YARN 2 (Hadoop 2) cluster.

    • Support for monitoring container-launched Spark jobs where the Spark driver and the PepAgent are on different hosts.

    • All the enhancements and bug fixes from the v6.2.x maintenance (“dot”) releases.

    For details and links to related documentation, see the Pepperdata Release Notes.

  • Maintenance releases for Supervisor v6.2 add:

    • Support for monitoring container-launched Spark jobs where the Spark driver and the PepAgent are on different hosts.

    • (YARN 3) Support for more than two ResourceManager HA hosts in the auxiliary pre-check utility, YarnHttpAccessTester. (This was already supported for YARN 2.)

    • Support for running a different version of Hadoop filesystem client code from the cluster’s Hadoop version. For example, you can run Hadoop 3.x filesystem client code on Hadoop 2.x clusters. Likewise, you can run Hadoop 2.x filesystem client code on Hadoop 3.x clusters.

  • Query Spotlight adds support for Impala queries and adds associated Pepperdata recommendations.

  • Streaming Spotlight adds:

    • Kafka consumer group metrics and support for Simple Authentication and Security Layer (SASL)-secured Kafka clusters.

    • Topic filtering in the dashboard’s Topics Overview page.

  • Documentation updates:

    • Improved procedure flow for Parcel-based installations and upgrades, to avoid repeated restarts of management nodes.

    • Improved procedure flow for Cloud-based installations, to ensure that all configuration is completed before bootstrapping a new cluster.

    • Instead of a single Upgrade Guide, there are separate upgrade guides for each installation framework (Parcel, Cloud, and RPM/DEB).

  • Dashboard updates:

    • Adds a new Application Status tile to the home page of the dashboard.

      Screenshot of Application Status tile, including the mouse over tooltip

    • For chargeback reports:

      • Now supports up to four (4) decimal places when you enter cost values.
      • Costs are now calculated based on allocated resources instead of used resources.
    • For Streaming Spotlight:

      • Adds topic lags by consumer groups.
      • Supports Role Based Access Control (RBAC) for topics.
    • For Query Spotlight:

      • For Hive queries, displays the queue name and the waiting in queue time.
      • Adds a memory usage chart to the Hive Query Details.

August 6, 2020

This month sees important updates to the Pepperdata Supervisor and the documentation (this helpsite), as well as usability enhancements to the dashboard.

  • Maintenance releases for the Pepperdata Supervisor add:

    • (Beta) Spark recommendations for excessive executor memory wasted, excessive driver memory wasted, and GC (garbage collection) duration too high. As beta recommendations, they are still being tuned based on internal testing and user feedback. Be cautious when applying these recommendations because the results might not be optimal. For questions or feedback, contact Pepperdata Support.

    • Capacity Optimizer support for honoring YARN 2 queue preemption settings.

    • Support for running a different version of Hadoop filesystem client code from the cluster’s Hadoop version. For example, you can run Hadoop 3.x filesystem client code on Hadoop 2.x clusters. Likewise, you can run Hadoop 2.x filesystem client code on Hadoop 3.x clusters.

  • Instead of a single Installation Guide, there are separate installation guides for each installation framework:

  • Usability enhancements to the Pepperdata dashboard include:

    • A Clear all metrics button in the metrics selector.
    • A link to the Alarms in the “left-nav” menu.
    • Improved legend display for metrics charts.

June 29, 2020

This month sees significant enhancements to Streaming Spotlight, updates to the Pepperdata dashboard and documentation, and a new Supervisor .dot (patch) release.

  • Streaming Spotlight

    • Added three predefined alarms for Kafka:

      • Kafka Offline Partitions—fires when the offline partitions count > 0.
      • Too Many Kafka Broker Active Controllers—fires when the active controller count > 1.
      • Too Few Kafka Broker Active Controllers—fires when the active controller count < 1.
    • Refined the titles and headers for tiles and tables to more clearly identify the insights derived from the metrics data.

    • Added the following metrics to the Brokers Overview table:

      • Purgatory size for producer and consumer
      • Failed request rates (requests/second) for producer and consumer
      • Request latency (milliseconds) for producer and consumer
      • Leader election rate (leader election happens when the current leader of a partition becomes unavailable)
      • Unclean leader election rate (when there is no in-sync replica of a failed partition)
    • In the Broker Details page:

      • Added a Requests time series chart to show request latency, purgatory size, and failed requests for producers and consumers.

      • In the Summary, added a hyperlink to the broker host. When you click the host name, the corresponding Hosts Overview page appears.

    • In the Topics Overview table, added Failed request rates (requests/second) for producer and consumer.

    • In the Topic Details page, added a Requests time series chart to show failed requests for producers and consumers.
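
The firing conditions of the three predefined Kafka alarms listed above can be restated as a short sketch. The thresholds come from the alarm descriptions; the function name and parameter names are hypothetical, and this is not how the alarms are implemented in Pepperdata.

```python
def firing_kafka_alarms(offline_partitions, active_controllers):
    """Return the names of the predefined Kafka alarms that would fire."""
    alarms = []
    if offline_partitions > 0:
        alarms.append("Kafka Offline Partitions")
    if active_controllers > 1:
        alarms.append("Too Many Kafka Broker Active Controllers")
    if active_controllers < 1:
        alarms.append("Too Few Kafka Broker Active Controllers")
    return alarms


# Exactly one active controller and no offline partitions fires nothing.
print(firing_kafka_alarms(offline_partitions=0, active_controllers=1))
print(firing_kafka_alarms(offline_partitions=2, active_controllers=0))
```

The two controller alarms bracket the only state that fires neither of them: exactly one active controller.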

  • Supervisor:

    • Added v6.2.24, which for YARN 2.9/3 does not require a keytab file for a Kerberized NodeManager.

    • Newly qualified Pepperdata support for RHEL 7.7 and 7.6.

  • Dashboard:

    • Fixed a bug in the calculation of the CPU-bound bottleneck. The corrected values will typically be much larger than before, especially on larger clusters.

    • Polished up the user interface with some more modern icons for accessing many of the dashboard pages.

    • You can now export a custom dashboard to another cluster. From any dashboard view, click New Dashboard as usual, and select Export Dashboard to Another Cluster.

    • Throughout the user interface, changed all mentions of dynamic allocation to capacity optimizer to clarify that the applicable metrics, charts, and reports are for Pepperdata Capacity Optimizer. Relatedly, added a Waste Calculation group for waste-related metrics that were previously shown in the Capacity Optimizer group.

  • Documentation:

    • To make it easier to determine which versions of Pepperdata support your environment, moved the system requirements from the version-specific installation instructions into a consolidated, non-versioned System Requirements page.

    • Added Parcel steps for setting up a Pepperdata proxy; see Set Up a Pepperdata Proxy (or corresponding page for Pepperdata Supervisor versions newer than v6.2).

    • In the EOM Schedule for Pepperdata Supervisor Releases, reorganized the tables to make it easier to see if your version of Pepperdata is still covered by the maintenance policy.

    • Added Cloudera Parcel steps to configure and enable the collection of Apache® Impala query metrics; see Adding Apache® Impala Query Metrics (or corresponding page for Pepperdata Supervisor versions newer than v6.2).

May 27, 2020

  • There’s a new Supervisor release, with significant new products and features. And there are also new .dot releases for smaller enhancements and bug fixes.

    • New Supervisor release v6.2.20 (the first public release for Supervisor v6.2.x) provides GA (general availability) for Streaming Spotlight, Query Spotlight, and Pepperdata in Hadoop-as-a-Service (cloud) environments. For details and links to Quick Start and configuration information, see v6.2.20 (2020/05/12) in the Pepperdata Release Notes.

    • New Supervisor .dot releases for v6.2.x and v6.1.x add support for JKS certificates for PepAgent ports, for monitoring apps that change queues after they’re launched, and for ensuring that the correct Pepperdata package has been installed for the version of Hadoop that’s installed. Several edge case bugs are also fixed. For details, see the Pepperdata Release Notes.

  • Documentation is added for the Supervisor v6.2 release. In particular, note the following:

  • Dashboard:

    • In the Workflows Overview page in Application Spotlight, workflow Ids are now active links. When you click the link, the result is the Applications Overview page that is filtered by the selected workflow Id.

      Screenshot of Workflows Overview table with active links for workflow Ids

    • Support added for localized currency codes whose numbers use the Western Arabic numerical system (1, 2, 3), are formatted left-to-right with the largest units on the left, and have 1/100 minor units (such as the 100 cents in USD/US Dollars). To configure this for your cluster, contact Pepperdata Support.

    • You can choose how waste is displayed: as currency (monetary cost, in your cluster’s units of currency) or as resources/hour. To make your selection, which persists from session to session and is unique to your user account, click the gear icon, expand the Waste Cost menu item, and select Currency or Resources/Hour.

      Screenshot of settings menu showing the Waste Cost option expanded

April 29, 2020

  • The Dashboard has received lots of behind-the-scenes work for reliability and performance improvements, more granular error messages, and an upcoming major release.

  • New Supervisor .dot releases provide a two-phase approach for fetching app/job history and a connection timeout parameter for REST fetches from the Spark History Server. For details, see the Pepperdata Release Notes.

  • In the helpsite documentation, Parcel for Cloudera steps are added for many procedures.

March 31, 2020

  • Dashboard updates are largely performance-related and fine-tuning the look-and-feel for newer pages and reports. Feature additions include:

    • Ability to save a charts page as a custom dashboard. (In the upper-right corner of the Charts page, click Save As Dashboard.)
    • Easier creation of bottleneck alarms, directly in the Bottlenecks tab.
    • Clearer status messages when data is not available for display.
  • Several new Supervisor releases provide enhancements and bug fixes. For details, see the Pepperdata Release Notes.

    • (Technical Preview) Support added to Application Profiler for Tez applications in Hortonworks Data Platform (HDP®) 3.x. To enable this support, contact Pepperdata Support.

    • Fixes provided for several edge case bugs.

    • A new utility script, encrypt_password.sh, enables encrypting the password used to secure (via SSL) the ports that Pepperdata uses for listening (port 50510 for the PepperdataSupervisor, and port 50505 for PepAgents).

    • Security is enhanced by masking passwords and other sensitive information in Pepperdata logs.

  • In the helpsite documentation, added Parcel steps for securing PepAgent Ports (50510/50505), changing the PD_LOG_DIR log file directory and other log retention and disk usage policy settings for data that is stored on customer hosts, and configuring Capacity Optimizer. In the applicable procedures, click the Parcel for Cloudera tab.

February 27, 2020

  • Query Spotlight is launched!

    The Query Spotlight overviews—Queries and Databases—show time-correlated query views, database and infrastructure metrics, query run summaries, tables accessed, and maps of query plans and their execution stages. This data enables you to quickly perform root cause analysis and obtain detailed visibility into query workloads, including delayed and most expensive queries and top CPU-wasting queries.

    For documentation, see the Query Spotlight User’s Guide, which appears in a new Query Spotlight tile on the helpsite home page and the home (index) pages for older versions.

    Screenshot of helpsite Home page with Query Spotlight tile

  • Supervisor v6.1.2 includes an important enhancement: surfacing stack trace data for non-zero exit status from all containers (not just Spark executors).

    To see the stack trace data, navigate to the App Details for the associated application, click the Errors tab, and click Stack Trace.

    Screenshot of Errors tab for non-zero exit status from containers

  • Dashboard enhancements:

    • You can now override the default email address for alarm alert notifications for predefined alarms (not just for custom cluster alarms). Navigate to the Alarms page, and click the alarm’s edit icon.

    • When viewing charts in custom dashboards, you can now zoom in to any given time by clicking-and-dragging an area on the chart just as you can for other charts.

    • To make it easier to find data in tables that are many pages long, you can now enter a specific page number.

    • You can now create alarms in the App History tab of recent runs in an application’s App Detail page, as well as in that application’s App History table (page) that lists all the runs from the last 30 days.

    • To make it easier to implement Pepperdata recommendations, the proposed values now appear in the Config tab of an application’s App Detail page, in addition to appearing in the Recommendations tab, as they always have.

    • Performance improvements for loading metrics data.

January 22, 2020

This month’s updates include a new Pepperdata Supervisor version that adds support to the JobHistory Monitor for Spark History Server 302 redirects, and documentation updates that improve search results, improve navigation of the Downloads page, and add the 2020 dates to the Pepperdata Holidays schedule.

December 20, 2019

Another busy month of improvements to everything Pepperdata.

  • Maintenance releases for Pepperdata Supervisor add support for Hadoop 2.9 on EMR, support for Capacity Optimizer for cluster hosts that are decommissioned in the cloud, and fix an edge case bug for HiveServer2.

  • The dashboard has the following updates:

    • Numerous performance improvements, especially for faster chart and table loading.

    • Changed the filter bar labels to more obviously differentiate Breakdown By scenarios—metrics charts and custom tables, where you can change which breakdowns to show—from Filter By scenarios—Spotlight overviews with preconfigured breakdowns.

  • The helpsite has the following documentation updates:

November 25, 2019

This month sees important updates to everything Pepperdata.

  • Maintenance releases for the Pepperdata Supervisor collect many more ResourceManager metrics, add SSL support for Pepperdata near real-time monitoring views via Web servlets (see Monitoring Pepperdata for your Supervisor version), add support for Hadoop 2.9, and fix a few edge case bugs. For details, see the Pepperdata Release Notes.

  • The dashboard has the following updates:

    • Several performance improvements have greatly decreased loading times for many dashboard pages, especially the overviews.

    • To enhance security, removed the option to Sign in with Google. This prevents users from creating personal Google accounts to access the Pepperdata dashboard for their employer’s clusters, and then continuing to use such accounts after their work access is terminated.

    • In the Alarms & Alerts page, https://dashboard.pepperdata.com/{cluster-name}/alarms, the Cluster Alarms display is now paginated, and you can search for alarms by entering text in the Look Up Alarm box to match any of the alarm column values.

    • The Spark Details Page features improved highlighting to help you correlate charts and tables data.

    • You can now directly edit your alarms—alarms that you’ve created—in the Alarms tab of an application’s App Details page instead of having to navigate to the Alarms & Alerts page.

    • You can now apply a time range to filter the Hosts Upload Status report, https://dashboard.pepperdata.com/{cluster-name}/reports/hostsuploadstatus.
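
The two dashboard URLs mentioned above follow the same pattern, with your cluster name substituted for {cluster-name}. A minimal sketch for building them follows; the function names and the sample cluster name are hypothetical, and percent-encoding the cluster name is an assumption made for safety rather than a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "https://dashboard.pepperdata.com"


def alarms_url(cluster_name):
    """URL of the Alarms & Alerts page for a cluster."""
    return f"{BASE_URL}/{quote(cluster_name, safe='')}/alarms"


def hosts_upload_status_url(cluster_name):
    """URL of the Hosts Upload Status report for a cluster."""
    return f"{BASE_URL}/{quote(cluster_name, safe='')}/reports/hostsuploadstatus"


# "my-cluster" is a placeholder, not a real cluster name.
print(alarms_url("my-cluster"))
print(hosts_upload_status_url("my-cluster"))
```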

  • The helpsite has the following documentation updates:

    • Managing Users and Roles with RBAC fully explains Role Based Access Control (RBAC), which replaces the previous Account Admin function for user management.

    • Added the EOM Schedule for Pepperdata Supervisor Releases to clearly state the currently-supported releases and their scheduled EOM dates.

October 25, 2019

This quarter’s changes are BIG: new Supervisor release, new dashboard rollout, and new helpsite rollout.

  • The Supervisor v6.1.0 release adds Hadoop 3.x support for Pepperdata Capacity Optimizer.

  • The new dashboard is now live!

    • The URL remains the same (https://dashboard.pepperdata.com/), and you should use the same login credentials you used before.

    • All the original functionality is now available in the new dashboard, but with an easier workflow, greater emphasis on usability and quick navigation, and a refreshed, modern look-and-feel.

    • Filters let you search for specific apps, users, and so on in a much more granular fashion than before.

    • For a high-level introduction, see Pepperdata Technical Quick Start Guide.

  • The new helpsite is now live!

    • The URL remains the same (https://help.pepperdata.com), and you should use the same login credentials you used before.

    • Redesigned home pages for each Pepperdata Supervisor version make it easier to find the documents you need, for the product you’re using.

    • The look-and-feel is refreshed to coordinate with the new dashboard.

    • With the new Search technology, you can find the terms you’re looking for much more reliably: auto-complete helps you find the right term, and you can search any version of the docs (not just the latest version).

    • A version-switcher on versioned pages—pages that apply to a specific Supervisor version instead of dashboard-focused pages that are applicable to all Pepperdata Supervisor versions—lets you easily navigate back and forth between versions. This is particularly helpful during and immediately after upgrades.

    • And there’s important new and revised content: Technical Support Handbook, Pepperdata Maintenance Policy, and Enable SAML-SSO for User Authentication.

July 15, 2019

This month’s changes are broad, encompassing the Pepperdata Supervisor, the “next” Dashboard, and the “next” Helpsite.

  • The big news is our new Pepperdata Supervisor v6.0.x release!

    It adds support for Hadoop 3.x and distros built on Hadoop 3.x, introduces Universal Packages that make it simpler to choose the right one to install, adjusts default configuration settings for a better operating environment, and rolls up the Pepperdata Supervisor v5.7 “dot-release” enhancements. For details, see Pepperdata Release Notes.

  • We’re continuously improving the Release Preview next-generation dashboard.

    • General improvements include adding more metrics to overview pages, reducing the load times for many pages, and adding color-coding to tables for outlier values.

    • To make it easier for you to correlate your system with items mentioned in this What’s New page and the Pepperdata Release Notes, you can determine your version of the next-dashboard and Pepperdata Supervisor by mousing over the gear icon in the upper-right corner of the next-dashboard page.

      Screenshot of the popup that shows Dashboard and Pepperdata Supervisor version info

  • The help docs (both the “old” and new, “next-help” helpsites) now include detailed snippets and information for Cloudera Parcels installations so you do not have to figure out the equivalent Parcels instructions for general steps that mention manual configuration files.

    In addition, the next Helpsite makes it even easier to follow the procedures. Just click the Cloudera Parcels Installations tab wherever you see it.

    In this example, from the installation procedure, the image on the left shows the default, RPM/DEB Installations tab and the instruction to click the tab for your distro and configuration manager. The image on the right shows how the display changes after you click the Cloudera Parcels Installations tab.

    Screenshot of default (non-parcel) tab Screenshot of selected parcel tab

June 12, 2019

We’re pleased to highlight recent improvements to the Pepperdata Supervisor, the “next” Dashboard, and the corresponding “next” Helpsite.

  • Pepperdata Supervisor v5.7.13–v5.7.18 (for details, see Pepperdata Release Notes):

    • Added a new configuration property for customizing how much disk time must be available—time when the disk is not busy performing seeks, rotations, or data transfer—in order for Capacity Optimizer to operate.

    • Added support for MapR 6.1. For detailed Hadoop distro support information, see Pepperdata-Platform Support.

    • Bug fixes and enhancements for certificate handling by the Pepperdata Collector’s HTTPS uploader.

  • The next-generation Dashboard is officially released (as a Release Preview)! Check it out at https://dashboard.pepperdata.com/. (Use the same user name and password that you’ve been using for the “regular” Pepperdata Dashboard.) We’re still adding features and polishing the displays. If you have any questions, contact Pepperdata Support.

  • The next-generation Helpsite is also ready! You can find it at https://next-help.pepperdata.com. (Use the same user name and password that you’ve been using for the “regular” Pepperdata Dashboard and Helpsite.)

    Highlights include:

    • Redesigned home pages for each Pepperdata Supervisor version, to make it easier to find the documents you need, and to correspond to the new Dashboard’s look-and-feel.

    • Vastly improved Search utility. You can find the terms you’re looking for much more reliably: auto-complete helps you find the right term, and you can search any version of the docs (not just the latest version).

    • A version-switcher on versioned pages—pages that apply to a specific Supervisor version instead of Dashboard-focused pages that are applicable to all Pepperdata Supervisor versions—lets you easily navigate back and forth between versions. This is particularly helpful during and immediately after upgrades.

April 5, 2019

We’ve recently released a new Supervisor version, made some updates to the Dashboard, and are working on refreshing the Documentation Helpsite to align with the next generation Dashboard we’ve been working on.

  • Pepperdata Supervisor v5.7.12 collects the used resources metrics for Fair Scheduler and improves the accuracy of the queue duration metric. For details, see Pepperdata Release Notes.

  • The Dashboard has the following updates:

    • Analysis Reports provide detailed data about the types of applications running on your cluster; the amount of memory and CPU (and dollars) wasted, sorted by jobs and by users; and the memory uplift (and dollars) saved by using Capacity Optimizer.

      To access the Analysis Reports, use the dashboard navigation bar to navigate to Reports > Analysis Reports.

      Screenshot of Analysis Reports option on the navigation bar

      We are gradually rolling out the reports to our customer base. If you’d like to begin using them but the Dashboard says “No reports available”, contact Pepperdata Support.

    • If you are using encrypted data, you can now perform full-text searches for application names, although partial matches are unsupported. For example, if the application name is “hive”, searching for “hive” yields the expected result. But when encryption is applied to perform the search, the encrypted value for a partial search string will not match any portion of the encrypted value of the full string. Therefore “hiv” cannot be matched if the full name of the application is “hive”.

  • There have been no significant documentation changes recently, but we’ve been developing a new look-and-feel for the Pepperdata Helpsite to correspond to the next generation Dashboard that’s in the works. We’ll be implementing new, vastly better search technology, adding a “version switcher” to make it easier to change between docs for different Supervisor versions, and improving the navigation. Here’s a sneak peek of the new Helpsite Home page.

    Screenshot of the sneak preview of the Helpsite 2.0 Home Page
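The full-match-only behavior of encrypted application search described in the Dashboard updates above can be illustrated with a small sketch. It assumes a deterministic keyed transform (HMAC-SHA256 is used here purely as a stand-in); the key, function names, and lookup structure are hypothetical and are not Pepperdata's actual encryption scheme.

```python
import hashlib
import hmac

# Hypothetical key for illustration only.
SECRET_KEY = b"example-key"


def encrypt_token(value: str) -> str:
    """Map a string to an opaque token (HMAC-SHA256 as a stand-in for encryption)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def search(stored_tokens: set, query: str) -> bool:
    """Transform the query the same way as the stored names, then compare whole tokens."""
    return encrypt_token(query) in stored_tokens


# Application names are stored only in transformed form.
stored = {encrypt_token("hive")}

print(search(stored, "hive"))  # full name: the tokens are identical, so it matches
print(search(stored, "hiv"))   # partial name: the token is unrelated, so it does not match
```

Because the token for "hiv" shares nothing recognizable with the token for "hive", only an exact, full-text query can be matched.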

March 1, 2019

This month’s updates to the Pepperdata Dashboard include:

  • Exposure of the cpu info group of metrics for charting.

    The metrics show how much computing power a host has, such as how many physical processors the host has, how many cores each processor has, the clock speed of each core, and so on. This data can help you answer environment-specific questions such as whether Hyper-Threading is turned on (in Intel SMT implementations), whether the cores are physical or virtual, or even who the vendor is for the host’s processors.

    To chart these metrics, use the dashboard navigation bar to navigate to Charts & Tables > One Metric per Row or Charts & Tables > Compact Layout, enter “cpu” into the search, scroll to the cpu info group, select it, and click Go.

    Screenshot of cpu info metrics group in the metrics search results

  • Performance improvements for application searching.

  • Fixes for several edge-case display bugs.

February 8, 2019

This month we released several incremental updates to the Pepperdata Supervisor, associated updates to the Dashboard, and corresponding additions to the documentation:

  • Pepperdata Supervisor version v5.7.6 adds Tez support and monitoring to Application Profiler. (Regardless of whether you’ve previously enabled Application Profiler on your cluster, you must contact Pepperdata Support to enable Tez monitoring and recommendations.)

    Recommendations for monitored Tez applications appear on the Pepperdata Dashboard, in the Application Detail; see Recommendations Tab. Granular severity data for Mappers heuristics appears in the Application Profiler Report; see Application Profiler Report.

  • Pepperdata Supervisor versions v5.7.3–v5.7.5 provide performance improvements and small edge case bug fixes; for details, see Pepperdata Release Notes.

January 4, 2019

This month’s highlights are new documentation and Pepperdata Supervisor releases.

  • If you’re new to Pepperdata (or even if you’re an old hand but interested to see what else Pepperdata can do for you), check out the new Pepperdata Technical Quick Start Guide (PDF).

  • Pepperdata Supervisor v5.7.2 added a new configuration property for customizing how much CPU must be available in order for Capacity Optimizer to operate, and added an ignore-match rule for custom program monitoring to provide more granular string matching rules. For information about all releases, see Pepperdata Release Notes.

December 5, 2018

Although our front-end team is largely working on the upcoming “Dashboard 2.0”, we’re still making enhancements to what you’re using today:

  • The Hosts Reporting Status report now shows up to 5,000 hosts instead of the previous limit of 20 hosts. To show the report, click the Home icon in the navigation bar to display the Cluster Health Summary, and click the Nodes tile.

  • The dashboard now supports Service Provider (SP) initiated SSO (Single Sign-On) for organizations that have implemented SSO via SAML (Security Assertion Markup Language).

We’ve also released two updates in the last month:

  • Pepperdata Supervisor v5.6.15 adds collection of Spark metrics for Spark jobs that you deploy in client mode on hosts without Pepperdata services.

  • Pepperdata Supervisor v5.6.16 fixes a bug that caused some JobHistory Monitor fetches from HTTPS REST endpoints to fail.

For information about all releases, see Pepperdata Release Notes.

November 7, 2018

This month there are small changes to the Pepperdata dashboard, the Pepperdata Supervisor, and the documentation:

  • Improves the performance for alarms and queries that involve app/job Ids.

  • Enhances security of cookie management and auto-complete for logins.

  • Fixes an edge case issue with subquery filter matching for a device.

  • Replaces the job group breakdown for charts with the monitored domain breakdown; see Filter the Charts & Tables by Dimensions: Hosts, Users, Etc.

  • New Pepperdata Supervisor releases v5.6.13 and v5.6.14 add metrics collection for Impala queries in flight, for impalad daemons configured for either HTTP or HTTPS. See Pepperdata Release Notes.

  • Clarifies which browsers are supported by the Pepperdata dashboard; see Before You Begin.

  • Changes the URL for the Pepperdata User’s Guide (formerly called the Pepperdata Cluster Analyzer User’s Guide) from /cluster-analyzer to pd-ug. Redirects are in place, and any bookmarks you have will resolve to the correct page.

October 5, 2018

There are no significant dashboard changes this month. Our fabulous front-end team is busy developing the next generation UX for Pepperdata. Look for more information in the coming months, but here’s a sneak preview of the new dashboard Home page.

Screenshot of the sneak preview of the Dashboard 2.0 Home Page

September 5, 2018

This month’s highlights are focused on new documentation and updates for the recent Pepperdata Supervisor v5.6 release:

August 7, 2018

  • Previously, executors that were removed by Spark dynamic allocation were included in the failed executors count, and were counted as application errors. Now these executors are not included in the failed counts or as application errors in the App Detail, but appear in the Stage Summaries and Events section of the Spark Details Report as gray flags, not red. See Elements of the Spark Details Report.

  • Reclassified some warn and info log entries that were erroneously categorized as error.

  • Revised text in the Application Detail to more clearly explain the status when an application does not return full data; for example, the percentage of hosts that are reporting information.

  • Support added for Ubuntu 16.04 LTS and Ubuntu 18.04 LTS, and dropped for Ubuntu 10.04 LTS and 12.04 LTS.

July 5, 2018

  • Additional criteria added to App Search.

    • You can search for apps of any type or just those of type Spark, MapReduce, Tez, or Hive.
    • You can search for apps of all statuses or just those that are completed, failed, or killed. For details about searching for apps, see Finding Your Applications.
  • In the Recommendations column of the Matched Apps Table—the results of searching for apps—a new value, “unavailable”, indicates that recommendations are not available, likely because the app hasn’t completed, the data is incomplete, or Application Profiler is not enabled. Previously the value was shown as “none”, which is now used only for apps that ran fine (and which therefore have no recommendations).

  • Added a Time Spent in Straggler Task State condition to the Bottlenecks tab of the App Detail page. As with other bottleneck conditions, you can receive notifications of this bottleneck for future app runs by clicking Create Alarm. For information about configuring the alarm, see Create Alarms from the App Details Page.

  • Added metrics for idle Spark executors:

    • spark_executor_percent—percentage of runtime that the executor was idle
    • idle_spark_executor_time—total time the Spark executor was idle

    The metrics are named Idle spark executor percent and Executor idle seconds, respectively, in the Metric selection on the Charts page. At the app level, these metrics let you see the amount of time when executors were not running any tasks. If you break them down by Spark executor with an app filter, you can see the idle times for each executor.

  • For Hive on MapReduce apps that ran on clusters with Application Profiler enabled, added a SQL tab to the app’s App Detail page.

  • For Spark and MapReduce apps, added a Config tab to the app’s App Detail page to show the configuration settings that were in effect for the app’s run.

  • In the NodeManagers tile on the Cluster Health Summary page, the details now show the change in the total number of NodeManagers reporting to the YARN ResourceManager (instead of separate counts for the number of arrivals and departures).

  • Added a Metrics API query string argument that lets you omit aggregate series from the results: omitaggregates; see Metrics API Query String Arguments.
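The relationship between the two idle Spark executor metrics listed above can be sketched as simple arithmetic. This is a minimal illustration, assuming the percentage is idle time divided by executor runtime; the function name and the sample values are hypothetical and are not taken from the Pepperdata implementation.

```python
def idle_percent(idle_seconds: float, runtime_seconds: float) -> float:
    """Percentage of an executor's runtime during which it ran no tasks."""
    if runtime_seconds <= 0:
        # Assumption: report 0% rather than divide by zero for an executor with no runtime.
        return 0.0
    return 100.0 * idle_seconds / runtime_seconds


# Hypothetical per-executor samples: (idle seconds, total runtime seconds).
executors = {
    "executor-1": (30.0, 120.0),
    "executor-2": (0.0, 120.0),
}

for name, (idle, runtime) in executors.items():
    print(f"{name}: idle {idle:.0f}s, {idle_percent(idle, runtime):.1f}% of runtime")
```

Breaking the values down per executor, as in the loop, mirrors what the text describes for charts broken down by Spark executor with an app filter.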

June 5, 2018

  • New metrics are now available for straggler tasks. A straggler task is the last remaining task for an app whose progress is greater than 90% and that has had more than 50 containers in the past.

    You can configure alarms to fire for apps that have spent more than a given amount of time in the straggler task state or more than a given percentage of overall runtime in the straggler task state. For details, see Straggler Tasks.

  • To enhance security, the Pepperdata dashboard requires your browser to be enabled for TLS 1.2. The dashboard no longer supports TLS 1.0/1.1.
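The straggler-task definition and the two alarm conditions described above can be sketched as follows. This is one reading of the definition, not the Supervisor's actual logic: the field names, the "exactly one running task" test, and the alarm thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AppSnapshot:
    """Hypothetical point-in-time view of an app."""
    progress_percent: float  # overall app progress, 0 to 100
    running_tasks: int       # tasks still running
    past_containers: int     # containers the app has had so far


def in_straggler_state(app: AppSnapshot) -> bool:
    """True when a single task remains, progress exceeds 90%, and the app has had more than 50 containers."""
    return (
        app.running_tasks == 1
        and app.progress_percent > 90.0
        and app.past_containers > 50
    )


def alarm_should_fire(straggler_seconds: float, runtime_seconds: float,
                      max_seconds: float, max_percent: float) -> bool:
    """Fire on either an absolute time limit or a share-of-runtime limit (both user-chosen)."""
    if runtime_seconds <= 0:
        return False
    percent = 100.0 * straggler_seconds / runtime_seconds
    return straggler_seconds > max_seconds or percent > max_percent


print(in_straggler_state(AppSnapshot(95.0, 1, 60)))
print(alarm_should_fire(straggler_seconds=300, runtime_seconds=1200,
                        max_seconds=600, max_percent=20.0))
```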

May 17, 2018

  • In the App Search workflow:

    • Simplified the controls for renaming and copying saved searches.

    • You can specify a custom time range for app searches. From the Find results from last options, click custom.

    • The Matched Apps Table of search results now includes App type and Recommendations columns. You can toggle the display to show only apps with recommendations or show all apps. See Finding Your Applications.

  • In the App Detail page:

    • Runs on the App History tab are now ordered in reverse chronological order.

    • Added a new App History page that lists the 30 most recent runs for a given app. To access the App History page, navigate to the App History tab of the App Detail, and click See more history.

    • For Spark-related recommendations that are shown on the Recommendations tab, you can quickly navigate to the applicable error information by clicking the new Spark Tab link in the recommendation’s text.

    • On the Alarms tab, added disable/enable, edit, and delete controls, which take effect for all future runs of the app. See Alarm Tab.

    • On the Bottlenecks tab, added create and edit controls for the Time Spent Waiting In Queue and Time Spent In GC conditions. See Bottlenecks Tab.

  • In the Apps Table (from the navigation bar, click Charts & Tables > Apps), the workflow data is now “linkified”. When you click a workflow ID link, the table changes from the Apps Table to the Workflows Table, with data for the selected workflow’s breakdown.

April 5, 2018

  • In the App Detail page on the Pepperdata dashboard:

    • Improved the editing alarms workflow.
    • Added disable/enable alarms control in the Alarms tab.
    • Added tool tips to charts in the Resource Usage tab.
    • Increased the speed of loading the App Detail page.
  • Expanded the Capacity Optimizer documentation. See Overview and Quantifying the Benefits of Capacity Optimizer (or corresponding pages for Pepperdata Supervisor versions newer than v6.5).

March 5, 2018

  • In the App Detail page on the Pepperdata dashboard:

    • Added See advanced charts for this app links to Resource Usage tab’s charts for easy navigation to detailed charts of underlying metrics.
    • Fixed minor display bugs in the App Detail page.
  • Added Monitoring Apache® Impala Query Metrics to the documentation.

February 5, 2018

App Search and App Detail are out of Technical Preview status and released for general availability (GA). Access to App Search moved from the “Beta” panel to the top navigation bar, some menu options are consolidated, and some menu text is replaced by icons.

  • In the App Search, you can now save your searches and rerun them anytime.

  • The App Detail page collects, analyzes, and consolidates even more information about your app into one place, and provides easier-to-follow recommendations for improving app performance. See App Details Page.

  • Menus are consolidated and text is replaced by icons as described in the table.

    Feature    | Old Menu/Access                                           | New Menu/Access
    App Search | Address entry in the browser or the link in the Beta pane | App Search on the navigation bar
    Charts     | Charts on the navigation bar                              | Charts & Tables on the navigation bar
    Tables     | Tables on the navigation bar                              | Charts & Tables on the navigation bar
    Alarms     | Alarms on the navigation bar                              | Bell icon on the navigation bar
    Help       | Help on the navigation bar                                | Question icon on the navigation bar
    Admin      | Admin on the navigation bar                               | Gear icon on the navigation bar
    Accounts   | Account on the navigation bar                             | Gear icon on the navigation bar