Upgrade Hadoop Distribution and Pepperdata

Supported Pepperdata upgrade paths: from any earlier Pepperdata 6.x version

If you’re upgrading from any Pepperdata v5.x version, contact Pepperdata Support.

To upgrade a cluster’s Hadoop distribution and the Pepperdata software, begin with any host and perform these actions in order: stop the Pepperdata agents, remove the Pepperdata software, upgrade the Hadoop distribution and the Pepperdata software, and restart the Hadoop services and the Pepperdata agents.

This upgrade procedure is for Cloudera Distribution of Hadoop (CDH) and Cloudera Data Platform (CDP) Private Cloud Base.

If you want to perform an upgrade in a CDP Public Cloud environment, you must create a new environment and Data Hub cluster, and install the Pepperdata Supervisor version that you want; see Installing Pepperdata (CDP Public Cloud).

Task 1: Stop the Pepperdata Agents


  • In Cloudera Manager, select the Stop action for the Pepperdata service.

Task 2: Upgrade Your Hadoop Distribution


  1. Upgrade the Hadoop distribution according to the distribution’s instructions.

  2. Verify that the snippet below is still included in the appropriate template(s), based on which services are configured to run on the host.

    • YARN ResourceManager/NodeManager: YARN > Configs > Advanced > Advanced yarn-env > yarn-env template
    • HBase Master or HBase RegionServer: HBase > Configs > Advanced > Advanced hbase-env > hbase-env template
    • Apache Spark: Spark > Configs > Advanced > Advanced spark-env > spark-env template
    • Apache Spark 2: Spark 2 > Configuration > Gateway > Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh

Task 3: Install the New Pepperdata Supervisor Parcel and/or CSD

This procedure includes migration-upgrade steps that are applicable only if you are upgrading from a PEPPERDATA-1.x.jar CSD for the Pepperdata Supervisor (that is, from Supervisor v6.4.17 or earlier).

For such migration-upgrades, this procedure explains how to use the PepperdataMigration service (the PEPPERDATA-MIGRATION-X.Y.jar file) before you actually install Pepperdata 8.1 (which uses a non-root user, PD_USER). The PepperdataMigration service manages the directory ownership changes that are required for upgrades from user root to any other user.

If you do not need to perform a migration-upgrade, your upgrade process is referred to in these procedures as the standard upgrade.
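
As a rough aid for deciding which path applies, the following shell sketch checks a CSD directory for a PEPPERDATA-1.x.jar file. The function name upgrade_type is hypothetical, and the glob PEPPERDATA-1.*.jar is an assumption about how the older CSD files are named on disk; confirm it against the actual file names in your /opt/cloudera/csd directory before relying on the result.

```shell
# Hypothetical helper (not part of Pepperdata or Cloudera tooling).
# Prints "migration-upgrade" if a PEPPERDATA-1.x.jar CSD is found in the
# given directory (default: /opt/cloudera/csd); otherwise prints "standard".
upgrade_type() {
  csd_dir="${1:-/opt/cloudera/csd}"
  for jar in "$csd_dir"/PEPPERDATA-1.*.jar; do
    # If the glob matched nothing, it stays literal and this test is false.
    if [ -e "$jar" ]; then
      echo "migration-upgrade"
      return 0
    fi
  done
  echo "standard"
}
```

For example, running upgrade_type /opt/cloudera/csd on the Cloudera Manager Server prints migration-upgrade when an older CSD is present, and standard otherwise.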


  1. Download the installation files that you need—Parcel, CSD(s), or both—from the Downloads page to any local directory, and copy them to the Cloudera Manager Server.

  2. Install the installation files for your upgrade (parcel and/or CSD).

    1. (Parcel) Extract the contents of the TGZ archive, and move the parcel (the *.parcel file) and corresponding SHA checksum file (*.parcel.sha) to the /opt/cloudera/parcel-repo directory.

    2. (CSD) Remove any existing PEPPERDATA-X.Y.Z.jar (and PEPPERDATA-X.Y.jar) files from the /opt/cloudera/csd directory.

    3. (CSD) Move one custom service descriptor (CSD) file to the /opt/cloudera/csd directory:

      • (Standard upgrade) Move the PEPPERDATA-X.Y.Z.jar CSD JAR file
      • (Migration-upgrade) Move the PEPPERDATA-MIGRATION-X.Y.jar CSD JAR file
  3. Restart the Cloudera Service and Configuration Manager (SCM) server (service: cloudera-scm-server).

    service cloudera-scm-server restart

    After the restart, the new parcel and the Pepperdata service (in the CSD JAR file that you moved—PEPPERDATA-X.Y.Z.jar or PEPPERDATA-MIGRATION-X.Y.jar) are available for activation.

  4. (Parcel upgrade) In Cloudera Manager, distribute and activate the Pepperdata Supervisor parcel—the *.parcel file.

  5. (Migration-upgrade only) Perform the migration.

    1. Run the PepperdataMigration service.

      In the configuration wizard, you can optionally change the values of any or all of the following variables. If you change them, note your non-default values; you’ll need them later, when you reconfigure the user in Pepperdata > Configuration after adding the Pepperdata service.

      1. Pepperdata user (default=pepperdata)
      2. Pepperdata YARN group; typically this should be the group that the ResourceManager/NodeManager daemons are running as
      3. Logging and configuration directories

      After running the PepperdataMigration service, the directory permissions will correctly support running Pepperdata as a non-root user.

    2. Remove the PepperdataMigration service.

      1. In Cloudera Manager, stop and remove the PepperdataMigration service. (For details, see the Cloudera documentation for your version of Cloudera Manager.)

      2. Delete the PepperdataMigration service CSD JAR (PEPPERDATA-MIGRATION-X.Y.jar) from the /opt/cloudera/csd directory.

    3. Ready the Pepperdata service installation.

      Move the PEPPERDATA-X.Y.Z.jar CSD JAR to the /opt/cloudera/csd directory.

  6. (Migration-upgrade only) Install the Pepperdata service, which is contained in the PEPPERDATA-X.Y.Z.jar CSD JAR file that you already moved to the csd directory.

    1. Restart the Cloudera Service and Configuration Manager (SCM) server (service: cloudera-scm-server).

      service cloudera-scm-server restart

      After the restart, the new Pepperdata service (in the PEPPERDATA-X.Y.Z.jar CSD JAR file) is available for activation.

    2. Use the Cloudera Manager interface to restart the Cloudera Management service.

  7. (Migration-upgrade only) Add the Pepperdata service to Cloudera Manager.

    Use Cloudera Manager to perform this procedure, which adds the Pepperdata service and the custom service descriptor (CSD) to the Cloudera Manager environment.

    1. Select your cluster, and click Actions > Add Service. In the Service Type column, select Pepperdata, and click Continue.

    2. On the Select Dependencies page, choose the option that matches your cluster:

      • (Kerberized clusters) If the core services of the ResourceManagers and the MapReduce Job History Server are Kerberized (secured with Kerberos), select Optional Dependencies. (The YARN dependency is required so that Pepperdata can fetch YARN-related values to use for the Pepperdata configuration.)

      • (Clusters without Kerberos) Select No Optional Dependencies.

    3. On the Assign Roles page, customize the role assignments:

      • Click PepAgent, select all hosts, and click OK.
      • Click Supervisor, select all the ResourceManager hosts, click OK, and click Continue.
      Do not assign the PepMetrics role. It is now unsupported and unneeded.
    4. In the Review Changes page, enter your custom information.

      1. For the Pepperdata License Specification, enter data:// and then (without any additional spaces) the contents of the license file that we emailed you. If the data:// string is already shown, do not enter it a second time.

      2. For the Pepperdata Dashboard Cluster Realm Name, enter the cluster name exactly as shown in the license email. Be sure to use the same capitalization.

      3. (Non-Hadoop Clusters) If you’re installing Pepperdata on a cluster without Hadoop, such as a Kafka-only cluster for Streaming Spotlight, the Pepperdata PepAgent must be configured to run without Hadoop.

        If you’re installing Pepperdata in a cluster that has Hadoop, skip this substep. If you perform this substep in a Hadoop cluster, Pepperdata will not operate correctly.

        Locate the Run Pepperdata in Non-Hadoop Environment parameter, and select it.

      4. (Kerberized clusters) If the core services of the ResourceManagers and the MapReduce Job History Server are Kerberized (secured with Kerberos), locate the Enable Access to Kerberized Cluster Components parameter, and ensure that it is selected.

        • Newer versions of Cloudera Manager automatically detect that Kerberos is enabled on a cluster. In this case, the option is already selected; take care not to clear it by clicking it again.

        • Older versions of Cloudera Manager do not detect that Kerberos is enabled, so you must select this option.

      5. Click Continue.

    5. Complete the steps as prompted by the Add Service wizard, all the way through (and including) clicking Finish.

    6. In Cloudera Manager, navigate to Pepperdata > Configuration, and reconfigure the user.

      If you changed the System Group or the logging and configuration directories when you ran the PepperdataMigration service, be sure to update the System Group and the logging and configuration directories variables as well.

      • System User—Set the value to match the value of the Pepperdata user (default=pepperdata).
      • System Group—Set the value to match the Pepperdata YARN group.
      • Logging and configuration directories.
    7. In Cloudera Manager, select the Start action for the Pepperdata service.
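
The file-staging part of this task (step 2: placing the parcel, its checksum file, and one CSD JAR) can be sketched as a small shell function. This is an illustration only: the function name stage_pepperdata_files and its argument order are hypothetical, and the sketch assumes that you have already extracted the TGZ archive into a single source directory. It takes the target directories as arguments so that you can try it against scratch directories before pointing it at /opt/cloudera/parcel-repo and /opt/cloudera/csd.

```shell
# Hypothetical helper (not part of Pepperdata or Cloudera tooling).
# Usage: stage_pepperdata_files SRC_DIR PARCEL_REPO CSD_DIR [NEW_CSD_JAR]
#   SRC_DIR      directory holding the extracted parcel files and/or the CSD JAR
#   PARCEL_REPO  normally /opt/cloudera/parcel-repo
#   CSD_DIR      normally /opt/cloudera/csd
#   NEW_CSD_JAR  file name of the one CSD JAR to install; omit for a parcel-only upgrade
stage_pepperdata_files() {
  src_dir="$1"; parcel_repo="$2"; csd_dir="$3"; new_csd="${4:-}"

  # (Parcel) Move the parcel and its checksum file, if present, to the parcel repository.
  for f in "$src_dir"/*.parcel "$src_dir"/*.parcel.sha; do
    if [ -e "$f" ]; then
      mv "$f" "$parcel_repo"/
    fi
  done

  if [ -n "$new_csd" ]; then
    # (CSD) Remove existing PEPPERDATA-X.Y.Z.jar and PEPPERDATA-X.Y.jar files.
    # The [0-9] in the glob leaves any PEPPERDATA-MIGRATION-X.Y.jar file untouched.
    rm -f "$csd_dir"/PEPPERDATA-[0-9]*.jar

    # (CSD) Move exactly one CSD JAR into place.
    mv "$src_dir/$new_csd" "$csd_dir"/
  fi
}
```

On the Cloudera Manager Server, a standard upgrade might call stage_pepperdata_files with your extraction directory, /opt/cloudera/parcel-repo, /opt/cloudera/csd, and the PEPPERDATA-X.Y.Z.jar file name, and then restart the server with service cloudera-scm-server restart. For a migration-upgrade, pass the PEPPERDATA-MIGRATION-X.Y.jar file name instead.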

Task 4: (Upgrades from Supervisor v6.2 or earlier) Update Paths to Pepperdata JARs and Scripts

If you are upgrading from any Supervisor v6.3.x release or later, skip this procedure because the paths are already correct for Supervisor v8.1.

To update the paths for Pepperdata JARs and scripts to their new locations, use Cloudera Manager to edit the snippets in the applicable templates.


  1. Revise the paths for YARN instrumentation.

    Use Cloudera Manager to edit the snippets in the templates. If there is no corresponding entry in a given template—for example, you are not using Spark 2, so the Spark 2 template is empty—do not add the new snippet.

    1. YARN (MR2 Included) > Configuration > ResourceManager > Java Configuration Options for ResourceManager:

      Old value: -javaagent:/opt/pepperdata/lib/PepperdataSupervisor.jar

      New value: -javaagent:/opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/lib/PepperdataSupervisor.jar

    2. YARN (MR2 Included) > Configuration > NodeManager > Java Configuration Options for NodeManager

      Old value: -javaagent:/opt/pepperdata/lib/PepperdataSupervisor.jar

      New value: -javaagent:/opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/lib/PepperdataSupervisor.jar

    3. Spark > Configuration > Gateway > Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh

      Old value: PEPPERDATA_ACTIVATE_SCRIPT_PATH="/opt/pepperdata/supervisor/lib/pepperdata-activate.sh"

      New value (entered as two separate lines):

    4. Spark2 > Configuration > Gateway > Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh

      Old value: PEPPERDATA_ACTIVATE_SCRIPT_PATH="/opt/pepperdata/supervisor/lib/pepperdata-activate.sh"

      New value (entered as two separate lines):


  2. (Query Spotlight) If Query Spotlight is enabled for the cluster, use Cloudera Manager to edit the snippets in the templates.

    1. Hive > Configuration > Client Java Configuration Options

      Old value: -javaagent:/opt/pepperdata/supervisor/lib/internal-jars/query/hive/PLACEHOLDER-FOR-YOUR-HIVE-QUERY-JAR-NAME

      New value: -javaagent:/opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/supervisor/lib/internal-jars/query/hive/PLACEHOLDER-FOR-YOUR-HIVE-QUERY-JAR-NAME

    2. Hive > Configuration > Java Configuration Options for HiveServer2

      Old value: -javaagent:/opt/pepperdata/supervisor/lib/internal-jars/query/hive/PLACEHOLDER-FOR-YOUR-HIVE-QUERY-JAR-NAME

      New value: -javaagent:/opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/supervisor/lib/internal-jars/query/hive/PLACEHOLDER-FOR-YOUR-HIVE-QUERY-JAR-NAME

  3. (HBase monitoring) If the cluster includes HBase and you’ve enabled Pepperdata to monitor it, use Cloudera Manager to edit the snippets in the templates.

    • HBase > Configuration > RegionServer > Java Configuration Options for HBase RegionServer

      Old value: -javaagent:/opt/pepperdata/lib/PepperdataSupervisor.jar

      New value: -javaagent:/opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/lib/PepperdataSupervisor.jar
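
Every -javaagent pair shown above differs only in its prefix: /opt/pepperdata/ becomes /opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/. The following shell sketch applies that substitution to a value that you copy out of Cloudera Manager, so that you can preview the new value before pasting it back. The function name rewrite_javaagent_path is hypothetical, and the sketch covers only the -javaagent options; it does not handle the Spark PEPPERDATA_ACTIVATE_SCRIPT_PATH values.

```shell
# Hypothetical helper (not part of Pepperdata or Cloudera tooling).
# Rewrites the old /opt/pepperdata/ prefix of any -javaagent option in the
# given string to the parcel location; everything else is left unchanged.
rewrite_javaagent_path() {
  printf '%s\n' "$1" |
    sed 's|-javaagent:/opt/pepperdata/|-javaagent:/opt/cloudera/parcels/PEPPERDATA_SUPERVISOR/|g'
}
```

Because the substitution matches only the old prefix, a value that already points at the parcel location passes through unchanged, which makes the function safe to run on a value that has already been updated.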

Task 5: Restart the Required Services


  1. In Cloudera Manager, navigate to your cluster’s YARN (MR2 Included) service > Instances, select all ResourceManager and NodeManager hosts, and in the Actions for Selected list, select Restart.

  2. (If using HBase) Navigate back to the cluster view, and for the HBase service, select the Rolling Restart action, and then select only the HBase RegionServers.

  3. (If using Hive) Restart the required service according to your version of Cloudera’s Distribution of Hadoop (CDH) or Cloudera Data Platform (CDP) Runtime.

    • CDP 7.x:

      Navigate back to the cluster view, and for the Hive on Tez service, select the Restart action.

    • CDH 6.x:

      Navigate to the Hive Service instances, select all HiveServer2 hosts, and in the Actions for Selected list, select Restart.

Task 6: Restart the Pepperdata Agents


  • In Cloudera Manager, select the Start action for the Pepperdata service.

Task 7: (Parcel Upgrade) Remove the Old Pepperdata Parcels

If you’re upgrading only the CSD, not the Parcel, skip this task.


  • In Cloudera Manager, remove all old Pepperdata Supervisor parcels. (For details, see the Cloudera documentation for your version of Cloudera Manager.)