Installing Pepperdata (RPM/DEB)

To install Pepperdata, first install the package or parcel for your distro/environment; next, open listening ports as necessary; and then, optionally, reconfigure Pepperdata properties for settings such as Unix utility command locations. Repeat this installation process on every host in your cluster.

Run all commands as the root user.

Task 1: Install the Pepperdata Software

A single RPM/DEB package contains all the Pepperdata products.

Procedure

  1. Obtain the appropriate Pepperdata <supervisor-package-name> RPM/DEB package. There are version-specific Pepperdata packages for some Hadoop versions. In such cases, the Pepperdata package name includes the Hadoop version number. See the Downloads page.

  2. Depending on how your cluster is managed, install the RPM/DEB package by running the appropriate command for your environment or by using site-specific administrative tools.

    The table describes the locations of the Pepperdata files after you install the package. Except for the primary installation target, the locations are created by symlinks.

    Directory                                    Description
    /opt/pepperdata/supervisor-<your-version>    Primary installation target, containing many subdirectories and files
    /opt/pepperdata/lib/                         JAR and library files
    /etc/init.d/                                 Initialization scripts
    /etc/pepperdata/                             Configuration files, configuration templates, and site-specific configuration files
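The command in step 2 depends on the package format. The hypothetical helpers below are a minimal sketch. They assume a host where you call rpm or dpkg directly rather than a site-specific tool, and they choose the tool from the file extension.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for installing a downloaded Pepperdata package file.

# Print the low-level package tool that matches the file extension.
pd_package_tool() {
  case "$1" in
    *.rpm) echo rpm ;;
    *.deb) echo dpkg ;;
    *) return 1 ;;
  esac
}

# Install the package file with the matching tool; run as root.
pd_install_package() {
  local pkg="$1" tool
  if ! tool="$(pd_package_tool "$pkg")"; then
    echo "unrecognized package type: $pkg" >&2
    return 1
  fi
  case "$tool" in
    rpm) rpm -ivh "$pkg" ;;
    dpkg) dpkg -i "$pkg" ;;
  esac
}
```

To use it, run pd_install_package with the path of the downloaded .rpm or .deb file. Note that rpm -i refuses to install over an existing version (rpm -U upgrades instead). Also, dpkg -i does not resolve dependencies; running apt-get install -f afterward can install any that are missing.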

If the installation fails on any host, contact Pepperdata Support.

Task 2: Copy Configuration Template Files

Navigate to the /etc/pepperdata directory and copy the following configuration template files:

  • pepperdata-config.sh-template -> pepperdata-config.sh
  • pepperdata-site.xml-template -> pepperdata-site.xml
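You can script the two copies. The sketch below assumes the default /etc/pepperdata directory, and you can pass a different directory as the first argument. It skips any working file that already exists, so earlier edits are not overwritten. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the working configuration files from their templates.
# The directory argument defaults to /etc/pepperdata; existing files are left alone.
pd_copy_templates() {
  local dir="${1:-/etc/pepperdata}" name
  for name in pepperdata-config.sh pepperdata-site.xml; do
    if [ -e "$dir/$name" ]; then
      echo "skipping $dir/$name: file already exists" >&2
    else
      cp "$dir/$name-template" "$dir/$name" || return 1
    fi
  done
}
```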

After Pepperdata is installed on all the hosts in your cluster, you will edit the Pepperdata configuration file, pepperdata-config.sh, to configure Pepperdata for your environment; later steps explain how.

Task 3: Add the Pepperdata License

Copy the license.txt file that we emailed to you to the license file location. By default, the license file location is the /etc/pepperdata/ directory. If you customized the license file location (see Manage the License Key File), the directory is specified by the pepperdata.license.key.specification property in pepperdata-site.xml.

Be sure that the file permissions for license.txt allow the PD_USER user and the YARN ResourceManager process to read the license file. The permissions must be at least r--r--r-- (0444 in octal, or a+r in symbolic notation).

(By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.)
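The license step can be sketched as follows. This assumes the default license location, /etc/pepperdata, and you can pass a different directory if you customized it. The name pd_install_license is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy license.txt into place and add read permission
# for user, group, and other (a+r), as the permission requirement above calls for.
pd_install_license() {
  local src="$1" dir="${2:-/etc/pepperdata}"
  cp "$src" "$dir/license.txt" || return 1
  chmod a+r "$dir/license.txt" || return 1
  ls -l "$dir/license.txt"   # mode should show r for owner, group, and other
}
```

For example, pd_install_license ./license.txt prints the resulting ls -l line for the installed file.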

Task 4: (Kerberized Clusters) Enable Kerberos Authentication

If your cluster's core services are Kerberized (secured with Kerberos), add the Kerberos principal and the path of the corresponding keytab file to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh. The relevant services are the ResourceManagers, the MapReduce Job History Server, and, for Tez support in Application Profiler, the YARN Timeline Server.

Prerequisites

  • Be sure that the PepAgent user has read access to the keytab file. (To determine the PepAgent user name, see the PD_USER entry in the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.)
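To check this prerequisite from a shell, the hypothetical helpers below read PD_USER from the configuration file and then test read access as that user. They assume two things. First, pepperdata-config.sh only assigns variables, so sourcing it in a subshell has no side effects. Second, sudo is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the keytab read-access prerequisite.

# Print PD_USER as set in pepperdata-config.sh. The file is sourced in a
# subshell, which assumes it only assigns variables.
pd_user_from_config() {
  local conf="${1:-/etc/pepperdata/pepperdata-config.sh}"
  ( unset PD_USER; . "$conf" >/dev/null 2>&1; printf '%s\n' "${PD_USER:-}" )
}

# Succeed if the given user can read the file; run as root (uses sudo).
pd_user_can_read() {
  local user="$1" file="$2"
  sudo -u "$user" test -r "$file"
}
```

For example: pd_user_can_read "$(pd_user_from_config)" <path-of-your-keytab-file> && echo readable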

Procedure

  1. (Optional) Create a new user principal and keytab file to use for Pepperdata.

    Although you can reuse an existing principal and keytab file, best practice is to create a new one for Pepperdata. Separate users let you apply ACLs (access control lists) in accordance with your organization’s security policies. User principals, unlike service principals, do not include the hostname.

    (Cloudera Manager) If the cluster configuration is managed by Cloudera Manager, the path to the keytab file is dynamic. In this case, copy the keytab file to /etc/pepperdata/, and use the copied, static file to enable Kerberos authentication. This step is unnecessary if you use Cloudera Parcels or a different configuration manager, or if you manage your cluster manually (even on clusters with a Cloudera CDH distro).

  2. Verify that the Kerberos principal and keytab file are valid.

    1. Obtain and cache a Kerberos ticket-granting ticket by using the kinit command, which should return without error. Be sure to substitute your user name, realm name, and the location of your keytab file for the <your-kerberos-user>, <your-realm-name>, and <path-of-your-keytab-file> placeholders.

      kinit -kt <path-of-your-keytab-file> <your-kerberos-user>@<your-realm-name>

    2. Authenticate and connect by using the curl --negotiate command.

      Be sure to substitute your ResourceManager domain for the resourcemanager.example.com placeholder.

      • For non-secured endpoints (HTTP):

        curl -L --tlsv1.2 --negotiate -u : http://resourcemanager.example.com:8088

      • For secured endpoints (HTTPS):

        curl -L --tlsv1.2 --negotiate -u : https://resourcemanager.example.com:8090

      If you can connect, you’ve confirmed that the Kerberos principal and keytab file are valid. Otherwise, debug the connection failure.

  3. Add the Kerberos principal and the path of the corresponding keytab file to the Pepperdata configuration.

    1. Open the /etc/pepperdata/pepperdata-config.sh for editing.

    2. Add the required environment variables. Be sure to substitute your user name, realm name, and the location of your keytab file for the your-kerberos-user, your-realm-name, and path-of-your-keytab-file placeholders.

      export PD_AGENT_PRINCIPAL=your-kerberos-user@your-realm-name
      export PD_AGENT_KEYTAB_LOCATION=path-of-your-keytab-file
      
    3. Save your changes and close the file.

  4. (Hadoop clusters with YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distros), add authentication properties to the Pepperdata configuration to enable REST access.

    If you’re installing Pepperdata on a cluster without Hadoop, such as a Kafka-only cluster for Streaming Spotlight, skip this step.

    1. On the ResourceManager host, open the Pepperdata site file, pepperdata-site.xml, for editing.

      By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.

    2. Add the required properties.

      Be sure to substitute your HTTP service policy—HTTP_ONLY or HTTPS_ONLY—for the your-http-service-policy placeholder in the following code snippet.

      For Kerberized clusters, the HTTP service policy is usually HTTPS_ONLY, but confirm it with your cluster administrator, or check the value of the yarn.http.policy property in the cluster’s yarn-site.xml file or other Hadoop configuration.

      <property>
        <name>pepperdata.agent.yarn.http.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>pepperdata.agent.yarn.http.policy</name>
        <value>your-http-service-policy</value>
      </property>
      
      Malformed XML files can cause operational errors that are difficult to debug. To prevent such errors, we recommend that you check any .xml configuration file with a linter, such as xmllint, after you edit it.

    3. Save your changes and close the file.
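For step 1 of the procedure above, here is a rough sketch of creating a new user principal and keytab on a cluster that uses an MIT Kerberos KDC. It assumes that kadmin is installed and can authenticate you as a Kerberos administrator. The function name, the principal, and the keytab path you pass in are all hypothetical. Active Directory and other KDCs use different tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (MIT Kerberos): create a user principal with a random key,
# export it to a keytab, and restrict the keytab to its owner (the PepAgent user).
pd_create_keytab() {
  local principal="$1" keytab="$2" owner="$3"
  kadmin -q "addprinc -randkey $principal" || return 1
  kadmin -q "ktadd -k $keytab $principal" || return 1
  # kadmin may exit 0 even when a query fails, so confirm the keytab exists.
  [ -s "$keytab" ] || { echo "keytab not created: $keytab" >&2; return 1; }
  chown "$owner" "$keytab" || return 1
  chmod 0400 "$keytab"
}
```

For example: pd_create_keytab pepperdata@EXAMPLE.COM /etc/pepperdata/pepperdata.keytab <pd-user>. By default, ktadd generates new random keys, which invalidates any keytab previously exported for the same principal.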
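You can combine the two checks in step 2 of the procedure above into one function. This sketch assumes MIT or Heimdal Kerberos client tools (kinit, klist) and a curl built with SPNEGO (GSS-API) support. The function name is hypothetical, and the arguments correspond to the placeholders used in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: get a ticket from the keytab, then request the
# ResourceManager web UI with SPNEGO (curl --negotiate) and report the status.
pd_verify_kerberos() {
  local principal="$1" keytab="$2" url="$3" code
  kinit -kt "$keytab" "$principal" || return 1
  klist -s || return 1   # silent; nonzero exit if there is no valid ticket
  code="$(curl -sS -L --tlsv1.2 --negotiate -u : -o /dev/null -w '%{http_code}' "$url")" || return 1
  echo "HTTP $code"
  # Treat 2xx and 3xx as success; a 401 usually means negotiation failed.
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}
```

For example: pd_verify_kerberos <your-kerberos-user>@<your-realm-name> <path-of-your-keytab-file> https://resourcemanager.example.com:8090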
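If you automate step 3 of the procedure above, it helps to replace existing export lines rather than append duplicates each time the script runs. The helper below is a hypothetical sketch of that approach. It keeps the file's owner and permissions by rewriting the file in place, and it finishes with a bash -n syntax check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "export NAME=VALUE" in pepperdata-config.sh,
# replacing any existing "export NAME=..." line so reruns add no duplicates.
pd_set_config_var() {
  local conf="$1" name="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except an existing export of NAME.
  grep -Ev "^[[:space:]]*export[[:space:]]+${name}=" "$conf" > "$tmp"
  # %q shell-quotes the value if it contains special characters.
  printf 'export %s=%q\n' "$name" "$value" >> "$tmp"
  # Overwrite the contents in place so the file keeps its owner and mode.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
  bash -n "$conf"   # syntax check only; nothing is executed
}
```

For example, run pd_set_config_var /etc/pepperdata/pepperdata-config.sh PD_AGENT_PRINCIPAL your-kerberos-user@your-realm-name, and then run the same command with PD_AGENT_KEYTAB_LOCATION and path-of-your-keytab-file.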
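For step 4 of the procedure above, xmllint (from libxml2) can both look up yarn.http.policy and validate the edited site file. In this sketch, the default yarn-site.xml path (/etc/hadoop/conf) and the function names are assumptions, so adjust them for your distro. When the property is absent, the helper prints HTTP_ONLY, which is Hadoop's default. If the cluster reports a different value, such as HTTP_AND_HTTPS, check with your cluster administrator which of the two supported values applies.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on xmllint.

# Print yarn.http.policy from a yarn-site.xml, or HTTP_ONLY (Hadoop's default)
# when the property is not set. The default path is an assumption.
pd_yarn_http_policy() {
  local site="${1:-/etc/hadoop/conf/yarn-site.xml}" policy
  [ -r "$site" ] || { echo "cannot read $site" >&2; return 1; }
  policy="$(xmllint --xpath \
    "normalize-space(//property[normalize-space(name)='yarn.http.policy']/value)" \
    "$site" 2>/dev/null)"
  printf '%s\n' "${policy:-HTTP_ONLY}"
}

# Succeed only if the file is well-formed XML; xmllint prints any errors.
pd_lint_xml() {
  xmllint --noout "${1:-/etc/pepperdata/pepperdata-site.xml}"
}
```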

Task 5: (Rarely Required) Open Port for Listening

PepAgents listen on port 50505, whether they’re running on ResourceManager hosts, as we recommend, or on NodeManager hosts.

In most environments, this port is available and is not blocked by internal firewalls. However, in rare situations you might need to open (unblock) this port or reconfigure which port Pepperdata uses.

In Pepperdata Supervisor v7.1.10 and earlier, PepAgents on the ResourceManager hosts also listen on port 50510, in addition to the common port 50505.
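To decide whether a port needs opening, you can check it from both sides. The hypothetical helpers below sketch one way to do this. They assume ss (from iproute2) on the PepAgent host, and bash with /dev/tcp support plus coreutils timeout on the remote host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the PepAgent listening ports.

# On the PepAgent host: succeed if a process is listening on the TCP port.
pd_port_listening() {
  ss -ltn "sport = :$1" | grep -q LISTEN
}

# From another host: succeed if HOST:PORT accepts a TCP connection within 3 s.
pd_port_reachable() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Here is one way to use them. If pd_port_listening 50505 succeeds on the PepAgent host but pd_port_reachable <host> 50505 fails from another host, a firewall is the likely cause. On hosts that use firewalld, firewall-cmd --permanent --add-port=50505/tcp followed by firewall-cmd --reload opens the port.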

To enable SSL support, see Configure SSL Near Real-Time Monitoring on Port 50505.

For information about accessing the stats that are provided via the Web servlets associated with this port, with either HTTP or SSL-secured HTTPS communication, see Pepperdata Status Views via Web Servlets.

Next: Configuring Pepperdata