(Application Spotlight) Configure Application Profiler (RPM/DEB)

Supported versions: See the entries for Pepperdata 7.1.x in the table of Supported Platforms by Pepperdata Version

Prerequisites

Before you begin configuring Application Profiler, ensure that your system meets the required prerequisites.

  • Pepperdata must be installed on the host running the MapReduce Job History Server
  • MapReduce Job History Server must be running
  • (Spark Monitoring) Spark History Server must be running
  • Your cluster uses a supported authentication protocol; see Supported Authentication Protocols for Application Profiler, below

Supported Authentication Protocols for Application Profiler

To enable Application Profiler to fetch application data from the MapReduce Job History Server/Spark History Server, your cluster must use a Pepperdata-supported authentication protocol:

  • No authentication.

  • Pseudo auth (also known as Hadoop’s simple authentication)—the server authenticates requests based on the user.name query string parameter contained in the request.

  • Kerberos.

  • Basic access (BA) authentication—uses standard fields in the HTTP header to specify the user name and password; for details, see https://en.wikipedia.org/wiki/Basic_access_authentication.
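To illustrate the difference, pseudo auth passes the user in the query string, while BA authentication base64-encodes the credentials into an Authorization header. The sketch below shows both; the host, port, and credentials are hypothetical placeholders, not values from your cluster:

```shell
# Pseudo auth: the user is passed as the user.name query string parameter
# (hypothetical host and port):
#   curl "http://jhs.example.com:19888/ws/v1/history/info?user.name=hdfs"

# BA auth: curl -u builds the Authorization header for you by
# base64-encoding "username:password"; the header it sends looks like this:
creds=$(printf 'your-username:your-password' | base64)
echo "Authorization: Basic $creds"
#   curl -u your-username:your-password "http://jhs.example.com:19888/ws/v1/history"
```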

Task 1: Configure Pepperdata to Monitor Application History

Procedure

  1. If there is no /etc/pepperdata/pepperdata-config.sh file, copy /etc/pepperdata/pepperdata-config.sh-template to /etc/pepperdata/pepperdata-config.sh.

  2. Edit the /etc/pepperdata/pepperdata-config.sh file as follows.

    • Set the value of PD_JOBHISTORY_MONITOR_ENABLED to 1.

    • To enable Spark application monitoring, add the configuration according to your environment.

      • If the spark-defaults.conf file contains the correct assignment for spark.yarn.historyServer.address for the first Spark History Server, configure the SPARK_CONF_DIR environment variable to match:

        export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory

        Where your-path-to-first-spark-conf-directory is the directory that contains the spark-defaults.conf file.

      • If the spark-defaults.conf file does not include spark.yarn.historyServer.address or its value is incorrect, and you can edit the spark-defaults.conf file:

        1. Edit the spark-defaults.conf file so that it includes the correct assignment for spark.yarn.historyServer.address for the first Spark History Server.

        2. In the pepperdata-config.sh file, configure the SPARK_CONF_DIR environment variable to match: export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory, where your-path-to-first-spark-conf-directory is the directory that contains the spark-defaults.conf file.

      • For all other cases, edit the pepperdata-config.sh file to include the PD_SPARK_HISTORY_SERVER_ADDRESS environment variable, and set its value to the first Spark History Server’s fully qualified URL.

    Example of modifications to the pepperdata-config.sh file

    export PD_JOBHISTORY_MONITOR_ENABLED=1
    # For Spark Application Monitoring
    export SPARK_CONF_DIR=your-path-to-spark-conf-directory
    # Or, if the spark-defaults.conf file does not contain the correct assignment
    # for the spark.yarn.historyServer.address, and you cannot edit it:
    # export PD_SPARK_HISTORY_SERVER_ADDRESS=http(s)://url-to-your-spark-history:port
    
  3. Save your changes and close the file.
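As a quick sanity check before restarting the PepAgent, you can verify that the directory you point SPARK_CONF_DIR at actually contains a spark-defaults.conf with the history server address set. The helper below is a sketch for that check, not part of Pepperdata; the function name and messages are illustrative:

```shell
# Sketch: verify that a candidate SPARK_CONF_DIR is usable for Spark
# application monitoring. Prints "ok" if spark-defaults.conf exists and
# sets spark.yarn.historyServer.address; otherwise prints what is wrong.
check_spark_conf() {
  conf_dir="$1"
  if [ ! -f "$conf_dir/spark-defaults.conf" ]; then
    echo "missing spark-defaults.conf in $conf_dir"
    return 1
  fi
  if grep -q '^spark\.yarn\.historyServer\.address' "$conf_dir/spark-defaults.conf"; then
    echo "ok"
  else
    echo "spark.yarn.historyServer.address not set; use PD_SPARK_HISTORY_SERVER_ADDRESS instead"
    return 1
  fi
}

# Example: check_spark_conf "$SPARK_CONF_DIR"
```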

Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication

If the core services of the ResourceManagers, the MapReduce Job History Server, and, for Tez support in Application Profiler, the YARN Timeline Server are Kerberized (secured with Kerberos), add the authentication type for the auxiliary HTTP/HTTPS endpoint service to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.

Prerequisites

  • Be sure that the Kerberos principal has access to the ResourceManager, MapReduce Job History Server, and, for Tez support in Application Profiler, YARN Timeline Server endpoints (HTTP or HTTPS).

  • Be sure that you added the required environment variables—PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION—to the Pepperdata configuration during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication).

Procedure

  1. For your Kerberized ResourceManager host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:

    curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | grep WWW-Authenticate

    • If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-rm-auth-type) is kerberos.
    • If nothing is returned, the authentication type (your-rm-auth-type) is simple.
  2. For your Kerberized MapReduce Job History Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:

    curl --tlsv1.2 -kI {your-protocol}://JHS_HOST:PORT/ws/v1/history | grep WWW-Authenticate

    • If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-jhs-auth-type) is kerberos.
    • If nothing is returned, the authentication type (your-jhs-auth-type) is simple.
  3. For your Kerberized YARN Timeline Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:

    curl --tlsv1.2 -kI {your-protocol}://TIMELINE_SERVER_HOST:PORT/ws/v1/timeline | grep WWW-Authenticate

    • If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-timeline-server-auth-type) is kerberos.
    • If nothing is returned, the authentication type (your-timeline-server-auth-type) is simple.
  4. Add the environment variables for the HTTP/HTTPS endpoint’s authentication type for the ResourceManager and the MapReduce Job History Server.

    1. Open the /etc/pepperdata/pepperdata-config.sh file for editing.

    2. Add the required environment variables.

      Be sure to substitute the authentication type (simple or kerberos, as you determined in the previous step) for the your-rm-auth-type, your-jhs-auth-type, and your-timeline-server-auth-type placeholders.

      # For ResourceManager:
      export PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=your-rm-auth-type
      # For MapReduce Job History Server:
      export PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=your-jhs-auth-type
      
    3. Save your changes and close the file.

  5. (Hadoop clusters with YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distributions), add authentication properties to the Pepperdata configuration to enable REST access.

    1. On the ResourceManager host, open the Pepperdata site file, pepperdata-site.xml, for editing.

      By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.

    2. Add the required properties.

      Be sure to substitute your HTTP service policy—HTTP_ONLY or HTTPS_ONLY—for the your-http-service-policy placeholder in the following code snippet.

      For Kerberized clusters, the HTTP service policy is usually HTTPS_ONLY. But you should check with your cluster administrator or look for the value of the yarn.http.policy property in the cluster’s yarn-site.xml file or the Hadoop configuration.

      <property>
        <name>pepperdata.agent.yarn.http.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>pepperdata.agent.yarn.http.policy</name>
        <value>your-http-service-policy</value>
      </property>
      
      Malformed XML files can cause operational errors that are difficult to debug. To prevent such errors, we recommend that you run a linter, such as xmllint, after you edit any .xml configuration file.
    3. Save your changes and close the file.
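The three cURL probes in steps 1–3 of this task share the same pattern, so they can be wrapped in a small helper that classifies an endpoint from its response headers. This is a convenience sketch, not a Pepperdata tool; the host and port in the usage comment are hypothetical:

```shell
# Sketch: read HTTP response headers on stdin and print the matching
# Pepperdata auth type: "kerberos" if the server negotiates SPNEGO,
# "simple" otherwise.
detect_auth_type() {
  if grep -qi 'WWW-Authenticate: Negotiate'; then
    echo kerberos
  else
    echo simple
  fi
}

# Example against a live endpoint (hypothetical host and port):
#   curl --tlsv1.2 -kI https://rm.example.com:8090/ws/v1/cluster/info | detect_auth_type
```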

Task 3: (Basic Access Authentication) Add BA Authentication Credentials

For Basic access (BA) authentication, add the BA authentication credentials for the monitored applications’ servers to the Pepperdata configuration.

Procedure

  1. Open the pepperdata-config.sh file for editing.

  2. Add the required environment variables. Be sure to substitute your user name and password for the your-username and your-password placeholders. (The same environment variables are used to configure the BA authentication credentials for the ResourceManager, MapReduce Job History Server, and YARN Timeline Server.)

    # For ResourceManager, MapReduce Job History Server, YARN Timeline Server
    export PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME=your-username
    export PD_AGENT_BASIC_AUTH_PASSWORD=your-password
    
    # For Spark History Server
    export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME=your-username
    export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD=your-password
    
  3. Save your changes and close the file.
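Because pepperdata-config.sh is sourced by a shell, a password that contains shell metacharacters must be single-quoted, or the shell will expand characters such as $ before Pepperdata sees the value. The password below is a hypothetical example:

```shell
# Single quotes keep characters such as $, !, and spaces literal
# (hypothetical password shown):
export PD_AGENT_BASIC_AUTH_PASSWORD='p@ss$word!'
```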

Task 4: Activate Application History Monitoring

On the MapReduce Job History Server host, start/restart the PepAgent.

You can use either the service command (if provided by your OS) or the systemctl command:

  • sudo service pepagentd restart
  • sudo systemctl restart pepagentd

Task 5: Access the Application Profiler on the Pepperdata Dashboard

The Application Profiler interface is integrated into the Pepperdata Dashboard.

The Applications and Recommendations sections of the dashboard Cluster View show pertinent data for every application that is monitored by Application Profiler.

  • To view detailed metrics of a highlighted application, click its link in a tile.
  • To view the table of applications that have Pepperdata recommendations of a given severity, click that severity in the applicable Recommendations tile (for the app type).
  • To view data for all monitored applications, use the left-nav menu, and select App Spotlight > Application Profiler.

Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection

To confirm that the Application Profiler is correctly configured for near real-time monitoring in Hadoop 2, view the data collection process stats (MapReduce Job History Server retrieval).

Be sure to replace the your-jobhistory-server-host placeholder with the host name of your actual MapReduce Job History Server.

http://your-jobhistory-server-host:50505/JobHistoryMonitor