(Application Spotlight) Configure Application Profiler (Parcel)

Supported versions: See the CDH and CDP Private Cloud Base entries for Pepperdata 7.1.x in the table of Supported Platforms by Pepperdata Version

Prerequisites

Before you begin configuring Application Profiler, ensure that your system meets the required prerequisites.

  • Pepperdata must be installed on the host running the MapReduce Job History Server
  • MapReduce Job History Server must be running
  • (Spark Monitoring) Spark History Server must be running
  • Your cluster must use a supported authentication protocol; see Supported Authentication Protocols for Application Profiler, below

Supported Authentication Protocols for Application Profiler

To enable Application Profiler to fetch application data from the MapReduce Job History Server and the Spark History Server, your cluster must use a Pepperdata-supported authentication protocol:

  • No authentication.

  • Pseudo auth (also known as Hadoop’s simple authentication)—the server authenticates requests based on the user.name query string parameter contained in the request.

  • Kerberos.

  • Basic access (BA) authentication—uses standard fields in the HTTP header to specify the user name and password; for details, see https://en.wikipedia.org/wiki/Basic_access_authentication .
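
As a rough illustration of how these protocols differ at the HTTP level, the following sketch prints (without running) a curl command for each one. The helper name auth_curl_cmd, the host jhs.example.com, the port 19888, and the user alice are hypothetical examples, not Pepperdata settings. The curl options shown (--negotiate and -u) are standard curl flags; the Kerberos form assumes a valid ticket (for example, from kinit) and a curl build with SPNEGO support.

```shell
# Print a curl command line for one authentication protocol.
# Usage: auth_curl_cmd <none|pseudo|kerberos|basic> <url> [user]
# All hosts, ports, and user names passed in are illustrative assumptions.
auth_curl_cmd() {
  mode=$1
  url=$2
  user=${3:-}
  case "$mode" in
    none)
      # No authentication: a plain request.
      printf "curl -s '%s'\n" "$url" ;;
    pseudo)
      # Pseudo (simple) auth: identity travels in the user.name query parameter.
      printf "curl -s '%s?user.name=%s'\n" "$url" "$user" ;;
    kerberos)
      # Kerberos via SPNEGO: "-u :" tells curl to use the ticket cache.
      printf "curl -s --negotiate -u : '%s'\n" "$url" ;;
    basic)
      # Basic access auth: curl prompts for the password when only a user is given.
      printf "curl -s -u '%s' '%s'\n" "$user" "$url" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

The helper only prints commands, so you can review the printed line before running it against a real endpoint.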

Task 1: Configure Pepperdata to Monitor Application History

Procedure

  1. In Cloudera Manager, locate the Enable JobHistory Monitoring parameter, and select it.

  2. To enable Spark application monitoring, enable the Spark dependency, and enable the associated Pepperdata CSD parameter.

    1. Locate the SPARK_ON_YARN Service dependency, and for Pepperdata (Service-Wide), select Spark.

    2. Locate the Enable Spark Application History parameter, and select it.

Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication

If the core services of the ResourceManagers or the MapReduce Job History Server are Kerberized (secured with Kerberos), add the authentication type for the auxiliary HTTP/HTTPS endpoint service to the Pepperdata configuration.

Prerequisites

  • Be sure that the Kerberos principal has access to the ResourceManager and MapReduce Job History Server endpoints (HTTP or HTTPS).

  • (CDH and CDP Private Cloud Base) Be sure that you selected Enable Access to Kerberized Cluster Components during the installation process (Task 2: Add Pepperdata Service to Cloudera Manager). (For CDP Public Cloud, Pepperdata automatically enables this for Kerberized clusters.)

Procedure

  1. For your Kerberized ResourceManager host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https, and RM_HOST:PORT is the host name and port of the ResourceManager HTTP/HTTPS endpoint:

    curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | grep WWW-Authenticate

    • If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-rm-auth-type) is kerberos.
    • If nothing is returned, the authentication type (your-rm-auth-type) is simple.
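  2. For your Kerberized MapReduce Job History Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https, and JHS_HOST:PORT is the host name and port of the MapReduce Job History Server HTTP/HTTPS endpoint: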

    curl --tlsv1.2 -kI {your-protocol}://JHS_HOST:PORT/ws/v1/history | grep WWW-Authenticate

    • If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-jhs-auth-type) is kerberos.
    • If nothing is returned, the authentication type (your-jhs-auth-type) is simple.
  3. (simple authentication type) Add the environment variables for the HTTP/HTTPS endpoint’s simple authentication type for the ResourceManager and the MapReduce Job History Server.

    By default, Pepperdata assigns the authentication type to be kerberos.

    • If your authentication type is kerberos, skip this step.
    • If your authentication type is simple, perform this step to override the default values and assign them to simple.

    Use Cloudera Manager to add the following snippet to the Pepperdata > Configuration > PepAgent > PepAgent Environment Advanced Configuration Snippet (Safety Valve) template.

    # For ResourceManager:
    PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=simple
    # For MapReduce Job History Server:
    PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=simple
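
The decision made in steps 1 through 3 can be sketched as two small shell helpers. The function names classify_auth_type and print_override are hypothetical and are not part of Pepperdata; the sketch only mirrors the rule stated above (a WWW-Authenticate: Negotiate header means kerberos, no such header means simple). The match is case-insensitive because HTTP header names are not case-sensitive and some servers return them in lowercase.

```shell
# Read HTTP response headers on stdin; print "kerberos" if a
# "WWW-Authenticate: Negotiate" header is present, otherwise "simple".
classify_auth_type() {
  if grep -qiE '^www-authenticate:[[:space:]]*negotiate'; then
    echo kerberos
  else
    echo simple
  fi
}

# Print the override line for one service only when its type is simple,
# because kerberos is already the default and needs no override.
print_override() {
  var_name=$1
  auth_type=$2
  if [ "$auth_type" = "simple" ]; then
    printf '%s=simple\n' "$var_name"
  fi
}

# Example usage (RM_HOST:PORT is a placeholder, as in step 1):
#   rm_type=$(curl -s --tlsv1.2 -kI https://RM_HOST:PORT/ws/v1/cluster/info | classify_auth_type)
#   print_override PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE "$rm_type"
```

Note that a failed connection also produces no WWW-Authenticate header, so this sketch would report simple in that case; confirm that the curl request itself succeeds before relying on the result.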
    

Task 3: (Basic Access Authentication) Add BA Authentication Credentials

For Basic access (BA) authentication, add the BA authentication credentials for the monitored applications’ servers to the Pepperdata configuration.

Procedure

  • Use Cloudera Manager to add the following snippet to the Pepperdata > Configuration > PepAgent > PepAgent Environment Advanced Configuration Snippet (Safety Valve) template.

    Be sure to substitute your user name and password for the your-username and your-password placeholders. (The same environment variables are used to configure the BA authentication credentials for the ResourceManager and MapReduce Job History Server.)

    # For ResourceManager and MapReduce Job History Server
    PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME=your-username
    PD_AGENT_BASIC_AUTH_PASSWORD=your-password
    
    # For Spark History Server
    PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME=your-username
    PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD=your-password
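
For background, BA authentication carries the user name and password in the HTTP Authorization header as the base64 encoding of "username:password". The sketch below shows the general HTTP mechanism only, not how Pepperdata sends the credentials internally. The helper name basic_auth_header is hypothetical, the sample credentials are the well-known example from the HTTP authentication RFCs, and the base64 utility is assumed to be installed.

```shell
# Print the Authorization header that Basic access authentication
# would send for the given user name ($1) and password ($2).
basic_auth_header() {
  # Encode "user:password" as base64; printf avoids a trailing newline
  # being included in the encoded value.
  encoded=$(printf '%s:%s' "$1" "$2" | base64)
  printf 'Authorization: Basic %s\n' "$encoded"
}

# Example:
#   basic_auth_header Aladdin 'open sesame'
```

Because base64 is an encoding and not encryption, BA credentials are readable by anyone who can see the request, which is why BA is typically used together with HTTPS.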
    

Task 4: Activate Application History Monitoring by Restarting PepAgent

On the MapReduce Job History Server host, use Cloudera Manager to start or restart the PepAgent.

Procedure

  • If the Pepperdata services are not yet running, select the Start action for the Pepperdata service.

  • Otherwise, select the Restart action for the Pepperdata service.

Task 5: Access the Application Profiler on the Pepperdata Dashboard

The Application Profiler interface is integrated into Application Spotlight in the Pepperdata Dashboard.

Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection

To confirm that Application Profiler is correctly configured for near real-time monitoring in Hadoop 2, view the data collection process stats (MapReduce Job History Server retrieval).

Be sure to replace the your-jobhistory-server-host placeholder with the host name of your MapReduce Job History Server.

http://your-jobhistory-server-host:50505/JobHistoryMonitor
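
If you prefer to check the stats page from a terminal rather than a browser, a minimal sketch follows. The helper name job_history_monitor_url and the host jhs.example.com are hypothetical; the port and path are taken from the URL above. The commented curl line uses the standard -o and -w options to print only the HTTP status code.

```shell
# Build the stats page URL for a given MapReduce Job History Server host ($1).
job_history_monitor_url() {
  printf 'http://%s:50505/JobHistoryMonitor\n' "$1"
}

# Example (hypothetical host): print only the HTTP status code of the page.
#   curl -s -o /dev/null -w '%{http_code}\n' "$(job_history_monitor_url jhs.example.com)"
```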