(Application Spotlight) Configure Application Profiler (Parcel)

Supported versions: See the CDH and CDP Private Cloud Base entries for Pepperdata 7.1.x in the table of Supported Platforms by Pepperdata Version

On This Page

Prerequisites
Supported Authentication Protocols for Application Profiler
Task 1: Configure Pepperdata to Monitor Application History
Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
Task 3: (Basic Access Authentication) Add BA Authentication Credentials
Task 4: Activate Application History Monitoring by Restarting PepAgent
Task 5: Access the Application Profiler on the Pepperdata Dashboard
Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection

Prerequisites

Before you begin configuring Application Profiler, ensure that your system meets the required prerequisites.

Pepperdata must be installed on the host running the MapReduce Job History Server
MapReduce Job History Server must be running
(Spark Monitoring) Spark History Server must be running
Your cluster uses a supported authentication protocol; see Supported Authentication Protocols for Application Profiler, below

Supported Authentication Protocols for Application Profiler

To enable Application Profiler to fetch application data from the MapReduce Job History Server and the Spark History Server, your cluster must use a Pepperdata-supported authentication protocol:

No authentication.
Pseudo auth (also known as Hadoop’s simple authentication)—the server authenticates requests based on the user.name query string parameter contained in the request.
Kerberos.
Basic access (BA) authentication—uses standard fields in the HTTP header to specify the user name and password; for details, see https://en.wikipedia.org/wiki/Basic_access_authentication .
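
To make the difference between pseudo auth and BA authentication concrete, the following is a minimal sketch of how each one appears in a request, assuming a Bash shell with the base64 utility available. The function names, host, and user names are hypothetical and are not part of Pepperdata. Kerberos is omitted because it relies on a negotiated ticket rather than a value you can build by hand.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; the function names are hypothetical.

# Pseudo (simple) auth: the caller's identity travels in the user.name
# query string parameter. Assumes the user name needs no URL encoding.
pseudo_auth_url() {
  local base_url="$1" user="$2"
  printf '%s?user.name=%s\n' "$base_url" "$user"
}

# Basic access auth: "user:password" is Base64-encoded and sent in the
# standard Authorization HTTP header.
basic_auth_header() {
  local user="$1" password="$2"
  local token
  token=$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$token"
}
```

For example, basic_auth_header Aladdin 'open sesame' prints Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==. Because Base64 is an encoding and not encryption, BA credentials are only protected in transit when the endpoint uses HTTPS.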

Task 1: Configure Pepperdata to Monitor Application History

Procedure

In Cloudera Manager, locate the Enable JobHistory Monitoring parameter, and select it.
To enable Spark application monitoring, enable the Spark dependency, and enable the associated Pepperdata CSD parameter.
1. Locate the SPARK_ON_YARN Service dependency, and for Pepperdata (Service-Wide), select Spark.
2. Locate the Enable Spark Application History parameter, and select it.
Note: If you’re using Application Profiler to fetch history data for Spark apps, you can customize the connection timeout value and/or add a second Spark History Server for monitoring. See Configure Spark History Servers.

Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication

If the core services of the ResourceManagers or the MapReduce Job History Server are Kerberized (secured with Kerberos), add the authentication type for the auxiliary HTTP/HTTPS endpoint service to the Pepperdata configuration.

Note: Even when a daemon’s core services use Kerberos authentication, its auxiliary services, such as HTTP/HTTPS, can use either the simple or the kerberos authentication type.

Prerequisites

Be sure that the Kerberos principal has access to the ResourceManager and MapReduce Job History Server endpoints (HTTP or HTTPS).
(CDH and CDP Private Cloud Base) Be sure that you selected Enable Access to Kerberized Cluster Components during the installation process (Task 2: Add Pepperdata Service to Cloudera Manager). (For CDP Public Cloud, Pepperdata automatically enables this for Kerberized clusters.)

Procedure

Note: The authentication type can be different for each type of component (ResourceManagers and the MapReduce Job History Server). Be sure to separately determine each authentication type.

For your Kerberized ResourceManager host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https, and RM_HOST and PORT are the host name and web port of your ResourceManager:

curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | grep WWW-Authenticate
- If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-rm-auth-type) is kerberos.
- If nothing is returned, the authentication type (your-rm-auth-type) is simple.
For your Kerberized MapReduce Job History Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https, and JHS_HOST and PORT are the host name and web port of your MapReduce Job History Server:

curl --tlsv1.2 -kI {your-protocol}://JHS_HOST:PORT/ws/v1/history | grep WWW-Authenticate
- If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-jhs-auth-type) is kerberos.
- If nothing is returned, the authentication type (your-jhs-auth-type) is simple.
(simple authentication type) Add the environment variables for the HTTP/HTTPS endpoint’s simple authentication type for the ResourceManager and the MapReduce Job History Server.

By default, Pepperdata assigns the authentication type to be kerberos.

• If your authentication type is kerberos, skip this step.
• If your authentication type is simple, perform this step to override the default values and assign them to simple.

Use Cloudera Manager to add the following snippet to the Pepperdata > Configuration > PepAgent > PepAgent Environment Advanced Configuration Snippet (Safety Valve) template.
```
# For ResourceManager:
PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=simple
# For MapReduce Job History Server:
PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=simple
```
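
The detection and override steps in this task can be sketched as a small Bash helper. This is a hedged illustration, not a Pepperdata tool: the function names are hypothetical, and the only facts it relies on are the ones stated above (a WWW-Authenticate: Negotiate header means kerberos, no such header means simple, and kerberos is the default that needs no override).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads HTTP response headers on stdin and prints
# "kerberos" if a "WWW-Authenticate: Negotiate" header is present,
# otherwise "simple".
auth_type_from_headers() {
  if grep -qi '^WWW-Authenticate:[[:space:]]*Negotiate'; then
    echo kerberos
  else
    echo simple
  fi
}

# Prints an override line only for the simple type, because Pepperdata
# already defaults to kerberos.
override_line() {
  local variable="$1" auth_type="$2"
  if [ "$auth_type" = "simple" ]; then
    printf '%s=simple\n' "$variable"
  fi
}

# Example usage (RM_HOST and PORT are placeholders, as in the commands above):
#   rm_type=$(curl --tlsv1.2 -skI "https://RM_HOST:PORT/ws/v1/cluster/info" | auth_type_from_headers)
#   override_line PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE "$rm_type"
```

Run the same pair of calls against the MapReduce Job History Server endpoint with PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE, because the two components can report different authentication types.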

Task 3: (Basic Access Authentication) Add BA Authentication Credentials

For Basic access (BA) authentication, add the BA authentication credentials for the monitored applications’ servers to the Pepperdata configuration.

Procedure

Use Cloudera Manager to add the following snippet to the Pepperdata > Configuration > PepAgent > PepAgent Environment Advanced Configuration Snippet (Safety Valve) template.

Be sure to substitute your user name and password for the your-username and your-password placeholders. (The same environment variables are used to configure the BA authentication credentials for the ResourceManager and MapReduce Job History Server.)
```
# For ResourceManager and MapReduce Job History Server
PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME=your-username
PD_AGENT_BASIC_AUTH_PASSWORD=your-password

# For Spark History Server
PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME=your-username
PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD=your-password
```
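
Because the snippet is pasted by hand, it is easy to leave a placeholder in place. As an optional pre-flight check, the following sketch (a hypothetical helper, not part of Pepperdata or Cloudera Manager) reads the snippet text on standard input and succeeds only if neither your-username nor your-password remains in it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: exit status 0 means no placeholder was
# found in the snippet read from stdin; non-zero means one is still present.
snippet_has_no_placeholders() {
  ! grep -qE 'your-(username|password)'
}

# Example usage (snippet.txt is an assumed local copy of the snippet):
#   snippet_has_no_placeholders < snippet.txt || echo "Replace the placeholders first"
```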

Task 4: Activate Application History Monitoring by Restarting PepAgent

On the MapReduce Job History Server host, use Cloudera Manager to start or restart the PepAgent.

Procedure

If the Pepperdata services are not yet running, select the Start action for the Pepperdata service.
Otherwise, select the Restart action for the Pepperdata service.

Task 5: Access the Application Profiler on the Pepperdata Dashboard

The Application Profiler interface is integrated into Application Spotlight in the Pepperdata Dashboard.

Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection

To confirm that Application Profiler is correctly configured for near real-time monitoring in Hadoop 2, view the data collection process stats (MapReduce Job History Server retrieval).

Be sure to replace the your-jobhistory-server-host placeholder with the host name of your actual MapReduce Job History Server.

http://your-jobhistory-server-host:50505/JobHistoryMonitor
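
If you prefer to check from a terminal instead of a browser, the following is a minimal sketch, assuming Bash and cURL are available on a host that can reach the MapReduce Job History Server. The function names are hypothetical, and treating an HTTP 200 response as "the stats page is reachable" is an assumption of this sketch; you still need to read the page itself to confirm that data is being collected.

```shell
#!/usr/bin/env bash
# Builds the stats URL from a host name. Port 50505 and the
# JobHistoryMonitor path are the ones given in this task.
job_history_monitor_url() {
  local host="$1"
  printf 'http://%s:50505/JobHistoryMonitor\n' "$host"
}

# Hypothetical reachability check: succeeds when the stats page answers
# with HTTP status 200.
check_job_history_monitor() {
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' "$(job_history_monitor_url "$1")")
  [ "$status" = "200" ]
}

# Example usage:
#   check_job_history_monitor your-jobhistory-server-host && echo "Stats page is reachable"
```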