(Application Spotlight) Configure Application Profiler (RPM/DEB)
Supported versions: See the entries for Pepperdata 8.1.x in the table of Supported Platforms by Pepperdata Version.
On This Page
- Prerequisites
- Supported Authentication Protocols for Application Profiler
- Task 1: Configure Pepperdata to Monitor Application History
- Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
- Task 3: (Basic Access Authentication) Add BA Authentication Credentials
- Task 4: Activate Application History Monitoring
- Task 5: Access the Application Profiler on the Pepperdata Dashboard
- Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection
Prerequisites
Before you begin configuring Application Profiler, ensure that your system meets the required prerequisites.
- Pepperdata must be installed on the host running the MapReduce Job History Server
- MapReduce Job History Server must be running
- (Spark Monitoring) Spark History Server must be running
- Your cluster uses a supported authentication protocol; see Supported Authentication Protocols for Application Profiler, below
Supported Authentication Protocols for Application Profiler
To enable Application Profiler to fetch application data from the MapReduce Job History Server and the Spark History Server, your cluster must use one of the following Pepperdata-supported authentication protocols:
- No authentication.
- Pseudo auth (also known as Hadoop’s simple authentication): the server authenticates requests based on the user.name query string parameter contained in the request.
- Kerberos.
- Basic access (BA) authentication: uses standard fields in the HTTP header to specify the user name and password; for details, see https://en.wikipedia.org/wiki/Basic_access_authentication.
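To make the differences between these protocols concrete, the following is a minimal Bash sketch, not part of Pepperdata, that prints the curl arguments you might use to query a history server under each protocol. The function name auth_curl_args and its argument order are hypothetical. The kerberos case assumes curl was built with GSS-API/SPNEGO support and that you already hold a Kerberos ticket (for example, from kinit).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pepperdata): print, one per line, the curl
# arguments for querying an endpoint under each supported protocol.
# Usage: auth_curl_args <none|simple|kerberos|basic> <url> [user] [password]
auth_curl_args() {
  local auth="$1" url="$2" user="${3:-}" password="${4:-}"
  case "$auth" in
    none)
      printf '%s\n' "$url"
      ;;
    simple)
      # Pseudo auth: the user name travels in the user.name query parameter.
      if [[ "$url" == *\?* ]]; then
        printf '%s\n' "${url}&user.name=${user}"
      else
        printf '%s\n' "${url}?user.name=${user}"
      fi
      ;;
    kerberos)
      # SPNEGO with an existing Kerberos ticket; "-u :" supplies an empty user.
      printf '%s\n' --negotiate -u : "$url"
      ;;
    basic)
      # Basic access authentication: curl builds the Authorization header.
      printf '%s\n' -u "${user}:${password}" "$url"
      ;;
    *)
      echo "unknown auth type: $auth" >&2
      return 1
      ;;
  esac
}
```

For example, mapfile -t args < <(auth_curl_args kerberos https://JHS_HOST:PORT/ws/v1/history) followed by curl -s "${args[@]}" queries a Kerberized Job History Server. Passing a password with -u on the command line makes it visible in the process list, so treat the basic case as illustration only.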
Task 1: Configure Pepperdata to Monitor Application History
Procedure
- If there is no /etc/pepperdata/pepperdata-config.sh file, copy /etc/pepperdata/pepperdata-config.sh-template to /etc/pepperdata/pepperdata-config.sh.
- Edit the /etc/pepperdata/pepperdata-config.sh file as follows.
  - Modify the value of PD_JOBHISTORY_MONITOR_ENABLED to 1.
  - To enable Spark application monitoring, add the configuration according to your environment.
    Note: If you’re using Application Profiler to fetch history data for Spark apps, you can customize the connection timeout value and/or add a second Spark History Server for monitoring. See Configure Spark History Servers.
    - If the spark-defaults.conf file contains the correct assignment for spark.yarn.historyServer.address for the first Spark History Server, configure the SPARK_CONF_DIR environment variable to match:
      export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory
      where your-path-to-first-spark-conf-directory is the directory that contains the spark-defaults.conf file.
    - If the spark-defaults.conf file does not include spark.yarn.historyServer.address or its value is incorrect, and you can edit the spark-defaults.conf file:
      - Edit the spark-defaults.conf file so that it includes the correct assignment for spark.yarn.historyServer.address for the first Spark History Server.
      - In the pepperdata-config.sh file, configure the SPARK_CONF_DIR environment variable to match, where your-path-to-first-spark-conf-directory is the directory that contains the spark-defaults.conf file:
        export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory
    - For all other cases, edit the pepperdata-config.sh file to include the PD_SPARK_HISTORY_SERVER_ADDRESS environment variable, and set its value to the first Spark History Server’s fully-qualified URL.
  Example of modifications to the pepperdata-config.sh file:
  ```shell
  export PD_JOBHISTORY_MONITOR_ENABLED=1
  # For Spark Application Monitoring
  export SPARK_CONF_DIR=your-path-to-spark-conf-directory
  # Or, if the spark-defaults.conf file does not contain the correct assignment
  # for the spark.yarn.historyServer.address, and you cannot edit it:
  # export PD_SPARK_HISTORY_SERVER_ADDRESS=http(s)://url-to-your-spark-history:port
  ```
- Save your changes and close the file.
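The edits in this procedure can also be scripted. The following Bash sketch assumes you run it as a user who can write to /etc/pepperdata; the helper names ensure_pd_config and set_pd_var are hypothetical, not Pepperdata tools. set_pd_var removes any existing uncommented export NAME=... line for the variable and appends the new value, so running it twice still leaves a single assignment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Pepperdata tools) for editing pepperdata-config.sh.

# Create the config file from the template only if it does not exist yet.
ensure_pd_config() {
  local config="$1" template="$2"
  if [[ ! -f "$config" ]]; then
    cp "$template" "$config"
  fi
}

# Set "export NAME=VALUE" in the config file. VALUE is written verbatim;
# quote it yourself if it contains spaces or shell metacharacters.
set_pd_var() {
  local config="$1" name="$2" value="$3" tmp
  [[ -f "$config" ]] || { echo "no such file: $config" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  # Keep every line except an uncommented "export NAME=..." for this variable.
  # grep -v exits 1 when it selects no lines, which is not an error here.
  grep -Ev "^[[:space:]]*export[[:space:]]+${name}=" "$config" > "$tmp" || true
  printf 'export %s=%s\n' "$name" "$value" >> "$tmp"
  # Copy back with cat (not mv) so the original file's owner and mode are kept.
  cat "$tmp" > "$config"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, run ensure_pd_config /etc/pepperdata/pepperdata-config.sh /etc/pepperdata/pepperdata-config.sh-template, then set_pd_var /etc/pepperdata/pepperdata-config.sh PD_JOBHISTORY_MONITOR_ENABLED 1, and, for Spark, call set_pd_var again with SPARK_CONF_DIR or PD_SPARK_HISTORY_SERVER_ADDRESS and your own value.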
Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
If the core services of the ResourceManagers, the MapReduce Job History Server, and, for Tez support in Application Profiler, the YARN Timeline Server are Kerberized (secured with Kerberos), add the authentication type for the auxiliary HTTP/HTTPS endpoint service to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.
Even when the core services use kerberos authentication, the auxiliary services, such as HTTP/HTTPS, can use either simple or kerberos authentication.
Prerequisites
- Be sure that the Kerberos principal has access to the ResourceManager, MapReduce Job History Server, and, for Tez support in Application Profiler, YARN Timeline Server endpoints (HTTP or HTTPS).
- Be sure that you added the required environment variables, PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION, to the Pepperdata configuration during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication).
Procedure
- For your Kerberized ResourceManager host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:
  ```shell
  curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | grep -i WWW-Authenticate
  ```
  - If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-rm-auth-type) is kerberos.
  - Otherwise, nothing is returned, and the authentication type (your-rm-auth-type) is simple.
- For your Kerberized MapReduce Job History Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:
  ```shell
  curl --tlsv1.2 -kI {your-protocol}://JHS_HOST:PORT/ws/v1/history | grep -i WWW-Authenticate
  ```
  - If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-jhs-auth-type) is kerberos.
  - Otherwise, nothing is returned, and the authentication type (your-jhs-auth-type) is simple.
- For your Kerberized YARN Timeline Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:
  ```shell
  curl --tlsv1.2 -kI {your-protocol}://TIMELINE_SERVER_HOST:PORT/ws/v1/timeline | grep -i WWW-Authenticate
  ```
  - If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-timeline-server-auth-type) is kerberos.
  - Otherwise, nothing is returned, and the authentication type (your-timeline-server-auth-type) is simple.
- Add the environment variables for the HTTP/HTTPS endpoint’s authentication type for the ResourceManager and the MapReduce Job History Server.
  - Open the /etc/pepperdata/pepperdata-config.sh file for editing.
  - Add the required environment variables.
    Be sure to substitute the authentication type (simple or kerberos, as you determined in the previous steps) for the your-rm-auth-type, your-jhs-auth-type, and your-timeline-server-auth-type placeholders.
    ```shell
    # For ResourceManager:
    export PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=your-rm-auth-type
    # For MapReduce Job History Server:
    export PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=your-jhs-auth-type
    ```
  - Save your changes and close the file.
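The three cURL checks above can be wrapped in a small Bash sketch. It assumes curl is installed, and the helper names are hypothetical. Like the commands above, it passes -k, which skips TLS certificate verification, and it treats a WWW-Authenticate: Negotiate challenge as kerberos and anything else as simple. Unlike a bare grep, it reports a failed request instead of silently implying simple.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Pepperdata tools) for classifying an endpoint's
# HTTP authentication type.

# Read HTTP response headers on stdin; print "kerberos" if the server
# challenges with SPNEGO (WWW-Authenticate: Negotiate), else "simple".
auth_type_from_headers() {
  if grep -qi '^WWW-Authenticate:[[:space:]]*Negotiate'; then
    echo kerberos
  else
    echo simple
  fi
}

# Fetch only the response headers (same TLS flags as the steps above) and
# classify them. Returns nonzero if the request itself fails.
detect_auth_type() {
  local url="$1" headers
  if ! headers="$(curl --tlsv1.2 -skI "$url")"; then
    echo "request to $url failed" >&2
    return 1
  fi
  printf '%s\n' "$headers" | auth_type_from_headers
}
```

For example, detect_auth_type https://RM_HOST:PORT/ws/v1/cluster/info prints the value to use for your-rm-auth-type, and detect_auth_type https://JHS_HOST:PORT/ws/v1/history prints the value for your-jhs-auth-type.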
(Hadoop clusters with YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distros), add authentication properties to the Pepperdata configuration to enable REST access.
Note: If you already configured the authentication properties during the installation process, you do not need to configure them again; skip this procedure.
- On the ResourceManager host, open the Pepperdata site file, pepperdata-site.xml, for editing.
  By default, pepperdata-site.xml is located in /etc/pepperdata. If you customized the location, the directory that contains the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.
- Add the required properties.
  Be sure to substitute your HTTP service policy, HTTP_ONLY or HTTPS_ONLY, for the your-http-service-policy placeholder in the following code snippet. For Kerberized clusters, the HTTP service policy is usually HTTPS_ONLY, but check with your cluster administrator or look for the value of the yarn.http.policy property in the cluster’s yarn-site.xml file or the Hadoop configuration.
  ```xml
  <property>
    <name>pepperdata.agent.yarn.http.authentication.type</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>pepperdata.agent.yarn.http.policy</name>
    <value>your-http-service-policy</value>
  </property>
  ```
  Malformed XML files can cause operational errors that are difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
- Save your changes and close the file.
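Following the xmllint recommendation above, this Bash sketch looks up yarn.http.policy in a yarn-site.xml file and checks an edited pepperdata-site.xml. It assumes xmllint (from libxml2) is installed, and the helper names are hypothetical. When yarn.http.policy is absent from the file you pass, it reports HTTP_ONLY, the Hadoop default for that property; a value set somewhere else in the Hadoop configuration is not seen.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Pepperdata tools); require libxml2's xmllint.

# Print yarn.http.policy from a yarn-site.xml file, or HTTP_ONLY (the Hadoop
# default) when the property is not set in that file.
yarn_http_policy() {
  local yarn_site="$1" policy
  xmllint --noout "$yarn_site" || return 1
  policy="$(xmllint --xpath "string(//property[name='yarn.http.policy']/value)" "$yarn_site" 2>/dev/null)" || true
  echo "${policy:-HTTP_ONLY}"
}

# Check that a site file is well-formed XML and that both Pepperdata
# properties from the snippet above are present with non-empty values.
check_pepperdata_site() {
  local site="$1" name value
  xmllint --noout "$site" || return 1
  for name in pepperdata.agent.yarn.http.authentication.type pepperdata.agent.yarn.http.policy; do
    value="$(xmllint --xpath "string(//property[name='${name}']/value)" "$site" 2>/dev/null)" || true
    if [[ -z "$value" ]]; then
      echo "missing or empty property: $name" >&2
      return 1
    fi
  done
}
```

For example, yarn_http_policy /etc/hadoop/conf/yarn-site.xml (a common, but not universal, location) suggests a value for your-http-service-policy, and check_pepperdata_site /etc/pepperdata/pepperdata-site.xml returns nonzero if the file is malformed or either property is missing.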
Task 3: (Basic Access Authentication) Add BA Authentication Credentials
For Basic access (BA) authentication, add the BA authentication credentials for the monitored applications’ servers to the Pepperdata configuration.
Procedure
- Open the pepperdata-config.sh file for editing.
- Add the required environment variables. Be sure to substitute your user name and password for the your-username and your-password placeholders. (The same environment variables are used to configure the BA authentication credentials for the ResourceManager, MapReduce Job History Server, and YARN Timeline Server.)
  ```shell
  # For ResourceManager, MapReduce Job History Server, YARN Timeline Server
  export PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME=your-username
  export PD_AGENT_BASIC_AUTH_PASSWORD=your-password
  # For Spark History Server
  export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME=your-username
  export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD=your-password
  ```
- Save your changes and close the file.
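Before restarting the agent, you can check that each credential pair you added is complete. In this Bash sketch, check_ba_pair is a hypothetical helper, not a Pepperdata tool. It sources pepperdata-config.sh in a subshell, so the file's commands run but the current shell's environment does not change, and it first clears any inherited values so that only the file's assignments count.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Pepperdata tool).
# Usage: check_ba_pair <config-file> <USERNAME_VAR> <PASSWORD_VAR>
# Returns 0 only if the file sets both variables to non-empty values.
check_ba_pair() {
  local config="$1" user_var="$2" pass_var="$3"
  (
    # Ignore values inherited from the calling shell; only the file counts.
    unset "$user_var" "$pass_var"
    # shellcheck source=/dev/null
    source "$config" || exit 1
    if [[ -z "${!user_var:-}" || -z "${!pass_var:-}" ]]; then
      echo "incomplete BA credentials: $user_var / $pass_var" >&2
      exit 1
    fi
  )
}
```

For example: check_ba_pair /etc/pepperdata/pepperdata-config.sh PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD.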
Task 4: Activate Application History Monitoring
On the MapReduce Job History Server host, start or restart the PepAgent.
You can use either the service command (if provided by your OS) or the systemctl command:
```shell
# With service:
sudo service pepagentd restart
# Or, with systemctl:
sudo systemctl restart pepagentd
```
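If you script this step across hosts, the following Bash sketch chooses between the two commands. It assumes that a host booted with systemd has a /run/systemd/system directory (the usual check for systemd) and that the service is named pepagentd either way; pepagent_restart_cmd is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Pepperdata tool): print the restart command for
# pepagentd, preferring systemctl on systemd hosts and falling back to the
# "service" wrapper otherwise.
pepagent_restart_cmd() {
  if command -v systemctl >/dev/null 2>&1 && [[ -d /run/systemd/system ]]; then
    echo "systemctl restart pepagentd"
  elif command -v service >/dev/null 2>&1; then
    echo "service pepagentd restart"
  else
    echo "neither systemctl nor service found" >&2
    return 1
  fi
}
```

For example: cmd="$(pepagent_restart_cmd)" && sudo $cmd. Leaving $cmd unquoted is intentional so that the command splits into separate words.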
Task 5: Access the Application Profiler on the Pepperdata Dashboard
The Application Profiler interface is integrated into the Pepperdata Dashboard.
The Applications and Recommendations sections of the dashboard Cluster View show pertinent data for every application that is monitored by Application Profiler.
- To view detailed metrics of a highlighted application, click its link in a tile.
- To view the table of applications that have Pepperdata recommendations of a given severity, click that severity in the applicable Recommendations tile (for the app type).
- To view data for all monitored applications, use the left-nav menu, and select App Spotlight > Application Profiler.
Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection
To confirm that the Application Profiler is correctly configured for near real-time monitoring in Hadoop 2, view the data collection process stats (MapReduce Job History Server retrieval) at the following URL.
Be sure to replace the your-jobhistory-server-host placeholder with the host name of your MapReduce Job History Server.
http://your-jobhistory-server-host:50505/JobHistoryMonitor
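To run this check from a shell instead of a browser, the following Bash sketch fetches the same page with curl. It assumes that the page is served over plain HTTP on port 50505, as shown above, and that it needs no authentication. The helper names are hypothetical, and a successful fetch only shows that the page responds, not that the stats look healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Pepperdata tools).

# Build the JobHistoryMonitor stats URL for a Job History Server host.
jobhistory_monitor_url() {
  printf 'http://%s:50505/JobHistoryMonitor\n' "$1"
}

# Fetch the page; -f fails on HTTP status >= 400, -sS stays quiet but still
# prints errors, and --max-time bounds how long an unreachable host can hang.
check_jobhistory_monitor() {
  local url
  url="$(jobhistory_monitor_url "$1")"
  if curl -fsS --max-time 10 -o /dev/null "$url"; then
    echo "reachable: $url"
  else
    echo "not reachable: $url" >&2
    return 1
  fi
}
```

For example: check_jobhistory_monitor your-jobhistory-server-host.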