(Application Spotlight) Configure Application Profiler (RPM/DEB)
Supported versions: See the entries for Pepperdata 8.0.x in the table of Supported Platforms by Pepperdata Version
On This Page
- Prerequisites
- Supported Authentication Protocols for Application Profiler
- Task 1: Configure Pepperdata to Monitor Application History
- Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
- Task 3: (Basic Access Authentication) Add BA Authentication Credentials
- Task 4: Activate Application History Monitoring
- Task 5: Access the Application Profiler on the Pepperdata Dashboard
- Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection
Prerequisites
Before you configure Application Profiler, make sure that your system meets the following prerequisites.
- Pepperdata must be installed on the host running the MapReduce Job History Server
- MapReduce Job History Server must be running
- (Spark Monitoring) Spark History Server must be running
- Your cluster uses a supported authentication protocol; see Supported Authentication Protocols for Application Profiler, below
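If you want to script these checks, the following is a minimal sketch. It assumes bash and curl on the Job History Server host, and it treats the existence of /etc/pepperdata as a proxy for "Pepperdata is installed" (an assumption, not a documented test). It probes the standard Hadoop and Spark REST endpoints /ws/v1/history/info and /api/v1/applications. The function names are hypothetical. Any HTTP answer, including a 401 from a secured server, counts as "running".

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for the Application Profiler prerequisites.
# Assumptions: hypothetical function names; /etc/pepperdata used as a proxy
# for "Pepperdata is installed"; standard Hadoop/Spark REST endpoints.

# Returns 0 if an HTTP status code means "some server answered".
# curl reports 000 when it cannot connect; a 401 from a Kerberized or
# BA-protected server still proves that the server is running.
server_answered() {
  [ -n "$1" ] && [ "$1" != "000" ]
}

# Prints the HTTP status code for a URL (000 if unreachable).
http_status() {
  curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Usage: check_prerequisites JHS_BASE_URL [SPARK_HS_BASE_URL]
check_prerequisites() {
  local jhs_url=$1 shs_url=${2:-}
  local ok=0

  if [ -d /etc/pepperdata ]; then
    echo "OK: /etc/pepperdata exists on this host"
  else
    echo "MISSING: /etc/pepperdata (is Pepperdata installed on this host?)"; ok=1
  fi

  if server_answered "$(http_status "$jhs_url/ws/v1/history/info")"; then
    echo "OK: MapReduce Job History Server answered at $jhs_url"
  else
    echo "DOWN: no answer from MapReduce Job History Server at $jhs_url"; ok=1
  fi

  # The Spark check runs only when a Spark History Server URL is given.
  if [ -n "$shs_url" ]; then
    if server_answered "$(http_status "$shs_url/api/v1/applications")"; then
      echo "OK: Spark History Server answered at $shs_url"
    else
      echo "DOWN: no answer from Spark History Server at $shs_url"; ok=1
    fi
  fi
  return "$ok"
}
```

For example: check_prerequisites http://jhs.example.com:19888 http://shs.example.com:18080 (the host names are placeholders; 19888 and 18080 are the usual default HTTP ports of the two servers, so substitute your own).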
Supported Authentication Protocols for Application Profiler
To enable Application Profiler to fetch application data from the MapReduce Job History Server and the Spark History Server, your cluster must use one of the following Pepperdata-supported authentication protocols:
- No authentication.
- Pseudo auth (also known as Hadoop’s simple authentication): the server authenticates requests based on the user.name query string parameter contained in the request.
- Kerberos.
- Basic access (BA) authentication: uses standard fields in the HTTP header to specify the user name and password; for details, see https://en.wikipedia.org/wiki/Basic_access_authentication .
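To make the differences concrete, here is a sketch that prints the curl invocation a client might use against a server for each protocol. The mode names and the auth_curl_args function are hypothetical illustrations, not part of Pepperdata. The flags themselves are standard curl: --negotiate -u : sends a SPNEGO (Kerberos) token from the current ticket cache, and -u user:password makes curl send a Basic Authorization header.

```shell
#!/usr/bin/env bash
# Sketch: how a client presents its identity under each supported protocol.
# The mode names (none, pseudo, kerberos, basic) and the function name are
# hypothetical; this is not how Pepperdata itself is implemented.

# Prints one curl argument per line for the given mode and URL.
# Usage: auth_curl_args MODE URL [USER] [PASSWORD]
auth_curl_args() {
  local mode=$1 url=$2 user=${3:-} password=${4:-}
  case "$mode" in
    none)
      printf '%s\n' curl -s "$url" ;;
    pseudo)
      # Simple auth: the user name travels in the user.name query parameter.
      local sep='?'
      case "$url" in *\?*) sep='&' ;; esac
      printf '%s\n' curl -s "${url}${sep}user.name=${user}" ;;
    kerberos)
      # SPNEGO: curl uses the ticket in the current Kerberos cache (run kinit first).
      printf '%s\n' curl -s --negotiate -u : "$url" ;;
    basic)
      # Basic access: curl sends "Authorization: Basic base64(user:password)".
      printf '%s\n' curl -s -u "${user}:${password}" "$url" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

To run one of the printed commands, read it into an array first, for example: mapfile -t cmd < <(auth_curl_args pseudo http://jhs.example.com:19888/ws/v1/history alice); "${cmd[@]}" (the host and user name are placeholders).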
Task 1: Configure Pepperdata to Monitor Application History
Procedure
- If there is no /etc/pepperdata/pepperdata-config.sh file, copy /etc/pepperdata/pepperdata-config.sh-template to /etc/pepperdata/pepperdata-config.sh.
- Edit the /etc/pepperdata/pepperdata-config.sh file as follows.
  - Set the value of PD_JOBHISTORY_MONITOR_ENABLED to 1.
  - To enable Spark application monitoring, add the configuration that matches your environment.
    Note: If you’re using Application Profiler to fetch history data for Spark apps, you can customize the connection timeout value and/or add a second Spark History Server for monitoring. See Configure Spark History Servers.
    - If the spark-defaults.conf file contains the correct assignment for spark.yarn.historyServer.address for the first Spark History Server, configure the SPARK_CONF_DIR environment variable to match:

          export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory

      where your-path-to-first-spark-conf-directory is the directory that contains the spark-defaults.conf file.
    - If the spark-defaults.conf file does not include spark.yarn.historyServer.address, or its value is incorrect, and you can edit the spark-defaults.conf file:
      - Edit the spark-defaults.conf file so that it includes the correct assignment for spark.yarn.historyServer.address for the first Spark History Server.
      - In the pepperdata-config.sh file, configure the SPARK_CONF_DIR environment variable to match:

            export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory

        where your-path-to-first-spark-conf-directory is the directory that contains the spark-defaults.conf file.
    - For all other cases, edit the pepperdata-config.sh file to include the PD_SPARK_HISTORY_SERVER_ADDRESS environment variable, and set its value to the first Spark History Server’s fully qualified URL.
  Example of modifications to the pepperdata-config.sh file:

      export PD_JOBHISTORY_MONITOR_ENABLED=1

      # For Spark application monitoring
      export SPARK_CONF_DIR=your-path-to-spark-conf-directory
      # Or, if the spark-defaults.conf file does not contain the correct assignment
      # for spark.yarn.historyServer.address, and you cannot edit it:
      # export PD_SPARK_HISTORY_SERVER_ADDRESS=http(s)://url-to-your-spark-history:port

- Save your changes and close the file.
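The Task 1 edits can also be scripted. The following is a minimal sketch under these assumptions: GNU sed (as on RPM/DEB hosts), hypothetical function names, and a spark-defaults.conf that uses simple "key value" or "key=value" lines. It only checks that spark.yarn.historyServer.address is present; confirming that the value points at the correct first Spark History Server is still up to you.

```shell
#!/usr/bin/env bash
# Sketch: script the Task 1 edits. Function names are hypothetical; requires
# GNU sed for "sed -i". Values must not contain "|" or "&" (sed metacharacters).

# Copy the template only if the config file does not exist yet.
ensure_config() {
  local cfg=${1:-/etc/pepperdata/pepperdata-config.sh}
  [ -f "$cfg" ] || cp "${cfg}-template" "$cfg"
}

# Set "export NAME=VALUE": rewrite an existing export line, otherwise append one.
set_config_var() {
  local cfg=$1 name=$2 value=$3
  if grep -q "^[[:space:]]*export[[:space:]]\{1,\}${name}=" "$cfg"; then
    sed -i "s|^[[:space:]]*export[[:space:]]\{1,\}${name}=.*|export ${name}=${value}|" "$cfg"
  else
    printf 'export %s=%s\n' "$name" "$value" >> "$cfg"
  fi
}

# Print the last spark.yarn.historyServer.address value in DIR/spark-defaults.conf.
# Handles "key value" and "key=value" lines only; prints nothing if absent.
spark_history_address() {
  local conf="$1/spark-defaults.conf"
  [ -f "$conf" ] || return 0
  sed -n 's/^[[:space:]]*spark\.yarn\.historyServer\.address[[:space:]=][[:space:]=]*//p' "$conf" \
    | sed 's/[[:space:]]*$//' | tail -n 1
}

# Use SPARK_CONF_DIR when spark-defaults.conf names a Spark History Server,
# otherwise fall back to PD_SPARK_HISTORY_SERVER_ADDRESS.
configure_spark() {
  local cfg=$1 spark_conf_dir=$2 shs_url=$3
  if [ -n "$(spark_history_address "$spark_conf_dir")" ]; then
    set_config_var "$cfg" SPARK_CONF_DIR "$spark_conf_dir"
  else
    set_config_var "$cfg" PD_SPARK_HISTORY_SERVER_ADDRESS "$shs_url"
  fi
}
```

For example: ensure_config; set_config_var /etc/pepperdata/pepperdata-config.sh PD_JOBHISTORY_MONITOR_ENABLED 1; configure_spark /etc/pepperdata/pepperdata-config.sh /etc/spark/conf http://shs.example.com:18080 (the Spark configuration directory and URL are placeholders).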
Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
If the core services of the ResourceManagers, the MapReduce Job History Server, and, for Tez support in Application Profiler, the YARN Timeline Server are Kerberized (secured with Kerberos), add the authentication type for the auxiliary HTTP/HTTPS endpoint service to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.
Even when the core services use kerberos authentication, the auxiliary services, such as HTTP/HTTPS, can use either simple or kerberos authentication.
Prerequisites
- Be sure that the Kerberos principal has access to the ResourceManager, MapReduce Job History Server, and, for Tez support in Application Profiler, YARN Timeline Server endpoints (HTTP or HTTPS).
- Be sure that you added the required environment variables, PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION, to the Pepperdata configuration during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication).
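One way to confirm both prerequisites is the following sketch. It assumes the MIT Kerberos client tools (kinit) and a curl build with GSS-API support are installed; the function names are hypothetical. It uses a throwaway credential cache so that the tickets of the user running it are not replaced.

```shell
#!/usr/bin/env bash
# Sketch: confirm the Kerberos prerequisites for Task 2. Function names are
# hypothetical; kinit and curl --negotiate are standard tools.

# Succeeds if both Kerberos variables are exported in the Pepperdata config.
has_kerberos_vars() {
  local cfg=${1:-/etc/pepperdata/pepperdata-config.sh}
  grep -q '^[[:space:]]*export[[:space:]]\{1,\}PD_AGENT_PRINCIPAL=' "$cfg" &&
    grep -q '^[[:space:]]*export[[:space:]]\{1,\}PD_AGENT_KEYTAB_LOCATION=' "$cfg"
}

# Obtains a ticket from the keytab, then requests the endpoint with SPNEGO and
# prints the HTTP status code. Runs in a subshell so KRB5CCNAME does not leak.
probe_with_principal() (
  principal=$1 keytab=$2 url=$3
  cache=$(mktemp) || exit 1
  trap 'rm -f "$cache"' EXIT
  export KRB5CCNAME="FILE:$cache"
  kinit -kt "$keytab" "$principal" || exit 1
  curl --tlsv1.2 -k -s -o /dev/null -w '%{http_code}\n' --negotiate -u : "$url"
)
```

For example: probe_with_principal "$PD_AGENT_PRINCIPAL" "$PD_AGENT_KEYTAB_LOCATION" https://RM_HOST:PORT/ws/v1/cluster/info. A 200 suggests that the principal can reach the endpoint; a 401 suggests that it cannot.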
Procedure
- For your Kerberized ResourceManager host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:

      curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | grep WWW-Authenticate

  - If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-rm-auth-type) is kerberos.
  - Otherwise, nothing is returned, and the authentication type (your-rm-auth-type) is simple.
- For your Kerberized MapReduce Job History Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:

      curl --tlsv1.2 -kI {your-protocol}://JHS_HOST:PORT/ws/v1/history | grep WWW-Authenticate

  - If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-jhs-auth-type) is kerberos.
  - Otherwise, nothing is returned, and the authentication type (your-jhs-auth-type) is simple.
- For your Kerberized YARN Timeline Server host, determine its authentication type by running the following cURL command, where {your-protocol} is http or https:

      curl --tlsv1.2 -kI {your-protocol}://TIMELINE_SERVER_HOST:PORT/ws/v1/timeline | grep WWW-Authenticate

  - If the returned response is WWW-Authenticate: Negotiate, the authentication type (your-timeline-server-auth-type) is kerberos.
  - Otherwise, nothing is returned, and the authentication type (your-timeline-server-auth-type) is simple.
- Add the environment variables for the HTTP/HTTPS endpoints’ authentication types for the ResourceManager and the MapReduce Job History Server.
  - Open the /etc/pepperdata/pepperdata-config.sh file for editing.
  - Add the required environment variables.
    Be sure to substitute the authentication type (simple or kerberos, as you determined in the previous steps) for the your-rm-auth-type, your-jhs-auth-type, and your-timeline-server-auth-type placeholders.

        # For ResourceManager:
        export PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=your-rm-auth-type
        # For MapReduce Job History Server:
        export PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=your-jhs-auth-type

  - Save your changes and close the file.
- (Hadoop clusters with YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distros), add authentication properties to the Pepperdata configuration to enable REST access.
  Note: If you already configured the authentication properties during the installation process, you do not need to do so again; skip this procedure.
  - On the ResourceManager host, open the Pepperdata site file, pepperdata-site.xml, for editing.
    By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.
  - Add the required properties.
    Be sure to substitute your HTTP service policy, HTTP_ONLY or HTTPS_ONLY, for the your-http-service-policy placeholder in the following code snippet.
    For Kerberized clusters, the HTTP service policy is usually HTTPS_ONLY. However, check with your cluster administrator, or look for the value of the yarn.http.policy property in the cluster’s yarn-site.xml file or the Hadoop configuration.

        <property>
          <name>pepperdata.agent.yarn.http.authentication.type</name>
          <value>kerberos</value>
        </property>
        <property>
          <name>pepperdata.agent.yarn.http.policy</name>
          <value>your-http-service-policy</value>
        </property>

    Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
  - Save your changes and close the file.
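The cURL checks and the yarn.http.policy lookup in this task can be combined into a helper. The following is a sketch under stated assumptions: the function names are hypothetical; header matching is case-insensitive because HTTP/2 servers send header names in lowercase; and yarn_http_policy uses a plain text match that assumes <value> directly follows <name>, falling back to HTTP_ONLY (Hadoop’s default for yarn.http.policy) when the property is absent. Review the printed lines before copying them into pepperdata-config.sh.

```shell
#!/usr/bin/env bash
# Sketch: automate the auth-type checks and the HTTP-policy lookup in Task 2.
# Function names are hypothetical.

# Reads response headers on stdin; prints "kerberos" if the server asked for
# SPNEGO (WWW-Authenticate: Negotiate), otherwise "simple".
auth_type_from_headers() {
  if grep -qi '^www-authenticate:[[:space:]]*negotiate'; then
    echo kerberos
  else
    echo simple
  fi
}

# Runs the same HEAD request as the cURL steps above against one endpoint URL.
detect_auth_type() {
  curl --tlsv1.2 -k -s -I "$1" | auth_type_from_headers
}

# Prints export lines for pepperdata-config.sh from the detected auth types.
# Usage: print_auth_exports RM_BASE_URL JHS_BASE_URL
print_auth_exports() {
  local rm_url=$1 jhs_url=$2
  echo "export PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=$(detect_auth_type "$rm_url/ws/v1/cluster/info")"
  echo "export PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=$(detect_auth_type "$jhs_url/ws/v1/history")"
}

# Prints yarn.http.policy from a yarn-site.xml file, or HTTP_ONLY (the Hadoop
# default) when the property is absent. Not a real XML parser.
yarn_http_policy() {
  local policy
  policy=$(tr -d '\r\n' < "$1" |
    sed -n 's|.*<name>[[:space:]]*yarn\.http\.policy[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([A-Z_]*\)[[:space:]]*</value>.*|\1|p')
  echo "${policy:-HTTP_ONLY}"
}
```

For example: print_auth_exports https://RM_HOST:PORT https://JHS_HOST:PORT, and yarn_http_policy /etc/hadoop/conf/yarn-site.xml (the configuration path is a placeholder; use your cluster’s location).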
Task 3: (Basic Access Authentication) Add BA Authentication Credentials
For Basic access (BA) authentication, add the BA authentication credentials for the monitored applications’ servers to the Pepperdata configuration.
Procedure
- Open the pepperdata-config.sh file for editing.
- Add the required environment variables. Be sure to substitute your user name and password for the your-username and your-password placeholders. (The same environment variables are used to configure the BA authentication credentials for the ResourceManager, MapReduce Job History Server, and YARN Timeline Server.)

      # For ResourceManager, MapReduce Job History Server, YARN Timeline Server
      export PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME=your-username
      export PD_AGENT_BASIC_AUTH_PASSWORD=your-password
      # For Spark History Server
      export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME=your-username
      export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD=your-password

- Save your changes and close the file.
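To check the Task 3 setup, the following sketch shows what Basic access authentication sends in the HTTP header (an Authorization: Basic header carrying the base64 encoding of user:password, as defined in RFC 7617) and verifies that the variables above are exported. The function names are hypothetical; drop the Spark variables from the list if you do not monitor Spark.

```shell
#!/usr/bin/env bash
# Sketch: check the Basic access setup from Task 3. Function names are
# hypothetical; the header format is the standard one from RFC 7617.

# Prints the Authorization header value a client sends for USER and PASSWORD.
basic_auth_header() {
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Succeeds if every BA variable from Task 3 is exported in the config file;
# reports the first missing variable on stderr.
has_ba_vars() {
  local cfg=${1:-/etc/pepperdata/pepperdata-config.sh} var
  for var in PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME PD_AGENT_BASIC_AUTH_PASSWORD \
             PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME \
             PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD; do
    grep -q "^[[:space:]]*export[[:space:]]\{1,\}${var}=" "$cfg" ||
      { echo "missing: $var" >&2; return 1; }
  done
}
```

For example, curl -s -o /dev/null -w '%{http_code}\n' -H "Authorization: $(basic_auth_header your-username your-password)" https://JHS_HOST:PORT/ws/v1/history sends the same header that curl -u your-username:your-password would; a 200 suggests that the server accepted the credentials.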
Task 4: Activate Application History Monitoring
On the MapReduce Job History Server host, start or restart the PepAgent.
You can use either the service command (if provided by your OS) or the systemctl command:

    sudo service pepagentd restart

    sudo systemctl restart pepagentd
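If you script the restart, the command can be chosen automatically. This sketch assumes that an existing /run/systemd/system directory means systemd is the running init system (the same check that systemd’s own sd_booted() uses). The function name and the DRY_RUN switch are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Sketch: restart the PepAgent with whichever service manager is active.
# The function name and the DRY_RUN switch are hypothetical.

restart_pepagent() {
  local -a cmd
  # /run/systemd/system exists only when systemd is the running init system.
  if [ -d /run/systemd/system ] && command -v systemctl >/dev/null 2>&1; then
    cmd=(sudo systemctl restart pepagentd)
  else
    cmd=(sudo service pepagentd restart)
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"   # Only show the command.
  else
    "${cmd[@]}"        # Run it.
  fi
}
```

Run DRY_RUN=1 restart_pepagent first to see which command would be used.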
Task 5: Access the Application Profiler on the Pepperdata Dashboard
The Application Profiler interface is integrated into the Pepperdata Dashboard.
The Applications and Recommendations sections of the dashboard Cluster View show pertinent data for every application that is monitored by Application Profiler.
- To view detailed metrics of a highlighted application, click its link in a tile.
- To view the table of applications that have Pepperdata recommendations of a given severity, click that severity in the applicable Recommendations tile (for the app type).
- To view data for all monitored applications, use the left-nav menu, and select App Spotlight > Application Profiler.
Task 6: (Hadoop 2) Confirm Near Real-Time Data Collection
To confirm that the Application Profiler is correctly configured for near real-time monitoring in Hadoop 2, view the data collection process statistics (MapReduce Job History Server retrieval) at the following URL.
Be sure to replace the your-jobhistory-server-host placeholder with the host name of your MapReduce Job History Server.

    http://your-jobhistory-server-host:50505/JobHistoryMonitor
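To check the page from a terminal instead of a browser, the following is a minimal sketch, assuming curl is available; the function names are hypothetical and the host name is a placeholder. Because curl -f fails on HTTP error responses (400 and above), a non-zero exit status indicates that the page could not be retrieved.

```shell
#!/usr/bin/env bash
# Sketch: fetch the JobHistoryMonitor status page from the command line.
# Port 50505 and the path come from the URL in Task 6; the function names
# are hypothetical.

# Prints the JobHistoryMonitor URL for a MapReduce Job History Server host.
jobhistory_monitor_url() {
  printf 'http://%s:50505/JobHistoryMonitor\n' "$1"
}

# Prints the page body; exits non-zero on connection errors or HTTP >= 400.
show_jobhistory_monitor() {
  curl -sf "$(jobhistory_monitor_url "$1")"
}
```

For example: show_jobhistory_monitor jhs.example.com | less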