(Application Spotlight) Configure Application Profiler (Cloud)
On This Page
- Prerequisites
- Supported Authentication Protocols for Application Profiler
- Task 1: Configure Pepperdata to Monitor Application History
- Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
- Task 3: (Basic Access Authentication) Add BA Authentication Credentials
- Task 4: Access the Application Profiler on the Pepperdata Dashboard
- Post Requisite: (Hadoop 2) Confirm Near Real-Time Data Collection
Prerequisites
Before you begin configuring Application Profiler, ensure that your system meets the required prerequisites.
- Pepperdata must be installed on the host running the MapReduce Job History Server
- MapReduce Job History Server must be running
- (Spark Monitoring) Spark History Server must be running
- Your cluster uses a supported authentication protocol; see Supported Authentication Protocols for Application Profiler, below
Supported Authentication Protocols for Application Profiler
To enable Application Profiler to fetch application data from the MapReduce Job History Server/Spark History Server, your cluster must use a Pepperdata-supported authentication protocol:
- No authentication.
- Pseudo auth (also known as Hadoop's simple authentication): the server authenticates requests based on the `user.name` query string parameter contained in the request.
- Kerberos.
- Basic access (BA) authentication: uses standard fields in the HTTP header to specify the user name and password; for details, see https://en.wikipedia.org/wiki/Basic_access_authentication.
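For illustration, both the pseudo-auth and BA mechanisms can be exercised from a shell. The host, port, and credentials below are placeholders for this sketch, not values from this document:

```shell
# Hypothetical Job History Server endpoint; substitute your own host and port.
JHS="http://jhs.example.com:19888"

# Pseudo auth: the caller's identity rides in the user.name query string parameter.
echo "GET ${JHS}/ws/v1/history?user.name=hdfs"

# BA authentication: curl's -u flag sends an "Authorization: Basic" header whose
# value is the base64 encoding of "username:password".
CRED=$(printf 'user:pass' | base64)
echo "Authorization: Basic ${CRED}"
```

In live use, `curl -u your-username:your-password "$JHS/ws/v1/history"` constructs the same header for you.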
Task 1: Configure Pepperdata to Monitor Application History
Procedure
1. Download a copy of your existing cluster-level Pepperdata configuration file, `pepperdata-config.sh`, from the environment's cluster configuration folder (in the cloud) to a location where you can edit it.

2. Open the file for editing, and revise it as follows.

   - Change the value of `PD_JOBHISTORY_MONITOR_ENABLED` to `1`.

   - To enable Spark application monitoring, add the configuration according to your environment.

     Note: If you're using Application Profiler to fetch history data for Spark apps, you can customize the connection timeout value and/or add a second Spark History Server for monitoring. See Configure Spark History Servers.

     - If the `spark-defaults.conf` file contains the correct assignment for `spark.yarn.historyServer.address` for the first (or only) Spark History Server, configure the `SPARK_CONF_DIR` environment variable to match:

       `export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory`

       where `your-path-to-first-spark-conf-directory` is the directory that contains the `spark-defaults.conf` file.

     - If the `spark-defaults.conf` file does not include `spark.yarn.historyServer.address` (or its value is incorrect), and you can edit the `spark-defaults.conf` file:

       1. Edit the `spark-defaults.conf` file so that it includes the correct assignment for `spark.yarn.historyServer.address` for the first Spark History Server.
       2. In the `pepperdata-config.sh` file, configure the `SPARK_CONF_DIR` environment variable to match: `export SPARK_CONF_DIR=your-path-to-first-spark-conf-directory`, where `your-path-to-first-spark-conf-directory` is the directory that contains the `spark-defaults.conf` file.

     - For all other cases, edit the `pepperdata-config.sh` file to include the `PD_SPARK_HISTORY_SERVER_ADDRESS` environment variable, and set its value to the first (or only) Spark History Server's fully qualified URL.

   Example of modifications to the `pepperdata-config.sh` file:

   ```
   export PD_JOBHISTORY_MONITOR_ENABLED=1

   # For Spark Application Monitoring
   export SPARK_CONF_DIR=your-path-to-spark-conf-directory

   # Or, if the spark-defaults.conf file does not contain the correct assignment
   # for spark.yarn.historyServer.address, and you cannot edit it:
   # export PD_SPARK_HISTORY_SERVER_ADDRESS=http(s)://url-to-your-spark-history:port
   ```

3. Save your changes and close the file.

4. Upload the revised file to the cluster configuration folder to overwrite the original `pepperdata-config.sh` file.
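Before pointing `SPARK_CONF_DIR` at a directory, it can help to confirm that its `spark-defaults.conf` actually assigns `spark.yarn.historyServer.address`. A minimal sketch, using a throwaway `/tmp` directory as a stand-in for your real Spark conf directory:

```shell
# Stand-in conf directory for illustration; use your real Spark conf path instead.
SPARK_CONF_DIR=/tmp/example-spark-conf
mkdir -p "$SPARK_CONF_DIR"
printf 'spark.yarn.historyServer.address http://shs.example.com:18080\n' \
  > "$SPARK_CONF_DIR/spark-defaults.conf"

# If this prints a line, the assignment is present, so SPARK_CONF_DIR can match;
# if it prints nothing, fall back to PD_SPARK_HISTORY_SERVER_ADDRESS instead.
grep '^spark.yarn.historyServer.address' "$SPARK_CONF_DIR/spark-defaults.conf"
```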
Task 2: (Kerberized Clusters) Configure HTTP/HTTPS Endpoint Authentication
If the core services of the ResourceManagers and the MapReduce Job History Server are Kerberized (secured with Kerberos), add the authentication type for the auxiliary HTTP/HTTPS endpoint service to the Pepperdata configuration file, `pepperdata-config.sh`.

Note: Even when the core services use `kerberos` authentication, the auxiliary services, such as HTTP/HTTPS, can use either `simple` or `kerberos` authentication.

Prerequisites
- Be sure that the Kerberos principal has access to the ResourceManager and MapReduce Job History Server endpoints (HTTP or HTTPS).
- Be sure that you added the required environment variables, `PD_AGENT_PRINCIPAL` and `PD_AGENT_KEYTAB_LOCATION`, to the Pepperdata configuration during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication).
Procedure
1. For your Kerberized ResourceManager host, determine its authentication type by logging in to the host and running the following cURL command, where `{your-protocol}` is `http` or `https`:

   ```
   curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | grep WWW-Authenticate
   ```

   - If the returned response is `WWW-Authenticate: Negotiate`, the authentication type (`your-rm-auth-type`) is `kerberos`.
   - Otherwise, nothing is returned, and the authentication type (`your-rm-auth-type`) is `simple`.

2. For your Kerberized MapReduce Job History Server host, determine its authentication type by logging in to the host and running the following cURL command, where `{your-protocol}` is `http` or `https`:

   ```
   curl --tlsv1.2 -kI {your-protocol}://JHS_HOST:PORT/ws/v1/history | grep WWW-Authenticate
   ```

   - If the returned response is `WWW-Authenticate: Negotiate`, the authentication type (`your-jhs-auth-type`) is `kerberos`.
   - Otherwise, nothing is returned, and the authentication type (`your-jhs-auth-type`) is `simple`.

3. For your Kerberized YARN Timeline Server host, determine its authentication type by logging in to the host and running the following cURL command, where `{your-protocol}` is `http` or `https`:

   ```
   curl --tlsv1.2 -kI {your-protocol}://TIMELINE_SERVER_HOST:PORT/ws/v1/timeline | grep WWW-Authenticate
   ```

   - If the returned response is `WWW-Authenticate: Negotiate`, the authentication type (`your-timeline-server-auth-type`) is `kerberos`.
   - Otherwise, nothing is returned, and the authentication type (`your-timeline-server-auth-type`) is `simple`.
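The decision rule in the steps above can be sketched as a small shell helper. The function name and the captured-header demo are illustrative, not part of the Pepperdata tooling:

```shell
# Reads HTTP response headers on stdin; prints "kerberos" if the endpoint
# challenged with "WWW-Authenticate: Negotiate", otherwise prints "simple".
auth_type_from_headers() {
  if grep -q 'WWW-Authenticate: Negotiate'; then
    echo kerberos
  else
    echo simple
  fi
}

# Demo with captured headers. In live use, pipe the cURL output instead:
#   curl --tlsv1.2 -kI {your-protocol}://RM_HOST:PORT/ws/v1/cluster/info | auth_type_from_headers
printf 'HTTP/1.1 401 Unauthorized\nWWW-Authenticate: Negotiate\n' | auth_type_from_headers
```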
4. On the MapReduce Job History Server, add the environment variables for the HTTP/HTTPS endpoint's authentication type for the ResourceManager and the MapReduce Job History Server.

   1. Log in to the MapReduce Job History Server host, and download a copy of its existing Pepperdata configuration file, `pepperdata-config.sh`, to a location where you can edit it.

   2. Open the file for editing, and add the required environment variables. Be sure to substitute the authentication type (`simple` or `kerberos`, as you determined in the previous steps) for the `your-rm-auth-type`, `your-jhs-auth-type`, and `your-timeline-server-auth-type` placeholders.

      ```
      # For ResourceManager:
      export PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=your-rm-auth-type

      # For MapReduce Job History Server:
      export PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=your-jhs-auth-type
      ```

   3. Save your changes and close the file.

   4. Upload the revised file to overwrite the original `pepperdata-config.sh` file.
5. Revise the Pepperdata configuration that is used for future hosts.

   1. Download a copy of your existing cluster-level Pepperdata configuration file, `pepperdata-config.sh`, from the environment's cluster configuration folder (in the cloud) to a location where you can edit it.

   2. Open the file for editing, and add the required environment variables. Be sure to substitute the authentication type (`simple` or `kerberos`, as you determined in the previous steps) for the `your-rm-auth-type`, `your-jhs-auth-type`, and `your-timeline-server-auth-type` placeholders.

      ```
      # For ResourceManager:
      export PD_JOBHISTORY_RESOURCE_MANAGER_HTTP_AUTH_TYPE=your-rm-auth-type

      # For MapReduce Job History Server:
      export PD_JOBHISTORY_MR_HISTORY_SERVER_HTTP_AUTH_TYPE=your-jhs-auth-type
      ```

   3. Save your changes and close the file.

   4. Upload the revised file to overwrite the original cluster-level Pepperdata configuration file, `pepperdata-config.sh`.
6. (YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distros such as EMR 6.x), add authentication properties to the Pepperdata configuration to enable REST access.

   Note: If you already configured the authentication properties during the installation process, you do not need to do so again, and you should skip this procedure now.

   1. Log in to the ResourceManager host, and download a copy of the host's existing Pepperdata site file, `pepperdata-site.xml`, from the environment's cluster configuration folder (in the cloud) to a location where you can edit it.

   2. Open the file for editing, and add the required properties. Be sure to substitute your HTTP service policy, `HTTP_ONLY` or `HTTPS_ONLY`, for the `your-http-service-policy` placeholder in the following code snippet.

      For Kerberized clusters, the HTTP service policy is usually `HTTPS_ONLY`. But you should check with your cluster administrator or look for the value of the `yarn.http.policy` property in the cluster's `yarn-site.xml` file or the Hadoop configuration.

      ```
      <property>
        <name>pepperdata.agent.yarn.http.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>pepperdata.agent.yarn.http.policy</name>
        <value>your-http-service-policy</value>
      </property>
      ```

      Note: Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as `xmllint`, after you edit any .xml configuration file.

   3. Save your changes and close the file.

   4. Upload the revised file to overwrite the original `pepperdata-site.xml` file.
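Putting the `xmllint` recommendation into practice might look like the following sketch; the `/tmp` file is a stand-in written only for illustration, not your real `pepperdata-site.xml`:

```shell
# Write a stand-in properties snippet to lint (illustrative path and content).
cat > /tmp/pepperdata-site-check.xml <<'EOF'
<configuration>
  <property>
    <name>pepperdata.agent.yarn.http.authentication.type</name>
    <value>kerberos</value>
  </property>
</configuration>
EOF

# xmllint exits non-zero and prints the parse error if the XML is malformed;
# run it against your edited file before uploading.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout /tmp/pepperdata-site-check.xml && echo "well-formed"
fi
```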
Task 3: (Basic Access Authentication) Add BA Authentication Credentials
For Basic access (BA) authentication, add the BA authentication credentials for the monitored applications’ servers to the Pepperdata configuration.
Procedure
1. Log in to the MapReduce Job History Server host.

2. Download a copy of its existing Pepperdata configuration file, `pepperdata-config.sh`, to a location where you can edit it.

3. Open the file for editing, and add the required environment variables. Be sure to substitute your user name and password for the `your-username` and `your-password` placeholders. (The same environment variables are used to configure the BA authentication credentials for the ResourceManager, MapReduce Job History Server, and YARN Timeline Server.)

   ```
   # For ResourceManager, MapReduce Job History Server, YARN Timeline Server
   export PD_AGENT_SIMPLE_OR_BASIC_AUTH_USERNAME=your-username
   export PD_AGENT_BASIC_AUTH_PASSWORD=your-password

   # For Spark History Server
   export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_USERNAME=your-username
   export PD_JOBHISTORY_SPARK_HISTORY_BASIC_AUTH_PASSWORD=your-password
   ```

4. Save your changes and close the file.

5. Upload the revised file to overwrite the original `pepperdata-config.sh` file.
Task 4: Access the Application Profiler on the Pepperdata Dashboard
The Application Profiler interface is integrated into the Pepperdata Dashboard.
The Applications and Recommendations sections of the dashboard Cluster View show pertinent data for every application that is monitored by Application Profiler.
- To view detailed metrics of a highlighted application, click its link in a tile.
- To view the table of applications that have Pepperdata recommendations of a given severity, click that severity in the applicable Recommendations tile (for the app type).
- To view data for all monitored applications, use the left-nav menu, and select App Spotlight > Application Profiler.
Post Requisite: (Hadoop 2) Confirm Near Real-Time Data Collection
After you finish configuring and customizing the cluster and bootstrapping it, you can confirm that Application Profiler is correctly configured for near real-time monitoring in Hadoop 2 (for example, in Amazon EMR 5.x or Google Dataproc 1.3–1.5) by viewing the data collection process stats (MapReduce Job History Server retrieval) at the following URL. Be sure to replace the `your-jobhistory-server-host` placeholder with the host name of your actual MapReduce Job History Server.

```
http://your-jobhistory-server-host:50505/JobHistoryMonitor
```
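A quick shell sketch for building and checking that URL; the host value is the placeholder from above, and the `curl` line is commented out because it only succeeds on a live cluster:

```shell
# Substitute your real Job History Server host for the placeholder.
JHS_HOST=your-jobhistory-server-host
MONITOR_URL="http://${JHS_HOST}:50505/JobHistoryMonitor"
echo "$MONITOR_URL"

# On a live cluster, fetch the data collection stats page (uncomment to run):
# curl -sf "$MONITOR_URL"
```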