Monitoring Pepperdata

Pepperdata processes publish stats through the ResourceManager’s JMX service and write detailed operations information to log files. You can view all the published stats in a Web browser, and you can read the log files as you would any other program’s logged information.

Supervisor Stats

The Supervisor—the schedule instrumentation enabler that runs in the ResourceManager—reports stats via the ResourceManager’s JMX service (typically accessible at http://<resourcemanagerhost>:8088/jmx). The Pepperdata stats are under the name hadoop:service=PepperdataSupervisor,name=PepperdataSupervisorInfo, and include the following:

WorkerQueue
Statistics for the Pepperdata worker thread inside the ResourceManager that schedules the running tasks.

RpcSends
Counters showing attempts, failures, successes, and usage for RPC messages sent by the Supervisor.

JobGroups
Content of the job groups.

JobGroupUsages
Counters showing attempts, failures, successes, and usage for each job group. The usage counter indicates the number of tasks that use the job group.
The counters are reset whenever the Supervisor is restarted or refreshed.

SelfTestRuns
Counters for self-test runs: the total number of runs with their overall successes and failures, plus the attempts, failures, and successes of each individual self-test component.
Because self-tests run only on startup or refresh, these numbers should be small.
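
The stats listed above can also be read from a script rather than a browser. The following is a minimal sketch, assuming the ResourceManager's /jmx endpoint returns the usual Hadoop JSON layout (a top-level "beans" array whose entries carry a "name" field) and accepts a qry parameter to filter by bean name. The helper names and the host rm.example.com are hypothetical. If the filtered query returns nothing, check the exact bean name on the unfiltered /jmx page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Bean name as given in this document.
SUPERVISOR_BEAN = "hadoop:service=PepperdataSupervisor,name=PepperdataSupervisorInfo"


def jmx_url(host, port=8088, bean=SUPERVISOR_BEAN, protocol="http"):
    """Build a JMX URL that requests only the given bean (assumes ?qry= is supported)."""
    return f"{protocol}://{host}:{port}/jmx?" + urlencode({"qry": bean})


def find_bean(payload, bean=SUPERVISOR_BEAN):
    """Return the first bean dict whose name matches (ignoring case), or None."""
    for entry in payload.get("beans", []):
        if entry.get("name", "").lower() == bean.lower():
            return entry
    return None


def fetch_supervisor_stats(host, port=8088, timeout=10):
    """Fetch and parse the Supervisor bean from a live ResourceManager."""
    with urlopen(jmx_url(host, port), timeout=timeout) as response:
        return find_bean(json.load(response))
```

For example, fetch_supervisor_stats("rm.example.com") would return a dictionary in which attributes such as WorkerQueue or JobGroupUsages can be looked up by key, provided the bean is published under that name.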

Pepperdata Log Files

By default, the Supervisor and PepAgent agents write metrics data to /var/log/pepperdata/metrics. (You can customize the location by changing the PD_LOG_DIR environment variable; see Configure Pepperdata Logs Retention and Disk Usage.) Because the files in this directory typically grow quite large (up to several GB per week), we recommend that you remove logs that are more than a month old.
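
The cleanup recommended above can be scripted. The following is a minimal sketch, assuming that "over a month old" means a modification time more than 30 days in the past and that only files ending in .proto.bz2 should be touched. The function names are hypothetical, and the default is a dry run that deletes nothing.

```python
import time
from pathlib import Path


def find_old_logs(log_dir, max_age_days=30, pattern="*.proto.bz2", now=None):
    """Return matching files under log_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        path for path in Path(log_dir).rglob(pattern)
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def remove_old_logs(log_dir, max_age_days=30, dry_run=True, pattern="*.proto.bz2"):
    """Delete old metrics files; with dry_run=True only report what would be removed."""
    old = find_old_logs(log_dir, max_age_days, pattern)
    if not dry_run:
        for path in old:
            path.unlink()
    return old
```

Calling remove_old_logs("/var/log/pepperdata/metrics") lists the candidates first; pass dry_run=False only after reviewing that list.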

The files contain Google Protocol Buffers (protobuf) data compressed with bzip2, and the file names end with .proto.bz2. One way to view a log file’s content is to use the Pepperdata print_proto_metrics_file utility. For example:

/opt/pepperdata/supervisor/print_proto_metrics_file pd_task_metrics.20140227_131948_601-1393535988601-pep08.pepperdata.com.proto.bz2
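
Because the files are bzip2 streams, Python's standard bz2 module can confirm that a file decompresses cleanly and report its uncompressed size before you pass it to print_proto_metrics_file. This is only a sketch with a hypothetical function name. It does not decode the protobuf messages, because the message schema is not described here.

```python
import bz2


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Decompress a .bz2 file in chunks and return the total number of bytes."""
    total = 0
    with bz2.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

A truncated or corrupt file raises an exception during the read instead of returning a size.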

Pepperdata Status Views via Web Servlets

Pepperdata PepAgent and Supervisor agents provide near real-time monitoring views via Web (HTTP/HTTPS) servlets. (To enable SSL support, see Configure SSL Near Real-Time Monitoring on Ports 50510 and 50505.)

If you do not want the Pepperdata Status Views to be available to all your Pepperdata users, you can disable the views; see Disable Pepperdata Status Views.

ResourceManager Views

To monitor an area of interest, use a Web browser to access the associated servlet. In the following URLs, change hostname to the hostname of the ResourceManager, and replace {your-protocol} with http or https as applicable.

  • Controls: {your-protocol}://hostname:50510/Controls. For each task running in the cluster, shows the current control limits for each resource that the Supervisor sends to the PepAgent in control of the task.

  • Job Groups: {your-protocol}://hostname:50510/JobGroups. Shows the active job group settings.

  • Cluster Status: {your-protocol}://hostname:50510/Cluster. Shows basic status of the PepAgent cluster, as viewed by the ResourceManager.

  • Allocation: {your-protocol}://hostname:50510/Allocation. Shows the CPU and memory allocations on all NodeManagers.

  • RPC: {your-protocol}://hostname:50510/Rpc. Describes metrics related to the communication between the Supervisor and PepAgents.
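
The URL pattern for the ResourceManager views can be captured in a small helper. This is a minimal sketch: the function name and the host rm.example.com are hypothetical, while the view paths and port 50510 are those listed above.

```python
# Servlet paths on the ResourceManager, as listed in this section.
RM_VIEWS = ("Controls", "JobGroups", "Cluster", "Allocation", "Rpc")


def resource_manager_view_urls(hostname, protocol="http", port=50510):
    """Return a dict mapping each view name to its full URL."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    return {view: f"{protocol}://{hostname}:{port}/{view}" for view in RM_VIEWS}
```

For instance, resource_manager_view_urls("rm.example.com", "https")["Rpc"] evaluates to https://rm.example.com:50510/Rpc.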

NodeManager Views

For the following PepAgent servlet, change hostname to the hostname of any NodeManager that is running a PepAgent. (In most cases, PepAgent runs on all the NodeManagers in a cluster.)

  • Task Metrics: {your-protocol}://hostname:50505/TaskMetrics. Reports resources used by all running containers on the associated NodeManager host, including memory, CPU, and Hadoop file counters.
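
Because PepAgent usually runs on every NodeManager, it can be convenient to check the Task Metrics view on several hosts at once. The following is a minimal sketch with hypothetical function names, assuming that an HTTP 200 response means the view is being served. It does not parse the page content, whose format is not described here.

```python
from urllib.request import urlopen


def task_metrics_url(hostname, protocol="http", port=50505):
    """Build the Task Metrics URL for one NodeManager host."""
    return f"{protocol}://{hostname}:{port}/TaskMetrics"


def check_task_metrics(hostnames, protocol="http", timeout=5, opener=urlopen):
    """Map each hostname to True if its Task Metrics view answered with HTTP 200."""
    results = {}
    for host in hostnames:
        try:
            with opener(task_metrics_url(host, protocol), timeout=timeout) as response:
                results[host] = response.status == 200
        except OSError:
            # URLError and HTTPError are subclasses of OSError, as are timeouts.
            results[host] = False
    return results
```

The opener parameter exists only so the function can be exercised without a network; in normal use, leave it at its default.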