Run Pepperdata as a Non-Root User (Cloud)

If your organization requires that everything be run under the principle of least privilege (PoLP), you can run Pepperdata as a non-root user—a user who lacks root access to the cluster hosts. To change from the default root user to another user, stop the Pepperdata services, remove the default log directory, change the PD_USER variable in the Pepperdata configuration file, pepperdata-config.sh, and restart the Pepperdata services.

Important: Collecting some I/O, CPU, and network metrics requires privileged access. Therefore, when you run Pepperdata as a non-root user, Pepperdata cannot collect those metrics.

Uncollected Metrics with a Non-Root User

When you run Pepperdata as a non-root user, some I/O, CPU, and network metrics are not collected. The following table lists the uncollected metrics by their display name in the dashboard and by the underlying variable name.

Display Name	Variable Name
cpu stat nice max	`t_rscpunmx`
cpu stat nice min	`t_rscpunmn`
File descriptor stat file descriptor count	`t_rscfdc`
I/O stat requested read bytes	`t_rsciorb`
I/O stat requested read syscalls	`t_rscioscr`
I/O stat requested syscall writes	`t_rscioscw`
I/O stat storage write bytes	`t_rsciowb`
I/O stat requested write bytes	`t_rsciowc`
socket stat number of sockets	`t_rscsss`

Reconfigure the Pepperdata User

The Pepperdata user is set by the PD_USER variable in the Pepperdata configuration file, pepperdata-config.sh.

Procedure

Ensure that the /var/log/pepperdata directory has the necessary access privileges for the new non-root user.
- If you have already been running Pepperdata as the root user, and you want to retain the logging data so that there are no gaps in metrics coverage, change the ownership of the existing log files so that the new, non-root user can access them.
  
  Be sure to replace the new-pd-user placeholder with your actual user name.
  
  find /var/log/pepperdata -user root -exec chown new-pd-user {} \;
- If you do not want to retain the logging data, remove the log directory. When Pepperdata starts and this directory does not exist, it creates the directory with the necessary privileges.
  
  rm -r /var/log/pepperdata
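If you kept the log data, it can be useful to confirm that nothing under the log directory is still owned by root before you restart Pepperdata. The following is a minimal sketch only; `count_owned_by` is a hypothetical helper name for illustration, not a Pepperdata command.

```shell
# count_owned_by OWNER DIR
# Hypothetical helper: prints how many entries under DIR (including DIR
# itself) are owned by OWNER, given as a user name or numeric ID.
count_owned_by() {
  cob_owner="$1"
  cob_dir="$2"
  find "$cob_dir" -user "$cob_owner" | wc -l | tr -d ' '
}

# Example (not run here): after the ownership change, a result of 0
# means that no root-owned entries remain.
# count_owned_by root /var/log/pepperdata
```

A nonzero result after the ownership change suggests that some files were missed, for example because the command was run without sufficient privileges.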
In your cloud environment (such as GCP or AWS), reconfigure the Pepperdata user.
1. From the environment’s cluster configuration folder (in the cloud), download the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh, to a location where you can edit it.
2. Open the file for editing, find the PD_USER environment variable, and change its value.
  
  Be sure to replace the new-pd-user placeholder with your actual user name.
```
export PD_USER=new-pd-user
```
3. Save your changes and close the file.
4. Upload the revised file to overwrite the original pepperdata-config.sh file.
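If you prefer to script the edit rather than open the file by hand, the change to the downloaded copy might look like the following sketch. It assumes that the variable appears on a line beginning with `export PD_USER=`, as in the example above; `set_pd_user` is a hypothetical helper name, and the download and upload steps are left to your cloud tooling.

```shell
# set_pd_user FILE USER
# Hypothetical helper: rewrites the PD_USER export line in a local copy
# of pepperdata-config.sh, or appends the line if it is not present.
set_pd_user() {
  spu_cfg="$1"
  spu_user="$2"
  if grep -q '^export PD_USER=' "$spu_cfg"; then
    # Write to a temporary file first so the original is replaced only
    # when sed succeeds.
    sed "s|^export PD_USER=.*|export PD_USER=${spu_user}|" "$spu_cfg" > "${spu_cfg}.tmp" &&
      mv "${spu_cfg}.tmp" "$spu_cfg"
  else
    printf 'export PD_USER=%s\n' "$spu_user" >> "$spu_cfg"
  fi
}

# Example (not run here), with new-pd-user as a placeholder:
# set_pd_user ./pepperdata-config.sh new-pd-user
```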
If there are no already-running hosts with Pepperdata, you are done with this procedure. Do not perform the remaining steps.
Open a command shell (terminal session) and log in to any already-running host as a user with sudo privileges.

Important: You can begin with any host on which Pepperdata is running, but be sure to repeat this step (logging in), the next step (copying the bootstrap file), and the step after that (loading the revised Pepperdata configuration) on every already-running host.
From the command line, copy the Pepperdata bootstrap script that you extracted from the Pepperdata package from the location where you stored it to any location on the host; the steps in this procedure use /tmp.
- For Amazon EMR clusters:
  
  aws s3 cp s3://<pd-bootstrap-script-from-install-packages> /tmp/bootstrap
- For Google Dataproc clusters:
  
  sudo gsutil cp gs://<pd-bootstrap-script-from-install-packages> /tmp/bootstrap
Load the revised configuration by running the Pepperdata bootstrap script.
- For EMR clusters:
  - You can use the --long-options form of the --bucket, --upload-realm, and --is-running arguments as shown or their -short-option equivalents, -b, -u, and -r.
  - The --is-running (-r) option is required for bootstrapping an already-running host prior to Supervisor version 7.0.13.
  - Optionally, you can specify a proxy server for the AWS Command Line Interface (CLI) and Pepperdata-enabled cluster hosts.
    
    Include the --proxy-address argument when running the Pepperdata bootstrap script, specifying its value as a fully-qualified host address that uses https protocol.
  - If you’re using a non-default EMR API endpoint (by using the --endpoint-url argument), include the --emr-api-endpoint argument when running the Pepperdata bootstrap script. Its value must be a fully-qualified host address. (It can use http or https protocol.)
  - If you are using a script from an earlier Supervisor version that has the --cluster or -c argument instead of the --upload-realm or -u argument (which was introduced in Supervisor v6.5), you can continue using the script and its old arguments. They are backward compatible.
  - Optionally, you can override the default exponential backoff and jitter retry logic for the describe-cluster command that the Pepperdata bootstrapping uses to retrieve the cluster’s metadata.
    
    Specify either or both of the following options among the bootstrap script's optional arguments. Be sure to substitute your values for the <my-retries> and <my-timeout> placeholders that are shown in the command.
    - --max-retry-attempts: (default=10) Maximum number of retry attempts to make after the initial describe-cluster call.
    - --max-timeout: (default=60) Maximum number of seconds to wait before the next retry call to describe-cluster. The actual wait time for a given retry is a random number from 1 to the calculated timeout (inclusive), which introduces the desired jitter.
```
# For Supervisor versions before 7.0.13:
sudo bash /tmp/bootstrap --bucket <bucket-name> --upload-realm <realm-name> --is-running [--proxy-address <proxy-url:proxy-port>] [--emr-api-endpoint <endpoint-url:endpoint-port>] [--max-retry-attempts <my-retries>] [--max-timeout <my-timeout>]
   
# For Supervisor versions 7.0.13 and later:
sudo bash /tmp/bootstrap --bucket <bucket-name> --upload-realm <realm-name> [--proxy-address <proxy-url:proxy-port>] [--emr-api-endpoint <endpoint-url:endpoint-port>] [--max-retry-attempts <my-retries>] [--max-timeout <my-timeout>]
```
- For Dataproc clusters:
  
  sudo bash /tmp/bootstrap <bucket-name> <realm-name>
The script finishes with a Pepperdata installation succeeded message.
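For EMR hosts, whether to pass --is-running depends on the Supervisor version, as noted above. If you script this step across clusters with different versions, the choice could be sketched as follows. The function names are hypothetical, the version string is something you supply yourself, and the comparison relies on `sort -V` from GNU coreutils.

```shell
# needs_is_running VERSION
# Hypothetical helper: succeeds when VERSION is earlier than 7.0.13,
# the first Supervisor version that does not require --is-running.
needs_is_running() {
  nir_ver="$1"
  [ "$nir_ver" != "7.0.13" ] &&
    [ "$(printf '%s\n' "$nir_ver" "7.0.13" | sort -V | head -n 1)" = "$nir_ver" ]
}

# emr_extra_flag VERSION
# Prints --is-running for versions before 7.0.13, otherwise nothing.
emr_extra_flag() {
  if needs_is_running "$1"; then
    echo "--is-running"
  fi
}

# Example (not run here): build the EMR command for a given version.
# sudo bash /tmp/bootstrap --bucket <bucket-name> --upload-realm <realm-name> $(emr_extra_flag 7.0.12)
```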
Repeat the previous three steps (logging in, copying the bootstrap script, and running it) on every already-running host in your cluster.
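To give a feel for how the --max-retry-attempts and --max-timeout options for EMR interact, the sketch below illustrates capped exponential backoff with jitter. This page does not state how the bootstrap script calculates the timeout for each retry, so the doubling rule here (2 to the power of the attempt number, capped at the maximum timeout) is an assumption, and both function names are hypothetical. `RANDOM` is a bash feature.

```shell
# capped_timeout ATTEMPT MAX_TIMEOUT
# Assumed rule (not confirmed by Pepperdata): 2^ATTEMPT seconds,
# capped at MAX_TIMEOUT.
capped_timeout() {
  ct_value=$(( 1 << $1 ))
  if [ "$ct_value" -gt "$2" ]; then
    ct_value="$2"
  fi
  echo "$ct_value"
}

# jittered_wait ATTEMPT MAX_TIMEOUT
# Prints a random whole number of seconds from 1 to the capped timeout,
# inclusive.
jittered_wait() {
  jw_cap=$(capped_timeout "$1" "$2")
  echo $(( RANDOM % jw_cap + 1 ))
}

# Example (not run here): show possible waits for the default settings.
# for attempt in 1 2 3 4 5 6 7 8 9 10; do jittered_wait "$attempt" 60; done
```

Under this assumed rule, lowering --max-timeout shortens the longest possible wait per retry, while lowering --max-retry-attempts limits how many times the describe-cluster call is retried.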