Configure Pepperdata Logs Retention and Disk Usage (Parcel)

By default, Pepperdata sets a disk usage cap (PD_MAX_LOG_DIR_SIZE) of 5 GB as the maximum size for its accumulated metrics and message log files. So long as this cap is not reached, Pepperdata retains log files for seven (7) days (PD_MAX_LOG_AGE_DAYS) before deleting them, and the Pepperdata Collector (the pepcollectd daemon) uploads data that is up to seven (7) days old (PD_LOG_PROC_MAX_AGE_DAYS). When the disk usage cap is reached, Pepperdata deletes enough log files, starting with the oldest ones, to reduce the disk usage to less than the cap.

The age caps limit how long log files remain eligible for uploading and when they become eligible for deletion. They can matter for business requirements such as retaining sensitive files for a given amount of time, or for custom processing. To control disk usage, however, the PD_MAX_LOG_DIR_SIZE size cap is the setting to focus on.
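The deletion order described above can be sketched as a dry-run script. This is an illustration only, not Pepperdata's implementation. The function name files_to_remove is hypothetical, a single age cap is applied, file age is judged by modification time (an assumption), and GNU find is required for -printf. The function lists the files older than the age cap, then the oldest remaining files until the total size falls below the size cap. It deletes nothing.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the retention policy described in this section.
# Hypothetical helper, not Pepperdata code. Requires GNU find (-printf).
# Assumes file names do not contain newlines.

# files_to_remove DIR MAX_AGE_DAYS MAX_BYTES
# Prints, oldest first, the files the described policy would remove.
files_to_remove() {
  local dir=$1 max_age_days=$2 max_bytes=$3
  local now cutoff
  now=$(date +%s)
  cutoff=$(( now - max_age_days * 86400 ))

  # Emit "mtime size path" per file and sort by mtime, oldest first.
  find "$dir" -type f -printf '%T@ %s %p\n' | sort -n |
    awk -v cutoff="$cutoff" -v cap="$max_bytes" '
      {
        mtime = $1 + 0
        size = $2 + 0
        # The path is everything after the first two fields (it may contain spaces).
        path = substr($0, length($1) + length($2) + 3)
      }
      # Age cap: files older than the cutoff are always listed.
      mtime < cutoff { print path; next }
      # Remember the remaining files and their combined size.
      { kept[++n] = path; ksize[n] = size; total += size }
      END {
        # Size cap: list the oldest remaining files until usage is below the cap.
        for (i = 1; i <= n && total >= cap; i++) {
          print kept[i]
          total -= ksize[i]
        }
      }'
}

# Example invocation with the default values from this section:
#   files_to_remove /var/log/pepperdata 7 5368709120
```

Running the function against a copy of the log directory with a candidate PD_MAX_LOG_DIR_SIZE value gives a rough idea of how much history would survive under that cap.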

To override the default disk usage cap or log retention policies, add any of the following environment variables to the Pepperdata configuration. For RPM/DEB-based installations, add them to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh. For Parcel-based installations managed through Cloudera Manager, add them to the appropriate Cloudera Manager template; see the procedure for details.

  • PD_LOG_DIR: (default=/var/log/pepperdata) Directory to which Pepperdata writes its log files.

  • PD_MAX_LOG_DIR_SIZE: (default=5368709120, which is 5 GB) Size cap (maximum total size), in bytes, of all the log files in the directory specified by the PD_LOG_DIR environment variable.

    When the PepAgent (the pepagentd daemon) starts, it verifies that there is sufficient capacity on the partition where PD_LOG_DIR is located. If the capacity is less than PD_MAX_LOG_DIR_SIZE, the PepAgent will not start.

  • PD_MAX_LOG_AGE_DAYS: (default=the value of PD_LOG_PROC_MAX_AGE_DAYS) Number of days a log file is retained before Pepperdata deletes it.

  • PD_LOG_PROC_MAX_AGE_DAYS: (default=7) Maximum age of a log file that the Pepperdata Collector (the pepcollectd daemon) will upload to the Pepperdata dashboard.

    If you lose connectivity to Pepperdata for longer than the PD_LOG_PROC_MAX_AGE_DAYS value, pepcollectd cannot upload any log file that passes that age during the outage, and that file’s data is lost.

  • PD_ARCHIVE_DIR: (no default) Directory in which to archive old log files instead of deleting them when they exceed the maximum age (the PD_LOG_PROC_MAX_AGE_DAYS environment variable value). Not applicable unless the PD_CLEAN_LOG_DIR environment variable is enabled (its value set to 1).

  • PD_CLEAN_LOG_DIR: (default=1/enabled) Enables or disables Pepperdata’s cleaning (deleting or archiving) of its log files.
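Because PD_MAX_LOG_DIR_SIZE is expressed in bytes, it is easy to mistype. The following sketch converts a size to bytes and prints entries in NAME=value form. The helper name gib_to_bytes and the example values (a 10 GB cap and 14-day age caps) are hypothetical, chosen only for illustration. The conversion uses 1024 x 1024 x 1024 bytes per GB, which matches the documented default (5 GB = 5368709120).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a whole number of GB (counted as 1024^3 bytes,
# matching the documented default of 5368709120 for 5 GB) to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example values only; choose values that fit your own requirements.
echo "PD_MAX_LOG_DIR_SIZE=$(gib_to_bytes 10)"   # prints PD_MAX_LOG_DIR_SIZE=10737418240
echo "PD_LOG_PROC_MAX_AGE_DAYS=14"
echo "PD_MAX_LOG_AGE_DAYS=14"
```

The printed lines are in the same NAME=value form that the Pepperdata configuration expects for these variables.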

Procedure

  1. Add the environment variables that you want to configure.

    1. Use Cloudera Manager to add any of the disk usage environment variables to the Pepperdata > Configuration > Pepperdata Service Environment Advanced Configuration Snippet (Safety Valve) template.

      Add the environment variables in the following format. Be sure to replace THE-VARIABLE-NAME and the-variable-value with the actual environment variable’s name and value.

      THE-VARIABLE-NAME=the-variable-value
      
    2. (Only for the PD_LOG_DIR environment variable) Use Cloudera Manager to revise the value of the Pepperdata > Configuration > Pepperdata log base directory configuration.

    3. (Only if you changed the PD_LOG_DIR environment variable) In Cloudera Manager, navigate to your cluster’s YARN (MR2 Included) service > Instances, select all ResourceManager and NodeManager hosts, and from the Actions for Selected menu, select Restart.

  2. Restart the Pepperdata services.

    In Cloudera Manager, select the Restart action for the Pepperdata service.
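Because the PepAgent does not start when the partition that holds PD_LOG_DIR has less capacity than PD_MAX_LOG_DIR_SIZE, it can help to check each host before performing the restart. The following is a minimal sketch, not a Pepperdata tool. The function name has_capacity is hypothetical, and it compares the cap against the space that df reports as available, which is an assumption about what "capacity" means in the PepAgent's own check.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restart check; not part of Pepperdata.

# has_capacity DIR REQUIRED_BYTES
# Succeeds when the partition holding DIR reports at least REQUIRED_BYTES available.
has_capacity() {
  local dir=$1 required_bytes=$2 avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # the fourth column of the second line is the available space.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ $(( avail_kb * 1024 )) -ge "$required_bytes" ]
}

# Fall back to the documented defaults when the variables are not set.
log_dir=${PD_LOG_DIR:-/var/log/pepperdata}
cap=${PD_MAX_LOG_DIR_SIZE:-5368709120}

if has_capacity "$log_dir" "$cap"; then
  echo "OK: $log_dir has at least $cap bytes available"
else
  echo "WARNING: $log_dir is missing or has less than $cap bytes available"
fi
```

If the check warns, either lower PD_MAX_LOG_DIR_SIZE or point PD_LOG_DIR at a larger partition before restarting.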