Configure Pepperdata Logs Retention and Disk Usage (RPM/DEB)

By default, Pepperdata sets a disk usage cap (PD_MAX_LOG_DIR_SIZE) of 5 GB as the maximum size for its accumulated metrics and message log files. So long as this cap is not reached, Pepperdata retains log files for seven (7) days (PD_MAX_LOG_AGE_DAYS) before deleting them, and the Pepperdata Collector (the pepcollectd daemon) uploads data that is up to seven (7) days old (PD_LOG_PROC_MAX_AGE_DAYS). When the disk usage cap is reached, Pepperdata deletes enough log files, starting with the oldest ones, to reduce the disk usage to less than the cap.

The age caps (limits on how long log files remain eligible for uploading and when they become eligible for deletion) can be important for business requirements, such as retaining sensitive files for a given amount of time or custom processing. To control disk usage, however, the PD_MAX_LOG_DIR_SIZE size cap is the appropriate focus.

To override the default disk usage cap or log retention policies, add any of the following environment variables to the Pepperdata configuration. For RPM/DEB-based installations, add them to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh. For installations that are based on Parcels for Cloudera or managed with Cloudera Manager, add them to the appropriate Cloudera Manager template. See the Procedure below for details.

  • PD_LOG_DIR: (default=/var/log/pepperdata) Directory to which Pepperdata writes its log files.

  • PD_MAX_LOG_DIR_SIZE: (default=5368709120, which is 5 GB) Size cap (maximum total size), in bytes, of all the log files in the directory specified by the PD_LOG_DIR environment variable (default=/var/log/pepperdata).

    When the PepAgent (the pepagentd daemon) starts, it verifies that there is sufficient capacity on the partition where PD_LOG_DIR is located. If the capacity is less than PD_MAX_LOG_DIR_SIZE, the PepAgent will not start.

  • PD_MAX_LOG_AGE_DAYS: (default=the value of PD_LOG_PROC_MAX_AGE_DAYS) Number of days a log file is retained before Pepperdata deletes it.

  • PD_LOG_PROC_MAX_AGE_DAYS: (default=7) Maximum age of a log file that the Pepperdata Collector (the pepcollectd daemon) will upload to the Pepperdata dashboard.

    If you lose connectivity to Pepperdata for longer than the PD_LOG_PROC_MAX_AGE_DAYS value, pepcollectd will be unable to upload the log file before it exceeds PD_LOG_PROC_MAX_AGE_DAYS, and the log file’s data will be lost.

  • PD_ARCHIVE_DIR: (no default) Directory in which to archive old log files instead of deleting them when they exceed the maximum age (the PD_LOG_PROC_MAX_AGE_DAYS environment variable value). Not applicable unless the PD_CLEAN_LOG_DIR environment variable is enabled (its value set to 1).

  • PD_CLEAN_LOG_DIR: (default=1/enabled) Controls whether Pepperdata cleans (deletes or archives) its log files. A value of 1 enables cleaning.
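
As an illustration of how these variables fit together, the following sketch shows lines that could be added to /etc/pepperdata/pepperdata-config.sh. The specific values (a 10 GB cap, 14-day ages) and the archive path are assumptions chosen only for the example; they are not recommendations.

```shell
# Illustrative additions to /etc/pepperdata/pepperdata-config.sh.
# All values below are example assumptions; choose values for your environment.

# Raise the size cap from the 5 GB default to 10 GB (10 * 1024^3 bytes).
export PD_MAX_LOG_DIR_SIZE=10737418240

# Let the Pepperdata Collector upload log files that are up to 14 days old.
export PD_LOG_PROC_MAX_AGE_DAYS=14

# Retain log files for 14 days before they are cleaned.
export PD_MAX_LOG_AGE_DAYS=14

# Archive aged-out log files here instead of deleting them (hypothetical path).
export PD_ARCHIVE_DIR=/data/pepperdata-archive

# Cleaning must be enabled (the default) for PD_ARCHIVE_DIR to apply.
export PD_CLEAN_LOG_DIR=1
```

The size cap is given in bytes, so the default of 5368709120 corresponds to 5 * 1024 * 1024 * 1024.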

Procedure

  1. Add the environment variables that you want to configure.

    1. On any host in the cluster, open the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh, for editing.

    2. Add any of the disk usage environment variables, in the following format. Be sure to replace THE-VARIABLE-NAME and the-variable-value with the actual environment variable’s name and value.

      export THE-VARIABLE-NAME=the-variable-value

    3. Save your changes and close the file.

    4. (Only for the PD_LOG_DIR environment variable) Add the associated property, pepperdata.log.baseDir, to the Pepperdata site file.

      By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.

      Be sure that you set the logging directory environment variable and property to the same location. If the locations do not match, some metrics might not be sent to Pepperdata, and some metric log files might not be deleted or archived.

      <property>
        <name>pepperdata.log.baseDir</name>
        <value>your/pepperdata/log/dir</value>
      </property>
      
    5. (Only for the PD_LOG_DIR environment variable) Restart the ResourceManagers and NodeManagers.

  2. On every host in the cluster, restart the PepCollector and PepAgent services.

    Although restarting the PepAgent is optional, we recommend restarting it to prevent:

    • Subsequent, difficult-to-diagnose Pepperdata startup failure. When the pepagentd daemon starts, it verifies that there is sufficient capacity on the partition where PD_LOG_DIR is located. If the capacity is less than the PD_MAX_LOG_DIR_SIZE size cap, the PepAgent will not start.

    • Disk write errors. If you lowered the PD_MAX_LOG_DIR_SIZE size cap because the associated partition’s capacity was reduced for any reason, but did not restart the PepAgent, the PepAgent could attempt to write a log file when there is insufficient capacity. The result would be a runtime disk write error.

    1. Restart the Pepperdata Collector.

      You can use either the service (if provided by your OS) or systemctl command:

      • sudo service pepcollectd restart
      • sudo systemctl restart pepcollectd

    2. Restart the PepAgent.

      You can use either the service (if provided by your OS) or systemctl command:

      • sudo service pepagentd restart
      • sudo systemctl restart pepagentd
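
Before restarting the services, it can help to confirm by hand the two conditions that the procedure warns about: that the partition holding PD_LOG_DIR is large enough for the size cap, and that pepperdata.log.baseDir matches PD_LOG_DIR. The following is a minimal sketch, not a Pepperdata-provided tool. The function names are hypothetical, "capacity" is assumed to mean total partition size (the PepAgent's own check might differ), and the site file is assumed to have the name and value elements on consecutive lines, as in the property snippet in step 1.

```shell
#!/bin/sh
# Pre-restart sanity checks (illustrative sketch, not a Pepperdata tool).

# Print the total size, in bytes, of the partition that holds directory $1.
# Uses POSIX `df -Pk` (1024-byte blocks) and converts the result to bytes.
partition_size_bytes() {
    size_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $2}')
    [ -n "$size_kb" ] || return 1
    echo $((size_kb * 1024))
}

# Succeed if the partition holding directory $1 is at least $2 bytes in size.
# Assumption: "capacity" is the total partition size, not the free space.
has_capacity_for_cap() {
    size_bytes=$(partition_size_bytes "$1") || return 1
    [ "$size_bytes" -ge "$2" ]
}

# Print the value of pepperdata.log.baseDir from the site file $1, if present.
# Assumption: the <value> line immediately follows the <name> line.
site_log_dir() {
    sed -n '/<name>pepperdata\.log\.baseDir<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$1"
}
```

For example, after loading the functions, running has_capacity_for_cap "${PD_LOG_DIR:-/var/log/pepperdata}" "${PD_MAX_LOG_DIR_SIZE:-5368709120}" succeeds only if the partition is at least as large as the cap, and comparing the output of site_log_dir /etc/pepperdata/pepperdata-site.xml with the PD_LOG_DIR value shows whether the two locations match.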