Syntax: Program Matching Rules

To write program matching rules, which Pepperdata compares against every running process, you need to know the required YAML elements. It also helps to know which pid files the programs you want to monitor generate, and how the process names of other programs running on your hosts differ from the names of the custom programs you want to monitor.

We recommend that you study the examples and use them as models for your program matching rules.

Example: Annotated Resource Manager Matching Rule

This example shows a typical program matching rule, for the YARN ResourceManager. The rule matches any process whose launch command meets one or more of the following conditions:

  • Contains the substring “-Dproc_resourcemanager”, and the process’s pid matches the process ID in the /var/run/hadoop-yarn/yarn-yarn-resourcemanager.pid file

  • Contains the substring “ResourceManager”

  • Matches the regex “-Dproc_resourcemanager.*”

programs:
    ResourceManager: # program label name
          active: yes # optional and can be overridden.
          rules: # list of rules to match against a single program.
            - command-match:
                substring: "-Dproc_resourcemanager"
              pid-locations: # one or more locations for pid file
                  - /var/run/hadoop-yarn/yarn-yarn-resourcemanager.pid
            - command-match:
                substring: "ResourceManager"
            - command-match:
                regex: "-Dproc_resourcemanager.*"

When Pepperdata finds a process that matches this program matching rule, PepAgent monitors the process throughout its lifetime and displays it in the Pepperdata dashboard under the label name “ResourceManager”.

Override Preconfigured Program Monitoring

By default, Pepperdata software is preconfigured to monitor Impala®, Apache Spark History Server, and MapReduce Job History Server processes. If you do not want to monitor these programs, you can override their program matching rules in your own YAML file for program matching rules.

The program matching rules for the preconfigured program monitoring are in the /opt/pepperdata/supervisor/lib/pepagent-program-monitor-config-default.yaml file. Do not edit this file; use it only as a reference for finding the applicable configuration labels. For this file’s listing, see Preconfigured Custom Program Monitoring.

To configure your overrides, create rules in your custom rules file (the file specified by the pepperdata.agent.program.monitor.configPath property) as follows:

  • To deactivate program matching for a given label, add the label to your custom rules file, and assign the active key a value of “no”.

  • To replace the default matching rules with custom rules, add the label and a rules dictionary to your custom rules file. Your rules completely replace the default rules so that the default rules are not applied.

  • To add rules for a preconfigured program label, add the label to your custom rules file, and add the new rules to an add-rules key. The syntax for the add-rules key is identical to that of the rules key. Your rules are added to the default rules.
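
The following custom rules file is a sketch of all three override types. The labels SparkHistoryServer, MapReduceJobHistoryServer, and Impala are hypothetical placeholders, not the actual preconfigured labels; copy the real labels from pepagent-program-monitor-config-default.yaml. The match strings are likewise illustrative assumptions.

```yaml
programs:
  # Hypothetical label: copy the real one from the default config file.
  SparkHistoryServer:
    active: no  # deactivate matching for this preconfigured label

  # Hypothetical label: these rules completely replace the default rules.
  MapReduceJobHistoryServer:
    rules:
      - command-match:
          substring: "-Dproc_historyserver"  # illustrative match string

  # Hypothetical label: add-rules entries are added to the default rules.
  Impala:
    add-rules:
      - command-match:
          substring: "impalad"  # illustrative match string
```

Each label uses exactly one override style here; the add-rules list has the same syntax as a rules list.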

YAML Sections: Program Matching Rules

The following YAML file snippet shows the structure for program matching rules. Unless labeled optional, keys are required.

programs: # top level key for the dictionary of all program matching rules
    NodeManager: # unique program label name, in pattern \w+
      active: yes # (optional, default=yes; can be overridden). Takes yes|no values
      rules: # One or more rules. Multiple rules are ORed.
        - pid-locations: # (optional); one or more pid location files.
            - /var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
          command-match: # Command match type: substring|regex
            substring: "-Dproc_nodemanager"
          ignore-match: # (optional); match type: substring|regex
            substring: "container" # ignore matches for child processes forked by NodeManager
        - command-match:
            substring: "-Dproc_nodemanager"
          ignore-match: # (optional); match type: substring|regex
            regex: "container" # ignore matches for child processes forked by NodeManager

The structure includes the following (all keys are required unless labeled optional):

  • programs: The top-level dictionary that contains all of the Pepperdata custom program matching rules.

  • Each entry in programs is a program monitoring definition. A program monitoring definition is identified by a label of the form \w+ (Java regex word characters: letters, digits, and underscores). The label must be unique within the programs dictionary. In this example, the label is “NodeManager”.

  • Each label has the following keys:
    • (optional; default=yes) active: yes|no
    • rules: <list of program matching rules>
  • Each program matching rule is a dictionary with the following keys:
    • (optional) pid-locations: One or more pid files that might contain the pid to monitor. If the program daemon does not generate pid files, do not include this key.
    • command-match: One or more dictionaries that contain a substring or regex to match.
    • (optional) ignore-match: One or more dictionaries that contain a substring or regex that, if matched, removes the child (forked process) match from the result set.
  • Each command-match key is a dictionary with the following keys:
    • [ regex: a regex pattern in Java regex syntax | substring: case-sensitive character string ]
    • (optional) ignore-match: One or more dictionaries that contain a substring or regex to match.
  • Each ignore-match key has one of the following keys to specify a substring or regex that, if matched, removes the child match from the result set:
    • [ regex: a regex pattern in Java regex syntax | substring: case-sensitive character string ]
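
For comparison with the full structure, a definition that uses only the required keys might look like the following sketch. The label DataNode and its match string are illustrative assumptions, not preconfigured values. Because active is omitted, it defaults to yes; because there is no pid-locations key, the rule matches on the launch command alone.

```yaml
programs:
  DataNode:  # hypothetical label; must be of the form \w+ and unique
    rules:
      - command-match:
          substring: "-Dproc_datanode"  # illustrative; substrings are case-sensitive
```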

Best Practices: Writing Program Matching Rules

Program matching rules are flexible enough that it’s easy to monitor more programs than you intended, or to collect data from so many programs that it’s difficult to find what you’re looking for on the Pepperdata dashboard. Following a few best practices for naming, command matching, and referencing appropriate pid files keeps the results easy to use.

  • Choose program label names that are easy to understand and unambiguous. Remember that the labels appear in the Pepperdata dashboard.

  • If the program that you want to monitor generates a pid file, specify it in the rule’s pid-locations key. This enables PepAgent to take an optimized code path for faster matching as it scans the process tree.

  • Use as specific a matching rule as possible for your program of interest to ensure that your matching rules do not match processes that you do not want to monitor.

  • Use the custom program matching rules linter in the Pepperdata package to ensure that your rules are valid; see Verify and Validate Program Matching Rules.
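
As a sketch of these practices applied together, the following rule uses a descriptive label, a specific command match, a pid file, and an ignore-match to exclude forked child processes. It reuses the NodeManager values from the structure example above; the label YarnNodeManager is an assumed name, and you should confirm the pid file path on your own hosts.

```yaml
programs:
  YarnNodeManager:  # assumed name: descriptive and unambiguous on the dashboard
    rules:
      - pid-locations:  # enables the faster pid-file matching path
          - /var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
        command-match:
          substring: "-Dproc_nodemanager"  # NodeManager launch option
        ignore-match:
          substring: "container"  # skip child processes forked by NodeManager
```

Run the linter on a rule like this before deploying it.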