FS
Documentation

Configuring Sentries and Agents

Revision as of 04:15, 28 September 2006, by Daniels

This chapter describes how to define sentries to monitor resources and respond to events. The main procedures, in order, are:

The first topic lists some questions you should look into when preparing to configure sentries.



Note
You must have the Admin or Manager role to configure sentries.


Planning your Sentry

When preparing to build sentries it is useful to consider these questions:

What object or resource do you wish to monitor?

What is the purpose of monitoring this resource:

How can the status of the resource be queried:

Does running one command return data about several instances of an object or resource? Multiple instances imply a ‘cloning’ sentry, in which case one of the attributes or variables that uniquely identifies each instance must be designated as the key column. Will every instance be monitored, or should selected instances be filtered out?

What part of the output from the agent do you wish to capture?

How can a message of interest be identified (e.g., by its position? by searching for a unique string or pattern)? How can portions of the data destined to be saved in variables be separated from the rest of the output? Can the output be simplified by discarding header lines or a message prefix?

Will there be ‘spikes’ or gaps in the data that must be allowed for? In other words, when the sentry tests a variable, is it sufficient to test just the current value, or would it be more realistic to take an average of several recent values?

How often does the agent need to collect data? You may need to consider a tradeoff between running the command too frequently and affecting system performance, and not running it frequently enough, in which case the information provided may not be current.

What agent data (that is, variables) do you want to appear on the console? How should they be formatted?

What are the thresholds or triggers that cause a sentry to move from one state to another? If the sentry remains in a state for a long time, does that indicate that the problem is getting more serious, or that the problem may have resolved itself?

Should the sentry be changed to another state after a certain period?

How many different states are of interest? Be careful not to multiply states unnecessarily. For example, if you would not expect Sentinel3G or an operator to respond differently when a sentry is in either of two states, then it’s probably safe to merge one of the states into the other. You can even define an ‘information only’ sentry with no states, which simply logs and displays on the console information supplied by the agent. On the other hand, you may wish to define separate states for reporting purposes. For example, a printer can be idle or printing and both are considered of normal severity, but if you are interested in the ratio of time spent idle to printing, you would need two states to represent this information.

What additional information would be useful to help an operator understand, diagnose, or fix a problem when the sentry is in each state? Is there a command, report, or graph that could be offered to operators to help diagnose the problem?

Is there a standard procedure that should be followed by operators? If so, consider attaching a monitoring notes file to the sentry, state, or agent.
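The question about spikes in the data can be made concrete with a small sketch. It is not Sentinel3G code: the function name make_averager, the threshold of 80, and the sample readings are all hypothetical, chosen only to show how testing an average of recent values differs from testing the current value.

```python
from collections import deque

def make_averager(count):
    """Return a function that records a value and returns the mean
    of the last `count` values seen so far (hypothetical helper)."""
    recent = deque(maxlen=count)

    def record(value):
        recent.append(value)
        return sum(recent) / len(recent)

    return record

THRESHOLD = 80                       # assumed alert threshold (percent)
readings = [20, 22, 95, 21, 23]      # made-up samples with one spike

average_of_last_3 = make_averager(3)
averages = [average_of_last_3(r) for r in readings]

# Testing the current value reacts to the single spike...
spike_alerts = [r > THRESHOLD for r in readings]
# ...while testing the average of recent values does not.
averaged_alerts = [a > THRESHOLD for a in averages]
```

With these sample numbers the raw test fires once on the spike, while the averaged test never fires. Which behaviour is appropriate depends on the resource being monitored.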

Adding an Agent

Each sentry tests variables supplied by an agent. If the agent has not already been defined you must add it first, then define its variables, before adding the sentry.

  1. From the console, select Configure > Host monitor.
  2. If you have not already selected the host to update, Sentinel3G will ask you to choose one now. This is the host that the agent will run on. The ‘All Sentries’ window is displayed, showing details of all the sentries defined on this host.
  3. Select Tables > Agents. The ‘Agents’ window opens, showing details of all the agents defined on this host.
  4. Select Maintain > Add. The ‘Add Agent’ form opens.
    Figure 19 — ‘Add Agent’ window
  5. Enter these fields:
    Knowledge Base
    The KB that this agent belongs to. Click to choose one of the predefined KBs installed on this host, otherwise leave this field blank if the agent is not associated with a particular KB.
    Agent name
    Enter a unique name for the agent. The name must not contain spaces.
    Class
    Choose the TCL handler. This tells Sentinel3G how to parse the output from the agent command. See Agent Classes and Variables for more details about selecting an agent class.
    API
    An external application sends data via the Sentinel3G API.
    DB
    The agent returns data in Functional Database format, typically as a result of a query on a Functional Database table.
    ExitStatus
    The agent returns the exit status of the command.
    LogFile
    The agent searches in a log file for messages that match a pattern.
    In the Agent options form you specify a select pattern to select records of interest, and an extract pattern for each text string in the record that must be assigned to a variable.
    SNMPPolled
    The agent polls for the current status of a managed object in an SNMP MIB, such as a device or port.
    Text
    The agent returns text data to STDOUT. You can use the Agent options form to filter out extraneous text such as blank lines, header lines, and labels.
    Note Additional classes may be listed depending on which KBs are installed.
    See the documentation accompanying the KB for more information.
    Description
    Enter a description for the agent.
    Command
    If the Class is DB, ExitStatus, or Text, enter the command to be run. Other agent classes don’t require a command as they obtain their data by different means.
    Poll time (secs)
    There are two ways to use the poll time setting.
    • A short-running command runs once per poll and terminates immediately after returning its data. Poll time is the time the agent waits before rerunning the command. Note that this is the time between the end of the previous run and the start of the next, so the command won’t run exactly this often. In other words, if the poll time is set to 60 seconds and the command takes about 5 seconds, on average the command will run about every 65 seconds.
    • A persistent or ‘long-running’ command simply returns to the agent the latest data accumulated at an interval determined by the poll time. The command must accept the polling frequency as an argument, which you pass using the environment variable $PollTime. Example: you wish a command ntping to return data to the agent every 60 seconds. ntping has a flag -p that sets the polling frequency. Set Poll time (secs) to 60 and the Command field to: exec ntping -sentinel -p$PollTime <other_arguments>
    Note Avoid running the command too frequently if testing shows that it may degrade system performance.
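The arithmetic for a short-running command can be sketched as follows. The function names are hypothetical and the figures are the ones used in the description of Poll time; this is only an illustration of why the command runs slightly less often than the poll time suggests.

```python
def effective_interval(poll_time_secs, run_time_secs):
    """Time between the start of one run and the start of the next.

    For a short-running command the poll time is counted from the END
    of the previous run, so the run time is added on top of it.
    """
    return poll_time_secs + run_time_secs

def runs_per_hour(poll_time_secs, run_time_secs):
    """Approximate number of runs in one hour."""
    return 3600 / effective_interval(poll_time_secs, run_time_secs)

interval = effective_interval(60, 5)   # 65 seconds between starts
hourly = runs_per_hour(60, 5)          # a little over 55 runs, not 60
```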

Multi-instance agents

Instance variable
The instance variable is the key field that uniquely identifies each row. In the df example, the instance variable would be the filesystem name. You cannot add an agent’s variables until the agent itself exists, so type in the variable name now and continue to the next field. You can add the details of the variable later. See Adding Variables Used By the Agent.
Instance type
This setting specifies how the list of instances is generated.
explicit
Each sentry will explicitly list the names of the instances. For example, a sentry that monitors log files would list the names of the files it will monitor. The agent will ‘gather’ all the instance names from its associated sentries and pass them to the Command as $Instances.
cloning
The agent creates or ‘clones’ instances for each row of data returned by the command.
both
The final list of instances can include both instances specified explicitly by sentries and instances discovered by the agent. For example, the ProcessInfo agent can be set up always to monitor specific system processes, but also to discover arbitrary application or user processes.
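As a rough illustration of the three Instance type settings, the sketch below builds the final list of instances from the names listed by sentries and the names discovered by the agent. The function name and the sample process names are hypothetical, and the rule for combining the two lists under both (explicit names first, duplicates dropped) is an assumption, not documented behaviour.

```python
def final_instances(instance_type, explicit, discovered):
    """Combine instance names according to the Instance type setting.

    explicit   -- names listed by the sentries
    discovered -- names taken from rows returned by the agent command
    """
    if instance_type == "explicit":
        return list(explicit)
    if instance_type == "cloning":
        return list(discovered)
    if instance_type == "both":
        # Assumed merge rule: keep first occurrence, preserve order.
        return list(dict.fromkeys(list(explicit) + list(discovered)))
    raise ValueError(f"unknown instance type: {instance_type}")

listed = ["init", "cron"]      # hypothetical names from sentries
found = ["cron", "httpd"]      # hypothetical names found by the agent
merged = final_instances("both", listed, found)
```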


If Instance type is set to cloning or both, you can specify patterns for instance names to be included in or excluded from the list returned by the agent. If Instance type is explicit, Include and Exclude are disabled.

Include and Exclude work in much the same way: the agent gets some data for a particular instance, then, if Include is set, it tests whether the instance name matches any of the patterns. If not it rejects the instance. If the name did match one of the patterns, or if Include is not set, it then matches the instance against the patterns in the Exclude field. If there is a match, the instance is rejected.

Include
An optional list of patterns used to select instances to be included.
Exclude
An optional list of patterns used to select instances to be excluded. To stop a sentry being created for a particular instance even if a row is returned, enter its name here. For example if the purpose of the agent is to monitor available disk space, you can exclude rows that represent read-only filesystems such as a CD-ROM drive.
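The order in which Include and Exclude are applied can be sketched like this. The pattern syntax is not specified here, so the sketch assumes shell-style wildcards (Python's fnmatch); the function name and the filesystem names are hypothetical.

```python
from fnmatch import fnmatchcase

def accept_instance(name, include=(), exclude=()):
    """Return True if the instance should be kept.

    1. If Include is set, the name must match at least one Include pattern.
    2. The name must then match none of the Exclude patterns.
    Shell-style wildcards are an assumption made for this sketch.
    """
    if include and not any(fnmatchcase(name, p) for p in include):
        return False
    if any(fnmatchcase(name, p) for p in exclude):
        return False
    return True

filesystems = ["/", "/home", "/usr", "/cdrom0"]
kept = [fs for fs in filesystems if accept_instance(fs, exclude=["/cdrom*"])]
```

Leaving Include empty keeps every instance that is not excluded, which matches the case described above where Include is not set.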

Agent options

You can pass flags and other arguments recognized by this agent class.

Agent options
Click to see the available options. These depend on the Class:

If no options are available for this class the button is disabled.

Discovery pgm
An optional command that is run before the agent starts. Its job is to return an exit status of true or false based on the existence or status of a resource. If the discovery program returns false, this agent and its associated sentries will not be started. For more details and examples, see Discovery program.
Notes file
Enter the file name only (without the path) of a notes file in the Sentinel3G doc directory. These notes will be available from the console to operators when monitoring or responding to alerts relating to this agent. Typically the notes file would describe the variables returned by the agent.

Click Accept to save the agent.
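A discovery program (the Discovery pgm field) can be very small. The sketch below assumes that "true" corresponds to exit status 0, the usual command-line convention and the one described for the ExitStatus class; the function names and the idea of testing for a file path are hypothetical.

```python
import os
import sys

def resource_present(path):
    """Hypothetical check: the resource exists if the path exists."""
    return os.path.exists(path)

def discovery_exit_status(path):
    """Map the check to an exit status (assumption: 0 = true, 1 = false)."""
    return 0 if resource_present(path) else 1

# A standalone script would finish with something like:
#     sys.exit(discovery_exit_status(sys.argv[1]))
```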

Agent options for LogFile class

The LogFile agent class checks the contents of an ASCII file (usually a log file) for lines that match a pattern. A line is defined as a string of characters terminated by a newline character. One record in a log file may comprise one or more lines. When a matching line is found, portions of the record that contains it can be assigned to one or more variables.

Figure 20. Example options for an agent that monitors the message log
Logfile name
The name of the file to be monitored.
Select pattern
A regular expression that is used to select records from the log file. Lines matching this pattern will be returned by the agent.
Record length
The total number of lines in each record.
Record offset
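The number of lines by which the first line of the record precedes the matching line. For example, enter 2 if the record starts two lines before the line that matches the Select pattern.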
Strip initial chars
Ignore this number of characters at the start of every line. Use this to discard a fixed-length prefix (such as a time-stamp) if it will not be used by the sentry.
Clear pattern
Remove any string that matches this regular expression and replace it with a tab. Use this to discard any text or fields that will not be passed by the agent as a variable, or to simplify the Select pattern by removing extraneous or variable-length text from the middle of a line.
Split data by
How should the agent assign parts of each matching line to variables? In this field you specify how to break the data into columns. See Assigning Text and Log File Data to Variables. Later when adding variables for this agent you will specify which columns to assign to each variable – see Adding Variables Used By the Agent.
column
Don’t split the data into fields. Instead each line will be treated as a string of characters, with the first character being column 1, the second character column 2, and so on. You will use the "c <col>-<col>" format in the Column field to define each variable.
whitespace
Break the line into a series of columns separated by white space. Each column is numbered in turn, starting from column 1.
tab
Break the line into a series of tokens separated by a single tab character (two tabs in a row define a NULL field between them). Each column is numbered in turn, starting from column 1. Clear pattern should be used to replace an unwanted string with a single tab character before the fields are split.
pattern
Specify a regular expression containing at least one extract pattern, each of which is contained in parentheses. The first extract pattern is treated as column 1, the second extract pattern column 2, and so on.
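The four Split data by settings, together with Clear pattern, can be pictured with the following sketch. It is an illustration only: the function names are hypothetical, a NULL field is represented by an empty string, and the "c <col>-<col>" range is assumed to be 1-based and inclusive.

```python
import re

def clear(line, clear_pattern):
    """Clear pattern: replace each match with a single tab."""
    return re.sub(clear_pattern, "\t", line)

def char_columns(line, first, last):
    """column: character positions first..last (assumed 1-based, inclusive)."""
    return line[first - 1:last]

def split_line(line, mode, pattern=None):
    """Return the list of columns; index 0 holds column 1."""
    if mode == "whitespace":
        return line.split()
    if mode == "tab":
        return line.split("\t")      # two tabs in a row give an empty field
    if mode == "pattern":
        match = re.search(pattern, line)
        return list(match.groups()) if match else []
    raise ValueError(f"unknown split mode: {mode}")

by_space = split_line("17 transactions flagged", "whitespace")
by_tab = split_line("a\t\tb", "tab")
by_pattern = split_line("17 transactions flagged", "pattern",
                        r"([0-9]*) transactions flagged")
prefix = char_columns("INFILE:/data", 1, 6)
```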

If you selected pattern, specify one or more regular expressions to match patterns in each line of the record.

Pattern (line 1)
Extract matching variable(s) from the first line in the record.
More patterns
Click to extract data from additional lines in the record. For example, specify a regular expression in the Pattern (line 2) field to extract matching variables from the second line in the record.
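The way Select pattern, Record length, Record offset, and the per-line extract patterns fit together can be sketched as follows, using the five-line DACCedit record from this chapter's log file example. This is an approximation of the described behaviour, not the agent's implementation; the function names are hypothetical.

```python
import re

LOG_LINES = [
    "INFILE:/data/MFLA/2001Q2/DACCin/reg3.dat",
    "Validating file......",
    "DACCedit V4.1 © 1991–99 TransDACC Ltd.",
    "17 transactions flagged",
    "1128 transactions passed",
]

def find_records(lines, select_pattern, record_length, record_offset):
    """Find each line matching the select pattern, count back
    record_offset lines to the start of its record, and return
    record_length lines from there."""
    records = []
    for index, line in enumerate(lines):
        if re.search(select_pattern, line):
            start = index - record_offset
            if start >= 0:
                records.append(lines[start:start + record_length])
    return records

def extract(record, line_patterns):
    """Apply the pattern for line 1, line 2, ... and collect the
    parenthesised groups in order; None means 'leave blank'."""
    values = []
    for line, pattern in zip(record, line_patterns):
        if pattern:
            match = re.search(pattern, line)
            if match:
                values.extend(match.groups())
    return values

records = find_records(LOG_LINES, "DACCedit", record_length=5, record_offset=2)
values = extract(records[0],
                 ["INFILE:(.*)", None, None, "([0-9]*) transactions flagged"])
```

Here values holds the input file name followed by the number of flagged transactions, which are the two items the example asks the agent to report.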


Example: selecting a multi-line record from a log file

Here is a fragment from a log file, showing one record of five lines.

INFILE:/data/MFLA/2001Q2/DACCin/reg3.dat
Validating file......
DACCedit V4.1 © 1991–99 TransDACC Ltd.
17 transactions flagged
1128 transactions passed


The unique string that identifies this record is the program name, DACCedit, at the start of the third line. The first line containing a variable we wish to keep (the input file) is two lines earlier. If any transactions were flagged, we wish the agent to report the file name and the number of transactions that were flagged.

• Set Select pattern to DACCedit
• Set Record length to 5
• Set Start record to 2 (the first line in the record is two lines before the line containing the select pattern)
• Set Split data by to pattern
• Set Pattern (line 1) to: INFILE:(.*)
• Leave Pattern (line 2) blank
• Leave Pattern (line 3) blank
• Set Pattern (line 4) to: ([0-9]*) transactions flagged

Figure 21 shows the order in which the agent finds a matching record and sets the corresponding variables.

Figure 21. Example: how the agent extracts variables from a log file record
  1. The agent finds a line in the log file that matches the select pattern.
  2. The agent calculates the first line in the record by counting back from the selected line.
  3. Text that matches an extract pattern in the first 4 lines is used to assign values to variables.

When you have finished setting the LogFile agent options, click Accept to return to the main Add Agent form.

Agent options for SNMPPolled class

Figure 22. Example: agent options for an agent in SNMPPolled class
MIB file name
The name of the MIB file, which Sentinel3G expects to find in the directory lib/tnm2.1.10/mibs under the COSmanager home directory. If the MIB file is not already stored there, copy it there now.
Multi-host?
Select yes if this agent will query multiple hosts, each specified by a separate instance.
If this agent does not support multiple instances (that is, if the Multi-instance? field on the Add Agent form is set to no), this field will be disabled.
IP address
The IP address or hostname of the host to be queried.
Port
The UDP port number or service name used for SNMP queries (usually specified in /etc/services).
SNMP table
The name of a table contained in the MIB, which specifies a set of sequences relating to a device being monitored. The agent will “walk the tree” specified in this table to obtain details of each component of the device. For example, if this table specifies that a switch has multiple ports, the agent queries the switch to get the specified details of each port.
SNMP version
The version of SNMP that the MIB file conforms to.
Community
The SNMP community to identify and validate the sender of SNMP messages (SNMPv1 and SNMPv2c only).
User
The user name to identify the sender of SNMP messages (SNMPv2u only).
Password
Password corresponding to the user name (SNMPv2u only).
Timeout (secs)
The maximum time to wait for a response from the node being polled.

When you have finished setting the SNMPPolled agent options, click Accept to return to the main Add Agent form.

Agent options for Text class

Figure 23. Agent options for agents in the Text class
Record length
The total number of lines in each record.
Skip initial lines
How many lines to skip at the start of the record. You can use this field to skip a repeating title or header.
Skip initial records
When the agent starts up, some spurious alerts may be generated from the first couple of polls. For example, if the agent is being started during the system boot procedure, the resource being monitored may be under an unusual load from all the other user processes being started, or a large number of events may have accumulated while the agent was not running.
You can choose not to process the data collected by the agent in the first few polls. Examples: enter 2 to skip the first two polls; enter 1 to skip only the first poll.

Sentinel3G can extract variables from up to four lines of data. If a record contains more than four lines, you can use the next two fields to discard lines that don’t contain data needed by a sentry, such as blank lines and headers.

Skip blank lines
Select yes to discard blank lines.
Skip pattern
Skip lines containing a match for this pattern. Use this to discard lines that won’t be used to set variables.
Skip initial chars
Ignore this number of characters at the start of every line. Use this to discard a fixed-length prefix (such as a time-stamp) if it will not be used by the sentry.
Clear pattern
Remove any string that matches this regular expression and replace it with a tab. Use this to discard any text or fields that will not be passed by the agent as a variable, or to simplify the Select pattern by removing extraneous or variable-length text from the middle of a line.
Split data by
How should the agent assign parts of each matching line to variables? In this field you specify how to break the data into columns. See Assigning Text and Log File Data to Variables. Later, when adding variables for this agent, you will specify which columns to assign to each variable. See Adding Variables Used By the Agent.
column
Don’t split the data into fields. Instead each line will be treated as a string of characters, with the first character being column 1, the second character column 2, and so on. You will use the "c <col>-<col>" format in the Column field to define each variable.
whitespace
Break the line into a series of columns separated by white space. Each column is numbered in turn, starting from column 1.
tab
Break the line into a series of tokens separated by a single tab character (two tabs in a row define a NULL field between them).
Each column is numbered in turn, starting from column 1. Clear pattern should be used to replace an unwanted string with a single tab character before the fields are split.
pattern
Specify a regular expression containing at least one extract pattern, each of which is contained in parentheses. The first extract pattern is treated as column 1, the second extract pattern column 2, and so on.

If you selected pattern, specify one or more regular expressions to match patterns in each line of the record.

Pattern (line 1)
Extract matching variable(s) from the first line in the record.
More patterns
Click to extract data from additional lines in the record. For example, specify a regular expression in the Pattern (line 2) field to extract matching variables from the second line in the record.

When you have finished setting the Text agent options, click Accept to return to the main Add Agent form.

Adding Variables Used By the Agent

  1. From the console, select Configure > Host monitor.
  2. If you have not already selected the host to update, Sentinel3G will ask you to choose one now. This is the host that the agent will run on. The ‘All Sentries’ window opens, showing details of all the sentries defined on this host.
  3. Select Tables > Agents. The ‘Agents’ window opens, showing details of all the agents defined on this host.
  4. Select the agent, then select Maintain > Variables. The ‘Variables from Agent <agent_name>’ window opens.
  5. Select Maintain > Add. The ‘Add Variable’ form opens.
    Figure 24. ‘Add variable’ window

Variable name
Enter a name for the variable. The name must not contain spaces, and it must differ from the names of the other variables belonging to this agent. It doesn’t need to be unique across agents; two agents may have variables with the same name. If the agent is in the API class, the name must match one of the variable names passed by the external application.
Class
Select one of these options, depending on how the value of the variable will be set:
raw
The value is set by the agent.
derived
The value will be computed from other variables (in the Expression field on this form). This is often used to express the value of another variable in a different way, such as a rate, proportion, or percentage.
trigger
If this variable is included in the list of trigger variables for a state, the Expression field will be evaluated when the sentry changes into that state. This is usually used to save the previous value when a new value is received, so that the old and new values can be compared.
Type
The internal data type.
number
An integer or floating point number.
string
A text string.
boolean
Mainly used by agents in the ExitStatus class. If the command returns 0, the boolean value is true, otherwise it is false.
date
A date in Functional Database internal format. The date will be stored in the form YYYYMMDD and output as MM/DD/YY (U.S. display format) or DD/MM/YY (European display format). Mainly used by agents in the DB class.
datetime
A date and time in Functional Database internal format. The date will be stored in the form YYYYMMDD.hhmmss and output as MM/DD/YY-hh:mm (U.S. display format) or DD/MM/YY-hh:mm (European display format). Mainly used by agents in the DB class.
clock
A count of the number of seconds since 1 Jan 1970 (GMT). You can subtract a clock value saved earlier from the current value to return a time period (for example, how long a process has run).
Private
Leave unchecked if you wish this variable to be available for logging and graphing.
Description
A longer description of the source or purpose of this variable.
Column
Enter the field name or the column number(s) in the output that you want to assign to this variable. The format depends on the agent class:
Text or LogFile
Enter the column number(s). See Assigning Text and Log File Data to Variables.
DB
Enter the column name from the Functional Database dictionary entry.
SNMPPolled
Enter the object ID from the MIB.
API
Leave this field blank. The variable name and value will be passed explicitly by the external application.
ExitStatus
Leave this field blank.
On NULL
How should the variable be set if the agent doesn’t return a valid value? The options are:
zero
Set the variable to zero.
null
Set the variable to null.
ignore
Leave the value of the variable unchanged from the previous poll.
History
Should recent values be stored for use in state conditions and realtime graphing?
none
Don’t keep historical values; the previous value will be overwritten each time the agent polls.
time
Keep all values collected within a time period.
count
Keep this number of recent values, one for each time the agent has run.
Keep the last
If History is set to time, enter a number of seconds to store all values collected within this period. If History is set to count, enter a number of values to be stored.
Expression
An expression (using TCL EXPR syntax) that calculates, modifies, or reformats the current value of the variable (see Expressions). How the expression is used depends on the variable class:
derived
Reformulate the value of another variable as a rate, proportion, or percentage. Any variables attached to the same sentry (including history variables) may be used in the expression.
trigger
Trigger variables are used to save a previous value that would otherwise be overwritten when the agent receives new data. If this is a trigger variable, use the Expression field to copy the value of another variable you want to save. Any variables attached to the same sentry (including history variables) may be used in the expression.
raw
An optional expression to post-process the data received from the agent (contained in $data), for example to change units from KB to MB.
Note that for raw variables the only variable that can be used in the expression is $data, which is the value returned by the agent.
Note To return a floating point number, put .0 at the end of any constant values. This is because TCL will do an integer calculation if both operands are integers. Example: if $data is an integer, $data / 1024.0 will return a floating-point value; $data / 1024 will return an integer.
Initial value
An expression (using TCL EXPR syntax) that, when evaluated, returns an initial value for the variable. This can be used to set a starting value before the first time the agent polls, or to initialize a trigger variable before it is set with a real value. This is important if, for example, this variable is used elsewhere in an arithmetic expression, to avoid the calculation generating a data error.

The last two fields affect how the variable will appear on the console.

Units
Choose from the table of descriptive units. Examples: % (percent); MB; per secs. To add a new type of unit, see Maintain List of Numeric Units.
Decimal places
The value will be rounded to this number of decimal places.

Click Accept to save the variable.

Adding a Sentry

  1. If the console is not in Host View, select Go > Hosts.
  2. Select the host that the agent will run on.
  3. Select Configure > Host monitor. The ‘All Sentries’ window opens, showing details of all the sentries defined on this host.
  4. Select Maintain > Add. The ‘Add Sentry’ form opens.
    Figure 25. Sentry details form

Knowledge Base
Choose the name of the KB that this sentry belongs to. Click to choose one of the predefined KBs installed on this host, otherwise leave this field blank if the sentry is not associated with a particular KB.
Class/Folder
Choose the name of the folder in which the sentry will appear on the console.
Sentry
Enter a name for the sentry. The name must not contain spaces.
Host
The host to which this sentry applies. If you leave the field blank it defaults to the Host Monitor host. If the agent is running remotely from the Host Monitor, you may want the icon for a resource to appear under a different host. If so, you can enter the name of the remote host here.
On/Off
Set the initial condition of the sentry. on means the sentry will be operating normally: the agent will be running and setting variables to be tested by the sentry. off means the agent is not required to collect data on behalf of this sentry, in which case the agent will not be running (unless it also happens to be collecting data for another sentry). You can switch the sentry off if you wish to test it before running it on a production system. See Running Sentries in Test Mode.
Description
Enter a description for the sentry.
Primary agent
Click to choose the main agent whose variables supply this sentry with data. If other agents also supply variables, list them in the Secondary agents field on the Advanced options form below. Variables from the primary agent can be referenced by their name alone. Note that if a primary agent and a secondary agent both have a variable with the same name, the primary agent’s variable is used.
Instance details
If the primary agent supports multiple instances, click the Instance details button to specify the instance details.

There are three ways to define sentry instances:
• by cloning one instance for each key value returned by the primary agent
• by explicitly defining each instance
• by running a command to ‘discover’ the list of instances

Clone
Tick this checkbox to clone (create another instance of) a new sentry for each instance returned by the multi-instance agent.
Clone if
If Clone is ticked, you can specify an optional TCL expression. New sentries will only be cloned if this expression evaluates to true.
Example: You can create two almost identical sentries, one for small filesystems (< 1GB) and one for large filesystems (>= 1GB), with different thresholds. Both would use the same agent, but they would have Clone if set to "$size < 1000" and "$size >= 1000" respectively (where $size is in MB). Sentinel3G will then clone the appropriate sentry only. The two sentries can even have the same name so that they look indistinguishable on the console. If this field is left blank, new sentries will always be cloned.
Discover insts
This is an optional command that is run when the Host Monitor starts up. For example, the command could return a list of object names. A sentry instance will be generated for each line of data returned by the command.

Instances

The Instances window enables you to define sentries explicitly. If the Clone field is ticked, you can predefine some of the attributes of the cloned instances (for example, to specify a different label or to turn the instance off).

Instances
Click to maintain the list of instance names. Select Maintain > Add to add up to four instances at a time.
Instance
The name of a specific instance, which the sentry passes to the agent through the $Instances variable in the agent command.
On/Off
Set the initial condition of the sentry. on means the sentry will be operating normally. off means the sentry will not process agent data unless it is switched on manually.
Label
If set, this will be used on the console as the name of the instance. If it is not set, the instance name will be used on the console.
Agent data
Agent-specific data for this instance (used by certain agents only). Example: the ProcessInfo agent can be passed a regular expression in this field to match process names.

The Instances of Sentry … window includes options to turn off instances or assign them to instance groups. The Turn on and Turn off methods on the Instance menu simply turn on or off the selected instance.
These options work on either cloning or explicit instance sentries. For example, for the cloning sentry Free_Space just add the instance for a particular filesystem and turn it off. Assign to group lets you add selected instances to a previously defined instance group. Maintain > Instance groups brings up the instance groups defined for this sentry.

When you have finished filling in the Instances form, click Accept to save the instances. When you have finished adding instances, press F3 in the Instances form to return to the Sentry Instance Details form.

Note Changes to instances will not be processed until you exit the Instances window and restart the Host Monitor.

Instance label
If set, this will be used on the console as the name of the instance. You can use a raw variable to give a more meaningful label (for example: use $printer to label each instance with the printer name). If this field is blank the instance will not have a label.
Separate Logs?
Tick this checkbox to create a separate log file for each instance. Leave the checkbox blank to write entries for all instances to a combined log file.

When you have finished filling in the Sentry Instance Details form, click Return to complete the remaining fields in the main Add Sentry form.

Console fields

Define how the sentry should appear on the console:

Text
Enter a text string to provide extra information about the current state of the sentry. The string will be displayed in the status area of the console, and may contain both informative messages and the values of variables returned by the primary agent or a secondary agent. Example: HTTP hits: &HttpHits; HTTP errors: &HttpErrors
If a variable name is prefixed with $, the raw or unformatted value will be displayed. If a variable name is prefixed with & (ampersand), the formatted value will be displayed.
The formatted value appends the Units field, if specified, and converts some variable types such as dates from internal storage format to display format.
Examples: if Units = MB, $MBfree would display an amount like "120.6"; &MBfree would display "120.6 MB".

Icon
Click to choose an icon to represent this sentry on the console. To see what each icon looks like or to add new icons, see Add Icons on page 154.

Indicator
Select a type of overlay icon to represent the current state of the sentry or to give a rough indication (to within 10 percent) of the current data value returned by Variable.

default
Represent the current state of the sentry with the default overlay icon for that state or severity.

pie chart
Represent the current percentage value returned by Variable as a small pie chart.

thermometer
Represent the current percentage value returned by Variable as a thermometer.

Variable
Click to choose a variable to be represented next to the icon for this sentry. You specify in the Indicator field whether to show the value of the variable as a pie chart or a thermometer icon.

Note
For the indicator to work properly, the variable you choose must always be in the range 0 to 100.

Default action
Click to specify the action that is to be performed by default when an operator double-clicks this sentry on the console.

Variables
Display the contents of the variables returned by the agent.

Graph
Draw a realtime graph. Click to choose a predefined graph.

Action
Run a predefined action command. Click to choose a predefined action.

Logged_data
Generate a logged data report. Click to choose a predefined report.

Click Return to complete the remaining fields in the main Add Sentry form.

Notes file
Enter the file name only (without the path) of a notes file in the Sentinel3G doc directory. These notes will be available from the console to operators when monitoring or responding to alerts relating to this sentry.
Advanced options

Click the Advanced button to display some additional options relating to notification and data logging.

Figure 26: Advanced sentry options form

Notification type
Select none to turn off notification for this sentry. Select default to use the global NotifyList and NotifySeverity settings (see Maintain Notification Settings on page 149). Select specify to use the Whom to notify and On severity fields to override the global settings.

Whom to notify
Choose the name of one or more users to be notified when this sentry changes state. This overrides the global NotifyList setting.

On severity
Select a threshold level at which the user(s) listed in the Whom to notify field should be notified. Notification will happen when the sentry changes into a state with this severity or higher. This overrides the global NotifySeverity setting.

Show variables
Click to choose the variables collected by the agent that are used by this sentry. This shortens the list of variables displayed when an operator double-clicks on the sentry, as well as the list of variables available for graphing, reporting, or running actions. It is useful for an agent that returns a large number of variables, such as sar performance statistics, which may include CPU, memory, and network data all from the one agent. When an operator double-clicks on a CPU sentry, you can arrange for them to see only the variables relating to CPU statistics. Leave this field blank to show all variables associated with the primary agent.

Data Logging

The fields in the Data Logging frame are used to specify what data will be collected for the Logged Data Report. This report can be used to check the events leading up to an alert, and for longer-term trend analysis and capacity planning.
If you wish to be able to generate a Logged Data Report for this sentry, you need to specify now which numeric variables must be logged and how often. There are two logging methods: time-based (variables are logged at the specified time interval) and state-based (variables are logged after every poll while the sentry is in a specified state or higher).

Note that the period for time-based logging is approximate only. The value is actually taken from the next poll after the interval. It follows that there is no benefit to logging data more often than the polling frequency.

Avoid specifying too short a period, otherwise the log files can grow large very quickly. It’s better to log data only occasionally during periods of normal operation, then increase the logging frequency during alerts using the Or on severity field.

Enable logging
Select this option to enable collection of logging data for this sentry.

Log variables
Click to choose which numeric variables are to be logged. A snapshot of the current values of all these variables will be added to the log file at the frequency specified in the Every field.

Default settings?
Select this option to use the global DefLogTime and DefLogSeverity settings. Leave this option unselected to specify a period and minimum severity level manually.

Every
Enter a period in minutes. Examples: enter 10 to log data every 10 minutes; enter 60 to log data every hour. Enter 0 to log data every time the agent polls. This field overrides the global DefLogTime setting.

Or on severity
You can also log data while the sentry is in a particular state or any state of a higher severity. The variables are logged every time the agent polls. This is a way to selectively log more data during alerts. For example, select severe to log every poll while the sentry is in severe or critical state. To switch off state-based logging, select never. This field overrides the global DefLogSeverity setting.
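The way the Every and Or on severity fields combine can be sketched as a per-poll decision. The following is only an illustration of the rules described above, not Sentinel3G's own code: the function names should_log and severity_rank are hypothetical, and the severity ranking assumes the order normal, warning, alarm, severe, critical used in the state examples in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: rank severities in increasing order (assumed ordering).
severity_rank() {
    case "$1" in
        normal)   echo 0 ;;
        warning)  echo 1 ;;
        alarm)    echo 2 ;;
        severe)   echo 3 ;;
        critical) echo 4 ;;
        *)        echo -1 ;;   # unknown names rank below everything
    esac
}

# should_log MINUTES_SINCE_LAST_LOG EVERY CURRENT_SEVERITY OR_ON_SEVERITY
# Prints "yes" if this poll should be logged, otherwise "no".
should_log() {
    elapsed=$1
    every=$2
    current=$3
    threshold=$4

    # Time-based logging: log at the first poll after the interval has passed.
    # With EVERY set to 0 this is true on every poll.
    if [ "$elapsed" -ge "$every" ]; then
        echo yes
        return
    fi

    # State-based logging: log every poll at or above the chosen severity,
    # unless state-based logging is switched off with "never".
    if [ "$threshold" != never ] &&
       [ "$(severity_rank "$current")" -ge "$(severity_rank "$threshold")" ]; then
        echo yes
        return
    fi

    echo no
}
```

For example, should_log 3 10 normal severe prints no (the 10-minute interval has not elapsed and the sentry is below severe), while should_log 1 10 critical severe prints yes.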
Secondary agents

Secondary agents
If the sentry needs to use variables collected by an agent other than the Primary agent, click to specify these secondary agents. From the Secondary Agents window, select Maintain > Add then enter the following fields:

Agent
Click to choose the secondary agent.

Instance
Enter the name of the instance. Leave this field blank to use the same instance as the sentry.

Trigger sentry?
Select this option to force the sentry’s state to be reevaluated when the agent returns new data.

Click Accept to save the details of this secondary agent. When you have finished specifying secondary agents, press F3 to return to the Advanced Sentry Details form.

Note
Changes to secondary agents will not be processed until you exit the Secondary Agents window.

No-data state
Click to choose a default state to be used if the agent doesn’t return data for the sentry. An instance of this sentry will change to this state if the agent stops returning data for that instance. Example: when a filesystem is unmounted and df no longer returns details about that filesystem, then the sentry for that instance only will be put into the No-data state. A Delete state is often used in cases like this, where multiple instances of a sentry are created by “cloning”, and you wish to selectively suppress any instance while it is not returning data.

Discovery pgm
This is an optional command that is run during Host Monitor startup. Its job is to return an exit status of true or false based on the existence or status of a resource. If the discovery program returns false, this sentry will not be started.

When you have finished setting the advanced sentry options, click Accept to return to the main Add Sentry form. If you have finished defining the sentry, click Accept to save it and return to the ‘All Sentries’ window.
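A discovery program of the kind described under Discovery pgm can be very small. The sketch below is an assumption-laden illustration: the function name resource_present and the path in the usage comment are hypothetical, and it assumes that "true" corresponds to the usual shell convention of exit status 0.

```shell
#!/bin/sh
# Hypothetical discovery check: succeed (exit status 0, "true") only if the
# named file, directory, or device exists; fail (non-zero, "false") otherwise.
resource_present() {
    [ -e "$1" ]
}

# Illustrative use as the last line of a discovery script (path is made up):
#   resource_present /var/spool/lp
```

Under these assumptions, a sentry that monitors a print spooler would be started only on hosts where the spool directory exists.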
Adding States

The next step is to define the states that the sentry can be in.

1. If the console is not in Host View, select Go > Hosts.
2. Select the host that the agent will run on.
3. Select Configure > Host monitor. The ‘All Sentries’ window opens, showing details of all the sentries defined on this host.
4. Select the sentry, then select Maintain > States. The ‘States for sentry <sentry_name>’ window opens. States are listed in the order in which the entry conditions are evaluated. By default, this is in order of decreasing severity, with the most severe at the top. The first state whose entry condition evaluates to true will cause the sentry to enter that state. It follows that states with a NULL entry condition (usually the ‘normal’ state) should be last, for example:

critical    $pct_free == 0
severe      $pct_free < 5
alarm       $pct_free < 10
warning     $pct_free < 15
normal      <null>

You can then drag the states into a different order using the Order > Reorder menu option.
5. There are three ways in which you can add states:
• If the sentry was created by cloning, a separate copy of the original sentry’s states is made. If these states are exactly as required, you don’t need to do anything more. If you need to modify a state in some way, select it now and use Maintain > Change to make the changes (see Maintain State Details on page 122).
• If there is another sentry that already has a set of states that are similar or identical to what is required, you can copy that sentry’s states: select Maintain > Copy states, then choose the sentry. If the states are exactly as required, you don’t need to do anything more. If you need to modify a state in some way, select it now and use Maintain > Change to make the changes (see Maintain State Details on page 122).

Note
Copy states is only available if the sentry has no states defined. If the new sentry already has states and you wish to use the states of another sentry instead, remove the states from the new sentry first.

• You can add each state manually: select Maintain > Add. The ‘Add States’ form opens, as shown in Figure 27.

Maintain State Details

To define the sentry’s attributes and appearance while in this state, enter these fields:

Figure 27: State details form

State
Enter a unique name for the state. The name must not contain spaces.

Severity
Select the severity level for this state. The severity determines how the sentry will look while in this state (that is, its color and the color and type of any associated overlay icon). Note that this will result in a notification message being sent if the new severity is at or above the notification level specified globally or for this sentry.
The options are listed in order of increasing severity from normal to critical. disabled is a special severity that indicates the sentry is ‘down’ or otherwise unavailable, but doesn’t require attention (for example, a device that has been taken offline for maintenance).

Description
Enter a description for the state.

Entry condition
Enter a conditional expression. This is a TCL expression made up of any combination of agent variables, constants, text strings, numbers, history variables, boolean values, and TCL functions.
Examples:
$Status == "Off" && $PID != "-1"
$pct_free < $LOW
[hist_avg @cpu_idle]
If the entry condition is left blank it evaluates to true. For the correct syntax to refer to variables, see Expressions on page 23.

Console

These fields define how the sentry will appear on the console while in this state:

Text
Enter a text string to provide extra information about the current state of the sentry. The string will be displayed in the status area of the console, and may contain both informative messages and the values of variables.
Example: HTTP hits: &HttpHits; HTTP errors: &HttpErrors
… where HttpHits and HttpErrors are the names of variables returned by the primary agent or a secondary agent.
If you prefix a variable name with & it will be formatted for output on the console. For example, numeric variables will be displayed with the correct number of decimal places and with the units after the value, while date/time values will be formatted as readable dates or times. If you prefix a variable name with $ the raw value will be displayed on the console.

You can change the appearance of the sentry’s icon while it is in this state, by specifying a different icon or by overlaying the normal icon with an additional indicator icon to modify its appearance.

Note
To add new icons, see Add Icons on page 154.

Icon
Click to choose a different icon to represent this sentry while it is in this state. Example: the sentry normally has a 32x32 icon representing a remote system. When it is in network_down state, you choose to use another icon that has a red X indicating a problem with the network connection. If Icon is left blank the default sentry icon will be used.

Indicator
Click to choose a 16x16 indicator icon. This is an additional small icon that overlays the main icon in the top right-hand corner. You can use this to modify the appearance of the icon while the sentry is in this state. If Indicator is left blank the overlay specified in the sentry (thermometer or pie chart) will be used. If neither overlay is specified in the sentry, the default overlay icon and color for the current severity is used (see Overlays and Indicator Icons on page 16).

Notes file
Enter the file name only (without the path) of a notes file in the Sentinel3G doc directory. These notes will be available from the console to operators when monitoring or responding to alerts relating to this sentry.
Trigger variables
Click next to the Trigger vars field to list the trigger variables for this sentry. Choose one or more variables whose values you want to reset when the sentry enters this state (see Trigger Variables on page 43).

Responses

The Responses button lets you define several responses that can be run automatically while the sentry is in this state.

Responses
Click to see the options for notification, escalation and running automatic responses.

Figure 28: Defining the responses for a state

The Response form contains three blocks of response fields which are run in turn if the sentry remains in this state for longer than the specified period. Each response block specifies a waiting period, then three actions Sentinel3G can take at the end of each period. The actions are: changing the severity level, which in turn changes the appearance of the sentry on the console; sending a notification message; and running a command. These options are not mutually exclusive; you can specify any or all actions in each response block.

The final group, the Escalation/acknowledgement block, specifies a period after which the sentry will be forced into a different state. See Escalation/acknowledgement on page 127.

A typical response is to run a command to fix the problem, and if it succeeds to return the sentry to a normal state. If the command doesn’t fix the problem you may choose to leave the sentry in that state, and specify a later response to run another command or to notify someone.

These responses are all optional. You can specify all three, or one, or even none. If no responses or escalation period are specified here, the sentry will remain in this state until and unless the evaluation condition evaluates to a different severity level.
If a sentry changes state while it is waiting to process a response, then all responses for that state are cancelled, and any responses for the new state are started.

Wait (secs)
The length of time to wait after running the previous response, or if this is the first response, after entering this state. Each period is cumulative. In other words the period for Response #2 is counted from the end of the period for Response #1.

New severity
After the wait period, change the appearance of the sentry on the console to this new severity level. This would usually be done to trigger a global or sentry-level notification message or to increase the apparent urgency of the event by making the icon flash or change color. Note that changing the severity does not change the state of the sentry. Select unchanged if you wish to leave the severity level as it is.

Notification
Click to specify who will be notified if the sentry is still in this state at the end of the wait period. In the Type field, select one of the following:

default
Use the default notification list for this sentry.

specify
In the Who field, choose the names of one or more users. These are in addition to the default notification level for this sentry (see under Advanced options on page 116).

If you don’t want to do any additional notification for this response, select none.

Command
Enter a command to be run by the Host Monitor. This would usually attempt to fix the problem. Example: a ‘free disk space’ sentry could archive files to an offline storage device or remove files such as core and *.o that are deemed expendable.

Fire agent
Choose an agent to be run. This agent will be polled immediately at the end of the wait period. To poll a specific instance only, enter the instance name in brackets after the agent name. Example: Filesystem(/tmp).
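Because each Wait (secs) period is counted from the end of the previous one, it can help to work out when each response actually fires relative to the moment the sentry entered the state. The helper below is a small sketch of that arithmetic only; the name response_times is hypothetical and it is not part of Sentinel3G.

```shell
#!/bin/sh
# response_times WAIT1 [WAIT2 [WAIT3]]
# Prints, one per line, the number of seconds after entering the state at
# which each response block runs, given the Wait (secs) value of each block.
response_times() {
    total=0
    for secs in "$@"; do
        total=$((total + secs))   # each wait starts when the previous one ends
        echo "$total"
    done
}
```

With waits of 60, 120 and 300 seconds, response_times 60 120 300 prints 60, 180 and 480: Response #2 runs three minutes after the sentry entered the state, and Response #3 runs after eight minutes.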
Escalation/acknowledgement

Another way to respond to an alert is simply to wait for a while to see if the problem corrects itself or more information is received, then to change to another state at the end of that period.

Figure 29: Defining the escalation condition for a state

Change to a more severe state if the problem would normally be expected to resolve itself either spontaneously or by the running of the automatic responses. If the sentry is still in this state at the end of the waiting period it suggests some other action must be taken.

Change to a less severe state (typically, normal state) if the problem appears to have been a one-off event. For example, the Bad_SU sentry goes into warning state if a failed su attempt is detected. If no further failed su attempts are detected by the end of the waiting period no action need be taken and the sentry can be returned to normal state.

The change of state may depend on manual confirmation from an operator (Acknowledgement) or it may happen automatically (Escalation).

Wait (secs)
The length of time to wait after running the previous response, or if there are no previous responses, after entering this state. If the waiting period is set to 0 seconds, the response (either escalation or the appearance of the acknowledgement icon on the console) will occur as soon as the sentry enters this state.

Go to state
Choose the new state to change to.

Type
Select acknowledge if the change of state depends on manual confirmation from an operator. Select escalation if the change of state should happen automatically at the end of the waiting period.

When you have finished defining responses and escalation details, click Accept to return to the main Add State form. If you have finished defining this state, click Accept to save it and return to the ‘State Details’ window.
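The evaluation order described under Adding States (the first state whose entry condition is true wins, and the state with a blank condition comes last) can be sketched as an if/elif chain. Entry conditions in Sentinel3G are TCL expressions; the shell function below, with the hypothetical name pick_state, only mirrors the logic using the example $pct_free thresholds and assumes whole-number percentages.

```shell
#!/bin/sh
# pick_state PCT_FREE
# Prints the name of the first state whose entry condition is true, testing
# the states in order of decreasing severity.
pick_state() {
    pct_free=$1
    if   [ "$pct_free" -eq 0 ];  then echo critical
    elif [ "$pct_free" -lt 5 ];  then echo severe
    elif [ "$pct_free" -lt 10 ]; then echo alarm
    elif [ "$pct_free" -lt 15 ]; then echo warning
    else echo normal             # blank (NULL) entry condition: always true
    fi
}
```

A value of 3 matches both the severe and the alarm conditions, but pick_state 3 prints severe because that state is tested first. If normal were listed first, its blank condition would always match and no other state could ever be entered.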
Constants
Click next to the Constants field to maintain the list of constants and thresholds for this sentry. You can add a new constant, change the details of an existing constant, or adjust the threshold values at which the sentry changes from one state to another. For details about all these tasks, see Maintaining Constants and Thresholds on page 140.

Adding an Action or Report

A sentry can have several associated actions, which an operator can choose to run from the console. Actions may either be tied to particular states, or can be made available when the sentry is in any state. There are two types: actions typically are used to try to fix a problem; reports display output on the screen and help the operators to diagnose the problem. You can assist operators by explaining in the monitoring notes for the sentry or state when and how each action should be used.

When designing an action for a multi-instance sentry you can set it up to run for selected instances or for every instance in a parent folder. For example, you can set up an action so that the output for all instances is combined into one report.

1. From the ‘All Sentries’ window, select the sentry.
2. Select Maintain > Actions. The ‘Actions for sentry <sentry_name>’ window opens.
3. Select Maintain > Add. The ‘Add actions for sentry <sentry_name>’ form opens. (Tip: If a similar action has already been defined for a sentry that uses the same agent, it may be faster to use Maintain > Copy.)

Figure 30: Action details form

Action
Enter a name for the action. This is the name that will appear in the list of actions that the operator can select from.

Type
Select whether or not you want the output from the command to be displayed on the operator’s screen:

action
Simply runs the command without displaying any output. Example: starting a service when it is stopped.

report
Displays the command’s output on the screen.
Command
Enter the command, using UNIX shell syntax. The command can make use of the variables $Sentry, $Host, and $Action, which will be set in the environment when the action is run. For multi-instance sentries you can also refer to $Instance, which contains the instance name. To use agent variables in the command, select Uses agent data? below.
This example shows how to define a simple report for a single-instance sentry:

    echo -n "Report '$Action' "; date;
    echo " Sentry: $Sentry";
    echo " Host: $Host";

When the report is run it will display the name of this action, the date, and the name of the sentry and the host it runs on. For more details and examples of actions and reports, see Actions on page 24.

Display command
If Type is report, enter a command to display the output from the Command (examples: scroll, db_scroll, db_graph). The default is Sentinel3G’s own browser widget. If the action is run on several instances, all the output from all the commands will be piped to the same display command. Display command is optional if Command handles the displaying of the data itself.

Run as user
Choose a user name from the password file on this host. The command will run with the privileges of this user. The default account is root. Example: some RDBMS packages require that certain administrative commands be run from a special DB admin account.

In state(s)
You can make this action available in only certain states. Example: the Services sentry has Stop and Restart actions that are available when a service is in a state that indicates it is running, and a Start action when it is not running. Click to choose one or more states. Leave this field blank to make the action available at all times.

Access role
If set, only users with the specified role can perform this action. If blank, all Sentinel3G users who have the action capability may perform this action.

Authenticate?
Tick this field to ask for the operator’s password before running the action.

Uses agent data?
Tick this checkbox if you wish to use any of the agent variables in the Command or Display command fields. This gives the commands access to the same primary agent variables as the sentry.

Reads from STDIN?
If Uses agent data? is ticked, use this field to specify where the action can find the data:

no
Command will expect the variables to be set in the environment and accessed by name (e.g. $pct_free).

yes
The data will be passed from STDIN in Functional Database format. Use this option if you wish to manipulate the data using the Functional Toolset.

Fire agent
Tick this checkbox if you wish the sentry’s agent to be polled after the action has been run.

Export to parent?
Tick this checkbox if you wish the action to be available from the parent folder of this sentry. If the action is exported, operators will be able to choose this action both in relation to a selected sentry and for all sentries in the parent folder. Example: a Free Space report that displays details for a selected filesystem (single sentry) or all user filesystems on a host (parent folder).

4. Click Accept to save the action. To test the action from the console, select the sentry and then select Sentry > Action.

Adding a Realtime Graph

Realtime graphs plot recent values returned by selected variables for a sentry.

1. From the ‘All Sentries’ window, select the sentry.
2. Select Maintain > Realtime graphs. The ‘Realtime graphs for sentry <sentry_name>’ window opens.
3. Select Maintain > Add. The ‘Add realtime graphs for sentry <sentry_name>’ form opens. (Tip: If a similar graph has already been defined for a sentry that uses the same agent, it may be faster to use Maintain > Copy.)
Figure 31: Example: displaying free disk space as a stack graph

Specify attributes of the graph

Graph type
Select the type of graph you wish to use to display the data:

line
Line graphs are useful for gauging trends.

bar
Bar graphs are useful for comparing variables within one observation or comparing adjacent observations.

stack
Stack graphs are typically used where all values add up to 100%. Example: CPU usage = %user + %system + %idle

Figure 32 shows the same data presented using each type of graph:

Figure 32: Sample disk space data shown as line, bar, and stack graphs

Polls displayed
The number of values to display across the X-axis of the graph. For example, if you enter 3, values from the last three polls will be displayed.

The Min value and Max value fields control the scale on the graph’s Y-axis. If Min value and Max value are not specified, the Y-axis will be sized to the current minimum and maximum data value. This means the scale may change as new values are graphed. To keep the scale constant, set both Min value and Max value.

Use close minimum and maximum values if you want to focus on relatively small differences among data values. For example, if a set of variables is mainly of interest when the values are clustered near 100%, a minimum value of 90 will help to separate them.

Min value
The minimum value to display next to the Y-axis. If the values will always be positive, set Min value=0.

Max value
The maximum value to display next to the Y-axis.

Scale to max?
(For stack charts only) Tick this checkbox if you want the values to be scaled so that their sum equals Max value. This is useful where the total adds up approximately to Max value. Scaling ensures a flat top to the stack.

Specify variables

You can now choose the names of up to five agent variables whose values are to be graphed.
Click next to the Variable details field to specify the attributes of the chosen variables. For example, Figure 33 shows the details for two variables called MBfree and MBused.

Figure 33: Example of a realtime graph that plots two variables

For each variable, enter the following details:

Color
Select the color to be used to display this variable.

Label
If you wish you can change the default label displayed for this variable. For example, if you wish to scale down a value for free disk space by a factor of 1000, you could also change the label to read GB (gigabytes) instead of MB (megabytes).

Scale by
This is an optional scaling factor. The values displayed will be multiplied (scaled up) by this factor. Use this to convert very large or small numbers to more manageable units. Example: specify 0.001 to divide the reported values by 1000.

Click Return to save the variable details.

Specify threshold markers

You can now specify up to four markers to be superimposed over the data values. Each marker is displayed as a colored horizontal line and represents a state threshold or other significant value. You can specify both constants associated with this sentry and enter arbitrary integers or floating-point numbers (such as 20, 40, 60, 80).

Click next to the Threshold markers field to specify up to four markers. For each marker, specify these details:

At value
Enter a floating point number, or click to choose one of the constant values defined for this sentry. See Maintaining Constants and Thresholds on page 140.

Color
Select the color to be used to display this threshold. Remember to use a different color from those used to graph the variables. Use the Test graph option to see which colors show up best. If the threshold is equivalent to a boundary between states, it may be helpful to use the color of the severity level for the higher state. For example, if the constant LOW is the boundary between normal and warning state, and the sentry goes orange when it is in warning state, use orange as the color of the threshold marker.

Figure 34: Generating a Test graph to check colors and thresholds

When you have finished specifying markers, click Return to return to the ‘Add realtime graphs for sentry <sentry_name>’ form.

Test the graph

Now you can test the appearance of the graph. Click next to the Test graph field to generate a graph based on the settings in the form and the most recent data returned by the agent on this host.

Note
The host monitor must be running and the agent must be returning valid data.

You can display several graphs at once by experimenting with different settings and clicking Test graph again. If this is a multi-instance sentry you can test different instances. When you are finished with each graph press F3 to dismiss it.

Save the graph details

Graph name
Enter a unique name to identify this graph.

Description
Enter a description that explains what this graph will show or when it should be used. This will help operators to select the correct graph to diagnose problems.

Title
The title that appears in the heading of the graph. It can contain plain text, a variable such as $Instance (for a multi-instance sentry, the name of this instance), or a combination of the two.

Export to parent?
Tick this checkbox if you wish the graph to be available from the parent folder of this sentry. If the graph is exported, operators will be able to choose this graph when they select the parent folder.

4. Click Accept to save the graph. To test the graph from the console, restart the host monitor, select the sentry and then select Report > Realtime graph.
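The Scale by factor described under Specify variables is a plain multiplication of each reported value. As a quick check of the arithmetic before choosing a factor and a matching label, something like the following could be used; scale_value is a hypothetical helper written for this illustration, and rounding to one decimal place is an arbitrary choice, not Sentinel3G behavior.

```shell
#!/bin/sh
# scale_value VALUE FACTOR
# Prints VALUE multiplied by FACTOR, rounded to one decimal place.
scale_value() {
    awk -v v="$1" -v f="$2" 'BEGIN { printf "%.1f\n", v * f }'
}
```

For example, scale_value 120600 0.001 prints 120.6, so a variable reported as 120600 MB would be plotted as 120.6 and its label should be changed to read GB.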
Maintaining Constants and Thresholds

Constants are like variables, but they are associated with a sentry rather than an agent. You can use constants in a state’s Entry condition field to define thresholds between states, and as a visual aid on realtime graphs.

Example of use: You create a sentry and its states. Some of the states have an entry condition that compares the current data value from the agent with a constant such as VERY_LOW. You clone the sentry. The same set of states is shared between the old and new sentry, but you set the constant VERY_LOW to a different value in each sentry.

To display the constants for a sentry

1. From the ‘All Sentries’ window, select the sentry.
2. Select Maintain > Constants. The ‘Constants for sentry <sentry_name>’ window opens.

To add a constant for a sentry

1. Select Maintain > Add. The ‘Add Constants’ form opens.
2. Enter the following fields:

Constant
Enter a name for the constant. The convention is to use uppercase letters and underscores only (e.g., HALF_FULL). The name must be different from other constants belonging to this sentry, though another sentry can have a constant with the same name.

Value
Set the value of the constant (examples: 3; 0.5; true).

Comment
Enter an optional comment.

Group override?
Can an instance group override the value of this constant? If this option is set to yes, the value set in an instance group can override the value set here. If this option is set to no, the value of this constant always applies to any instance that uses it.

3. Click Accept to save the constant.

To adjust the values of a sentry’s constants

You can adjust the values of all the constants belonging to a sentry. You can use this to fine-tune the thresholds at which a sentry changes from one state to another.

1. From the ‘Constants for sentry <sentry_name>’ window, select Maintain > Change values.
2. Change the values next to any of the constants.
3. Click Accept to save the new values.

Running Sentries in Test Mode

When you are developing sentries, you can leave them “switched off” until you are ready to move them to production mode. You can use the hostmon -T command to test sentries even if they are off, or are in KBs that are off. Running the sentries in test mode will attempt to start all the agents required by the selected sentries and display status messages, including any error messages. You can correct any configuration problems and retest the sentries. When you are satisfied that the sentries will work correctly you can change their condition to on.

To test sentries, start a Sentinel3G shell then run hostmon -T <sentries…>.

Example:

    cos sentinel -c bash
    hostmon -T Clients Swap_Size