Configuring Sentries and Agents

This page was last modified 06:03, 14 June 2013.

This chapter describes how to define sentries to monitor resources and respond to events. The main procedures, in order, are:

The first topic lists some questions you should look into when preparing to configure sentries.



Note
You must have the Admin or Manager role to configure sentries.


Planning your Sentry

When preparing to build sentries it is useful to consider these questions:

What object or resource do you wish to monitor?

What is the purpose of monitoring this resource:

How can the status of the resource be queried:

Does running one command return data about several instances of an object or resource? Multiple instances imply a ‘cloning’ sentry, in which case one of the attributes or variables that uniquely identifies each instance must be designated as the key column. Will every instance be monitored, or should selected instances be filtered out?

What part of the output from the agent do you wish to capture?

How can a message of interest be identified (e.g., by its position? by searching for a unique string or pattern)? How can portions of the data destined to be saved in variables be separated from the rest of the output? Can the output be simplified by discarding header lines or a message prefix?

Will there be ‘spikes’ or gaps in the data that must be allowed for? In other words, when the sentry tests a variable, is it sufficient to test just the current value, or would it be more realistic to take an average of several recent values?

How often does the agent need to collect data? You may need to consider a tradeoff between running the command too frequently and affecting system performance, and not running it frequently enough, in which case the information provided may not be current.

What agent data (that is, variables) do you want to appear on the console? How should they be formatted?

What are the thresholds or triggers that cause a sentry to move from one state to another? If the sentry remains in a state for a long time, does that indicate that the problem is getting more serious? Or that the problem may have resolved itself?

Should the sentry be changed to another state after a certain period?

How many different states are of interest? Be careful not to multiply states unnecessarily. For example, if you would not expect Sentinel3G or an operator to respond differently when a sentry is in either of two states, then it’s probably safe to merge one of the states into the other. You can even define an ‘information only’ sentry with no states, which simply logs and displays on the console information supplied by the agent. On the other hand, you may wish to define separate states for reporting purposes. For example, a printer can be idle or printing and both are considered of normal severity, but if you are interested in the ratio of time spent idle to printing, you would need two states to represent this information.

What additional information would be useful to help an operator understand, diagnose, or fix a problem when the sentry is in each state? Is there a command, report, or graph that could be offered to operators to help diagnose the problem?

Is there a standard procedure that should be followed by operators? If so, consider attaching a monitoring notes file to the sentry, state, or agent.

Adding an Agent

Each sentry tests variables supplied by an agent. If the agent has not already been defined you must add it first, then define its variables, before adding the sentry.

  1. From the console, select Configure > Host monitor.
  2. If you have not already selected the host to update, Sentinel3G will ask you to choose one now. This is the host that the agent will run on. The ‘All Sentries’ window is displayed, showing details of all the sentries defined on this host.
  3. Select Tables > Agents. The ‘Agents’ window opens, showing details of all the agents defined on this host.
  4. Select Maintain > Add. The ‘Add Agent’ form opens.
    Image:Sentinelfigure19.jpg
    Figure 19 — ‘Add Agent’ window
  5. Enter these fields:
    Knowledge Base
    The KB that this agent belongs to. Click to choose one of the predefined KBs installed on this host, otherwise leave this field blank if the agent is not associated with a particular KB.
    Agent name
    Enter a unique name for the agent. The name must not contain spaces.
    Class
    Choose the TCL handler. This tells Sentinel3G how to parse the output from the agent command. See Agent Classes and Variables for more details about selecting an agent class.
    API
    An external application sends data via the Sentinel3G API.
    DB
    The agent returns data in Functional Database format, typically as a result of a query on a Functional Database table.
    ExitStatus
    The agent returns the exit status of the command.
    LogFile
    The agent searches in a log file for messages that match a pattern.
    In the Agent options form you specify a select pattern to select records of interest, and an extract pattern for each text string in the record that must be assigned to a variable.
    SNMPPolled
    The agent polls for the current status of a managed object in an SNMP MIB, such as a device or port.
    Text
    The agent returns text data to STDOUT. You can use the Agent options form to filter out extraneous text such as blank lines, header lines, and labels.
    Note Additional classes may be listed depending on which KBs are installed.
    See the documentation accompanying the KB for more information.
    Description
    Enter a description for the agent.
    Command
    If the Class is DB, ExitStatus, or Text, enter the command to be run. Other agent classes don’t require a command as they obtain their data by different means.
    Poll time (secs)
    There are two ways to use the poll time setting.
    • A short-running command runs once per poll and terminates immediately after returning its data. Poll time is the time the agent waits before rerunning the command. Note that this is the time between the end of the previous run and the start of the next, so the command won’t run exactly this often. In other words, if the poll time is set to 60 seconds and the command takes about 5 seconds, on average the command will run about every 65 seconds.
    • A persistent or ‘long-running’ command simply returns to the agent the latest data accumulated at an interval determined by the poll time. The command must accept the polling frequency as an argument, which you pass in an environment variable $PollTime. Example: you wish a command ntping to return data to the agent every 60 seconds. ntping has a flag -p that sets the polling frequency. Set Poll time (secs) to 60 and the Command field to: exec ntping -sentinel -p$PollTime <other_arguments>
    Note Avoid running the command too frequently if testing shows that it may degrade system performance.
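The timing of a short-running command described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name is hypothetical and is not part of Sentinel3G.

```python
def effective_interval(poll_time_secs, command_runtime_secs):
    """Approximate time between the start of one run and the start of the next.

    Assumes the agent waits poll_time_secs after the command finishes,
    as described for short-running commands.
    """
    return poll_time_secs + command_runtime_secs


# Poll time of 60 seconds, command takes about 5 seconds.
print(effective_interval(60, 5))
```

If the command's run time is significant compared with the poll time, the gap between samples grows accordingly, which may matter when values are later averaged or graphed.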

Multi-instance agents

Instance variable
The instance variable is the key field that uniquely identifies each row. In the df example, the instance variable would be the filesystem name. You cannot add an agent’s variables until the agent itself exists, so type in the variable name now and continue to the next field. You can add the details of the variable later. See Adding Variables Used By the Agent.
Instance type
This setting specifies how the list of instances is generated.
explicit
Each sentry will explicitly list the names of the instances. For example, a sentry that monitors log files would list the names of the files it will monitor. The agent will ‘gather’ all the instance names from its associated sentries and pass them to the Command as $Instances.
cloning
The agent creates or ‘clones’ instances for each row of data returned by the command.
both
The final list of instances can include both instances specified explicitly by sentries and instances discovered by the agent. For example, the ProcessInfo agent can be set up always to monitor specific system processes, but also to discover arbitrary application or user processes.


If Instance type is set to cloning or both, you can specify patterns for instance names to be included in or excluded from the list returned by the agent. If Instance type is explicit, Include and Exclude are disabled.

Include and Exclude work in much the same way: the agent gets some data for a particular instance, then, if Include is set, it tests whether the instance name matches any of the patterns. If not it rejects the instance. If the name did match one of the patterns, or if Include is not set, it then matches the instance against the patterns in the Exclude field. If there is a match, the instance is rejected.

Include
An optional list of patterns used to select instances to be included.
Exclude
An optional list of patterns used to select instances to be excluded. To stop a sentry being created for a particular instance even if a row is returned, enter its name here. For example if the purpose of the agent is to monitor available disk space, you can exclude rows that represent read-only filesystems such as a CD-ROM drive.
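The order of the Include and Exclude tests can be sketched as follows. This is a minimal Python illustration, assuming the patterns are regular expressions that may match anywhere in the instance name; the function name and the sample filesystem names are hypothetical, and the product's actual matching rules may differ.

```python
import re


def accept_instance(name, include=None, exclude=None):
    """Return True if the instance passes the Include and Exclude tests.

    Assumption: patterns are regular expressions matched anywhere in
    the name. Include is tested first, then Exclude.
    """
    if include and not any(re.search(p, name) for p in include):
        return False  # Include is set and nothing matched
    if exclude and any(re.search(p, name) for p in exclude):
        return False  # an Exclude pattern matched
    return True


filesystems = ["/", "/home", "/mnt/cdrom", "/var"]
kept = [fs for fs in filesystems if accept_instance(fs, exclude=[r"cdrom"])]
print(kept)
```

Note that an instance matching both an Include and an Exclude pattern is rejected, because the Exclude test runs last.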

Agent options

You can pass flags and other arguments recognized by this agent class.

Agent options
Click to see the available options. These depend on the Class:

If no options are available for this class the button is disabled.

Discovery pgm
An optional command that is run before the agent starts. Its job is to return an exit status of true or false based on the existence or status of a resource. If the discovery program returns false, this agent and its associated sentries will not be started. For more details and examples, see Discovery program.
Notes file
Enter the file name only (without the path) of a notes file in the Sentinel3G doc directory. These notes will be available from the console to operators when monitoring or responding to alerts relating to this agent. Typically the notes file would describe the variables returned by the agent.

Click Accept to save the agent.

Agent options for LogFile class

The LogFile agent class checks the contents of an ASCII file (usually a log file) for lines that match a pattern. A line is defined as a string of characters terminated by a newline character. One record in a log file may comprise one or more lines. When a matching line is found, portions of the record that contains it can be assigned to one or more variables.

Image:Sentinelfigure20.jpg
Figure 20 — Example: options for an agent that monitors the message log
Logfile name
The name of the file to be monitored.
Select pattern
A regular expression that is used to select records from the log file. Lines matching this pattern will be returned by the agent.
Record length
The total number of lines in each record.
Record offset
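The number of lines by which the first line of the record precedes the matching line. For example, enter 2 if the record starts two lines before the line that matches the Select pattern.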
Strip initial chars
Ignore this number of characters at the start of every line. Use this to discard a fixed-length prefix (such as a time-stamp) if it will not be used by the sentry.
Clear pattern
Remove any string that matches this regular expression and replace it with a tab. Use this to discard any text or fields that will not be passed by the agent as a variable, or to simplify the Select pattern by removing extraneous or variable-length text from the middle of a line.
Split data by
How should the agent assign parts of each matching line to variables? In this field you specify how to break the data into columns. See Assigning Text and Log File Data to Variables. Later when adding variables for this agent you will specify which columns to assign to each variable – see Adding Variables Used By the Agent.
column
Don’t split the data into fields. Instead each line will be treated as a string of characters, with the first character being column 1, the second character column 2, and so on. You will use the "c <col>-<col>" format in the Column field to define each variable.
whitespace
Break the line into a series of columns separated by white space. Each column is numbered in turn, starting from column 1.
tab
Break the line into a series of tokens separated by a single tab character (two tabs in a row define a NULL field between them). Each column is numbered in turn, starting from column 1. Clear pattern should be used to replace an unwanted string with a single tab character before the fields are split.
pattern
Specify a regular expression containing at least one extract pattern, each of which is contained in parentheses. The first extract pattern is treated as column 1, the second extract pattern column 2, and so on.

If you selected pattern, specify one or more regular expressions to match patterns in each line of the record.

Pattern (line 1)
Extract matching variable(s) from the first line in the record.
More patterns
Click to extract data from additional lines in the record. For example, specify a regular expression in the Pattern (line 2) field to extract matching variables from the second line in the record.
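The four Split data by choices can be compared with a small Python sketch. It only approximates the behaviour described above: the function names are hypothetical, an empty string stands in for a NULL field, and the sample lines are invented for illustration.

```python
import re


def split_line(line, mode, pattern=None):
    """Return a dict mapping 1-based column numbers to values."""
    if mode == "column":
        # Every character is its own column.
        fields = list(line)
    elif mode == "whitespace":
        fields = line.split()
    elif mode == "tab":
        # Two tabs in a row yield an empty field between them.
        fields = line.split("\t")
    elif mode == "pattern":
        # Each parenthesised group is one column.
        match = re.search(pattern, line)
        fields = list(match.groups()) if match else []
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {i: value for i, value in enumerate(fields, start=1)}


def char_range(columns, first, last):
    """Join character columns first..last, like a 'c <col>-<col>' range."""
    return "".join(columns[i] for i in range(first, last + 1))


print(split_line("disk1\t\t42 MB", "tab"))
print(split_line("17 transactions flagged", "pattern",
                 r"^(\d+) transactions (\w+)"))
print(char_range(split_line("ERR042 disk full", "column"), 4, 6))
```

The pattern mode is the most flexible but also the easiest to get wrong, so it is worth testing an extract pattern against real sample lines before relying on it.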


Example: selecting a multi-line record from a log file

Here is a fragment from a log file, showing one record of five lines.

INFILE:/data/MFLA/2001Q2/DACCin/reg3.dat
Validating file......
DACCedit V4.1 © 1991–99 TransDACC Ltd.
17 transactions flagged
1128 transactions passed

The unique string that identifies this record is the program name, DACCedit, at the start of the third line. The first line containing a variable we wish to keep (the input file) is two lines earlier. If any transactions were flagged we wish the agent to report the file name and the number of transactions that were flagged.
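To make the selection logic concrete, here is a minimal Python sketch that applies a select pattern, a record offset of 2, and per-line extract patterns to the fragment above. The specific pattern strings, the record length of 5, and all names in the code are assumptions chosen for illustration; they are not the values shown in the product's option form.

```python
import re

LOG_LINES = [
    "INFILE:/data/MFLA/2001Q2/DACCin/reg3.dat",
    "Validating file......",
    "DACCedit V4.1 © 1991–99 TransDACC Ltd.",
    "17 transactions flagged",
    "1128 transactions passed",
]

SELECT_PATTERN = r"^DACCedit"   # identifies the record (third line)
RECORD_OFFSET = 2               # record starts two lines before the match
RECORD_LENGTH = 5               # the record shown above has five lines
LINE_PATTERNS = {               # 1-based line number -> extract pattern
    1: r"^INFILE:(.*)$",
    4: r"^(\d+) transactions flagged",
}


def extract_records(lines):
    """Yield one list of extracted values per selected record."""
    for index, line in enumerate(lines):
        if not re.search(SELECT_PATTERN, line):
            continue
        start = index - RECORD_OFFSET
        if start < 0:
            continue  # record would begin before the start of the data
        record = lines[start:start + RECORD_LENGTH]
        values = []
        for line_no, pattern in sorted(LINE_PATTERNS.items()):
            if line_no > len(record):
                break
            match = re.search(pattern, record[line_no - 1])
            if match:
                values.extend(match.groups())
        yield values


records = list(extract_records(LOG_LINES))
print(records)
```

In this sketch the first extracted value is the input file name and the second is the number of flagged transactions, which correspond to columns 1 and 2 when variables are later defined for the agent.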

When you have finished setting the LogFile agent options, click Accept to return to the main Add Agent form.

Agent options for SNMPPolled class

Image:Sentinelfigure22
Figure 22 — Example: agent options for an agent in SNMPPolled class
MIB file name
The name of the MIB file, which Sentinel3G expects to find in the directory lib/tnm2.1.10/mibs under the COSmanager home directory. If the MIB file is not already stored there, copy it there now.
Multi-host?
Select yes if this agent will query multiple hosts, each specified by a separate instance. If this agent does not support multiple instances (that is, if the Multi-instance? field on the Add Agent form is set to no), this field will be disabled.
IP address
The IP address or hostname of the host to be queried.
Port
The UDP port number or service name used for SNMP queries (usually specified in /etc/services).
SNMP table
The name of a table contained in the MIB, which specifies a set of sequences relating to a device being monitored. The agent will “walk the tree” specified in this table to obtain details of each component of the device. For example, if this table specifies that a switch has multiple ports, the agent queries the switch to get the specified details of each port.
SNMP version
The version of SNMP that the MIB file conforms to.
Community
The SNMP community to identify and validate the sender of SNMP messages (SNMPv1 and SNMPv2c only).
User
The user name to identify the sender of SNMP messages (SNMPv2u only).
Password
Password corresponding to the user name (SNMPv2u only).
Timeout (secs)
The maximum time to wait for a response from the node being polled.

When you have finished setting the SNMPPolled agent options, click Accept to return to the main Add Agent form.

Agent options for Text class

Image:Sentinelfigure23.jpg
Figure 23 — Agent options for agents in the Text class
Record length
The total number of lines in each record.
Skip initial lines
How many lines to skip at the start of the record. You can use this field to skip a repeating title or header.
Skip initial records
When the agent starts up, some spurious alerts may be generated from the first couple of polls. For example, if the agent is being started during the system boot procedure, the resource being monitored may be under an unusual load from all the other user processes being started, or a large number of events may have accumulated while the agent was not running. You can choose not to process the data collected by the agent in the first few polls. Examples: enter 2 to skip the first two polls; enter 1 to skip only the first poll.

Sentinel3G can extract variables from up to four lines of data. If a record contains more than four lines, you can use the next two fields to discard lines that don’t contain data needed by a sentry, such as blank lines and headers.

Skip blank lines
Select yes to discard blank lines.
Skip pattern
Skip lines containing a match for this pattern. Use this to discard lines that won’t be used to set variables.
Skip initial chars
Ignore this number of characters at the start of every line. Use this to discard a fixed-length prefix (such as a time-stamp) if it will not be used by the sentry.
Clear pattern
Remove any string that matches this regular expression and replace it with a tab. Use this to discard any text or fields that will not be passed by the agent as a variable, or to simplify the Select pattern by removing extraneous or variable-length text from the middle of a line.
Split data by
How should the agent assign parts of each matching line to variables? In this field you specify how to break the data into columns (see Assigning Text and Log File Data to Variables). Later, when adding variables for this agent, you will specify which columns to assign to each variable (see Adding Variables Used By the Agent).
column
Don’t split the data into fields. Instead each line will be treated as a string of characters, with the first character being column 1, the second character column 2, and so on. You will use the "c <col>-<col>" format in the Column field to define each variable.
whitespace
Break the line into a series of columns separated by white space. Each column is numbered in turn, starting from column 1.
tab
Break the line into a series of tokens separated by a single tab character (two tabs in a row define a NULL field between them). Each column is numbered in turn, starting from column 1. Clear pattern should be used to replace an unwanted string with a single tab character before the fields are split.

pattern
Specify a regular expression containing at least one extract pattern, each of which is contained in parentheses. The first extract pattern is treated as column 1, the second extract pattern column 2, and so on.

If you selected pattern, specify one or more regular expressions to match patterns in each line of the record.

Pattern (line 1)
Extract matching variable(s) from the first line in the record.
More patterns
Click to extract data from additional lines in the record. For example, specify a regular expression in the Pattern (line 2) field to extract matching variables from the second line in the record.

When you have finished setting the Text agent options, click Accept to return to the main Add Agent form.
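The clean-up options for the Text class can be pictured as a small pipeline applied to each record before the data is split into columns. The Python sketch below is an assumption-laden illustration: the order in which the options are applied is not stated in this chapter, and the function name, parameter names, and sample data are hypothetical.

```python
import re


def preprocess(lines, skip_initial_lines=0, skip_blank=False,
               skip_pattern=None, skip_initial_chars=0, clear_pattern=None):
    """Apply Text-class style clean-up options to one record.

    Assumption: the options are applied in the order shown here.
    """
    result = []
    for line in lines[skip_initial_lines:]:
        if skip_blank and not line.strip():
            continue
        if skip_pattern and re.search(skip_pattern, line):
            continue
        line = line[skip_initial_chars:]
        if clear_pattern:
            # Matching text is replaced by a single tab.
            line = re.sub(clear_pattern, "\t", line)
        result.append(line)
    return result


raw = [
    "Filesystem  Use%",
    "",
    "12:00:01 /home usage=81%",
    "12:00:01 /tmp usage=12%",
]
cleaned = preprocess(raw, skip_initial_lines=1, skip_blank=True,
                     skip_initial_chars=9, clear_pattern=r" usage=")
print(cleaned)
```

After this clean-up each remaining line contains two tab-separated fields, so Split data by could be set to tab in this hypothetical case.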

Adding Variables Used by the Agent

  1. From the console, select Configure > Host monitor.
  2. If you have not already selected the host to update, Sentinel3G will ask you to choose one now. This is the host that the agent will run on. The ‘All Sentries’ window opens, showing details of all the sentries defined on this host.
  3. Select Tables > Agents. The ‘Agents’ window opens, showing details of all the agents defined on this host.
  4. Select the agent, then select Maintain > Variables. The ‘Variables from Agent <agent_name>’ window opens.
  5. Select Maintain > Add. The ‘Add Variable’ form opens.
Image:Sentinelfigure24.jpg
Figure 24 — ‘Add variable’ window
Variable name
Enter a name for the variable. The name must not contain spaces, but it may contain underscores instead. It must not have the same name as another variable belonging to this agent. It doesn’t need to be unique across agents; two agents may have variables with the same name. If the agent is in the API class, the name must match one of the variable names passed by the external application. By convention, agent variable names are lower case.
Class
Select one of these options, depending on how the value of the variable will be set:
raw
The value is set by the agent.
derived
The value will be computed from other variables (in the Expression field on this form). This is often used to express the value of another variable in a different way, such as a rate, proportion, or percentage.
trigger
If this variable is included in the list of trigger variables for a state, the Expression field will be evaluated when the sentry changes into that state. This is usually used to save the previous value when a new value is received, so that the old and new values can be compared.
Type
The internal data type.
number
An integer or floating point number.
string
A text string.
boolean
Mainly used by agents in the ExitStatus class. If the command returns 0, the boolean value is true, otherwise it is false.
date
A date in Functional Database internal format. The date will be stored in the form YYYYMMDD and output as MM/DD/YY (U.S. display format) or DD/MM/YY (European display format). Mainly used by agents in the DB class.
datetime
A date and time in Functional Database internal format. The date will be stored in the form YYYYMMDD.hhmmss and output as MM/DD/YY-hh:mm (U.S. display format) or DD/MM/YY-hh:mm (European display format). Mainly used by agents in the DB class.
clock
A count of the number of seconds since 1 Jan 1970 (GMT). You can subtract from the current value a clock value saved earlier to return a time period (for example how long a process has run).
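The storage and display formats for the date, datetime, and clock types can be illustrated with Python's standard library. This is a sketch of the conversions described above, not the Functional Database implementation; the function names are hypothetical.

```python
from datetime import datetime
import time


def display_date(stored, european=False):
    """Convert a stored YYYYMMDD date to MM/DD/YY or DD/MM/YY."""
    parsed = datetime.strptime(stored, "%Y%m%d")
    return parsed.strftime("%d/%m/%y" if european else "%m/%d/%y")


def display_datetime(stored, european=False):
    """Convert a stored YYYYMMDD.hhmmss value to the display format."""
    parsed = datetime.strptime(stored, "%Y%m%d.%H%M%S")
    date_part = "%d/%m/%y" if european else "%m/%d/%y"
    return parsed.strftime(date_part + "-%H:%M")


def elapsed_seconds(saved_clock, current_clock):
    """Subtract an earlier clock value from a later one."""
    return current_clock - saved_clock


print(display_date("20130614"))
print(display_datetime("20130614.060300", european=True))
started = time.time() - 90          # pretend a process started 90 s ago
print(round(elapsed_seconds(started, time.time())))
```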
Private
Leave unchecked if you wish this variable to be available for logging and graphing.
Description
A longer description of the source or purpose of this variable.
Column
Enter the field name or the column number(s) in the output that you want to assign to this variable. The format depends on the agent class:
Text or LogFile
Enter the column number(s)— see Assigning Text and Log File Data to Variables.
DB
Enter the column name from the Functional Database dictionary entry.
SNMPPolled
Enter the object ID from the MIB.
API
Leave this field blank. The variable name and value will be passed explicitly by the external application.
ExitStatus
Leave this field blank.
On NULL
How should the variable be set if the agent doesn’t return a valid value? The options are:
zero
set the variable to zero
null
set the variable to null
ignore
leave the value of the variable unchanged from the previous poll
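The three On NULL options amount to a simple rule for what to store when a poll returns no valid value. A minimal Python sketch, in which None stands in for "no valid value" and the function name is hypothetical:

```python
def apply_on_null(new_value, previous_value, on_null):
    """Return the value to store after a poll.

    Assumption: None represents a missing or invalid value from the agent.
    """
    if new_value is not None:
        return new_value
    if on_null == "zero":
        return 0
    if on_null == "null":
        return None
    if on_null == "ignore":
        return previous_value
    raise ValueError(f"unknown On NULL option: {on_null}")


print(apply_on_null(None, 42, "ignore"))
print(apply_on_null(None, 42, "zero"))
```

Choosing ignore keeps a state condition from flapping when an agent occasionally fails to return data, at the cost of showing a stale value on the console.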
History
Should recent values be stored for use in state conditions and realtime graphing?
none
don’t keep historical values; the previous value will be overwritten each time the agent polls.
time
keep all values collected within a time period.
count
keep this number of recent values, one for each time the agent has run.
Keep the last
If History is set to time, enter a number of seconds to store all values collected within this period. If History is set to count, enter a number of values to be stored.
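The difference between the time and count settings can be sketched with a small Python class. This is only a model of the behaviour described above, with hypothetical names; it also shows how an average of recent values could smooth out a spike.

```python
from collections import deque


class History:
    """Keep recent values by count or by age in seconds."""

    def __init__(self, mode="none", keep_last=1):
        self.mode = mode            # "none", "time", or "count"
        self.keep_last = keep_last  # seconds for "time", values for "count"
        self.samples = deque()      # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        if self.mode == "none":
            self.samples.clear()    # previous value is overwritten
        self.samples.append((timestamp, value))
        if self.mode == "count":
            while len(self.samples) > self.keep_last:
                self.samples.popleft()
        elif self.mode == "time":
            while timestamp - self.samples[0][0] > self.keep_last:
                self.samples.popleft()

    def values(self):
        return [value for _, value in self.samples]

    def average(self):
        vals = self.values()
        return sum(vals) / len(vals)


h = History(mode="count", keep_last=3)
for t, v in enumerate([10, 90, 20, 30]):
    h.add(t * 60, v)
print(h.values(), h.average())
```

With count the amount of history is fixed regardless of the poll time, whereas with time the number of stored values depends on how often the agent polls.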
Expression
An expression (using TCL EXPR syntax) that calculates, modifies, or reformats the current value of the variable (see Expressions). How the expression is used depends on the variable class:
derived
Reformulate the value of another variable as a rate, proportion, or percentage. Any variables attached to the same sentry (including history variables) may be used in the expression.
trigger
Trigger variables are used to save a previous value that would otherwise be overwritten when the agent receives new data. If this is a trigger variable, use the Expression field to copy the value of another variable you want to save. Any variables attached to the same sentry (including history variables) may be used in the expression.
raw
An optional expression to post-process the data received from the agent (contained in $data), for example to change units from KB to MB. Note that for raw variables the only variable that can be used in the expression is $data, which is the value returned by the agent.

Note
To return a floating point number, put .0 at the end of any constant values. This is because TCL will do an integer calculation if both parameters are integers. Example: if $data is an integer, $data / 1024.0 will return a floating-point value; $data / 1024 will return an integer.
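The effect described in this note can be imitated in Python. Because Python's / operator always returns a floating-point result, the sketch uses // to stand in for the integer division that TCL performs when both operands are integers; the variable name data is just a stand-in for $data.

```python
data = 3000  # a raw value in KB, held as an integer

# Both operands are integers: the fractional part is lost.
# (Python's // is used here to imitate TCL's integer division.)
integer_result = data // 1024
print(integer_result)

# One operand is a floating-point constant: the fraction is kept.
float_result = data / 1024.0
print(float_result)
```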


Initial value
An expression (using TCL EXPR syntax) that, when evaluated, returns an initial value for the variable. This can be used to set a starting value before the first time the agent polls, or to initialize a trigger variable before it is set with a real value. This is important if, for example, this variable is used elsewhere in an arithmetic expression, to avoid the calculation generating a data error.

The last two fields affect how the variable will appear on the console.

Units
Choose from the table of descriptive units. To add a new type of unit, see Maintain List of Numeric Units.
Decimal places
The value will be rounded to this number of decimal places.

Click Accept to save the variable.

Adding a Sentry

  1. If the console is not in Host View, select Go > Hosts.
  2. Select the host that the agent will run on.
  3. Select Configure > Host monitor. The ‘All Sentries’ window opens, showing details of all the sentries defined on this host.
  4. Select Maintain > Add. The ‘Add Sentry’ form opens.
Image:Sentinelfigure25.jpg
Figure 25 — Sentry details form

Knowledge Base

Choose the name of the KB that this sentry belongs to. Click to choose one of the predefined KBs installed on this host, otherwise leave this field blank if the sentry is not associated with a particular KB.

Class/Folder
Choose the name of the folder that the sentry will appear within in the console.
Sentry
Enter a name for the sentry. The name must not contain spaces.
Host
The host to which this sentry applies. If you leave the field blank it defaults to being the Host Monitor host. If the agent is running remotely from the Host Monitor, you may want the icon for a resource to appear under a different host. If so, you can enter the name of the remote host here.
On/Off
Set the initial condition of the sentry. on means the sentry will be operating normally. This means the agent will be running and setting variables to be tested by the sentry. off means the agent is not required to collect data on behalf of this sentry, in which case the agent will not be running (unless it also happens to be collecting data for another sentry). You can switch the sentry off if you wish to test it before running it on a production system (see Running Sentries in Test Mode).
Description
Enter a description for the sentry.
Primary agent
Click to choose the main agent whose variables supply this sentry with data. If other agents also supply variables, list them in the Secondary agents field on the Advanced options form below. Variables from the primary agent can be referenced by their name alone. Note that if a primary agent and a secondary agent both have a variable with the same name, the primary agent’s variable is used.
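The name-clash rule for primary and secondary agents behaves like a layered lookup. The Python sketch below models it with collections.ChainMap; it assumes that secondary-agent variables can also be looked up by bare name when there is no clash, which this chapter does not state explicitly, and the variable names and values are invented.

```python
from collections import ChainMap

# Hypothetical variable sets from two agents.
primary_vars = {"size": 2048, "used": 1500}
secondary_vars = {"used": 9999, "inodes": 120000}

# Lookups try the primary agent first, then the secondary agent.
sentry_vars = ChainMap(primary_vars, secondary_vars)

print(sentry_vars["used"])     # both agents define this name
print(sentry_vars["inodes"])   # only the secondary agent defines this name
```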

Instance details

If the primary agent supports multiple instances, click the Instance details button to specify the instance details. There are three ways to define sentry instances:

Clone
Tick this checkbox to clone (create another instance of) a new sentry for each instance returned by the multi-instance agent.
Clone if
If Clone is ticked, you can specify an optional TCL expression. New sentries will only be cloned if this expression evaluates to true. Example: You can create two almost identical sentries, one for small filesystems ( < 1GB) and one for large filesystems ( >= 1GB) with different thresholds. Both would use the same agent, but each would have Clone if set to "$size < 1000" and "$size >= 1000" (where $size is in MB) respectively. Sentinel3G will then clone the appropriate sentry only. The two sentries can even have the same name so that they look indistinguishable on the console. If this field is left blank, new sentries will always be cloned.
Discover insts
This is an optional command that is run when the Host Monitor starts up. For example, the command could return a list of object names. A sentry instance will be generated for each line of data returned by the command.
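The Clone if example with small and large filesystems can be modelled in Python as two sentry definitions with complementary conditions. This is a sketch only: in Sentinel3G the conditions are TCL expressions such as "$size < 1000", whereas here they are written as Python functions, and the dictionary layout is hypothetical.

```python
# Two hypothetical sentry definitions that share one agent.
SENTRIES = [
    {"name": "Free_Space", "thresholds": "small",
     "clone_if": lambda v: v["size"] < 1000},
    {"name": "Free_Space", "thresholds": "large",
     "clone_if": lambda v: v["size"] >= 1000},
]


def sentries_to_clone(instance_vars):
    """Return the sentry definitions whose Clone if condition is true."""
    return [s for s in SENTRIES if s["clone_if"](instance_vars)]


# Sizes are in MB, as in the example above.
for name, size in [("/boot", 200), ("/data", 50000)]:
    chosen = sentries_to_clone({"size": size})
    print(name, [s["thresholds"] for s in chosen])
```

Because the two conditions are mutually exclusive and together cover every size, each filesystem gets exactly one sentry.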

Instances

The Instances window enables you to define sentries explicitly. If the Clone field is ticked, you can predefine some of the attributes of the cloned instances (for example, to specify a different label or to turn the instance off).

Instances
Click to maintain the list of instance names. Select Maintain > Add to add up to four instances at a time.
Instance
The name of a specific instance, which the sentry passes to the agent through the $Instances variable in the agent command.
On/Off
Set the initial condition of the sentry. on means the sentry will be operating normally. off means the sentry will not process agent data unless it is switched on manually.
Label
If set, this will be used on the console as the name of the instance. If it is not set, the instance name will be used on the console.
Group
If the sentry has Instance Groups defined, you can optionally pre-assign the instance to a particular group here.
Agent data
Agent-specific data for this instance (used by certain agents only). Example: the ProcessInfo agent can be passed a regular expression in this field to match process names.

The Instances of Sentry … window includes options to turn off instances or assign them to instance groups.

The Turn on and Turn off methods on the Instance menu simply turn the selected instance on or off. These options work on either cloning or explicit instance sentries. For example, to stop the cloning sentry Free_Space from monitoring a particular filesystem, add the instance for that filesystem and turn it off.

Assign to group lets you add selected instances to a previously defined instance group.

Maintain > Instance groups brings up the instance groups defined for this sentry.

When you have finished filling in the Instances form, click Accept to save the instances. When you have finished adding instances, press F3 in the Instances form to return to the Sentry Instance Details form.


Note
Changes to instances will not be processed until you exit the Instances window and restart the Host Monitor.


Instance label
If set, this will be used on the console as the name of the instance. You can use a raw variable to give a more meaningful label (for example: use $printer to label each instance with the printer name). If this field is blank the instance will not have a label.
Separate Logs?
Tick this checkbox to create a separate log file for each instance. Leave the checkbox blank to write entries for all instances to a combined log file.

When you have finished filling in the Sentry Instance Details form, click Return to complete the remaining fields in the main Add Sentry form.

Console fields

Define how the sentry should appear on the console:

Text
Enter a text string to provide extra information about the current state of the sentry. The string will be displayed in the status area of the console, and may contain both informative messages and the values of variables returned by the primary agent or a secondary agent. If a variable name is prefixed with $, the raw or unformatted value will be displayed. If a variable name is prefixed with & (ampersand), the formatted value will be displayed. The formatted value appends the Units field, if specified, and converts some variable types such as dates from internal storage format to display format.
Example: HTTP hits: &HttpHits; HTTP errors: &HttpErrors
Icon
Click to choose an icon to represent this sentry on the console. To see what each icon looks like or to add new icons, see Add Icons.
Indicator
Select a type of overlay icon to represent the current state of the sentry or to give a rough indication (to within 10 percent) of the current data value returned by Variable.
default
Represent the current state of the sentry with the default overlay icon for that state or severity.
pie chart
Represent the current percentage value returned by Variable as a small pie chart.
thermometer
Represent the current percentage value returned by Variable as a thermometer.
Variable
Click to choose a variable to be represented next to the icon for this sentry. You specify in the Indicator field whether to show the value of the variable as a pie chart or a thermometer icon.

Note
For the indicator to work properly, the variable you choose must always be in the range 0 to 100.


Default action
Click to specify the action that is to be performed by default when an operator double-clicks this sentry on the console.
Variables
Display the contents of the variables returned by the agent.
Graph
Draw a realtime graph. Click to choose a predefined graph.
Action
Run a predefined action command. Click to choose a predefined action.
Logged_data
Generate a logged data report. Click to choose a predefined report.

Click Return to complete the remaining fields in the main Add Sentry form.

Notes file
Enter the file name only (without the path) of a notes file in the Sentinel3G doc directory. These notes will be available from the console to operators when monitoring or responding to alerts relating to this sentry.

Advanced options

Click the Advanced button to display some additional options relating to notification and data logging.

Image:Sentinelfigure26.jpg
Figure 26 — Advanced sentry options form
Notification type
Select none to turn off notification for this sentry. Select default to use the global NotifyList and NotifySeverity settings (see Maintain Notification Settings). Select specify to use the Whom to notify and On severity fields to override the global settings.

Whom to notify
Choose the name of one or more users to be notified when this sentry changes state. This overrides the global NotifyList setting.
On severity
Select a threshold level at which the user(s) listed in the Whom to notify field should be notified. Notification will happen when the sentry changes into a state with this severity or higher. This overrides the global NotifySeverity setting.
Show variables
Click to choose the variables collected by the agent that are used by this sentry. You can do this to shorten the list of variables that will be displayed when an operator double-clicks on the sentry, and the list of variables available for graphing and reporting, or for running actions. This is useful for an agent that returns a large number of variables, such as sar performance statistics which may include CPU, memory, and network, all from the one agent. When an operator double-clicks on a CPU sentry, you can arrange for them to see only the variables relating to CPU statistics. Leave this field blank to show all variables associated with the primary agent.
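
The way Notification type, Whom to notify and On severity combine can be sketched in a few lines of Python. This is only an illustration of the rules described above, not Sentinel3G code: the function names, the sample users and the severity ordering (lowest to highest) are assumptions.

```python
# Assumed ordering, lowest to highest; the real list is defined by Sentinel3G.
SEVERITIES = ["normal", "warning", "alarm", "severe", "critical"]

def notification_settings(notify_type, global_list, global_severity,
                          whom=None, on_severity=None):
    """Return (users, threshold) for a sentry, or None if notification is off."""
    if notify_type == "none":
        return None
    if notify_type == "default":
        # Fall back to the global NotifyList and NotifySeverity settings
        return list(global_list), global_severity
    # "specify": the sentry-level fields override the global settings
    return list(whom or []), on_severity

def should_notify(settings, new_severity):
    """True if a change into new_severity reaches the notification threshold."""
    if settings is None:
        return False
    users, threshold = settings
    return bool(users) and SEVERITIES.index(new_severity) >= SEVERITIES.index(threshold)
```

With a threshold of alarm, a change into severe would notify the listed users, while a change into warning would not.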

Data Logging

The fields in the Data Logging frame are used to specify what data will be collected for the Logged Data Report. This report can be used to check the events leading up to an alert, and for longer-term trend analysis and capacity planning.

If you wish to be able to generate a Logged Data Report for this sentry, you need to specify now which numeric variables must be logged and how often. There are two logging methods: time-based (variables are logged at the specified time interval) and state-based (variables are logged after every poll while the sentry is in a specified state or higher).


Note
The period for time-based logging is approximate only. The value is actually taken from the next poll after the interval. It follows that there is no benefit to logging data more often than the polling frequency.


Avoid specifying too short a period, otherwise the log files can grow large very quickly. It’s better to log data only occasionally during periods of normal operation, then increase the logging frequency during alerts using the Or on severity field.

Enable logging
Select this option to enable collection of logging data for this sentry.
Log variables
Click to choose which numeric variables are to be logged. A snapshot of the current values of all these variables will be added to the log file at the frequency specified in the Every field.
Default settings?
Select this option to use the global DefLogTime and DefLogSeverity settings. Leave this option unselected to specify a period and minimum severity level manually.

Every
Enter a period in minutes. Examples: enter 10 to log data every 10 minutes; enter 60 to log data every hour. Enter 0 to log data every time the agent polls. This field overrides the global DefLogTime setting.
Or on severity
You can also log data while the sentry is in a particular state or any state of a higher severity. The variables are logged every time the agent polls. This is a way to selectively log more data during alerts. For example, select severe to log every poll while the sentry is in severe or critical state. To switch off state-based logging, select never. This field overrides the global DefLogSeverity setting.
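
The two logging methods can be combined into a single decision made at each poll. The following Python sketch illustrates that decision under stated assumptions: the function name, the severity ordering and the idea of tracking "minutes since the last log entry" are illustrative, not taken from the product.

```python
# Assumed ordering, lowest to highest; the real list is defined by Sentinel3G.
SEVERITIES = ["normal", "warning", "alarm", "severe", "critical"]

def should_log(minutes_since_last_log, every, severity, or_on_severity="never"):
    """Decide at poll time whether a snapshot of the log variables is written."""
    # State-based logging: log every poll while at or above the chosen severity
    if or_on_severity != "never":
        if SEVERITIES.index(severity) >= SEVERITIES.index(or_on_severity):
            return True
    # Time-based logging: log at the first poll after the interval has passed;
    # an interval of 0 therefore logs on every poll
    return minutes_since_last_log >= every
```

For example, with Every set to 10 and Or on severity set to severe, a poll taken one minute after the last log entry is skipped in alarm state but logged in critical state.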

Secondary agents

Secondary agents
If the sentry needs to use variables collected by an agent other than the Primary agent, click to specify these secondary agents.

From the Secondary Agents window, select Maintain > Add then enter the following fields:

Agent
Click to choose the secondary agent.
Instance
Enter the name of the instance. Leave this field blank to use the same instance as the sentry.
Trigger sentry?
Select this option to force the sentry’s state to be reevaluated when the agent returns new data.

Click Accept to save the details of this secondary agent.

When you have finished specifying secondary agents, press F3 to return to the Advanced Sentry Details form.


Note
Changes to secondary agents will not be processed until you exit the Secondary Agents window.


No-data state
Click to choose a default state to be used if the agent doesn’t return data for the sentry. An instance of this sentry will change to this state if the agent stops returning data for that instance. Example: when a filesystem is unmounted and df no longer returns details about that filesystem, then the sentry for that instance only will be put into the No-data state. A Delete state is often used in cases like this, where multiple instances of a sentry are created by “cloning”, and you wish to selectively suppress any instance while it is not returning data.
Discovery pgm
This is an optional command that is run during Host Monitor startup. Its job is to return an exit status of true or false based on the existence or status of a resource. If the discovery program returns false, this sentry will not be started.
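
As a rough illustration of the Discovery pgm check, the Python sketch below runs a command and starts the sentry only if the command succeeds. It assumes the usual shell convention that an exit status of 0 means true; the function name and the stand-in commands are hypothetical.

```python
import subprocess
import sys

def sentry_should_start(discovery_cmd):
    """discovery_cmd is an argument list, or None if no discovery program is set."""
    if not discovery_cmd:
        # The discovery program is optional; without one the sentry starts
        return True
    # Assumption: exit status 0 is "true", anything else is "false"
    return subprocess.run(discovery_cmd).returncode == 0

# Stand-ins for a real discovery program that tests whether a resource exists
resource_present = [sys.executable, "-c", "raise SystemExit(0)"]
resource_absent = [sys.executable, "-c", "raise SystemExit(1)"]
```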

When you have finished setting the advanced sentry options, click Accept to return to the main Add Sentry form.

If you have finished defining the sentry, click Accept to save it and return to the ‘All Sentries’ window.

Adding States

The next step is to define the states that the sentry can be in.

  1. If the console is not in Host View, select Go > Hosts.
  2. Select the host that the agent will run on.
  3. Select Configure > Host monitor. The ‘All Sentries’ window opens, showing details of all the sentries defined on this host.
  4. Select the sentry, then select Maintain > States. The ‘States for sentry <sentry_name>’ window opens.
    States are listed in the order in which the entry conditions are evaluated. By default, this is in order of decreasing severity, with the most severe at the top. The first state whose entry condition evaluates to true will cause the sentry to enter that state. It follows that states with a NULL entry condition (usually the ‘normal’ state) should be last, for example:
    critical $pct_free == 0
    severe $pct_free < 5
    alarm $pct_free < 10
    warning $pct_free < 15
    normal <null>
    You can then drag the states into a different order using the Order > Reorder menu option.
  5. There are three ways in which you can add states:
    • If the sentry was created by cloning, a separate copy of the original sentry’s states is made. If these states are exactly as required, you don’t need to do anything more. If you need to modify a state in some way, select it now and use Maintain > Change to make the changes (see Maintain State Details).
    • If there is another sentry that already has a set of states that are similar or identical to what is required, you can copy that sentry’s states: select Maintain > Copy states, then choose the sentry. If the states are exactly as required, you don’t need to do anything more. If you need to modify a state in some way, select it now and use Maintain > Change to make the changes (see Maintain State Details).
    • You can add each state manually: select Maintain > Add. The ‘Add States’ form opens, as shown in Figure 27.
Image:Figure27.jpg
Figure 27 — State details form

Note
Copy states is only available if the sentry has no states defined. If the new sentry already has states and you wish to use the states of another sentry instead, remove the states from the new sentry first.
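
The evaluation order described in step 4 (the first state whose entry condition is true wins, with the NULL condition last) can be sketched in Python. The thresholds mirror the $pct_free example; the function name and the use of Python lambdas in place of TCL expressions are assumptions made for illustration.

```python
def evaluate_state(states, pct_free):
    """Return the first state whose entry condition is true."""
    for name, condition in states:
        # A missing (NULL) condition always evaluates to true
        if condition is None or condition(pct_free):
            return name
    return None

# States in evaluation order, most severe first
STATES = [
    ("critical", lambda v: v == 0),
    ("severe", lambda v: v < 5),
    ("alarm", lambda v: v < 10),
    ("warning", lambda v: v < 15),
    ("normal", None),
]
```

If normal were moved to the top of this list, every value would match it first and the sentry would never leave the normal state, which is why a state with a NULL entry condition belongs at the bottom.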


Maintain State Details

To define the sentry’s attributes and appearance while in this state, enter these fields:

State
Enter a unique name for the state. The name must not contain spaces.
Severity
Select the severity level for this state. The severity determines how the sentry will look while in this state (that is, its color and the color and type of any associated overlay icon). Note that this will result in a notification message being sent if the new severity is at or above the notification level specified globally or for this sentry.
The options are listed in order of increasing severity from normal to critical. disabled is a special severity that indicates the sentry is ‘down’ or otherwise unavailable, but doesn’t require attention (for example, a device that has been taken offline for maintenance).

Description
Enter a description for the state.
Entry condition
Enter a conditional expression. This is a TCL expression made up of any combination of agent variables, constants, text strings, numbers, history variables, boolean values, and TCL functions.
Examples:
$Status == "Off" && $PID != "-1"
$pct_free < $LOW
[hist_avg @cpu_idle]

If the entry condition is left blank it evaluates to true. For the correct syntax to refer to variables, see Expressions.

Console

These fields define how the sentry will appear on the console while in this state:

Text
Enter a text string to provide extra information about the current state of the sentry. The string will be displayed in the status area of the console, and may contain both informative messages and the values of variables. Example: HTTP hits: &HttpHits; HTTP errors: &HttpErrors…

where HttpHits and HttpErrors are the names of variables returned by the primary agent or a secondary agent.

If you prefix a variable name with & it will be formatted for output on the console.

For example, numeric variables will be displayed with the correct number of decimal places and with the units after the value, while date/time values will be formatted as readable dates or times. If you prefix a variable name with $ the raw value will be displayed on the console.
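
The difference between the & and $ prefixes can be illustrated with a small Python sketch. It is not how Sentinel3G implements the Text field: the variable table, its "decimals" and "units" attributes, and the function name are assumptions, and date conversion is left out.

```python
import re

# Hypothetical variables: raw value plus the attributes used for display
VARIABLES = {
    "HttpHits": {"raw": 1523, "units": "", "decimals": 0},
    "pct_free": {"raw": 12.3456, "units": "%", "decimals": 1},
}

def expand(text, variables):
    """Replace $name with the raw value and &name with the formatted value."""
    def substitute(match):
        prefix, name = match.group(1), match.group(2)
        var = variables.get(name)
        if var is None:
            return match.group(0)  # leave unknown names untouched
        if prefix == "$":
            return str(var["raw"])
        # Formatted: fixed number of decimal places, then the units
        return f"{var['raw']:.{var['decimals']}f}{var['units']}"
    return re.sub(r"([$&])([A-Za-z_][A-Za-z0-9_]*)", substitute, text)
```

Here `expand("Free: &pct_free", VARIABLES)` gives `Free: 12.3%`, while `expand("Free: $pct_free", VARIABLES)` gives `Free: 12.3456`.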

You can change the appearance of the sentry’s icon while it is in this state, by specifying a different icon or by overlaying the normal icon with an additional indicator icon to modify its appearance.



Note
To add new icons, see Add Icons.


Icon
Click to choose a different icon to represent this sentry while it is in this state. Example: the sentry normally has a 32x32 icon representing a remote system. When it is in network_down state, you choose to use another icon that has a red X indicating a problem with the network connection.
If Icon is left blank the default sentry icon will be used.
Indicator
Click to choose a 16x16 indicator icon. This is an additional small icon that overlays the main icon in the top right hand corner.
You can use this to modify the appearance of the icon while the sentry is in this state.
If Indicator is left blank the overlay specified in the sentry (thermometer or pie chart) will be used. If neither overlay is specified in the sentry, the default overlay icon and color for the current severity is used (see Overlays and Indicator Icons).
Notes file
Enter the file name only (without the path) of a notes file in the Sentinel3G doc directory. These notes will be available from the console to operators when monitoring or responding to alerts relating to this sentry.

Trigger variables

Click next to the Trigger vars field to list the trigger variables for this sentry. Choose one or more variables whose values you want to reset when the sentry enters this state (see Trigger Variables).

Responses

The Responses button lets you define several responses that can be run automatically while the sentry is in this state.

Responses
Click to see the options for notification, escalation and running automatic responses.

Image:Sentinelfigure28.jpg
Figure 28 — Defining the responses for a state

The Response form contains three blocks of response fields which are run in turn if the sentry remains in this state for longer than the specified period. Each response block specifies a waiting period, then three actions Sentinel3G can take at the end of each period. The actions are: changing the severity level, which in turn changes the appearance of the sentry on the console; sending a notification message; and running a command. These options are not mutually exclusive; you can specify any or all actions in each response block.

The final group, the Escalation/acknowledgement block, specifies a period after which the sentry will be forced into a different state. See Escalation/acknowledgement.

A typical response is to run a command to fix the problem, and if it succeeds to return the sentry to a normal state. If the command doesn’t fix the problem you may choose to leave the sentry in that state, and specify a later response to run another command or to notify someone.

These responses are all optional. You can specify all three, or one, or even none. If no responses or escalation period are specified here, the sentry will remain in this state until and unless the evaluation condition evaluates to a different severity level.

If a sentry changes state while it is waiting to process a response, then all responses for that state are cancelled, and any responses for the new state are started.

Wait
The length of time to wait after running the previous response, or if this is the first response, after entering this state. Each period is cumulative. In other words the wait period for Response #2 is counted from the end of the period for Response #1.
By default the wait time is in seconds. However an optional suffix of either secs, mins, hrs or polls can also be used to change the unit of time. For example, if the primary agent's poll time is 120 seconds, "2 polls" will wait for 240 seconds before activating.
If the wait time of Response #1 is set to 0, the response will occur as soon as the sentry enters this state.
New severity
After the wait period, change the appearance of the sentry on the console to this new severity level. This would usually be done to trigger a global or sentry-level notification message or to increase the apparent urgency of the event by making the icon flash or change color. Note that changing the severity does not change the state of the sentry. Select unchanged if you wish to leave the severity level as it is.
Notification
Click to specify who will be notified if the sentry is still in this state at the end of the wait period.
In the Type field, select one of the following:
default
Use the default notification list for this sentry.
specify
In the Who field, choose the names of one or more users.
These are in addition to the default notification level for this sentry (see Advanced options).
If you don’t want to do any additional notification for this response, select none.
Command
Enter a command to be run by the Host Monitor. This would usually attempt to fix the problem. Example: a ‘free disk space’ sentry could archive files to an offline storage device or remove files such as core and *.o that are deemed expendable.
Fire agent
Choose an agent to be run. This agent will be polled immediately at the end of the wait period. To poll a specific instance only, enter the instance name in brackets after the agent name. Example: Filesystem(/tmp). This is useful to cause the sentry's states to be re-evaluated quickly after running a response, rather than having to wait for the next poll.
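
Because each Wait period is counted from the end of the previous one, the time at which each response runs is a running total. The Python sketch below works through that arithmetic, including the polls suffix. The function names are hypothetical, and it assumes the number and suffix are separated by a space, as in "2 polls".

```python
from itertools import accumulate

UNIT_SECONDS = {"secs": 1, "mins": 60, "hrs": 3600}

def wait_seconds(wait, poll_seconds):
    """Convert a Wait value such as '30', '5 mins' or '2 polls' to seconds."""
    parts = str(wait).split()
    value = float(parts[0])
    unit = parts[1] if len(parts) > 1 else "secs"  # seconds is the default
    if unit == "polls":
        return value * poll_seconds
    return value * UNIT_SECONDS[unit]

def response_times(waits, poll_seconds):
    """Seconds after entering the state at which each response runs."""
    return list(accumulate(wait_seconds(w, poll_seconds) for w in waits))
```

With a poll time of 120 seconds, waits of 0, "2 polls" and "5 mins" run the three responses at 0, 240 and 540 seconds after the sentry enters the state.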

Escalation/acknowledgement

Another way to respond to an alert is simply to wait for a while to see if the problem corrects itself or more information is received, then to change to another state at the end of that period.

Image:Sentinelfigure29.jpg
Figure 29 — Defining the escalation condition for a state

Change to a more severe state if the problem would normally be expected to resolve itself either spontaneously or by the running of the automatic responses. If the sentry is still in this state at the end of the waiting period it suggests some other action must be taken.

Change to a less severe state (typically, normal state) if the problem appears to have been a one-off event. For example, the Bad_SU sentry goes into warning state if a failed su attempt is detected. If no further failed su attempts are detected by the end of the waiting period no action need be taken and the sentry can be returned to normal state.

The change of state may depend on manual confirmation from an operator (Acknowledgement) or it may happen automatically (Escalation).

Wait
The length of time to wait after running the previous response, or if there are no previous responses, after entering this state. The format and meaning of this field is the same as for Responses, above.
Go to state
Choose the new state to change to.
Type
Select acknowledge if the change of state depends on manual confirmation from an operator. Select escalation if the change of state should happen automatically at the end of the waiting period. Typically "acknowledgement" is a change to a less severe state, and "escalation" is to a more severe state.

When you have finished defining responses and escalation details, click Accept to return to the main Add State form.

If you have finished defining this state, click Accept to save it and return to the ‘State Details’ window.

Constants

Click next to the Constants field to maintain the list of constants and thresholds for this sentry. You can add a new constant, change the details of an existing constant, or adjust the threshold values at which the sentry changes from one state to another. For details about all these tasks, see Maintaining Constants and Thresholds.

Adding Instance Groups

A multi-instance sentry can optionally have a set of Instance Groups, a feature which allows certain instances to use different configurations from that defined in the sentry. The configurations that can be set at the Instance Group level are:

An instance is assigned to an Instance Group one of two ways:

Once an instance is assigned to an Instance Group, the configuration for that group is used, overriding the configuration defined in the sentry.

Notes:

  1. The assignment to an instance group happens once only when the Host Monitor starts.
  2. If an instance is not assigned to a group, the default configuration from the sentry is used.

To maintain the Instance Groups of a sentry:

  1. From the ‘Sentries’ window, select the sentry.
  2. Select Maintain > Instance groups. The ‘Instance Groups of sentry <sentry_name>’ window opens.
  3. Select Maintain > Add. The ‘Add instance group (Sentry <sentry_name>)’ form opens.
Group
Enter the name of the Instance Group.
Description
Enter the description of the group.
Icon
To use a different icon on the Sentinel3G console for instances of this group, enter it here.
Label
To have a different label on the Sentinel3G console for instances of this group, enter it here. The label may contain the variables $Instance, $Group or any "raw" agent variable.
Assign if
If instances are to be dynamically assigned to the group, enter a TCL boolean expression. The expression is evaluated for each instance of the sentry, and when true, the instance is assigned to this group. The expression may contain the variable $Instance and/or any "raw" agent variable.
Constants
Click this field to override some or all of the constants (thresholds).
Notification
Click this field to override the users to be notified about instances in this group.

Click Accept to save the group.
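
Dynamic assignment through Assign if, and the way group settings override the sentry's own, can be sketched in Python as follows. This is a simplified model: the names are hypothetical, Python functions stand in for TCL expressions, and it assumes that an instance joins the first group whose expression is true, which this page does not specify.

```python
def assign_group(instance, raw_vars, groups):
    """Return the first group whose 'Assign if' test is true, else None."""
    for name, assign_if in groups:
        if assign_if(instance, raw_vars):
            return name
    return None  # unassigned: the sentry's default configuration applies

def effective_constants(sentry_constants, group_constants):
    """Sentry constants, with any group-level values taking precedence."""
    merged = dict(sentry_constants)
    merged.update(group_constants or {})
    return merged

# Hypothetical group for temporary filesystems with a lower threshold
GROUPS = [("temp_fs", lambda inst, v: inst.startswith("/tmp"))]
GROUP_CONSTANTS = {"temp_fs": {"LOW": 5}}
SENTRY_CONSTANTS = {"LOW": 15, "VERY_LOW": 2}
```

In this model the /tmp instance would be evaluated against LOW = 5, while /home, which matches no group, keeps the sentry's value of 15.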

Adding an Action or Report

A sentry can have several associated actions, which an operator can choose to run from the console. Actions may either be tied to particular states, or can be made available when the sentry is in any state. There are two types: actions typically are used to try to fix a problem; reports display output on the screen and help the operators to diagnose the problem. You can assist operators by explaining in the monitoring notes for the sentry or state when and how each action should be used.

When designing an action for a multi-instance sentry you can set it up to run for selected instances or for every instance in a parent folder. For example, you can set up an action so that the output for all instances is combined into one report.

  1. From the ‘Sentries’ window, select the sentry.
  2. Select Maintain > Actions. The ‘Actions for sentry <sentry_name>’ window opens.
  3. Select Maintain > Add. The ‘Add actions for sentry <sentry_name>’ form opens. (Tip: If a similar action has already been defined for a sentry that uses the same agent, it may be faster to use Maintain > Copy.)
Image:Sentinelfigure30.jpg
Figure 30 — Action details form
Action
Enter a name for the action. This is the name that will appear in the list of actions that the operator can select from.
Type
Select whether or not you want the output from the command to be displayed on the operator’s screen:
action
Simply runs the command without displaying any output. Example: starting a service when it is stopped.
report
Displays the command’s output on the screen.
Command
Enter the command, using UNIX shell syntax. The command can make use of the variables $Sentry, $Host, and $Action, which will be set in the environment when the action is run. For multi-instance sentries you can also refer to $Instance, which contains the instance name. To use agent variables in the command, select Uses agent data? below.

This example shows how to define a simple report for a single-instance sentry:

echo -n "Report '$Action' "; date; echo " Sentry: $Sentry"; echo " Host: $Host";

When the report is run it will display the name of this action, the date, and the name of the sentry and the host it runs on. For more details and examples of actions and reports, see Actions.
Display command
If Type is report, enter a command to display the output from the Command (examples: scroll, db_scroll, db_graph). The default is Sentinel3G’s own browser widget. If the action is run on several instances, all the output from all the commands will be piped to the same display command.

Display command is optional if Command handles the displaying of the data itself.
Run as user
Choose a user name from the password file on this host. The command will run with the privileges of this user. The default account is root. Example: some RDBMS packages require that certain administrative commands be run from a special DB admin account.

In state(s)
You can make this action available in only certain states. Example: the Services sentry has Stop and Restart actions that are available when a service is in a state that indicates it is running, and a Start action when it is not running.
Click to choose one or more states. Leave this field blank to make the action available at all times.
Access role
If set, only users with the specified role can perform this action. If blank, all Sentinel3G users who have the action capability may perform this action.
Authenticate?
Tick this field to ask for the operator’s password before running the action.
Uses agent data?
Tick this checkbox if you wish to use any of the agent variables in the Command or Display command fields. This gives the commands access to the same primary agent variables as the sentry.
Reads from STDIN?
If Uses agent data? is ticked, use this field to specify where the action can find the data:
no
Command will expect the variables to be set in the environment and accessed by name (e.g. $pct_free).
yes
The data will be passed from STDIN in Functional Database format. Use this option if you wish to manipulate the data using the Functional Toolset.
Fire agent
Tick this checkbox if you wish the sentry’s agent to be polled after the action has been run.
Export to parent?
Tick this checkbox if you wish the action to be available from the parent folder of this sentry. If the action is exported, operators will be able to choose this action both in relation to a selected sentry and for all sentries in the parent folder.
Example: a Free Space report that displays details for a selected filesystem (single sentry) or all user filesystems on a host (parent folder).
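
Because the Command field receives Sentry, Host, Action and (for multi-instance sentries) Instance through the environment, a report can also be written as a script that reads them. The Python sketch below is one hypothetical way to do this; the function name is an assumption, and the script would still need to be invoked from the Command field.

```python
import os

def build_report(env):
    """Build a short report from the variables the action places in the environment."""
    lines = [
        f"Report '{env.get('Action', '')}'",
        f" Sentry: {env.get('Sentry', '')}",
        f" Host: {env.get('Host', '')}",
    ]
    # Instance is only expected for multi-instance sentries
    if "Instance" in env:
        lines.append(f" Instance: {env['Instance']}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_report(os.environ))
```

If Uses agent data? is ticked and Reads from STDIN? is set to no, agent variables such as pct_free could be read from the same environment mapping.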

Click Accept to save the action.

To test the action from the console, select the sentry and then select Sentry > Action.

Adding a Realtime Graph

Realtime graphs plot recent values returned by selected variables for a sentry.

Image:Sentinelfigure31.jpg
Figure 31 — Example: displaying free disk space as a stack graph
  1. From the ‘All Sentries’ window, select the sentry.
  2. Select Maintain > Realtime graphs. The ‘Realtime graphs for sentry <sentry_name>’ window opens.
  3. Select Maintain > Add. The ‘Add realtime graphs for sentry <sentry_name>’ form opens.

Tip
If a similar graph has already been defined for a sentry that uses the same agent, it may be faster to use Maintain > Copy.

Specify attributes of the graph

Graph type
Select the type of graph you wish to use to display the data:
line
line graphs are useful for gauging trends.
bar
bar graphs are useful for comparing variables within one observation or comparing adjacent observations.
stack
stack graphs are typically used where all values add up to 100%. Example: CPU usage = %user + %system + %idle

Figure 32 shows the same data presented using each type of graph:

Image:Sentinelfigure32.jpg
Figure 32 — Sample disk space data shown as line, bar, and stack graphs
Polls displayed
The number of values to display across the X-axis of the graph. For example, if you enter 3, values from the last three polls will be displayed.
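
The effect of Polls displayed is that of a fixed-length window over the most recent polls. A minimal Python sketch of that idea, using made-up values and a hypothetical function name:

```python
from collections import deque

def make_window(polls_displayed):
    """A buffer that keeps only the most recent polls_displayed values."""
    return deque(maxlen=polls_displayed)

window = make_window(3)
for value in [70, 72, 75, 71]:
    # Once the window is full, each new poll pushes out the oldest one
    window.append(value)
```

After the fourth poll the window holds 72, 75 and 71, so the oldest value (70) is no longer plotted.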

The Min value and Max value fields control the scale on the graph’s Y-axis. If Min value and Max value are not specified, the Y-axis will be sized to the current minimum and maximum data value. This means the scale may change as new values are graphed. To keep the scale constant, set both Min value and Max value.

Use close minimum and maximum values if you want to focus on relatively small differences among data values. For example, if a set of variables is mainly of interest when the values are clustered near 100%, a minimum value of 90 will help to separate them.

Min value
The minimum value to display next to the Y-axis. If the values will always be positive, set Min value=0.
Max value
The maximum value to display next to the Y-axis.
Scale to max?
(For stack graphs only) Tick this checkbox if you want the values to be scaled so that their sum equals Max value. This is useful where the total adds up approximately to Max value. Scaling ensures a flat top to the stack.
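
The arithmetic behind Scale to max? can be written as a proportional rescaling. The Python sketch below is an assumption about how such scaling would work, not a description of the product's internals; the function name is hypothetical.

```python
def scale_to_max(values, max_value):
    """Rescale values proportionally so that their sum equals max_value."""
    total = sum(values)
    if total == 0:
        return list(values)  # nothing to scale
    return [v * max_value / total for v in values]

# CPU percentages that add up to 99 because of rounding in the source data
scaled = scale_to_max([30, 50, 19], 100)
```

The three scaled values keep their proportions but now sum to 100, which is what gives the stack its flat top.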

Specify variables

You can now choose the names of up to five agent variables whose values are to be graphed.

Click next to the Variable details field to specify the attributes of the chosen variables. For example, Figure 33 shows the details for two variables called MBfree and MBused.

Image:Sentingfigure33.jpg
Figure 33 — Example of a realtime graph that plots two variables

For each variable, enter the following details:

Color
Select the color to be used to display this variable.
Label
If you wish you can change the default label displayed for this variable. For example, if you wish to scale down a value for free disk space by a factor of 1000, you could also change the label to read GB (gigabytes) instead of MB (megabytes).
Scale by
This is an optional scaling factor. The values displayed will be multiplied (scaled up) by this factor. Use this to convert very large or small numbers to more manageable units. Example: specify 0.001 to divide the reported values by 1000.

Click Return to save the variable details.

Specify threshold markers

You can now specify up to four markers to be superimposed over the data values.

Each marker is displayed as a colored horizontal line and represents a state threshold or other significant value. A marker can be set either to a constant associated with this sentry or to an arbitrary integer or floating-point number (such as 20, 40, 60, 80). Click next to the Threshold markers field to specify up to four markers. For each marker, specify these details:

At value
Enter a floating point number, or click to choose one of the constant values defined for this sentry. See Maintaining Constants and Thresholds.
Color
Select the color to be used to display this threshold. Remember to use a different color from those used to graph the variables. Use the Test graph option to see which colors show up best.
If the threshold is equivalent to a boundary between states, it may be helpful to use the color of the severity level for the higher state. For example, if the constant LOW is the boundary between normal and warning state, and the sentry goes orange when it is in warning state, use orange as the color of the threshold marker.
Image:Sentinelfigure34.jpg
Figure 34 — Generating a Test graph to check colors and thresholds

When you have finished specifying markers, click Return to return to the ‘Add realtime graphs for sentry <sentry_name>’ form.

Test the graph

Now you can test the appearance of the graph. Click next to the Test graph field to generate a graph based on the settings in the form and the most recent data returned by the agent on this host.


Note
The host monitor must be running and the agent must be returning valid data.


You can display several graphs at once by experimenting with different settings and clicking Test graph again. If this is a multi-instance sentry you can test different instances.

When you are finished with each graph press F3 to dismiss it.

Save the graph details

Graph name
Enter a unique name to identify this graph.
Description
Enter a description that explains what this graph will show or when it should be used. This will help operators to select the correct graph to diagnose problems.
Title
The title that appears in the heading of the graph. It can contain plain text, a variable such as $Instance (for a multi-instance sentry, the name of this instance), or a combination of the two.
Export to parent?
Tick this checkbox if you wish the graph to be available from the parent folder of this sentry. If the graph is exported, operators will be able to choose this graph when they select the parent folder.

Click Accept to save the graph.

To test the graph from the console, restart the host monitor, select the sentry and then select Report > Realtime graph.

Maintaining Constants and Thresholds

Constants are like variables, but they are associated with a sentry rather than an agent. You can use constants in a state’s Entry condition field to define thresholds between states, and as a visual aid on realtime graphs.

Example of use: You create a sentry and its states. Some of the states have an entry condition that compares the current data value from the agent with a constant such as VERY_LOW. You clone the sentry. The same set of states is shared between the old and new sentry, but you set the constant VERY_LOW to a different value in each sentry.

To display the constants for a sentry

  1. From the ‘All Sentries’ window, select the sentry.
  2. Select Maintain > Constants. The ‘Constants for sentry <sentry_name>’ window opens.

To add a constant for a sentry

  1. Select Maintain > Add. The ‘Add Constants’ form opens.
  2. Enter the following fields:
    Constant
    Enter a name for the constant. The convention is to use uppercase letters and underscores only (e.g., HALF_FULL). The name must be different from other constants belonging to this sentry, though another sentry can have a constant with the same name.
    Value
    Set the value of the constant (examples: 3; 0.5; true).
    Comment
    Enter an optional comment.
    Group override?
    Can an instance group override the value of this constant? If this option is set to yes, the value set in an instance group can override the value set here. If this option is set to no, the value set here always applies to any instance that uses the constant.
  3. Click Accept to save the constant.

To adjust the values of a sentry’s constants

You can adjust the values of all the constants belonging to a sentry. Use this to fine-tune the thresholds at which a sentry changes from one state to another.

  1. From the ‘Constants for sentry <sentry_name>’ window, select Maintain > Change values.
  2. Change the values next to any of the constants.
  3. Click Accept to save the new values.

Note: If the sentry has Instance Groups defined, you may also need to change the values of constants defined there.

Running Agents or Sentries in Test Mode

When you are developing sentries, you leave them “switched off” until you are ready to move them to production mode. You can use the command "hostmon -A" to test agents, and "hostmon -T" to test sentries (and their agents), even if they are off or are in KBs that are off.

Running sentries in test mode attempts to start all the agents required by the selected sentries and displays status messages, including any error messages. You can correct any configuration problems and retest the sentries. When you are satisfied that the sentries will work correctly, you can change their condition to on.

To test sentries, start a Sentinel3G shell, then run hostmon -T <sentries…>.

Example:

cos sentinel -c bash
hostmon -T Clients Swap_Size
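If you want to review the status and error messages after a test run, one option is to save a copy of the output. The following is a minimal sketch, assuming that hostmon -T writes its messages to standard output or standard error. The variables HOSTMON and LOG and the function test_sentries are hypothetical names introduced for this example and are not part of Sentinel3G.

```shell
#!/bin/sh
# Sketch only: keeps a copy of whatever "hostmon -T" prints.
# HOSTMON, LOG and test_sentries are hypothetical names for this example.
# Assumption: hostmon -T writes its messages to stdout or stderr.
HOSTMON=${HOSTMON:-hostmon}
LOG=${LOG:-hostmon_test.log}

test_sentries() {
    # Merge stderr into stdout, show it on screen, and save a copy in $LOG.
    "$HOSTMON" -T "$@" 2>&1 | tee "$LOG"
}

# Usage, from inside a Sentinel3G shell:
#   test_sentries Clients Swap_Size
```

After correcting a configuration problem, run the function again; tee overwrites the log file on each run, so it always holds the output of the latest test.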

To test just an agent and verify that its variables are being set as expected, run hostmon -A <agent>.

Example:

hostmon -A db_agent