
Sentinel3G Concepts

This page was last modified 08:16, 14 June 2013.

From Documentation

(Difference between revisions)
Revision as of 03:08, 28 April 2006
Daniels (Talk | contribs)
(Trigger Variables)
Current revision
Mike (Talk | contribs)
(Responses)
Line 17: Line 17:
The Host Monitor takes some action based on the severity of the event, such as running a predefined command. Persistent problems that can’t be resolved automatically are escalated and passed to operations staff for action. Staff are notified of events by a console, and if necessary by some other means such as e-mail.
A console is a kind of ‘heads-up’ display that gives a concise hierarchical view of the current state of the sentries being monitored. It is both a means to alert operators to an event and a means for them to monitor and respond to events.
Consoles can present information in customized views, such as by region, by host, or by function. Different classes of user see an appropriate level of detail: from a broad enterprise-wide summary for managers to fine detail for operators and end-user administrators. Console users can select a predefined view or sort and filter sentry data to help diagnose problems. Reports provide more details about the current state of the sentry. Graphs chart the changes in the value of agent variables and
Line 42: Line 42:
This topic lists all of the overlay icons by type and gives a brief description.
Overlays that represent the type of a sentry or folder Indicators that represent a sentry’s state If an indicator is not specified for a state, the default indicator specified in the sentry (thermometer or pie chart) will be used. See [[#Indicators_that_represent_data_values_from_a_sentry.E2.80.99s_variable | Indicators that represent data values from a sentry’s variable]].
If no overlay is specified in the sentry or state details, the default overlay icon and color for the current severity are used. See [[#Indicators_that_represent_a_sentry.E2.80.99s_severity | Indicators that represent a sentry’s severity]].
{| border="1" cellpadding="3" cellspacing="0"
Line 52: Line 52:
|<strong>Description</strong>
|-
|[[Image:Unlocked_folder.jpg|center]]
|Bottom left
|This object is a user-defined folder, containing sentries and possibly other sub-folders.
|-
|[[Image:Locked_folder.jpg|center]]
|Bottom left
|This is a locked or system folder: its contents can be modified but the folder itself can’t be removed as it is required by Sentinel3G.
|-
|[[Image:Information_only_sentry.jpg|center]]
|Top right
|This is an information-only sentry. It has no states. It gives information through its console text and property sheet.
Line 73: Line 73:
|<strong>Description</strong>
|-
|[[Image:Check_box.gif|center]]
|Bottom right
|This sentry is requesting acknowledgement from an operator before changing to another state.
|-
|[[Image:Service_running.jpg|center]]
|Top right
|This sentry represents a service that is running.
|-
|[[Image:Service_disabled.jpg|center]]
|Top right
|This sentry represents a service that is not running. Check the console text, notes, or property sheet to find out why.
|-
|[[Image:Service_not_running.jpg|center]]
|Top left
|Notification for this sentry has been disabled.
Line 95: Line 95:
The severity of a folder is the maximum severity of all the sentries and sub-folders it contains.

{| border="1" cellpadding="3" cellspacing="0"
|-
|<strong>Icon</strong>
|<strong>Color</strong>
|<strong>Severity</strong>
|<strong>Description</strong>
|-
|[[Image:Hourglass.gif|center]]
|
|Wait
|Sentry is starting
|-
|[[Image:Down16.gif|center]]
|grey
|Disabled
|Sentry is in a state where data is not being returned. Examples: the Host Monitor is down, or the resource itself is disabled.
|-
|[[Image:Down16.gif|center]]
|grey
|Down
|Sentry is reporting that a service is not running
|-
|
|
|Normal
|Sentry is indicating that there are no problems
|-
|[[Image:Information.gif|center]]
|blue
|Information
|Sentry is reporting matters of interest
|-
|[[Image:Bang_orange.gif|center]]
|orange
|Warning
|Sentry is reporting a potential problem
|-
|[[Image:Bang_red.gif|center]]
|red
|Alarm
|Sentry has detected a serious problem that should be investigated as soon as possible
|-
|[[Image:Bang_red_fl.gif|center]]
|red, flashing
|Severe
|Sentry has detected a very serious problem that must be investigated now
|-
|[[Image:Bang_magenta_fl.gif|center]]
|magenta, flashing
|Critical
|Sentry has detected an extremely serious problem affecting the network or a key application, system, or service. Immediate action is needed.
|}

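The folder rule stated before the severity table (a folder takes the maximum severity of everything it contains) can be sketched in Python. This is an illustration only, not Sentinel3G code: the <code>Sentry</code> and <code>Folder</code> classes are hypothetical, the ranking simply follows the row order of the table, and treating an empty folder as normal is an assumption.

```python
from dataclasses import dataclass, field

# Assumed ranking: the row order of the severity table, lowest first.
SEVERITY_ORDER = ["wait", "disabled", "down", "normal", "information",
                  "warning", "alarm", "severe", "critical"]
RANK = {name: index for index, name in enumerate(SEVERITY_ORDER)}


@dataclass
class Sentry:
    """Hypothetical stand-in for a monitored sentry."""
    name: str
    severity: str


@dataclass
class Folder:
    """Hypothetical stand-in for a console folder."""
    name: str
    children: list = field(default_factory=list)  # Sentry or Folder objects


def folder_severity(node):
    """Return the highest severity found in a sentry or folder tree."""
    if isinstance(node, Sentry):
        return node.severity
    severities = [folder_severity(child) for child in node.children]
    # An empty folder is reported as normal here; that default is an assumption.
    return max(severities, key=RANK.__getitem__, default="normal")
```

Because the function recurses into sub-folders, a single alarm deep in the hierarchy raises the severity of every folder above it.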
==== Indicators that represent data values from a sentry’s variable ====

Two types of overlay icon, called indicators, can represent actual data from a variable.

A percentage value is mapped to either a small pie chart or a thermometer, in increments of at least 10 percent. This gives an immediate indication of what the data value is. The type of overlay (pie chart or thermometer) may be specified for each sentry.

The amount of the ‘pie’ that is filled in or the height of the thermometer’s filled-in area gives a rough indication of the quantity being reported. Here are some examples:
{| border="1" cellpadding="3" cellspacing="0"
|-
|<strong>Indicator Type</strong>
|<strong>0%</strong>
|<strong>30%</strong>
|<strong>50%</strong>
|<strong>80%</strong>
|<strong>100%</strong>
|-
|Pie Chart
|[[Image:Piechart_0.gif|center]]
|[[Image:Piechart_30.gif|center]]
|[[Image:Piechart_50.gif|center]]
|[[Image:Piechart_80.gif|center]]
|[[Image:Piechart_100.gif|center]]
|-
|Thermometer
|[[Image:Therm_0.gif|center]]
|[[Image:Therm_30.gif|center]]
|[[Image:Therm_50.gif|center]]
|[[Image:Therm_80.gif|center]]
|[[Image:Therm_100.gif|center]]
|}
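As a rough illustration of how a percentage might be reduced to one of the indicator steps, the Python sketch below rounds a value down to a multiple of 10 and builds an image name. The rounding direction is an assumption, and so are the file names for steps other than those shown in the table (0, 30, 50, 80 and 100); the function name is hypothetical.

```python
def indicator_image(percent, kind="pie"):
    """Map a percentage to a hypothetical indicator image name in 10% steps."""
    if kind not in ("pie", "thermometer"):
        raise ValueError(f"unknown indicator type: {kind}")
    # Keep the value inside 0..100 before bucketing it.
    clamped = max(0.0, min(100.0, float(percent)))
    step = int(clamped // 10) * 10  # round down to a multiple of 10 (assumed)
    prefix = "Piechart" if kind == "pie" else "Therm"
    return f"{prefix}_{step}.gif"
```

For example, a value of 34 percent would select the 30 percent pie chart under these assumptions.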
== Sentries and States ==
A sentry is an individual object or resource that is being monitored through Sentinel3G. Some examples:

*CPU usage on host titanic
*free disk space on filesystem /usr2
*run queue length on host lusitania
*network printer lusitania attached to host endurance

Sentries are grouped into classes and are represented on the console as icons. Each sentry has an agent (or possibly more than one agent) that collects data on its behalf about the resource or object being monitored. The data determines what state the sentry is in at any time.

An information-only sentry has no states attached to it, but simply provides useful status information to operators in the form of console text or via its property sheet.

You can maintain most things about a sentry from the console, including its constants, actions, agent, and variables. For example, to configure a sentry’s states, just select the sentry and then select Configure > States.
=== States ===
A sentry’s state represents its current operating status or condition. The entry condition for each state is evaluated in turn until one evaluates to true. Most sentries have a normal state, indicating that the sentry is operating satisfactorily and requires no action, and a number of other abnormal states of increasing severity. For example, a simple sentry that monitors a service may have only a couple of states showing whether the service is running or not running. A sentry that monitors a resource such as disk space or memory may have several states whose severity increases as the availability of the resource decreases.

For each state you can:
*modify the appearance of the sentry on the console by changing the main icon or adding an overlay icon
*provide a range of options and information from the console to help operators resolve problems
*specify background actions such as increased data logging and automatic responses.

A sentry does not have to have a state for every severity level. You can define more than one state for the same severity. Although it is possible to define a large number of states, representing small changes in the sentry, it’s better to have a minimum number of states corresponding to real differences in urgency or severity.

==== Events ====
An event is an external incident or condition on a particular host or in a particular application or device that is detected by Sentinel3G and passed to the Event Manager for action. In simple terms, an event is a condition that causes a sentry to move from one state to another state.

==== Entry condition ====
The entry condition is a TCL expression made up of any combination of agent variables, constants, text strings, numbers, history variables, boolean values, and TCL functions. Typically an entry condition tests the value of an agent variable against a predefined constant or threshold. Some examples:

*comparing a number to an absolute value: $Count > 1
*comparing a string to an absolute value: $Status == "Unconfigured"
*comparing a variable to a constant: $pct_free < $LOW
*compound expression: $Status == "Off" && $PID != -1

The entry conditions should cover all possible values returned by the agent. If none of the entry conditions is true, the sentry is put in the undefined state. If a sentry is in the Failed state, it indicates a problem with the agent (usually that it failed to start or has never returned any valid data).

If a state’s entry condition is left blank it always evaluates to true.
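The evaluation order described in this section can be modelled with a short Python sketch. It is not how Sentinel3G is implemented: entry conditions are real TCL expressions, whereas here they are stood in for by Python callables, and the state names and the <code>LOW</code> threshold are hypothetical.

```python
def current_state(states, values):
    """Return the name of the first state whose entry condition holds.

    states: ordered list of (name, condition) pairs. A condition is a
            callable taking the agent values, or None for a blank condition.
    values: dict of agent variable names to their latest values.
    """
    for name, condition in states:
        # A blank entry condition always evaluates to true.
        if condition is None or condition(values):
            return name
    # No entry condition matched, so the sentry is in the undefined state.
    return "undefined"


LOW = 10  # hypothetical threshold constant

disk_states = [
    ("critical", lambda v: v["pct_free"] < LOW / 2),
    ("warning", lambda v: v["pct_free"] < LOW),
    ("normal", None),  # blank condition acts as a catch-all
]
```

Placing a state with a blank condition last is one way to make sure the conditions cover every value the agent can return.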

==== Copying states ====
When you add a sentry, you can choose to copy the states of another sentry. If the states for the new sentry need to be similar but not identical, you can copy them first and then edit them. Changes to the states of the new sentry will not affect the original sentry.
=== Severity ===
Each state that a sentry can be in has a severity level, representing how serious the event is. When you define each state, the standard severity levels are listed in order of increasing severity from normal to critical.

The severity determines how the sentry is displayed on the console: its color, and if it has an indicator icon, the color of the indicator and whether it flashes. The severity is also used for notification. A notification message will be sent if the severity of a sentry is greater than or equal to either:
*the global NotifySeverity setting
*the notification level for that sentry

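A minimal Python sketch of the notification rule above, assuming the standard levels rank from normal up to critical in the order given in the severity table. The function and parameter names are hypothetical; only NotifySeverity is named in this document.

```python
# Standard severity levels in increasing order, from normal to critical.
LEVELS = ["normal", "information", "warning", "alarm", "severe", "critical"]


def should_notify(severity, global_notify_severity, sentry_notify_level):
    """True if severity reaches either the global or the per-sentry threshold."""
    rank = LEVELS.index(severity)
    return (rank >= LEVELS.index(global_notify_severity)
            or rank >= LEVELS.index(sentry_notify_level))
```

Under this reading, the lower of the two thresholds is the one that effectively decides whether a message is sent.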
Notes about severities

<em>disabled</em> is a special severity that can be used to indicate when a sentry is ‘down’ or otherwise unavailable, but doesn’t require attention. Examples:
*A device that has been taken offline can be put into a state whose severity is disabled, with console text explaining that it is undergoing maintenance.
*When a group of sentries is not working because of a problem with another sentry, there is no need to have all the sentries showing an alarm over the same problem. For example, when Apache or Squid is down, the “status” sentry goes into alarm, but the other (mainly informational) sentries go into a disabled state.

<em>information</em> severity shows operators that the sentry has some useful information to report. This can be used for a state above the normal state, where there is no problem serious enough to require going into a warning state or higher.

Note that this is different from an ‘information-only’ sentry, which has no states and only exists to provide information.

=== Instances ===
Some agents can return data for multiple objects, such as disks on a computer or tablespaces in a database. In Sentinel3G these objects are called "Instances", and only one sentry need be configured to handle all the instances. Instances can be listed explicitly in the sentry, or more commonly, the sentry can be defined as "cloning", which means that for each unique instance returned by the primary agent, the sentry will automatically create a new instance of itself.

A sentry can optionally define a number of "Instance Groups", providing the ability, among other things, to assign different threshold values to different instances, depending upon their group.
=== Expressions ===
TCL expressions are used when defining state conditions, console text, and variables.

For example, state conditions include an expression which, when evaluated, returns true or false to indicate whether the sentry is currently in that state.

Expressions are written using TCL syntax and can refer to any variables belonging to the sentry’s primary agent or secondary agents. Normally variables are prefixed with "$". However, in console text, variables may instead be prefixed with "&", which displays the variable in a formatted form, including any units. Finally, the history of a variable can be accessed by prefixing the variable with "@". See [[#History_Variables | History Variables]] for more details.

Example:
 Disk $disk I/O rate: $io_rate => Disk hd2 I/O rate: 145.7

 Disk $disk I/O rate: &io_rate => Disk hd2 I/O rate: 145.7MB/sec

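The difference between the "$" and "&" prefixes in console text can be imitated in Python as follows. This only mimics the behaviour shown in the example above; the function name and the idea of a separate units table are assumptions, and the "@" history prefix is not handled.

```python
import re


def render_console_text(template, values, units):
    """Expand $name as the raw value and &name as the value plus its unit."""
    def expand(match):
        prefix, name = match.group(1), match.group(2)
        if name not in values:
            return match.group(0)  # leave unknown variables untouched
        text = str(values[name])
        if prefix == "&":
            text += units.get(name, "")  # formatted form includes the unit
        return text

    return re.sub(r"([$&])([A-Za-z_]\w*)", expand, template)


values = {"disk": "hd2", "io_rate": 145.7}
units = {"io_rate": "MB/sec"}
```

With these values, the template <code>Disk $disk I/O rate: &io_rate</code> renders with the unit attached, while the same template using <code>$io_rate</code> renders the bare number.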
The following tables list other internal variables that are also available for use in expressions.

{| border="1" cellpadding="3" cellspacing="0"
|-
|<strong>Variable</strong>
|<strong>Description</strong>
|-
|$Sentry
|The name of the sentry
|-
|$Class
|The name of the sentry's class (aka folder)
|-
|$Host
|The sentry's host
|-
|$Instance
|The sentry's instance (if any)
|-
|$Group
|The name of the instance group (if any)
|-
|$State
|The current state that the sentry is in
|-
|$Since
|The time when the sentry last changed state
|-
|$Severity
|The current severity of the sentry
|-
|$PrevState
|The previous state of the sentry
|-
|$Agent
|The name of the primary agent
|-
|$PollTime
|The poll time of the primary agent in seconds (polled agents only)
|}
Table 1a: Internal variables available in sentry and state expressions

{| border="1" cellpadding="3" cellspacing="0"
|-
|<strong>Variable</strong>
|<strong>Description</strong>
|-
|$Agent
|The name of the agent
|-
|$PollTime
|The poll time of the agent in seconds (polled agents only)
|-
|$Instance
|The agent's instance (if any)
|-
|$data
|The value of the variable as received from the agent (raw variables only)
|}
Table 1b: Internal variables available in raw and derived variable expressions
=== Actions ===
Actions are predefined responses associated with a sentry that may be invoked by an operator from the console. Each action is a command that is run on the same host as the host monitor. Actions may be associated with a particular state or may be available at any time.

There are two types:
*An action simply runs the command, and is intended to correct a problem. Example: starting a service when it is stopped.
*A report displays the command’s output on the screen, usually in a browser or pager window, and is intended to help the operator diagnose the problem.

You can design a single action to work both on selected instances of a multi-instance sentry and on every instance in a selected parent folder. For example, you can set up an action so that the output for every selected instance is combined into one report.

Tasks that don’t require any action or judgement by an operator and can safely be run automatically are better implemented as responses.

Data is passed to an action from the host monitor either by being written to the action’s STDIN, or, if the flag Uses agent data: is set to yes, through the environment variables $Sentry, $Host, and $Action. For multi-instance sentries you can refer to a specific named instance or use $Instance, which contains the instance name of the primary agent.

 Note: History data and functions and the & <varname> syntax, which are available in state conditions and console text,
 cannot be used in an action. To pass the value of a history function, use a derived variable.

If you wish to format a value returned by an agent you must do it manually in the command.

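As a sketch of what an action command could look like when it receives its data through the environment, the Python function below reads Sentry, Host and Action and formats an agent value by hand. It assumes the variables appear in the environment under exactly those names; <code>pct_free</code> is borrowed from the report examples in this section and is only assumed to be exported, and the function name is hypothetical.

```python
import os


def build_report(env=os.environ):
    """Build report lines from variables the host monitor is assumed to export."""
    lines = [
        f"Report '{env.get('Action', '?')}'",
        f" Sentry: {env.get('Sentry', '?')}",
        f" Host: {env.get('Host', '?')}",
    ]
    raw = env.get("pct_free")  # agent variable; assumed present only for some sentries
    if raw is not None:
        # Formatting is done by hand because the & syntax is unavailable in actions.
        lines.append(f" Free space: {float(raw):.1f}%")
    return lines
```

A command built this way would simply print the joined lines, for example with <code>print("\n".join(build_report()))</code>.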
==== Examples: defining reports ====
Example 1 shows how to define a simple report for a single-instance sentry, without using any agent variables. When the report is run it will display in a browser window the name of this action (‘Sentry Details Report 1’), the date, the name of the sentry, and the host it runs on.

 Action             Sentry Details Report 1
 Type               report
 Command            echo -n "Report '$Action' "; date; echo " Sentry: $Sentry"; echo " Host: $Host"
 Display command    browser
 Uses agent data?   no
 Reads from STDIN?  (N/A)
 Export to parent?  no

In Example 2 the agent variables associated with the sentry are exported to the environment (Uses agent data? yes) so that they can be used in the command.

 Action             Sentry Details Report 2
 Type               report
 Command            echo "Free space on $Filesystem = $pct_free%"
 Display command    browser
 Uses agent data?   yes
 Reads from STDIN?  no
 Export to parent?  no

When you select a filesystem from the console and run the action, the report will show the free space on that filesystem. If you select multiple filesystems, the command will be run once for each instance, and the output window will show one row for each filesystem.

Example 3 demonstrates another way to make data available to an action, this time by reading from STDIN. This passes any agent data to the action in Functional Database format (a plain-text table, with rows separated by a newline and fields separated by a tab).

 Action             Sentry Details Report 3
 Type               report
 Command            cat -
 Display command    db_scroll
 Uses agent data?   yes
 Reads from STDIN?  yes
 Export to parent?  no

When you select a filesystem from the console and run the action, the report will show the raw database row containing the filesystem variables. To read this in a script, you would then need to use db_readrow, a Functional Toolset program.

Use this option if you are familiar with the Functional Toolset and wish to use it to manipulate the data. With this method, unlike the previous example, the command is only run once. The database rows are accumulated before piping them to the Command. Try selecting multiple filesystems and running the action. Note that there is one header row and multiple data rows.

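For readers who want to process the STDIN data without the Functional Toolset, the Python sketch below parses a table of the shape described above: one header row followed by data rows, with tab-separated fields. It assumes the header row contains plain column names; the real Functional Database header may carry more detail, and the column names in the sample are hypothetical.

```python
import io


def read_rows(stream):
    """Parse a tab-separated table whose first row holds the column names."""
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    if not lines:
        return []
    header = lines[0].split("\t")
    # Pair each data row's fields with the header names.
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


# Hypothetical sample: one header row and two data rows.
sample = "Filesystem\tpct_free\n/usr2\t12\n/home\t47\n"
rows = read_rows(io.StringIO(sample))
```

In an action command the same function would be given <code>sys.stdin</code> instead of the sample string, since the command runs once and receives all selected rows together.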
In Example 4, you export the action to the parent folder (Export to parent? yes). This makes the action available from the context menu when the operator clicks on the folder background (that is, no sentries are selected), or on the parent class folder.

 Action             Sentry Details Report 4
 Type               report
 Command            echo "Free space on $Filesystem = $pct_free%"
 Display command    browser
 Uses agent data?   yes
 Reads from STDIN?  yes
 Export to parent?  yes

When you run this action by clicking on the background of the folder or on the parent class folder, it is the same as selecting all instances and then running the action. If the action were configured on a single-instance sentry, it would be the same as selecting that single sentry and running the action.

==== Example: defining an action ====

Example 5 runs a command to stop the service represented by this sentry.
-This is an action and not a report, so there is no Display command.+*This is an action and not a report, so there is no Display command.
-The command, called system_service, could cause system problems if+*The command, called system_service, could cause system problems if run incorrectly, so when run from a shell it requires root privileges. Therefore Run as user is set to root so that when it is run from within Sentinel3G it has the necessary root privileges.
-run incorrectly, so when run from a shell it requires root privileges. Therefore+*Access role is set to Manager so that only Sentinel3G users with the Manager role can run the action.
-Run as user is set to root so that when it is run from within+*Authenticate=yes means that the user’s password must be entered. This is a further security measure to ensure that the action cannot be run by an unauthorized person from the Sentinel3G Manager’s workstation.
-Sentinel3G it has the necessary root privileges.+*This action is only useful if the service is really running, so In state(s) is set so that the action is only presented to the operator if the sentry is in Running state or Confused state (which means the service is turned off but still running).
-Access role is set to Manager so that only Sentinel3G users with+*Uses agent data? is set to yes so that the command can obtain the variable containing the name of the service.
-the Manager role can run the action.+ 
-Authenticate=yes means that the user’s password must be entered.+ Action Stop service
-This is a further security measure to ensure that the action cannot be run by+ Type action
-an unauthorized person from the Sentinel3G Manager’s workstation.+ Command system_service $Filename stop
-This action is only useful if the service is really running, so In state(s)+ Access role Manager
-Action Sentry Details Report 4+ Authenticate yes
-Type report+ Run as user root
-Command echo "Free space on $Filesystem =+ In state(s) Confused Running
-$pct_free%"+ Uses agent data? yes
-Display command browser+
-Uses agent data? yes+
-Reads from STDIN? yes+
-Export to parent? yes+
-28 Sentinel3G Concepts+
-is set so that the action is only presented to the operator if the sentry is in+
-Running state or Confused state (which means the service is turned off+
-but still running).+
-Uses agent data? is set to yes so that the command can obtain the+
-variable containing the name of the service.+
=== Responses === === Responses ===
-Responses are commands that are run automatically by the Host Monitor when a+Responses are actions that are run automatically by the Host Monitor when a sentry is in a particular state. You can define a series of responses for each state that is tailored to the severity of the problem.
-sentry is in a particular state. You can define a series of responses for each state that+ 
-are tailored to the severity of the problem.+Each response may run immediately, or there may be a waiting period after the sentry first enters this state or after the running of a previous response. Figure 4 shows an example of the full set of responses defined for a sentry while it is in warning state.
-Each response may run immediately, or there may be a waiting period after the sentry+ 
-first enters this state or after the running of a previous response. Figure 4 shows+Each response period is cumulative. In other words the period for Response #2 is counted from the end of the period for Response #1. Example: Response #1 is defined to go to a new severity of warning after 120 seconds. Response #2 is defined to notify after 60 seconds, which will be 180 seconds after the sentry entered this state.
-an example of the full set of responses defined for a sentry while it is in warning+ 
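The cumulative rule is simple running addition. The sketch below is illustrative arithmetic only, not part of Sentinel3G; the function name is hypothetical. Given the After (secs) value of each response in order, it prints the number of seconds after entering the state at which each response runs.

```shell
#!/bin/sh
# Sketch only: convert per-response waiting periods into offsets
# measured from the moment the sentry entered the state.
cumulative_offsets() {
    total=0
    out=""
    for period in "$@"; do
        total=$((total + period))
        # Append to the output, separating values with a single space.
        out="${out:+$out }$total"
    done
    printf '%s\n' "$out"
}

# The example from the text: 120 seconds, then a further 60 seconds.
cumulative_offsets 120 60   # prints "120 180"
```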
-state.+The response Command can attempt to remedy a situation. If successful it will typically return the sentry to a normal state. If the Command does not succeed, you may choose to leave the sentry in that state, and specify a later response to run another command or to notify someone, or simply to elevate the severity.
-Action Stop service+ 
-Type action+Another possible response is to force an agent to be polled at the end of the response period. This is called ‘firing’ the agent. You can fire the primary agent to refresh the variables used by the sentry, or fire another agent to collect additional data. This is useful if you performed an immediate response to try to correct the situation, and you want to check quickly if this has worked rather than waiting until the next poll of the agent.
-Command system_service $Filename stop+ 
-Access role Manager+Where a sentry experiences occasional temporary situations which usually correct themselves quickly, you may not want to take action or be notified unless the sentry has been in that state for some minimum period.
-Authenticate yes+ 
-Run as user root+If a sentry changes state while it is waiting to process a response (that is, before the end of the waiting period), then all responses for this state are cancelled, and any responses for the new state are started.
-In state(s) Confused Running+ 
-Uses agent data? yes+Example: as free disk space in a filesystem reaches a dangerously low level, Sentinel3G can run a series of commands such as:
-Sentinel3G Concepts 29+*writing to currently logged-in users asking them to remove surplus files
-Figure 4 — Time-line showing the responses of a sentry in warning state+*archiving files to an offline storage device
-Each response period is cumulative. In other words the period for Response #2 is+*removing files deemed expendable, such as files named core and *.o.
-counted from the end of the period for Response #1. Example: Response #1 is+ 
-defined to go to a new severity of warning after 120 seconds. Response #2 is+Any helpful task that can safely be run without prior checking can be set up as an automatic response to an event. Tasks that require some action or judgement by an operator are better implemented as actions.
-defined to notify after 60 seconds, which will be 180 seconds after the sentry+
-entered this state.+
-The response Command can attempt to remedy a situation. If successful it will typically+
-return the sentry to a normal state. If the Command does not succeed, you may+
-choose to leave the sentry in that state, and specify a later response to run another+
-command or to notify someone.+
-Another possible response is to force an agent to be polled at the end of the+
-response period. This is called ‘firing’ the agent. You can fire the primary agent to+
-refresh the variables used by the sentry, or fire another agent to collect additional+
-data.+
-Where a sentry experiences occasional temporary situations which usually correct+
-themselves quickly, you may not want to take action or be notified unless the sentry+
-has been in that state for some minimum period.+
-If a sentry changes state while it is waiting to process a response (that is, before the+
-end of the waiting period), then all responses for this state are cancelled, and any+
-responses for the new state are started.+
-Response #1+
-After (secs): 240+
-Command: rmtmpfiles+
-Response #2+
-After (secs): 360+
-Notify: opsgroup+
-acknowledge+
-After (secs): 300+
-Go to state: alarm+
-sentry enters warning state+
-operator acknowledges event;+
-sentry enters alarm state+
-240 secs+
-0 secs+
-600 secs+
-900 secs flag appears next to sentry on console+
-rmtmpfiles script starts running+
-message e-mailed to users in opsgroup+
-+
-30 Sentinel3G Concepts+
-Example: as free disk space in a filesystem reaches a dangerously low level,+
-Sentinel3G can run a series of commands such as:+
-writing to currently logged-in users asking them to remove surplus files+
-archiving files to an offline storage device+
-removing files deemed expendable, such as files named core and *.o.+
-Any helpful task that can safely be run without prior checking can be set up as an+
-automatic response to an event. Tasks that require some action or judgement by an+
-operator are better implemented as actions.+
=== Escalation === === Escalation ===
-Another way to respond to an alert is simply to wait for a while to see if the problem+Another way to respond to an alert is simply to wait for a while to see if the problem corrects itself, then to change to another state at the end of that period.
-corrects itself, then to change to another state at the end of that period.+ 
-For example, a sentry may be defined to wait up to 300 seconds in warning state,+For example, a sentry may be defined to wait up to 300 seconds in warning state, then to change to alarm state. The change of state may depend on manual confirmation from an operator (Acknowledgement) or it may happen automatically (Escalation).
-then to change to alarm state. The change of state may depend on manual confirmation+ 
-from an operator (Acknowledgement) or it may happen automatically (Escalation).+If the problem is normally transient and self-correcting, you could put the sentry into a warning state for a few minutes. At this point the appearance of the sentry is simply a passive signal that the sentry is not in its normal state. If the sentry is still in
-If the problem is normally transient and self-correcting, you could put the sentry+warning state at the end of this period, it indicates that the problem is unlikely to resolve itself. In this case you could change the sentry to a more severe state with its own set of responses.
-into a warning state for a few minutes. At this point the appearance of the sentry is+ 
-simply a passive signal that the sentry is not in its normal state. If the sentry is still in+In other cases you might return the sentry to a normal state if no other events have occurred by the end of the period. For example, a warning message appearing in a system log file may indicate a potential performance problem, but if no other messages are logged in the next few minutes it may be safe to return the sentry to normal state.
-warning state at the end of this period, it indicates that the problem is unlikely to+ 
-resolve itself. In this case you could change the sentry to a more severe state with its+Another use for escalation is to “chain together” several responses by splitting them over two states. Each state has a maximum of three responses.
-own set of responses.+ 
-In other cases you might return the sentry to a normal state if no other events have+Note that it may take several seconds for the escalation to be processed at the end of the waiting period.
-occurred by the end of the period. For example, a warning message appearing in a+
-system log file may indicate a potential performance problem, but if no other messages+
-are logged in the next few minutes it may be safe to return the sentry to normal+
-state.+
-Another use for escalation is to “chain together” several responses by splitting them+
-over two states. Each state has a maximum of three responses.+
-Note that it may take several seconds for the escalation to be processed at the end of+
-the waiting period.+
-Sentinel3G Concepts 31+
=== Acknowledgement === === Acknowledgement ===
-A sentry may request acknowledgement from an operator before changing to+A sentry may request acknowledgement from an operator before changing to another state. This is usually done to confirm that an operator has been made aware of a probable “one-off” incident before returning the sentry to normal state. For example, if the Bad_SU sentry detects a single failed attempt to gain root privileges, it remains in Report state until:
-another state. This is usually done to confirm that an operator has been made aware+*It receives acknowledgement from an operator and returns to its normal state
-of a probable “one-off ” incident before returning the sentry to normal state. For+*It detects another failed su attempt and goes to Violation state
-example, if the Bad_SU sentry detects a single failed attempt to gain root privileges,+ 
-it remains in Report state until:+Prompting for acknowledgement verifies that an operator was made aware of the condition at the time, which can be useful for audit or training purposes. You should provide monitoring notes to help operators understand what their options are when the sentry is in this state, and what will happen next if they acknowledge the alert.
-It receives acknowledgement from an operator and returns to its normal+ 
-state+
-It detects another failed su attempt and goes to Violation state+
-Prompting for acknowledgement verifies that an operator was made aware of the+
-condition at the time, which can be useful for audit or training purposes. You should+
-provide monitoring notes to help operators understand what their options are when+
-the sentry is in this state, and what will happen next if they acknowledge the alert.+
If a sentry is waiting for acknowledgement this overlay icon will appear next to it. If a sentry is waiting for acknowledgement this overlay icon will appear next to it.
=== Notification === === Notification ===
-Sentinel3G can notify a list of staff by e-mail when an event is detected. This is a+Sentinel3G can notify a list of staff by e-mail when an event is detected. This is a useful way to alert staff who do not normally run or are not currently running a console. There are three layers or types of notification:
-useful way to alert staff who do not normally run or are not currently running a console.+*Global notification is triggered when any sentry goes into a state at a specified severity level or higher. This would normally be used only for the most serious alerts, to avoid staff being flooded with messages about routine events. An example would be to prompt at least one person in the operations group to check the console to find out more about the problem. Global notification lets you specify a blanket notification policy in one place rather than having to set it for every sentry.
-There are three layers or types of notification:+*Sentry-level notification occurs when a particular sentry goes into a state at a specified severity level or higher. For example, you can set global notification to occur when any sentry goes into a state whose severity is severe or critical, but override that for a particular sentry so that notification is sent when that sentry goes into a state whose severity is alarm. Sentry-level notification lets you supplement the global notification policy by changing the notification level for selected sentries.
-Global notification is triggered when any sentry goes into a state at a specified+ 
-severity level or higher. This would normally be used only for the most serious+ Note that operators can disable notification for selected sentries from the console.
-alerts, to avoid staff ’s being flooded with messages about routine events.+ 
-An example would be to prompt at least one person in the operations group+*State-level notification can be specified as one of the predefined responses when a sentry goes into a particular state. This can be used to implement a follow-up response where the first response fails to correct the problem.
-to check the console to find out more about the problem. Global notification+ 
-lets you specify a blanket notification policy in one place rather than+Figure 5 shows a scheme that combines global and sentry-level notification. The NotifyLevel setting is set to severe, so global notification will normally be triggered by any sentry that goes into a state whose severity is severe or critical. There are two exceptions to this: SentryB will send a notification message (perhaps to a different list of recipients) if it goes into a state whose severity is alarm or higher; SentryC will send a notification message only if it goes into a state whose severity is critical.
-having to set it for every sentry.+ 
-Sentry-level notification occurs when a particular sentry goes into a state at a+ Figure 5 — Example of both global and sentry-level notification
-specified severity level or higher. For example, you can set global notification+ 
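The decision described for Figure 5 can be modelled as a threshold comparison. This is a sketch of the logic only, not Sentinel3G configuration syntax. It assumes the severities are ordered normal, warning, alarm, severe, critical, and it hard-codes the two overrides from the example; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: decide whether a sentry entering a state of a given
# severity should trigger notification.
NotifyLevel=severe   # global threshold, as in the Figure 5 example

# Map a severity name to a number so that thresholds can be compared.
# The ordering is an assumption.
severity_rank() {
    case $1 in
        normal)   echo 0 ;;
        warning)  echo 1 ;;
        alarm)    echo 2 ;;
        severe)   echo 3 ;;
        critical) echo 4 ;;
        *)        echo -1 ;;
    esac
}

# A sentry-level setting, where present, replaces the global threshold.
notify_level_for() {
    case $1 in
        SentryB) echo alarm ;;
        SentryC) echo critical ;;
        *)       echo "$NotifyLevel" ;;
    esac
}

# Succeeds (exit status 0) when notification should be sent.
should_notify() {
    sentry=$1
    severity=$2
    threshold=$(notify_level_for "$sentry")
    [ "$(severity_rank "$severity")" -ge "$(severity_rank "$threshold")" ]
}
```

With these settings, should_notify SentryB alarm succeeds, while should_notify SentryA alarm and should_notify SentryC severe both fail.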
-to occur when any sentry goes into a state whose severity is severe or+Figure 6 shows an example of state-level notification. This sentry waits for 300 seconds after entering low state, then runs a script to try to fix the problem. If the sentry is still in low state after another 120 seconds, a notification message is sent to
-critical, but override that for a particular sentry. When that sentry goes+recipients in opsgroup.
-into a state whose severity is alarm notification should be sent. Sentrylevel+ 
-notification lets you supplement the global notification policy by+{| border="1" cellpadding="3" cellspacing="0"
-changing the notification level for selected sentries.+|-
-32 Sentinel3G Concepts+|State: sufficient
-Note that operators can disable notification for selected sentries from the+ 
-console.+Severity: normal
-State-level notification can be specified as one of the predefined responses+|
-when a sentry goes into a particular state. This can be used to implement a+|
-follow-up response where the first response fails to correct the problem.+|-
-Figure 5 shows a scheme that combines global and sentry-level notification. The+| State: low
-NotifyLevel setting is set to severe, so global notification will normally be+ 
-triggered by any sentry that goes into a state whose severity is severe or critical+Severity: warning
-There are two exceptions to this: SentryB will send a notification message+|Response 1:
-(perhaps to a different list of recipients) if it goes into a state whose severity is+ 
-alarm or higher; SentryC will send a notification message only if it goes into a+
-state whose severity is critical.+
-Figure 5 — Example of both global and sentry-level notification+
-Figure 6 shows an example of state-level notification. This sentry waits for 300 seconds+
-after entering low state, then runs a script to try to fix the problem. If the sentry+
-is still in low state after another 120 seconds, a notification message is sent to+
-recipients in opsgroup.+
-warning alarm severe critical+
-sentryA+
-sentryB+
-sentryC+
-sentryD+
-Global notification+
-Sentry-level notification+
-Sentinel3G Concepts 33+
-Figure 6 — Example of state-level notification+
-Global notification is the simplest form to implement as it is set in one place and+
-applies to all sentries. In more complex environments where different people should+
-be notified when different events occur, it may be more appropriate to configure+
-notification at the sentry or state level.+
-State:+
-Severity:+
-sufficient+
-normal+
-State:+
-Severity:+
-low+
-warning+
-Response 1:+
After 300 secs: After 300 secs:
-Command:+|Command:
 + 
/usr/local/bin/rmtmpfiles /usr/local/bin/rmtmpfiles
-Response 2:+|-
 +|
 +|Response 2:
 + 
After 120 secs: After 120 secs:
-Notify:+|Notify:
 + 
opsgroup opsgroup
-State:+|-
-Severity:+|State: very_low
-very_low+ 
-alarm+Severity: alarm
-34 Sentinel3G Concepts+|
 +|
 +|}
 + 
 +Figure 6 — Example of state-level notification
 + 
 +Global notification is the simplest form to implement as it is set in one place and applies to all sentries. In more complex environments where different people should be notified when different events occur, it may be more appropriate to configure notification at the sentry, instance group or state level.
== Agents and Variables == == Agents and Variables ==
-Agents collect data on behalf of sentries. A typical agent works by polling, or running+Agents collect data on behalf of sentries. A typical agent works by polling, or running a command at regular intervals. Each time the command runs, its output is stored in a number of variables. These variables are passed to the host monitor to be processed on behalf of sentries, for example to evaluate what state the sentry is in and to display data on the console.
-a command at regular intervals. Each time the command runs, its output is+ 
-stored in a number of variables. These variables are passed to the host monitor to be+Other types of agent don’t poll but simply wait to receive data, for example from:
-processed on behalf of sentries, for example to evaluate what state the sentry is in+*the Logfile agent
-and to display data on the console.+*an existing application that has been instrumented through the Host Monitor API to send data for monitoring direct to the Host Monitor.
-Another type of agent doesn’t poll but simply waits to receive data, either from:+ 
-the Logfile agent+=== Primary and secondary agents ===
-an existing application that has been instrumented through the Host Monitor+Each sentry has one agent, called its primary agent, that supplies most or all of its variables. A sentry can also access variables belonging to other agents, which are called its secondary agents. Variables are simply referred to by name: $pct_free, $count. If a primary agent and a secondary agent both have a variable with the same name, the primary agent’s variable is used.
-API to send data for monitoring direct to the Host Monitor.+ 
-Primary and secondary agents+There is an important difference between primary and secondary agents that you should be aware of. A sentry’s state evaluations are normally done when its primary agent returns data, not when the secondary ones do. Each secondary agent can also be configured to "trigger" the sentry, so that state evaluations happen when either the primary agent or that secondary agent returns data. This can lead to unexpected behaviour, because the data from the other agent may be out of date.
-Each sentry has one agent, called its primary agent, that supplies most or all of its+ 
-variables. A sentry can also access variables belonging to other agents, which are+For example: there are two agents. The first agent monitors whether the Staff database is up or down. The other agent monitors whether the Payroll application (which happens to use the Staff database) is up or down. There is a sentry for the application. This sentry has different states to distinguish between the Payroll application being down because the Staff database is down, and the application being down for another reason.
-called its secondary agents. Variables are simply referred to by name: $pct_free;+ 
-$count. If a primary agent and a secondary agent both have a variable with the+You would configure the Staff database agent as a secondary agent so you could use the "is_up" variable that belongs to it. However, if the poll times of the two agents are not exactly the same (and they usually won't be) there is a potential problem. You can have a situation where the application agent reports that the Payroll application is down because it has detected that the Staff database is down, but the database agent hasn’t had its poll yet, and still ‘thinks’ the Staff database is up. (The solution to this is to use sentry Dependencies).
-same name, the primary agent’s variable is used.+ 
-There is an important difference between primary and secondary agents that you+=== Discovery program ===
-should be aware of. A sentry’s state evaluations are normally done when its primary+This is an optional command that is run before the agent starts. Its job is to return an exit status of true or false based on the existence or status of a resource. If the discovery program returns false, this agent and its associated sentries will not be started. This means the same set of KBs can be installed on several servers, and an agent on a particular server can be switched off if it ‘discovers’ that the resource it monitors is not present.
-agent returns data, not when the secondary ones do. This can lead to some unexpected+ 
-behavior if the secondary agent data is relatively out of date.+
-For example: there are two agents. The first agent monitors whether the Staff+
-database is up or down. The other agent monitors whether the Payroll application+
-(which happens to use the Staff database) is up or down. There is a sentry for+
-the application. This sentry has different states to distinguish between the Payroll+
-application being down because the Staff database is down, and the application+
-being down for another reason.+
-You would configure the Staff database agent as a secondary agent so you could+
-use the IS_UP variable that belongs to it. However, if the poll times of the two+
-agents are not the same (and they usually won't be) there is a potential problem. You+
-can have a situation where the application agent reports that the Payroll application+
-is down because it has detected that the Staff database is down, but the database+
-agent hasn’t had its poll yet, and still ‘thinks’ the Staff database is up.+
-The solution is to ‘trigger’ the sentry, which forces its state to be reevaluated when+
-the secondary agent returns new data. The effect is to force the primary and secondary+
-agents to synchronize their polling.+
-Discovery program+
-This is an optional command that is run before the agent starts. Its job is to return+
-an exit status of true or false based on the existence or status of a resource. If+
-the discovery program returns false, this agent and its associated sentries will not be+
-started. This means the same set of KBs can be installed on several servers, and an+
-agent on a particular server can be switched off if it ‘discovers’ that the resource it+
-monitors is not present.+
Here are two examples: Here are two examples:
-The discovery program for an Oracle agent checks whether Oracle is+*The discovery program for an Oracle agent checks whether Oracle is installed on a server by testing for the existence of a particular executable or directory. If Oracle is not installed on that server, the Oracle monitoring agent is not started.
-installed on a server by testing for the existence of a particular executable or+*A network monitoring agent pings other hosts to check whether each host is up and communications are working. There’s no need to have more than one host sending pings as they should all return the same answer. To make sure only one host tests connectivity, the discovery program tests whether it is running on the Event Manager host: [ "$EventHost" = "$HOSTNAME" ] On every other host the discovery program returns false and the agent doesn’t start.
-directory. If Oracle is not installed on that server, the Oracle monitoring+ 
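Both examples reduce to a command whose exit status is the answer. The following sketch shows the two checks as shell functions. The function names are hypothetical, the executable path is supplied by the caller because the real Oracle location is not given here, and how Sentinel3G passes EventHost to the program is assumed rather than documented.

```shell
#!/bin/sh
# Sketch only: discovery checks expressed as exit statuses
# (0 = true, start the agent; non-zero = false, do not start it).

# True only if the named executable exists and can be run.
discover_by_executable() {
    [ -x "$1" ]
}

# True only if this host is the Event Manager host.
# Arguments: the Event Manager host name, then the local host name.
discover_on_event_host() {
    [ "$1" = "$2" ]
}
```

A discovery program for the network agent might then consist of the single line discover_on_event_host "$EventHost" "$HOSTNAME", relying on the shell to return the status of the last command.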
-agent is not started.+=== Monitoring file updates: the FileInfo agent ===
-A network monitoring agent pings other hosts to check whether the host is+Sometimes you may need to monitor when a particular file or files change in some way. For example, you could log when a system file such as the password file has been updated and perhaps generate an alert.
-up and communications are working. There’s no need to have more than+ 
-one host sending pings as they should all return the same answer. To make+
-sure only one host tests connectivity, the discovery program tests whether it+
-is running on the Event Manager host:+
-["$EventHost" = "$HOSTNAME"]+
-On every other host the discovery program returns false and the agent+
-doesn’t start.+
-Monitoring file updates: the FileInfo agent+
-Sometimes you may need to monitor when a particular file or files change in some+
-way. For example, you could log when a system file such as the password file has+
-been updated and perhaps generate an alert.+
-36 Sentinel3G Concepts+
Sentinel3G provides a standard agent that can monitor these events: Sentinel3G provides a standard agent that can monitor these events:
-The file has been created or deleted+*The file has been created or deleted
-Any change to the file (modification time has changed)+*Any change to the file (modification time has changed)
-Size has changed+*Size has changed
-Ownership has changed+*Ownership has changed
-Access permissions have changed+*Access permissions have changed
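To make the event list concrete, the sketch below compares two snapshots of a file's metadata and names what changed. It is not how the FileInfo agent is implemented; it is an illustration built on the output of ls -ln, and the function names and snapshot layout are hypothetical. Modification times from ls have limited resolution, so a real monitor would need a more precise source.

```shell
#!/bin/sh
# Sketch only: detect the kinds of file event listed above by
# comparing two metadata snapshots.

# Print "permissions uid gid size mtime" for a file, or MISSING.
snapshot() {
    if [ -e "$1" ]; then
        ls -lnd -- "$1" | awk '{ print $1, $3, $4, $5, $6 "_" $7 "_" $8 }'
    else
        echo MISSING
    fi
}

# Print one line for each kind of change between two snapshots.
classify_change() {
    old=$1
    new=$2
    [ "$old" = "$new" ] && { echo "unchanged"; return 0; }
    [ "$old" = MISSING ] && { echo "created"; return 0; }
    [ "$new" = MISSING ] && { echo "deleted"; return 0; }
    # Split the old snapshot into fields, then the new one.
    set -- $old
    o_perm=$1 o_uid=$2 o_gid=$3 o_size=$4 o_time=$5
    set -- $new
    [ "$o_perm" != "$1" ] && echo "permissions changed"
    [ "$o_uid $o_gid" != "$2 $3" ] && echo "ownership changed"
    [ "$o_size" != "$4" ] && echo "size changed"
    [ "$o_time" != "$5" ] && echo "modified"
    return 0
}
```

A polling loop would call snapshot on each poll and pass the previous and current results to classify_change.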
-=== ’Windowing’ States Based on Time: the Clock Agent ===+
=== "Windowing" States Based on Time: the Clock Agent ===

The Clock agent can be used to stop a sentry from monitoring during particular periods. For example, if you run batch jobs between 11pm and 6am daily that use lots of CPU, you don't want to be notified if the run_queue gets too high, because this is expected during these times. So you "window" the monitoring.

Add the Clock agent as a secondary agent to the sentry you wish to window (in our example, Run_Queue).

Add a new state to the sentry, called Not_Monitored, and give it a severity of Disabled. In the Condition field, enter a boolean expression describing the time you want to exclude the sentry from monitoring. In our example, where the batch jobs run between 11pm and 6am, this would be:

 $Hour >= 23 || $Hour < 6

Make sure that the Not_Monitored state appears at the top of the list of states so that its condition is evaluated first.

If the requirement is to disable monitoring between 11pm and 6am from Monday to Friday only, the condition is more complicated, because Friday night's batch jobs actually run until 6am on Saturday:

 $Hour >= 23 && $DayOfWeek >= 1 && $DayOfWeek < 6 || $Hour < 6 && $DayOfWeek > 1 && $DayOfWeek <= 6

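The Monday-to-Friday condition can be checked outside Sentinel3G with a short Python sketch. This is only an illustration: the function name <code>not_monitored</code> is hypothetical, and the day numbering is converted to the Clock agent convention from Table 2 (Sunday = 0).

```python
from datetime import datetime

def not_monitored(now: datetime) -> bool:
    """Python rendering of the Monday-to-Friday windowing condition."""
    hour = now.hour
    # Python numbers Monday as 0; the Clock agent numbers Sunday as 0.
    day_of_week = (now.weekday() + 1) % 7
    late_evening = hour >= 23 and 1 <= day_of_week < 6    # Mon-Fri, 11pm onwards
    early_morning = hour < 6 and 1 < day_of_week <= 6     # Tue-Sat, before 6am
    return late_evening or early_morning

# 14 June 2013 was a Friday, so these samples span a weekend.
samples = [
    datetime(2013, 6, 14, 23, 30),  # Friday night
    datetime(2013, 6, 15, 5, 59),   # Saturday, end of Friday's batch run
    datetime(2013, 6, 15, 23, 30),  # Saturday night
    datetime(2013, 6, 17, 3, 0),    # Monday early morning
]
for moment in samples:
    print(moment.strftime("%a %H:%M"), not_monitored(moment))
```

The two Saturday samples show why the early-morning half of the condition uses a different range of days from the late-evening half.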
Table 2 lists the variables you can use to window monitoring for a sentry.

{| border="1" cellpadding="3" cellspacing="0"
|<strong>Variable</strong>
|<strong>Type</strong>
|<strong>Description</strong>
|-
|Day
|number
|Day of the month (1-31)
|-
|DayName
|string
|Name of the day of the week, capitalized, e.g. Monday
|-
|DayofWeek
|number
|Day of the week as a number (Sunday = 0)
|-
|DayofYear
|number
|Day of the year (1-366)
|-
|Hour
|number
|Hour of the day (0-23)
|-
|LastDayofMonth
|boolean
|True when today is the last day of the month
|-
|LastWeekofMonth
|boolean
|True when within 7 days of the end of the month
|-
|Minute
|number
|Minute in the hour (0-59)
|-
|Month
|number
|Month as a number (1-12)
|-
|MonthName
|string
|Name of the month, capitalized, e.g. June
|-
|Time
|clock
|Number of seconds since 1st January 1970 GMT
|-
|TimeOfDay
|string
|Time in the form HH:MM (00:00 - 23:59)
|-
|TimeZone
|string
|Timezone configured on the system as a string, e.g. GMT
|-
|Week
|number
|Week of the year (0-52), week begins Sunday
|-
|Year
|number
|4 digit year, e.g. 2003
|}

Table 2 — Clock agent variables
=== Process Monitoring: the ProcessInfo Agent ===
The ProcessInfo agent provides data for a process monitoring console on each host.

The ProcessInfo agent returns data about processes running on the local (Host Monitor) host [see ps(1)]. It is typically used to determine whether a process is running or to monitor its CPU or memory usage. ProcessInfo is a multi-instance agent, whose instances are usually the names of the processes being monitored. They must be specified in the Instances field of each sentry using this agent.

Processes are matched to instances by doing pattern matches on the Command field (as returned by the ps -efl command). If the Agent data field of an instance is NULL, then an exact string match is performed using the instance name. Otherwise the Agent data is interpreted as an unanchored full regular expression.

Note that if one instance matches more than one process, only the details of the first process found are returned. However, the count variable is set to the number of matching processes.

The variables returned by the ProcessInfo agent are:
;command: The command running (the full name including command line options). Note that it may be truncated if long.
;count: The number of matching processes found.
;cpu: The number of CPU seconds used by the process.
;pid: The numeric process ID.
;ppid: The numeric parent process ID.
;priority: The numeric priority at which the process is running.
;size: The size of the memory image of the process.
;state: The state of the process (see ps(1)).
;tty: The controlling terminal of the process.
;user: The name of the user owning the process.

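The matching rule described above can be sketched in Python. This is a rough model, not the agent's implementation: the function name <code>match_instance</code> and the sample commands are made up, the exact match is assumed to compare against the whole Command field, and Python's <code>re</code> module stands in for the agent's regular expression syntax.

```python
import re
from typing import List, Optional, Tuple

def match_instance(commands: List[str], instance: str,
                   agent_data: Optional[str] = None) -> Tuple[Optional[str], int]:
    """Return (first matching command, count) for one instance.

    commands   -- Command fields of the running processes (sample data here)
    instance   -- the instance name
    agent_data -- None for an exact match on the instance name,
                  otherwise an unanchored regular expression
    """
    if agent_data is None:
        matches = [c for c in commands if c == instance]
    else:
        pattern = re.compile(agent_data)
        matches = [c for c in commands if pattern.search(c)]
    # Only the first match supplies details; count covers all matches.
    first = matches[0] if matches else None
    return first, len(matches)

commands = [
    "/usr/sbin/sshd",
    "/usr/sbin/httpd -k start",
    "/usr/sbin/httpd -k start",
    "cron",
]
print(match_instance(commands, "cron"))          # exact match on the name
print(match_instance(commands, "web", "httpd"))  # unanchored pattern
```

With an unanchored pattern the instance name ("web" here) is only a label, which is why the two httpd processes are both counted even though neither command equals the instance name.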
=== Agent Classes and Variables ===
Agents make data available to sentries in the form of variables. The agent class tells Sentinel3G the format and location (e.g. STDOUT, a file name) of the agent data, how to parse it, and how to assign key data to variables.

The format of the agent output, and the way you identify which part of it to assign to a variable, differs depending on the agent class. This topic explains the attributes of each agent class.

==== API ====
An external application sends data via the Sentinel3G API. The application must be instrumented to send a string of variable names and their values to the host monitor at certain processing points, such as when a transaction is committed.

The API class is different from other agent classes in that data is ‘pushed’ to the host monitor at intervals decided by the external application, rather than being ‘pulled’ in by Sentinel3G. Therefore you don’t specify a column name when adding a variable. Instead you define one variable for each varname=value pair that is passed in the SENAPIdata command by the external application.

==== DB ====
The agent returns data in Functional Database format (a set of one or more records, each containing text fields delimited by tabs and terminating in a newline). Typically the data comprises several fields or one or more whole rows returned as a result of a query on a Functional Database table.

Each column name that you assign to an agent variable is a field name as specified in the Functional Database dictionary entry.

==== ExitStatus ====
The agent returns the exit status of the command. This can be used to monitor scheduled processes such as batch jobs and backups where there are a few common exit statuses, each relating to a different error condition. Example: when a backup job fails, the sentry can translate the exit status into a meaningful console message (such as "media change failed" or "error writing to device") and provide appropriate responses and actions.

You don’t need to specify a column name when adding a variable to store the exit status. Instead you define one variable of type raw, leaving the Column field blank. The exit status of the agent command will automatically be assigned to this variable.

==== LogFile ====
A convenient way of detecting events in an existing application with minimal intrusion is by monitoring its log file(s) for certain messages. The LogFile agent class allows alarms to be generated based on the contents of log files such as:
*Operating System logs
*Unix/Linux syslog
*Bad login attempts
*COSmanager™ audit trails
*Third-party applications

The agent searches in the file for messages that match a pattern. In the Agent options form you can specify the file name, a select pattern to select records of interest, and an extract pattern for each text string in the record that must be assigned to a variable.

The log file may contain a mixture of messages of different types, but typically we are only interested in one type. If you are interested in differently formatted messages, you could define one agent per record type.

Agents in the LogFile class generate one or more lines of text output, such as an error message. Table 3 explains how to split the data into patterns or columns. Table 4 explains how to assign each column to a variable.

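As a rough illustration of how a select pattern and a set of extract patterns work together, the following Python sketch filters log records and pulls out numbered columns. The function name <code>scan_log</code>, the sample log lines, and the patterns are all invented for the example, and it assumes each extract pattern contributes the text of its whole match; the real agent may behave differently.

```python
import re

def scan_log(lines, select_pattern, extract_patterns):
    """Yield one list of column values per selected record.

    Column 1 is the text matched by the first extract pattern,
    column 2 by the second, and so on (None when there is no match).
    """
    select = re.compile(select_pattern)
    extracts = [re.compile(p) for p in extract_patterns]
    for line in lines:
        if not select.search(line):
            continue  # record is not of interest
        columns = []
        for pattern in extracts:
            found = pattern.search(line)
            columns.append(found.group(0) if found else None)
        yield columns

log = [
    "Jun 14 08:16:01 host1 sshd[411]: Failed password for root from 10.0.0.5",
    "Jun 14 08:16:05 host1 cron[90]: job started",
]
records = list(scan_log(
    log,
    select_pattern=r"Failed password",
    extract_patterns=[r"^\w+ +\d+ [\d:]+", r"\d+(?:\.\d+){3}"],
))
print(records)
```

Here only the first record is selected, and its timestamp and source address become columns 1 and 2, ready to be assigned to variables.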
==== SNMPPolled ====
The agent polls for the results of SNMP ‘Get’ requests. Typically these requests test the current status of a managed object in an SNMP MIB, such as a device or port.

Each column name that you assign to an agent variable must be an object ID as specified in the SNMP MIB.

Note: This agent class is only available if the SNMP KB has been installed.

==== Text ====
This is used to filter the output from a command. The agent runs the command, which writes text output to STDOUT. If the output is complex or split over several lines, you can use the Agent options form to filter out extraneous text such as blank lines, header lines, and labels.

Agents in the Text class generate one or more lines of text output, such as a formatted report. Table 3 explains how to split the data into patterns or columns. Table 4 explains how to assign each column to a variable.

=== Assigning Text and Log File Data to Variables ===
For agents in the Text and LogFile classes, the data is split into one or more fields, which are identified by number. How the fields are split is determined by the Split data by field in the ‘Agent options’ form, as shown in Table 3:

{| border="1" cellpadding="3" cellspacing="0"
|-
|<strong>Split data by </strong>
|<strong>Notes</strong>
|-
|column
|The line is not split into fields. All variables must be identified by character position on the line.
|-
|whitespace
|The line is split into a series of fields separated by whitespace. The first field is column 1, the second is column 2, and so on.

Example: to assign to the variable the characters from the start of the line to the first whitespace character, enter 1 in the Column field.
|-
|tab
|The line is split into a series of fields, each separated by a tab. The first field is column 1, the second field is column 2, and so on.

Example: to assign to the variable the characters between the first and second tab, enter 2 in the Column field.
|-
|pattern
|Specify in the Pattern line fields of the Agent Options form one or more extract patterns. The first extract pattern is treated as column 1, the second extract pattern column 2, and so on.

Example: to assign to the variable the string that matches the third extract pattern, enter 3 in the Column field.
|}
Table 3 — How agent data is split into one or more numbered fields

When you add a variable, you specify in the Column field which of these columns to assign to the variable.

For agents where the data is split by pattern, you simply enter the column number <col>. If the data is split by whitespace or tab, you can enter a single column number if the agent returns all data on one line. If the agent returns several lines of data, you can specify a particular line by prefixing the column number with the line number, like this: <line>:<col>.

For agents where the data is split by column, each line is treated as a string of characters, with the first character at position 1. You must specify in the Column field a range of characters in the form c<pos>-<pos>. The second <pos> may be end, which means the last character on the line. If the agent returns several lines of data, you can specify a particular line by prefixing the character range with the line number, like this:
 <line>:c<pos>-<pos>

{| border="1" cellpadding="3" cellspacing="0"
|-
|<strong>Column field</strong>
|<strong>Split data by </strong>
|<strong>Notes</strong>
|-
|3
|pattern
|the third extract pattern in the output
|-
|2:3
|whitespace
|the third field in the second line of output
|-
|c10-40
|column
|the tenth to the fortieth characters inclusive
|-
|3:c10-11
|column
|the tenth and eleventh characters on the third line of output
|-
|2:c4-end
|column
|from the fourth to the last character on the second line
|}

Table 4 — Examples: assigning columns to a variable

<b>Note</b>: The Agent Options form includes several fields (e.g. Clear pattern, Skip initial lines, Strip initial chars) that allow you to strip unwanted lines and characters from the output before assigning columns to variables. All processing of whitespace, tabs, column numbers, or patterns takes place after these fields have been processed. For example, if Skip initial chars = 6, the column that the variable sees as c1 would actually be the 7th character in the original data (assuming that Skip pattern hasn’t removed even more characters).

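The Column field forms shown in Table 4 can be modelled with a small parser. The Python sketch below is an assumption-laden illustration: the function name <code>extract</code> and the sample lines are hypothetical, only the whitespace, tab, and column split modes are covered, and a spec without a line prefix is assumed to refer to the first line of output.

```python
import re

# Optional "<line>:" prefix, then either "c<pos>-<pos|end>" or a single number.
COLUMN_SPEC = re.compile(r"^(?:(\d+):)?\s*(?:c\s*(\d+)\s*-\s*(\d+|end)|(\d+))$")

def extract(lines, spec, split_by):
    """Return the text that a Column field spec selects from agent output."""
    if split_by not in ("whitespace", "tab", "column"):
        raise ValueError(f"unsupported split mode: {split_by!r}")
    parsed = COLUMN_SPEC.match(spec.strip())
    if parsed is None:
        raise ValueError(f"unrecognised Column field: {spec!r}")
    line_no, start, stop, field = parsed.groups()
    line = lines[int(line_no) - 1] if line_no else lines[0]
    if split_by == "column":
        if start is None:
            raise ValueError("split by column needs a range such as c10-40")
        last = len(line) if stop == "end" else int(stop)
        return line[int(start) - 1:last]   # positions are 1-based, inclusive
    if field is None:
        raise ValueError("split by whitespace or tab needs one field number")
    fields = line.split("\t") if split_by == "tab" else line.split()
    return fields[int(field) - 1]

output = ["alpha beta gamma", "one\ttwo\tthree", "0123456789ABCDEF"]
print(extract(output, "2", "whitespace"))
print(extract(output, "2:3", "tab"))
print(extract(output, "3:c10-11", "column"))
print(extract(output, "3:c11-end", "column"))
```

Because positions and field numbers are 1-based in the Column field, each is reduced by one before indexing into the Python string or list.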
=== Trigger Variables ===
A trigger variable is used to compare an earlier value of a variable with its current value. Trigger variables ‘remember’ a value from an earlier poll. (The name comes from the way in which the saving of the variable is triggered by a state change.) All other variables are set or recalculated every time the agent returns data, meaning that the previous value is overwritten.

Example: batches of update transactions are added to a data file once or twice a day. You want to write a sentry that notifies you whenever the spool file’s modification time changes. Using the FileInfo agent, you create a raw variable called mtime to store the current modification time, and a trigger variable called prev_mtime.

The Initial value of prev_mtime is set to $mtime, and the Expression is also set to $mtime.

Next you create a sentry with two states:
*Normal state has prev_mtime in its list of Trigger vars.
*NewTrans state has no Trigger vars. Its Entry condition is "$mtime != $prev_mtime". It has a response to return to Normal state after receiving acknowledgement from an operator.

When the file changes, the operating system updates its modification time. The sentry detects that the new value for $mtime is different from $prev_mtime, and changes to NewTrans state. When an operator acknowledges the event, indicating that the new transactions have been noted, the sentry is returned to Normal state.

At this point the trigger variable is recomputed, setting $prev_mtime to the new $mtime.

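The spool file example can be traced with a small simulation. This Python sketch is only a model of the behaviour described above, not Sentinel3G code: the class name <code>SpoolFileSentry</code> and its methods are hypothetical, and modification times are plain integers.

```python
class SpoolFileSentry:
    """Two-state sentry: Normal (saves prev_mtime on entry) and NewTrans."""

    def __init__(self, mtime):
        self.state = "Normal"
        self.mtime = mtime
        self.prev_mtime = mtime        # Initial value: $mtime

    def poll(self, mtime):
        """Called each time the agent returns data; mtime is overwritten."""
        self.mtime = mtime
        # Entry condition of NewTrans: $mtime != $prev_mtime
        if self.state == "Normal" and self.mtime != self.prev_mtime:
            self.state = "NewTrans"
        return self.state

    def acknowledge(self):
        """Operator acknowledgement: return to Normal and re-save the trigger variable."""
        if self.state == "NewTrans":
            self.state = "Normal"
            self.prev_mtime = self.mtime   # Expression: $mtime

sentry = SpoolFileSentry(mtime=1000)
print(sentry.poll(1000))   # file unchanged
print(sentry.poll(1500))   # new batch arrived
sentry.acknowledge()
print(sentry.poll(1500))   # same batch, already acknowledged
```

The key point is that prev_mtime is not touched by ordinary polls. It changes only when the sentry enters the state that lists it as a trigger variable.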
=== History Variables ===
History variables store the recent values returned by an agent variable. They can be used to generate a realtime graph showing recent changes in the data, and in state conditions, to average out spikes and gaps in the data. You can keep either a set number of values, or keep all values in a set period.

Note that history is suited to fairly short-term analysis. For longer term analysis such as capacity planning, use logged variables.

==== Using history variables in realtime graphs ====

You can generate a realtime graph showing recent changes in a variable. If a variable’s history has been saved, and the graph is defined to graph the last N values, it will use the variable’s history to get these values (or as many as there are available).

==== Using history variables in state conditions ====

You can use history variables in state conditions to handle exceptions in the data such as ‘spikes’ (high or low values which are transient and do not need to be displayed or acted on), to calculate an average over a number of polls, or to calculate a rate when the agent only returns a raw count.

=== Functions for Accessing History Variables ===

This topic describes the methods or functions that can be performed on history variables. You would typically use these in an expression, either in a state condition or in a derived variable.

History variables can be thought of as an array containing two fields: the value and the time. The array is accessed backwards in time: index 0 is the current value, 1 is the previous value, and so on. It can be written as two arrays: Hval[n] and Htime[n]. You can reference the variable’s history by putting "@" in front of the variable name (normally you refer to a variable by putting a "$" in front of the name, which returns the current value). You access history variables using one of the predefined methods. The TCL syntax is:

 [@<hist-var> <method> <optional params>]

Example: [@cpu_idle_hist value 1]

This returns the previous value (index 1) of the history variable cpu_idle_hist. (Note the square brackets around the call to the method.) You can use this within an expression or condition:
 [@cpu_idle_hist value 1] / 100.0

Table 5 lists the functions available to process history variables. Table 5 lists the functions available to process history variables.
-Function Description+ 
-value <index> Return the value of a particular index+{| border="1" cellpadding="3" cellspacing="0"
-of a history variable. Omitting the+|-
-index will return the most recent+|<strong>Function</strong>
-element.+|<strong>Description</strong>
-[@accesses_kb value] = 0+|
-[@accesses_kb value 14] = 12.5+|-
-value_at <clock> The value at a given time (clock+|value <index>
-format)+|Return the value of a particular index of a history variable. Omitting the index will return the most recent element.
-[@free_space value_at+|[@accesses_kb value] => 0
-1052824642] = 5+ 
-average Return the arithmetic mean of a history+[@accesses_kb value 14] => 12.5
-variable over its entire history+|-
-[@response_time average] = 9.5+|value_at <clock>
-max Return the maximum history value+|The value at a given time (clock format)
-(of the current values)+|[@free_space value_at 1052824642] => 5
-[@cpu_usage max] = 17.0+|-
-min Return the minimum history value+|average
-(of the current values)+|Return the arithmetic mean of a history variable over its entire history
-[@raw_packets_out min] = 0.0+|[@response_time average] => 9.5
-earliest_time The value of Htime[ end] where end+|-
-is the oldest value+|max
-[@cpu_usage earliest_time] = 50+|Return the maximum history value (of the current values)
-diff <index> Return the difference between the+|[@cpu_usage max] => 97.0
-most recent element and the element+|-
-at the specified index. Omitting the+|min
-index returns the difference between+|Return the minimum history value (of the current values)
-the most and least recent elements.+|[@raw_packets_out min] => 0.0
-[@cpu_usage diff 5] = -25.4+|-
-[@free_space diff] = 15+|earliest_time
-Table 5 — Functions for accessing history variables+|The value of Htime[end] where end is the oldest value
-46 Sentinel3G Concepts+|[@cpu_usage earliest_time] => 1171194067
-Notes:+|-
-• “end” refers to the oldest index available. You cannot actually use the string+|earliest_value
-“end” in your expressions.+|The value of Hval[end] where end is the oldest value
-• Generally, if index is not specified, end is assumed.+|[@cpu_usage earliest_value] => 51.6
-• Although you can have history on any type of variable, some methods such+|-
-as average assume that the variable is a number.+|diff <index>
-• If a function references a variable that does not exist, the Host Monitor log+|Return the difference between the most recent element and the element at the specified index. Omitting the index returns the difference between the most and least recent elements.
-file will display a message and the agent or sentry will not be started.+|[@cpu_usage diff 5] => -25.4
-rate <index> The elements are averaged over the+ 
-time between the elements. Whatever+[@free_space diff] => 15
-the unit of the history variable, the+|-
-result is always “units per second”.+|rate <index>
-For example, if the history is in+|The elements are averaged over the time between the elements. Whatever the unit of the history variable, the result is always “units per second”. For example, if the history is in “MB”, the result will be in “MB per second”.
-“MB”, the result will be in “MB per+|[@free_space rate] => 0.6
-second”.+ 
-[@free_space rate] = 0.6+[@file_size rate 5] => 1.5
-[@file_size rate 5] = 1.5+|-
-diff_rate <index> This is similar to rate, but is but+|diff_rate <index>
-used for agents that return a cumulative+|This is similar to rate, but is but used for agents that return a cumulative number. First the difference between the elements is calculated, then this is divided by the time.
-number. First the difference+|[@total_kbs_sent diff_rate] => 1.5
-between the elements is calculated,+ 
-then this is divided by the time.+[@cum_cpu_time diff_rate 1] => 0.2
-[@total_kbs_sent] = 1.5+|-
-[@cum_cpu_time 1] = 0.2+|count <condition>
-count <condition> Return the number of values that+|Return the number of values that match a simple condition, such as “< 7”. If <condition> is omitted a count of the current number of history values is returned.
-match a simple condition, such as+|[@free_space count "< 150"] => 1
-“< 7”.+ 
-If <condition> is omitted a count of+
-the current number of history values+
-is returned.+
-[@free_space count "< 150"] = 1+
[@cpu_idle_hist count “> 90”] [@cpu_idle_hist count “> 90”]
-(This returns the number of values+ 
-whose value is greater than 90.)+(This returns the number of values whose value is greater than 90.)
-percent <condition> This is similar to count, but returns+|-
-the percentage of values meeting the+|percent <condition>
-condition.+|This is similar to count, but returns the percentage of values meeting the condition.
-[@response_time "< 2000"] = 9.5+|[@response_time percent "< 2000"] => 9.5
-Function Description+|}
Table 5 — Functions for accessing history variables Table 5 — Functions for accessing history variables
-Sentinel3G Concepts 47+ 
-Using Standard TCL Functions+Notes:
-To use a function, enclose it in square brackets. For example: [string range+*“end” refers to the oldest index available. You cannot actually use the string “end” in your expressions.
-"hello" 0 1] will return the value of “he”. All standard TCL functions are available.+*Generally, if index is not specified, end is assumed.
-Using variables in functions+*Although you can have history on any type of variable, some methods such as average assume that the variable is a number.
-Scalar variables are referenced by adding a $ to the front. For example, if you had a+*If a function references a variable that does not exist, the Host Monitor log file will display a message and the agent or sentry will not be started.
-string called hostname, you could convert it to uppercase using this command:+ 
-[string toupper $hostname].+=== Using Standard TCL Functions ===
-History variables are referenced by adding an @ to the front. History variables cannot+ 
-be used in standard TCL functions, only the functions mentioned below. History+To use a function, enclose it in square brackets. For example: [string range "hello" 0 1] will return the value of “he”. All standard TCL functions are available. See http://www.tcl.tk/man/tcl8.4/TclCmd/contents.htm
-functions are accessed slightly differently than scalar functions in that the+ 
-variable comes first. For example, if you wanted to calculate the average over a history+==== Using variables in functions ====
-variable called response_time you would enter: [@response_time+Scalar variables are referenced by adding a $ to the front. For example, if you had a string called hostname, you could convert it to uppercase using this command: [string toupper $hostname].
-average]. If the function takes parameters, they come after the function name:+ 
-[@my_history function <args>].+History variables are referenced by adding an @ to the front. History variables cannot be used in standard TCL functions, only the functions mentioned below. History functions are accessed slightly differently than scalar functions in that the variable comes first. For example, if you wanted to calculate the average over a history variable called response_time you would enter:
-History variables are indexed in reverse order. The newest element will always be at+ [@response_time average]
-index 0. Therefore index 1 is the second most recent and index 5 is the sixth most+ 
-recent element.+If the function takes parameters, they come after the function name:
-Additional scalar functions+ [@my_history function <args>]
-Table 6 lists some additional functions. The examples all use numbers, but you can+ 
-replace any of these with scalar variables (e.g: $my_variable). If you reference a+History variables are indexed in reverse order. The newest element will always be at index 0. Therefore index 1 is the second most recent and index 5 is the sixth most recent element.
-variable that does not exist, the Host Monitor log file will display a message and the+ 
-agent or sentry will not be started.+==== Additional scalar functions ====
-48 Sentinel3G Concepts+Table 6 lists some additional functions. The examples all use numbers, but you can replace any of these with scalar variables (e.g: $my_variable). If you reference a variable that does not exist, the Host Monitor log file will display a message and the
 +agent or sentry will not be started.
 + 
Table 6 — Additional TCL functions Table 6 — Additional TCL functions
-Function Description Examples+{| border="1" cellpadding="3" cellspacing="0"
-percent <value> <total> Divide <value> by <total>+|-
-and return answer as a percentage.+|<strong>Function</strong>
-If <total> is zero,+|<strong>Description</strong>
-the return value is always zero.+|<strong>Examples</strong>
-[percent 45 100] = 45.0+|-
-[percent 75 150] = 50.0+|percent <value> <total>
-[percent 100 0] = 0+|Divide <value> by <total> and return answer as a percentage.
-div <v1> <v2> Safely divide <v1> by <v2>.+ 
-If <v2> is zero, the return+If <total> is zero, the return value is always zero.
-value is always zero.+|[percent 45 100] => 45.0
-[div 5 10] = 0.5[div 10 0] = 0+ 
-round <n> <multiple> Round <n> to the nearest+[percent 75 150] => 50.0
-<multiple>.+ 
-[round 12345 100] = 12300+[percent 100 0] => 0
-[round 1.2345 0.01] = 1.23+|-
-[round 1.3 0.5] = 1.5+|div <v1> <v2>
-hostname <IP address> Return the host name for a+|Safely divide <v1> by <v2>.
-given IP address+If <v2> is zero, the return value is always zero.
-[hostname 99.99.99.99] =+|[div 5 10] => 0.5
-www.s3gxyz.com+ 
-clock_to_db <secs> Convert a time in “clock seconds”+[div 10 0] => 0
-to internal date/time+|-
-(YYYYMMDD.hhmmss).+|round <n> <multiple>
-[clock_to_db 1052824642] =+|Round <n> to the nearest <multiple>.
-20030513.121722+|[round 12345 100] => 12300
-db_to_clock <date> Convert internal format+ 
-date/time to “clock seconds”.+[round 1.2345 0.01] => 1.23
-[db_to_clock 20030513.121722] =+ 
-1052824642+[round 1.3 0.5] => 1.5
-fmt_clock <secs> Convert “clock seconds”+|-
-date/time to display format.+|hostname <IP address>
-[fmt_clock 1052824642] =+|Return the host name for a given IP address
-13/05/03-12:17+|[hostname 99.99.99.99] => www.s3gxyz.com
-[fmt_clock 0] = 01/01/70-01:00+|-
-fmt_boolean <val> Format boolean type for+|clock_to_db <secs>
-display.+|Convert a time in “clock seconds” to internal date/time (YYYYMMDD.hhmmss).
-[fmt_boolean 0] = false+|[clock_to_db 1052824642] => 20030513.121722
-[fmt_boolean 1] = true+|-
-fmt_date <date> Format date type (YYYYMMDD)+|db_to_clock <date>
-for display.+|Convert internal format date/time to “clock seconds”.
-[fmt_date 20030513] = 13/05/03+|[db_to_clock 20030513.121722] => 1052824642
-fmt_datetime <date> Format datetime type+|-
-(YYYYMMDD.hhmmss) for+|fmt_clock <secs>
-display.+|Convert “clock seconds” date/time to display format.
-[fmt_datetime 20030513.121722] =+|[fmt_clock 1052824642] => 13/05/03-12:17
-13/05/03-12:17+ 
-fmt_uptime <secs> Format uptime (in seconds)+[fmt_clock 0] => 01/01/70-01:00
-for display.+|-
-[fmt_uptime 0] = 0 secs+|fmt_boolean <val>
-[fmt_uptime 300] = 5.0 mins+|Format boolean type for display.
-[fmt_uptime 4000] = 1.1 hrs+|[fmt_boolean 0] => false
-[fmt_uptime 100000] = 1.2 days+ 
-Sentinel3G Concepts 49+[fmt_boolean 1] => true
-Constants and Thresholds+|-
-Constants are like variables, but unlike variables are associated with a sentry rather+|fmt_date <date>
-than an agent. They are typically used to set different thresholds for sentries that+|Format date type (YYYYMMDD) for display.
-share the same states. This is useful for monitoring different “sizes” of the same+|[fmt_date 20030513] => 13/05/03
-type of resource using different criteria.+|-
-How it works: States refer to constants by name, so the same set of states can be+|fmt_datetime <date>
-shared between the two sentries. Because constants are not shared, you can set the+|Format datetime type (YYYYMMDD.hhmmss) for display.
-constants of the two sentries to different values.+|[fmt_datetime 20030513.121722] => 13/05/03-12:17
-For example, the acceptable minimum amount of free space on a filesystem depends+|-
-on its size and volatility. 3% may be an acceptable threshold for a fairly static 100+|fmt_uptime <secs>
-GB filesystem, but dangerously low for a volatile 15 GB filesystem. In this case you+|Format uptime (in seconds) for display.
-could have each sentry sharing the same states, including a state called VERY_LOW.+|[fmt_uptime 0] => 0 secs
-The entry condition for this state would test the current value reported by the agent+ 
-against a constant, also called VERY_LOW. The difference is that for the+[fmt_uptime 300] => 5.0 mins
-large_filesys sentry, the constant VERY_LOW is set to 3%, while for the+ 
-small_filesys sentry, the constant VERY_LOW is set to 8%:+[fmt_uptime 4000] => 1.1 hrs
 + 
 +[fmt_uptime 100000] => 1.2 days
 +|-
 +|in_schedule <schedule>
 +|Determine if the current time is in the given schedule.
 +|[in_schedule Weekends] => false
 +|}
 + 
 +=== Constants and Thresholds ===
 + 
 +Constants are like variables, but unlike variables are associated with a sentry rather than an agent. They are typically used to set different thresholds for sentries that share the same states. This is useful for monitoring different “sizes” of the same type of resource using different criteria. By convention, constants are written in UPPER CASE to differentiate them from variables.
 + 
 +<b>How it works</b>: States refer to constants by name, so the same set of states can be shared between the two sentries. Because constants are not shared, you can set the constants of the two sentries to different values.
 + 
 +For example, the acceptable minimum amount of free space on a filesystem depends on its size and volatility. 3% may be an acceptable threshold for a fairly static 100 GB filesystem, but dangerously low for a volatile 15 GB filesystem. In this case you could have each sentry sharing the same states, including a state called Very_Low.
 + 
 +The entry condition for this state would test the current value reported by the agent against a constant, called VERY_LOW. The difference is that for the large_filesys sentry, the constant VERY_LOW is set to 3%, while for the small_filesys sentry, the constant VERY_LOW is set to 8%:
 + 
 +{| border="1" cellpadding="3" cellspacing="0"
 +|-
 +|<strong>sentry </strong>
 +|<strong>state name (shared)</strong>
 +|<strong>entry condition for this state</strong>
 +|<strong>VERY_LOW constant</strong>
 +|-
 +|small_filesys
 +|Very_Low
 +|$pct_free < $VERY_LOW
 +|8
 +|-
 +|large_filesys
 +|Very_Low
 +|$pct_free < $VERY_LOW
 +|3
 +|}
 + 
Table 7 — Different thresholds for sentries that share the same states Table 7 — Different thresholds for sentries that share the same states
 +
Constants may also be used as a visual aid on realtime graphs. Constants may also be used as a visual aid on realtime graphs.
-sentry state name (shared) entry condition for this state+ 
-VERY_LOW+<b>Note</b>: The values of constants may also be set in <b>Instance Groups</b>, and these will override the values defined in the <b>sentry</b> but only for sentries in that particular instance group. In fact, this example is probably better implemented using a single sentry with two <b>Instance Groups</b>.
-constant+
-small_filesys VERY_LOW $pct_free < $VERY_LOW 8+
-large_filesys VERY_LOW $pct_free < $VERY_LOW 3+
-50 Sentinel3G Concepts+
== Sentinel Processes and Configuration == == Sentinel Processes and Configuration ==

Current revision

This section explains the main concepts underlying Sentinel3G.


How Sentinel3G Works

Sentinel3G has four main functions:

Sentries monitor resources or processes such as devices, subsystems or applications.

Agents collect data about sentries on each host on behalf of the host monitor. An event occurs when a sentry being monitored changes state. Typically, events are diagnosed by a host monitor from data supplied by agents. The state change could result from a single raw value crossing some predetermined threshold; or it could be a trend derived from the raw data, such as the rate of change of a value. Sentinel3G includes file-watching agents that detect events in system files and log files.

Host monitors report events to the Event Manager on the central host.

The Host Monitor takes some action based on the severity of the event, such as running a predefined command. Persistent problems that can’t be resolved automatically are escalated and passed to operations staff for action. Staff are notified of events by a console, and if necessary by some other means such as e-mail.

A console is a kind of ‘heads-up’ display that gives a concise hierarchical view of the current state of the sentries being monitored. It is both a means to alert operators of an event and a means for them to monitor and respond to events.

Consoles can present information in customized views, such as by region, by host, or by function. Different classes of user see an appropriate level of detail: from a broad enterprise-wide summary for managers to fine detail for operators and end-user administrators. Console users can select a predefined view or sort and filter sentry data to help diagnose problems. Reports provide more details about the current state of the sentry. Graphs chart the changes in the value of agent variables and provide a recent history of the state of the sentry.

At any time, a user can see from the console what events are happening and how serious they are. If reports have been defined, the operator can choose to view them. If there are predefined actions attached to the sentry the user can choose one to run.

Related information about an application or system component (basically its sentries, agents, events and responses) is grouped into a knowledge base. Sentinel3G includes a UNIX/Linux knowledge base. Other knowledge bases (such as Oracle, Windows NT/2000/XP, Network Services) are available as add-ons.

Because events are detected on each local host, only state transitions and not raw data need to be reported to the central host, so data traffic is minimal. This means that Sentinel3G itself adds little extra load to your system, even when monitoring large networks.

The Console

The console is the primary user interface to Sentinel3G, and is the most common way for Sentinel3G to report events and for users to monitor sentries and respond to events. A console displays a series of sentries as icons, which are grouped hierarchically into folders. Different users may have different console views, and different privileges controlling what they are able to see and do.

Day-to-day console operations are described in Monitoring Sentries From the Console.

Overlays and Indicator Icons

Each sentry and folder is represented on the console by an icon. Overlays are small icons that modify the appearance of the main icon to show:

This topic lists all of the overlay icons by type and gives a brief description.

Overlays that represent the type of a sentry or folder

Icon Where Description
Bottom left This object is a user-defined folder, containing sentries and possibly other sub-folders.
Bottom left This is a locked or system folder. Its contents can be modified, but the folder itself can’t be removed because it is required by Sentinel3G.
Top right This is an information-only sentry. It has no states. It gives information through its console text and property sheet.

Indicators that represent a sentry’s state

If an indicator is not specified for a state, the default indicator specified in the sentry (thermometer or pie chart) will be used. See Indicators that represent data values from a sentry’s variable.

If no overlay is specified in the sentry or state details, the default overlay icon and color for the current severity is used. See Indicators that represent a sentry’s severity.

Icon Where Description
Bottom right This sentry is requesting acknowledgement from an operator before changing to another state.
Top right This sentry represents a service that is running.
Top right This sentry represents a service that is not running. Check the console text, notes, or property sheet to find out why.
Top left Notification for this sentry has been disabled.


Indicators that represent a sentry’s severity

If an indicator is not specified for a given sentry, the default indicator representing the current severity is used. The color of the overlay is determined by the severity of the sentry.

The severity of a folder is the maximum severity of all the sentries and sub-folders it contains.

Icon Color Severity Description
Wait Sentry is starting
grey Disabled Sentry is in a state where data is not being returned. Examples: the Host Monitor is down, or the resource itself is disabled.
grey Down Sentry is reporting that a service is not running
Normal Sentry is indicating that there are no problems
blue Information Sentry is reporting matters of interest
orange Warning Sentry is reporting a potential problem
red Alarm Sentry has detected a serious problem that should be investigated as soon as possible
red, flashing Severe Sentry has detected a very serious problem that must be investigated now
magenta, flashing Critical Sentry has detected an extremely serious problem affecting the network or a key application, system, or service. Immediate action is needed.
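
The folder rule above (a folder takes the maximum severity of everything it contains) can be sketched in Python. This is only an illustrative model, not Sentinel3G code. The ranking follows the row order of the table above, and the placement of wait, disabled, and down below normal is an assumption. The nested-dictionary folder layout and the `folder_severity` helper are hypothetical.

```python
# Ranking taken from the row order of the severity table. The relative
# position of wait/disabled/down is an assumption made for this sketch.
SEVERITY_ORDER = ["wait", "disabled", "down", "normal", "information",
                  "warning", "alarm", "severe", "critical"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def folder_severity(folder):
    """Return the highest severity found in a folder and its sub-folders.

    `folder` is a hypothetical structure: a dict with a "sentries" mapping
    (sentry name -> severity) and a "folders" list of nested folder dicts.
    """
    found = list(folder.get("sentries", {}).values())
    for sub in folder.get("folders", []):
        severity = folder_severity(sub)
        if severity is not None:
            found.append(severity)
    if not found:
        return None  # an empty folder has no severity in this sketch
    return max(found, key=RANK.__getitem__)


site = {
    "sentries": {"cpu": "normal"},
    "folders": [
        {"sentries": {"disk_a": "warning", "disk_b": "alarm"}, "folders": []},
        {"sentries": {"mail": "information"}, "folders": []},
    ],
}
print(folder_severity(site))  # alarm
```

A single sentry in alarm is enough to raise every enclosing folder to alarm, which is why an operator can start from the top-level view and drill down.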

Indicators that represent data values from a sentry’s variable

Two types of overlay icon, called indicators, can represent actual data from a variable.

A percentage value is mapped to either a small pie chart or a thermometer, in increments of at least 10 percent. This gives an immediate indication of what the data value is. The type of overlay (pie chart or thermometer) may be specified for each sentry.

The amount of the ‘pie’ that is filled in or the height of the thermometer’s filled-in area gives a rough indication of the quantity being reported. Here are some examples:

Indicator Type 0% 30% 50% 80% 100%
Pie Chart
Thermometer

Sentries and States

A sentry is an individual object or resource that is being monitored through Sentinel3G. Some examples:

Sentries are grouped into classes and are represented on the console as icons. Each sentry has an agent (or possibly more than one agent) that collects data on its behalf about the resource or object being monitored. The data determines what state the sentry is in at any time.

An information-only sentry has no states attached to it, but simply provides useful status information to operators in the form of console text or via its property sheet.

You can maintain most things about a sentry from the console, including its constants, actions, agent, and variables. For example, to configure a sentry’s states, just select the sentry and then select Configure > States.

States

A sentry’s state represents its current operating status or condition. The entry condition for each state is evaluated in turn until one evaluates to true. Most sentries have a normal state, indicating that the sentry is operating satisfactorily and requires no action, and a number of other abnormal states of increasing severity. For example, a simple sentry that monitors a service may have only a couple of states showing whether the service is running or not running. A sentry that monitors a resource such as disk space or memory may have several states whose severity increases as the availability of the resource decreases.

For each state you can:

A sentry does not have to have a state for every severity level. You can define more than one state for the same severity. Although it is possible to define a large number of states, representing small changes in the sentry, it’s better to have a minimum number of states corresponding to real differences in urgency or severity.

Events

An event is an external incident or condition on a particular host or in a particular application or device that is detected by Sentinel3G and passed to the Event Manager for action. In simple terms, an event is a condition that causes a sentry to move from one state to another state.

Entry condition

The entry condition is a TCL expression made up of any combination of agent variables, constants, text strings, numbers, history variables, boolean values, and TCL functions. Typically an entry condition tests the value of an agent variable against a predefined constant or threshold. Some examples:

The entry conditions should cover all possible values returned by the agent. If none of the entry conditions is true, the sentry is put in undefined state. If a sentry is in Failed state, it indicates a problem with the agent (usually that it failed to start or has never returned any valid data). If a state’s entry condition is left blank it always evaluates to true.
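
The evaluation order described above can be modelled with a short Python sketch. It is not how the Host Monitor is implemented: real entry conditions are TCL expressions, which are represented here by Python callables. The `Low` state and the `LOW` constant are invented for illustration.

```python
def evaluate_state(states, values):
    """Return the name of the first state whose entry condition is true.

    `states` is an ordered list of (name, condition) pairs. A condition is
    a callable taking the agent values, or None for a blank condition,
    which always evaluates to true.
    """
    for name, condition in states:
        if condition is None or condition(values):
            return name
    return "undefined"  # no entry condition matched


# Hypothetical filesystem sentry. State names and thresholds are examples.
states = [
    ("Very_Low", lambda v: v["pct_free"] < v["VERY_LOW"]),
    ("Low", lambda v: v["pct_free"] < v["LOW"]),
    ("Normal", None),  # blank condition: catches every remaining value
]

constants = {"VERY_LOW": 3, "LOW": 10}
print(evaluate_state(states, {"pct_free": 2, **constants}))   # Very_Low
print(evaluate_state(states, {"pct_free": 7, **constants}))   # Low
print(evaluate_state(states, {"pct_free": 50, **constants}))  # Normal
```

Because the first true condition wins, the most severe states are listed first and a state with a blank condition is placed last as a catch-all.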

Copying states

When you add a sentry, you can choose to copy the states of another sentry. If the states for the new sentry need to be similar but not identical, you can first copy then edit them. Changes to the states of the new sentry will not affect the original sentry.

Severity

Each state that a sentry can be in has a severity level, representing how serious the event is. When you define each state, the standard severity levels are listed in order of increasing severity from normal to critical.

The severity determines how the sentry is displayed on the console—its color, and if it has an indicator icon, the color of the indicator and whether it flashes. The severity is also used for notification. A notification message will be sent if the severity of a sentry is greater than or equal to either:

Notes about severities

disabled is a special severity that can be used to indicate when a sentry is ‘down’ or otherwise unavailable, but doesn’t require attention. Examples:

information severity shows operators that the sentry has some useful information to report. This can be used as a state above normal state, where there is no problem serious enough to require going into a warning state or higher.

Note that this is different from an ‘information-only’ sentry, which has no states and only exists to provide information.

Instances

Some agents can return data for multiple objects, such as disks on a computer, tablespaces in a database etc. In Sentinel3G these objects are called "Instances", and only one sentry need be configured to handle all the instances. Instances can be listed explicitly in the sentry, or more commonly, the sentry can be defined as "cloning", which means that for each unique instance returned by the primary agent, the sentry will automatically create a new instance of itself.

A sentry can optionally define a number of "Instance Groups", providing the ability, among other things, to assign different threshold values to different instances, depending upon their group.
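
As a rough illustration of per-group thresholds, the Python sketch below resolves a constant for an instance by letting the instance group's value take precedence over the sentry's value. The data layout, the instance names, and the helper names are assumptions made for this example. Sentinel3G stores and applies these settings in its own way.

```python
def resolve_constants(sentry_constants, group_constants, group=None):
    """Merge sentry-level constants with the overrides of one instance group."""
    resolved = dict(sentry_constants)
    if group is not None:
        resolved.update(group_constants.get(group, {}))
    return resolved


sentry_constants = {"VERY_LOW": 3}                     # sentry-wide value
group_constants = {"small_filesys": {"VERY_LOW": 8}}   # override for one group
instance_groups = {"/tmp": "small_filesys"}            # instance -> group (invented)


def very_low(instance, pct_free):
    """Test an instance's free space against the threshold that applies to it."""
    group = instance_groups.get(instance)
    limit = resolve_constants(sentry_constants, group_constants, group)["VERY_LOW"]
    return pct_free < limit


print(very_low("/tmp", 5))   # True: 5 is below the group threshold of 8
print(very_low("/data", 5))  # False: 5 is not below the sentry threshold of 3
```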

Expressions

TCL expressions are used when defining state conditions, console text, and variables.

For example, state conditions include an expression which, when evaluated, returns true or false to indicate whether the sentry is currently in that state.

Expressions are written using TCL syntax and can refer to any variables belonging to the sentry’s primary agent or secondary agents. Normally variables are prefixed with "$". However, in console text, variables may instead be prefixed with "&", which displays the variable in a formatted form, including any units. Finally, the history of a variable can be accessed by prefixing the variable with "@". See History Variables for more details.

Example:

Disk $disk I/O rate: $io_rate => Disk hd2 I/O rate: 145.7

Disk $disk I/O rate: &io_rate => Disk hd2 I/O rate: 145.7MB/sec
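
The difference between the two prefixes can be mimicked in a few lines of Python. This is a simplified model that reproduces the two examples above. It assumes that "formatted form" means appending the variable's units, and the `render_console_text` helper and the `units` table are hypothetical.

```python
import re


def render_console_text(template, values, units):
    """Substitute $name with the raw value and &name with value plus units."""
    def replace(match):
        prefix, name = match.group(1), match.group(2)
        if name not in values:
            return match.group(0)  # leave unknown references untouched
        text = str(values[name])
        if prefix == "&":
            text += units.get(name, "")
        return text

    return re.sub(r"([$&])([A-Za-z_][A-Za-z0-9_]*)", replace, template)


values = {"disk": "hd2", "io_rate": 145.7}
units = {"io_rate": "MB/sec"}
print(render_console_text("Disk $disk I/O rate: $io_rate", values, units))
# Disk hd2 I/O rate: 145.7
print(render_console_text("Disk $disk I/O rate: &io_rate", values, units))
# Disk hd2 I/O rate: 145.7MB/sec
```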

The following tables list other internal variables that are also available for use in expressions.


Variable Description
$Sentry The name of the sentry
$Class The name of the sentry's class (aka folder)
$Host The sentry's host
$Instance The sentry's instance (if any)
$Group The name of the instance group (if any)
$State The current state that the sentry is in
$Since The time when the sentry last changed state
$Severity The current severity of the sentry
$PrevState The previous state of the sentry
$Agent The name of the primary agent
$PollTime The polltime of the primary agent in seconds (polled agents only)

Table 1a — Internal variables available in sentry and state expressions


Variable Description
$Agent The name of the agent
$PollTime The polltime of the agent in seconds (polled agents only)
$Instance The agent's instance (if any)
$data The value of the variable as received from the agent (raw variables only)

Table 1b — Internal variables available in raw and derived variable expressions

Actions

Actions are predefined responses associated with a sentry that may be invoked by an operator from the console. Each action is a command that is run on the same host as the host monitor. Actions may be associated with a particular state or may be available at any time.

There are two types:

You can design a single action to work both on selected instances of a multi-instance sentry and on every instance in a selected parent folder. For example, you can set up an action so that the output for every selected instance is combined into one report.

Tasks that don’t require any action or judgement by an operator and can safely be run automatically are better implemented as responses. Data is passed to an action from the host monitor either by being written to the action’s STDIN, or, if the Uses agent data? flag is set to yes, through the environment variables $Sentry, $Host, and $Action. For multi-instance sentries you can refer to a specific named instance or use $Instance, which contains the instance name of the primary agent.

Note: History data and functions and the & <varname> syntax, which are available in state conditions and console text, cannot be used in an action. To pass the value of a history function, use a derived variable.

If you wish to format a value returned by an agent you must do it manually in the command.

Examples: defining reports

Example 1 shows how to define a simple report for a single-instance sentry, without using any agent variables. When the report is run it will display in a browser window the name of this action (‘Sentry Details Report 1’), the date, the name of the sentry, and the host it runs on.

Action Sentry Details Report 1
Type report
Command echo -n "Report '$Action' "; date; echo " Sentry: $Sentry"; echo " Host: $Host"
Display command browser
Uses agent data? no
Reads from STDIN? (N/A)
Export to parent? no

In example 2 the agent variables associated with the sentry are exported to the environment (Uses agent data? yes) so that they can be used in the command.

Action Sentry Details Report 2
Type report
Command echo "Free space on $Filesystem = $pct_free%"
Display command browser
Uses agent data? yes
Reads from STDIN? no
Export to parent? no

When you select a filesystem from the console and run the action, the report will show the free space on that filesystem. If you select multiple filesystems, the command will be run once for each instance, and the output window will show one row for each filesystem.
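
The once-per-instance behaviour can be pictured with the Python sketch below. It does not run a shell command. It only models the expansion of the exported variables in the command text, once for each selected instance. The instance data and the `expand_per_instance` helper are invented for illustration.

```python
from string import Template


def expand_per_instance(command_text, instances):
    """Expand the variables in the command text once per selected instance."""
    template = Template(command_text)
    return [template.substitute(env) for env in instances]


# Hypothetical agent data for two selected filesystems.
selected = [
    {"Filesystem": "/home", "pct_free": 42},
    {"Filesystem": "/var", "pct_free": 7},
]

for line in expand_per_instance("Free space on $Filesystem = $pct_free%", selected):
    print(line)
# Free space on /home = 42%
# Free space on /var = 7%
```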

Example 3 demonstrates another way to make data available to an action, this time by reading from STDIN. This passes any agent data to the action in Functional Database format (a plain-text table, with rows separated by a newline and fields separated by a tab).

Action Sentry Details Report 3
Type report
Command cat -
Display command db_scroll
Uses agent data? yes
Reads from STDIN? yes
Export to parent? no

When you select a filesystem from the console and run the action, the report will show the raw database row containing the filesystem variables. To read this in a script, you would then need to use db_readrow, a Functional Toolset program.

Use this option if you are familiar with the Functional Toolset and wish to use it to manipulate the data. With this method, unlike the previous example, the command is only run once. The database rows are accumulated before piping them to the Command. Try selecting multiple filesystems and running the action. Note that there is one header row and multiple data rows.
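
If you would rather read the rows in a script of your own, the following Python sketch shows one possible approach. It assumes only what is described above: one header row followed by data rows, with fields separated by tabs. The real Functional Database header may carry more detail than plain field names, so treat this as a starting point and prefer db_readrow where it is available. The sample data is invented.

```python
import io


def read_rows(stream):
    """Parse a tab-separated table: the first line is the header row."""
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


# In an action command the stream would be sys.stdin. A string is used
# here so that the example can be run on its own.
sample = io.StringIO("Filesystem\tpct_free\n/home\t42\n/var\t7\n")
for row in read_rows(sample):
    print(f"Free space on {row['Filesystem']} = {row['pct_free']}%")
```

Because the command is run only once, the loop sees every selected instance in a single pass, which makes it easy to combine the rows into one report.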

In Example 4, you export the action to the parent folder (Export to parent? yes). This makes the action available from the context menu when the operator clicks on the folder background (that is, no sentries are selected), or on the parent class folder.

Action Sentry Details Report 4
Type report
Command echo "Free space on $Filesystem = $pct_free%"
Display command browser
Uses agent data? yes
Reads from STDIN? yes
Export to parent? yes

When you run this action by clicking on the background of the folder or on the parent class folder, it is the same as selecting all instances and then running the action. If the action is configured on a single-instance sentry, it is the same as selecting that single sentry and running the action.

Example: defining an action

Example 5 runs a command to stop the service represented by this sentry.

Action Stop service
Type action
Command system_service $Filename stop
Access role Manager
Authenticate yes
Run as user root
In state(s) Confused Running
Uses agent data? yes

Responses

Responses are actions that are run automatically by the Host Monitor when a sentry is in a particular state. You can define a series of responses for each state that is tailored to the severity of the problem.

Each response may run immediately, or there may be a waiting period after the sentry first enters this state or after the running of a previous response. Figure 4 shows an example of the full set of responses defined for a sentry while it is in warning state.

Each response period is cumulative. In other words the period for Response #2 is counted from the end of the period for Response #1. Example: Response #1 is defined to go to a new severity of warning after 120 seconds. Response #2 is defined to notify after 60 seconds, which will be 180 seconds after the sentry entered this state.
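
The arithmetic in this example is a running total of the response periods, as the small Python sketch below shows. The `response_fire_times` helper is hypothetical and only illustrates the calculation.

```python
from itertools import accumulate


def response_fire_times(periods):
    """Return, for each response, the seconds after state entry at which it runs."""
    return list(accumulate(periods))


# Response #1 waits 120 seconds, Response #2 waits a further 60 seconds.
print(response_fire_times([120, 60]))  # [120, 180]
```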

The response Command can attempt to remedy a situation. If successful it will typically return the sentry to a normal state. If the Command does not succeed, you may choose to leave the sentry in that state, and specify a later response to run another command or to notify someone, or simply to elevate the severity.

Another possible response is to force an agent to be polled at the end of the response period. This is called ‘firing’ the agent. You can fire the primary agent to refresh the variables used by the sentry, or fire another agent to collect additional data. This is useful if you performed an immediate response to try to correct the situation, and you want to check quickly if this has worked rather than waiting until the next poll of the agent.

Where a sentry experiences occasional temporary situations which usually correct themselves quickly, you may not want to take action or be notified unless the sentry has been in that state for some minimum period.

If a sentry changes state while it is waiting to process a response (that is, before the end of the waiting period), then all responses for this state are cancelled, and any responses for the new state are started.

Example: as free disk space in a filesystem reaches a dangerously low level, Sentinel3G can run a series of commands such as:

Any helpful task that can safely be run without prior checking can be set up as an automatic response to an event. Tasks that require some action or judgement by an operator are better implemented as actions.

Escalation

Another way to respond to an alert is simply to wait for a while to see if the problem corrects itself, then to change to another state at the end of that period.

For example, a sentry may be defined to wait up to 300 seconds in warning state, then to change to alarm state. The change of state may depend on manual confirmation from an operator (Acknowledgement) or it may happen automatically (Escalation).

If the problem is normally transient and self-correcting, you could put the sentry into a warning state for a few minutes. At this point the appearance of the sentry is simply a passive signal that the sentry is not in its normal state. If the sentry is still in warning state at the end of this period, it indicates that the problem is unlikely to resolve itself. In this case you could change the sentry to a more severe state with its own set of responses.

In other cases you might return the sentry to a normal state if no other events have occurred by the end of the period. For example, a warning message appearing in a system log file may indicate a potential performance problem, but if no other messages are logged in the next few minutes it may be safe to return the sentry to normal state.

Another use for escalation is to “chain together” several responses by splitting them over two states. Each state has a maximum of three responses.

Note that it may take several seconds for the escalation to be processed at the end of the waiting period.

Acknowledgement

A sentry may request acknowledgement from an operator before changing to another state. This is usually done to confirm that an operator has been made aware of a probable “one-off” incident before returning the sentry to normal state. For example, if the Bad_SU sentry detects a single failed attempt to gain root privileges, it remains in Report state until:

Prompting for acknowledgement verifies that an operator was made aware of the condition at the time, which can be useful for audit or training purposes. You should provide monitoring notes to help operators understand what their options are when the sentry is in this state, and what will happen next if they acknowledge the alert.

If a sentry is waiting for acknowledgement, an overlay icon appears next to it.

Notification

Sentinel3G can notify a list of staff by e-mail when an event is detected. This is a useful way to alert staff who do not normally run or are not currently running a console. There are three layers or types of notification:

Note that operators can disable notification for selected sentries from the console.

Figure 5 shows a scheme that combines global and sentry-level notification. The NotifyLevel setting is set to severe, so global notification will normally be triggered by any sentry that goes into a state whose severity is severe or critical. There are two exceptions to this: SentryB will send a notification message (perhaps to a different list of recipients) if it goes into a state whose severity is alarm or higher; SentryC will send a notification message only if it goes into a state whose severity is critical.

Figure 5 — Example of both global and sentry-level notification
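The scheme in Figure 5 amounts to a per-sentry threshold that falls back to the global NotifyLevel. The following Python sketch models that rule only. The severity ordering shown is an assumption inferred from the text (alarm below severe below critical), and SentryA stands for any sentry without its own setting.

```python
# Assumed ordering, lowest to highest; the product may define further levels.
SEVERITIES = ["normal", "warning", "alarm", "severe", "critical"]

GLOBAL_NOTIFY_LEVEL = "severe"
# Sentry-level settings override the global level for those sentries only.
SENTRY_NOTIFY_LEVEL = {"SentryB": "alarm", "SentryC": "critical"}


def should_notify(sentry, severity):
    """Return True if a state of this severity triggers notification."""
    threshold = SENTRY_NOTIFY_LEVEL.get(sentry, GLOBAL_NOTIFY_LEVEL)
    return SEVERITIES.index(severity) >= SEVERITIES.index(threshold)


print(should_notify("SentryA", "alarm"))   # False: below the global level
print(should_notify("SentryB", "alarm"))   # True: sentry-level override
print(should_notify("SentryC", "severe"))  # False: SentryC needs critical
```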

Figure 6 shows an example of state-level notification. This sentry waits for 300 seconds after entering low state, then runs a script to try to fix the problem. If the sentry is still in low state after another 120 seconds, a notification message is sent to recipients in opsgroup.

State: sufficient
    Severity: normal
State: low
    Severity: warning
    Response 1: After 300 secs, Command: /usr/local/bin/rmtmpfiles
    Response 2: After 120 secs, Notify: opsgroup
State: very_low
    Severity: alarm

Figure 6 — Example of state-level notification

Global notification is the simplest form to implement as it is set in one place and applies to all sentries. In more complex environments where different people should be notified when different events occur, it may be more appropriate to configure notification at the sentry, instance group or state level.

Agents and Variables

Agents collect data on behalf of sentries. A typical agent works by polling, or running a command at regular intervals. Each time the command runs, its output is stored in a number of variables. These variables are passed to the host monitor to be processed on behalf of sentries, for example to evaluate what state the sentry is in and to display data on the console.

Other types of agent don’t poll but simply wait to receive data, for example from:

Primary and secondary agents

Each sentry has one agent, called its primary agent, that supplies most or all of its variables. A sentry can also access variables belonging to other agents, which are called its secondary agents. Variables are simply referred to by name: $pct_free, $count. If a primary agent and a secondary agent both have a variable with the same name, the primary agent’s variable is used.
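The name lookup rule can be pictured as follows. This is an illustrative Python sketch, not the Host Monitor's implementation. The text only states that the primary agent wins a name clash, so the order in which secondary agents are searched is an assumption.

```python
def resolve(name, primary, secondaries):
    """Look up a variable by name, preferring the primary agent's value.

    `primary` is a dict of the primary agent's variables; `secondaries`
    is a list of dicts, searched in order (an assumed ordering).
    """
    if name in primary:
        return primary[name]
    for agent_vars in secondaries:
        if name in agent_vars:
            return agent_vars[name]
    raise KeyError(name)


primary = {"pct_free": 12, "count": 3}
secondaries = [{"count": 99, "is_up": 1}]
print(resolve("count", primary, secondaries))  # 3: the primary agent wins
print(resolve("is_up", primary, secondaries))  # 1: found in a secondary agent
```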

There is an important difference between primary and secondary agents. A sentry’s state evaluations are normally done when its primary agent returns data, not when its secondary agents do. Each secondary agent can also be configured to "trigger" the sentry, so that state evaluations happen when either the primary agent or that secondary agent returns data. However, this can lead to unexpected behaviour, as the data from one of the agents may be out of date.

For example: there are two agents. The first agent monitors whether the Staff database is up or down. The other agent monitors whether the Payroll application (which happens to use the Staff database) is up or down. There is a sentry for the application. This sentry has different states to distinguish between the Payroll application being down because the Staff database is down, and the application being down for another reason.

You would configure the Staff database agent as a secondary agent so you could use the "is_up" variable that belongs to it. However, if the poll times of the two agents are not exactly the same (and they usually won't be) there is a potential problem. You can have a situation where the application agent reports that the Payroll application is down because it has detected that the Staff database is down, but the database agent hasn’t had its poll yet, and still ‘thinks’ the Staff database is up. (The solution to this is to use sentry Dependencies).

Discovery program

This is an optional command that is run before the agent starts. Its job is to return an exit status of true or false based on the existence or status of a resource. If the discovery program returns false, this agent and its associated sentries will not be started. This means the same set of KBs can be installed on several servers, and an agent on a particular server can be switched off if it ‘discovers’ that the resource it monitors is not present.

Here are two examples:

Monitoring file updates: the FileInfo agent

Sometimes you may need to monitor when a particular file or files change in some way. For example, you could log when a system file such as the password file has been updated and perhaps generate an alert.

Sentinel3G provides a standard agent that can monitor these events:

"Windowing" States Based on Time: the Clock Agent

The Clock agent can be used to stop a sentry from monitoring during particular periods. For example, if you run batch jobs between 11pm and 6am daily that use lots of CPU, you don't want to be notified if the run_queue gets too high as this is expected during these times. So you "window" the monitoring.

Add the Clock agent as a secondary agent to the sentry you wish to window (in our example, Run_Queue).

Add a new state to the sentry, called Not_Monitored, and give it a severity of Disabled. In the Condition field, enter a boolean expression describing the time you want to exclude the sentry from monitoring. In our previous example, where the batch jobs run between 11pm and 6am, this would be:

$Hour >= 23 || $Hour < 6

Make sure that the Not_Monitored state appears at the top of the list of states so that its condition is evaluated first.

If the requirement was to disable monitoring during 11pm - 6am Monday to Friday only, it gets a bit more complicated, because you need to remember that Friday night's batch jobs actually go until 6am on Saturday:

$Hour >= 23 && $DayOfWeek >= 1 && $DayOfWeek < 6 || $Hour < 6 && $DayOfWeek > 1 && $DayOfWeek <= 6
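To check which hours the Monday-to-Friday condition covers, it can be mirrored in a few lines of Python. The sketch below is only a test aid and assumes Sunday = 0, as in the Clock agent's day-of-week variable. As in the Tcl expression, the "and" terms bind more tightly than the "or".

```python
def not_monitored(hour, day_of_week):
    """Mirror of the Monday-to-Friday windowing condition (Sunday = 0)."""
    evening = hour >= 23 and 1 <= day_of_week < 6   # from 11pm, Monday to Friday
    morning = hour < 6 and 1 < day_of_week <= 6     # before 6am, Tuesday to Saturday
    return evening or morning


print(not_monitored(23, 5))  # True: Friday night's batch run has started
print(not_monitored(5, 6))   # True: it continues until 6am on Saturday
print(not_monitored(23, 6))  # False: no batch run on Saturday night
print(not_monitored(3, 1))   # False: no batch run on Sunday night
```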

Table 2 lists the variables you can use to window monitoring for a sentry.

Variable Type Description
Day number Day of the month (1-31)
DayName string Name of the day of the week, capitalized, e.g. Monday
DayofWeek number Day of the week as a number (Sunday = 0)
DayofYear number Day of the year (1-366)
Hour number Hour of the day (0-23)
LastDayofMonth boolean True when today is the last day of the month
LastWeekofMonth boolean True when within 7 days of the end of the month
Minute number Minute in the hour (0-59)
Month number Month as a number (1-12)
MonthName string Name of the month, capitalized, e.g. June
Time clock Number of seconds since 1st January 1970 GMT
TimeOfDay string Time in the form HH:MM (00:00 - 23:59)
TimeZone string Timezone configured on the system as a string, e.g. GMT
Week number Week of the year (0-52), week begins Sunday
Year number 4 digit year, e.g. 2003

Table 2 — Clock agent variables

Process Monitoring: the ProcessInfo Agent

The ProcessInfo agent provides data for a process monitoring console on each host.

The ProcessInfo agent returns data about processes running on the local (Host Monitor) host [see ps(1)]. It is typically used to determine whether a process is running or to monitor its CPU or memory usage. ProcessInfo is a multi-instance agent, whose instances are usually the names of the processes being monitored.

They must be specified in the Instances field of each sentry using this agent.

Processes are matched to instances by doing pattern matches on the Command field (as returned by the ps -efl command). If the Agent data field of an instance is NULL, then an exact string match is performed using the instance name. Otherwise the Agent data is interpreted as an unanchored full regular expression.

Note that if one instance matches more than one process, only details of the first process found are returned. However the count variable is set to the number of matching processes.
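The matching rules can be sketched as below. This is an illustrative Python model, not the agent's code: the function name is hypothetical, Python's regular expression dialect may differ from the one the agent uses, and comparing the instance name against the whole Command field is an assumption about what "exact string match" means.

```python
import re


def match_instance(commands, instance, agent_data=None):
    """Return (first_match, count) for one instance.

    commands:   Command-field strings, one per running process.
    agent_data: None for an exact match on the instance name,
                otherwise an unanchored regular expression.
    """
    if agent_data is None:
        matches = [c for c in commands if c == instance]
    else:
        pattern = re.compile(agent_data)
        matches = [c for c in commands if pattern.search(c)]
    first = matches[0] if matches else None
    return first, len(matches)


commands = ["/usr/sbin/sshd -D", "cron", "/usr/sbin/sshd -R"]
print(match_instance(commands, "cron"))          # ('cron', 1)
print(match_instance(commands, "sshd", "sshd"))  # ('/usr/sbin/sshd -D', 2)
```

Only the first matching process is returned, while the count reflects every match, which corresponds to the behaviour described for the count variable.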

The variables returned by the ProcessInfo agent are:

command
The command running (the full name including command line options). Note that it may be truncated if long.
count
The number of matching processes found.
cpu
The number of CPU seconds used by the process.
pid
The numeric process ID.
ppid
The numeric parent process ID.
priority
The numeric priority at which the process is running.
size
The size of the memory image of the process.
state
The state of the process (see ps(1)).
tty
The controlling terminal of the process.
user
The name of the user owning the process.

Agent Classes and Variables

Agents make data available to sentries in the form of variables. The agent class tells Sentinel3G the format and location (e.g. STDOUT, a file name) of the agent data, how to parse it, and how to assign key data to variables.

The format of the agent output, and the way you identify which part of it to assign to a variable, differs depending on the agent class. This topic explains the attributes of each agent class.

API

An external application sends data via the Sentinel3G API. The application must be instrumented to send a string of variable names and their values to the host monitor at certain processing points, such as when a transaction is committed.

The API class is different from other agent classes in that data is ‘pushed’ to the host monitor at intervals decided by the external application, rather than being ‘pulled’ in by Sentinel3G. Therefore you don’t specify a column name when adding a variable. Instead you define one variable for each varname=value pair that is passed in the SENAPIdata command by the external application.

DB

The agent returns data in Functional Database format (a set of one or more records, each containing text fields delimited by tabs and terminating in a newline). Typically the data comprises several fields or one or more whole rows returned as a result of a query on a Functional Database table.

Each column name that you assign to an agent variable is a field name as specified in the Functional Database dictionary entry.

ExitStatus

The agent returns the exit status of the command. This can be used to monitor scheduled processes such as batch jobs and backups where there are a few common exit statuses, each relating to a different error condition. Example: when a backup job fails, the sentry can translate the exit status into a meaningful console message (such as "media change failed" or "error writing to device") and provide appropriate responses and actions.

You don’t need to specify a column name when adding a variable to store the exit status. Instead you define one variable of type raw, leaving the Column field blank. The exit status of the agent command will automatically be assigned to this variable.

LogFile

A convenient way of detecting events in an existing application with minimal intrusion is by monitoring its log file(s) for certain messages. The LogFile agent class allows alarms to be generated based on the contents of log files such as:

The agent searches in the file for messages that match a pattern. In the Agent options form you can specify the file name, a select pattern to select records of interest, and an extract pattern for each text string in the record that must be assigned to a variable.

The log file may contain a mixture of messages of different types but typically we are only interested in one type. If you are interested in differently formatted messages you could define one agent per record type.

Agents in the LogFile class generate one or more lines of text output, such as an error message. Table 3 explains how to split the data into patterns or columns. Table 4 explains how to assign each column to a variable.

SNMPPolled

The agent polls for the results of SNMP ‘Get’ requests. Typically these requests test the current status of a managed object in an SNMP MIB, such as a device or port.

Each column name that you assign to an agent variable must be an object ID as specified in the SNMP MIB.

Note: This agent class is only available if the SNMP KB has been installed.

Text

This is used to filter the output from a command. The agent runs the command, which writes text output to STDOUT. If the output is complex or split over several lines, you can use the Agent options form to filter out extraneous text such as blank lines, header lines, and labels.

Agents in the Text class generate one or more lines of text output, such as a formatted report. Table 3 explains how to split the data into patterns or columns.

Assigning Text and Log File Data to Variables

For agents in the Text and LogFile classes, the data is split into one or more fields, which are identified by number. How the fields are split is determined by the Split data by field in the ‘Agent options’ form, as shown in Table 3:

Split data by Notes
column The line is not split into fields. All variables must be identified by character position on the line.
whitespace The line is split into a series of fields separated by whitespace. The first field is column 1, the second is column 2, and so on.

Example: to assign to the variable the characters from the start of the line to the first whitespace character, enter 1 in the Column field.

tab The line is split into a series of fields, each separated by a tab. The first field is column 1, the second field is column 2, and so on.

Example: to assign to the variable the characters between the first and second tab, enter 2 in the Column field.

pattern Specify in the Pattern line fields of the Agent Options form one or more extract patterns. The first extract pattern is treated as column 1, the second extract pattern column 2, and so on.

Example: to assign to the variable the string that matches the third extract pattern, enter 3 in the Column field.

Table 3 — How agent data is split into one or more numbered fields

When you add a variable, you specify in the Column field which of these columns to assign to the variable.

For agents where the data is split by pattern, you simply enter the column number <col>. If the data is split by whitespace or tab, you can enter a single column number if the agent returns all data on one line. If the agent returns several lines of data, you can specify a particular line by prefixing the column number with the line number, like this: <line>:<col>.

For agents where the data is split by column, each line is treated as a string of characters. You must specify in the Column field a range of characters in the form c<pos>-<pos>. If the agent returns several lines of data, you can specify a particular line by prefixing the character range with the line number, like this:

<line>:c<pos>-<pos>

Column field Split data by Notes
3 pattern the third extract pattern in the output
2:3 whitespace the third field in the second line of output
c10-40 column the tenth to the fortieth characters inclusive
3:c10-11 column the tenth and eleventh characters on the third line of output
2:c4-end column from the fourth to the last character on the second line

Table 4 — Examples: assigning columns to a variable
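The Column field forms in Table 4 can be read as a small addressing scheme. The Python sketch below is one interpretation of that scheme, not the product's parser. The function name is hypothetical, the pattern split type is left out, and treating a missing line number as line 1 is an assumption.

```python
def extract(lines, spec, split_by):
    """Return the text selected by a Column-field spec such as '2:3' or '3:c10-11'."""
    line_no, _, col = spec.rpartition(":")
    line = lines[int(line_no) - 1 if line_no else 0]   # no prefix: first line (assumed)
    if split_by == "column":
        start, _, end = col.lstrip("c").partition("-")
        stop = len(line) if end == "end" else int(end)
        return line[int(start) - 1:stop]               # positions are 1-based, inclusive
    fields = line.split("\t") if split_by == "tab" else line.split()
    return fields[int(col) - 1]


lines = ["alpha beta gamma", "one two three four", "0123456789ABCDEF"]
print(extract(lines, "2:3", "whitespace"))  # three
print(extract(lines, "3:c10-11", "column")) # 9A
```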

Note: The Agent Options form includes several fields (e.g. Clear pattern, Skip initial lines, Strip initial chars) that allow you to strip unwanted lines and characters from the output before assigning columns to variables. All processing of whitespace, tabs, column numbers, or patterns takes place after these fields have been processed. For example, if Strip initial chars = 6, the column that the variable sees as c1 would actually be the 7th character in the original data (assuming that Skip pattern hasn’t removed even more characters).

Trigger Variables

A trigger variable is used to compare an earlier value of a variable with its current value. Trigger variables ‘remember’ a value from an earlier poll. (The name comes from the way in which the saving of the variable is triggered by a state change.) All other variables are set or recalculated every time the agent returns data, meaning that the previous value is overwritten.

Example: batches of update transactions are added to a data file once or twice a day. You want to write a sentry that notifies you whenever the spool file’s modification time changes. Using the FileInfo agent, you create a raw variable called mtime to store the current modification time, and a trigger variable called prev_mtime.

The Initial value of prev_mtime is set to $mtime, and the Expression is also set to $mtime.

Next you create a sentry with two states:

When the file changes, the operating system updates its modification time. The sentry detects that the new value for $mtime is different from $prev_mtime, and changes to NewTrans state. When an operator acknowledges the event, indicating that the new transactions have been noted, the sentry is returned to normal state.

At this point the trigger variable is recomputed, setting $prev_mtime to the new $mtime.
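The behaviour in this example can be traced with a toy model. The Python class below is only an illustration of the sequence described above. The class and the state name Normal are hypothetical, and it recomputes the trigger variable only at the one point the text mentions (when the sentry returns to normal after acknowledgement).

```python
class FileWatchSentry:
    """Toy model of the mtime / prev_mtime example; not the product's logic."""

    def __init__(self, mtime):
        self.mtime = mtime
        self.prev_mtime = mtime          # Initial value: $mtime
        self.state = "Normal"

    def poll(self, mtime):
        self.mtime = mtime               # raw variable: overwritten on every poll
        if self.state == "Normal" and self.mtime != self.prev_mtime:
            self.state = "NewTrans"

    def acknowledge(self):
        if self.state == "NewTrans":
            self.state = "Normal"
            self.prev_mtime = self.mtime  # Expression: $mtime, recomputed on the state change


sentry = FileWatchSentry(mtime=100)
sentry.poll(250)
print(sentry.state, sentry.prev_mtime)   # NewTrans 100
sentry.acknowledge()
print(sentry.state, sentry.prev_mtime)   # Normal 250
```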

History Variables

History variables store the recent values returned by an agent variable. They can be used to generate a realtime graph showing recent changes in the data, and in state conditions, to average out spikes and gaps in the data. You can keep either a set number of values, or keep all values in a set period.

Note that history is suited to fairly short-term analysis. For longer term analysis such as capacity planning, use logged variables.

Using history variables in realtime graphs

You can generate a realtime graph showing recent changes in a variable. If a variable’s history has been saved, and the graph is defined to graph the last N values, it will use the variable’s history to get these values (or as many as there are available).

Using history variables in state conditions

You can use history variables in state conditions to handle exceptions in the data such as ‘spikes’ (high or low values which are transient and do not need to be displayed or acted on), to calculate an average over a number of polls, or to calculate a rate when the agent only returns a raw count etc.

Functions for Accessing History Variables

This topic describes the methods or functions that can be performed on history variables. You would typically use these in an expression, either in a state condition or in a derived variable.

History variables can be thought of as an array containing two fields: the value and the time. The array is accessed backwards in time: index 0 is the current value, 1 is the previous value, and so on. It can be written as two arrays: Hval[n] and Htime[n]. You can reference the variable’s history by putting "@" in front of the variable name (normally you refer to a variable by putting a "$" in front of the name, which returns the current value). You access history variables using one of the predefined methods. The TCL syntax is:

[@<hist-var> <method> <optional params>]

Example: [@cpu_idle_hist value 1]

This returns the previous value (index 1) of the history variable cpu_idle_hist. (Note the square brackets around the call to the method). You can use this within an expression or condition:

[@cpu_idle_hist value 1] / 100.0

Table 5 lists the functions available to process history variables.

Function Description
value <index>
Return the value at a particular index of a history variable. Omitting the index returns the most recent element.
[@accesses_kb value] => 0
[@accesses_kb value 14] => 12.5

value_at <clock>
Return the value at a given time (clock format).
[@free_space value_at 1052824642] => 5

average
Return the arithmetic mean of a history variable over its entire history.
[@response_time average] => 9.5

max
Return the maximum of the values currently held in the history.
[@cpu_usage max] => 97.0

min
Return the minimum of the values currently held in the history.
[@raw_packets_out min] => 0.0

earliest_time
Return Htime[end], where end is the index of the oldest value.
[@cpu_usage earliest_time] => 1171194067

earliest_value
Return Hval[end], where end is the index of the oldest value.
[@cpu_usage earliest_value] => 51.6

diff <index>
Return the difference between the most recent element and the element at the specified index. Omitting the index returns the difference between the most and least recent elements.
[@cpu_usage diff 5] => -25.4
[@free_space diff] => 15

rate <index>
The elements are averaged over the time between the elements. Whatever the unit of the history variable, the result is always “units per second”. For example, if the history is in “MB”, the result will be in “MB per second”.
[@free_space rate] => 0.6
[@file_size rate 5] => 1.5

diff_rate <index>
This is similar to rate, but is used for agents that return a cumulative number. First the difference between the elements is calculated, then this is divided by the time.
[@total_kbs_sent diff_rate] => 1.5
[@cum_cpu_time diff_rate 1] => 0.2

count <condition>
Return the number of values that match a simple condition, such as "< 7". If <condition> is omitted, a count of the current number of history values is returned.
[@free_space count "< 150"] => 1
[@cpu_idle_hist count "> 90"] (returns the number of values greater than 90)

percent <condition>
This is similar to count, but returns the percentage of values meeting the condition.
[@response_time percent "< 2000"] => 9.5

Table 5 — Functions for accessing history variables
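To make the indexing and the simpler functions concrete, here is a small Python model of a history variable. It is a sketch under assumptions, not the product's implementation: samples are stored newest first as (value, time) pairs, conditions are passed as Python callables rather than strings such as "< 150", diff_rate is assumed to default to the oldest element as diff does, and rate is left out because its exact calculation is not spelled out above.

```python
class History:
    """Toy model: samples are (value, time) pairs, newest first (index 0)."""

    def __init__(self, samples):
        self.samples = list(samples)

    def value(self, index=0):
        return self.samples[index][0]

    def average(self):
        return sum(v for v, _ in self.samples) / len(self.samples)

    def diff(self, index=None):
        # Default: compare the newest element with the oldest one.
        index = len(self.samples) - 1 if index is None else index
        return self.value(0) - self.value(index)

    def diff_rate(self, index=None):
        # Difference between the elements, divided by the time between them.
        index = len(self.samples) - 1 if index is None else index
        elapsed = self.samples[0][1] - self.samples[index][1]
        return self.diff(index) / elapsed

    def count(self, predicate=None):
        if predicate is None:
            return len(self.samples)
        return sum(1 for v, _ in self.samples if predicate(v))

    def percent(self, predicate):
        return 100.0 * self.count(predicate) / len(self.samples)


hist = History([(30.0, 1060), (20.0, 1030), (10.0, 1000)])
print(hist.value(1))                 # 20.0: the previous value
print(hist.diff())                   # 20.0: newest minus oldest
print(hist.count(lambda v: v > 15))  # 2
```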

Notes:

Using Standard TCL Functions

To use a function, enclose it in square brackets. For example: [string range "hello" 0 1] will return the value of “he”. All standard TCL functions are available. See http://www.tcl.tk/man/tcl8.4/TclCmd/contents.htm

Using variables in functions

Scalar variables are referenced by adding a $ to the front. For example, if you had a string called hostname, you could convert it to uppercase using this command: [string toupper $hostname].

History variables are referenced by adding an @ to the front. History variables cannot be used in standard TCL functions, only in the history functions listed in Table 5. History functions are called slightly differently from scalar functions, in that the variable comes first. For example, if you wanted to calculate the average over a history variable called response_time you would enter:

[@response_time average]

If the function takes parameters, they come after the function name:

[@my_history function <args>]

History variables are indexed in reverse order. The newest element will always be at index 0. Therefore index 1 is the second most recent and index 5 is the sixth most recent element.

Additional scalar functions

Table 6 lists some additional functions. The examples all use numbers, but you can replace any of these with scalar variables (e.g. $my_variable). If you reference a variable that does not exist, the Host Monitor log file will display a message and the agent or sentry will not be started.

Table 6 — Additional TCL functions

Function Description Examples
percent <value> <total>
Divide <value> by <total> and return the answer as a percentage. If <total> is zero, the return value is always zero.
[percent 45 100] => 45.0
[percent 75 150] => 50.0
[percent 100 0] => 0

div <v1> <v2>
Safely divide <v1> by <v2>. If <v2> is zero, the return value is always zero.
[div 5 10] => 0.5
[div 10 0] => 0

round <n> <multiple>
Round <n> to the nearest <multiple>.
[round 12345 100] => 12300
[round 1.2345 0.01] => 1.23
[round 1.3 0.5] => 1.5

hostname <IP address>
Return the host name for a given IP address.
[hostname 99.99.99.99] => www.s3gxyz.com

clock_to_db <secs>
Convert a time in “clock seconds” to internal date/time (YYYYMMDD.hhmmss).
[clock_to_db 1052824642] => 20030513.121722

db_to_clock <date>
Convert internal format date/time to “clock seconds”.
[db_to_clock 20030513.121722] => 1052824642

fmt_clock <secs>
Convert “clock seconds” date/time to display format.
[fmt_clock 1052824642] => 13/05/03-12:17
[fmt_clock 0] => 01/01/70-01:00

fmt_boolean <val>
Format boolean type for display.
[fmt_boolean 0] => false
[fmt_boolean 1] => true

fmt_date <date>
Format date type (YYYYMMDD) for display.
[fmt_date 20030513] => 13/05/03

fmt_datetime <date>
Format datetime type (YYYYMMDD.hhmmss) for display.
[fmt_datetime 20030513.121722] => 13/05/03-12:17

fmt_uptime <secs>
Format uptime (in seconds) for display.
[fmt_uptime 0] => 0 secs
[fmt_uptime 300] => 5.0 mins
[fmt_uptime 4000] => 1.1 hrs
[fmt_uptime 100000] => 1.2 days

in_schedule <schedule>
Determine if the current time is in the given schedule.
[in_schedule Weekends] => false
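The arithmetic helpers percent, div and round can be approximated in Python to check expected results before using them in a condition. These are illustrative equivalents only, written from the descriptions in Table 6. The name round_to is used to avoid shadowing Python's built-in round, and how the product rounds exact ties is not stated, so tie behaviour here (Python's round-half-to-even) is an assumption.

```python
def percent(value, total):
    """value as a percentage of total; zero when total is zero."""
    return 0 if total == 0 else 100.0 * value / total


def div(v1, v2):
    """Safe division: zero when the divisor is zero."""
    return 0 if v2 == 0 else v1 / v2


def round_to(n, multiple):
    """Round n to the nearest multiple (floating-point results may be inexact)."""
    return round(n / multiple) * multiple


print(percent(75, 150))      # 50.0
print(div(10, 0))            # 0
print(round_to(12345, 100))  # 12300
print(round_to(1.3, 0.5))    # 1.5
```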

Constants and Thresholds

Constants are like variables, but unlike variables are associated with a sentry rather than an agent. They are typically used to set different thresholds for sentries that share the same states. This is useful for monitoring different “sizes” of the same type of resource using different criteria. Constants are by convention in UPPER CASE to differentiate them from variables.

How it works: States refer to constants by name, so the same set of states can be shared between the two sentries. Because constants are not shared, you can set the constants of the two sentries to different values.

For example, the acceptable minimum amount of free space on a filesystem depends on its size and volatility. 3% may be an acceptable threshold for a fairly static 100 GB filesystem, but dangerously low for a volatile 15 GB filesystem. In this case you could have each sentry sharing the same states, including a state called Very_Low.

The entry condition for this state would test the current value reported by the agent against a constant, called VERY_LOW. The difference is that for the large_filesys sentry, the constant VERY_LOW is set to 3%, while for the small_filesys sentry, the constant VERY_LOW is set to 8%:

sentry state name (shared) entry condition for this state VERY_LOW constant
small_filesys Very_Low $pct_free < $VERY_LOW 8
large_filesys Very_Low $pct_free < $VERY_LOW 3

Table 7 — Different thresholds for sentries that share the same states
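The idea behind Table 7 is that one shared condition is evaluated against per-sentry values. A minimal Python sketch of that arrangement follows. The data structure and function name are illustrative only and do not reflect how Sentinel3G stores its configuration.

```python
# Per-sentry constants, taken from Table 7.
SENTRY_CONSTANTS = {
    "small_filesys": {"VERY_LOW": 8},
    "large_filesys": {"VERY_LOW": 3},
}


def in_very_low(pct_free, constants):
    """Shared entry condition for the Very_Low state: $pct_free < $VERY_LOW."""
    return pct_free < constants["VERY_LOW"]


# The same reading of 5% free space gives different results per sentry.
print(in_very_low(5, SENTRY_CONSTANTS["small_filesys"]))  # True
print(in_very_low(5, SENTRY_CONSTANTS["large_filesys"]))  # False
```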

Constants may also be used as a visual aid on realtime graphs.

Note: The values of constants may also be set in Instance Groups, and these will override the values defined in the sentry but only for sentries in that particular instance group. In fact, this example is probably better implemented using a single sentry with two Instance Groups.

Sentinel Processes and Configuration

Event Manager

The Event Manager is a central process that collects state information from all host monitors and updates the icons and data on the consoles as required.

Host Monitor

The host monitor is the main processing ‘engine’. One host monitor process runs on each Sentinel3G host. The program is responsible for:

Knowledge Base

A sentry monitors a particular component or subsystem of your operating system, hardware and applications. A folder is a related set of sentries. Sentries and folders are themselves grouped into knowledge bases.

Several knowledge bases are available for Sentinel3G, including knowledge bases for operating systems, databases, web servers and applications such as COSmanager.

You can add your own custom knowledge bases to hold details of sentries you define yourself.

Host Monitor API

The Host Monitor API can be used to instrument existing applications to send data for monitoring direct to the Host Monitor, rather than having to write a script or program which polls for this data. This is an extremely flexible interface—you just need to tell Sentinel3G what variables to expect, and their types.

Logging

Sentinel3G maintains the following types of log file:

All records are time-stamped.

The status logs EventMgr and HostMon can be viewed from the Logs menu on the console.

Note that there is some overlap in the state change logging done by the Host Monitor and the Event Manager. This ensures that state changes are still logged even if the Event Manager is down, and keeps network traffic to a minimum.

Global default settings for logging

To conserve disk space, you can control the amount of data that is logged. Agent variable logging can be varied according to the state of a particular sentry. While a sentry is operating normally we are not interested in the exact values being returned by the agent. Therefore, at lower severity levels there is little need to collect data beyond recording state changes. At higher severity levels such as alarm you may wish to log variable values more often, perhaps once every poll.

In the global Sentinel settings you can specify both a logging frequency (DefLogTime) and the minimum severity level at which it operates (DefLogSeverity).

There is also the option to specify different settings for particular sentries. DefLogTime specifies how often to log data for use in logged data reports. At this interval, the latest data values will be written to disk. DefLogSeverity is the minimum severity at which to start logging data every poll.

For example, if you have specified a log time of 30 minutes and a minimum severity of alarm, under normal conditions Sentinel3G logs a single data point every 30 minutes. When the sentry goes into a state whose severity is alarm or higher, every data point is logged until the severity goes back below the minimum.
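The logging rule in this example can be expressed as a single decision per poll. The Python sketch below models it under stated assumptions: the severity ordering is inferred from the text, times are in seconds, and the function and parameter names are hypothetical stand-ins for the DefLogTime and DefLogSeverity settings.

```python
# Assumed ordering, lowest to highest; the product may define further levels.
SEVERITIES = ["normal", "warning", "alarm", "severe", "critical"]


def should_log(now, last_logged, severity, log_time=1800, min_severity="alarm"):
    """Decide whether to log the data point from the poll at time `now`."""
    if SEVERITIES.index(severity) >= SEVERITIES.index(min_severity):
        return True                        # at or above the minimum severity: every poll
    return now - last_logged >= log_time   # otherwise: once per logging interval


print(should_log(600, 0, "normal"))   # False: only 10 minutes since the last record
print(should_log(1800, 0, "normal"))  # True: 30 minutes have passed
print(should_log(600, 0, "alarm"))    # True: every poll is logged at alarm
```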

Reports on logged data

Reports are provided to extract and summarize data from the data logs, and to graph the value of numerical data. The Service Level Report searches the EMdata log and produces a summary of the amount of time the selected sentries have spent in each state or severity. The Logged Data Report searches the HMdata log for recorded values of particular variables. You choose the variables to display, and a line graph for those variables will be drawn over the chosen period.

Managing log files

The management of logs is integrated with the COSmanager™ audit trail facility, which provides for viewing and cycling (pruning) of log files.

If COSmanager is not installed, an automatically scheduled task such as a cron job should be set up to regularly compress and archive a copy of each log file. Once copies have been archived the original logs can be reset to save disk space.
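One possible shape for such a scheduled task is sketched below in Python. It is a minimal example, not a supplied Sentinel3G utility. The function name, the archive naming scheme and any paths passed to it are hypothetical, and it does not guard against records written between the copy and the reset.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_log(log_path, archive_dir):
    """Compress a copy of log_path into archive_dir, then reset the original."""
    log_path, archive_dir = Path(log_path), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d.%H%M%S")
    target = archive_dir / f"{log_path.name}.{stamp}.gz"
    with open(log_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    with open(log_path, "w"):
        pass  # truncate only after the compressed copy has been written
    return target
```

A cron entry could then call a small script that runs archive_log once for each log file, at whatever interval suits the volume of logging.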

Access Control via Roles and Capabilities

Each Sentinel3G user has one or more roles. Each role identifies a responsibility or class of users in your organization, such as Manager or Operator. Roles are defined in terms of the access capabilities they grant. In turn, capabilities determine what menu options and actions a user can perform.