Operating System KBs

This page was last modified 06:00, 21 July 2006.


(Difference between revisions)
Revision as of 05:02, 12 July 2006
Moff (Talk | contribs)
(Sentry Details)
Current revision
Mike (Talk | contribs)
(Services Sentry)
Line 9: Line 9:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="125" | Sentry+!width="125" bgcolor="#cccccc" | Sentry
-!width="65" | AIX+!width="65" bgcolor="#cccccc" | AIX
-!width="65" | HPUX+!width="65" bgcolor="#cccccc" | HPUX
-!width="65" | Linux+!width="65" bgcolor="#cccccc" | Linux
-!width="65" | SCO+!width="65" bgcolor="#cccccc" | Solaris
-!width="65" | Solaris+!width="65" bgcolor="#cccccc" | Tru64
-!width="65" | Tru64+!width="65" bgcolor="#cccccc" | Unixware
-!width="65" | Windows³+!width="65" bgcolor="#cccccc" | Windows
|- |-
|Connectivity¹ |Connectivity¹
Line 57: Line 57:
&sup1; Connectivity and Event_Manager sentries are only started on the Event Host.<br> &sup1; Connectivity and Event_Manager sentries are only started on the Event Host.<br>
&sup2; Scheduler sentry is not started by default. Please read the online documentation for details on how to use the Scheduler sentry.<br> &sup2; Scheduler sentry is not started by default. Please read the online documentation for details on how to use the Scheduler sentry.<br>
-&sup3; Agent only. Full Event Manager and Host Monitor available May 2003.
<br> <br>
 +
=== OS Knowledge Base Versions === === OS Knowledge Base Versions ===
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!OS+!bgcolor="#cccccc" |OS
-!Version+!bgcolor="#cccccc" |Version
-!Availability Date+!bgcolor="#cccccc" |Availability Date
-!Min Sentinel Version+!bgcolor="#cccccc" |Min Sentinel<br>Version
|- |-
-|AIX risc+|AIX risc ||align="center" |2.1 ||17th Mar, 2004 ||align="center" |4.4
-|2.1+
-|17th Mar, 2004+
-|4.4+
|- |-
-|HPUX parisc+|HPUX parisc ||align="center" |2.1 ||11th May, 2004 ||align="center" |4.4
-|2.1+
-|11th May, 2004+
-|4.4+
|- |-
-|HPUX intel+|HPUX intel ||align="center" |2.2 || 6th Jul, 2006 ||align="center" |4.4.3
-|2.2+
-|6th Jul, 2006+
-|4.4+
|- |-
-|Linux intel+|Linux intel ||align="center" |2.1 ||20th Apr, 2004 ||align="center" |4.4.3
-|2.1+
-|12th Mar, 2004+
-|4.4+
|- |-
-|SCO Open Server+|Solaris intel ||align="center" |2.1 ||14th Jan, 2003 ||align="center" |4.4
-|1.1+
-|25th Oct, 2002+
-|4.2+
|- |-
-|Solaris intel+|Solaris sparc ||align="center" |2.2 ||10th Apr, 2006 ||align="center" |4.4.3
-|2.1+
-|14th Jan, 2003+
-|4.4+
|- |-
-|Solaris sparc+|Tru64 alpha ||align="center" |2.1 || 2nd Jun, 2004 ||align="center" |4.4
-|2.2+
-|14th Feb, 2006+
-|4.4+
|- |-
-|Tru64+|Unixware intel ||align="center" |1.0 ||10th Mar, 2004 ||align="center" |4.2
-|1.2+
-|13th Aug, 2002+
-|4.4+
|- |-
-|Windows NT/2000/XP+|Windows intel ||align="center" |2.2 ||28th Apr, 2006 ||align="center" |4.4.3
-|1.0+
-|22nd Jan, 2003+
-|4.4+
|} |}
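
The "Min Sentinel Version" column in the current revision can be read as a lower bound on the installed Sentinel release. The following is a minimal sketch of that check, assuming versions compare numerically component by component; the names `MIN_SENTINEL`, `parse_version` and `kb_supported` are illustrative and not part of the product.

```python
# Minimum Sentinel versions copied from the current-revision table above.
MIN_SENTINEL = {
    "AIX risc": "4.4",
    "HPUX parisc": "4.4",
    "HPUX intel": "4.4.3",
    "Linux intel": "4.4.3",
    "Solaris intel": "4.4",
    "Solaris sparc": "4.4.3",
    "Tru64 alpha": "4.4",
    "Unixware intel": "4.2",
    "Windows intel": "4.4.3",
}


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '4.4.3' into (4, 4, 3)."""
    return tuple(int(part) for part in text.split("."))


def kb_supported(os_name: str, sentinel_version: str) -> bool:
    """Return True if the installed Sentinel meets the table's minimum.

    Assumption: versions are compared numerically, so 4.4 is older than 4.4.3.
    """
    return parse_version(sentinel_version) >= parse_version(MIN_SENTINEL[os_name])
```

Comparing tuples rather than strings avoids ordering mistakes such as treating "4.10" as older than "4.4".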
<br> <br>
 +
== OS Knowledge Bases == == OS Knowledge Bases ==
Line 137: Line 111:
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|- 
-|Processors 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-| 
-|align="center" | &radic; 
-| 
-| 
|- |-
|Context_Switches |Context_Switches
Line 152: Line 117:
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
Line 160: Line 125:
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|+|&nbsp;
-|+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
Line 173: Line 138:
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
 +|-
 +|Processors
 +|align="center" | &radic;
 +|align="center" | &radic;
 +|align="center" | &radic;
 +|&nbsp;
 +|align="center" | &radic;
 +|&nbsp;
 +|&nbsp;
|- |-
|System_Calls |System_Calls
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
-|+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
Line 202: Line 176:
|- |-
|HPUX |HPUX
-|+|&nbsp;
-|+|&nbsp;
|} |}
Line 223: Line 197:
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|+|align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
Line 246: Line 220:
|Error_Log |Error_Log
|align="center" | &radic; |align="center" | &radic;
-|+|&nbsp;
-|+|&nbsp;
-|+|&nbsp;
-|+|&nbsp;
-|+|&nbsp;
-| +|&nbsp;
|} |}
<br> <br>
-==== Filesystem Class ====+==== Event Log Class ====
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 268: Line 242:
!width="65" | Windows !width="65" | Windows
|- |-
-|Filesystem+|EventLog
-|align="center" | &radic;&sup1;+|&nbsp;
-|align="center" | &radic;+|&nbsp;
-|align="center" | &radic;+|&nbsp;
-|align="center" | &radic;+|&nbsp;
-|align="center" | &radic;+|&nbsp;
-|align="center" | &radic;+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|} |}
- 
-&sup1; AIX provides two sentries for free space monitoring, one sentry specifically for /usr (with less sensitive thresholds) and another for the other filesystems. 
<br> <br>
-==== Memory Class ====+==== Files Class ====
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 294: Line 266:
!width="65" | Windows !width="65" | Windows
|- |-
-|Paging_Rate+|File_Info
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
|align="center" | &radic; |align="center" | &radic;
-|align="center" | &radic;+|}
-|align="center" | &radic;+ 
-|align="center" | &radic;+<br>
-|align="center" | &radic;+ 
-|align="center" | &radic;+==== Filesystem Class ====
-|+ 
 +{| border="1" cellpadding="6" cellspacing="0"
 +!width="125" | Sentry
 +!width="65" | AIX
 +!width="65" | HPUX
 +!width="65" | Linux
 +!width="65" | SCO
 +!width="65" | Solaris
 +!width="65" | Tru64
 +!width="65" | Windows
|- |-
-|Physical_Memory+|Filesystem
-|+|align="center" | &radic;&sup1;
-|+
-|align="center" | &radic;+
-|+
-|align="center" | &radic;+
-|+
-|align="center" | &radic;+
-|-+
-|Swap_Rate+
-|+
-|align="center" | &radic;+
-|align="center" | &radic;+
-|+
-|align="center" | &radic;+
-|+
-|+
-|-+
-|Swap_Space+
-|align="center" | &radic;+
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
Line 331: Line 300:
|} |}
-;NOTE:Certain operating systems do not provide all the memory statistics, as some may not be relevant (e.g. Swap_Rate on Tru64 and AIX).+&sup1; AIX provides two sentries for free space monitoring, one sentry specifically for /usr (with less sensitive thresholds) and another for the other filesystems.
<br> <br>
-==== Network Class ====+==== Memory Class ====
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 347: Line 316:
!width="65" | Windows !width="65" | Windows
|- |-
-|Collisions+|Paging_File_Space
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
|align="center" | &radic; |align="center" | &radic;
-|+|-
 +|Paging_Rate
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|  
-|- 
-|Drops 
|align="center" | &radic; |align="center" | &radic;
-| 
-|align="center" | &radic; 
-| 
-| 
-| 
|align="center" | &radic; |align="center" | &radic;
 +|&nbsp;
|- |-
-|Errors+|Physical_Memory
-|align="center" | &radic;+|&nbsp;
-|+
-|+
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
 +|&nbsp;
|align="center" | &radic; |align="center" | &radic;
 +|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|- |-
-|Packets_Received+|Swap_Rate
 +|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|align="center" | &radic;&sup1;+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
-|align="center" | &radic;&sup1;+|&nbsp;
 +|&nbsp;
 +|-
 +|Swap_Space
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|- 
-|Packets_Sent 
-|align="center" | &radic;&sup1; 
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|align="center" | &radic;&sup1; 
|align="center" | &radic; |align="center" | &radic;
 +|&nbsp;
 +|-
 +|Virtual_Memory
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
 +|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|} |}
-&sup1; Packets sent and received are known only as sent and received on Solaris and Linux.+;NOTE:Certain operating systems do not provide all the memory statistics, as some may not be relevant (e.g. Swap_Rate on Tru64 and AIX).
<br> <br>
Line 410: Line 388:
|- |-
|Printers |Printers
-|+|&nbsp;
-|+|&nbsp;
-|align="center" | &radic;+|&nbsp;
-|+|&nbsp;
-|align="center" | &radic;&sup1;+|&nbsp;
-|+|&nbsp;
|align="center" | &radic; |align="center" | &radic;
|} |}
- 
-&sup1; The Solaris Printer class is "off" by default. This is due to an intermittent issue with the printer agent. The symptoms are excessive cpu usage by the Eventmanager. This only occurs on a very small number of systems. 
<br> <br>
-==== Processes Class ====+==== Network Class ====
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 435: Line 411:
!width="65" | Windows !width="65" | Windows
|- |-
-|CPU_Usage+|Network
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
Line 443: Line 419:
|align="center" | &radic; |align="center" | &radic;
|align="center" | &radic; |align="center" | &radic;
-|- 
-|MEM_Usage 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-| 
-|- 
-|Processes&sup1; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-|align="center" | &radic; 
-| 
|} |}
- 
-;NOTE:All OS Knowledge Bases support the Process Management Console, provided as an action against Processes sentry class. 
- 
-&sup1; On certain OSes the Processes sentry is turned off by default. Certain instances are provided as examples (nmdb, smdb) only, but should be changed to reflect the system on which the KB is installed. Note also that system services (daemons) are normally monitored via the Services sentry, so check in the Services folder before adding processes to be monitored. 
<br> <br>
-==== Security Class ====+==== Processes Class ====
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 480: Line 434:
!width="65" | Windows !width="65" | Windows
|- |-
-|Bad_SU+|Process
-|+|align="center" | &sup1;
-|align="center" | &radic;&sup1;+|align="center" | &sup1;
-|+|align="center" | &sup1;
-|+|align="center" | &sup1;
-|+|align="center" | &sup1;
-|+|align="center" | &sup1;
-|+|align="center" | &radic;
|} |}
- +&sup1; Use the Process Knowledge Base instead.
-&sup1; On Linux, the Bad_SU sentry is not started by default, as it needs specific configuration to work correctly. Please read the sentry notes for more information on how to configure this sentry.+
- +
<br> <br>
Line 573: Line 525:
<br> <br>
 +
== Sentry Details == == Sentry Details ==
Line 578: Line 531:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="150" | Sentry+!width="150" bgcolor="#cccccc" | Sentry
-!width="65" | Class+!width="65" bgcolor="#cccccc" | Class
-!width="65" | Agent+!width="65" bgcolor="#cccccc" | Agent
-!width="65" | Poll Time+!width="65" bgcolor="#cccccc" | Poll Time
-!width="75" | States+!width="75" bgcolor="#cccccc" | States
-!width="65" | Logging+!width="65" bgcolor="#cccccc" | Logging
|- |-
-|CPU_States+|CPU_States ||CPU ||Performance ||60s ||AIX only ||align="center" | &radic;
-|CPU+
-|Performance+
-|60s+
-|AIX only+
-|align="center" | &radic;+
|- |-
-|Context_Switches+|Context_Switches ||CPU ||Performance ||60s ||&nbsp; ||align="center" | &radic;
-|CPU+
-|Performance+
-|60s+
-|+
-|+
|- |-
-|Interrupts+|Interrupts ||CPU ||Performance ||60s ||&nbsp; ||align="center" | &radic;
-|CPU+
-|Performance+
-|60s+
-|+
-|+
|- |-
-|Run_Queue+|Run_Queue ||CPU ||Performance ||60s ||align="center" | &radic; ||align="center" | &radic;
-|CPU+
-|Performance+
-|60s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|System_Calls+|Processors ||CPU/Processors ||MultiProcessor ||60s ||&nbsp; ||&nbsp;
-|CPU+
-|Performance+
-|60s+
-|align="center" | &radic;+
-|+
|- |-
-|Processors+|System_Calls ||CPU ||Performance ||60s ||&nbsp; ||align="center" | &radic;
-|CPU/Processors+
-|MultiProcessor+
-|60s+
-|+
-|+
|- |-
-|Disk+|Disk ||Disk ||Disk ||120s ||align="center" | &radic; ||align="center" | &radic;
-|Disk+
-|Disk+
-|120s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|Error_Log+|Error_Log ||Error_Log ||ErrorLog ||120s ||align="center" | &radic; ||&nbsp;
-|Error_Log+
-|ErrorLog+
-|120s+
-|align="center" | &radic;+
-|+
|- |-
-|Filesystem+|EventLog ||EventLog ||EventLog ||90s ||align="center" | &radic; ||&nbsp;
-|Filesystem+
-|Filesystem+
-|300s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|Paging_Rate+|File_Info ||Files ||FileInfo ||60s ||align="center" | &radic; ||&nbsp;
-|Memory+
-|Performance+
-|60s+
-|AIX only+
-|+
|- |-
-|Physical_Memory+|Filesystem ||Filesystem ||Filesystem ||300s ||align="center" | &radic; ||align="center" | &radic;
-|Memory+
-|Performance+
-|60s+
-|Solaris only+
-|+
|- |-
-|Swap_Rate+|Paging_File_Space ||Memory ||PageSpace ||180s ||align="center" | &radic; ||align="center" | &radic;
-|Memory+
-|Performance+
-|60s+
-|+
-|+
|- |-
-|Swap_Space (Linux)+|Paging_Rate ||Memory ||Performance ||60s ||AIX only ||align="center" | &radic;
-|Memory+
-|Performance+
-|60s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|Swap_Space (non-Linux)+|Physical_Memory ||Memory ||Performance ||60s ||Solaris only ||align="center" | &radic;
-|Memory+
-|Swap+
-|180s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|Network+|Swap_Rate ||Memory ||Performance ||60s ||&nbsp; ||align="center" | &radic;
-|Network+
-|Network+
-|120s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|Printers+|Swap_Space (Linux)||Memory ||Performance ||60s ||align="center" | &radic; ||align="center" | &radic;
-|Printers+
-|Printer+
-|180s+
-|Solaris only+
-|+
|- |-
-|CPU_Usage+|Swap_Space (Unix) ||Memory ||Swap ||180s ||align="center" | &radic; ||align="center" | &radic;
-|Processes+
-|ProcessInfo+
-|75s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|MEM_Usage+|Virtual_Memory ||Memory ||MemoryInfo ||60s ||&nbsp; ||align="center" | &radic;
-|Processes+
-|ProcessInfo+
-|75s+
-|align="center" | &radic;+
-|align="center" | &radic;+
|- |-
-|Processes+|Network ||Network ||Network ||120s ||align="center" | &radic; ||align="center" | &radic;
-|Processes+
-|ProcessInfo+
-|75s+
-|align="center" | &radic;+
-|+
|- |-
-|Bad_SU+|Printers ||Printers ||Printers ||180s ||align="center" | &radic; ||&nbsp;
-|Security+|-
-|BadSU+|Process ||Processes ||ProcessInfo ||75s ||align="center" | &radic; ||align="center" | &radic;
-|n/a&sup2;+
-|align="center" | &radic;+
-|+
|- |-
-|Services+|Services ||Services ||Service ||120s ||align="center" | &radic; ||&nbsp;
-|Services+|-
-|Service+|CPU_Information ||System ||Information ||n/a&sup3; ||&nbsp; ||&nbsp;
-|120s+
-|align="center" | &radic;+
-|+
|- |-
-|CPU_Information+|Memory_Information||System ||Information ||n/a&sup3; ||&nbsp; ||&nbsp;
-|System+|-
-|Information+|Operating_System ||System ||Information ||n/a&sup3; ||&nbsp; ||&nbsp;
-|n/a&sup3;+|-
-|+|System_Uptime ||System ||Uptime ||100s ||&nbsp; ||&nbsp;
-|+
|} |}
&sup1; Packets sent and received are known only as sent and received on Solaris and Linux.<br> &sup1; Packets sent and received are known only as sent and received on Solaris and Linux.<br>
&sup2; The BadSU agent is a LogFile agent, and so does not have a poll time. Any new data is interpreted whenever the logfile being monitored changes.<br> &sup2; The BadSU agent is a LogFile agent, and so does not have a poll time. Any new data is interpreted whenever the logfile being monitored changes.<br>
-&sup3; The Information agent is essentially run only once.+&sup3; The Information agent (Hardware agent on Linux) is essentially run only once.
<br> <br>
 +
=== Sentry State Details === === Sentry State Details ===
Line 751: Line 603:
;Availability: AIX&sup1;, HPUX, Linux, SCO, Solaris, Tru64, Windows ;Availability: AIX&sup1;, HPUX, Linux, SCO, Solaris, Tru64, Windows
-'''Constants'''+'''Constants (AIX only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 787: Line 639:
|NOT_BUSY |NOT_BUSY
|normal |normal
-|+|&nbsp;
-|+|&nbsp;
|} |}
-&sup1; The CPU States sentry only has states defined for AIX.+&sup1; The CPU States sentry only has constants and states defined for AIX.
<br> <br>
Line 802: Line 654:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="150" | Constant+!width="150" bgcolor="#cccccc" | Constant
-!width="500" | Description+!width="500" bgcolor="#cccccc" | Description
-!width="65" | Value+!width="65" bgcolor="#cccccc" | Value
|- |-
-|RUNQ_IDLE+|RUNQ_WARN ||Run queue is getting long ||3
-|Empty run queue+
-|0+
|- |-
-|RUNQ_WARN+|RUNQ_PROB ||Run queue is too long ||6
-|Warning run queue length+
-|2+
-|-+
-|RUNQ_PROB+
-|Long run queue length+
-|5+
|} |}
Line 822: Line 666:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="125" | State+!width="125" bgcolor="#cccccc" | State
-!width="65" | Severity+!width="65" bgcolor="#cccccc" | Severity
-!width="390" | Condition+!width="390" bgcolor="#cccccc" | Condition
-!width="120" | Escalation+!width="120" bgcolor="#cccccc" | Escalation
|- |-
-|OVERLOAD+|Very_Busy ||warning ||$run_queue > $RUNQ_PROB ||alarm after 210s
-|alarm+
-|$run_queue > $RUNQ_PROB+
-|+
|- |-
-|BUSY+|Busy ||normal ||$run_queue > $RUNQ_WARN ||warning after 210s
-|warning+
-|$run_queue > $RUNQ_WARN+
-|+
|- |-
-|NORMAL+|OK ||normal ||&nbsp; ||&nbsp;
-|normal+
-|+
-|+
|} |}
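
The current-revision Run Queue states can be sketched as a simple threshold check. This is an illustration only: it assumes the conditions are tested from the top row down and that the first match wins, which the table does not state explicitly. The function name `run_queue_state` is hypothetical.

```python
# Constants from the current revision's Run Queue constants table.
RUNQ_WARN = 3  # run queue is getting long
RUNQ_PROB = 6  # run queue is too long


def run_queue_state(run_queue: float) -> str:
    """Return the state name for a run queue length.

    Assumption: rows are evaluated top to bottom, first match wins.
    """
    if run_queue > RUNQ_PROB:
        return "Very_Busy"
    if run_queue > RUNQ_WARN:
        return "Busy"
    return "OK"
```

Both comparisons are strict, so a run queue of exactly 6 is still Busy and a run queue of exactly 3 is still OK.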
Line 846: Line 681:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="125" | State+!width="125" bgcolor="#cccccc" | State
-!width="65" | Severity+!width="65" bgcolor="#cccccc" | Severity
-!width="390" | Condition+!width="390" bgcolor="#cccccc" | Condition
-!width="120" | Escalation+!width="120" bgcolor="#cccccc" | Escalation
|- |-
|OVERLOAD |OVERLOAD
Line 863: Line 698:
|NORMAL |NORMAL
|normal |normal
-|+|&nbsp;
-|+|&nbsp;
|} |}
Line 871: Line 706:
==== System Calls Sentry ==== ==== System Calls Sentry ====
-;Availability: AIX, HPUX, SCO, Tru64, Windows+;Availability: AIX, SCO, Tru64
-'''Constants'''+'''Constants (AIX, SCO, Tru64)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="150" | Constant+!width="150" bgcolor="#cccccc" | Constant
-!width="500" | Description+!width="500" bgcolor="#cccccc" | Description
-!width="65" | Value+!width="65" bgcolor="#cccccc" | Value
|- |-
-|CPU_SYSCALLS+|CPU_SYSCALLS ||Too many system calls per second ||10000
-|Too many system calls per second+
-|10000+
|} |}
-'''States'''+'''States (AIX, SCO, Tru64)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="125" | State+!width="125" bgcolor="#cccccc" | State
-!width="65" | Severity+!width="65" bgcolor="#cccccc" | Severity
-!width="390" | Condition+!width="390" bgcolor="#cccccc" | Condition
-!width="120" | Escalation+!width="120" bgcolor="#cccccc" | Escalation
|- |-
-|BUSY+|BUSY ||normal ||$sys_per_sec > $CPU_SYSCALLS ||alarm after 120s
-|normal+
-|$sys_per_sec > $CPU_SYSCALLS+
-|alarm after 120s+
|- |-
-|NORMAL+|NORMAL ||normal ||&nbsp; ||&nbsp;
-|normal+
-|+
-|+
|} |}
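
The Escalation column appears to raise a state's severity once the state has persisted for the given time. A minimal sketch for the System Calls sentry follows, assuming "alarm after 120s" means the severity becomes alarm once the BUSY state has lasted at least 120 seconds; the function and parameter names are hypothetical.

```python
CPU_SYSCALLS = 10000        # too many system calls per second (from the table)
ESCALATE_AFTER_SECONDS = 120  # from "alarm after 120s"


def system_calls_status(sys_per_sec: float, seconds_in_state: float) -> tuple:
    """Return (state, severity) for the System Calls sentry.

    Assumption: escalation triggers when the time spent in the BUSY state
    reaches the escalation interval.
    """
    if sys_per_sec > CPU_SYSCALLS:
        if seconds_in_state >= ESCALATE_AFTER_SECONDS:
            return ("BUSY", "alarm")
        return ("BUSY", "normal")
    return ("NORMAL", "normal")
```

A short burst above the threshold therefore stays at normal severity; only a sustained high rate is escalated.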
 +<br>
==== Disk Sentry ==== ==== Disk Sentry ====
-;Availability: AIX&sup1;, HPUX, SCO, Solaris, Tru64, Windows+;Availability: AIX&sup1;, HPUX, Linux, SCO, Solaris, Tru64, Windows
-'''Constants (Solaris, Tru64)'''+'''Constants (HPUX, Linux, Solaris, Tru64)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="150" | Constant+!width="150" bgcolor="#cccccc" | Constant
-!width="500" | Description+!width="500" bgcolor="#cccccc" | Description
-!width="65" | Value+!width="65" bgcolor="#cccccc" | Value
|- |-
|DSK_BUSY_WARN |DSK_BUSY_WARN
Line 931: Line 759:
|Indicates a very long service time (ms) |Indicates a very long service time (ms)
|50 |50
-|} 
- 
-'''States (Solaris, Tru64)''' 
- 
-{| border="1" cellpadding="6" cellspacing="0" 
-!width="125" | State 
-!width="65" | Severity 
-!width="390" | Condition 
-!width="120" | Escalation 
-|- 
-|DSK_VERYBUSY 
-|alarm 
-|$percent_busy >= $DSK_BUSY_PROB && $service_time >= $DSK_SVCT_PROB 
-| 
-|- 
-|DSK_BUSY 
-|warning 
-|$percent_busy >= $DSK_BUSY_WARN && $service_time >= $DSK_SVCT_WARN 
-| 
-|- 
-|DSK_NORMAL 
-|normal 
-| 
-| 
|} |}
Line 960: Line 764:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="150" | Constant+!width="150" bgcolor="#cccccc" | Constant
-!width="500" | Description+!width="500" bgcolor="#cccccc" | Description
-!width="65" | Value+!width="65" bgcolor="#cccccc" | Value
|- |-
|DSK_BUSY_WARN |DSK_BUSY_WARN
Line 971: Line 775:
|% busy indicating disk is very busy |% busy indicating disk is very busy
|60 |60
 +|}
 +
 +'''States (HPUX, Linux, Solaris, Tru64)'''
 +
 +{| border="1" cellpadding="6" cellspacing="0"
 +!width="125" bgcolor="#cccccc" | State
 +!width="65" bgcolor="#cccccc" | Severity
 +!width="390" bgcolor="#cccccc" | Condition
 +!width="120" bgcolor="#cccccc" | Escalation
 +|-
 +|Very_Busy ||warning ||$percent_busy >= $DSK_BUSY_PROB && $service_time >= $DSK_SVCT_PROB ||alarm after 390s
 +|-
 +|Busy ||normal ||$percent_busy >= $DSK_BUSY_WARN && $service_time >= $DSK_SVCT_WARN ||warning after 390s
 +|-
 +|OK ||normal ||&nbsp; ||&nbsp;
 +|-
 +|Delete ||built-in ||No data state ||&nbsp;
|} |}
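
The disk states for HPUX, Linux, Solaris and Tru64 combine two measurements: a disk is only flagged when both its busy percentage and its service time cross their thresholds. The sketch below illustrates that logic. The threshold values are passed in rather than hard-coded because not all constants are visible in this diff; `DiskThresholds` and `disk_state` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskThresholds:
    """Stand-ins for DSK_BUSY_WARN, DSK_BUSY_PROB, DSK_SVCT_WARN, DSK_SVCT_PROB."""
    busy_warn: float
    busy_prob: float
    svct_warn: float
    svct_prob: float


def disk_state(percent_busy: float, service_time: float, t: DiskThresholds) -> str:
    """Return the disk state; both conditions in a row must hold.

    Assumption: rows are evaluated top to bottom, first match wins.
    """
    if percent_busy >= t.busy_prob and service_time >= t.svct_prob:
        return "Very_Busy"
    if percent_busy >= t.busy_warn and service_time >= t.svct_warn:
        return "Busy"
    return "OK"
```

With this rule a disk that is 100% busy but still has a short service time is reported as OK, which matches the AIX footnote's remark that a busy disk is not a real problem unless its service time is also high.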
Line 976: Line 797:
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
-!width="125" | State+!width="125" bgcolor="#cccccc" | State
-!width="65" | Severity+!width="65" bgcolor="#cccccc" | Severity
-!width="390" | Condition+!width="390" bgcolor="#cccccc" | Condition
-!width="120" | Escalation+!width="120" bgcolor="#cccccc" | Escalation
|- |-
|DSK_VERYBUSY |DSK_VERYBUSY
Line 993: Line 814:
|DSK_NORMAL |DSK_NORMAL
|normal |normal
-|+|&nbsp;
-|+|&nbsp;
|} |}
&sup1; Unfortunately the service time statistic is not available on AIX. The service time is a better indicator of disk IO performance. Even if a disk is 100% busy, there is no real problem unless the service time for the disk is also getting high. &sup1; Unfortunately the service time statistic is not available on AIX. The service time is a better indicator of disk IO performance. Even if a disk is 100% busy, there is no real problem unless the service time for the disk is also getting high.
 +
 +'''States (Windows only)'''
 +
 +{| border="1" cellpadding="6" cellspacing="0"
 +!width="125" bgcolor="#cccccc" | State
 +!width="65" bgcolor="#cccccc" | Severity
 +!width="390" bgcolor="#cccccc" | Condition
 +!width="120" bgcolor="#cccccc" | Escalation
 +|-
 +|Very_Busy ||warning ||$percent_busy >= $DSK_BUSY_PROB ||alarm after 390s
 +|-
 +|Busy ||normal ||$percent_busy >= $DSK_BUSY_WARN ||warning after 390s
 +|-
 +|OK ||normal ||&nbsp; ||&nbsp;
 +|}
<br> <br>
 +
==== Error Log Sentry ==== ==== Error Log Sentry ====
Line 1,046: Line 883:
<br> <br>
 +==== EventLog Sentry ====
 +
 +;Availability: Windows only
 +
 +'''States (Windows only)'''
 +
 +{| border="1" cellpadding="6" cellspacing="0"
 +!width="125" bgcolor="#cccccc" | State
 +!width="65" bgcolor="#cccccc" | Severity
 +!width="390" bgcolor="#cccccc" | Condition
 +!width="120" bgcolor="#cccccc" | Escalation
 +|-
 +|Error ||severe ||$type == "error" &#124;&#124; $type == "audit failure" ||delete after acknowledgement
 +|-
 +|Warning ||warning ||$type == "temporary" ||delete after acknowledgement
 +|-
 +|Information ||info ||$type == "information" &#124;&#124; $type == "audit success" ||delete after acknowledgement
 +|-
 +|Unknown ||alarm ||&nbsp; ||delete after acknowledgement
 +|}
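
The EventLog state table maps the event type string to a state. A minimal sketch of that mapping, following the table exactly as written (including the "temporary" type for the Warning state), is shown below; the function name is hypothetical and the real agent may normalise type strings differently.

```python
def event_log_state(event_type: str) -> str:
    """Map an event type to an EventLog state, mirroring the table rows in order."""
    if event_type in ("error", "audit failure"):
        return "Error"
    if event_type == "temporary":
        return "Warning"
    if event_type in ("information", "audit success"):
        return "Information"
    # Any type not matched above falls through to the catch-all row.
    return "Unknown"
```

Every state in the table carries "delete after acknowledgement", so the mapping only decides the severity an operator sees before acknowledging the event.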
 +
 +<br>
 +
 +==== FileInfo Sentry ====
 +
 +;Availability: Windows only
 +
 +'''States (Windows only)'''
 +
 +{| border="1" cellpadding="6" cellspacing="0"
 +!width="125" bgcolor="#cccccc" | State
 +!width="65" bgcolor="#cccccc" | Severity
 +!width="390" bgcolor="#cccccc" | Condition
 +!width="120" bgcolor="#cccccc" | Escalation
 +|-
 +|Nonexistent ||alarm ||$exists == 0 ||&nbsp;
 +|-
 +|No_Access ||warning ||$owner == "CAN'T ACCESS FILE" ||&nbsp;
 +|-
 +|Dir_Exists ||normal ||$type == “directory” ||&nbsp;
 +|-
 +|File_Exists ||normal ||&nbsp; ||&nbsp;
 +|}
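
The FileInfo states can be read as an ordered series of checks on three values. The sketch below assumes the rows are tested from top to bottom and that `exists`, `owner` and `file_type` correspond to the `$exists`, `$owner` and `$type` variables in the table; the function name is hypothetical.

```python
def file_info_state(exists: int, owner: str, file_type: str) -> str:
    """Return the FileInfo state for a monitored path.

    Assumption: rows are evaluated top to bottom, first match wins.
    """
    if exists == 0:
        return "Nonexistent"
    if owner == "CAN'T ACCESS FILE":
        return "No_Access"
    if file_type == "directory":
        return "Dir_Exists"
    return "File_Exists"
```

Because the existence check comes first, a missing path is reported as Nonexistent regardless of what the other two values contain.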
 +
 +<br>
 +
==== Filesystem Sentry ==== ==== Filesystem Sentry ====
Line 1,057: Line 940:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|FS_LOW+|LOW ||Indicating low free space ||10
-|Indicating low free space+
-|10+
|- |-
-|FS_VERY_LOW+|VERY_LOW ||Indicating very low free space ||5
-|Indicating very low free space+
-|5+
|- |-
-|FS_NEARLY_FULL+|NEARLY_FULL ||Indicating the filesystem is nearly full ||2
-|Indicating the filesystem is nearly full+
-|2+
|- |-
-|FS_FULL+|FULL ||Indicating the filesystem is full ||0
-|Indicating the filesystem is full+
-|0+
|} |}
-'''States (Linux, Solaris, Tru64)'''+'''States (HPUX, Linux, Solaris, Tru64, Windows)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,082: Line 957:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|FULL+|Full ||critical ||$pct_free == $FS_FULL ||&nbsp;
-|critical+
-|$pct_free == $FS_FULL+
-|+
|- |-
-|NEARLY_FULL+|Nearly_Full ||alarm ||$pct_free < $FS_NEARLY_FULL ||severe after 930s
-|severe+
-|$pct_free < $FS_NEARLY_FULL+
-|+
|- |-
-|VERY_LOW+|Very_Low ||warning ||$pct_free < $FS_VERY_LOW ||alarm after 930s
-|alarm+
-|$pct_free < $FS_VERY_LOW+
-|+
|- |-
-|LOW+|Low ||normal ||$pct_free < $FS_LOW ||warning after 930s
-|warning+
-|$pct_free < $FS_LOW+
-|+
|- |-
-|SUFFICIENT+|OK ||normal ||&nbsp; ||&nbsp;
-|normal+|-
-|+|Delete ||built-in ||No data state ||&nbsp;
-|+
|} |}
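
The filesystem states form a ladder of free-space thresholds. The sketch below uses the values from the constants table (10, 5, 2 and 0 percent free) and the `FS_`-prefixed names that the state conditions reference. It assumes rows are evaluated from the most severe downwards; the function name is hypothetical.

```python
# Percent-free thresholds from the Filesystem constants table.
FS_LOW = 10
FS_VERY_LOW = 5
FS_NEARLY_FULL = 2
FS_FULL = 0


def filesystem_state(pct_free: float) -> str:
    """Return the filesystem state for a percentage of free space.

    Assumption: rows are evaluated top to bottom, first match wins.
    """
    if pct_free == FS_FULL:
        return "Full"
    if pct_free < FS_NEARLY_FULL:
        return "Nearly_Full"
    if pct_free < FS_VERY_LOW:
        return "Very_Low"
    if pct_free < FS_LOW:
        return "Low"
    return "OK"
```

The comparisons are strict, so a filesystem with exactly 10% free is OK and one with exactly 2% free is Very_Low rather than Nearly_Full.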
Line 1,119: Line 981:
|critical |critical
|$pct_free == $FS_FULL |$pct_free == $FS_FULL
-|+|&nbsp;
|- |-
|NO_INODES |NO_INODES
|critical |critical
|$pct_free_inodes == $FS_FULL |$pct_free_inodes == $FS_FULL
-|+|&nbsp;
|- |-
|NEARLY_FULL |NEARLY_FULL
|severe |severe
|$pct_free < $FS_NEARLY_FULL |$pct_free < $FS_NEARLY_FULL
-|+|&nbsp;
|- |-
|FEW_INODES |FEW_INODES
|severe |severe
|$pct_free_inodes < $FS_NEARLY_FULL |$pct_free_inodes < $FS_NEARLY_FULL
-|+|&nbsp;
|- |-
|VERY_LOW |VERY_LOW
|alarm |alarm
|$pct_free < $FS_VERY_LOW |$pct_free < $FS_VERY_LOW
-|+|&nbsp;
|- |-
|VLOW_INODES |VLOW_INODES
|alarm |alarm
|$pct_free_inodes < $FS_VERY_LOW |$pct_free_inodes < $FS_VERY_LOW
-|+|&nbsp;
|- |-
|LOW |LOW
|warning |warning
|$pct_free < $FS_LOW |$pct_free < $FS_LOW
-|+|&nbsp;
|- |-
|LOW_INODES |LOW_INODES
|warning |warning
|$pct_free_inodes < $FS_LOW |$pct_free_inodes < $FS_LOW
-|+|&nbsp;
|- |-
|SUFFICIENT |SUFFICIENT
|normal |normal
-|+|&nbsp;
-|+|&nbsp;
|} |}
<br> <br>
-==== Paging Rate Sentry ==== 
-;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64+==== Paging File Space Sentry ====
 + 
 +;Availability: Windows only
'''Constants''' '''Constants'''
Line 1,174: Line 1,037:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|OVER_PAGING+|SWAP_LOW ||Low percent free swap space ||15
-|Too many page ins or outs per second+|-
-|10+|SWAP_VERY_LOW ||Very low percent free swap space ||8
|} |}
Line 1,187: Line 1,050:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|BUSY+|Very_Low ||warning ||$pct_avail_page <= $SWAP_VERY_LOW ||alarm after 570s
-|normal+|-
-|$pgins_per_sec >= $OVER_PAGING | | $pgouts_per_sec >= $OVER_PAGING+|Low ||normal ||$pct_avail_page <= $SWAP_LOW ||warning after 570s
-|alarm after 62s+
|- |-
-|ACCEPTABLE+|OK ||normal ||&nbsp; ||&nbsp;
-|normal+
-|+
-|+
|} |}
<br> <br>
-==== Physical Memory Sentry ==== 
-;Availability: Linux, Solaris, Windows+==== Paging Rate Sentry ====
-'''Constants'''+;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows
 + 
 +'''Constants (AIX only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,210: Line 1,070:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|RESTIME_LONG+|OVER_PAGING
-|Very long residency time+|Too many page ins or outs per second
-|600+|10
-|-+
-|RESTIME_OK+
-|Acceptable residency time (ms)+
-|40+
-|-+
-|RESTIME_PROB+
-|Indicating residency time is too short+
-|20+
|} |}
-'''States'''+'''States (AIX only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,231: Line 1,083:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|RAM_IDLE+|BUSY
|normal |normal
-|$residency_time >= $RESTIME_LONG+|$pgins_per_sec >= $OVER_PAGING &#124;&#124; $pgouts_per_sec >= $OVER_PAGING
-|+|alarm after 62s
|- |-
-|RAM_OK+|ACCEPTABLE
|normal |normal
-|$residency_time > $RESTIME_OK+|&nbsp;
-|+|&nbsp;
-|-+
-|RAM_WARN+
-|warning+
-|$residency_time > $RESTIME_PROB+
-|+
-|-+
-|RAM_PROB+
-|alarm+
-|+
-|+
|} |}
<br> <br>
-==== Swap Space Sentry ==== 
-;Availability: AIX, HPUX, Linux, Solaris, Tru64, Windows+==== Physical Memory Sentry ====
-'''Constants'''+;Availability: HPUX, Linux, Solaris, Windows
 + 
 +'''Constants (Solaris only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,264: Line 1,107:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|SWAP_LOW+|RESTIME_LONG ||Very long residency time ||600
-|Low percent free swap space+
-|15+
|- |-
-|SWAP_VERY_LOW+|RESTIME_OK ||Acceptable residency time (ms) ||40
-|Very low percent free swap space+|-
-|10+|RESTIME_PROB ||Indicating residency time is too short ||20
|} |}
-'''States'''+'''States (Solaris only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,281: Line 1,122:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|VERY_LOW+|Very_Low ||warning ||$residency_time <= $RESTIME_PROB ||alarm after 210s
-|alarm+
-|$swap_pct_free <= $SWAP_VERY_LOW+
-|+
|- |-
-|LOW+|Low ||normal ||$residency_time <= $RESTIME_OK ||warning after 210s
-|warning+
-|$swap_pct_free <= $SWAP_LOW+
-|+
|- |-
-|OK+|OK ||normal ||&nbsp; ||&nbsp;
-|normal+
-|+
-|+
|} |}
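
State tables such as the Solaris Physical Memory one can also be represented as data and evaluated generically. The following sketch stores each row as a rule and returns the first rule whose condition holds. The `Rule` type and `first_match` function are illustrative assumptions, not the Sentinel rule engine.

```python
from typing import Callable, NamedTuple, Optional

# Residency-time constants from the Physical Memory table (Solaris only).
RESTIME_OK = 40    # acceptable residency time
RESTIME_PROB = 20  # residency time is too short


class Rule(NamedTuple):
    state: str
    severity: str
    condition: Callable[[float], bool]
    escalation: Optional[str]


# Rows in the same order as the current-revision state table.
PHYSICAL_MEMORY_RULES = [
    Rule("Very_Low", "warning", lambda t: t <= RESTIME_PROB, "alarm after 210s"),
    Rule("Low", "normal", lambda t: t <= RESTIME_OK, "warning after 210s"),
    Rule("OK", "normal", lambda t: True, None),
]


def first_match(residency_time: float) -> Rule:
    """Return the first rule whose condition holds for the residency time."""
    for rule in PHYSICAL_MEMORY_RULES:
        if rule.condition(residency_time):
            return rule
    raise AssertionError("unreachable: the final rule always matches")
```

Keeping the rows as data makes the evaluation order explicit, and the same loop could serve any sentry whose states depend on a single measured value.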
<br> <br>
-==== Collisions Sentry ==== 
-;Availability: AIX, Linux, SCO, Solaris, Tru64+==== Swap Space Sentry ====
 + 
 +;Availability: AIX, HPUX, Linux, Solaris, Tru64
'''Constants''' '''Constants'''
Line 1,309: Line 1,142:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|NET_COLL_WARN+|SWAP_LOW ||Low percent free swap space ||15
-|Indicating many collisions+
-|15+
|- |-
-|NET_COLL_PROB+|SWAP_VERY_LOW ||Very low percent free swap space ||10
-|Indicating excessive collisions+
-|30+
-|-+
-|NET_WORKING+
-|Less than this many transfers and the network is under-utilised+
-|50+
|} |}
Line 1,330: Line 1,155:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|VERY_BUSY+|Very_Low ||warning ||$swap_pct_free <= $SWAP_VERY_LOW ||alarm after 570s
-|alarm+
-|$collision_pct >= $NET_COLL_PROB && $packets_out > $NET_WORKING+
-|+
|- |-
-|BUSY+|Low ||normal ||$swap_pct_free <= $SWAP_LOW ||warning after 570s
-|warning+
-|$collision_pct >= $NET_COLL_WARN && $packets_out > $NET_WORKING+
-|+
|- |-
-|OK+|OK ||normal ||&nbsp; ||&nbsp;
-|normal+
-|+
-|+
|} |}
<br> <br>
-==== Drops Sentry ==== 
-;Availability: AIX, Linux, Windows+==== Network Sentry ====
-'''Constants'''+;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows
 + 
 +'''Constants (AIX, HPUX, Linux, SCO, Solaris, Tru64)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,358: Line 1,175:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|DROP_PROBLEM+|NET_WORKING ||Less than this many transfers and the network is under-utilised ||50
-|Indicating incoming packet drop rate problem+|-
-|1+|NET_COLL_PROB ||Indicating excessive collisions ||30
 +|-
 +|NET_COLL_WARN ||Indicating many collisions ||15
 +|-
 +|NET_ERROR_OK ||Indicating hardware is OK ||0
 +|-
 +|NET_ERROR_PROB ||Indicating possible hardware error ||0.05
|} |}
-'''States'''+'''States (AIX, HPUX, Linux, SCO, Solaris, Tru64)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,371: Line 1,194:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|NO_DROPS+|Many_Errors ||warning ||$errors_total >= $NET_ERROR_PROB ||alarm after 390s
-|normal+
-|$drop_in_rate == 0+
-|+
|- |-
-|PROB_DROPS+|Very_Busy ||warning ||$collisions >= $NET_COLL_PROB && $pckts_transmit > $NET_WORKING ||alarm after 390s
-|warning+
-|$drop_in_rate < $DROP_PROBLEM+
-|+
|- |-
-|EXCESS_DROPS+|Some_Errors ||normal ||$errors_total > $NET_ERROR_OK ||warning after 390s
-|alarm+|-
-|$drop_in_rate >= $DROP_PROBLEM+|Busy ||normal ||$collisions >= $NET_COLL_WARN && $pckts_transmit > $NET_WORKING ||warning after 390s
-|+|-
 +|OK ||normal ||&nbsp; ||&nbsp;
|} |}
-<br>+'''Constants (Windows only)'''
-==== Errors Sentry ====+
- +
-;Availability: AIX, Linux, SCO, Solaris, Tru64, Windows+
- +
-'''Constants'''+
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,399: Line 1,212:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|NET_ERROR_OK+|NET_DROP_OK ||Indicating excessive collisions ||0
-|Acceptable number of errors+|-
-|0+|NET_DROP_PROB ||Indicating many collisions ||1
 +|-
 +|NET_ERROR_OK ||Indicating hardware is OK ||0
|- |-
-|NET_ERROR_PROB+|NET_ERROR_PROB ||Indicating possible hardware error ||0.05
-|Unacceptable number of errors+
-|0.05+
|} |}
-'''States'''+'''States (Windows only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,416: Line 1,229:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|EXCES_ERRORS+|Many_Errors ||warning ||$pkts_errs_sec >= $NET_ERROR_PROB ||alarm after 390s
-|alarm+
-|$error_rate >= $NET_ERROR_PROB+
-|+
|- |-
-|PROB_ERRORS+|Many_Drops ||warning ||$pkts_drps_sec >= $NET_DROP_PROB ||alarm after 390s
-|warning+|-
-|$error_rate > $NET_ERROR_OK+|Some_Errors ||normal ||$pkts_errs_sec > $NET_ERROR_OK ||warning after 390s
-|+
|- |-
-|EXCESS_DROPS+|Some_Drops ||normal ||$pkts_drps_sec > $NET_DROP_OK ||warning after 390s
-|alarm+|-
-|+|OK ||normal ||&nbsp; ||&nbsp;
-|+
|} |}
<br> <br>
 +
==== Printers Sentry ==== ==== Printers Sentry ====
-;Availability: Linux, Solaris, Windows+;Availability: Windows
-'''States (Solaris only)'''+'''States (Windows only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,445: Line 1,254:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|NO_PAPER+|Idle ||normal ||$status == “Idle” ||&nbsp;
-|alarm+
-|$status == “faulted” && $reason == “paper”+
-|+
|- |-
-|FAULTED+|Printing ||normal ||$status == “Printing” ||&nbsp;
-|alarm+
-|$status == “faulted”+
-|+
|- |-
-|UNKNOWN+|No_Paper ||alarm ||$status == “Paperout” ||&nbsp;
-|warning+
-|$status == “unknown”+
-|+
|- |-
-|IDLE+|Offline ||info ||$status == “Offline” ||&nbsp;
-|normal+
-|$status == “idle”+
-|+
|- |-
-|PRINTING+|Paused ||info ||$status == “Paused” ||&nbsp;
-|normal+
-|$status == “printing”+
-|+
|- |-
-|WAITING+|Problem ||alarm ||$status == “Error” ||&nbsp;
-|normal+
-|$status == “waiting”+
-|+
|- |-
-|DISABLED+|No_Access ||alarm ||$status == “NoAccess” ||&nbsp;
-|disabled+|-
-|$status == “disabled”+|Unknown ||alarm ||&nbsp; ||&nbsp;
-|+
|} |}
<br> <br>
-==== CPU_Usage Sentry ==== 
-;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64+==== Process Sentry ====
-'''Constants'''+;Availability: Windows
 + 
 +'''Constants (Windows only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,493: Line 1,284:
!width="65" bgcolor="#cccccc" | Value !width="65" bgcolor="#cccccc" | Value
|- |-
-|CPU_HIGH+|CPU_HIGH ||Percentage CPU usage considered high for a process ||10
-|Percentage CPU usage considered high for a process+
-|10+
|- |-
-|CPU_PROBLEM+|CPU_PROBLEM ||Unacceptable percentage CPU usage for a process ||50
-|Unacceptable percentage CPU usage for a process+
-|50+
|} |}
-'''States'''+'''States (Windows only)'''
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,510: Line 1,297:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|VHIGH_CPU+|VeryHigh_CPU ||info ||$pct_proc_time >= $CPU_PROBLEM ||warning after 120s, alarm after 300s
-|info+
-|$cpu_percent >= $CPU_PROBLEM+
-|warning after 120s, alarm after 180s+
|- |-
-|HIGH_CPU+|High_CPU ||info ||$pct_proc_time >= $CPU_HIGH ||&nbsp;
-|normal+
-|$cpu_percent >= $CPU_HIGH+
-|+
|- |-
-|OK_CPU+|OK_CPU ||normal ||&nbsp; ||&nbsp;
-|disabled+|-
-|+|Not_Running ||built-in ||No data state ||&nbsp;
-|delete immediately+
|} |}
<br> <br>
-==== MEM_Usage Sentry ==== 
-;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64+==== Services Sentry ====
-'''Constants'''+;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows
-{| border="1" cellpadding="6" cellspacing="0"+'''States (AIX only)'''
-!width="150" bgcolor="#cccccc" | Constant+
-!width="500" bgcolor="#cccccc" | Description+
-!width="65" bgcolor="#cccccc" | Value+
-|-+
-|MEMORY_HIGH+
-|Percentage memory usage considered high for a process+
-|10+
-|-+
-|MEMORY_PROBLEM+
-|Unacceptable percentage memory usage for a process+
-|50+
-|}+
- +
-'''States'''+
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,555: Line 1,320:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|VHIGH_MEMORY+|INACTIVE
-|info+|disabled
-|$mem_percent >= $MEMORY_PROBLEM+|$status == “inoperative”
-|warning after 120s, alarm after 180s+|&nbsp;
|- |-
-|HIGH_MEMORY+|ACTIVE
|normal |normal
-|$mem_percent >= $MEMORY_HIGH+|&nbsp;
-|+|&nbsp;
-|-+
-|OK_MEMORY+
-|disabled+
-|+
-|delete immediately+
|} |}
-<br>+'''States (Linux only)'''
-==== Processes Sentry ====+
- +
-;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64+
- +
-'''States (AIX only)'''+
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,584: Line 1,339:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|DOWN+|Confused ||info ||$Status == “Off” && $PID != “-1” ||&nbsp;
-|alarm+
-|$count == 0+
-|+
|- |-
-|UP+|Off ||normal ||$Status == “Off” ||&nbsp;
-|normal+|-
-|+|Not_Running ||warning ||$PID == -1 ||&nbsp;
-|+|-
 +|Running ||normal ||$PID != -1 ||&nbsp;
|} |}
-<br>+'''States (HPUX, Solaris, Tru64)'''
-==== BadSU Sentry ====+
- +
-;Availability: Linux+
- +
-'''States'''+
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,608: Line 1,356:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|Violation+|Not_Running ||warning ||$count == 0 ||&nbsp;
-|alarm+
-|$Count > 1+
-|acknowledgment+
|- |-
-|Report+|Running ||normal ||&nbsp; ||&nbsp;
-|info+
-|$Count == 1+
-|acknowledgment+
|} |}
-<br>+'''States (Windows only)'''
-==== Services Sentry ====+
- +
-;Availability: AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows+
- +
-'''States (Linux only)'''+
{| border="1" cellpadding="6" cellspacing="0" {| border="1" cellpadding="6" cellspacing="0"
Line 1,632: Line 1,369:
!width="120" bgcolor="#cccccc" | Escalation !width="120" bgcolor="#cccccc" | Escalation
|- |-
-|Confused+|Down ||alarm ||$state == “Stopped” && $start == “Automatic” ||&nbsp;
-|info+
-|$Status == “Off” && $PID != “-1”+
-|+
|- |-
-|Unconfigured+|Confused ||info ||$state == “Running” && $start == “Disabled” ||&nbsp;
-|info+
-|$Status == “Unconfigured”+
-|+
|- |-
-|Off+|Running ||normal ||$state == “Running” ||&nbsp;
-|normal+
-|$Status == “Off”+
-|+
|- |-
-|Stopped+|Disabled ||disabled ||$state == “Stopped” && $start == “Disabled” ||&nbsp;
-|warning+
-|$PID == -1+
-|+
|- |-
-|Running+|Paused ||info ||$state == “Running” ||&nbsp;
-|normal+
-|$PID != -1+
-|+
-|}+
- +
-'''States (Solaris, Tru64)'''+
- +
-{| border="1" cellpadding="6" cellspacing="0"+
-!width="125" bgcolor="#cccccc" | State+
-!width="65" bgcolor="#cccccc" | Severity+
-!width="390" bgcolor="#cccccc" | Condition+
-!width="120" bgcolor="#cccccc" | Escalation+
|- |-
-|DOWN+|Intermediate||info ||$state == “Starting” &#124;&#124; $state == “Stopping” &#124;&#124; $state == “Continue pending” &#124;&#124; $state == “Pause pending” ||&nbsp;
-|warning+
-|$count == 0+
-|+
|- |-
-|UP+|Unknown ||alarm ||&nbsp; ||&nbsp;
-|normal+
-|+
-|+
-|-+
-|UNDEFINED&sup1;+
-|normal+
-|+
-|+
-|}+
- +
-&sup1; This state is not used for monitoring, it is only there as a placeholder for actions.+
- +
-'''States (AIX only)'''+
- +
-{| border="1" cellpadding="6" cellspacing="0"+
-!width="125" bgcolor="#cccccc" | State+
-!width="65" bgcolor="#cccccc" | Severity+
-!width="390" bgcolor="#cccccc" | Condition+
-!width="120" bgcolor="#cccccc" | Escalation+
-|-+
-|INACTIVE+
-|disabled+
-|$status == “inoperative”+
-|+
-|-+
-|ACTIVE+
-|normal+
-|+
-|+
|} |}
<br> <br>

Current revision


Overview

The primary aim of the operating system knowledge bases in Sentinel3G is to provide a base level of operations monitoring that is consistent across various UNIX/Linux platforms. Due to differences between the various operating systems we monitor, complete consistency is not always achievable. This document describes the general content of the OS knowledge bases, and the discrepancies between them on different platforms.


Standard Knowledge Base

The standard knowledge base is OS independent, and so is packaged with Sentinel3G on all Operating Systems. It can be upgraded, but not uninstalled.

Sentry AIX HPUX Linux Solaris Tru64 Unixware Windows
Connectivity¹
Event_Manager¹
Host_Monitor
Scheduler²

¹ Connectivity and Event_Manager sentries are only started on the Event Host.
² Scheduler sentry is not started by default. Please read the online documentation for details on how to use the Scheduler sentry.


OS Knowledge Base Versions

OS Version Availability Date Min Sentinel Version
AIX risc 2.1 17th Mar, 2004 4.4
HPUX parisc 2.1 11th May, 2004 4.4
HPUX intel 2.2 6th Jul, 2006 4.4.3
Linux intel 2.1 20th Apr, 2004 4.4.3
Solaris intel 2.1 14th Jan, 2003 4.4
Solaris sparc 2.2 10th Apr, 2006 4.4.3
Tru64 alpha 2.1 2nd Jun, 2004 4.4
Unixware intel 1.0 10th Mar, 2004 4.2
Windows intel 2.2 28th Apr, 2006 4.4.3


OS Knowledge Bases

CPU Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
CPU_States¹
Context_Switches  
Interrupts    
Run_Queue
Processors      
System_Calls    
NOTE
Certain operating systems do not provide all the CPU statistics by default, and collecting them may require kernel patches or third-party collection tools. Solaris requires packages SUNWaccr and SUNWaccu. Tru64 requires …

¹ All operating systems monitor % System, % User and % Idle CPU time. Some operating systems provide more information:

OS More CPU_States Information Description
AIX, Solaris, Tru64 % Wait IO The amount of time spent waiting for blocked I/O to complete.
Linux % Nice CPU The percentage of time that the system is in the user state running processes at low (nice) scheduling priority.
HPUX    


Disk Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Disk


Error Log Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Error_Log            


Event Log Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
EventLog            


Files Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
File_Info            


Filesystem Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Filesystem √¹

¹ AIX provides two sentries for free space monitoring, one sentry specifically for /usr (with less sensitive thresholds) and another for the other filesystems.


Memory Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Paging_File_Space            
Paging_Rate  
Physical_Memory      
Swap_Rate        
Swap_Space  
Virtual_Memory            
NOTE
Certain operating systems do not provide all the memory statistics, because some statistics are not relevant on them (e.g. Swap_Rate on Tru64 and AIX).


Printers Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Printers            


Network Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Network


Processes Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Process ¹ ¹ ¹ ¹ ¹ ¹

¹ Use the Process Knowledge Base instead.

Services Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
Services √¹ √²

¹ AIX provides a complete service management interface using lssrc, startsrc and stopsrc. This interface has been implemented via actions on the Services sentry on AIX.

² Linux provides a complete service management interface using chkconfig and the startup/shutdown scripts in /etc/init.d (/etc/rc.d/init.d on older systems). This interface has been implemented via actions on the Services sentry on Linux.


System Class

Sentry AIX HPUX Linux SCO Solaris Tru64 Windows
CPU_Information √¹
Memory_Information
Operating_System
System_Uptime

¹ The Linux OS on the i386 platform provides additional CPU information including the approximate speed and vendor of the processors.


Sentry Details

Overview

Sentry Class Agent Poll Time States Logging
CPU_States CPU Performance 60s AIX only
Context_Switches CPU Performance 60s  
Interrupts CPU Performance 60s  
Run_Queue CPU Performance 60s
Processors CPU/Processors MultiProcessor 60s    
System_Calls CPU Performance 60s  
Disk Disk Disk 120s
Error_Log Error_Log ErrorLog 120s  
EventLog EventLog EventLog 90s  
File_Info Files FileInfo 60s  
Filesystem Filesystem Filesystem 300s
Paging_File_Space Memory PageSpace 180s
Paging_Rate Memory Performance 60s AIX only
Physical_Memory Memory Performance 60s Solaris only
Swap_Rate Memory Performance 60s  
Swap_Space (Linux) Memory Performance 60s
Swap_Space (Unix) Memory Swap 180s
Virtual_Memory Memory MemoryInfo 60s  
Network Network Network 120s
Printers Printers Printers 180s  
Process Processes ProcessInfo 75s
Services Services Service 120s  
CPU_Information System Information n/a³    
Memory_Information System Information n/a³
Operating_System System Information n/a³    
System_Uptime System Uptime 100s    

¹ Packets sent and received are known only as sent and received on Solaris and Linux.
² The BadSU agent is a LogFile agent, and so does not have a poll time. Any new data is interpreted whenever the logfile being monitored changes.
³ The Information agent (Hardware agent on Linux) is essentially run only once.


Sentry State Details

CPU States Sentry

Availability
AIX¹, HPUX, Linux, SCO, Solaris, Tru64, Windows

Constants (AIX only)

Constant Description Value
CPU_BUSY User + System percentage indicating the CPU is busy 90
CPU_OVERLOADED User + System percentage indicating the CPU is overloaded 95

States (AIX only)

State Severity Condition Escalation
OVERLOAD_CPU warning $cpu_user + $cpu_system > $CPU_OVERLOADED severe after 120s
BUSY_CPU normal $cpu_user + $cpu_system > $CPU_BUSY warning after 120s
NOT_BUSY normal    

¹ The CPU States sentry only has constants and states defined for AIX.
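The AIX thresholds above can be read as a simple ordered check. The following is a minimal sketch, not the Sentinel3G implementation: it assumes states are tested in table order and that the first matching condition wins, and the function name `cpu_state` is hypothetical.

```python
# Thresholds from the "Constants (AIX only)" table.
CPU_BUSY = 90
CPU_OVERLOADED = 95


def cpu_state(cpu_user, cpu_system):
    """Return (state, initial severity) for one CPU_States sample.

    Assumption: states are tested top to bottom and the first match wins.
    """
    load = cpu_user + cpu_system
    if load > CPU_OVERLOADED:
        return ("OVERLOAD_CPU", "warning")
    if load > CPU_BUSY:
        return ("BUSY_CPU", "normal")
    # No condition is listed for NOT_BUSY, so it acts as the fallback.
    return ("NOT_BUSY", "normal")
```

Because both comparisons are strict, a combined load of exactly 90 still falls through to NOT_BUSY.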


Run Queue Sentry

Availability
AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows

Constants

Constant Description Value
RUNQ_WARN Run queue is getting long 3
RUNQ_PROB Run queue is too long 6

States (HPUX, Linux, SCO, Solaris, Tru64, Windows)

State Severity Condition Escalation
Very_Busy warning $run_queue > $RUNQ_PROB alarm after 210s
Busy normal $run_queue > $RUNQ_WARN warning after 210s
OK normal    

States (AIX only)

State Severity Condition Escalation
OVERLOAD warning $run_queue > $RUNQ_PROB severe after 120s
BUSY normal $run_queue > $RUNQ_WARN warning after 120s
NORMAL normal    


System Calls Sentry

Availability
AIX, SCO, Tru64

Constants (AIX, SCO, Tru64)

Constant Description Value
CPU_SYSCALLS Too many system calls per second 10000

States (AIX, SCO, Tru64)

State Severity Condition Escalation
BUSY normal $sys_per_sec > $CPU_SYSCALLS alarm after 120s
NORMAL normal    


Disk Sentry

Availability
AIX¹, HPUX, Linux, SCO, Solaris, Tru64, Windows

Constants (HPUX, Linux, Solaris, Tru64)

Constant Description Value
DSK_BUSY_WARN % busy indicating disk is busy 5
DSK_BUSY_PROB % busy indicating disk is very busy 20
DSK_SVCT_WARN Indicates a long service time (ms) 30
DSK_SVCT_PROB Indicates a very long service time (ms) 50

Constants (AIX, Windows)

Constant Description Value
DSK_BUSY_WARN % busy indicating disk is busy 40
DSK_BUSY_PROB % busy indicating disk is very busy 60

States (HPUX, Linux, Solaris, Tru64)

State Severity Condition Escalation
Very_Busy warning $percent_busy >= $DSK_BUSY_PROB && $service_time >= $DSK_SVCT_PROB alarm after 390s
Busy normal $percent_busy >= $DSK_BUSY_WARN && $service_time >= $DSK_SVCT_WARN warning after 390s
OK normal    
Delete built-in No data state  

States (AIX only)

State Severity Condition Escalation
DSK_VERYBUSY warning $percent_busy > $DSK_BUSY_PROB severe after 120s
DSK_BUSY normal $percent_busy > $DSK_BUSY_WARN warning after 120s
DSK_NORMAL normal    

¹ Unfortunately the service time statistic is not available on AIX. The service time is a better indicator of disk IO performance. Even if a disk is 100% busy, there is no real problem unless the service time for the disk is also getting high.
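The effect of combining busy percentage with service time can be sketched as follows. This is an illustration only, using the HPUX, Linux, Solaris and Tru64 constants; the function name `disk_state` is hypothetical and top-down, first-match evaluation is assumed.

```python
# Constants for HPUX, Linux, Solaris and Tru64.
DSK_BUSY_WARN, DSK_BUSY_PROB = 5, 20    # % busy
DSK_SVCT_WARN, DSK_SVCT_PROB = 30, 50   # service time in ms


def disk_state(percent_busy, service_time):
    """Classify one disk sample (assumed first-match, table order)."""
    if percent_busy >= DSK_BUSY_PROB and service_time >= DSK_SVCT_PROB:
        return "Very_Busy"
    if percent_busy >= DSK_BUSY_WARN and service_time >= DSK_SVCT_WARN:
        return "Busy"
    return "OK"
```

With these conditions a disk that is 100% busy but answers in 10 ms is still OK, which is the behaviour the footnote describes and which AIX cannot reproduce without the service time statistic.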

States (Windows only)

State Severity Condition Escalation
Very_Busy warning $percent_busy >= $DSK_BUSY_PROB alarm after 390s
Busy normal $percent_busy >= $DSK_BUSY_WARN warning after 390s
OK normal    


Error Log Sentry

Availability
AIX only
NOTE
Certain error log entries are ignored by Sentinel3G. The list of excluded error codes is in a file called exclude_errors in the distrib.db folder under the Sentinel installation (/usr/lpp/cosmos/sentinel_4.2/distrib.db by default on AIX).

States (AIX only)

State Severity Condition Escalation
UNKNOWN severe $Type == “unknown” acknowledgement
PERMANENT alarm $Type == “permanent” acknowledgement
TEMPORARY warning $Type == “temporary” acknowledgement
INFORMATION info $Type == “informational” acknowledgement
PENDING info $Type == “pending” acknowledgement
PERFORMANCE info $Type == “performance” acknowledgement


EventLog Sentry

Availability
Windows only

States (Windows only)

State Severity Condition Escalation
Error severe $type == “error” || $type == “audit failure” delete after acknowledgement
Warning warning $type == “temporary” delete after acknowledgement
Information info $type == “information” || $type == “audit success” delete after acknowledgement
Unknown alarm   delete after acknowledgement


FileInfo Sentry

Availability
Windows only

States (Windows only)

State Severity Condition Escalation
Nonexistent alarm $exists == 0  
No_Access warning $owner == “CAN'T ACCESS FILE”  
Dir_Exists normal $type == “directory”  
File_Exists normal    


Filesystem Sentry

Availability
AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows

Constants

Constant Description Value
FS_LOW Indicating low free space 10
FS_VERY_LOW Indicating very low free space 5
FS_NEARLY_FULL Indicating the filesystem is nearly full 2
FS_FULL Indicating the filesystem is full 0

States (HPUX, Linux, Solaris, Tru64, Windows)

State Severity Condition Escalation
Full critical $pct_free == $FS_FULL  
Nearly_Full alarm $pct_free < $FS_NEARLY_FULL severe after 930s
Very_Low warning $pct_free < $FS_VERY_LOW alarm after 930s
Low normal $pct_free < $FS_LOW warning after 930s
OK normal    
Delete built-in No data state  
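A minimal sketch of how these thresholds partition the percentage of free space, using the FS_-prefixed names that appear in the conditions. It assumes top-down, first-match evaluation; the function name `filesystem_state` and the use of `None` to represent "no data" are assumptions made for illustration.

```python
FS_LOW, FS_VERY_LOW, FS_NEARLY_FULL, FS_FULL = 10, 5, 2, 0


def filesystem_state(pct_free):
    """Return (state, initial severity) for a filesystem sample."""
    if pct_free is None:
        # Assumption: None stands in for the built-in "no data" state.
        return ("Delete", "built-in")
    if pct_free == FS_FULL:
        return ("Full", "critical")
    if pct_free < FS_NEARLY_FULL:
        return ("Nearly_Full", "alarm")
    if pct_free < FS_VERY_LOW:
        return ("Very_Low", "warning")
    if pct_free < FS_LOW:
        return ("Low", "normal")
    return ("OK", "normal")
```

Since the comparisons are strict, a filesystem with exactly 2% free is classed as Very_Low rather than Nearly_Full.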

States (AIX only)

State Severity Condition Escalation
FULL critical $pct_free == $FS_FULL  
NO_INODES critical $pct_free_inodes == $FS_FULL  
NEARLY_FULL severe $pct_free < $FS_NEARLY_FULL  
FEW_INODES severe $pct_free_inodes < $FS_NEARLY_FULL  
VERY_LOW alarm $pct_free < $FS_VERY_LOW  
VLOW_INODES alarm $pct_free_inodes < $FS_VERY_LOW  
LOW warning $pct_free < $FS_LOW  
LOW_INODES warning $pct_free_inodes < $FS_LOW  
SUFFICIENT normal    


Paging File Space Sentry

Availability
Windows only

Constants

Constant Description Value
SWAP_LOW Low percent free swap space 15
SWAP_VERY_LOW Very low percent free swap space 8

States

State Severity Condition Escalation
Very_Low warning $pct_avail_page <= $SWAP_VERY_LOW alarm after 570s
Low normal $pct_avail_page <= $SWAP_LOW warning after 570s
OK normal    


Paging Rate Sentry

Availability
AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows

Constants (AIX only)

Constant Description Value
OVER_PAGING Too many page ins or outs per second 10

States (AIX only)

State Severity Condition Escalation
BUSY normal $pgins_per_sec >= $OVER_PAGING || $pgouts_per_sec >= $OVER_PAGING alarm after 62s
ACCEPTABLE normal    


Physical Memory Sentry

Availability
HPUX, Linux, Solaris, Windows

Constants (Solaris only)

Constant Description Value
RESTIME_LONG Very long residency time 600
RESTIME_OK Acceptable residency time (ms) 40
RESTIME_PROB Indicating residency time is too short 20

States (Solaris only)

State Severity Condition Escalation
Very_Low warning $residency_time <= $RESTIME_PROB alarm after 210s
Low normal $residency_time <= $RESTIME_OK warning after 210s
OK normal    


Swap Space Sentry

Availability
AIX, HPUX, Linux, Solaris, Tru64

Constants

Constant Description Value
SWAP_LOW Low percent free swap space 15
SWAP_VERY_LOW Very low percent free swap space 10

States

State Severity Condition Escalation
Very_Low warning $swap_pct_free <= $SWAP_VERY_LOW alarm after 570s
Low normal $swap_pct_free <= $SWAP_LOW warning after 570s
OK normal    


Network Sentry

Availability
AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows

Constants (AIX, HPUX, Linux, SCO, Solaris, Tru64)

Constant Description Value
NET_WORKING Less than this many transfers and the network is under-utilised 50
NET_COLL_PROB Indicating excessive collisions 30
NET_COLL_WARN Indicating many collisions 15
NET_ERROR_OK Indicating hardware is OK 0
NET_ERROR_PROB Indicating possible hardware error 0.05

States (AIX, HPUX, Linux, SCO, Solaris, Tru64)

State Severity Condition Escalation
Many_Errors warning $errors_total >= $NET_ERROR_PROB alarm after 390s
Very_Busy warning $collisions >= $NET_COLL_PROB && $pckts_transmit > $NET_WORKING alarm after 390s
Some_Errors normal $errors_total > $NET_ERROR_OK warning after 390s
Busy normal $collisions >= $NET_COLL_WARN && $pckts_transmit > $NET_WORKING warning after 390s
OK normal    
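The interaction between the error and collision conditions can be sketched as below. This is an illustration under the assumption that states are tested in the order listed and the first match wins; the function name `network_state` is hypothetical.

```python
NET_WORKING = 50        # transfers below this count as under-utilised
NET_COLL_PROB = 30
NET_COLL_WARN = 15
NET_ERROR_OK = 0
NET_ERROR_PROB = 0.05


def network_state(errors_total, collisions, pckts_transmit):
    """Classify one network sample (assumed first-match, table order)."""
    if errors_total >= NET_ERROR_PROB:
        return "Many_Errors"
    if collisions >= NET_COLL_PROB and pckts_transmit > NET_WORKING:
        return "Very_Busy"
    if errors_total > NET_ERROR_OK:
        return "Some_Errors"
    if collisions >= NET_COLL_WARN and pckts_transmit > NET_WORKING:
        return "Busy"
    return "OK"
```

The NET_WORKING guard means a high collision figure on an almost idle interface does not raise a state: 40 collisions with only 10 packets transmitted remains OK.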

Constants (Windows only)

Constant Description Value
NET_DROP_OK Indicating an acceptable packet drop rate 0
NET_DROP_PROB Indicating a packet drop rate problem 1
NET_ERROR_OK Indicating hardware is OK 0
NET_ERROR_PROB Indicating possible hardware error 0.05

States (Windows only)

State Severity Condition Escalation
Many_Errors warning $pkts_errs_sec >= $NET_ERROR_PROB alarm after 390s
Many_Drops warning $pkts_drps_sec >= $NET_DROP_PROB alarm after 390s
Some_Errors normal $pkts_errs_sec > $NET_ERROR_OK warning after 390s
Some_Drops normal $pkts_drps_sec > $NET_DROP_OK warning after 390s
OK normal    


Printers Sentry

Availability
Windows

States (Windows only)

State Severity Condition Escalation
Idle normal $status == “Idle”  
Printing normal $status == “Printing”  
No_Paper alarm $status == “Paperout”  
Offline info $status == “Offline”  
Paused info $status == “Paused”  
Problem alarm $status == “Error”  
No_Access alarm $status == “NoAccess”  
Unknown alarm    


Process Sentry

Availability
Windows

Constants (Windows only)

Constant Description Value
CPU_HIGH Percentage CPU usage considered high for a process 10
CPU_PROBLEM Unacceptable percentage CPU usage for a process 50

States (Windows only)

State Severity Condition Escalation
VeryHigh_CPU info $pct_proc_time >= $CPU_PROBLEM warning after 120s, alarm after 300s
High_CPU info $pct_proc_time >= $CPU_HIGH  
OK_CPU normal    
Not_Running built-in No data state  
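Escalation entries such as "warning after 120s, alarm after 300s" can be modelled as time thresholds applied to a state's initial severity. The sketch below is an assumption about the semantics, not the product's implementation: it treats "after Ns" as "the state has persisted for at least N seconds", and the function names are hypothetical.

```python
import re


def parse_escalation(text):
    """Turn 'warning after 120s, alarm after 300s' into [(120, 'warning'), (300, 'alarm')]."""
    steps = re.findall(r"(\w+) after (\d+)s", text)
    return sorted((int(seconds), severity) for severity, seconds in steps)


def effective_severity(base, escalation, seconds_in_state):
    """Severity of a state that has persisted for seconds_in_state seconds."""
    severity = base
    for threshold, escalated in parse_escalation(escalation):
        if seconds_in_state >= threshold:
            severity = escalated
    return severity
```

For the VeryHigh_CPU row, a process that stays above CPU_PROBLEM starts at info, becomes a warning at 120 seconds and an alarm at 300 seconds.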


Services Sentry

Availability
AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows

States (AIX only)

State Severity Condition Escalation
INACTIVE disabled $status == “inoperative”  
ACTIVE normal    

States (Linux only)

State Severity Condition Escalation
Confused info $Status == “Off” && $PID != “-1”  
Off normal $Status == “Off”  
Not_Running warning $PID == -1  
Running normal $PID != -1  
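The Linux states overlap, so the order in which they are tested matters: a service that is "Off" but still has a PID matches both Confused and Off. A minimal sketch, assuming top-down, first-match evaluation; the sample dictionary layout, the integer PID, the status value "On" in the usage note, and the function name `classify` are all assumptions made for illustration.

```python
# (state, severity, condition) in the order given by the table.
LINUX_SERVICE_STATES = [
    ("Confused", "info", lambda s: s["Status"] == "Off" and s["PID"] != -1),
    ("Off", "normal", lambda s: s["Status"] == "Off"),
    ("Not_Running", "warning", lambda s: s["PID"] == -1),
    ("Running", "normal", lambda s: s["PID"] != -1),
]


def classify(sample, states=LINUX_SERVICE_STATES):
    """Return (state, severity) of the first state whose condition holds."""
    for name, severity, condition in states:
        if condition(sample):
            return (name, severity)
    return None
```

For example, `classify({"Status": "Off", "PID": 1234})` yields Confused, while `classify({"Status": "On", "PID": -1})` yields Not_Running. If Off were tested before Confused, the Confused state could never be reached.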

States (HPUX, Solaris, Tru64)

State Severity Condition Escalation
Not_Running warning $count == 0  
Running normal

States (Windows only)

State Severity Condition Escalation
Down alarm $state == “Stopped” && $start == “Automatic”  
Confused info $state == “Running” && $start == “Disabled”  
Running normal $state == “Running”  
Disabled disabled $state == “Stopped” && $start == “Disabled”  
Paused info $state == “Paused”
Intermediate info $state == “Starting” || $state == “Stopping” || $state == “Continue pending” || $state == “Pause pending”
Unknown alarm