Operating System KBs
From Documentation
Revision as of 02:45, 12 July 2006 by Moff
Overview
The primary aim of the operating system knowledge bases in Sentinel3G is to provide a base level of operations monitoring that is consistent across various UNIX/Linux platforms. Due to differences between the various operating systems we monitor, complete consistency is not always achievable. This document describes the general content of the OS knowledge bases, and the discrepancies between them on different platforms.
Standard Knowledge Base
The standard knowledge base is OS independent, so it is packaged with Sentinel3G on all operating systems. It can be upgraded, but not uninstalled.
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows³ |
---|---|---|---|---|---|---|---|
Connectivity¹ | √ | √ | √ | √ | √ | √ | √ |
Event_Manager¹ | √ | √ | √ | √ | √ | √ | √ |
Host_Monitor | √ | √ | √ | √ | √ | √ | √ |
Scheduler² | √ | √ | √ | √ | √ | √ | √ |
¹ Connectivity and Event_Manager sentries are only started on the Event Host.
² Scheduler sentry is not started by default. Please read the online documentation for details on how to use the Scheduler sentry.
³ Agent only. Full Event Manager and Host Monitor available May 2003.
OS Knowledge Base Versions
OS | Version | Availability Date | Min Sentinel Version |
---|---|---|---|
AIX risc | 2.1 | 17th Mar, 2004 | 4.4 |
HPUX parisc | 2.1 | 11th May, 2004 | 4.4 |
HPUX intel | 2.2 | 6th Jul, 2006 | 4.4 |
Linux intel | 2.1 | 12th Mar, 2004 | 4.4 |
SCO Open Server | 1.1 | 25th Oct, 2002 | 4.2 |
Solaris intel | 2.1 | 14th Jan, 2003 | 4.4 |
Solaris sparc | 2.2 | 14th Feb, 2006 | 4.4 |
Tru64 | 1.2 | 13th Aug, 2002 | 4.4 |
Windows NT/2000/XP | 1.0 | 22nd Jan, 2003 | 4.4 |
OS Knowledge Bases
CPU Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
CPU_States¹ | √ | √ | √ | √ | √ | √ | √ |
Processors | √ | √ | √ | √ | |||
Context_Switches | √ | √ | √ | √ | √ | √ | |
Interrupts | √ | √ | √ | √ | √ | ||
Run_Queue | √ | √ | √ | √ | √ | √ | √ |
System_Calls | √ | √ | | √ | | √ | √ |
- NOTE
- Certain operating systems do not provide all the CPU statistics by default, and collecting them may require kernel patches or third party collection tools. Solaris requires packages SUNWaccr and SUNWaccu. Tru64 requires …
¹ All operating systems monitor % System, % User and % Idle CPU time. Some OSes provide more information:
OS | More CPU_States Information | Description |
---|---|---|
AIX, Solaris, Tru64 | % Wait IO | The amount of time spent waiting for blocked I/O to complete. |
Linux | % Nice CPU | The percentage of time that the system is in the user state running processes at low (nice) scheduling priority. |
HPUX |
Disk Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Disk | √ | √ | | √ | √ | √ | √ |
Error Log Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Error_Log | √ |
Filesystem Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Filesystem | √¹ | √ | √ | √ | √ | √ | √ |
¹ AIX provides two sentries for free space monitoring, one sentry specifically for /usr (with less sensitive thresholds) and another for the other filesystems.
Memory Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Paging_Rate | √ | √ | √ | √ | √ | √ | |
Physical_Memory | √ | √ | √ | ||||
Swap_Rate | √ | √ | √ | ||||
Swap_Space | √ | √ | √ | √ | √ | √ | √ |
- NOTE
- Certain operating systems do not provide all the memory statistics, because some are not relevant on that platform (e.g. Swap_Rate on Tru64 and AIX).
Network Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Collisions | √ | √ | √ | √ | √ | ||
Drops | √ | √ | √ | ||||
Errors | √ | √ | √ | √ | √ | ||
Packets_Received | √ | √ | √¹ | √ | √¹ | √ | √ |
Packets_Sent | √ | √ | √¹ | √ | √¹ | √ | √ |
¹ On Solaris and Linux, the Packets_Sent and Packets_Received sentries are known simply as Sent and Received.
Printers Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Printers | √ | √¹ | √ |
¹ The Solaris Printer class is "off" by default because of an intermittent issue with the printer agent, which causes excessive CPU usage by the Event Manager. This only occurs on a very small number of systems.
Processes Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
CPU_Usage | √ | √ | √ | √ | √ | √ | √ |
MEM_Usage | √ | √ | √ | √ | √ | √ | |
Processes¹ | √ | √ | √ | √ | √ | √ |
- NOTE
- All OS Knowledge Bases support the Process Management Console, provided as an action against Processes sentry class.
¹ On certain OSes the Processes sentry is turned off by default. Certain instances are provided as examples (nmdb, smdb) only, but should be changed to reflect the system on which the KB is installed. Note also that system services (daemons) are normally monitored via the Services sentry, so check in the Services folder before adding processes to be monitored.
Security Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Bad_SU | | | √¹ | | | | |
¹ On Linux, the Bad_SU sentry is not started by default, as it needs specific configuration to work correctly. Please read the sentry notes for more information on how to configure this sentry.
Services Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
Services | √¹ | √ | √² | √ | √ | √ | √ |
¹ AIX provides a complete service management interface using lssrc, startsrc and stopsrc. This interface has been implemented via actions on the Services sentry on AIX.
² Linux provides a complete service management interface using chkconfig and the startup/shutdown scripts in /etc/init.d (/etc/rc.d/init.d on older systems). This interface has been implemented via actions on the Services sentry on Linux.
System Class
Sentry | AIX | HPUX | Linux | SCO | Solaris | Tru64 | Windows |
---|---|---|---|---|---|---|---|
CPU_Information | √ | √ | √¹ | √ | √ | √ | √ |
Memory_Information | √ | √ | √ | √ | √ | √ | √ |
Operating_System | √ | √ | √ | √ | √ | √ | √ |
System_Uptime | √ | √ | √ | √ | √ | √ | √ |
¹ The Linux OS on the i386 platform provides additional CPU information including the approximate speed and vendor of the processors.
Sentry Details
Overview
Sentry | Class | Agent | Poll Time | States | Logging |
---|---|---|---|---|---|
CPU_States | CPU | Performance | 60s | AIX only | √ |
Context_Switches | CPU | Performance | 60s | ||
Interrupts | CPU | Performance | 60s | ||
Run_Queue | CPU | Performance | 60s | √ | √ |
System_Calls | CPU | Performance | 60s | √ | |
Processors | CPU/Processors | MultiProcessor | 60s | ||
Disk | Disk | Disk | 120s | √ | √ |
Error_Log | Error_Log | ErrorLog | 120s | √ | |
Filesystem | Filesystem | Filesystem | 300s | √ | √ |
Paging_Rate | Memory | Performance | 60s | AIX only | |
Physical_Memory | Memory | Performance | 60s | Solaris only | |
Swap_Rate | Memory | Performance | 60s | ||
Swap_Space (Linux) | Memory | Performance | 60s | √ | √ |
Swap_Space (non-Linux) | Memory | Swap | 180s | √ | √ |
Network | Network | Network | 120s | √ | √ |
Printers | Printers | Printer | 180s | Solaris only | |
CPU_Usage | Processes | ProcessInfo | 75s | √ | √ |
MEM_Usage | Processes | ProcessInfo | 75s | √ | √ |
Processes | Processes | ProcessInfo | 75s | √ | |
Bad_SU | Security | BadSU | n/a² | √ | |
Services | Services | Service | 120s | √ | |
CPU_Information | System | Information | n/a³ |
¹ Packets sent and received are known only as sent and received on Solaris and Linux.
² The BadSU agent is a LogFile agent, and so does not have a poll time. Any new data is interpreted whenever the logfile being monitored changes.
³ The Information agent is essentially run only once.
Sentry State Details
CPU States Sentry
- Availability
- AIX¹, HPUX, Linux, SCO, Solaris, Tru64, Windows
Constants
Constant | Description | Value |
---|---|---|
CPU_BUSY | User + System percentage indicating the CPU is busy | 90 |
CPU_OVERLOADED | User + System percentage indicating the CPU is overloaded | 95 |
States (AIX only)
State | Severity | Condition | Escalation |
---|---|---|---|
OVERLOAD_CPU | warning | $cpu_user + $cpu_system > $CPU_OVERLOADED | severe after 120s |
BUSY_CPU | normal | $cpu_user + $cpu_system > $CPU_BUSY | warning after 120s |
NOT_BUSY | normal |
¹ The CPU States sentry only has states defined for AIX.
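The AIX state table above can be read as a first-match rule list with a time-based escalation. The following Python sketch is illustrative only and is not Sentinel3G code: the class name `CpuStatesSentry`, the `poll` method, and the interpretation that an escalation fires once a state has held continuously for 120 seconds are all assumptions made for this example.

```python
from dataclasses import dataclass

# Constants from the table above
CPU_BUSY = 90
CPU_OVERLOADED = 95
ESCALATE_AFTER = 120  # seconds; assumed to mean "state held continuously this long"

# (state, threshold, base severity, escalated severity), checked in table order
RULES = [
    ("OVERLOAD_CPU", CPU_OVERLOADED, "warning", "severe"),
    ("BUSY_CPU", CPU_BUSY, "normal", "warning"),
]


@dataclass
class CpuStatesSentry:
    """Hypothetical evaluator for the AIX CPU States table."""
    state: str = "NOT_BUSY"
    since: float = 0.0  # time at which the current state was entered

    def poll(self, now, cpu_user, cpu_system):
        load = cpu_user + cpu_system
        new_state, base, escalated = "NOT_BUSY", "normal", None
        for name, threshold, severity, escalation in RULES:
            if load > threshold:  # first matching row wins
                new_state, base, escalated = name, severity, escalation
                break
        if new_state != self.state:
            # State changed: restart the escalation timer
            self.state, self.since = new_state, now
        if escalated is not None and now - self.since >= ESCALATE_AFTER:
            return self.state, escalated
        return self.state, base
```

With the 60s poll time listed in the overview table, a CPU that stays above 95% would, under these assumptions, be reported as a warning on the first two polls and as severe from the third poll onwards.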
Run Queue Sentry
- Availability
- AIX, HPUX, Linux, SCO, Solaris, Tru64, Windows
Constants
Constant | Description | Value |
---|---|---|
RUNQ_IDLE | Empty run queue | 0 |
RUNQ_WARN | Warning run queue length | 2 |
RUNQ_PROB | Long run queue length | 5 |
States (HPUX, Linux, SCO, Solaris, Tru64, Windows)
State | Severity | Condition | Escalation |
---|---|---|---|
OVERLOAD | alarm | $run_queue > $RUNQ_PROB | |
BUSY | warning | $run_queue > $RUNQ_WARN | |
NORMAL | normal |
States (AIX only)
State | Severity | Condition | Escalation |
---|---|---|---|
OVERLOAD | warning | $run_queue > $RUNQ_PROB | severe after 120s |
BUSY | normal | $run_queue > $RUNQ_WARN | warning after 120s |
NORMAL | normal |
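The two Run Queue state tables share the same conditions and differ only in the initial severity assigned on each platform. A minimal Python sketch of that difference, ignoring the AIX escalation timers, might look like this. The names `STATE_TABLES` and `classify_run_queue` are hypothetical and chosen for illustration; they are not part of Sentinel3G.

```python
# Constants from the table above
RUNQ_WARN = 2
RUNQ_PROB = 5

# Rows are (state, threshold, initial severity), checked in table order.
# "default" stands for HPUX, Linux, SCO, Solaris, Tru64 and Windows.
STATE_TABLES = {
    "default": [("OVERLOAD", RUNQ_PROB, "alarm"), ("BUSY", RUNQ_WARN, "warning")],
    "AIX": [("OVERLOAD", RUNQ_PROB, "warning"), ("BUSY", RUNQ_WARN, "normal")],
}


def classify_run_queue(run_queue, platform="default"):
    """Return (state, initial severity) for a run queue length."""
    table = STATE_TABLES.get(platform, STATE_TABLES["default"])
    for state, threshold, severity in table:
        if run_queue > threshold:  # strict comparison, as in the tables
            return state, severity
    return "NORMAL", "normal"
```

For example, a run queue length of 6 is classified as OVERLOAD on every platform, but starts as an alarm on the non-AIX platforms and as a warning on AIX, where it only becomes severe through escalation.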
System Calls Sentry
- Availability
- AIX, HPUX, SCO, Tru64, Windows
Constants
Constant | Description | Value |
---|---|---|
CPU_SYSCALLS | Too many system calls per second | 10000 |
States
State | Severity | Condition | Escalation |
---|---|---|---|
BUSY | normal | $sys_per_sec > $CPU_SYSCALLS | alarm after 120s |
NORMAL | normal |
Disk Sentry
- Availability
- AIX¹, HPUX, SCO, Solaris, Tru64, Windows
Constants (Solaris, Tru64)
Constant | Description | Value |
---|---|---|
DSK_BUSY_WARN | % busy indicating disk is busy | 5 |
DSK_BUSY_PROB | % busy indicating disk is very busy | 20 |
DSK_SVCT_WARN | Indicates a long service time (ms) | 30 |
DSK_SVCT_PROB | Indicates a very long service time (ms) | 50 |
States (Solaris, Tru64)
State | Severity | Condition | Escalation |
---|---|---|---|
DSK_VERYBUSY | alarm | $percent_busy >= $DSK_BUSY_PROB && $service_time >= $DSK_SVCT_PROB | |
DSK_BUSY | warning | $percent_busy >= $DSK_BUSY_WARN && $service_time >= $DSK_SVCT_WARN | |
DSK_NORMAL | normal |
Constants (AIX, Windows)
Constant | Description | Value |
---|---|---|
DSK_BUSY_WARN | % busy indicating disk is busy | 40 |
DSK_BUSY_PROB | % busy indicating disk is very busy | 60 |
States (AIX only)
State | Severity | Condition | Escalation |
---|---|---|---|
DSK_VERYBUSY | warning | $percent_busy > $DSK_BUSY_PROB | severe after 120s |
DSK_BUSY | normal | $percent_busy > $DSK_BUSY_WARN | warning after 120s |
DSK_NORMAL | normal |
¹ The service time statistic is not available on AIX, so the AIX states rely on % busy alone. Service time is the better indicator of disk I/O performance: even if a disk is 100% busy, there is no real problem unless its service time is also getting high.
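To illustrate why the Solaris and Tru64 states combine % busy with service time, here is a small Python sketch of those conditions using the constants from the table above. The function name `classify_disk` is hypothetical and the code is an illustration of the table, not the product's implementation.

```python
# Solaris / Tru64 constants from the table above
DSK_BUSY_WARN, DSK_BUSY_PROB = 5, 20    # percent busy
DSK_SVCT_WARN, DSK_SVCT_PROB = 30, 50   # service time in ms


def classify_disk(percent_busy, service_time_ms):
    """Return (state, severity); both thresholds must be met for a row to match."""
    if percent_busy >= DSK_BUSY_PROB and service_time_ms >= DSK_SVCT_PROB:
        return "DSK_VERYBUSY", "alarm"
    if percent_busy >= DSK_BUSY_WARN and service_time_ms >= DSK_SVCT_WARN:
        return "DSK_BUSY", "warning"
    return "DSK_NORMAL", "normal"
```

Under these conditions a disk that is 100% busy with a 10 ms service time stays in DSK_NORMAL, which matches the point made in the footnote, whereas on AIX the same disk would exceed DSK_BUSY_PROB on % busy alone.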