Introduction



One fine day it happened: Nagios failed to alert us that a server had gone down. One of the Windows servers (what else) rebooted for an unknown reason (what else). It happened so fast that the outage fit almost entirely between two of the five-minute 'ping' checks Nagios sends to verify the system is up. This is a fairly rare case: only a single Nagios 'ping' check failed. Because the 'ping' check is set to retry two more times at one-minute intervals to avoid false alerts, Nagios recorded just one failure and never sent the necessary notification.
Clearly, polling with 'ping' is not perfect, so a better way to catch these pesky 'secret' Windows reboots is to make the servers send SNMP traps. Now, at least, we will know for sure when they come back up. ;-)
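
The timing argument can be made concrete with a little arithmetic. The sketch below is a simplified model, not Nagios code: it assumes checks run exactly on schedule every 300 seconds, that a failed check is retried after 60 and 120 seconds, and that a notification only goes out when all three attempts fail. The function name would_notify is made up for this illustration.

```shell
# would_notify START DURATION
#   START    seconds after a regular check at which the outage begins
#   DURATION length of the outage in seconds
# Returns 0 if all three check attempts fall inside the outage
# (a notification would be sent), 1 otherwise.
would_notify() {
  interval=300   # regular check interval in seconds
  retry=60       # retry interval after a failed check
  attempts=3     # first check plus two retries
  start=$1
  end=$(( $1 + $2 ))
  # first regular check at or after the start of the outage
  first=$(( (start + interval - 1) / interval * interval ))
  # time of the last retry that must also fail
  last=$(( first + (attempts - 1) * retry ))
  [ "$last" -lt "$end" ]
}

# A 90 second reboot starting 10 seconds before a regular check:
# one check fails, but the retries succeed again.
if would_notify 290 90; then
  echo "notification sent"
else
  echo "reboot goes unnoticed"
fi
```

Under these assumptions, an outage has to cover at least two full minutes after the first failed check before anybody is told about it.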

Plugin Design


The following examples have been developed and verified under Nagios 3.0.6 running on SuSE Linux Enterprise Server 10, receiving traps from Windows 2003 Server and Windows XP clients. Nagios had been installed into /home/app/nagios. This path is used in all examples below; please adjust it to your [nagioshome].

The 'Sending' part: Generating SNMP traps from Windows


On the Windows server, we need the SNMP service installed. It ships with Windows as an optional component (Add/Remove Windows Components) under Management and Monitoring Tools. Once installed, we go to "Start -> Settings -> Control Panel -> Administrative Tools -> Services -> SNMP Service -> Properties". I assume SNMP read access is already set up, so for now we are only interested in SNMP traps. First we go to the "Traps" tab. Following good practice, we configure a dedicated trap community (different from public) and add the IP address of the SNMP trap server as the trap destination.
Now we can start sending our first test traps. Stopping and starting the Windows SNMP service will generate some. Let's check which traps are sent and whether they are received on our trap sink server, using tcpdump:

# tcpdump -s 0 -X udp port 162
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

10:31:11.189693 IP 192.168.203.140.capioverlan > susie112.frank4dd.com.snmptrap:
 C=SECtrap Trap(31) E:311.1.1.3.1.1 192.168.203.140 coldStart 0
        0x0000:  4500 004b 07f1 0000 7f11 eb08 0afd cb8c  E..K............
        0x0010:  0afd 6722 047b 00a2 0037 f289 302d 0201  ..g".{...7..0-..
        0x0020:  0004 074e 424e 7472 6170 a41f 060c 2b06  ...SECtrap....+.
        0x0030:  0104 0182 3701 0103 0101 4004 0afd cb8c  ....7.....@.....
        0x0040:  0201 0002 0100 4301 0030 00              ......C..0.

10:31:26.227627 IP 192.168.203.140.capioverlan > susie112.frank4dd.com.snmptrap:
 C=SECtrap Trap(49) E:311.1.1.3.1.1 192.168.203.140 linkUp 1532 interfaces.ifTable.ifEntry.ifIndex.1=1
        0x0000:  4500 005d 07f2 0000 7f11 eaf5 0afd cb8c  E..]............
        0x0010:  0afd 6722 047b 00a2 0049 503d 303f 0201  ..g".{...IP=0?..
        0x0020:  0004 074e 424e 7472 6170 a431 060c 2b06  ...SECtrap.1..+.
        0x0030:  0104 0182 3701 0103 0101 4004 0afd cb8c  ....7.....@.....
        0x0040:  0201 0302 0100 4302 05fc 3011 300f 060a  ......C...0.0...
        0x0050:  2b06 0102 0102 0201 0101 0201 01         +............

10:31:26.229296 IP 192.168.203.140.capioverlan > susie112.frank4dd.com.snmptrap:
 C=SECtrap Trap(49) E:311.1.1.3.1.1 192.168.203.140 linkUp 1538 interfaces.ifTable.ifEntry.ifIndex.2=2
        0x0000:  4500 005d 07f3 0000 7f11 eaf4 0afd cb8c  E..]............
        0x0010:  0afd 6722 047b 00a2 0049 4f36 303f 0201  ..g".{...IO60?..
        0x0020:  0004 074e 424e 7472 6170 a431 060c 2b06  ...SECtrap.1..+.
        0x0030:  0104 0182 3701 0103 0101 4004 0afd cb8c  ....7.....@.....
        0x0040:  0201 0302 0100 4302 0602 3011 300f 060a  ......C...0.0...
        0x0050:  2b06 0102 0102 0201 0102 0201 02         +............

10:31:26.229692 IP 192.168.203.140.capioverlan > susie112.frank4dd.com.snmptrap:
 C=SECtrap Trap(49) E:311.1.1.3.1.1 192.168.203.140 linkUp 1538 interfaces.ifTable.ifEntry.ifIndex.3=3
        0x0000:  4500 005d 07f4 0000 7f11 eaf3 0afd cb8c  E..]............
        0x0010:  0afd 6722 047b 00a2 0049 4e35 303f 0201  ..g".{...IN50?..
        0x0020:  0004 074e 424e 7472 6170 a431 060c 2b06  ...SECtrap.1..+.
        0x0030:  0104 0182 3701 0103 0101 4004 0afd cb8c  ....7.....@.....
        0x0040:  0201 0302 0100 4302 0602 3011 300f 060a  ......C...0.0...
        0x0050:  2b06 0102 0102 0201 0103 0201 03         +............

4 packets captured

We can see that four traps were sent when the Windows SNMP service started. The first trap is a 'coldStart' notification; the following three report the "link up" status of each available network interface (including the loopback interface, 127.0.0.1).
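
With more than a handful of packets, counting the trap types by eye gets tedious. The following is a minimal sketch that summarizes the generic trap names found in tcpdump's text output. The function name count_generic_traps is made up for this illustration, and it assumes a grep that supports the -o option (GNU and BSD grep do).

```shell
# Reads tcpdump text output on stdin and prints "<count> <trap type>"
# for each SNMPv1 generic trap name that occurs in it.
count_generic_traps() {
  grep -o -E 'coldStart|warmStart|linkDown|linkUp|authenticationFailure' \
    | sort | uniq -c | awk '{ print $1, $2 }'
}

# The four summary lines from the capture above:
count_generic_traps <<'EOF'
 C=SECtrap Trap(31) E:311.1.1.3.1.1 192.168.203.140 coldStart 0
 C=SECtrap Trap(49) E:311.1.1.3.1.1 192.168.203.140 linkUp 1532 interfaces.ifTable.ifEntry.ifIndex.1=1
 C=SECtrap Trap(49) E:311.1.1.3.1.1 192.168.203.140 linkUp 1538 interfaces.ifTable.ifEntry.ifIndex.2=2
 C=SECtrap Trap(49) E:311.1.1.3.1.1 192.168.203.140 linkUp 1538 interfaces.ifTable.ifEntry.ifIndex.3=3
EOF
```

For the capture shown here this reports one coldStart and three linkUp traps. On a live system the tcpdump output can be piped straight into the function.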

The 'Receiving' part: Picking up the SNMP traps using the 'snmptrapd' daemon


For our purpose of testing and receiving traps from Windows systems, we add two MIB files to the library in /usr/share/snmp/mibs. The file MSFT.txt describes the Windows OID tree, while TRAP-TEST-MIB.txt will help us to generate a test trap later.

# vi /usr/share/snmp/mibs/MSFT.txt
MSFT-MIB DEFINITIONS ::= BEGIN


IMPORTS
    enterprises
        FROM RFC1155-SMI;

microsoft       OBJECT IDENTIFIER ::= { enterprises 311 }
software        OBJECT IDENTIFIER ::= { microsoft 1 }
systems         OBJECT IDENTIFIER ::= { software 1 }
os              OBJECT IDENTIFIER ::= { systems 3 }
windowsNT       OBJECT IDENTIFIER ::= { os 1 }
windows         OBJECT IDENTIFIER ::= { os 2 }
workstation     OBJECT IDENTIFIER ::= { windowsNT 1 }
server          OBJECT IDENTIFIER ::= { windowsNT 2 }
dc              OBJECT IDENTIFIER ::= { windowsNT 3 }

END

# vi /usr/share/snmp/mibs/TRAP-TEST-MIB.txt
TRAP-TEST-MIB DEFINITIONS ::= BEGIN
        IMPORTS ucdExperimental FROM UCD-SNMP-MIB;

demotraps OBJECT IDENTIFIER ::= { ucdExperimental 990 }

demo-trap TRAP-TYPE
        STATUS current
        ENTERPRISE demotraps
        VARIABLES { sysLocation }
        DESCRIPTION "This is just a demo"
        ::= 17

END
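
To see how the names in MSFT.txt relate to the numbers in the tcpdump capture, the tree can be walked by hand. The sketch below does that in plain shell; the parent/number pairs are copied from MSFT.txt, 'enterprises' is 1.3.6.1.4.1 as defined in RFC1155-SMI, and the function names node_def and resolve_oid are made up for this illustration.

```shell
# Parent node and sub-identifier of each node, as defined in MSFT.txt.
node_def() {
  case "$1" in
    microsoft)   echo "enterprises 311" ;;
    software)    echo "microsoft 1" ;;
    systems)     echo "software 1" ;;
    os)          echo "systems 3" ;;
    windowsNT)   echo "os 1" ;;
    windows)     echo "os 2" ;;
    workstation) echo "windowsNT 1" ;;
    server)      echo "windowsNT 2" ;;
    dc)          echo "windowsNT 3" ;;
    *)           return 1 ;;
  esac
}

# Walk from the given node up to 'enterprises' and print the numeric OID.
resolve_oid() {
  name=$1
  suffix=
  while [ "$name" != "enterprises" ]; do
    def=$(node_def "$name") || { echo "unknown node: $name" >&2; return 1; }
    suffix=".${def#* }$suffix"   # prepend this node's number
    name=${def% *}               # continue with the parent node
  done
  echo ".1.3.6.1.4.1$suffix"
}

resolve_oid workstation
```

The result for 'workstation' ends in 311.1.1.3.1.1, which is the enterprise value tcpdump printed as E:311.1.1.3.1.1 for the Windows XP client.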

Next, we configure the 'snmptrapd' daemon. Although the daemon comes with the SNMP daemon package and is installed in /usr/sbin, no startup script has been put into /etc/init.d. Fortunately, there is a template in /usr/share/doc/packages/net-snmp.

# cp /usr/share/doc/packages/net-snmp/rc.snmptrapd /etc/init.d/snmptrapd

# vi /etc/init.d/snmptrapd

OPTIONS="-On -p /var/run/snmptrapd.pid -M /usr/share/snmp/mibs -m ALL"

change:
startproc $SNMPTRAPD $OPTIONS -c /etc/snmptrapd.conf -Lf /var/log/net-snmpd.log
to:
startproc $SNMPTRAPD $OPTIONS -c /etc/snmp/snmptrapd.conf -Lf /var/log/net-snmpd.log

Now we create the configuration file for the 'snmptrapd' daemon. We define the trap community for simple access control, and we add a 'default' trap handler that passes all traps to a test script we are going to create. Then we enable and start 'snmptrapd' through yast -> system -> system services (runlevel) -> enable snmptrapd for runlevels 2, 3 and 5.

# vi /etc/snmp/snmptrapd.conf

# --------------------------------------------------------------------------- #
# snmptrapd.conf:                                                             #
#    configuration file for configuring the ucd-snmp snmptrapd agent.         #
# ----------------------------------------------------------------------------#

# first, we define the access control
authCommunity log,execute,net SECtrap

# next , the trap handlers
traphandle      default                                 /tmp/snmptraptest.sh
# END of snmptrapd.conf ---------------------------------------------------- #

The 'Testing' part: Learning to send, receive and filter SNMP traps


Let's create a simple test script snmptraptest.sh that writes all received SNMP traps into a log file.

# vi /tmp/snmptraptest.sh

#!/bin/sh

TESTLOG=/tmp/traptest.log
vars=

read host
read ip

while read oid val; do
  if [ "$vars" = "" ]; then
    vars="$oid = $val"
  else
    vars="$vars, $oid = $val"
  fi
done

# create the log file if it does not exist yet
if [ ! -e "$TESTLOG" ]; then
  touch "$TESTLOG"
fi

echo trap: $1 $host $ip $vars >> $TESTLOG
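
The parsing logic can be tried out without snmptrapd. 'snmptrapd' hands each trap to the handler on standard input: one line with the host name, one line with the transport address, then one "OID value" line per variable binding. The sketch below wraps the same loop in a function (format_trap is a name made up for this illustration) and feeds it a hand-written trap.

```shell
# Same parsing as snmptraptest.sh, but printing to stdout instead of a log.
format_trap() {
  vars=
  read -r host          # line 1: host name of the sender
  read -r ip            # line 2: transport address
  while read -r oid val; do   # remaining lines: "OID value"
    if [ -z "$vars" ]; then
      vars="$oid = $val"
    else
      vars="$vars, $oid = $val"
    fi
  done
  echo "trap: $host $ip $vars"
}

# Hand-written input in the format snmptrapd passes to a handler.
format_trap <<'EOF'
localhost
UDP: [127.0.0.1]:42706
DISMAN-EVENT-MIB::sysUpTimeInstance 6:4:53:38.72
SNMPv2-MIB::snmpTrapOID.0 TRAP-TEST-MIB::demo-trap
SNMPv2-MIB::sysLocation.0 here
EOF
```

The same here-document can also be piped into /tmp/snmptraptest.sh itself to check that the log file gets written.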

We are ready for our first test from the local system. We send a trap with the 'snmptrap' command and verify that it is received and processed by our test script. Also notice the use of the TRAP-TEST-MIB we created.

# snmptrap -v 2c -c SECtrap 127.0.0.1 "" TRAP-TEST-MIB::demo-trap SNMPv2-MIB::sysLocation.0 s "here"

# cat /tmp/traptest.log
trap: localhost UDP: [127.0.0.1]:42706 DISMAN-EVENT-MIB::sysUpTimeInstance = 6:4:53:38.72, SNMPv2-MIB::snmpTrapOID.0 = TRAP-TEST-MIB::demo-trap, SNMPv2-MIB::sysLocation.0 = here
trap: localhost UDP: [127.0.0.1]:42706 DISMAN-EVENT-MIB::sysUpTimeInstance = 6:4:53:38.72, SNMPv2-MIB::snmpTrapOID.0 = TRAP-TEST-MIB::demo-trap, SNMPv2-MIB::sysLocation.0 = here

Well, we really are receiving traps, but why are we getting them twice? Let's check if our 'snmptraptest.sh' script is called twice. We can change the last line writing the output to include a random string and give it another try.

# vi /tmp/snmptraptest.sh

change:
echo trap: $1 $host $ip $vars >> $TESTLOG
to:
echo `/usr/bin/openssl rand 20 -base64` trap: $1 $host $ip $vars >> $TESTLOG

# snmptrap -v 2c -c SECtrap 127.0.0.1 "" TRAP-TEST-MIB::demo-trap SNMPv2-MIB::sysLocation.0 s "here"

# cat /tmp/traptest.log
vRgoIkp7Y/66EyxK6fETsR7lqhY= trap: localhost UDP: [127.0.0.1]:58476 DISMAN-EVENT-MIB::sysUpTimeInstance = 6:20:16:06.35, SNMPv2-MIB::snmpTrapOID.0 = TRAP-TEST-MIB::demo-trap, SNMPv2-MIB::sysLocation.0 = here
aRsf084ZC/fcJqeOCjFRH/SCNdI= trap: localhost UDP: [127.0.0.1]:58476 DISMAN-EVENT-MIB::sysUpTimeInstance = 6:20:16:06.35, SNMPv2-MIB::snmpTrapOID.0 = TRAP-TEST-MIB::demo-trap, SNMPv2-MIB::sysLocation.0 = here

Voila, the random string is different each time, so the script is indeed being called twice! Further investigation shows that 'snmptrapd' is compiled with '/etc/snmp/snmptrapd.conf' as its default configuration file path. Setting the same file explicitly with the '-c' option in '/etc/init.d/snmptrapd' causes it to be read twice, so the trap handler is executed twice. Feature or bug? No matter, we need to remove the '-c' option from '/etc/init.d/snmptrapd'. Re-test, check, problem solved.

# vi /etc/init.d/snmptrapd

change:
startproc $SNMPTRAPD $OPTIONS -c /etc/snmp/snmptrapd.conf -Lf /var/log/net-snmpd.log
to:
startproc $SNMPTRAPD $OPTIONS -Lf /var/log/net-snmpd.log

Now that we can reliably receive SNMP traps, it's time to be selective about them. This is achieved by defining an explicit snmpTrapOID match in '/etc/snmp/snmptrapd.conf'. Let's say we only care about the Windows 'coldStart' traps: our match is the trap carrying the OID/value pair 'SNMPv2-MIB::snmpTrapOID.0 = SNMPv2-MIB::coldStart'. Then we restart the Windows SNMP service once more and verify that the trap data is received. This time we recorded just a single trap in '/tmp/traptest.log'.

# vi /etc/snmp/snmptrapd.conf

change:
traphandle      default                                 /tmp/snmptraptest.sh
to:
traphandle      SNMPv2-MIB::coldStart           	/tmp/snmptraptest.sh

# /etc/init.d/snmptrapd restart

# cat /tmp/traptest.log
trap: 192.168.203.140 UDP: [192.168.203.140]:1074 DISMAN-EVENT-MIB::sysUpTimeInstance = 0:0:00:00.00, SNMPv2-MIB::snmpTrapOID.0 = SNMPv2-MIB::coldStart, SNMP-COMMUNITY-MIB::snmpTrapAddress.0 = 192.168.203.140, SNMP-COMMUNITY-MIB::snmpTrapCommunity.0 = "SECtrap", SNMPv2-MIB::snmpTrapEnterprise.0 = MSFT-MIB::workstation

The 'Translating' part: Converting the SNMP traps into Nagios format and sending them to Nagios


Nagios can be set to receive and process data sent from external programs. Let's verify that the related directives are enabled and set in the Nagios configuration file:

# egrep 'check_external_commands|command_check_interval|command_file' nagios.cfg
check_external_commands=1
#command_check_interval=-1
command_check_interval=5s
command_file=/home/app/nagios/var/rw/nagios.cmd

# grep accept_passive /home/app/nagios/etc/nagios.cfg
accept_passive_service_checks=1
accept_passive_host_checks=1
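
The manual grep checks can be wrapped into a small script that fails loudly when a directive is missing or disabled. This is only a sketch: the function names require_directive and check_nagios_cfg are made up for this illustration, and the list of directives is the one checked by hand above.

```shell
# require_directive FILE KEY VALUE
# Succeeds if FILE contains an active (uncommented) line "KEY=VALUE".
require_directive() {
  grep -q -E "^[[:space:]]*$2[[:space:]]*=[[:space:]]*$3[[:space:]]*\$" "$1"
}

# check_nagios_cfg FILE
# Verifies the directives needed for passive checks via the command file.
check_nagios_cfg() {
  cfg=$1
  rc=0
  for pair in check_external_commands=1 \
              accept_passive_service_checks=1 \
              accept_passive_host_checks=1; do
    if ! require_directive "$cfg" "${pair%%=*}" "${pair#*=}"; then
      echo "missing or disabled: $pair" >&2
      rc=1
    fi
  done
  return $rc
}
```

Running 'check_nagios_cfg /home/app/nagios/etc/nagios.cfg' then returns a non-zero exit code if any of the three settings is not enabled.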

The Nagios data-receiving part is the named pipe '/home/app/nagios/var/rw/nagios.cmd'. The format of the Nagios event to send to '/home/app/nagios/var/rw/nagios.cmd' is:

[Unix Timestamp] Message Descriptor;host name;service-name;severity-code;text data

Example: [1141163054] PROCESS_SERVICE_CHECK_RESULT;susie112;check_trap_susie112;1;Trap test data
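
Building this string by hand is error-prone, so a small helper is useful. The following is a minimal sketch; the function name build_passive_result and its argument order are made up for this illustration. The optional fifth argument fixes the timestamp, which is handy for testing.

```shell
# build_passive_result HOST SERVICE CODE TEXT [TIMESTAMP]
# Prints one PROCESS_SERVICE_CHECK_RESULT line in Nagios command file format.
# CODE is the Nagios return code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
build_passive_result() {
  case "$3" in
    0|1|2|3) ;;
    *) echo "invalid return code: $3" >&2; return 1 ;;
  esac
  ts=${5:-$(date +%s)}   # default to the current Unix time
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
    "$ts" "$1" "$2" "$3" "$4"
}

# Reproduces the example line above:
build_passive_result susie112 check_trap_susie112 1 "Trap test data" 1141163054
```

To submit the event, the output is redirected into the Nagios command file.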

We can now send a test event to Nagios to see if it is received properly:

# echo "`date +[%s]` PROCESS_SERVICE_CHECK_RESULT;testserver;check_trap_test;1;test" > /home/app/nagios/var/rw/nagios.cmd

# tail /home/app/nagios/var/nagios.log | grep trap

[1224133947] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;testserver;check_trap_test;1;test
[1224133947] Warning:  Passive check result was received for service 'check_trap_test' on host 'testserver', but the host could not be found!

It is time to think of a program that translates our SNMP trap into a Nagios event and sends it to Nagios through its command file. We want this trap service for Windows reboots to be associated with each Nagios host, so that a separate notification can go to the appropriate host support team. We also want the severity code set to warning, but without having to acknowledge the event by hand. Instead, the event should be cleared quickly to the OK state, and no notification should go out for this auto-confirmation.
The association with a Nagios host requires us to derive the correct host name from the trap IP. The auto-confirmation is done by a second, slightly delayed event submission with severity code '0'. The notification for OK is disabled in the service template. I wrote this program, named it send_trap_data.pl, and put it into my nagios-home/libexec directory. If the DEBUG option is set to 1, the program writes some parameters and the submitted Nagios events into a temp file. Let's enable 'send_trap_data.pl' so it starts processing incoming SNMP traps for Nagios:

# vi /etc/snmp/snmptrapd.conf

change:
traphandle      SNMPv2-MIB::coldStart       /tmp/snmptraptest.sh
to:
# traphandle    SNMPv2-MIB::coldStart       /tmp/snmptraptest.sh
traphandle      SNMPv2-MIB::coldStart       /home/app/nagios/libexec/send_trap_data.pl

# /etc/init.d/snmptrapd restart

# cat /tmp/test3
trapline >proxyjp02.frank4dd.com
 UDP: [192.168.100.184]:12380
 DISMAN-EVENT-MIB::sysUpTimeInstance 0:0:00:00.00
 SNMPv2-MIB::snmpTrapOID.0 SNMPv2-MIB::coldStart
 SNMP-COMMUNITY-MIB::snmpTrapAddress.0 192.168.100.184
 SNMP-COMMUNITY-MIB::snmpTrapCommunity.0 "SECtrap"
 SNMPv2-MIB::snmpTrapEnterprise.0 MSFT-MIB::server
<
traphost >proxyjp02.frank4dd.com<
snmpname >SNMPv2-MIB::sysName.0 = STRING: JPNHOMG035<
hostname >winserver03<
eventstr >[1224478743] PROCESS_SERVICE_CHECK_RESULT;winserver03;check_trap_coldstart;1;System *reboot* or SNMP service restarted.<
Wrote eventstr to /home/app/nagios/var/rw/nagios.cmd
eventstr >[1224478743] PROCESS_SERVICE_CHECK_RESULT;winserver03;check_trap_coldstart;0;System *reboot* or SNMP service restarted. auto-OK<
Wrote eventstr to /home/app/nagios/var/rw/nagios.cmd
End of send_trap_data.pl.
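
The translation step can be sketched in a few lines of shell. This is not send_trap_data.pl, only an illustration of the same idea under some assumptions: the host name comes from a hard-coded, hypothetical lookup table (lookup_nagios_host) instead of whatever the real program does, and the names translate_trap, CMDFILE and AUTO_OK_DELAY are made up for this example.

```shell
CMDFILE=${CMDFILE:-/home/app/nagios/var/rw/nagios.cmd}
SERVICE=check_trap_coldstart

# Hypothetical mapping from trap source IP to Nagios host name.
lookup_nagios_host() {
  case "$1" in
    192.168.100.184) echo winserver03 ;;
    *) return 1 ;;
  esac
}

# Reads one trap in snmptrapd handler format on stdin and submits a
# WARNING event followed by an auto-OK event to the Nagios command file.
translate_trap() {
  read -r traphost      # line 1: host name as resolved by snmptrapd
  read -r transport     # line 2: e.g. "UDP: [192.168.100.184]:12380"
  cat > /dev/null       # the variable bindings are not needed here
  ip=$(printf '%s\n' "$transport" | sed -n 's/^UDP: \[\([0-9.]*\)]:.*/\1/p')
  host=$(lookup_nagios_host "$ip") || {
    echo "no Nagios host known for '$ip'" >&2
    return 1
  }
  msg="System *reboot* or SNMP service restarted."
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;1;%s\n' \
    "$(date +%s)" "$host" "$SERVICE" "$msg" >> "$CMDFILE"
  sleep "${AUTO_OK_DELAY:-1}"   # slightly delayed auto-confirmation
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s auto-OK\n' \
    "$(date +%s)" "$host" "$SERVICE" "$msg" >> "$CMDFILE"
}
```

For a dry run, CMDFILE can be pointed at an ordinary file and a hand-written trap piped into translate_trap; the two resulting lines should look like the eventstr entries in the debug output above.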

The 'Processing' part: Displaying and notifying on SNMP trap events with Nagios


Here we define a service template and the services that use it. How many separate services we need depends on how many different notifications we want to generate.

# vi /home/app/nagios/etc/nagios.cfg

# passive service check for SNMP traps
cfg_file=/home/app/nagios/etc/objects/trap-services-template.cfg
cfg_file=/home/app/nagios/etc/objects/trap-services.cfg

# vi /home/app/nagios/etc/objects/trap-services-template.cfg

##############################################################################
# Define a servicegroup for SNMP trap service checks
# All SNMP trap service checks will be members of this group
##############################################################################
define servicegroup{
  servicegroup_name        snmptrap-checks     ; The name of the servicegroup
  alias                    SNMP Trap Services  ; Long name of the group
}
##############################################################################
# Define the SNMP trap check template service
##############################################################################
define service{
  name                          generic-trap
  active_checks_enabled         0		; traps are only passive checks
  passive_checks_enabled        1               ; yes, check passive
  parallelize_check             1		; yes, please
  obsess_over_service           0		; we don't run extra commands
  check_freshness               0               ; don't check for freshness
  notifications_enabled         1		; send notifications
  event_handler_enabled         1		; yes, but we have none
  flap_detection_enabled        0		; with auto-OK, we don't
  failure_prediction_enabled    1		; dependency checks
  process_perf_data             0		; don't send this to perfdata
  retain_status_information     1		; yes, once auto-OK'ed, keep it
  retain_nonstatus_information  1
  is_volatile                   1               ; enable for passive checks
  check_period                  24x7		; always check for submissions
  max_check_attempts            1		; one trap is enough
  normal_check_interval         1		
  retry_check_interval          1
  contact_groups                frankonly
  notification_options          w               ; notify for warnings only
  notification_interval         120             ; notify every 2 hrs
  notification_period           24x7		; always notify
  register                      0		; template, don't register
  servicegroups                 snmptrap-checks
  check_command                 check_none	; we do not run any checks
}

# vi /home/app/nagios/etc/objects/trap-services.cfg

##############################################################################
# Receive SNMP traps for windows boot events via eventhandler scripts
##############################################################################
define service {
  use                           generic-trap
  host_name                     winserver03
  name                          check_trap_coldstart
  service_description           check_trap_coldstart
}
##############################################################################
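
With more than a few Windows hosts, typing one service block per host gets repetitive. The blocks can be generated instead, as in the sketch below. The function name emit_trap_service is made up for this illustration, and 'winserver04' is a hypothetical second host.

```shell
# emit_trap_service HOST
# Prints a passive trap service definition for one Nagios host.
emit_trap_service() {
  cat <<EOF
define service {
  use                           generic-trap
  host_name                     $1
  service_description           check_trap_coldstart
}
EOF
}

# One service per host that should report reboots.
for host in winserver03 winserver04; do
  emit_trap_service "$host"
done
```

The output can be redirected into trap-services.cfg. Each host name must already be defined as a Nagios host, otherwise the configuration check ('nagios -v') will reject the file.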

Screenshots

Windows reboot check service group
Windows reboot check service detail
Windows reboot check service line
Windows reboot check service line 2
Windows reboot e-mail notification