Monitoring the Windows Update Status with Nagios through SNMP

Frank4DD, @2009

Introduction

This tutorial describes an approach to checking whether Windows systems are being properly patched. This is particularly important if you run a large number of servers and need to evaluate their compliance and risk status for your company. The typical existing solutions are running reports through Windows Server Update Services (WSUS), or running scripts against the registry to list and compare the applied patches against a baseline (security scanners like Foundstone or Nessus do just that). The latter approach is certainly the most accurate, but also the most labor-intensive. With Microsoft releasing patches at least monthly, these patch lists grow huge over time. Even though they eventually collapse into a service pack after many months, patch lists change frequently and are confusing.

Plugin Design

The approach described here does not check for the existence of every single patch. Instead, it checks that the automatic patch service is set up correctly, using either Microsoft's online update service or a local WSUS server. It then runs the built-in Windows check to see whether any patches are outstanding for this system, and reports the result to the central monitoring system, Nagios. The benefits are:

Leveraging the existing monitoring setup of Nagios:

Using the already defined server inventory (windows server list)
Using the already defined administrator notification and contact list
Integrating with the OS patch monitoring of the other systems (Linux, IOS) for an 'enterprise' view of patch compliance

Small server footprint and easy rollout
Easy verification of the patch infrastructure setup (the server settings are OK)

The drawbacks of this method are:

It is less accurate than comparing against a patch baseline list.

For example, it is impossible to tell directly and independently whether a particular patch has been applied. It also does not tell whether there are patch-overriding settings that suppress the installation of a particular patch. The future will show whether this alternative method is sufficient and practical enough. The method currently works well with our Linux servers, where the patch check connects to the online patch service and returns the list of outstanding patches for a particular system. The patch release cycle of our Linux vendor (Novell SuSE Linux Enterprise Server) is even more frequent than Microsoft's, with patches being released on an almost daily basis. With an active Nagios notification being sent to our admins, I found that patching is done much faster and more proactively. Nobody wants their servers listed in warning status for too long. With patch reminders being sent out until the patching is complete, administrators cannot 'forget' the patch task in their daily struggle of shifting priorities between server maintenance and project work.

Set up and test the Windows patch check: win_update_trapsend.vbs

First, we need a program that determines the current patch status. Since Microsoft's Windows Scripting Host is universally available, we can use VBScript to write the check program win_update_trapsend.vbs. We start by editing the top of the script to set our SNMP trap destination IP. When run without further options, Windows scripting runs in interactive mode and opens an output window. We want to suppress that window and redirect any output into a local logfile. I created a batch file called win_update_trapsend.bat so I do not need to retype the command-line options when I want to run it by hand. Finally, we need to find a good home directory for our script. Admins often already have such a home for their ops scripts. If not, I tend to use C:\update-monitor.

C:\> cscript.exe //NoLogo C:\update-monitor\win_update_trapsend.vbs > C:\update-monitor\win_update_trapsend.log

Transmit the check results to the Nagios system: TrapGen

We use SNMP to monitor Windows servers, and SNMP is our central monitoring protocol across all systems. On Linux, we have the extend function in UCD NET-SNMP, which allows us to run scripts remotely and receive their output through SNMP. Unfortunately, the SNMP service shipping with Windows is limited: it does not support SNMPv3 and has no extend function. As a result, we face the dilemma of how to initiate the check and how to transport our monitoring result back to Nagios. One solution is to add a service such as NRPE-NT, which is made exactly for that purpose. But should we do that for just one single script? Repeat after me: "I don't want another daemon! I don't want another daemon!..." :-) In an enterprise with hundreds of servers, it makes a difference whether we roll out a small client program or go through all the required testing for implementing another service. I successfully tested TrapGen from Network Computing Technologies, Inc., a small 136KB binary that can send custom SNMP traps from Windows systems. Together with the setup of an SNMP trap daemon and the passive service configuration in Nagios, we receive the results of Windows update checks that are launched daily through the Windows scheduler.

[Screenshot: Windows scheduler setup]

Submit the update status data into Nagios

The client setup is easy on the Windows system and also easy on the Nagios side, because we can leverage the existing SNMP trap implementation of our Windows Reboot Monitoring. We just add a new trap handler definition to '/etc/snmp/snmptrapd.conf' and update the send_trap_data.pl script. This script is responsible for processing the received SNMP trap data, parsing it into a Nagios event, and submitting it to Nagios as a passive check result. This might require some debugging, since trap OIDs can be translated differently depending on the MIBs loaded by the trap daemon. The script uses the trap OID strings to tell the different trap types apart and to match them up with their respective Nagios service.

susie3 ~ # cat /etc/snmp/snmptrapd.conf
###############################################################################
# snmptrapd.conf:
#    configuration file for configuring the ucd-snmp snmptrapd agent.
###############################################################################

# first, we define the access control
authCommunity   log,execute,net SECtrap

# next , the trap handlers.
# Windows reboots: SNMPv2-MIB::snmpTrapOID.0 = SNMPv2-MIB::coldStart
traphandle  SNMPv2-MIB::coldStart  /srv/app/nagios/libexec/send_trap_data.pl

# Win update traps: SNMPv2-MIB::snmpTrapOID.0 = RFC1155-SMI::enterprises.2854.0.1
traphandle  RFC1155-SMI::enterprises.2854.0.1  /srv/app/nagios/libexec/send_trap_data.pl
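
The handler script send_trap_data.pl itself is not shown in this tutorial. As a rough illustration of what such a handler has to do, here is a minimal Python sketch, not the actual script. It relies on snmptrapd passing the sender's hostname, its transport address, and then one "OID value" line per varbind to the handler on standard input, and on the Nagios external command PROCESS_SERVICE_CHECK_RESULT. Everything else is an assumption made for this example: the service name check_trap_reboot, the idea that the last varbind carries a status text, and the convention that this text starts with "OK" when no patches are outstanding.

```python
import time

# Trap OID strings as they appear in snmptrapd.conf, mapped to Nagios
# service descriptions. "check_trap_reboot" is a hypothetical name.
TRAP_SERVICES = {
    "RFC1155-SMI::enterprises.2854.0.1": "check_trap_winpatch",
    "SNMPv2-MIB::coldStart": "check_trap_reboot",
}


def parse_trap(lines):
    """Split snmptrapd handler input into host, address and varbinds."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    host, address = lines[0], lines[1]
    varbinds = {}
    for ln in lines[2:]:
        oid, _, value = ln.partition(" ")
        varbinds[oid] = value.strip().strip('"')
    return host, address, varbinds


def build_command(lines, now=None):
    """Return a Nagios external command line, or None for unknown traps."""
    host, _address, varbinds = parse_trap(lines)
    service = TRAP_SERVICES.get(varbinds.get("SNMPv2-MIB::snmpTrapOID.0", ""))
    if service is None:
        return None
    # Assumption: the last varbind holds the status text of the check.
    message = list(varbinds.values())[-1]
    # Assumption: anything not starting with "OK" is reported as WARNING (1).
    state = 0 if message.startswith("OK") else 1
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, state, message)
```

A real handler would read its input from standard input and append the returned line to the Nagios external command file (the command_file setting in nagios.cfg). If the trap daemon translates the OIDs differently, the keys in TRAP_SERVICES have to be adjusted to match.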

Passive checks have disadvantages. First, we cannot force a re-check of the service from Nagios. If we want to update the Nagios status (manager after patching: "Make it green!"), we either need to wait for the next scheduled check to kick in, or we need to run the check script on the Windows client by hand. Second, a system's monitoring configuration can break without anybody noticing: the passive check then simply stops receiving new data. Fortunately, we can make this visible in Nagios by using the 'freshness' parameters together with the check_command definition for 'stale' results (see no-patch-report in the next section).
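
To illustrate the freshness idea, here is a small Python sketch of the underlying arithmetic. It is a simplification and not Nagios code: Nagios performs this test internally and then runs the check_command of the service. The threshold of 93600 seconds (26 hours) is the freshness_threshold value used in the service template in the next section; the function name is_stale is made up for this example.

```python
# 26 hours in seconds: a daily check plus two hours of slack.
FRESHNESS_THRESHOLD = 26 * 60 * 60  # 93600


def is_stale(last_result, now, threshold=FRESHNESS_THRESHOLD):
    """Return True if the last passive result is older than the threshold.

    Both timestamps are seconds since the epoch.
    """
    return (now - last_result) > threshold
```

With a check scheduled once a day, a result that arrives on time is about 24 hours (86400 seconds) old at most and stays fresh. A single missed run pushes the age past 26 hours, and the service is then treated as stale.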

Configure the plugin and service in Nagios

Here, one important item is the service description name. It must match the name configured in send_trap_data.pl. Otherwise, Nagios cannot relate the event to any existing service for processing it.

vi /srv/app/nagios/etc/nagios.cfg

# passive service check for Windows Patch Update SNMP traps
cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg

vi /srv/app/nagios/etc/objects/patch-services-windows.cfg

##############################################################################
# Define a servicegroup for patch service checks
# All patch service checks will be members of this group
##############################################################################
define servicegroup{
  servicegroup_name        patch-checks-win     ; The name of the servicegroup
  alias                    Patch Checks Windows ; Long name of the group
}
##############################################################################
# Define the patch check template service
##############################################################################
define service{
  name                          generic-patch-win
  active_checks_enabled         0               ; traps are only passive checks
  passive_checks_enabled        1               ; yes, check passive
  parallelize_check             1               ; yes, please
  obsess_over_service           0               ; we don't run extra commands
  check_freshness               1               ; check if a report came in
  freshness_threshold           93600           ; 26 hour threshold for stale,
                                                ; patchcheck should run daily
  check_command                 no-patch-report ; runs if service is "stale"
  notifications_enabled         1               ; send notifications
  event_handler_enabled         1               ; yes, but we have none
  flap_detection_enabled        0               ; with auto-OK, we don't
  failure_prediction_enabled    1               ; dependency checks
  process_perf_data             0               ; don't send this to perfdata
  retain_status_information     1               ; yes, once auto-OK'd keep it
  retain_nonstatus_information  1
  is_volatile                   1               ; enable for passive checks
  check_period                  24x7            ; always check for submission
  max_check_attempts            1               ; one trap is enough
  normal_check_interval         1
  retry_check_interval          1
  contact_groups                frankonly
  notification_options          w,r             ; notify for warnings+recovery
  notification_interval         1440            ; notify once a day
  notification_period           24x7            ; always notify
  register                      0               ; template, don't register
  service_groups                patch-checks-win
}

##############################################################################
# Receive SNMP traps for Windows update status
##############################################################################
define service {
  use                           generic-patch-win
  hostgroup                     2-windows-servers
  name                          check_trap_winpatch
  service_description           check_trap_winpatch
}

vi command.cfg and add the definition for check_command no-patch-report:

# This will always return "OK" but tells us no patch report came in that day.
# see also http://nagios.sourceforge.net/docs/3_0/freshness.html
define command{
 command_name no-patch-report
 command_line $USER1$/check_dummy 0 "Daily patch check result not reported!"
}

susie3:/srv/app/nagios/etc/objects # echo "cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg" >> /srv/app/nagios/etc/nagios.cfg

susie3:/srv/app/nagios/etc/objects # /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.

Comments on Windows Server Update Services (WSUS)

With most servers set to use WSUS, Windows patches are approved in WSUS and then deployed on a fixed schedule. That means the patch check could *always* return OK, because patches only become visible to the system shortly before the actual patching. We also depend fully on the WSUS administrator to determine which patches are applicable. The solution? For the duration of our patch check, we switch from WSUS to the *official* Windows online update service, and back to WSUS after the check. It takes quite some effort (registry key changes, proxy settings, etc.), but it is the only way to get an independent check. This code is in development/testing; your comments are highly welcome.

Troubleshooting Tips

Implementing a passive service with SNMP traps is not for the faint of heart. Here are some tips to get it going:

Check if the Windows trap data has been sent by calling win_update_trapsend.bat
Check if trap data arrives at the Nagios server using a packet sniffer such as tcpdump or Ethereal
Check if the snmptrapd daemon processes this data (firewall might block, daemon config might be wrong, daemon might not be running)
Check if send_trap_data.pl generates the correct data for Nagios, using the tmp file
Check if the data is received by Nagios by checking nagios.log file
Check if the data is correctly associated with a Nagios service and hostname.
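
The last two checks can be partly automated. The following Python sketch scans lines from nagios.log for passive check results of our service. It assumes that Nagios is configured to log passive checks (log_passive_checks=1 in nagios.cfg), in which case it writes entries of the form "[timestamp] PASSIVE SERVICE CHECK: host;service;state;output". The function name and the sample host name are made up for this example.

```python
import re

# One passive service check entry as written to nagios.log.
PASSIVE_RE = re.compile(
    r"^\[(\d+)\] PASSIVE SERVICE CHECK: ([^;]+);([^;]+);(\d+);(.*)$")


def find_passive_results(log_lines, service="check_trap_winpatch"):
    """Return (timestamp, host, state, output) for each matching log entry."""
    hits = []
    for line in log_lines:
        match = PASSIVE_RE.match(line.strip())
        if match and match.group(3) == service:
            hits.append((int(match.group(1)), match.group(2),
                         int(match.group(4)), match.group(5)))
    return hits


# Example with two log lines, only one of which belongs to our service.
sample_log = [
    "[1250000000] PASSIVE SERVICE CHECK: winsrv01;check_trap_winpatch;1;WARNING: 3 updates outstanding",
    "[1250000060] SERVICE ALERT: winsrv01;PING;OK;HARD;1;PING OK",
]
results = find_passive_results(sample_log)
```

If the list comes back empty for a host that did send a trap, the result was either never submitted or it was submitted under a different host name or service description.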

Most of these troubleshooting steps are also described in more detail in Windows Reboot Monitoring.