Introduction
This tutorial describes an approach for checking whether Windows systems are being patched properly. This matters most when you run a large number of servers and need to evaluate their compliance and risk status for your company. The typical existing solutions are running reports through Windows Server Update Services (WSUS), or running scripts against the registry to list the applied patches and compare them against a baseline (security scanners such as Foundstone or Nessus do just that). The latter approach is certainly the most accurate, but also the most labor-intensive. With Microsoft releasing patches bi-weekly, these patch lists grow huge over time. Even though they eventually collapse into a service pack after many months, the patch lists change frequently and are confusing.
Plugin Design
The approach described here does not check for the existence of every single patch. Instead, it checks that the automatic patch service is set up correctly, whether it points to Microsoft's online update service or to a local WSUS server. It then runs the Windows built-in check to see if any patches are outstanding for this system, and reports the result to the central monitoring system, Nagios. The benefits are:
- Leveraging the existing monitoring setup of Nagios:
  - Using the already defined server inventory (Windows server list)
  - Using the already defined administrator notification and contact list
  - Integrating with the OS patch monitoring of the other systems (Linux, IOS) for an 'enterprise' view of patch compliance
- Small server footprint and easy rollout
- Easy verification of the patch infrastructure setup (the server settings are OK)
The drawbacks of this method are:
- It is less accurate than comparing against a patch baseline list. For example, it is impossible to tell directly and independently whether a particular patch has been applied. It also does not reveal patch-overriding settings that suppress the installation of a particular patch.
The future will show whether this alternative method is sufficient and practical enough. It currently works well with our Linux servers, where the patch check connects to the online patch service and returns the list of outstanding patches for a particular system. The patch release cycle of our Linux vendor (Novell SuSE Linux Enterprise Server) is even more frequent than Microsoft's, with patches being released on an almost daily basis. With active Nagios notifications being sent to our admins, I found that patching gets done much faster and more proactively. Nobody wants their servers listed in warning status for too long. With patch reminders being sent out until the patching is complete, administrators cannot 'forget' the patch task in their daily struggle of shifting priorities between server maintenance and project work.
Set up and test the Windows patch check: win_update_trapsend.vbs
First, we need a program that determines the current patch status. Microsoft's Windows Script Host is universally available, so we can use VBScript to write the check program win_update_trapsend.vbs. We start by editing the top of the script to set our SNMP trap destination IP. Run without further options, Windows Script Host works in interactive mode and opens an output window. We want to suppress that window and redirect any output into a local logfile. I created a batch file called win_update_trapsend.bat so I do not need to re-type the command line options when I want to run the check by hand. Finally, we need to find a good home directory for our script. Admins often already have such a home for their ops scripts. If not, I tend to use C:\update-monitor.
C:\> cscript.exe //NoLogo C:\update-monitor\win_update_trapsend.vbs > C:\update-monitor\win_update_trapsend.log
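The decision the check has to make can be sketched in a few lines. This is not the code of win_update_trapsend.vbs, only a minimal Python illustration of the idea, assuming the standard Nagios plugin return codes and assuming that outstanding patches are reported as WARNING (the state our notifications are set up for). The function name patch_status and the message texts are made up for this example.

```python
# Standard Nagios plugin return codes used in this sketch.
OK, WARNING, UNKNOWN = 0, 1, 3


def patch_status(pending_titles):
    """Map a list of outstanding update titles to (state, message).

    pending_titles is None when the update search itself failed.
    """
    if pending_titles is None:
        return UNKNOWN, "Windows update search failed"
    if not pending_titles:
        return OK, "No outstanding patches"
    count = len(pending_titles)
    # Keep the message short: only the first three titles are listed.
    preview = ", ".join(pending_titles[:3])
    more = " ..." if count > 3 else ""
    return WARNING, f"{count} outstanding patches: {preview}{more}"


# Example call with placeholder titles (not real update names).
state, message = patch_status(["Example update A", "Example update B"])
print(state, message)
```

Keeping the status text short is a deliberate choice here, since it has to travel inside a single SNMP trap and end up on one Nagios status line.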
Transmit the check results to the Nagios system: TrapGen
We use SNMP to monitor Windows servers, and SNMP is our central monitoring protocol across all systems. On Linux, we have the extend function in Net-SNMP (formerly UCD-SNMP), which allows us to run scripts remotely and receive their output through SNMP. Unfortunately, the SNMP service shipping with Windows is limited: it does not support SNMPv3 and has no extend function. As a result, we face the dilemma of how to initiate the check and how to transport the monitoring result back to Nagios. One solution is to add a service such as NRPE-NT, which is made for exactly that purpose. But should we do that just for one single script? Repeat after me: "I don't want another daemon! I don't want another daemon!..." :-) In an enterprise with hundreds of servers, it makes a difference whether we roll out a small client program or go through all the required testing for implementing another service. I successfully tested TrapGen from Network Computing Technologies, Inc., a small 136KB binary that can send custom SNMP traps from Windows systems. Together with the setup of an SNMP trap daemon, plus the passive service configuration in Nagios, we receive Windows update check results that are launched daily through the Windows scheduler.
Submit the update status data into Nagios
The client setup is easy on the Windows system and also easy on the Nagios side, because we can leverage the existing SNMP trap implementation of our Windows Reboot Monitoring. We just add a new trap handler definition to '/etc/snmp/snmptrapd.conf' and update the send_trap_data.pl script. This script processes the received SNMP trap data, parses it into a Nagios event and submits it to Nagios as a passive check result. This might require some debugging, since trap OIDs can be translated differently depending on the MIBs loaded by the trap daemon. The script uses the trap OID strings to tell the trap types apart and to match each one with its respective Nagios service group.
susie3 ~ # cat /etc/snmp/snmptrapd.conf
###############################################################################
# snmptrapd.conf:
# configuration file for configuring the ucd-snmp snmptrapd agent.
###############################################################################
# first, we define the access control
authCommunity log,execute,net SECtrap
# next , the trap handlers.
# Windows reboots: SNMPv2-MIB::snmpTrapOID.0 = SNMPv2-MIB::coldStart
traphandle SNMPv2-MIB::coldStart /srv/app/nagios/libexec/send_trap_data.pl
# Win update traps: SNMPv2-MIB::snmpTrapOID.0 = RFC1155-SMI::enterprises.2854.0.1
traphandle RFC1155-SMI::enterprises.2854.0.1 /srv/app/nagios/libexec/send_trap_data.pl
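To show what a trap handler has to deal with, here is a minimal Python sketch of the parsing step. It is not the code of send_trap_data.pl. It assumes the usual snmptrapd handler input on stdin: one line with the host name, one line with the transport address, then one "OID value" pair per line. The service name check_trap_reboot and the sample data are made up for illustration. How the OIDs are spelled depends on the MIBs loaded by the trap daemon.

```python
# Trap OID -> Nagios service description.
TRAP_SERVICES = {
    "SNMPv2-MIB::coldStart": "check_trap_reboot",  # hypothetical service name
    "RFC1155-SMI::enterprises.2854.0.1": "check_trap_winpatch",
}


def parse_trap(lines):
    """Split handler input into hostname, address and a dict of varbinds."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    hostname, address = lines[0], lines[1]
    varbinds = {}
    for line in lines[2:]:
        # Everything up to the first blank is the OID, the rest is the value.
        oid, _, value = line.partition(" ")
        varbinds[oid] = value.strip()
    return hostname, address, varbinds


def service_for(varbinds):
    """Return the service description for this trap, or None if unknown."""
    trap_oid = varbinds.get("SNMPv2-MIB::snmpTrapOID.0")
    return TRAP_SERVICES.get(trap_oid)


# Made-up sample input; a real handler would read sys.stdin instead.
SAMPLE = [
    "winserver1.example.com\n",
    "UDP: [192.0.2.10]:1024->[192.0.2.1]:162\n",
    "DISMAN-EVENT-MIB::sysUpTimeInstance 0:1:02:03.04\n",
    "SNMPv2-MIB::snmpTrapOID.0 RFC1155-SMI::enterprises.2854.0.1\n",
]
host, addr, binds = parse_trap(SAMPLE)
print(host, service_for(binds))
```

An unknown trap OID returns None, so a handler built this way can log and drop traps it has no service for instead of submitting them to Nagios under a wrong name.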
Passive checks have disadvantages. First, we cannot force a re-check of the service from Nagios. If we want to update the Nagios status (manager after patching: "Make it green!"), we need to either wait for the next scheduled check to kick in, or run the check script on the Windows client by hand. Second, a system's monitoring configuration can break without anybody noticing: the passive check simply stops receiving new data. Fortunately, we can make this visible in Nagios by using the 'freshness' parameters together with a check_command definition for 'stale' results (see no-patch-report in the next section).
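The idea behind the freshness check can be illustrated with a small Python sketch. It is a simplification and not how Nagios computes freshness internally. It only assumes the threshold of 93600 seconds (26 hours) that the service template in the next section uses; the function name is_stale is made up.

```python
FRESHNESS_THRESHOLD = 93600  # seconds, 26 hours for a check that runs daily


def is_stale(last_result_time, now, threshold=FRESHNESS_THRESHOLD):
    """True when the newest passive result is older than the threshold."""
    return (now - last_result_time) > threshold


# A result from 24 hours ago is still fresh, one from 27 hours ago is stale.
print(is_stale(0, 24 * 3600), is_stale(0, 27 * 3600))
```

The two extra hours on top of the daily schedule give a late-running scheduler job some slack before the result is flagged.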
Configure the plugin and service in Nagios
Here, one important item is the service description name. It must match the name configured in send_trap_data.pl. Otherwise, Nagios cannot relate the event to any existing service for processing it.
vi /srv/app/nagios/etc/nagios.cfg
# passive service check for Windows Patch Update SNMP traps
cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg
vi /srv/app/nagios/etc/objects/patch-services-windows.cfg
##############################################################################
# Define a servicegroup for patch service checks
# All patch service checks will be members of this group
##############################################################################
define servicegroup{
servicegroup_name patch-checks-win ; The name of the servicegroup
alias Patch Checks Windows ; Long name of the group
}
##############################################################################
# Define the patch check template service
##############################################################################
define service{
name generic-patch-win
active_checks_enabled 0 ; traps are only passive checks
passive_checks_enabled 1 ; yes, check passive
parallelize_check 1 ; yes, please
obsess_over_service 0 ; we don't run extra commands
check_freshness 1 ; check if a report came in
freshness_threshold 93600 ; 26 hour threshold for stale,
; patchcheck should run daily
check_command no-patch-report ; runs if service is "stale"
notifications_enabled 1 ; send notifications
event_handler_enabled 1 ; yes, but we have none
flap_detection_enabled 0 ; with auto-OK, we don't
failure_prediction_enabled 1 ; dependency checks
process_perf_data 0 ; don't send this to perfdata
retain_status_information 1 ; yes, once auto-OK'd keep it
retain_nonstatus_information 1
is_volatile 1 ; enable for passive checks
check_period 24x7 ; always check for submission
max_check_attempts 1 ; one trap is enough
normal_check_interval 1
retry_check_interval 1
contact_groups frankonly
notification_options w,r ; notify for warnings+recovery
notification_interval 1440 ; notify once a day
notification_period 24x7 ; always notify
register 0 ; template, don't register
service_groups patch-checks-win
}
##############################################################################
# Receive SNMP traps for Windows update status
##############################################################################
define service {
use generic-patch-win
hostgroup 2-windows-servers
name check_trap_winpatch
service_description check_trap_winpatch
}
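The service description check_trap_winpatch defined above is the name that has to appear in the passive check result. As a minimal Python sketch (again not the code of send_trap_data.pl), this is how such a result can be built as a PROCESS_SERVICE_CHECK_RESULT external command and written to the Nagios command file. The function names are made up, and the command file path is passed in because it depends on the command_file setting in nagios.cfg.

```python
import time


def passive_result(host, service, code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # A newline ends the command, so flatten multi-line output.
    clean = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{code};{clean}\n"
    )


def submit(command_file, line):
    """Write the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Placeholder host name; the service name is the one defined above.
print(passive_result("winserver1", "check_trap_winpatch", 1,
                     "2 outstanding patches", now=0), end="")
```

If the host name or the service description in this line does not match a defined Nagios object, the result cannot be assigned to a service, which is why the names must be kept in sync.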
vi command.cfg and add the definition for check_command no-patch-report:
# This will always return "OK" but tells us no patch report came in that day.
# see also http://nagios.sourceforge.net/docs/3_0/freshness.html
define command{
command_name no-patch-report
command_line $USER1$/check_dummy 0 "Daily patch check result not reported!"
}
susie3:/srv/app/nagios/etc/objects # echo "cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg" >> /srv/app/nagios/etc/nagios.cfg
susie3:/srv/app/nagios/etc/objects # /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
Comments on Windows Server Update Services (WSUS)
With most servers being set to use WSUS, Windows patches are approved in WSUS and then deployed on a fixed schedule. That means the patch check could *always* return OK, because patches only become visible to the system shortly before the actual patching. Also, we depend fully on the WSUS administrator to determine which patches are applicable. The solution? For the duration of our patch check, we switch from WSUS to the *official* Windows online update service, and back to WSUS after the check. It is quite an effort (registry key changes, proxy settings, etc.), but it is the only way to get an independent check. This code is still in development and testing; your comments are highly welcome.
Troubleshooting Tips
Implementing a passive service with SNMP traps is not for the faint of heart. Here are some tips to get it going:
- Check if the Windows trap data has been sent by calling win_update_trapsend.bat
- Check if trap data arrives at the Nagios server using a packet sniffer such as tcpdump or Ethereal
- Check if the snmptrapd daemon processes this data (firewall might block, daemon config might be wrong, daemon might not be running)
- Check if send_trap_data.pl generates the correct data for Nagios, using the tmp file
- Check if the data is received by Nagios by checking nagios.log file
- Check if the data is correctly associated with a Nagios service and hostname.
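For the last two checks, a small filter over nagios.log saves some scrolling. The following Python sketch makes no assumption about the layout of the log lines: it simply keeps every line that mentions both the host and the service description. The function name and the commented-out log path are placeholders.

```python
def matching_lines(lines, host, service):
    """Return the log lines that mention both the host and the service."""
    return [line.rstrip("\n") for line in lines
            if host in line and service in line]


# Usage sketch; the log file location is an assumption, adjust it to your setup:
# with open("/srv/app/nagios/var/nagios.log") as log:
#     for entry in matching_lines(log, "winserver1", "check_trap_winpatch"):
#         print(entry)
```

If the filter finds nothing after a trap has been sent by hand, the result most likely never reached Nagios, or it arrived under a different host or service name.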
Credits, copyrights original scripts etc
- The script win_update_trapsend.vbs, version 1.0
- The updated script send_trap_data.pl, version 1.2
- The older versions v1.1 and v1.0, just in case.
- Nagios and the Nagios community can be found at http://www.nagios.org/
- The Windows OS belongs to - who guessed: Microsoft http://www.microsoft.com/
- A local copy of TrapGen version 2.8 is saved here.
- Further Nagios documentation is available here http://nagios.fm4dd.com/docs/en/