Introduction


Monitoring custom application log files with Nagios is a complex enough task to deserve a tutorial of its own. I believe there is no turnkey solution; to solve it, we need to break the requirement into functional parts and combine existing software.

In this particular example, I will build the Windows log monitoring from three software components: the check_logfiles plugin, the NSClient++ agent, and the check_nrpe plugin on the Nagios server.

Design


Monitoring custom Windows logfiles design

Part 1: Custom log parsing under Windows based on Error and Warn criteria, generating Nagios compatible output


The check_logfiles plugin parses the log based on freely defined patterns and returns a status that Nagios can read. It is available as a Windows executable and integrates with NSClient++. Place it anywhere convenient on your Windows server: either in a directory of its own or, later, in the scripts directory of the NSClient++ installation. Next, we create a log parsing configuration file, which I named check_logfiles.cfg. Our example log file is D:\data\logs\log1.log, and we define example warning and critical strings to search for.

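# Macro for the base directory, referenced below as $LOGDIR$
$MACROS = { LOGDIR => 'D:\data' };
@searches = ({
  # The tag becomes the prefix of the performance data labels
  tag => 'apperror',
  logfile => '$LOGDIR$\logs\log1.log',
  # A line matching one of these patterns sets the status to CRITICAL
  criticalpatterns => [
      'ERROR timestamp appfailure 1',
      'ERROR timestamp appfailure 2' ],
  # A line matching one of these patterns sets the status to WARNING
  warningpatterns => [
      'WARN timestamp appwarning 1',
      'WARN timestamp appwarning 2' ]
});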
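To make the effect of these patterns concrete, here is a minimal Python sketch of the classification logic. It is only an illustration, not how check_logfiles is implemented: the plugin evaluates its patterns as Perl regular expressions, and Python's re module is similar but not identical. The function name classify_line is hypothetical.

```python
import re

# Patterns copied from the example check_logfiles.cfg.
CRITICAL_PATTERNS = [
    r"ERROR timestamp appfailure 1",
    r"ERROR timestamp appfailure 2",
]
WARNING_PATTERNS = [
    r"WARN timestamp appwarning 1",
    r"WARN timestamp appwarning 2",
]


def classify_line(line):
    """Return 'CRITICAL', 'WARNING' or 'OK' for a single log line."""
    # Critical patterns are checked first so they win over warnings.
    if any(re.search(p, line) for p in CRITICAL_PATTERNS):
        return "CRITICAL"
    if any(re.search(p, line) for p in WARNING_PATTERNS):
        return "WARNING"
    return "OK"
```

Because the patterns are searched anywhere in the line, extra text before or after the pattern (such as a "caused by" suffix) still produces a match.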

Next, we run check_logfiles.exe on the Windows system against our example log file to test the configuration we just created. In the example below, I inserted a test string into the log file to check that the critical pattern matches.

C:\logmonitor> check_logfiles.exe -f check_logfiles.cfg
CRITICAL - (1 errors in check_logfiles.protocol-2011-06-17-15-02-37) - ERROR timestamp appfailure 1 caused by xxx 
|apperror_lines=2 apperror_warnings=0 apperror_criticals=1 apperror_unknowns=0
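The output follows the usual Nagios plugin layout: a status text, then a pipe character, then performance data. As a hedged illustration, the following Python sketch splits such a line into its parts. The function name parse_plugin_output is hypothetical, and the sketch assumes plain integer counters like the ones check_logfiles prints here.

```python
def parse_plugin_output(output):
    """Split plugin output into (status, message, perfdata dict)."""
    text, _, perf = output.partition("|")
    # The status keyword is the part before the first " - ".
    status = text.split(" - ", 1)[0].strip()
    message = text.strip()
    perfdata = {}
    for token in perf.split():
        label, _, value = token.partition("=")
        perfdata[label] = int(value)  # assumes integer counters only
    return status, message, perfdata


sample = (
    "CRITICAL - (1 errors in check_logfiles.protocol-2011-06-17-15-02-37) "
    "- ERROR timestamp appfailure 1 caused by xxx "
    "|apperror_lines=2 apperror_warnings=0 apperror_criticals=1 apperror_unknowns=0"
)
status, message, perfdata = parse_plugin_output(sample)
print(status, perfdata["apperror_criticals"])
```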

Part 2: Initiating the log check and transporting the result back to Nagios


Next comes NSClient++, a free monitoring agent for Windows (32-bit and 64-bit). It collects various standard performance data and can also call external scripts for custom monitoring, such as our check_logfiles.exe plugin. During the installation of NSClient++, we should enable the NRPE server module so that it listens for check_nrpe service requests. While NSClient++ can also be queried with the built-in Nagios plugin 'check_nt', the additional plugin 'check_nrpe' is needed to use NSClient++ with external scripts.

After installing the NSClient++ software on the Windows system and the NRPE check plugin on the Nagios server, we test the monitoring connection from the Nagios server to verify that network connectivity works:

susie112:/srv/app/nagios/libexec # ./check_nrpe -H 192.168.103.184
I (0.3.8.75 2010-05-27) seem to be doing fine...

Now we can build the connection between the NSClient++ agent and the check_logfiles.exe plugin on the Windows system. We need to edit the NSClient++ configuration file NSC.ini, enable the line CheckExternalScripts.dll under the [modules] section, and add a line to the [External Scripts] section describing how to call our script:

check_logfiles=C:\logmonitor\check_logfiles.exe -f C:\logmonitor\check_logfiles.cfg

After that we are ready to try and call the log parsing from our Nagios server with the check_nrpe plugin.

susie112:/srv/app/nagios/libexec # ./check_nrpe -H 192.168.103.184 -c check_logfiles
OK - no errors or warnings|apperror_lines=0 apperror_warnings=0 apperror_criticals=0 apperror_unknowns=0

If this works, the only remaining task is to add the usual Nagios service configuration definitions and test it end-to-end.
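Nagios decides the service state from the plugin's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), not from the text. For scripted end-to-end tests, a small Python sketch like the following can run a check command and report the resulting state. The helper name run_check is hypothetical, and the 60-second default simply mirrors the -t 60 timeout used in the command definition in this howto.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
EXIT_CODE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=60):
    """Run a plugin command and return (state name, first output line)."""
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    # Any exit code outside 0..3 is treated as UNKNOWN.
    state = EXIT_CODE_NAMES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return state, first_line
```

On the Nagios server from this example, it could be called as run_check(["/srv/app/nagios/libexec/check_nrpe", "-H", "192.168.103.184", "-c", "check_logfiles"]).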

Nagios Service Configuration


susie112:~ # vi /srv/app/nagios/etc/objects/command.cfg
# 'check_nrpe' command definition
define command{
  command_name check_nrpe
  command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 60
}

nagios@susie112:~> vi /srv/app/nagios/etc/objects/win-logfile-services.cfg
###############################################################################
# Define a servicegroup for Windows log file checks
###############################################################################
define servicegroup{
  servicegroup_name        win-logfile-checks  ; The name of the servicegroup
  alias                    Windows Log Checks  ; Long name of the group
}
###############################################################################
# Define the generic log file check service template
###############################################################################
define service{
  name                          generic-win-logfile
  active_checks_enabled         1
  passive_checks_enabled        0
  parallelize_check             1
  obsess_over_service           1
  check_freshness               0
  notifications_enabled         1
  event_handler_enabled         1
  process_perf_data             0
  retain_status_information     1
  retain_nonstatus_information  1
  is_volatile                   1
  check_period                  24x7
  max_check_attempts            1
  normal_check_interval         5               ; check every 5 minutes
  retry_check_interval          1
  contact_groups                win-admins, security-team
  notification_options          w,u,c,r         ; notify warning, unknown, critical, recovery
  notification_interval         1               ; ignored for volatile services
  notification_period           24x7
  register                      0
  servicegroups                 win-logfile-checks
}
###############################################################################
# Windows logfile checks
###############################################################################
define service {
  use                           generic-win-logfile
  host_name                     winserver1
  service_description           check_logfiles
  check_command                 check_nrpe!check_logfiles
}

The check_logfiles plugin reports only new alerts on each run. To be alerted for every occurrence, we need to set the Nagios parameter is_volatile, as done in the service template above.
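The "only new alerts" behaviour comes from the plugin remembering how far it has already read the log. The Python sketch below shows the general idea with a saved byte offset. It is a simplified illustration under my own assumptions, not the plugin's actual implementation; the function name read_new_lines and the state file layout are hypothetical.

```python
import os


def read_new_lines(logfile, statefile):
    """Return the lines appended to logfile since the previous call."""
    offset = 0
    if os.path.exists(statefile):
        with open(statefile) as f:
            offset = int(f.read().strip() or 0)
    # Start from the beginning if the log was truncated or rotated.
    if offset > os.path.getsize(logfile):
        offset = 0
    with open(logfile, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    # Remember the position for the next run.
    with open(statefile, "w") as f:
        f.write(str(new_offset))
    return data.decode("utf-8", errors="replace").splitlines()
```

A run that finds no new lines returns an empty list, which corresponds to an OK result. Without is_volatile, a second error arriving while the service is still in a non-OK state would not be treated as a new event.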

Example Screenshots


Windows Log monitoring service state detail

Windows Log monitoring service line example ok
Windows Log monitoring service line example error

Monitoring standard performance data through NSClient++ and NRPE


I mentioned that the Windows monitoring agent NSClient++ can also be used to retrieve standard performance data. For completeness, here is a quick example that retrieves the system's CPU load with the following call from the Nagios server:

susie112:/srv/app/nagios/libexec # ./check_nrpe -H 192.168.103.184 -c CheckCPU -a warn=80 crit=90 time=20m time=10s time=4
OK CPU Load ok.|'20m'=0%;80;90; '10s'=12%;80;90; '4'=0%;80;90;
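Each performance data item in this output has the form 'label'=value;warn;crit;, so the thresholds passed with warn=80 and crit=90 are echoed back next to every value. As a rough sketch, the following Python code extracts them. The name parse_perfdata is hypothetical, and the regular expression is only meant to cover the format shown here, not every perfdata variant.

```python
import re

# Optionally quoted label, value with unit, then warning and critical thresholds.
PERF_ITEM = re.compile(r"'?([^'=]+)'?=([\d.]+)([^;\s]*);([\d.]*);([\d.]*);?")


def parse_perfdata(perf):
    """Return {label: {'value', 'unit', 'warn', 'crit'}} for a perfdata string."""
    items = {}
    for label, value, unit, warn, crit in PERF_ITEM.findall(perf):
        items[label.strip()] = {
            "value": float(value),
            "unit": unit,
            "warn": float(warn) if warn else None,
            "crit": float(crit) if crit else None,
        }
    return items


cpu = parse_perfdata("'20m'=0%;80;90; '10s'=12%;80;90; '4'=0%;80;90;")
print(cpu["10s"])
```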

If no data is returned, please check the NSClient++ log file and make sure the parameter 'allow_arguments=1' is set in the [NRPE] section of NSC.ini.

Considering alternative solutions


Instead of implementing active monitoring, we could also set up passive monitoring: schedule the check_logfiles plugin within Windows and return the parse result through SNMP traps using the TrapGen program, similar to our setup in the Windows Patch Update Monitoring Howto. This would save us from installing an extra monitoring daemon on Windows, which is often a security and support concern.

There is a multitude of software out there that makes other combinations possible as well. For example, I found that CornerBowls Log Manager software looks promising, and it is not expensive. It can do the log parsing and already has an SNMP trap send function integrated. In the end, the best plugin and method depends on your preferences and on your network and server environment.

Hope this helps in getting started!

Credits, Links and additional information


More Information: