Introduction


This how-to describes the final step of the upgrade: cutting the new Nagios version over into production after testing and customization have been completed.

===
This how-to is still being refined. It needs to be simplified, and some steps need to be re-ordered into a more logical sequence.
===

Synchronising the configuration files


First, we stop our running test instance of Nagios.

susie112:~ # ps -ef |grep nagios
nagios    4914     1  0 Aug12 ?        00:00:00 /srv/app/nagios-3.3.1/bin/nagios -d /srv/app/nagios-3.3.1/etc/nagios.cfg
nagios   16213     1  0 16:32 ?        00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root     17344  2451  0 16:45 pts/0    00:00:00 grep nagios

susie112:~ # killproc /srv/app/nagios-3.3.1/bin/nagios

susie112:~ # ps -ef |grep nagios
nagios   16213     1  0 16:32 ?        00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root     17372  2451  0 16:45 pts/0    00:00:00 grep nagios

If our Nagios production instance has a specific URL such as /nagios and we want to keep using the symbolic link /srv/app/nagios, re-compiling new binaries with these location parameters is an easier choice than keeping track of the file and link dependencies manually. For example, the Nagios CGIs have the configuration file location hardcoded.

Next, we rename the new configuration directory we used for testing, and copy the production configuration into its place.

susie112:~ # mv /srv/app/nagios-3.3.1/etc /srv/app/nagios-3.3.1/etc.test
susie112:~ # rsync -uvrogp /srv/app/nagios-3.2.3/etc /srv/app/nagios-3.3.1/
sending incremental file list
etc/
etc/cgi.cfg
etc/nagios.cfg
...
etc/objects/timeperiods.cfg
etc/objects/website-services.cfg

sent 175519 bytes  received 514 bytes  352066.00 bytes/sec
total size is 173661  speedup is 0.99

The configuration files still contain references to the old Nagios directories, so we update all configuration files under the new version with the new version's path. A pre-flight check then tells us whether this new configuration is safe to use.

susie112:~ # sed 's/\/srv\/app\/nagios\//\/srv\/app\/nagios-3\.3\.1\//g' -i /srv/app/nagios-3.3.1/etc/*.cfg
susie112:~ # sed 's/\/srv\/app\/nagios\//\/srv\/app\/nagios-3\.3\.1\//g' -i /srv/app/nagios-3.3.1/etc/objects/*

susie112:~ # grep /srv/app/nagios/ /srv/app/nagios-3.3.1/etc/*          
susie112:~ # grep /srv/app/nagios/ /srv/app/nagios-3.3.1/etc/*/*
susie112:~ #
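
The path rewrite and the follow-up grep shown above could be combined into one helper. This is only a sketch: the function name rewrite_prefix and its arguments are made up for illustration, it assumes GNU sed (for -i), and it assumes that neither path contains a "|" character or other characters that are special to sed.

```shell
# Sketch: rewrite an old install prefix to a new one in all .cfg files
# below a directory, then check that no reference to the old prefix is left.
# rewrite_prefix OLD NEW DIR   (hypothetical helper, not part of Nagios)
rewrite_prefix() {
    old=$1 new=$2 dir=$3
    # Using "|" as the sed delimiter avoids escaping every "/" in the paths.
    find "$dir" -type f -name '*.cfg' -exec sed -i "s|$old|$new|g" {} +
    # grep -F treats the old prefix as a fixed string; -l lists matching files.
    if grep -rlF -- "$old" "$dir"; then
        echo "old prefix still referenced in the files listed above" >&2
        return 1
    fi
}
```

With the paths used in this how-to, the call would be: rewrite_prefix /srv/app/nagios/ /srv/app/nagios-3.3.1/ /srv/app/nagios-3.3.1/etc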

susie112:~ # /srv/app/nagios-3.3.1/bin/nagios -v /srv/app/nagios-3.3.1/etc/nagios.cfg

Nagios Core 3.3.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 07-25-2011
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/srv/app/nagios/etc/objects/commands.cfg'...

Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
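
When the pre-flight check runs from a script, the two summary lines can be evaluated automatically instead of by eye. The following is a minimal sketch that parses the "Total Warnings" and "Total Errors" lines exactly as they appear in the output above. The function name preflight_clean is an assumption for illustration, and treating any warning as a failure is a deliberately strict choice.

```shell
# Sketch: read the text of a "nagios -v" run on stdin and succeed only if
# both summary lines are present and report zero.
# preflight_clean   (hypothetical helper, not part of Nagios)
preflight_clean() {
    awk '
        /^Total Warnings:/ { warnings = $3; seen++ }
        /^Total Errors:/   { errors = $3; seen++ }
        END { exit !(seen == 2 && warnings == 0 && errors == 0) }
    '
}

# Assumed usage with the paths from this how-to:
#   /srv/app/nagios-3.3.1/bin/nagios -v /srv/app/nagios-3.3.1/etc/nagios.cfg \
#       | preflight_clean && echo "pre-flight clean"
```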

We edit this configuration to disable all notifications for the first start of Nagios with the new production configuration. This way we avoid sending out a large number of false alerts to many people in case something goes wrong.

The way I handle this is by separating out all notification commands into a dedicated notification.cfg file. A separate file called no-notification.cfg has identical notification definitions, but the command behind each of them is a shell sleep. By switching between notification.cfg and no-notification.cfg I can quickly turn all notifications on or off.

susie112:~ # sed 's/^cfg_file=\/srv\/app\/nagios-3.3.1\/etc\/objects\/notification.cfg/cfg_file=\/srv\/app\/nagios-3.3.1\/etc\/objects\/no-notification.cfg/' -i /srv/app/nagios-3.3.1/etc/nagios.cfg
susie112:~ # grep cfg_file=/srv/app/nagios-3.3.1/etc/objects/no-notification.cfg /srv/app/nagios-3.3.1/etc/nagios.cfg
cfg_file=/srv/app/nagios-3.3.1/etc/objects/no-notification.cfg
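
Since the same switch is needed again when notifications are re-enabled, it may be convenient to wrap it in a small helper. This is a sketch under the assumption that nagios.cfg contains exactly one cfg_file= line ending in notification.cfg or no-notification.cfg. The function name set_notifications is made up for illustration, and GNU sed is assumed for -i.

```shell
# Sketch: point nagios.cfg at notification.cfg ("on") or
# no-notification.cfg ("off"), then print the resulting line.
# set_notifications on|off CFG   (hypothetical helper, not part of Nagios)
set_notifications() {
    mode=$1 cfg=$2
    case $mode in
        on)  from='no-notification\.cfg' to='notification.cfg' ;;
        off) from='notification\.cfg'    to='no-notification.cfg' ;;
        *)   echo "usage: set_notifications on|off CFG" >&2; return 2 ;;
    esac
    # The leading "/" anchors the match to the whole file name, so running
    # "off" twice does not turn the name into "no-no-notification.cfg".
    sed -i "/^cfg_file=/s|/$from\$|/$to|" "$cfg"
    grep '^cfg_file=.*notification\.cfg$' "$cfg"
}
```

Assumed usage: set_notifications off /srv/app/nagios-3.3.1/etc/nagios.cfg before the first start, and set_notifications on with the same file once the checks look healthy.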

Nagios State Synchronisation


Now we synchronise all state information from the existing Nagios instance into our new version.

susie112:~ # /etc/init.d/nagios stop
Stopping nagios: done.
susie112:~ # ps -ef |grep nagios
root      1202   344  0 08:31 pts/1    00:00:00 vi nagios-upgrade-procedure-golive.htm
root      1805   360  0 08:41 pts/2    00:00:00 grep nagios

susie112:~ # mv /srv/app/nagios-3.3.1/var /srv/app/nagios-3.3.1/var.test

susie112:~ # rsync -uvrogp /srv/app/nagios-3.2.3/var /srv/app/nagios-3.3.1/ 
sending incremental file list
var/nagios.log
var/objects.cache
var/retention.dat
var/archives/nagios-01-01-2011-00.log
var/archives/nagios-01-02-2011-00.log
...
var/archives/nagios-12-30-2010-00.log
var/archives/nagios-12-31-2010-00.log

sent 20970854 bytes  received 5582 bytes  13984290.67 bytes/sec
total size is 20949097  speedup is 1.00
susie112:~ #
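
Before starting the new instance, it can be reassuring to confirm that the copied state directory really matches its source. A minimal sketch using diff follows. The function name same_tree is an assumption for illustration, and the comparison only makes sense while both instances are stopped, because a running Nagios keeps writing to its var directory.

```shell
# Sketch: confirm that a copied directory tree matches its source.
# same_tree SRC DST   (hypothetical helper)
same_tree() {
    # diff -r walks both trees; -q only names the files that differ.
    # Exit status 0 means no differences were found.
    diff -rq "$1" "$2"
}

# Assumed usage with the paths from this how-to:
#   same_tree /srv/app/nagios-3.2.3/var /srv/app/nagios-3.3.1/var
```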

Production Cut-over


susie112:~ # ps -ef |grep nagios
root      2281   360  0 08:54 pts/2    00:00:00 grep nagios

susie112:~ # /etc/init.d/nagios start
Starting nagios: done.
susie112:/home/fm # tail /srv/app/nagios-3.3.1/var/nagios.log 
[1313279645] Caught SIGTERM, shutting down...
[1313279645] Successfully shutdown... (PID=1919)
[1313279661] Nagios 3.3.1 starting... (PID=2323)
[1313279661] Local time is Sun Aug 14 08:54:21 JST 2011
[1313279661] LOG VERSION: 2.0
[1313279661] Finished daemonizing... (New PID=2324)

susie112:~ # ps -ef |grep nagios
nagios    2324     1  0 08:54 ?        00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root      2372   360  0 08:55 pts/2    00:00:00 grep nagios

Fallback Procedure



susie112:~ # /etc/init.d/nagios stop
susie112:~ # rm /srv/app/nagios
susie112:~ # ln -s /srv/app/nagios-3.2.3 /srv/app/nagios

susie112:~ # rsync -uvrogp /srv/app/nagios-3.3.1/etc /srv/app/nagios-3.2.3
susie112:~ # rsync -uvrogp /srv/app/nagios-3.3.1/var /srv/app/nagios-3.2.3

susie112:~ # mv /srv/www/std-root/nagios.frank4dd.com/nagios /srv/www/std-root/nagios.frank4dd.com/nagios-3.3.1
susie112:~ # mv /srv/www/std-root/nagios.frank4dd.com/nagios-3.2.3 /srv/www/std-root/nagios.frank4dd.com/nagios

susie112:~ # /etc/init.d/nagios start
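
The rm and ln -s pair used above to repoint /srv/app/nagios can be folded into a single guarded step. The sketch below is one possible way to do it, not the procedure used on this system. The function name switch_link is made up for illustration, and the -n option of ln is assumed to be available, as it is in GNU coreutils.

```shell
# Sketch: repoint a symbolic link at a new target and print where it points.
# switch_link TARGET LINK   (hypothetical helper)
switch_link() {
    target=$1 link=$2
    # Refuse to touch anything that exists but is not a symbolic link.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symbolic link" >&2
        return 1
    fi
    # -n treats an existing link to a directory as the link itself instead
    # of creating a new link inside that directory; -f replaces it.
    ln -sfn "$target" "$link"
    readlink "$link"
}
```

Assumed usage for the fallback: switch_link /srv/app/nagios-3.2.3 /srv/app/nagios, run while Nagios is stopped.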

Post-Migration Tasks


Once we have confirmed that Nagios works properly and there are no check failures, we can re-enable the notifications.

susie112:~ # sed 's/^cfg_file=\/srv\/app\/nagios-3.3.1\/etc\/objects\/no-notification.cfg/cfg_file=\/srv\/app\/nagios-3.3.1\/etc\/objects\/notification.cfg/' -i /srv/app/nagios-3.3.1/etc/nagios.cfg

susie112:~ # grep notification.cfg /srv/app/nagios-3.3.1/etc/nagios.cfg
cfg_file=/srv/app/nagios-3.3.1/etc/objects/notification.cfg

susie112:~ # /etc/init.d/nagios reload

Finally, after our cool-off period (the time after which we assume a fallback is no longer needed), we can archive and remove the old Nagios instance.
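
One way to archive the old instance is to pack it into a compressed tar file and to check that the archive is readable before anything is deleted. This is a sketch only: the function name archive_instance and the archive location in the usage line are assumptions, and the removal itself is deliberately left as a separate manual step.

```shell
# Sketch: pack an instance directory into a compressed tar archive and
# report success only if the archive can be listed afterwards.
# archive_instance DIR ARCHIVE   (hypothetical helper)
archive_instance() {
    dir=$1 archive=$2
    # -C changes into the parent directory so the archive holds relative paths.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Listing the contents is a cheap check that the archive is readable.
    tar -tzf "$archive" >/dev/null
}

# Assumed usage (the archive path is an example, not from this system):
#   archive_instance /srv/app/nagios-3.2.3 /root/nagios-3.2.3.tar.gz
```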


This concludes the last stage of the upgrade. A command scratchpad for copy-and-paste is here. This site has a live view of Nagios 3.3.1.
