Introduction
This page describes the final step of the upgrade: cutting the new Nagios version over into production after testing and customization have been completed.
===
This how-to is still being refined. It needs to be simplified, and some steps need to be re-ordered into a more logical sequence.
===
Synchronising the Configuration Files
First, we stop our running test instance of Nagios.
susie112:~ # ps -ef |grep nagios
nagios 4914 1 0 Aug12 ? 00:00:00 /srv/app/nagios-3.3.1/bin/nagios -d /srv/app/nagios-3.3.1/etc/nagios.cfg
nagios 16213 1 0 16:32 ? 00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root 17344 2451 0 16:45 pts/0 00:00:00 grep nagios
susie112:~ # killproc /srv/app/nagios-3.3.1/bin/nagios
susie112:~ # ps -ef |grep nagios
nagios 16213 1 0 16:32 ? 00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root 17372 2451 0 16:45 pts/0 00:00:00 grep nagios
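Because `grep nagios` also matches itself and unrelated commands, a small filter on the command column can make the before/after comparison easier to read. The following is a minimal sketch only: the function name `nagios_daemons` is hypothetical, and it assumes the eight-column `ps -ef` layout shown in the listing above.

```shell
# Hypothetical helper: print PID and binary of every Nagios daemon found in
# `ps -ef` output read from stdin. Assumes the command sits in column 8.
nagios_daemons() {
    awk '$8 ~ /\/bin\/nagios$/ { print $2, $8 }'
}

# Demonstration with the process listing captured before the killproc call.
sample_ps='nagios 4914 1 0 Aug12 ? 00:00:00 /srv/app/nagios-3.3.1/bin/nagios -d /srv/app/nagios-3.3.1/etc/nagios.cfg
nagios 16213 1 0 16:32 ? 00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root 17344 2451 0 16:45 pts/0 00:00:00 grep nagios'
printf '%s\n' "$sample_ps" | nagios_daemons
```

On a live system the same filter would be fed directly, as in `ps -ef | nagios_daemons`.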
If we have a specific URL for our Nagios production instance, such as /nagios, and want to keep using the symbolic link /srv/app/nagios, re-compiling new binaries with these location parameters is easier than keeping track of the file and link dependencies manually. For example, the Nagios CGIs have the configuration file location hardcoded.
Next, we rename the new configuration directory we used for testing, and copy the production configuration into its place.
susie112:~ # mv /srv/app/nagios-3.3.1/etc /srv/app/nagios-3.3.1/etc.test
susie112:~ # rsync -uvrogp /srv/app/nagios-3.2.3/etc /srv/app/nagios-3.3.1/
sending incremental file list
etc/
etc/cgi.cfg
etc/nagios.cfg
...
etc/objects/timeperiods.cfg
etc/objects/website-services.cfg
sent 175519 bytes received 514 bytes 352066.00 bytes/sec
total size is 173661 speedup is 0.99
The copied configuration files still reference the old Nagios directories, so we update every configuration file under the new version with the new version's path. A pre-flight check then tells us whether the new configuration is safe to use.
susie112:~ # sed 's|/srv/app/nagios/|/srv/app/nagios-3.3.1/|g' -i /srv/app/nagios-3.3.1/etc/*.cfg
susie112:~ # sed 's|/srv/app/nagios/|/srv/app/nagios-3.3.1/|g' -i /srv/app/nagios-3.3.1/etc/objects/*
susie112:~ # grep /srv/app/nagios/ /srv/app/nagios-3.3.1/etc/*
susie112:~ # grep /srv/app/nagios/ /srv/app/nagios-3.3.1/etc/*/*
susie112:~ #
susie112:~ # /srv/app/nagios-3.3.1/bin/nagios -v /srv/app/nagios-3.3.1/etc/nagios.cfg
Nagios Core 3.3.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 07-25-2011
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Processing object config file '/srv/app/nagios/etc/objects/commands.cfg'...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
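The path rewrite and the follow-up grep can be folded into one step that fails loudly if any stale path survives. This is a rough sketch under a few assumptions: GNU sed (for `-i`), prefixes that contain no `|` character and no regex metacharacters in the old prefix, and a hypothetical function name `rewrite_paths`. The demonstration runs against a scratch directory with stub files, not a real Nagios tree.

```shell
# Hypothetical helper: replace an old path prefix with a new one in every
# file below a config directory, then fail if the old prefix is still found.
# $1 = config directory, $2 = old prefix, $3 = new prefix
rewrite_paths() {
    grep -rlF -- "$2" "$1" | while IFS= read -r f; do
        sed -i "s|$2|$3|g" "$f"
    done
    if grep -rqF -- "$2" "$1"; then
        echo "stale references to $2 remain under $1" >&2
        return 1
    fi
}

# Demonstration on stub files in a scratch directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/objects"
printf 'log_file=/srv/app/nagios/var/nagios.log\n' > "$scratch/etc/nagios.cfg"
printf '# plugins live in /srv/app/nagios/libexec\n' > "$scratch/etc/objects/commands.cfg"

rewrite_paths "$scratch/etc" /srv/app/nagios/ /srv/app/nagios-3.3.1/ \
    && echo "no stale paths left"
```

Running the function a second time is harmless, because the new prefix no longer contains the old one.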
We edit this configuration to disable all notifications for the first start of Nagios with the new production configuration. This way we avoid sending a large number of false alerts to many people if something goes wrong.
I handle this by moving all notification commands into a dedicated notification.cfg file. A second file, no-notification.cfg, defines the same notification commands, but each one merely runs a shell sleep. By switching between notification.cfg and no-notification.cfg I can quickly turn all notifications on or off.
susie112:~ # sed 's|^cfg_file=/srv/app/nagios-3\.3\.1/etc/objects/notification\.cfg|cfg_file=/srv/app/nagios-3.3.1/etc/objects/no-notification.cfg|' -i /srv/app/nagios-3.3.1/etc/nagios.cfg
susie112:~ # grep cfg_file=/srv/app/nagios-3.3.1/etc/objects/no-notification.cfg /srv/app/nagios-3.3.1/etc/nagios.cfg
cfg_file=/srv/app/nagios-3.3.1/etc/objects/no-notification.cfg
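Since the same switch is flipped again after the migration, it may be convenient to wrap it in one function that works in both directions. The sketch below assumes GNU sed and a `cfg_file=` line ending in `/objects/notification.cfg` or `/objects/no-notification.cfg`; the function name `set_notifications` is made up for this example, and the demonstration edits a temporary stub file only.

```shell
# Hypothetical helper: point nagios.cfg at the real or the muted
# notification command file.
# $1 = path to nagios.cfg, $2 = "on" or "off"
set_notifications() {
    case $2 in
        on)  from='no-notification\.cfg'; to='notification.cfg' ;;
        off) from='notification\.cfg';    to='no-notification.cfg' ;;
        *)   echo "usage: set_notifications <nagios.cfg> on|off" >&2
             return 2 ;;
    esac
    # Anchoring on "/objects/" keeps "off" from turning an already muted
    # line into "no-no-notification.cfg".
    sed -i "s|^\(cfg_file=.*/objects/\)$from\$|\1$to|" "$1"
}

# Demonstration on a stub configuration file.
cfg=$(mktemp)
printf '%s\n' \
    'cfg_file=/srv/app/nagios-3.3.1/etc/objects/commands.cfg' \
    'cfg_file=/srv/app/nagios-3.3.1/etc/objects/notification.cfg' > "$cfg"

set_notifications "$cfg" off
grep notification "$cfg"
```

Calling the function with `on` reverses the change, and repeating either direction leaves the file unchanged.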
Nagios State Synchronisation
Now we synchronise all state information from the existing Nagios instance into our new version.
susie112:~ # /etc/init.d/nagios stop
Stopping nagios: done.
susie112:~ # ps -ef |grep nagios
root 1202 344 0 08:31 pts/1 00:00:00 vi nagios-upgrade-procedure-golive.htm
root 1805 360 0 08:41 pts/2 00:00:00 grep nagios
susie112:~ # mv /srv/app/nagios-3.3.1/var /srv/app/nagios-3.3.1/var.test
susie112:~ # rsync -uvrogp /srv/app/nagios-3.2.3/var /srv/app/nagios-3.3.1/
sending incremental file list
var/nagios.log
var/objects.cache
var/retention.dat
var/archives/nagios-01-01-2011-00.log
var/archives/nagios-01-02-2011-00.log
...
var/archives/nagios-12-30-2010-00.log
var/archives/nagios-12-31-2010-00.log
sent 20970854 bytes received 5582 bytes 13984290.67 bytes/sec
total size is 20949097 speedup is 1.00
susie112:~ #
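Before starting the new version it can be worth confirming that the copied state directory really matches the original, in particular retention.dat, which appears in the rsync listing above. The following is only a sketch: `verify_state_copy` is a hypothetical name, and the demonstration uses `cp` on a scratch directory in place of the rsync call so that it can run anywhere.

```shell
# Hypothetical helper: check that a copied var directory contains a
# non-empty retention.dat and is identical to its source.
# $1 = source var directory, $2 = copied var directory
verify_state_copy() {
    if [ ! -s "$2/retention.dat" ]; then
        echo "retention.dat missing or empty in $2" >&2
        return 1
    fi
    diff -rq "$1" "$2"
}

# Demonstration with stub state files in a scratch directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/old/var/archives" "$scratch/new"
echo "stub state" > "$scratch/old/var/retention.dat"
echo "stub log"   > "$scratch/old/var/archives/nagios-01-01-2011-00.log"
cp -pR "$scratch/old/var" "$scratch/new/"

verify_state_copy "$scratch/old/var" "$scratch/new/var" \
    && echo "state copy verified"
```

The comparison is only meaningful while the old instance is stopped, as it is at this point in the procedure.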
Production Cut-over
susie112:~ # ps -ef |grep nagios
root 2281 360 0 08:54 pts/2 00:00:00 grep nagios
susie112:~ # /etc/init.d/nagios start
Starting nagios: done.
susie112:/home/fm # tail /srv/app/nagios-3.3.1/var/nagios.log
[1313279645] Caught SIGTERM, shutting down...
[1313279645] Successfully shutdown... (PID=1919)
[1313279661] Nagios 3.3.1 starting... (PID=2323)
[1313279661] Local time is Sun Aug 14 08:54:21 JST 2011
[1313279661] LOG VERSION: 2.0
[1313279661] Finished daemonizing... (New PID=2324)
susie112:~ # ps -ef |grep nagios
nagios 2324 1 0 08:54 ? 00:00:00 /srv/app/nagios/bin/nagios -d /srv/app/nagios/etc/nagios.cfg
root 2372 360 0 08:55 pts/2 00:00:00 grep nagios
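The log check after the start can also be scripted. The sketch below looks only for the three message types visible in the log excerpt above ("starting...", "Finished daemonizing...", "shutting down...") and assumes their wording stays the same; the function name `startup_ok` is hypothetical.

```shell
# Hypothetical helper: succeed if the most recent start recorded in a
# Nagios log reached "Finished daemonizing" and was not followed by a
# shutdown.
# $1 = path to nagios.log
startup_ok() {
    awk '
        /Nagios .* starting\.\.\./ { started = 1; daemonized = 0 }
        /Finished daemonizing/     { if (started) daemonized = 1 }
        /shutting down\.\.\./      { started = 0; daemonized = 0 }
        END { exit !(started && daemonized) }
    ' "$1"
}

# Demonstration with the log lines shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
[1313279645] Caught SIGTERM, shutting down...
[1313279645] Successfully shutdown... (PID=1919)
[1313279661] Nagios 3.3.1 starting... (PID=2323)
[1313279661] Local time is Sun Aug 14 08:54:21 JST 2011
[1313279661] LOG VERSION: 2.0
[1313279661] Finished daemonizing... (New PID=2324)
EOF

startup_ok "$log" && echo "daemon started and detached"
```

This only confirms that the daemon came up; check results still need to be watched in the web interface or the log.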
Fallback Procedure
susie112:~ # /etc/init.d/nagios stop
susie112:~ # rm /srv/app/nagios
susie112:~ # ln -s /srv/app/nagios-3.2.3 /srv/app/nagios
susie112:~ # rsync -uvrogp /srv/app/nagios-3.3.1/etc /srv/app/nagios-3.2.3
susie112:~ # rsync -uvrogp /srv/app/nagios-3.3.1/var /srv/app/nagios-3.2.3
susie112:~ # mv /srv/www/std-root/nagios.frank4dd.com/nagios /srv/www/std-root/nagios.frank4dd.com/nagios-3.3.1
susie112:~ # mv /srv/www/std-root/nagios.frank4dd.com/nagios-3.2.3 /srv/www/std-root/nagios.frank4dd.com/nagios
susie112:~ # /etc/init.d/nagios start
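The `rm` plus `ln -s` pair in the fallback can be replaced by a single `ln -sfn` call, which also removes the risk of forgetting one of the two steps. The following is a sketch with a hypothetical function name `switch_link`; it assumes an `ln` that supports `-n` (GNU coreutils does) and is demonstrated on scratch directories only.

```shell
# Hypothetical helper: repoint a symbolic link at another directory.
# $1 = target directory, $2 = symlink path
switch_link() {
    if [ ! -d "$1" ]; then
        echo "target $1 does not exist" >&2
        return 1
    fi
    # Refuse to touch a real file or directory; only replace a symlink.
    if [ -e "$2" ] && [ ! -L "$2" ]; then
        echo "$2 exists and is not a symbolic link" >&2
        return 1
    fi
    ln -sfn "$1" "$2"
}

# Demonstration: fall back from a 3.3.1 stub to a 3.2.3 stub.
scratch=$(mktemp -d)
mkdir "$scratch/nagios-3.2.3" "$scratch/nagios-3.3.1"
ln -s "$scratch/nagios-3.3.1" "$scratch/nagios"

switch_link "$scratch/nagios-3.2.3" "$scratch/nagios"
readlink "$scratch/nagios"
```

The two guards are a design choice for this sketch: without `-n` or the symlink check, `ln` could create the new link inside the directory the old link points to.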
Post-Migration Tasks
Once we have verified that Nagios works properly and there are no check failures, we can re-enable the notifications.
susie112:~ # sed 's|^cfg_file=/srv/app/nagios-3\.3\.1/etc/objects/no-notification\.cfg|cfg_file=/srv/app/nagios-3.3.1/etc/objects/notification.cfg|' -i /srv/app/nagios-3.3.1/etc/nagios.cfg
susie112:~ # grep notification.cfg /srv/app/nagios-3.3.1/etc/nagios.cfg
cfg_file=/srv/app/nagios-3.3.1/etc/objects/notification.cfg
susie112:~ # /etc/init.d/nagios reload
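Because the reload picks up an edited nagios.cfg, it seems prudent to repeat the pre-flight check first and reload only if it passes. The sketch below assumes that `nagios -v` exits with a non-zero status when the check fails; the function name `checked_reload` is hypothetical, and the paths in the usage comment are the ones used throughout this page.

```shell
# Hypothetical helper: run the pre-flight check and reload only on success.
# $1 = nagios binary, $2 = nagios.cfg, $3 = init script
checked_reload() {
    if "$1" -v "$2" >/dev/null; then
        "$3" reload
    else
        echo "pre-flight check failed, not reloading" >&2
        return 1
    fi
}

# Usage on the production host (not executed here):
# checked_reload /srv/app/nagios-3.3.1/bin/nagios \
#     /srv/app/nagios-3.3.1/etc/nagios.cfg /etc/init.d/nagios
```

If the check fails, running the `nagios -v` command by hand shows the full error report that the function suppresses.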
Finally, after our cool-off period (the time after which we assume a fallback will no longer be needed), we can archive and remove the old Nagios instance.
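One possible way to archive and remove the old instance is sketched below. It is not the exact procedure used on this host: the function name `archive_instance` and the tarball location are assumptions, and the demonstration works on a stub directory. The old tree is deleted only after the archive has been written and listed successfully.

```shell
# Hypothetical helper: pack a directory into a gzip-compressed tarball,
# verify the tarball is readable, then remove the directory.
# $1 = directory to archive, $2 = tarball to create
archive_instance() {
    # Refuse symlinks such as /srv/app/nagios, which may point at the
    # live instance.
    if [ ! -d "$1" ] || [ -L "$1" ]; then
        echo "$1 is not a real directory" >&2
        return 1
    fi
    parent=$(dirname "$1")
    name=$(basename "$1")
    tar -czf "$2" -C "$parent" "$name" || return 1
    tar -tzf "$2" >/dev/null || return 1
    rm -rf "$1"
}

# Demonstration on a stub of the old instance.
scratch=$(mktemp -d)
mkdir -p "$scratch/nagios-3.2.3/etc"
echo "stub" > "$scratch/nagios-3.2.3/etc/nagios.cfg"

archive_instance "$scratch/nagios-3.2.3" "$scratch/nagios-3.2.3.tar.gz" \
    && echo "archived and removed"
```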
This concludes the last stage of the upgrade. A command scratchpad for copy-and-paste is here. This site has a live view of Nagios 3.3.1.