Introduction
Nagiosgraph collects performance measurement data from the Nagios monitoring system and stores it in RRD databases. This data can be displayed as graph images to visualize trends and to compare current values against historic ones.
- Original Author: (c) Soren Dossing, 2005
- License: OSI Artistic License
The information on this page is valid for Nagiosgraph version 0.9.1. I have been running this version since 2008 without any issues in medium-to-large environments (more than 1,000 hosts and 4,000 service checks).
Design
Nagiosgraph is an interface program between Nagios and RRD data files. Its simplicity comes from three factors: it doesn't do much, behavior is programmed rather than configurable, and new Nagios data is detected and included automatically.
Nagiosgraph has two functions:
- Collect performance data provided by Nagios
- Create and display performance data graph images
Nagios can be configured to write the additional performance data of its service checks to an external location for further processing. The Nagiosgraph script insert.pl collects this performance data, parses it against a configuration file called map, and creates or updates the RRD database files as necessary. Typically, there is one RRD database file per service check and host. For a minimum of file organisation, all checks belonging to one host are stored in a directory named after its Nagios host name. This works without issues even in larger configurations with 1,500+ hosts.
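The one-file-per-check layout can be sketched in Python as follows. This is only an illustration: the escaping rule (every character except letters, digits and underscore becomes %XX) and the "___" separator are assumptions inferred from the single file name check%2Dhost%2Dalive___ping.rrd in the installation listing on this page, and the function names are hypothetical. Whether host names are escaped as well is not shown there, so the host is used as-is.

```python
import re
from pathlib import PurePosixPath


def escape_name(name: str) -> str:
    """Percent-encode everything except letters, digits and underscore (assumed rule)."""
    return re.sub(
        r"[^A-Za-z0-9_]",
        lambda m: "%{:02X}".format(ord(m.group())),
        name,
    )


def rrd_path(rrddir: str, host: str, service: str, db: str) -> PurePosixPath:
    """Build <rrddir>/<host>/<escaped service>___<db>.rrd (assumed naming scheme)."""
    return PurePosixPath(rrddir) / host / f"{escape_name(service)}___{db}.rrd"


print(rrd_path("/srv/app/nagiosgraph-0.9.1/rrd", "susie114", "check-host-alive", "ping"))
```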
Nagiosgraph will automatically detect when new hosts or services have been added in Nagios. A configuration update is only needed when Nagios gets a completely new type of service that does not have an existing Nagiosgraph map entry.
Performance graphs are integrated into Nagios through the so-called extended service information pages. An icon and link are added to each service check page in Nagios; the link opens the matching performance graph page. The Nagiosgraph script show.cgi provides the necessary functionality.
Basic Installation - Nagiosgraph Side
- Check required packages: Perl, CGI, and RRDtool. We assume Nagios is already installed and working.
susie114:/tmp # rpm -q -a |grep -E 'CGI|rrd'
perl-CGI-Application-4.31-16.1.2.i586
rrdtool-1.4.5-10.1.4.i586
- Download the Nagiosgraph package nagiosgraph-fixed-v0.9.1.tar.gz:
susie114:/tmp # wget http://nagios.fm4dd.com/nagiosgraph/source/nagiosgraph-fixed-v0.9.1.tar.gz
Resolving nagios.fm4dd.com (nagios.fm4dd.com)... 70.85.16.97
Connecting to nagios.fm4dd.com (nagios.fm4dd.com)|70.85.16.97|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20492 (20K) [application/x-gzip]
Saving to: `nagiosgraph-fixed-v0.9.1.tar.gz'
100%[==================================================>] 20,492 --.-K/s in 0s
2012-05-12 18:47:46 (141 MB/s) - `nagiosgraph-fixed-v0.9.1.tar.gz' saved [20492/20492]
susie114:/tmp #
- Extract Nagiosgraph and place it into the filesystem. In my example, I put it into /srv/app/.
- Set file access rights:
- For the "rrddir" location: The Nagios user (nagios) must be able to write and the Apache user (wwwrun) must be able to read from there.
- For the "logfile" location: Both the Nagios and the Apache user (nagios, wwwrun) must be able to write to it.
susie114:/tmp # chown -R nagios:nagios /srv/app/nagiosgraph-0.9.1
susie114:/tmp # chmod 6775 /srv/app/nagiosgraph-0.9.1/log
susie114:/tmp # chown wwwrun /srv/app/nagiosgraph-0.9.1/log/nagiosgraph.log
susie114:/tmp # cd /srv/app/nagiosgraph-0.9.1
- Verify the Nagiosgraph file structure:
susie114:/srv/app/nagiosgraph-0.9.1 # ls -lR
.:
total 20
drwxr-xr-x 2 nagios nagios 4096 May 8 14:07 bin
drwxr-xr-x 2 nagios nagios 4096 May 8 13:52 etc
drwxrwsr-x 2 nagios nagios 4096 May 12 17:22 log
drwxr-xr-x 6 nagios nagios 4096 Sep 10 2011 rrd

./bin:
total 32
-rwxr-xr-x 1 nagios nagios 6699 May 8 13:40 insert.pl
-rwxr-xr-x 1 nagios nagios 974 Dec 12 2008 testentry.pl

./cgi-bin:
total 28
-rwxr-xr-x 1 nagios nagios 21583 Mar 24 2009 show.cgi
-rwxr-xr-x 1 nagios nagios 3646 Dec 12 2008 testcolor.cgi

./etc:
total 44
-rw-r--r-- 1 nagios nagios 37969 Oct 28 2011 map
-rw-r--r-- 1 nagios nagios 1640 May 8 13:52 nagiosgraph.conf

./log:
total 8
-rw-rw-r-- 1 nagios nagios 0 May 12 17:22 host-perfdata.log
-rw-r--r-- 1 wwwrun nagios 0 May 8 14:02 nagiosgraph.log
-rw-rw-r-- 1 nagios nagios 0 May 12 17:22 service-perfdata.log

./rrd:
total 16
drwxrwxr-x 2 nagios nagios 4096 May 8 13:53 susie114

./rrd/susie114:
total 908
-rw-rw-r-- 1 nagios nagios 47624 May 12 17:20 check%2Dhost%2Dalive___ping.rrd
- Set correct paths, debug level etc. in nagiosgraph.conf.
- Update the insert.pl and show.cgi scripts: Set the line "my $configfile = '...'" to point to the full path of the nagiosgraph.conf file.
- Configure Apache to point to show.cgi. For example:
susie114:/tmp # vi /etc/apache2/vhosts/nagios.fm4dd.com
...
ScriptAlias /nagiosgraph/ /srv/app/nagiosgraph/cgi-bin/
...
Consider security by configuring Apache access control and/or authentication rules. Another easy way to add show.cgi is to simply copy it into the existing Nagios 'cgi-bin' location.
- Copy an icon image (approx. 40x40 pixels) to /<nagioshome>/share/images/logos/notes.gif for Nagios to link to graphs.
- Copy nagiosgraph.css to /<nagioshome>/share/stylesheets/. Optionally, merge it with your own site styling for a consistent look and feel.
The extract-and-move step near the top of this list looks like this:
susie114:/tmp # tar xfvz nagiosgraph-fixed-v0.9.1.tar.gz
nagiosgraph-0.9.1/
nagiosgraph-0.9.1/bin/
nagiosgraph-0.9.1/bin/testentry.pl
nagiosgraph-0.9.1/bin/insert.pl
nagiosgraph-0.9.1/etc/
nagiosgraph-0.9.1/etc/map
nagiosgraph-0.9.1/etc/nagiosgraph.conf
nagiosgraph-0.9.1/log/
nagiosgraph-0.9.1/log/host-perfdata.log
nagiosgraph-0.9.1/log/nagiosgraph.log
nagiosgraph-0.9.1/log/service-perfdata.log
nagiosgraph-0.9.1/cgi-bin/
nagiosgraph-0.9.1/cgi-bin/show.cgi
nagiosgraph-0.9.1/cgi-bin/testcolor.cgi
nagiosgraph-0.9.1/docs/
nagiosgraph-0.9.1/docs/README
nagiosgraph-0.9.1/docs/README.map
nagiosgraph-0.9.1/docs/CHANGELOG
nagiosgraph-0.9.1/docs/INSTALL
susie114:/tmp # mv nagiosgraph-0.9.1 /srv/app
Configuration - Nagios Side
Nagios must be configured to send performance data in append mode to a log file. Then it needs to invoke insert.pl at regular intervals to retrieve and parse lines from that file in order to update the RRD database files. The Nagios log file format must be set to match Nagiosgraph expectations.
- Set and verify the following parameters in nagios.cfg:
Make sure that the location of service_perfdata_file matches the 'perflog' setting defined in nagiosgraph.conf.
susie114:/tmp # vi /srv/app/nagios/etc/nagios.cfg
...
# This option enables host performance data to be processed using the
# host_perfdata_command and service performance data will be processed
# using the service_perfdata_command.
process_performance_data=1

# HOST AND SERVICE PERFORMANCE DATA FILES
# These files are used to store host and service performance data.
# I am using the same file for both host and service perfdata.
host_perfdata_file=/srv/app/nagiosgraph/log/service-perfdata.log
service_perfdata_file=/srv/app/nagiosgraph/log/service-perfdata.log

# HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
# Here I define the format required for Nagiosgraph
host_perfdata_file_template=$LASTHOSTCHECK$||$HOSTNAME$||check-host-alive||$HOSTOUTPUT$||$HOSTPERFDATA$
service_perfdata_file_template=$LASTSERVICECHECK$||$HOSTNAME$||$SERVICEDESC$||$SERVICEOUTPUT$||$SERVICEPERFDATA$

# HOST AND SERVICE PERFORMANCE DATA FILE MODES
# This option determines whether or not the host and service
# performance data files are opened in write ("w") or append ("a")
service_perfdata_file_mode=a

# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
# These options determine how often (in seconds) the host and service
# performance data files are processed (written to).
service_perfdata_file_processing_interval=30

# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
# These commands are used to periodically process the host and
# service performance data files. I am using the same for both
# host and service perfdata processing.
service_perfdata_file_processing_command=process-service-perfdata
...
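To make the log format concrete, here is a minimal Python sketch that splits one line of the performance data file into its five fields, following the service_perfdata_file_template shown above. It is not how insert.pl itself is written; the record type, the function name, and the sample timestamp are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

FIELD_SEPARATOR = "||"  # separator used in the *_perfdata_file_template settings


class PerfRecord(NamedTuple):
    timestamp: int   # $LASTSERVICECHECK$, seconds since the epoch
    host: str        # $HOSTNAME$
    service: str     # $SERVICEDESC$
    output: str      # $SERVICEOUTPUT$
    perfdata: str    # $SERVICEPERFDATA$ (may be empty)


def parse_perflog_line(line: str) -> PerfRecord:
    """Split one log line into the five template fields."""
    parts = line.rstrip("\n").split(FIELD_SEPARATOR, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    timestamp, host, service, output, perfdata = parts
    return PerfRecord(int(timestamp), host, service, output, perfdata)


# Hypothetical sample line; the timestamp is made up.
sample = ("1336816066||susie114||CPU Load||OK - load average: 0.06, 0.12, 0.10"
          "||load1=0;15;30;0 load5=0;10;25;0 load15=0;5;20;0")
record = parse_perflog_line(sample)
print(record.host, record.service, record.perfdata)
```

Running a check like this against a few real lines is a quick way to confirm that the template in nagios.cfg produces what Nagiosgraph expects.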
- Add a service perfdata processing command statement in commands.cfg:
susie114:/tmp # vi /srv/app/nagios/etc/objects/command.cfg
...
define command {
    command_name    process-service-perfdata
    command_line    /srv/app/nagiosgraph/bin/insert.pl
}
...
- After setting a default for the graph URL, start creating hostextinfo and serviceextinfo definitions:
susie114:/tmp # vi /srv/app/nagios/etc/objects/templates.cfg
...
###############################################################################
# Host definition template for nagiosgraph - This is NOT a real service, just a template!
###############################################################################
define hostextinfo {
    name        basic
    notes_url   /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=check-host-alive&geom=634x80
    register    0
}

###############################################################################
# Service definition template for nagiosgraph - This is NOT a real service, just a template!
###############################################################################
define serviceextinfo {
    name        basic
    notes_url   /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&geom=634x80
    register    0
}
The following options can be added to the notes_url parameter for control over the graph image page:
- Add the 'geom' option (e.g. &geom=350x100) to the 'notes_url' line for custom sizes of graphs.
- Add the 'rrdopts' option (e.g. &rrdopts=%2Dl%200%20%2Du%20100 (meaning: "-l 0 -u 100")) to the 'notes_url' line for custom Y axis ranges. Any rrdgraph options can be specified, but they have to be url-encoded.
- Add the 'fixedscale' option to set the Y-axis to be in the same units as the supplied perf data. This will also set the legends to have identical units.
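As a small illustration of the URL-encoding requirement, the following Python sketch assembles a notes_url value from its parts. The helper name build_notes_url is hypothetical, and the base path is the one used in the templates above. Python's quote() leaves the hyphen unencoded, whereas the example above writes it as %2D; both forms decode to the same option string.

```python
from urllib.parse import quote, unquote


def build_notes_url(base, host, service, geom=None, rrdopts=None):
    """Assemble a show.cgi URL; '$' stays literal so Nagios macros survive."""
    params = [("host", host), ("service", service)]
    if geom:
        params.append(("geom", geom))
    if rrdopts:
        params.append(("rrdopts", rrdopts))
    query = "&".join(f"{key}={quote(value, safe='$')}" for key, value in params)
    return f"{base}?{query}"


url = build_notes_url(
    "/nagios/cgi-bin/show.cgi",
    "$HOSTNAME$",
    "$SERVICEDESC$",
    geom="350x100",
    rrdopts="-l 0 -u 100",
)
print(url)

# The encoded example from the text decodes back to the plain rrdgraph options.
print(unquote("%2Dl%200%20%2Du%20100"))
```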
Example for enabling performance data in host checks. Below is a typical configuration file I am using:
susie114:/tmp # vi /srv/app/nagios/etc/objects/linux-servers.cfg
###############################################################################
# HOST GROUP DEFINITION linux servers
###############################################################################
define hostgroup{
    hostgroup_name  linux-servers ; The name of the hostgroup
    alias           Linux Servers ; Long name of the group
}

define hostextinfo {
    hostgroup_name  linux-servers
    use             basic
}

###############################################################################
# Linux host definition template - This is NOT a real host, just a template!
###############################################################################
define host{
    name                   linux-server ; The name of this host template
    use                    generic-host ; Inherit values from the generic-host template
    check_period           24x7 ; By default, check Linux hosts round the clock
    check_interval         5 ; Actively check the host every 5 minutes
    retry_interval         1 ; Retry host checks in 1 minute intervals
    max_check_attempts     10 ; Check each Linux host 10 times (max)
    check_command          check-host-alive; Default command to check Linux hosts
    notification_period    24x7 ; always notify
    notification_interval  120 ; Resend notifications every 2 hours
    notification_options   s,d,u,r ; Only send messages for specific host states
    contact_groups         linux-admins ; Notifications get sent to the admins by default
    hostgroups             linux-servers,1-all-servers ; Host groups for Linux servers
    icon_image             suse-logo.png ; the default image for the device
    statusmap_image        suse-logo.gd2 ; the default image for the statusmap display
    register               0 ; DONT REGISTER THIS DEFINITION, IT'S A TEMPLATE!
}

###############################################################################
# HOST DEFINITIONS
###############################################################################
define host{
    use        linux-server ; Inherit default values from a template
    host_name  susie
    alias      susie.fm4dd.com
    address    127.0.0.1
}
...
Example for enabling performance data in service checks. Here I am going to set the extinfo for a single host, rather than a group:
susie114:/tmp # vi /srv/app/nagios/etc/objects/website-checks.cfg
###############################################################################
# Define a servicegroup for web service checks
# web service checks will be a member of this group
###############################################################################
define servicegroup{
    servicegroup_name  website-checks ; The name of the hostgroup
    alias              Web Site Checks ; Long name of the group
}

###############################################################################
# Define the environment check template service
###############################################################################
define service{
    name                          generic-website
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         1
    flap_detection_enabled        1
    failure_prediction_enabled    1
    process_perf_data             1
    retain_status_information     1
    retain_nonstatus_information  1
    is_volatile                   0
    check_period                  24x7
    max_check_attempts            4
    normal_check_interval         5
    retry_check_interval          1
    contact_groups                frankonly
    notification_options          c,r
    notification_interval         120
    notification_period           24x7
    register                      0
    servicegroups                 website-checks
}

###############################################################################
# Check web access to susie114
###############################################################################
define service{
    use                  generic-website
    host_name            susie114
    service_description  website-check
    check_command        check_http
}

define serviceextinfo {
    service_description  website-check
    host_name            susie114
    use                  basic
}
...
- Now reload Nagios. The following checks help to determine whether all is well:
- Check if the service performance data is written into the new performance data logfile.
- Check if the Nagiosgraph URL show.cgi is accessible in general.
- Check if the first RRDs are generated.
- Wait 20 minutes and check if graphs start to be visible at the Nagiosgraph URL.
- Check if the Nagiosgraph URL to show.cgi is correctly embedded within Nagios.
- Check the Nagiosgraph log file; rotate it or decrease the debug level to manage its size.
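The file-based checks in this list can be scripted. Below is a minimal Python sketch, assuming the directory layout from the installation listing (one subdirectory per host under the rrd directory). The function name and the returned keys are hypothetical, and the paths in the usage part are examples to adapt to your own nagiosgraph.conf settings.

```python
from pathlib import Path


def check_perfdata_flow(perflog, rrddir):
    """Report whether the perfdata log exists and which hosts already have RRD files."""
    perflog = Path(perflog)
    rrddir = Path(rrddir)
    # RRD files are expected one level down: <rrddir>/<host>/<name>.rrd
    rrd_files = sorted(rrddir.glob("*/*.rrd")) if rrddir.is_dir() else []
    return {
        "perflog_exists": perflog.is_file(),
        "perflog_bytes": perflog.stat().st_size if perflog.is_file() else 0,
        "rrd_file_count": len(rrd_files),
        "hosts_with_rrds": sorted({path.parent.name for path in rrd_files}),
    }


if __name__ == "__main__":
    # Example paths only; adjust them to the local installation.
    report = check_perfdata_flow(
        "/srv/app/nagiosgraph/log/service-perfdata.log",
        "/srv/app/nagiosgraph/rrd",
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```

The web-facing checks (show.cgi reachable, link embedded in Nagios) still need a browser or an HTTP client and are not covered by this sketch.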
Adding Nagios service configuration types to Nagiosgraph
In order to add new Nagios service types, we need to edit the map file. This file contains regular expressions that identify service types and define how to store data in RRD files.
The map configuration file is Perl code containing regular expressions. There is one entry per Nagios service type, and these entries are eval'ed by insert.pl and show.cgi. Several examples of service checks are included in the distributed 'map' file, but it generally becomes necessary to modify or add entries to match the output of the particular Nagios plugins in use. Knowing Perl is helpful when making modifications, but the supplied examples should cover most types of performance data.
By default, all available data for a service check is displayed in the same graph. With extra configuration embedded in the URL, it is possible to display less data or to split values into multiple graphs. There is also a general method for specifying any rrdgraph options.
How it works
When insert.pl picks up the data, it prepares it for matching against the map file by building one string consisting of three lines of text. This string might look like this:
servicedescr:ping
output:PING OK - Packet loss = 0%, RTA = 0.00 ms
perfdata:
Or like this:
servicedescr:CPU Load
output:OK - load average: 0.06, 0.12, 0.10
perfdata:load1=0;15;30;0 load5=0;10;25;0 load15=0;5;20;0
Some plugins do not provide perfdata. Note that since Nagios 3.3.1, a check result is only written to the external logfile if it contains performance data. Nagiosgraph can create graphs from either the output line or the perfdata line.
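The three-line string and the perfdata line can be mimicked in a few lines of Python, which is handy for preparing test input before writing a map entry. This is a sketch under assumptions: the function names are hypothetical, whether insert.pl appends a trailing newline is not specified here, and the perfdata parser only handles simple unquoted labels in the standard label=value;warn;crit;min form.

```python
def build_match_string(servicedesc: str, output: str, perfdata: str) -> str:
    """Join the three fields into the three-line text that map entries match against."""
    return f"servicedescr:{servicedesc}\noutput:{output}\nperfdata:{perfdata}"


def parse_perfdata(perfdata: str) -> dict:
    """Return {label: value} for simple perfdata; thresholds after ';' are dropped."""
    values = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        values[label] = rest.split(";")[0]
    return values


text = build_match_string(
    "CPU Load",
    "OK - load average: 0.06, 0.12, 0.10",
    "load1=0;15;30;0 load5=0;10;25;0 load15=0;5;20;0",
)
print(text)
print(parse_perfdata("load1=0;15;30;0 load5=0;10;25;0 load15=0;5;20;0"))
```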
For the ping example above, data can be extracted from the output line with a regular expression like this:
/output:PING.*?(\d+)%.+?([.\d]+)\sms/
In this case, two values are extracted and become available in $1 and $2. We can use them to create a data structure describing the content of the database. The general format is:
[ DB-name,
[ DS-name, TYPE, DS-value ],
[ DS-name, TYPE, DS-value ],
...
]
Parameter | Explanation |
---|---|
DB-name | Service check name that will show as a legend on RRD graphs |
DS-name | Name that will be assigned to a data set (line), showing on RRD graphs |
TYPE | either GAUGE or DERIVE, see RRDtool for more details |
DS-value | The data extracted by the regular expression. The DS value itself can be an expression, e.g. to normalize to SI units |
Finally, each database definition must be added to the @s array for returning to insert.pl's code. Here is a complete code example for the PING example above:
/output:PING.*?(\d+)%.+?([.\d]+)\sms/
and push @s, [ ping,
[ losspct, GAUGE, $1 ],
[ rta, GAUGE, $2/1000 ] ];
In this case the database is named 'ping' and the DS-names stored are losspct and rta. The Nagios output reports round trip time in milliseconds, so the value is divided by 1000 to convert to seconds. Both DS types are set as GAUGE.
Be careful about the database names and DS names. In the code example above the names are barewords, which only works as long as they don't conflict with Perl functions or subroutines. For example, the word 'sleep' will not work without quoting. Here is a safer version of the above example, using single quotes:
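Before adding an entry to the map file, it can help to try the regular expression against sample output offline. The sketch below mirrors the PING entry in Python; for this particular pattern, Python's re module behaves like the Perl regex, but this is a test aid and not part of Nagiosgraph. The function name map_ping and the sample RTA value of 12.50 ms are hypothetical.

```python
import re

# Same pattern as in the map entry above.
PING_RE = re.compile(r"output:PING.*?(\d+)%.+?([.\d]+)\sms")


def map_ping(text: str) -> list:
    """Return a list shaped like the @s array: [DB-name, [DS-name, TYPE, DS-value], ...]."""
    match = PING_RE.search(text)
    if match is None:
        return []
    loss_percent = float(match.group(1))
    rta_seconds = float(match.group(2)) / 1000  # milliseconds to seconds
    return [["ping",
             ["losspct", "GAUGE", loss_percent],
             ["rta", "GAUGE", rta_seconds]]]


sample = "servicedescr:ping\noutput:PING OK - Packet loss = 0%, RTA = 12.50 ms\nperfdata:"
print(map_ping(sample))
```

A string that does not contain a PING output line simply yields an empty list, which corresponds to the map entry not firing.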
/output:PING.*?(\d+)%.+?([.\d]+)\sms/
and push @s, [ 'ping',
[ 'losspct', 'GAUGE', $1 ],
[ 'rta', 'GAUGE', $2/1000 ] ];
Caution: map files can grow large and complex. If there is a single syntax error, nothing will be inserted into the RRD files until the map file is fixed. It is best not to edit production map files directly, and to always check their syntax with perl -c map before making them active. A simple example on how to handle them is below. Version control through rcs or similar is also a good idea.
susie114:/tmp # cd /srv/app/nagiosgraph/etc
susie114:/srv/app/nagiosgraph/etc # cp map map.new
susie114:/srv/app/nagiosgraph/etc # vi map.new
susie114:/srv/app/nagiosgraph/etc # perl -c map.new
susie114:/srv/app/nagiosgraph/etc # mv map.new map
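The manual sequence above relies on the operator noticing a failed syntax check. A small wrapper can enforce it, as in this Python sketch: it runs perl -c on the candidate file and only replaces the active map if the check exits with status 0. The function name install_map and the extra .bak backup copy are my own additions, not part of Nagiosgraph.

```python
import os
import shutil
import subprocess


def install_map(candidate, target, check_cmd=("perl", "-c")):
    """Replace target with candidate only if the syntax check succeeds.

    Returns True when the new map was installed, False when the check failed.
    """
    result = subprocess.run([*check_cmd, candidate], capture_output=True, text=True)
    if result.returncode != 0:
        # Leave the active map untouched; the caller can inspect result.stderr.
        return False
    if os.path.exists(target):
        shutil.copy2(target, target + ".bak")  # keep the previous version (my addition)
    os.replace(candidate, target)
    return True
```

Called as install_map("map.new", "map") from the etc directory, this performs the last two steps of the session above. The check command is a parameter so that the function can be exercised on a machine without Perl.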
Conclusion
If the instructions above sound complicated, remember that Nagiosgraph is one of the easiest and simplest solutions. If you decide to use it, consider contributing by sharing your work and experience. For example, if you have a good map file entry for standard Nagios plugins, then please post it on the forum, or send it to me.