Introduction


Nagiosgraph collects performance measurement data from the Nagios monitoring system and stores it in RRD databases. The data can then be displayed as graph images to visualize trends and to compare current values against historic ones.

The information on this page is valid for Nagiosgraph version 0.9.1. I have been running this version since 2008 without any issues in medium to large environments (>1,000 hosts, >4,000 service checks).

Design


Nagiosgraph is an interface program between Nagios and RRD data files. Its simplicity comes from three factors: it doesn't do much, behavior is programmed rather than configured, and new Nagios data is detected and included automatically.

Nagiosgraph has two functions:

  1. Collect performance data provided by Nagios
  2. Create and display performance data graph images

Nagios can be configured to write a service check's additional performance data to an external location for further processing. The Nagiosgraph script insert.pl collects this performance data, parses it against a configuration file called map, and creates or updates the RRD database files as necessary. Typically, there is one RRD database file per service check and host. To keep the files organised, all checks belonging to one host are stored in a directory named after its Nagios host name. This works without issues even in larger configurations with 1,500+ hosts.
Nagiosgraph automatically detects when new hosts or services have been added in Nagios. A configuration update is only needed when Nagios gets a completely new type of service that does not have an existing Nagiosgraph map entry.

Performance graphs are integrated into Nagios through so-called extended service information pages. An icon and link are added to each service check page in Nagios; the link opens the matching performance graph page. The Nagiosgraph script show.cgi provides the necessary functionality.

Basic Installation - Nagiosgraph Side


  1. Check required packages: Perl, CGI, and RRDtool. We assume Nagios is already installed and working.
    susie114:/tmp # rpm -q -a |grep -E 'CGI|rrd'
    perl-CGI-Application-4.31-16.1.2.i586
    rrdtool-1.4.5-10.1.4.i586
  2. Download the Nagiosgraph package nagiosgraph-fixed-v0.9.1.tar.gz:
    susie114:/tmp # wget http://nagios.fm4dd.com/nagiosgraph/source/nagiosgraph-fixed-v0.9.1.tar.gz
    Resolving nagios.fm4dd.com (nagios.fm4dd.com)... 70.85.16.97
    Connecting to nagios.fm4dd.com (nagios.fm4dd.com)|70.85.16.97|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 20492 (20K) [application/x-gzip]
    Saving to: `nagiosgraph-fixed-v0.9.1.tar.gz'
    
    100%[==================================================>] 20,492      --.-K/s   in 0s      
    
    2012-05-12 18:47:46 (141 MB/s) - `nagiosgraph-fixed-v0.9.1.tar.gz' saved [20492/20492]
    
    susie114:/tmp # 
  3. Extract Nagiosgraph and place it into the filesystem. In my example, I put it into /srv/app/.
    susie114:/tmp # tar xfvz nagiosgraph-fixed-v0.9.1.tar.gz
    nagiosgraph-0.9.1/
    nagiosgraph-0.9.1/bin/
    nagiosgraph-0.9.1/bin/testentry.pl
    nagiosgraph-0.9.1/bin/insert.pl
    nagiosgraph-0.9.1/etc/
    nagiosgraph-0.9.1/etc/map
    nagiosgraph-0.9.1/etc/nagiosgraph.conf
    nagiosgraph-0.9.1/log/
    nagiosgraph-0.9.1/log/host-perfdata.log
    nagiosgraph-0.9.1/log/nagiosgraph.log
    nagiosgraph-0.9.1/log/service-perfdata.log
    nagiosgraph-0.9.1/cgi-bin/
    nagiosgraph-0.9.1/cgi-bin/show.cgi
    nagiosgraph-0.9.1/cgi-bin/testcolor.cgi
    nagiosgraph-0.9.1/docs/
    nagiosgraph-0.9.1/docs/README
    nagiosgraph-0.9.1/docs/README.map
    nagiosgraph-0.9.1/docs/CHANGELOG
    nagiosgraph-0.9.1/docs/INSTALL
    
    susie114:/tmp # mv nagiosgraph-0.9.1 /srv/app
  5. Set file access rights:

    • For the "rrddir" location: the Nagios user (nagios) must be able to write there, and the Apache user (wwwrun) must be able to read from it.
    • For the "logfile" location: both the Nagios and the Apache user (nagios, wwwrun) must be able to write to it.
    susie114:/tmp # chown -R nagios:nagios /srv/app/nagiosgraph-0.9.1
    susie114:/tmp # chmod 2775 /srv/app/nagiosgraph-0.9.1/log
    susie114:/tmp # chown wwwrun /srv/app/nagiosgraph-0.9.1/log/nagiosgraph.log
    susie114:/tmp # cd /srv/app/nagiosgraph-0.9.1
  6. Verify the Nagiosgraph file structure:
    susie114:/srv/app/nagiosgraph-0.9.1 # ls -lR
    .:
    total 20
    drwxr-xr-x 2 nagios nagios 4096 May  8 14:07 bin
    drwxr-xr-x 2 nagios nagios 4096 May  8 13:52 etc
    drwxrwsr-x 2 nagios nagios 4096 May 12 17:22 log
    drwxr-xr-x 6 nagios nagios 4096 Sep 10  2011 rrd
    
    ./bin:
    total 32
    -rwxr-xr-x 1 nagios nagios  6699 May  8 13:40 insert.pl
    -rwxr-xr-x 1 nagios nagios   974 Dec 12  2008 testentry.pl
    
    ./cgi-bin:
    total 28
    -rwxr-xr-x 1 nagios nagios 21583 Mar 24  2009 show.cgi
    -rwxr-xr-x 1 nagios nagios  3646 Dec 12  2008 testcolor.cgi
    
    ./etc:
    total 44
    -rw-r--r-- 1 nagios nagios 37969 Oct 28  2011 map
    -rw-r--r-- 1 nagios nagios  1640 May  8 13:52 nagiosgraph.conf
    
    ./log:
    total 8
    -rw-rw-r-- 1 nagios nagios 0 May 12 17:22 host-perfdata.log
    -rw-r--r-- 1 wwwrun nagios 0 May  8 14:02 nagiosgraph.log
    -rw-rw-r-- 1 nagios nagios 0 May 12 17:22 service-perfdata.log
    
    ./rrd:
    total 16
    drwxrwxr-x 2 nagios nagios 4096 May  8 13:53 susie114
    
    ./rrd/susie114:
    total 908
    -rw-rw-r-- 1 nagios nagios 47624 May 12 17:20 check%2Dhost%2Dalive___ping.rrd
  7. Set correct paths, debug level etc. in nagiosgraph.conf.
  8. Update the insert.pl and show.cgi scripts: set the line "my $configfile = '...'" to point to the full path of the nagiosgraph.conf file.
  9. Configure Apache to point to show.cgi. For example:
    susie114:/tmp # vi /etc/apache2/vhosts/nagios.fm4dd.com
    ...
         ScriptAlias /nagiosgraph/ /srv/app/nagiosgraph/cgi-bin/
    ...

    Secure the script by configuring Apache access control and/or authentication rules. Another easy way to add show.cgi is to simply copy it into the existing Nagios 'cgi-bin' location.

  10. Copy an icon image (approx. 40x40 pixels) to /<nagioshome>/share/images/logos/notes.gif for Nagios to use as the link to the graphs.
  11. Copy nagiosgraph.css to /<nagioshome>/share/stylesheets/. Optionally, merge it with your own site styling for a consistent look and feel.

Configuration - Nagios Side


Nagios must be configured to send performance data in append mode to a log file. Then it needs to invoke insert.pl at regular intervals to retrieve and parse lines from that file in order to update the RRD database files. The Nagios log file format must be set to match Nagiosgraph expectations.

  1. Set and verify the following parameters in nagios.cfg:
    susie114:/tmp # vi /srv/app/nagios/etc/nagios.cfg
    ...
    # This option enables host performance data to be processed using the
    # host_perfdata_command and service performance data will be processed
    # using the service_perfdata_command.
    
    process_performance_data=1
    
    # HOST AND SERVICE PERFORMANCE DATA FILES
    # These files are used to store host and service performance data.
    # I am using the same file for both host and service perfdata.
    
    host_perfdata_file=/srv/app/nagiosgraph/log/service-perfdata.log
    service_perfdata_file=/srv/app/nagiosgraph/log/service-perfdata.log
    
    # HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
    # Here I define the format required for Nagiosgraph
    
    host_perfdata_file_template=$LASTHOSTCHECK$||$HOSTNAME$||check-host-alive||$HOSTOUTPUT$||$HOSTPERFDATA$
    service_perfdata_file_template=$LASTSERVICECHECK$||$HOSTNAME$||$SERVICEDESC$||$SERVICEOUTPUT$||$SERVICEPERFDATA$
    
    # HOST AND SERVICE PERFORMANCE DATA FILE MODES
    # This option determines whether or not the host and service
    # performance data files are opened in write ("w") or append ("a")
    
    service_perfdata_file_mode=a
    
    # HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
    # These options determine how often (in seconds) the host and service
    # performance data files are processed (written to).
    
    service_perfdata_file_processing_interval=30
    
    # HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
    # These commands are used to periodically process the host and
    # service performance data files. I am using the same for both
    # host and service perfdata processing.
    
    service_perfdata_file_processing_command=process-service-perfdata
    ...
    Make sure that the location of service_perfdata_file matches the 'perflog' setting defined in nagiosgraph.conf.
  2. Add a service perfdata processing command statement in commands.cfg:
    susie114:/tmp # vi /srv/app/nagios/etc/objects/commands.cfg
    ...
         define command {
           command_name  process-service-perfdata
           command_line  /srv/app/nagiosgraph/bin/insert.pl
         }
    ...
  3. After setting a default for the graph URL, start creating hostextinfo and serviceextinfo definitions:
    susie114:/tmp # vi /srv/app/nagios/etc/objects/templates.cfg
    ...
    ###############################################################################
    # Host definition template for nagiosgraph - This is NOT a real host, just a template!
    ###############################################################################
    define hostextinfo {
      name            basic
      notes_url       /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=check-host-alive&geom=634x80
      register        0
    }
    ###############################################################################
    # Service definition template for nagiosgraph - This is NOT a real service, just a template!
    ###############################################################################
    define serviceextinfo {
      name            basic
      notes_url       /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&geom=634x80
      register        0
    }

    The following options can be added to the notes_url parameter for control over the graph image page:

    • Add the 'geom' option (e.g. &geom=350x100) to the 'notes_url' line for custom sizes of graphs.
    • Add the 'rrdopts' option to the 'notes_url' line for custom Y-axis ranges, e.g. &rrdopts=%2Dl%200%20%2Du%20100, which decodes to "-l 0 -u 100". Any rrdgraph options can be specified, but they have to be URL-encoded.
    • Add the 'fixedscale' option to set the Y-axis to be in the same units as the supplied perf data. This will also set the legends to have identical units.

    Example for enabling performance data in host checks. Below is a typical configuration file I am using:

    susie114:/tmp # vi /srv/app/nagios/etc/objects/linux-servers.cfg
    ###############################################################################
    # HOST GROUP DEFINITION linux servers
    ###############################################################################
    define hostgroup{
      hostgroup_name        linux-servers ; The name of the hostgroup
      alias                 Linux Servers ; Long name of the group
    }
    define hostextinfo {
            hostgroup_name          linux-servers
            use                     basic
    }
    ###############################################################################
    # Linux host definition template - This is NOT a real host, just a template!
    ###############################################################################
    define host{
      name                  linux-server    ; The name of this host template
      use                   generic-host    ; Inherit values from the generic-host template
      check_period          24x7            ; By default, check Linux hosts round the clock
      check_interval        5               ; Actively check the host every 5 minutes
      retry_interval        1               ; Retry host checks in 1 minute intervals
      max_check_attempts    10              ; Check each Linux host 10 times (max)
      check_command         check-host-alive; Default command to check Linux hosts
      notification_period   24x7            ; always notify
      notification_interval 120             ; Resend notifications every 2 hours
      notification_options  s,d,u,r         ; Only send messages for specific host states
      contact_groups        linux-admins    ; Notifications get sent to the admins by default
      hostgroups            linux-servers,1-all-servers   ; Host groups for Linux servers
      icon_image            suse-logo.png   ; the default image for the device
      statusmap_image       suse-logo.gd2   ; the default image for the statusmap display
      register              0               ; DONT REGISTER THIS DEFINITION, IT'S A TEMPLATE!
    }
    ###############################################################################
    # HOST DEFINITIONS
    ###############################################################################
    define host{
      use                   linux-server  ; Inherit default values from a template
      host_name             susie
      alias                 susie.fm4dd.com
      address               127.0.0.1
    }
    ...

    Example for enabling performance data in service checks. Here I set the extinfo for a single host, rather than a group:

    susie114:/tmp # vi /srv/app/nagios/etc/objects/website-checks.cfg
    ###############################################################################
    # Define a servicegroup for web service checks
    # web service checks will be a member of this group
    ###############################################################################
    define servicegroup{
      servicegroup_name        website-checks ; The name of the servicegroup
      alias                    Web Site Checks ; Long name of the group
    }
    ###############################################################################
    # Define the environment check template service
    ###############################################################################
    define service{
      name                          generic-website
      active_checks_enabled         1
      passive_checks_enabled        1
      parallelize_check             1
      obsess_over_service           1
      check_freshness               0
      notifications_enabled         1
      event_handler_enabled         1
      flap_detection_enabled        1
      failure_prediction_enabled    1
      process_perf_data             1
      retain_status_information     1
      retain_nonstatus_information  1
      is_volatile                   0
      check_period                  24x7
      max_check_attempts            4
      normal_check_interval         5
      retry_check_interval          1
      contact_groups                frankonly
      notification_options          c,r
      notification_interval         120
      notification_period           24x7
      register                      0
      servicegroups                 website-checks
    }
    ###############################################################################
    # Check web access to susie114
    ###############################################################################
    define service{
      use                           generic-website
      host_name                     susie114
      service_description           website-check
      check_command                 check_http
    }
    define serviceextinfo {
      service_description           website-check
      host_name                     susie114
      use                           basic
    }
    ...
  4. Reload Nagios. The following checks help to determine whether all is well:

    • Check that service performance data is written to the new performance data log file.
    • Check that the Nagiosgraph URL show.cgi is accessible.
    • Check that the first RRD files are generated.
    • Wait 20 minutes and check that graphs start to appear at the Nagiosgraph URL.
    • Check that the Nagiosgraph URL to show.cgi is correctly embedded within Nagios.
    • Check the Nagiosgraph log file; rotate it or decrease the debug level to manage its size.
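
Two of these checks are easy to run from a shell. The helpers below are a rough sketch and not part of Nagiosgraph: recent_rrds and pending_perfdata are hypothetical names, and the directories in the usage note are the ones assumed in the examples on this page.

```shell
# List RRD files below directory $1 that changed within the last $2 minutes.
# -mmin is supported by GNU and BSD find, but it is not part of POSIX.
recent_rrds() {
  find "$1" -type f -name '*.rrd' -mmin "-$2"
}

# Print the number of lines currently held in the perfdata log $1.
pending_perfdata() {
  wc -l < "$1" | tr -d ' '
}
```

For example, "recent_rrds /srv/app/nagiosgraph/rrd 10" should list at least one file once the first check results have been processed, and "pending_perfdata /srv/app/nagiosgraph/log/service-perfdata.log" shows how many lines are waiting in the log at that moment.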

Adding Nagios service configuration types to Nagiosgraph


In order to add new Nagios service types, we need to edit the map file. This file contains regular expressions that identify service types and define how to store their data in RRD files.

The map configuration file is Perl code containing regular expressions. There is one entry per Nagios service type, and these entries are eval'ed by insert.pl and show.cgi. Several examples of service checks are included in the distributed 'map' file, but generally it becomes necessary to modify or add entries to match the output of the particular Nagios plugins in use. Knowing Perl is helpful when making modifications, but the supplied examples should cover most types of performance data.

By default, all available data for a service check is displayed in the same graph. With extra configuration embedded in the URL, it is possible to display less data or to split values into multiple graphs. There is also a general method for specifying any rrdgraph options.

How it works


When the script insert.pl picks up the data, it prepares it for matching against the map file by building one string consisting of three lines of text. This string might look like this:

servicedescr:ping
output:PING OK - Packet loss = 0%, RTA = 0.00 ms
perfdata:

Or like this:

servicedescr:CPU Load
output:OK - load average: 0.06, 0.12, 0.10
perfdata:load1=0;15;30;0 load5=0;10;25;0 load15=0;5;20;0
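
To see how a line in the perfdata log relates to this three-line form, the transformation can be imitated with awk. This is only an illustration of the data flow, not insert.pl's actual code: to_map_input is a hypothetical helper, the field order is taken from the perfdata file templates in nagios.cfg above, and the timestamp in the sample line is made up.

```shell
# Hypothetical helper: read perfdata log lines from stdin and print the
# three-line form. Fields are separated by "||":
# time||host||service description||plugin output||perfdata
to_map_input() {
  awk -F'[|][|]' '{ printf "servicedescr:%s\noutput:%s\nperfdata:%s\n", $3, $4, $5 }'
}

printf '%s\n' '1336812466||susie114||CPU Load||OK - load average: 0.06, 0.12, 0.10||load1=0;15;30;0 load5=0;10;25;0 load15=0;5;20;0' |
  to_map_input
```

For the sample line this prints the second example string shown above.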

Some plugins do not provide perfdata. Since Nagios 3.3.1, a check result must contain performance data in order to be written to the external log file. Nagiosgraph can create graphs from either the output line or the perfdata line.

For the ping example above, data can be extracted from the output line with a regular expression like this:

/output:PING.*?(\d+)%.+?([.\d]+)\sms/

In this case, two values are extracted and become available in $1 and $2. We can use them to create a data structure describing the content of the database. The general format is:

[ DB-name, [ DS-name, TYPE, DS-value ], [ DS-name, TYPE, DS-value ], ... ]

Parameter  Explanation
DB-name    Service check name that will show as a legend on RRD graphs
DS-name    Name assigned to a data set (line) shown on RRD graphs
TYPE       Either GAUGE or DERIVE; see the RRDtool documentation for details
DS-value   The data extracted by the regular expression. The DS value can itself be an expression, e.g. to normalize to SI units

Finally, each database definition must be pushed onto the @s array, which is returned to insert.pl's code. Here is a complete code example for the PING example above:

/output:PING.*?(\d+)%.+?([.\d]+)\sms/
and push @s, [ ping,
               [ losspct, GAUGE, $1      ],
               [ rta,     GAUGE, $2/1000 ] ];

In this case the database is named 'ping' and the DS names stored are losspct and rta. The Nagios output reports the round trip time in milliseconds, so the value is divided by 1000 to convert it to seconds. Both DS types are set to GAUGE.

Be careful with database names and DS names. In the code example above the names are barewords, which only works as long as they don't conflict with Perl functions or subroutines. For example, the word 'sleep' will not work without quoting. Here is a safer version of the above example, using single quotes:

/output:PING.*?(\d+)%.+?([.\d]+)\sms/
and push @s, [ 'ping',
               [ 'losspct', 'GAUGE', $1      ],
               [ 'rta',     'GAUGE', $2/1000 ] ];

Caution: map files can grow large and complex. If there is a single syntax error, nothing will be inserted into the RRD files until the map file is fixed. It is best not to edit production map files directly, and to always check their syntax with perl -c map before making them active. A simple example of how to handle them is shown below. Version control through rcs or similar is also a good idea.

susie114:/tmp # cd /srv/app/nagiosgraph/etc
susie114:/srv/app/nagiosgraph/etc # cp map map.new
susie114:/srv/app/nagiosgraph/etc # vi map.new
susie114:/srv/app/nagiosgraph/etc # perl -c map.new
susie114:/srv/app/nagiosgraph/etc # mv map.new map

Conclusion


If the instructions above sound complicated, remember that Nagiosgraph is one of the easiest and simplest solutions. If you decide to use it, consider contributing by sharing your work and experience. For example, if you have a good map file entry for standard Nagios plugins, then please post it on the forum, or send it to me.
