Database Monitoring with Nagios

Frank4DD, @2008-2010

Introduction

Many modern business software applications use a Relational Database Management System (RDBMS) as their backend data store. Relational databases are optimized fast transactional operations and they allow to build comprehensive data views by querying organised data (data-warehouse). With these databases and their stored data being at the "heart" of applications, database "health" is critical. Typically, IT organisations have dedicated staff, DBA's, to look after them. To relieve them of routine tasks and improve on issue escalation and resolution, automated database monitoring with a tool like Nagios is a good idea.

History

Not so long ago, database vendors (i.e. Oracle, IBM) implemented database monitoring access with SNMP. By implementing a standardized network monitoring protocol together with publishing the monitoring indicator dictionary (MIB's), we had a ideal base to work with a vendor neutral monitoring system. Unfortunately, this method has been discontinued by all vendors, today everyone just implements monitoring by using their own database management system.

While this works from a single-vendor standpoint, the real IT world is different. We have a multitude of database vendors and the number of our databases is constantly growing. Some databases are small, embedded and self-maintaining - but most of them are not. As a result, monitoring is necessary but the dedicated, vendor-provided tools for database monitoring do not scale. There is a benefit of setting up a vendor neutral database monitoring: It's integration into a IT systems and infrastructure monitoring system gives a better understanding of dependencies and allows for a faster failure and root-cause analysis.

This situation led us to review database monitoring options and resulted in development of several database monitoring plugins for the widely popular monitoring system Nagios.

Monitoring database up/down

The most basic type of database monitoring is often implemented by using the Nagios-included plugin 'check_tcp' to verify if the database network port is up. Most databases are opening a TCP port on the network interface to provide access for applications or manual adhoc queries. Below is a list of major DB vendors and their default network ports:

Database	Network Port
MS SQL Server	TCP 1433
IBM DB2	TCP 50000
Oracle	TCP 1521
MySQL	TCP 3306

Although database access through network ports became the most common way of accessing a database, it is not the only way. If the application is hosted on the same system were the database resides, they might communicate through a method called Interprocess Communication (IPC), for example implemented as a "named pipe" file. By using IPC, the elimination of the network protocol stack can dramatically speed up database communications. However, this is an obstacle to our remote database monitoring and needs a local check with a monitoring agent such as NRPE.

The network port check has shortcomings. For example with Oracle, database network connectivity is provided by a separate listener process. When the network port check confirms that the listener is up, the database itself might be down. Also, with the network port check alone, there is no way to determine if a database has internal problems and if a login would work.

The next level of database monitoring is doing a real database login. This verifies not only if the database network port is up, but also that the particular database instance is running and login is possible. The Nagios-included default plugins check_oracle and check_mysql (nagios-plugins 1.4.14) can do such a check. While check_mysql is a binary that requires the mysqlclient installed (it's linked against libmysqlclient.so), check_oracle is a simple shell script that requires a Oracle client installed, at a minimum the Oracle instantclient libraries together with the sqlplus command. In order to make these checks work, we also need a database login account. It is a good idea to create a dedicated account for monitoring and to restrict it's rights within the database.

Database Plugins in JAVA

If you have a mix of database vendors and versions, scripting additional check plugins and installing dedicated database client software quickly becomes very cumbersome and space-consuming. Not to mention extremly hard if Nagios runs on a UNIX/Linux platform but the target DB is Microsoft's SQL server. It is much easier if we use JDBC-based database plugins that are written in Java. JDBC libraries are small, easy to deploy and available from all but the most exotic database vendors. Running a Java-based plugin is no different from plugins written in other languages, once the Java-typical way of using the classpath is observed.

Database	JDBC Install Howto
MS SQL Server	howto-install-Microsoft-jdbc.htm
IBM DB2	howto-install-IBMdb2-jdbc.htm
Oracle	howto-install-Oracle-jdbc.htm
MySQL	howto-install-MySQL-jdbc.htm

Database software version (patch) monitoring

If we write a JAVA plugin for database login, why not obtain other useful information from the database? We could obtain the database version string and use it for verification of prefered database software levels. Our database installations constantly grow and it became increasingly difficult to keep an eye on which database had vendor fixes applied and which hasn't. The following database plugins solve this issue. They check the login to a database, then return the version string and match it against a blacklist/whitelist version file (Example: check_dbversion.cfg) to identify prefered/obsolete/vulnerable or unknown database software versions.

Database	DB version monitoring plugin	plugin man page
MS SQL Server	check_dbversion_mssql.java	check_dbversion_mssql manual
IBM DB2	check_dbversion_db2.java	check_dbversion_db2 manual
Oracle	check_dbversion_oracle.java	check_dbversion_oracle manual
MySQL	check_dbversion_mysql.java	check_dbversion_mysql manual

There are differences how vendors implemented database version identification and patchlevels strings. Microsoft's SQL server has a exemplary detailed versioning, the version string can identify which individual patch has been applied. IBM's DB2 and Oracle's versioning can identify the servicepack, while MySQL only handles software versions and their upgrades.

Database Tablespace (growth) monitoring

The term "Tablespace" refers to the space which database objects such as tables, views and procedures can allocate. Each vendor has its own implementation on how to provide this space. Most often, this space is taken out from the operating systems underlying filesystem using encapsulating container files. Depending on the implementation, these files can have a fixed size or can grow. The growth is called extend and growth parameters can control when, how and by how much the growth happens. Most often this is simply set to automatic and files grow in size and/or numbers as long as the OS filesystem space permits. As a result, out-of-space alerting can only be done for the underlying filesystem if automatic growth is set. Let's have a quick review on how database space is handled between the major vendors:

Oracle implements database storage either as raw devices or as OS container files. Raw devices are becoming rare due to their disadvantages compared with OS container files. Raw devices are always fixed size, while OS container files can and often are set to automatic growth by using the 'auto-extend' function. OS container files can be easily backed up with the OS, while raw devices need special backup handling.

DB2 implements database storage either as database-managed space (DMS) using raw devices or as system-managed space (SMS) with fixed size OS container files. Typically, DB2 space is set up as SMS, which allocates additional space on demand by creating more container files in sequence when DB space runs short.

Microsoft's SQL server has no tablespace concept. It creates OS container files per database that can be placed individually on the file system. The database files are typically split into one file for indizes and and one file for the remaining database objects. The files itself can grow on demand. Therefore, tablespace monitoring is tricky, monitoring the underlying filesystem space is a better choice.

MySQL storage is controlled by storage engines that determine how space is allocated. The default storage engine MyISAM creates three files per table: a format (.frm), a data (.MYD) and a index (.MYI) file. For complex schemas, this creates a large number of files. The lack of a tablespace concept in MyISAM creates the same restrictions for monitoring as MS SQL Server above.

Database	DB tablespace monitoring plugin	plugin man page
MS SQL Server	check_tablespace_mssql.java	check_tablespace_mssql manual
IBM DB2	check_tablespace_db2.java	check_tablespace_db2 manual
Oracle	check_tablespace_oracle.java	check_tablespace_oracle manual
MySQL	check_tablespace_mysql.java	check_tablespace_mysql manual

This space monitoring should be enabled for graphing the longterm trend. Keeping an eye on the database size requirements over a period of time is a great help for capacity planning and data retention set up. Nagiosgraph is a good choice for this task.

Database health and performance monitoring

This is the most challenging form of database monitoring. Depending om the DB, it could mean installing additional database software packages or enabling performance data collection functions. There are several nagios plugins out, I found the best to be the ones written by ConSol* Labs, available for the big 4 database vendors and consistent in their design and use.

There are to many parameters to cover, some of the important ones are cache utilization, index usage and table locking, among many more.

Database	DB tablespace monitoring plugin	plugin man page
MS SQL Server	check_mssql_health-1.5.8.tar.gz	check_mssql_health manual
IBM DB2	check_db2_health-1.0.3.tar.gz	check_db2_health manual
Oracle	check_oracle_health-1.6.7.tar.gz	check_oracle_health manual
MySQL	check_mysql_health-2.1.5.tar.gz	check_mysql_health manual

The measured data is very specific, it often requires intimate knowledge of the databases to be able to interpret them correctly. I would be careful to enable this type of monitoring by default. My recommendation is to enable it for specific reasons, limited time and possibly on a separate instance of Nagios to analyze database performance of selected target instances only.

Database health monitoring should be enabled for graphing to identify spikes and trends. Nagiosgraph is a good choice to visualize dayabase health and performance information.

Credits, Links and additonal information

Nagios and the Nagios community can be found at http://www.nagios.org/
Further Nagios documentation is available here http://nagios.fm4dd.com/docs/en/
A live Nagios system showing real examples is here http://nagios.fm4dd.com/nagios/
Health monitoring plugins are written by ConSol* Labs http://labs.consol.de/

Database Monitoring with Nagios

A collection of database monitoring plugins and methods

Introduction

History

Monitoring database up/down

Database Plugins in JAVA

Database software version (patch) monitoring

Database Tablespace (growth) monitoring

Database health and performance monitoring

Credits, Links and additonal information

Topics:

More Information:

Database Monitoring with Nagios

A collection of database monitoring plugins and methods

Introduction

History

Monitoring database up/down

Database Login Monitoring

Database Plugins in JAVA

Database software version (patch) monitoring

Database Tablespace (growth) monitoring

Database health and performance monitoring

Credits, Links and additonal information

Topics:

More Information: