Thursday 11 February 2010

Getting nagged by Nagios

Nagios is an entirely usable service monitoring system - I'm aware of at least three implementations within the University Computing Service alone. There are some aspects of its design (or, I suspect, its evolution) that I don't particularly like, but all in all it's much, much better than nothing.

An important feature is its powerful and capable configuration system. As usual, this is a two-edged sword: you have to understand the resulting complexity to take advantage of the power. I have two things to offer that might help: some general configuration guidelines, and a diagram.

Guidelines
  • Keep the number and size of edits to the distribution-supplied config files to a minimum. 
  • Arrange that related names and aliases share a common prefix (e.g. foo in what follows) so that they sort together (host names, which are most usefully based on DNS names, can be an exception to this)
  • Keep to a minimum the number of distinct names created (e.g. use foo as both a contact and a contact group name and so don't create foo-contactgroup)
  • Note that, unlike most other 'names', service_description isn't a global name and only needs to be unique within the set of hosts that provide it. It doesn't need and shouldn't have a foo prefix, and should be a short, human-readable description of the service
  • Keep names (especially those commonly displayed in the web apps) as short as reasonably possible
  • Express group membership by listing the group name in the individual objects, NOT by listing the individual objects in the group definition (or, if you prefer, the other way around, but be consistent!)
  • Use inheritance wherever possible to avoid replicating common settings
  • Use a naming convention so that separate groups of people can create global names without clashing
  • Store information belonging to each group of people in a predictable location (e.g. always put host information in files or directories starting "host") to make navigation easier
  • Optimise the web-based display 
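As a rough illustration of several of these guidelines together, here is a minimal sketch of what a set of object definitions might look like. All the names are made up for the example: foo is the shared prefix, host1.example.org is a placeholder host, and generic-service, generic-contact, check-host-alive and check_ssh are assumed to be provided by the distribution-supplied config files (your distribution may name them differently).

```
# Hypothetical example - "foo" and host1.example.org are placeholders.

# Template carrying the common settings (inheritance); register 0
# means it is only a template and not a real host.
define host {
    name                    foo-host
    check_command           check-host-alive   ; assumed to be supplied by the distribution
    max_check_attempts      5
    contact_groups          foo
    hostgroups              foo                ; membership listed in the member, not the group
    register                0
}

# Host name based on the DNS name, so no foo prefix here.
define host {
    use                     foo-host
    host_name               host1.example.org
    alias                   host1
    address                 host1.example.org
}

# No members directive: hosts join via their own hostgroups setting.
define hostgroup {
    hostgroup_name          foo
    alias                   Foo servers
}

# The same name serves for both the contact and the contact group.
define contact {
    use                     generic-contact    ; assumed distribution template
    contact_name            foo
    email                   foo@example.org
    contactgroups           foo
}

define contactgroup {
    contactgroup_name       foo
    alias                   Foo admins
}

# service_description is short and unprefixed.
define service {
    use                     generic-service    ; assumed distribution template
    hostgroup_name          foo
    service_description     SSH
    check_command           check_ssh          ; assumed to be supplied by the distribution
}
```

Adding another host to this setup should then only need one more short host definition that uses foo-host; the group membership, contacts and SSH check follow from the template and the host group.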
The UCS's Unix Support group has further developed these guidelines into a configuration standard that helps them monitor large numbers of services with minimal configuration work and with several different groups of people maintaining the configuration. Part of the key to this is putting the configuration for each separate group of services into separate sub-directories and incorporating this information into the main nagios.cfg with 'cfg_dir'.
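A minimal sketch of how that might look in nagios.cfg follows. The directory paths and the group names foo and bar are assumptions for illustration, not the UCS's actual layout. Nagios reads every file ending in .cfg under a cfg_dir directory, including those in its sub-directories.

```
# nagios.cfg (excerpt) - paths below are hypothetical

# One sub-directory per group of services, each maintained
# by the people responsible for it.
cfg_dir=/etc/nagios/conf.d/foo
cfg_dir=/etc/nagios/conf.d/bar

# Inside each directory, predictable file names make navigation
# easier, for example:
#   /etc/nagios/conf.d/foo/host.cfg
#   /etc/nagios/conf.d/foo/service.cfg
#   /etc/nagios/conf.d/foo/contact.cfg
```

With this arrangement the distribution-supplied nagios.cfg only gains a line per group, which fits the first guideline about keeping edits to those files to a minimum.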

Object relationships

Nagios's Template-Based Object Configuration is one of its most powerful features, but it's difficult to get your head around it when starting out. Here's a diagram that might help - it shows most of the relationships between the various objects:
