srijafavouriteblog: ABOUT NAGIOS

Introducing Nagios

This part shows you how to install Nagios and tie Ganglia back into it. We're going to add two features to Nagios that will help your monitoring efforts in standard clusters, grids, and clouds (or whatever your favorite buzzword is for scale-out computing). The two features are:

· Monitoring network switches

· Monitoring the resource manager

In this case, we'll be monitoring TORQUE. When we are finished, you'll have a framework to control the monitoring system of your entire data center.

Nagios, like Ganglia, is used heavily in HPC and other environments, but Nagios is more of an alerting mechanism than Ganglia (which is more focused on gathering and tracking metrics). Nagios previously only polled information from its target hosts, but has recently developed plug-ins that allow it to run agents on those hosts. Nagios also has a built-in notification system.

Now let's install Nagios and set up a baseline monitoring system of an HPC Linux® cluster to address the three different monitoring perspectives:

· The application person can see how full the queues are and see available nodes for running jobs.

· The NOC can be alerted to system failures or see a shiny red error light on the Nagios Web interface. They also get notified via email if nodes go down or temperatures get too high.

· The system engineer can graph data, report on cluster utilization, and make decisions on future hardware acquisitions.

[Apr 21, 2009] check_openmanage freshmeat.net

check_openmanage is a plugin for Nagios that checks the hardware health of Dell PowerEdge and PowerVault servers. It uses the Dell OpenManage Server Administrator (OMSA) software to accomplish this task. check_openmanage can be used remotely with SNMP or locally with NRPE. The plugin checks the health of the storage subsystem, power supplies, memory modules, temperature probes, etc., and gives an alert if any of the components are faulty or operate outside normal parameters.

Changes: The --global option was added, which turns on checking of everything. If used with SNMP, the global system health status is also probed, to protect the user against bugs in the... plugin. If used with omreport, the overall chassis health is used. Support for SNMP version 3 was added. Checking of esmhealth was added, which checks the overall health of the ESM log, i.e. the fill grade. Alert log reporting was fixed to use the same format as for the ESM log. Output messages are now sorted by severity. Minor changes were made in how out-of-date controller firmware or drivers are reported.

[Dec 22, 2008] Nagiosgraph

Nagiosgraph is an add-on for Nagios. It collects service perfdata in RRD format, and displays the resulting graphs via CGI.

Using Nagios to Monitor Networks

Nagios is a powerful, modular network monitoring system that can be used to monitor many network services such as SMTP, HTTP and DNS on remote hosts. It also supports SNMP, which lets you check things like processor load on routers and servers. I couldn't begin to cover all of the things that Nagios can do in this article, so I'll just cover the basics to get you up and running.

apt-get install nagios-text

First we need to define people that will be notified, and define how they should be notified. In the example below, I define two users, joe and paul. Joe is the network guru and cares about routers and switches. Paul is the systems guy, and he cares about servers. Both will be notified via email and by pager. Note that if you are going to monitor your email server, you will want to use another notification method besides email. If your email server is down, you can't send anybody an email to notify them! :) In that case you will want to use a pager server to send a text message to a phone or pager, or set up a second nagios monitor that uses a different mail server to send email.

Edit /etc/nagios/contacts.cfg and add the following users:

define contact{
    contact_name                    joe
    alias                           Joe Blow
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email,notify-by-epager
    host_notification_commands      host-notify-by-email,host-notify-by-epager
    email                           joe@yourdomain.com
    pager                           5555555@pager.yourdomain.com
}

define contact{
    contact_name                    paul
    alias                           Paul Shiznit
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email,notify-by-epager
    host_notification_commands      host-notify-by-email,host-notify-by-epager
    email                           paul@yourdomain.com
    pager                           5556666@pager.yourdomain.com
}
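The notification option letters in the contact definitions are compact but easy to misread. The following is a small sketch, not part of Nagios itself, that expands them into words. The letter meanings follow the Nagios object documentation; the function and table names are made up for this illustration.

```python
# Letter codes used by service_notification_options and
# host_notification_options in Nagios contact definitions.
SERVICE_OPTIONS = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "r": "recovery",
    "f": "flapping",
    "n": "none",
}
HOST_OPTIONS = {
    "d": "down",
    "u": "unreachable",
    "r": "recovery",
    "f": "flapping",
    "n": "none",
}


def decode_options(value, table):
    """Expand a comma-separated option string such as 'w,u,c,r'."""
    names = []
    for code in value.split(","):
        code = code.strip()
        if code not in table:
            raise ValueError(f"unknown notification option: {code!r}")
        names.append(table[code])
    return names


print(decode_options("w,u,c,r", SERVICE_OPTIONS))
print(decode_options("d,u,r", HOST_OPTIONS))
```

Note that the letter u means "unknown" for services but "unreachable" for hosts, which is why the two tables are kept separate.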

Now add the users to groups.
In /etc/nagios/contactgroups.cfg add the following:

define contactgroup{
    contactgroup_name   router_admin
    alias               Network Administrators
    members             joe
}

define contactgroup{
    contactgroup_name   server_admin
    alias               Systems Administrators
    members             paul
}

You can add multiple members to a contact group by listing the users separated by commas.

Now to define some hosts to monitor. For my example, I define two machines, a mail server and a router.

Edit /etc/nagios/hosts.cfg and add:

define host{
    use                     generic-host
    host_name               gw1.yourdomain.com
    alias                   Gateway Router
    address                 10.0.0.1
    check_command           check-host-alive
    max_check_attempts      20
    notification_interval   240
    notification_period     24x7
    notification_options    d,u,r
}

define host{
    use                     generic-host
    host_name               mail.yourdomain.com
    alias                   Mail Server
    address                 10.0.0.100
    check_command           check-host-alive
    max_check_attempts      20
    notification_interval   240
    notification_period     24x7
    notification_options    d,u,r
}

Now we add the hosts to groups. I define groups called 'routers' and 'servers' and add the router and mail server respectively.

Edit /etc/nagios/hostgroups.cfg

define hostgroup{
    hostgroup_name   routers
    alias            Routers
    contact_groups   router_admin
    members          gw1.yourdomain.com
}

define hostgroup{
    hostgroup_name   servers
    alias            Servers
    contact_groups   server_admin
    members          mail.yourdomain.com
}

Again, for multiple members, just use a comma-separated list of hosts.
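Once you have more than a handful of hosts, writing these blocks by hand gets tedious. Here is a minimal sketch of how the definitions could be generated from a script. The render_object helper and the second router gw2.yourdomain.com are hypothetical and exist only for this example; the directive names are the ones used in the configuration above.

```python
def render_object(kind, directives):
    """Render one Nagios object definition as text.

    List or tuple values are joined with commas, which is how
    Nagios expects multiple members to be written.
    """
    lines = [f"define {kind}{{"]
    for key, value in directives.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        lines.append(f"    {key:<20} {value}")
    lines.append("}")
    return "\n".join(lines)


text = render_object(
    "hostgroup",
    {
        "hostgroup_name": "routers",
        "alias": "Routers",
        "contact_groups": "router_admin",
        # gw2 is an invented second router for illustration.
        "members": ["gw1.yourdomain.com", "gw2.yourdomain.com"],
    },
)
print(text)
```

The output can be appended to hostgroups.cfg and checked with the usual configuration test before restarting Nagios.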

Next, define the services to monitor on each of the hosts. Nagios has many built-in plugins for monitoring. On a Debian sarge system, they are stored in /usr/lib/nagios/plugins. Here we want to monitor the SMTP service on the mail server, and do ping checks on the router.

Edit /etc/nagios/services.cfg

define service{
    use                     generic-service
    host_name               mail.yourdomain.com
    service_description     SMTP
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          server_admin
    notification_interval   240
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_smtp
}

define service{
    use                     generic-service
    host_name               gw1.yourdomain.com
    service_description     PING
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          router_admin
    notification_interval   240
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_ping!100.0,20%!500.0,60%
}
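The check_command value for the PING service packs several things into one string: the command name, then arguments separated by exclamation marks. The sketch below pulls it apart. It assumes the usual check_ping command definition, in which the first argument is the warning threshold and the second is the critical threshold, each written as round-trip time in milliseconds followed by packet loss in percent. The function names are made up for this example.

```python
def split_check_command(check_command):
    """Split 'name!arg1!arg2' into the command name and its arguments."""
    name, *args = check_command.split("!")
    return name, args


def parse_ping_threshold(arg):
    """Parse '100.0,20%' into (round-trip ms, packet loss percent)."""
    rta, loss = arg.split(",")
    return float(rta), float(loss.rstrip("%"))


name, args = split_check_command("check_ping!100.0,20%!500.0,60%")
warning = parse_ping_threshold(args[0])
critical = parse_ping_threshold(args[1])

print(name)
print("warning: rta > %.1f ms or loss > %.0f%%" % warning)
print("critical: rta > %.1f ms or loss > %.0f%%" % critical)
```

Read this way, the example service warns at 100 ms or 20% loss and goes critical at 500 ms or 60% loss.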

And that's it. To test your configurations, you can run

nagios -v /etc/nagios/nagios.cfg

If all is well we can restart nagios and move on to the apache side to get a visual view of the monitor.

/etc/init.d/nagios restart

Assuming you have a working apache install, you can add the apache.conf file included in the nagios package to set up the nagios cgi administration interface. The web interface is not required to run nagios, but it is definitely worth setting it up. The simplest way to get it up and running is to copy the supplied conf file over to our apache installation. On my system, I'm running apache2. Systems running apache 1.3.xx will have slightly different setups.

cp /etc/nagios/apache.conf /etc/apache2/sites-enabled/nagios

Of course you may want to set it up as a virtual server, but I leave that as an exercise for the reader. Now you will want to set up a user who is allowed to view the CGI interface. By default, Nagios grants full administrative access to the nagiosadmin user. Nagios uses Apache htpasswd-style authentication, so here we add the user nagiosadmin with password mypassword to the default Nagios htpasswd file.

htpasswd2 -nb nagiosadmin mypassword >> /etc/nagios/htpasswd.users

You should now be able to restart Apache and log on to

http://your.nagios.server/nagios

Nagios is a very powerful tool for monitoring networks. I've only touched on the basics here, but it should be enough to get you up and running. Hopefully, once you do, you'll start experimenting with all the cool features and plugins that are available. The documentation included in the cgi interface is very detailed and helpful.

nagstamon 0.5.10 by Nagiostray

About: Nagstamon is a Nagios status monitor with a UI that resides in the GNOME systray or on the Windows desktop. It informs you in realtime about the status of your Nagios monitored network.

Changes: This release fixes a problem with passwords containing special characters, and an issue where it omitted showing failed services on hosts in scheduled downtime.

[Jun 25, 2008] check_oracle_health

About: check_oracle_health is a plugin for the Nagios monitoring software that allows you to monitor various metrics of an Oracle database. It includes connection time, SGA data buffer hit ratio, SGA library cache hit ratio, SGA dictionary cache hit ratio, SGA shared pool free, PGA in memory sort ratio, tablespace usage, tablespace fragmentation, tablespace I/O balance, invalid objects, and many more.

Release focus: Major feature enhancements

Changes: The tablespace-usage mode now takes into account when tablespaces use autoextents. The data-buffer/library/dictionary-cache-hitratio are now more accurate. Sqlplus can now be used instead of DBD::Oracle.

Configuring Nagios

In the main config file, make sure that the command_file directive is set and that it works. See http://nagios.sourceforge.net/docs/2_0/configmain.html#command_file for details.

Below is a sample extract from nagios.cfg:

command_file=/var/run/nagios/nagios.cmd

The /var/run/nagios directory is owned by the user Nagios runs as. The nagios.cmd file is a named pipe on which Nagios accepts external input.
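To make the named pipe less abstract, here is a sketch of what a line written to it looks like. PROCESS_SERVICE_CHECK_RESULT is one of the standard Nagios external commands for submitting a passive service result; the helper names, the host foo.example.com, and the plugin output text are only examples. The submit function is defined but not called here, because it needs a running Nagios on the other end of the pipe.

```python
import time


def format_service_result(host, service, return_code, output, now=None):
    """Build one external command line for the Nagios command file."""
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(command_file, line):
    """Write the line to the named pipe.

    Opening a FIFO for writing blocks until a reader (Nagios)
    has the other end open.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


line = format_service_result(
    "foo.example.com", "test", 0, "all good", now=1159868622
)
print(line, end="")
# submit("/var/run/nagios/nagios.cmd", line)  # requires a running Nagios
```

The script has to run as a user with write permission on the pipe, which is why the ownership of /var/run/nagios matters.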

Configuring NSCA, server side

NSCA is run through (x)inetd. Using inetd, the line below makes NSCA listen on port 5667:

5667 stream tcp nowait nagios /usr/sbin/tcpd /usr/sbin/nsca -c /etc/nsca.cfg --inetd

Using xinetd, the configuration below makes NSCA listen on port 5667, allowing connections only from the local host:

# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
    flags           = REUSE
    type            = UNLISTED
    port            = 5667
    socket_type     = stream
    wait            = no
    server          = /usr/sbin/nsca
    server_args     = -c /etc/nagios/nsca.cfg --inetd
    user            = nagios
    group           = nagios
    log_on_failure  += USERID
    only_from       = 127.0.0.1
}
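After reloading (x)inetd it is worth confirming that something is actually accepting connections on the NSCA port. A minimal sketch of such a check follows; the port_is_open function is an invented helper, and it only tests that a TCP connection can be opened, not that NSCA answers correctly.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 5667 is the NSCA port configured above.
    print(port_is_open("127.0.0.1", 5667))
```

With only_from set to 127.0.0.1, the check has to be run on the Nagios server itself.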

The file /etc/nsca.cfg defines how NSCA behaves. Check in particular the nsca_user and command_file directives. They should correspond to the file permissions and the location of the named pipe described in nagios.cfg.

nsca_user=nagios

command_file=/var/run/nagios/nagios.cmd

Configuring NSCA, client side

The NSCA client is a binary that submits to an NSCA server whatever it received as arguments. Its behaviour is controlled by the file /etc/send_nsca.cfg, which mainly controls encryption.

You should now be able to test the communication between the NSCA client and the NSCA server, and consequently whether Nagios picks up the message. NSCA requires a defined format for messages. For service checks, it's like this: <host_name>[tab]<svc_description>[tab]<return_code>[tab]<plugin_output>[newline]

The example below tests NSCA. The fields on the input line are separated by tab characters.

$ /usr/sbin/send_nsca -H localhost -c /etc/send_nsca.cfg
foo.example.com test 0 0
1 data packet(s) sent to host successfully.

This caused the following to appear in /var/log/nagios/nagios.log:

[1159868622] Warning: Message queue contained results for service 'test' on host 'foo.example.com'. The service could not be found!
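The same test can be scripted. The sketch below builds the tab-separated message described above and, optionally, pipes it to send_nsca using the path and switches from the example. The two function names are invented for this illustration, and send_nsca_message is defined but not called so the snippet runs without an NSCA installation.

```python
import subprocess


def format_nsca_message(host, service, return_code, output):
    """Build one service check line: four fields, tabs, trailing newline."""
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def send_nsca_message(message, server="localhost",
                      binary="/usr/sbin/send_nsca",
                      config="/etc/send_nsca.cfg"):
    """Feed the message to send_nsca on standard input."""
    return subprocess.run(
        [binary, "-H", server, "-c", config],
        input=message, text=True, check=True,
    )


message = format_nsca_message("foo.example.com", "test", 0, "0")
print(repr(message))
```

Rejecting tabs and newlines inside fields keeps one call from accidentally producing a malformed or multi-line message.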

Messages are sent by munin-limits based on the state of a monitored data source: OK, Warning or Critical. Munin does not currently support an Unknown state (this will be fixed in the future; see Ticket 29 for more information).

Configuring munin.conf

Munin uses the send_nsca binary mentioned above to send messages to Nagios. In /etc/munin/munin.conf, enter this:

contacts nagios

contact.nagios.command /usr/bin/send_nsca -H your.nagios-host.here -c /etc/send_nsca.cfg

Be aware that the -H switch to send_nsca appeared sometime after send_nsca version 2.1. Always check send_nsca --help!

Configuring Munin plugins

Lots of Munin plugins have (hopefully reasonable) values for Warning and Critical levels. To set or override these, you can change the values in munin.conf.

Configuring Nagios services

Now Nagios needs to recognize the messages from Munin as messages about services it monitors. To accomplish this, every message Munin sends to Nagios requires a matching (passive) service defined or Nagios will ignore the message (but it will log that something tried).

A passive service is defined through these directives in the proper Nagios configuration file:

active_checks_enabled 0

passive_checks_enabled 1

A working solution is to create a template for passive services, like the one below:

define service {
    name                      passive-service
    active_checks_enabled     0
    passive_checks_enabled    1
    parallelize_check         1
    notifications_enabled     1
    event_handler_enabled     1
    is_volatile               1
    register                  0    ; template only, not a real service
}

Once the template is defined, each Munin plugin should be defined as a service that uses it, as shown below:

define service {
    use                     passive-service
    host_name               foo
    service_description     bar
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   3
    retry_check_interval    1
    contact_groups          linux-admins
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_dummy!0
}

Notes

· host_name is either the FQDN of the host_name registered to the Nagios plugin, or the host alias corresponding to Munin's


Thursday, 3 November 2011