Thursday, 3 November 2011

Clearly about Nagios

Nagios Configuration

You must first define a new command in the Nagios checkcommands.cfg file (assuming you have installed WebInject in /usr/local/webinject):
define command {
    command_name    webinject
    command_line    /usr/local/webinject/webinject.pl -c $ARG1$ $ARG2$
}
Then you should declare a new service in the Nagios services.cfg file:
define service {
    use                      generic-service
    host_name                MyApplication-server
    service_description      WebInject test of MyApplication
    is_volatile              0
    check_period             24x7
    max_check_attempts       3
    normal_check_interval    1
    retry_check_interval     1
    contact_groups           myapplication-admins
    notification_interval    120
    notification_period      24x7
    notification_options     w,u,c,r
    check_command            webinject!myconfig.xml!MyApplication.xml
}
Of course, replace MyApplication-server with a host that you have declared in the hosts.cfg file, myapplication-admins with a contact group defined in the contactgroups.cfg file, and myconfig.xml and MyApplication.xml with your own WebInject files.
After your configuration is set up, stop and start the Nagios process to reload the configuration files.
Now, if your database server is down but not the MyApplication server, you should receive a critical alert in Nagios with the message:

    WebInject CRITICAL: Unable to do a sample request that accesses the MyApplication database
If you don't have an 'errormessage' specified in your WebInject test case, you will receive something like:

    WebInject CRITICAL - Test case number 4 failed
If your test works but takes more than 20 seconds (the threshold defined by your WebInject 'globaltimeout' value), you will receive:

    WebInject WARNING - All tests passed successfully but global timeout (20 seconds) has been reached
If everything works perfectly, you will receive something like:

    WebInject OK - All tests passed successfully in 4.932 seconds

Troubleshooting

Beware that if you run WebInject as a user other than the default 'nagios', some files may be created with that user as their owner. Nagios will report an error if it can't write to these files. Either give ownership of these files to 'nagios', or remove them and run WebInject again as the 'nagios' user (the one declared in nagios.cfg). File ownership issues show up in Nagios as an error like "Return code of 13 for check of service ... on host ... was out of bounds".
Make sure the 'nagios' user has sufficient rights to execute webinject.pl and to write to the output files http.log, result.xml, and result.html. If this is not set up correctly, you will get invalid return codes back from the plugin.
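
The permission checks above can be sketched as a small shell function. This is only an illustration: the function name check_webinject_permissions is hypothetical and not part of WebInject, and the directory /usr/local/webinject is the install location assumed earlier on this page.

```shell
#!/bin/bash

# Hypothetical helper (not part of WebInject): report permission
# problems in a WebInject directory for the user running this script.
check_webinject_permissions() {
  local dir="$1" problems=0 f
  # The plugin script itself must be executable.
  if [ ! -x "${dir}/webinject.pl" ]; then
    echo "not executable: ${dir}/webinject.pl"
    problems=1
  fi
  for f in http.log result.xml result.html; do
    # An output file that already exists must be writable.
    if [ -e "${dir}/${f}" ] && [ ! -w "${dir}/${f}" ]; then
      echo "not writable: ${dir}/${f}"
      problems=1
    fi
  done
  return "${problems}"
}

# Assumed install directory; adjust to your own setup.
check_webinject_permissions /usr/local/webinject || echo "fix the ownership or permissions listed above"
```

The result only reflects the user who runs the script, so it is meant to be run as the 'nagios' user to see what Nagios itself would see.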




#summary This page describes the wlsagent Nagios plugin.

=== Introduction ===

Wlsagent is a small JMX client, which exposes performance metrics for !WebLogic servers 9.x and 10.x.

You can get the performance data by submitting simple HTTP requests to the plugin, as it embeds a Jetty container. The plugin runs in the background and handles every monitoring request, which allows a small memory footprint (a 32MB JVM heap is sufficient), no CPU overhead and short response time.

=== Running the plugin ===

First of all, you have to run the plugin. This script will do the job (make sure the JAVA_HOME variable is defined):

{{{
#!/bin/bash

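# Resolve all paths relative to the location of this script.
CURRENT_PATH=$(dirname "$0")
LIB_PATH=${CURRENT_PATH}/lib
CLASSPATH=${CURRENT_PATH}/wlsagent.jar

# Append every jar found in the lib directory to the classpath.
for jar in $(find "${LIB_PATH}" -name '*.jar'); do
  CLASSPATH=${CLASSPATH}:${jar}
done

# Start the agent in the background; "$@" carries the listening port.
"${JAVA_HOME}/bin/java" -Xmx32m -cp "${CLASSPATH}" net.wait4it.wlsagent.WlsAgent "$@" > /dev/null 2>&1 &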
}}}

The content of the working directory should look like this:
  * the lib directory
  * the wlsagent.jar file
  * the previous shell script

The lib directory contains the jar dependencies for Jetty and the !WebLogic implementation of the 't3' protocol:

 jetty-continuation-7.3.0.v20110203.jar<br>
 jetty-http-7.3.0.v20110203.jar<br>
 jetty-io-7.3.0.v20110203.jar<br>
 jetty-security-7.3.0.v20110203.jar<br>
 jetty-server-7.3.0.v20110203.jar<br>
 jetty-servlet-7.3.0.v20110203.jar<br>
 jetty-util-7.3.0.v20110203.jar<br>
 servlet-api-2.5.jar<br>
 wlclient.jar<br>
 wljmxclient.jar<br>

You can get the latest Jetty distribution from the Eclipse site.

The !WebLogic jars are available in the ${WL_HOME}/server/lib directory of the !WebLogic server distribution.

Below is a Unix CLI invocation example of the previous script:

./wlsagent.sh 8080

=== Using it ===
 
Once the plugin is running, you can invoke it, for instance with wget:

wget -q -O - 'http://localhost:8080/wlsagent/WlsAgent?hostname=localhost&port=7001'

In the example above, the plugin is listening on port 8080, and the target server on port 7001.

The above command produces the following output:

3|Unable to get MBeanServerConnection for JMXServiceURL service:jmx:t3://localhost:7001/jndi/weblogic.management.mbeanservers.runtime

That's fair enough, as we have to provide credentials to access our !WebLogic MBeanServer. Notice the first character of the command output string is the regular Nagios exit code (UNKNOWN in this case).

In order to monitor your application servers, you can create a specific user on the !WebLogic side, and make sure it belongs to the monitor group.

Let's retry by adding the credentials:

wget -q -O - 'http://localhost:8080/wlsagent/WlsAgent?hostname=localhost&port=7001&username=nagios&password=nagios'

This time, we get this output:

0|server1: status OK|

The Nagios exit code is 0 (OK), as we didn't perform any test. 'server1' is the name of our !WebLogic instance.

Next we're going to get information about the JVM heap usage by adding jvm=!UsedMemory;80;90 to the request parameters:

wget -q -O - 'http://localhost:8080/wlsagent/WlsAgent?hostname=localhost&port=7001&username=nagios&password=nagios&jvm=UsedMemory;80;90'

The two numeric values at the end are the warning and critical thresholds for the memory usage, expressed as percentages. We will come back to them later.

The command output is:

0|server1: status OK|!HeapSize=256M;;;0;512 !UsedMemory=194M;;;0;512

As you can see, we get the current heap size, the maximum heap size and the amount of memory currently used by the server.
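
The response is a single line made of three fields separated by '|': the Nagios exit code, the status message and the performance data. Here is a minimal sketch of how a shell script could split it. The function name split_wlsagent_response and the WLS_* variable names are hypothetical, and the sample line is the output shown above without the wiki '!' escapes.

```shell
#!/bin/bash

# Hypothetical helper: split a wlsagent response line into its three
# '|'-separated fields (exit code, status message, performance data).
split_wlsagent_response() {
  local response="$1"
  IFS='|' read -r WLS_CODE WLS_MESSAGE WLS_PERFDATA <<< "${response}"
}

# Sample line copied from the output above.
split_wlsagent_response '0|server1: status OK|HeapSize=256M;;;0;512 UsedMemory=194M;;;0;512'
echo "code:     ${WLS_CODE}"
echo "message:  ${WLS_MESSAGE}"
echo "perfdata: ${WLS_PERFDATA}"
```

A wrapper script used as a Nagios check command could then print the message and the performance data, and exit with the value of WLS_CODE.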

Let's try to change the warning threshold value to '30':

wget -q -O - 'http://localhost:8080/wlsagent/WlsAgent?hostname=localhost&port=7001&username=nagios&password=nagios&jvm=UsedMemory;30;90'

We get this:

1|server1: status WARNING - Warning alert raised by the [jvm] test |!HeapSize=256M;;;0;512 !UsedMemory=200M;;;0;512

A warning alert is raised by the test, as the ratio of used memory to maximum memory is greater than 30% (our warning threshold).
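
The figures can be checked by hand: 200 / 512 is about 39%, which is above the 30% warning threshold but below the 90% critical one, whereas the earlier 194 / 512 (about 38%) stayed under the 80% warning threshold. The sketch below performs the same comparison with integer arithmetic. The function name usage_state is hypothetical, and the logic (including the strict "greater than" comparison) is an assumption based on the description above, not the actual wlsagent code.

```shell
#!/bin/bash

# Hypothetical re-implementation of the threshold test described above.
# Arguments: used memory, maximum memory, warning %, critical %.
# Prints the Nagios exit code: 0 (OK), 1 (WARNING) or 2 (CRITICAL).
usage_state() {
  local used="$1" max="$2" warn="$3" crit="$4"
  # Compare used/max against each threshold without floating point:
  # used / max > t / 100 is equivalent to used * 100 > t * max.
  if [ $(( used * 100 )) -gt $(( crit * max )) ]; then
    echo 2
  elif [ $(( used * 100 )) -gt $(( warn * max )) ]; then
    echo 1
  else
    echo 0
  fi
}

usage_state 194 512 80 90   # first request: thresholds 80 and 90
usage_state 200 512 30 90   # second request: thresholds 30 and 90
```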

=== Screenshots ===

Below is a sample of the kind of graph you can get with the !HeapSize and !UsedMemory properties:

https://lh6.googleusercontent.com/_GD7v8WzxfNA/TbGe2-SFQ_I/AAAAAAAAAmo/raSsL_i6iCY/s400/nagios.jpg

Nagios Plugins

Aside from the normal usage as an individual test harness, WebInject can run in a mode that allows it to be used as a plugin for Nagios.
Nagios is an open source host, service, and network monitoring program. It is very popular and used by many big companies and organizations for enterprise monitoring. Nagios was originally designed to run under Linux, but should run on other Unix variants as well. Nagios is a very powerful, very flexible monitoring solution, with many plugins available to do almost anything, and with a large number of options for notification and service monitoring. Nagios is based on a central server that runs external commands or plugins to test distributed hosts and services.
For information on Nagios, visit http://www.nagios.org

Why Another HTTP Plugin for Nagios?

There is an existing plugin for Nagios named 'check_http' that is installed along with the official set of Nagios plugins. The check_http plugin tests the HTTP service on a specified host and port. It has a variety of options and is useful for cursory monitoring of an HTTP service.
However, there are instances when it would be useful to do more than send a single request to test your HTTP service. WebInject is a test harness that can operate as an intelligent test agent that is able to chain together test cases to form a functional test suite for your web application or web service.
For example, let's say you wanted to monitor a web application's functionality through multiple phases:
Phase 1 - Connect to the application
Phase 2 - Authenticate a user under the web application's login/authentication system
Phase 3 - Verify that you can navigate through the application while you're authenticated
Phase 4 - Do a sample request that accesses a database to verify it is available to the web application
This could not be done in Nagios with a simple check_http which just verifies results of a single request without chaining tests that depend on previous responses. However, we can use WebInject in the Nagios plugin mode to do the job. The idea is to consider all of these 4 consecutive WebInject tests as one global test in the Nagios environment.

WebInject Plugin Return Codes

After the test has been launched, WebInject can return these codes to Nagios:

OK

We consider that the test is OK if all the consecutive tests have passed successfully. The return code 0 is sent to Nagios with a message giving the global time needed to pass all the tests.

WARNING

If you declare a 'globaltimeout' in your config file, this value, given in seconds, will be compared to the global time elapsed to run all the tests. If the tests have all been successful but have taken more time than the 'globaltimeout' value, this is considered a WARNING. The return code 1 is sent to Nagios with a warning message stating that your tests took longer than the 'globaltimeout' to run.

CRITICAL

We consider that the test failed (CRITICAL) if any of the WebInject tests declared in the test case file failed. The return code 2 is then sent to Nagios with a simple message describing which test number has failed. This message can easily be customized by adding an 'errormessage' section in your test case.
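
The mapping between return codes and Nagios states described above can be sketched as a small shell helper. The function name nagios_status_name is hypothetical and not part of WebInject. The UNKNOWN branch follows the general Nagios plugin convention (return code 3) and is not something this page documents WebInject as returning.

```shell
#!/bin/bash

# Hypothetical helper (not part of WebInject): translate a plugin
# return code into the Nagios state name it stands for.
nagios_status_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;   # 3 by Nagios convention; other values are treated the same here
  esac
}

# Example: show the state name for a return code of 2.
nagios_status_name 2
```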

WebInject Configuration

First you must install WebInject and the necessary Perl modules on your Nagios server. See the WebInject Build page for help getting it set up. Use of remote test computers with nrpe and nsca should also work but has not been tested.
Then you have to modify the config.xml and add this line in it to enable the Nagios plugin mode:
<reporttype>nagios</reporttype>
This will modify the output of WebInject to be compatible with Nagios. Only one result line will be given and the return code will depend on the results of the tests.
Note: you may also create an alternate config file that you can then specify on the command line used to call webinject.pl.
You can also add a <globaltimeout> setting to your configuration file if you want to obtain a warning when the test is slow.
Here is an example config file (we will refer to it as myconfig.xml in later examples) which is usable with Nagios:
<testcasefile>MyApplication.xml</testcasefile>
<useragent>WebInject Application Tester</useragent>
<timeout>10</timeout>
<globaltimeout>20</globaltimeout>
<reporttype>nagios</reporttype>
(see the WebInject manual for a full list of config settings that are available)
Next, set up a test case file. In each case you may specify an errormessage section containing a descriptive message of the problem, which will be shown as the problem description in the Nagios interface.
Based on the example we described earlier, here is a minimal MyApplication.xml test case file:
 
<testcases repeat="1">
 
<case
    id="1"
    description1="Connecting to MyApplication"
    method="get"
    url="http://www.mydummyapplication.com/login.php"
    parseresponse='mykey="|"'
    verifypositive="Login"
    errormessage="Unable to connect to the login page of MyApplication"
/>
 
<case
    id="2"
    description1="Authentication on MyApplication"
    method="post"
    url="http://www.mydummyapplication.com/Authentication"
    postbody="user=foo&password=bar&mykey={PARSEDRESULT}"
    verifynegative="User unknown"
    errormessage="Unable to authenticate user foo in MyApplication"
/>
 
<case
    id="3"
    description1="Navigate through MyApplication while authenticated"
    method="get"
    url="http://www.mydummyapplication.com/ApplicationHelp"
    verifypositive="Welcome to the MyApplication help"
    errormessage="Unable to navigate through MyApplication even though correctly authenticated"
/>
 
<case
    id="4"
    description1="Test access to the database"
    method="post"
    url="http://www.mydummyapplication.com/DatabaseRequest.php"
    postbody="object=fruits&color=red"
    verifypositive="strawberry"
    errormessage="Unable to do a sample request that accesses the MyApplication database"
/>
 
</testcases>
(see the WebInject manual for a full list of test case parameters that are available)
Once everything is set up, it is advisable to manually validate your WebInject test before configuring Nagios to use it. Continuing with our example, this can be done by launching:
webinject.pl -c myconfig.xml MyApplication.xml
or a simple:
webinject.pl -c myconfig.xml
because the test case file name MyApplication.xml is already specified in the myconfig.xml config file.
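
When validating by hand, the return code matters as much as the printed line, since Nagios relies on it to decide the service state. The following sketch runs a command and prints both. The function name show_plugin_result is hypothetical, and the stand-in command only echoes one of the sample messages from this page so that the sketch runs without WebInject installed.

```shell
#!/bin/bash

# Hypothetical helper: run a plugin command, then print the line it
# produced together with the return code Nagios would see.
show_plugin_result() {
  local output code
  output=$("$@" 2>&1)
  code=$?
  echo "output: ${output}"
  echo "return code: ${code}"
  return "${code}"
}

# Stand-in command imitating a successful run.
show_plugin_result sh -c 'echo "WebInject OK - All tests passed successfully in 4.932 seconds"; exit 0'

# With WebInject installed, the call would be along the lines of:
#   show_plugin_result /usr/local/webinject/webinject.pl -c myconfig.xml MyApplication.xml
```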
To obtain more debugging information on standard output, you could change the 'reporttype' config setting to:
<reporttype>standard</reporttype>
and of course have a look at the http.log file.
You could also force false errors by modifying the verify conditions, to check that everything is working in your tests and that errors are correctly signaled.
Once it works correctly, don't forget to change back the 'reporttype' setting to 'nagios'.