Author:
Jouko Markkanen

Categories:
Nagstatus, Roadmap

Published:
March 31, 2009
Modified:
June 24, 2009

For some time I have been meaning to revisit the Nagstatus Gadget’s visual layout. The current (original) layout was intended to show the overall status of all monitored hosts and services at a single glance. It ended up more or less doing that, but it also has too many distracting elements that add no value for that purpose. Originally I intended to provide only visual indicators (which became the status bars), but as they are not very informative, and small percentages of failed hosts/services do not necessarily show up at all (especially on larger installations), I ended up adding the numeric indicators.

Now I have noticed that I don’t look at the status bars at all, and the numbers convey quite little information compared to the screen real estate they take up. That’s probably not a big problem for most, but if you have several gadgets running on a small screen, you might value a more compact view.

The Gadget UI

The more I think about the optimal design, the more I return to the idea of visual indication only. Currently I’m planning a gadget with just three traffic lights, each of which can display one of three or four colors. Their purpose would be:

  1. Gadget status
    • Green for all-ok
    • Yellow/orange for warning/notice kind of states (an update for the gadget is available, an update query failed once, an update took too long to load etc.)
    • Red for errors (no connection to the server, or there was an error either loading or processing the status information etc.)
  2. Host status
    • Green if all hosts are OK (acknowledged errors and scheduled downtimes count as OK)
    • Yellow/orange if at least one host is unreachable (but none is down – actually, I’m not sure whether this situation is even possible by definition)
    • Red if at least one host is down
  3. Service status
    • Green if all services are OK (acknowledged errors and scheduled downtimes on the services or their hosts also count as OK)
    • Yellow if at least one service is Unknown (but none is in the states listed below)
    • Orange if at least one service is Warning (but none Critical)
    • Red if at least one service is Critical
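The host and service lights above boil down to a simple priority check. The sketch below is one possible reading of those rules, not code from the gadget. It assumes each host or service arrives as a (state, handled) pair, where state is the numeric Nagios state (hosts: 1 = DOWN, 2 = UNREACHABLE; services: 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and handled is a hypothetical flag meaning "acknowledged or in scheduled downtime". The names Light, host_light and service_light are made up for illustration, and since the host rule says "yellow/orange", picking orange is also an assumption.

```python
from enum import Enum
from typing import Iterable, Set, Tuple

# Numeric Nagios state codes used by the rules below.
HOST_DOWN, HOST_UNREACHABLE = 1, 2
SVC_WARNING, SVC_CRITICAL, SVC_UNKNOWN = 1, 2, 3


class Light(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"


def _active_states(items: Iterable[Tuple[int, bool]]) -> Set[int]:
    # Handled problems (acknowledged or in scheduled downtime) count as OK,
    # so they are simply left out of the set of states that affect the light.
    return {state for state, handled in items if not handled}


def host_light(hosts: Iterable[Tuple[int, bool]]) -> Light:
    active = _active_states(hosts)
    if HOST_DOWN in active:
        return Light.RED
    if HOST_UNREACHABLE in active:
        return Light.ORANGE  # assumption: the post leaves yellow vs. orange open
    return Light.GREEN


def service_light(services: Iterable[Tuple[int, bool]]) -> Light:
    active = _active_states(services)
    # Checked from most to least severe, so the worst unhandled state wins.
    if SVC_CRITICAL in active:
        return Light.RED
    if SVC_WARNING in active:
        return Light.ORANGE
    if SVC_UNKNOWN in active:
        return Light.YELLOW
    return Light.GREEN
```

For example, a single acknowledged DOWN host still leaves the host light green, while one unhandled Unknown service next to an acknowledged Critical one turns the service light yellow.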

I’m aware that many Nagios users constantly have some error conditions on their monitored hosts/services, so the blunt single traffic light interface might not be the best choice. To address at least some of these problems, I’ve thought of keeping the numeric information in some form (maybe inside the traffic light, in a more condensed layout), or letting you choose whether to show the numbers at all.

Another big change would be the state change indicators. Currently the gadget flashes its rims red for a few seconds when a state change occurs. It does so on all changes (from OK to non-OK states and vice versa, and also for acknowledged/scheduled hosts and services). There’s also an option to play a sound, but the same limitations apply to it. I’m planning to change the indicator to flash the corresponding “traffic light”, and to add a configuration parameter that allows you to a) disable the flashing, b) flash for a few seconds (the current behavior) or c) flash until you acknowledge the change (by clicking the light, acknowledging the Nagios status, or something similar).
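The three flashing options could be modelled as a small per-light state machine. This is a hypothetical sketch: FlashMode, LightFlasher and the 5-second default duration are assumptions, not existing gadget settings. Timestamps can be passed in explicitly so the behavior is easy to check without waiting.

```python
import time
from enum import Enum
from typing import Optional


class FlashMode(Enum):
    OFF = "off"              # a) never flash
    TIMED = "timed"          # b) flash for a fixed time (current behavior)
    UNTIL_ACK = "until_ack"  # c) flash until the user acknowledges


class LightFlasher:
    """Hypothetical flash state for one traffic light."""

    def __init__(self, mode: FlashMode, duration: float = 5.0) -> None:
        self.mode = mode
        self.duration = duration  # seconds; only used in TIMED mode (assumed default)
        self._started: Optional[float] = None

    def on_state_change(self, now: Optional[float] = None) -> None:
        # Start (or restart) flashing unless flashing is disabled.
        if self.mode is not FlashMode.OFF:
            self._started = time.monotonic() if now is None else now

    def acknowledge(self) -> None:
        # For example, called when the user clicks the light.
        self._started = None

    def is_flashing(self, now: Optional[float] = None) -> bool:
        if self._started is None:
            return False
        if self.mode is FlashMode.UNTIL_ACK:
            return True
        now = time.monotonic() if now is None else now
        return now - self._started < self.duration
```

A gadget would call on_state_change() when the light's color changes, acknowledge() when the light is clicked, and poll is_flashing() from its redraw timer.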

The Flyout UI

My plans for the flyout UI are not as refined yet. Quite possibly (to keep the update cycle reasonable) I’ll leave it unchanged at first and start modifying it once the main gadget design has been changed. But here are some ideas I’ve developed so far:

  • Split the flyout into two or three pages (overall status, hosts, services; or maybe overall status, non-OK hosts and services, and OK hosts and services).
  • Add more details for OK hosts/services. This would probably be configurable in some way to keep the UI usable on larger installations too. But I’ve gotten used to checking the status of some services (like available disk space) using the flyout, only to remember after opening it that a service isn’t displayed unless it is in a warning or critical state.
  • Maybe some of the detailed information could be presented as a tooltip when hovering the mouse over the host/service name. This would, however, require a more refined, modular way to request information updates.

The nagxmlstatus.cgi script

The last bullet point in particular made me think further about the nagxmlstatus.cgi script and its functionality. Currently it simply converts the whole status.dat into XML and filters out the requested information. This is far from optimal, and makes it difficult to strike a balance between how often to run this conversion on the server and how much processing to do on the client.
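For reference, the "convert everything, then filter" approach looks roughly like the sketch below. It is not the actual nagxmlstatus.cgi, whose code and XML schema are not shown here. It assumes the usual status.dat layout of a "blocktype {" line, indented "key=value" lines and a closing "}", and the root element name nagios_status and the kinds filter parameter are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional, Set, Tuple

Block = Tuple[str, Dict[str, str]]  # (block type, fields)


def parse_status_dat(lines: Iterable[str]) -> List[Block]:
    """Parse status.dat-style text into (block type, fields) pairs."""
    blocks: List[Block] = []
    kind: Optional[str] = None
    fields: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if kind is None and line.endswith("{"):
            kind, fields = line[:-1].strip(), {}
        elif kind is not None and line == "}":
            blocks.append((kind, fields))
            kind = None
        elif kind is not None and "=" in line:
            # Split on the first "=" only; values such as plugin output may contain more.
            key, _, value = line.partition("=")
            fields[key] = value
    return blocks


def to_xml(blocks: List[Block], kinds: Optional[Set[str]] = None) -> str:
    """Serialize every block, or only the requested block types, to XML."""
    root = ET.Element("nagios_status")  # hypothetical root element name
    for kind, fields in blocks:
        if kinds is not None and kind not in kinds:
            continue
        elem = ET.SubElement(root, kind)
        for key, value in fields.items():
            ET.SubElement(elem, key).text = value  # ElementTree escapes the text
    return ET.tostring(root, encoding="unicode")
```

The cost is visible in the structure: every request parses the entire file, even when the client only wants a single host.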

Once again I wish there were a real Web Service interface in Nagios itself (like SOAP or XML-RPC). I admit that it would also be a heavy thing to implement, but much lighter than running my script every few seconds, and I’m sure others would find uses for it too. However, this is far beyond my skills and resources, so I’ll have to live with the script until someone else implements one :) Actually, I have found one candidate, NXE – Nagios XML Engine, but that project shows no updates since version 1.0 (released six years ago), so I’m not counting on it much for now.

The only practical way to use the current script in larger environments is to dump its output to a file using cron (or similar) and have clients read that file. But this method limits the information to quite few details (you could add more, but that would make the client-side processing much more cumbersome). Maybe I could write a wrapper daemon for the current script that would periodically read status.dat and transform it to XML in memory; clients could then request specific information from it, possibly with an option to trigger a partial or complete update of the status information.
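The wrapper daemon idea mostly comes down to keeping the parsed status in memory and answering narrow queries from it. Here is a minimal sketch, assuming a loader function that returns the parsed status.dat blocks as (block type, fields) pairs. StatusCache, max_age and the query parameters are hypothetical, the network side of the daemon is left out, and the cache holds plain dictionaries rather than XML, leaving serialization to the response. It only performs complete refreshes; partial updates are not covered.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple

Block = Tuple[str, Dict[str, str]]  # (block type, fields)


class StatusCache:
    """Hypothetical in-memory status store for a wrapper daemon."""

    def __init__(self, loader: Callable[[], List[Block]], max_age: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # e.g. reads and parses status.dat
        self._max_age = max_age    # seconds before the data counts as stale
        self._clock = clock
        self._lock = threading.Lock()
        self._blocks: List[Block] = []
        self._loaded_at: Optional[float] = None

    def refresh(self) -> None:
        # Complete update: reload everything, then swap it in under the lock.
        blocks = self._loader()
        with self._lock:
            self._blocks = blocks
            self._loaded_at = self._clock()

    def query(self, kind: str, host_name: Optional[str] = None,
              force: bool = False) -> List[Dict[str, str]]:
        """Return copies of the blocks of one type, optionally for one host."""
        with self._lock:
            stale = (self._loaded_at is None
                     or self._clock() - self._loaded_at >= self._max_age)
        if force or stale:
            self.refresh()
        with self._lock:
            return [dict(fields) for k, fields in self._blocks
                    if k == kind
                    and (host_name is None or fields.get("host_name") == host_name)]
```

This version refreshes lazily, when a request finds the data older than max_age or passes force=True. A real daemon could instead call refresh() from a background timer; the lock keeps queries consistent while a refresh swaps in new data.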

Endnote

This list was originally titled just “Plans for Nagstatus V2.0”, but I ended up with so many changes that they will definitely not all be implemented in V2.0, if ever. So consider it a roadmap of possible developments for the system. As this is a hobby project, I will not set any deadlines or estimate when new releases will be available or what they will include.

But if you have any comments or suggestions on what you as a gadget user would like to see, drop a comment below or mail me.
