Description of problem:
------------------------
On one of the hosts being monitored using Nagios, the services CPU Utilization, Memory Utilization, and Swap Utilization moved to UNKNOWN with "sadf command failed" as the status information and a long XML error. Network Utilization was UNKNOWN with the status information "UNKNOWN".

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Saw it once.

Steps to Reproduce:
Cannot provide steps for recreating this issue, as I have not observed anything unusual.

Actual results:
Services moved to UNKNOWN.

Expected results:
Services should not move to UNKNOWN for no apparent reason.

Additional info:
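For context: the affected plugins read sysstat data through sadf and parse its XML output. Below is a minimal sketch of how such a check can guard against this failure mode, mapping any sadf error or unparseable XML to the Nagios UNKNOWN state. The sadf invocation ("sadf -x -- -u") and the check structure are illustrative assumptions, not the actual gluster-nagios-addons code.

#!/usr/bin/env python
# Minimal sketch: run sadf, parse its XML, and exit UNKNOWN on any failure
# instead of surfacing a raw traceback. Illustrative only.
import subprocess
import sys
import xml.etree.ElementTree as ET

UNKNOWN = 3  # standard Nagios exit code for the UNKNOWN state

def run_sadf():
    # 'sadf -x' requests XML output from sysstat's sadf;
    # '-- -u' passes the CPU-utilization report flag through to sar.
    try:
        proc = subprocess.Popen(["sadf", "-x", "--", "-u"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
    except OSError as e:
        print("UNKNOWN: could not execute sadf: %s" % e)
        sys.exit(UNKNOWN)
    out, err = proc.communicate()
    if proc.returncode != 0:
        print("UNKNOWN: sadf command failed: %s" % err.strip())
        sys.exit(UNKNOWN)
    try:
        # Incomplete or truncated XML (the failure in this bug) fails here.
        return ET.fromstring(out)
    except ET.ParseError as e:
        print("UNKNOWN: could not parse sadf output: %s" % e)
        sys.exit(UNKNOWN)

if __name__ == "__main__":
    root = run_sadf()
    print("OK: parsed sadf XML, root element <%s>" % root.tag)
    sys.exit(0)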
As per triage call on 10 June -- NON BLOCKER
I've seen the same issue reported in this bug once with the latest build rhsc-nagios-release-denali-6 during my testing.

Steps:
1. Installed and configured RHSC + Nagios Server using http://rhsm.pad.engineering.redhat.com/rhsc-nagios-release-denali-6
2. Launched 3 fresh RHS VMs using the build RHSS-3.0-20140609.n.0-RHS-x86_64-DVD1.iso
3. Added these 3 RHS nodes to RHSC, then created and started some volumes.
4. Ran the auto-config script to import the cluster into Nagios.
5. Waited for all the services to show up in the Nagios UI.

However, I noticed that the Status Information of "Network Utilization" on all 3 RHS nodes shows "UNKNOWN" forever. Can you confirm whether this is the same issue as reported in this bug? If so, let me know if you want me to log a separate BZ for it. I can also share my test setup if needed for debugging.
These look like two different issues.

For this bug: all the plugins dependent on sadf were showing UNKNOWN. The reason was that the sadf command was returning incomplete XML output, which was not readable by our plugins.

Issue seen by Prashanth: only Network Utilization was showing UNKNOWN, and the reason was that the name of some NIC appeared in binary form in the XML output of the sadf command. It is not valid to have binary data in XML output, hence the plugin could not read it.
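To illustrate the second failure mode: an XML parser rejects a document containing raw binary bytes outright rather than returning partial data, which is why the plugin could not read the output at all. A small standalone demonstration (the element name and the binary NIC name below are made up for illustration):

import xml.etree.ElementTree as ET

# A NIC name carrying raw binary bytes (here a NUL byte) makes the whole
# document ill-formed XML; the parser raises instead of returning data.
bad_xml = b'<net-dev iface="eth\x00\x01" rxpck="0.00"/>'
try:
    ET.fromstring(bad_xml)
except ET.ParseError as err:
    print("sadf output rejected: %s" % err)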
Saw it again: 2 out of the 5 nodes being monitored in my setup are affected by this issue. Proposing for 3.0.z. Maybe it should be documented as a known issue for 3.0.
Have added doc text.
Please review and sign off on the edited doc text.
Looks good.
Moving this out of RHS 3.0.2
Thank you for your report. However, this bug is being closed because it is logged against gluster-nagios monitoring, for which no further development is being undertaken.