Bug 1107605 - [Nagios] Services moving to unknown state with "sadf command failed"
Status: CLOSED CANTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nagios-addons
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Darshan
QA Contact: RHS-C QE
Keywords: ZStream
Depends On:
Blocks: 1087818
Reported: 2014-06-10 06:48 EDT by Shruti Sampat
Modified: 2018-01-30 06:11 EST

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
The sadf command used by the Nagios plug-ins returns invalid output. Workaround: Delete the datafile located at /var/log/sa/saDD, where DD is the current date. This removes the datafile for the current day; a new datafile is created automatically and is usable by the Nagios plug-ins.
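A minimal Python sketch of the workaround, assuming DD is the zero-padded day of the month. The helper names `sa_datafile_path` and `remove_sa_datafile` are hypothetical and are not part of gluster-nagios-addons; on a real host the deletion would need root privileges.

```python
import datetime
import os


def sa_datafile_path(day, sa_dir="/var/log/sa"):
    """Return the path of the sysstat datafile (saDD) for the given date.

    Assumption: DD is the zero-padded day of the month.
    """
    return os.path.join(sa_dir, "sa%02d" % day.day)


def remove_sa_datafile(day, sa_dir="/var/log/sa"):
    """Delete the datafile for `day`; return True if a file was removed."""
    path = sa_datafile_path(day, sa_dir)
    try:
        os.remove(path)
    except FileNotFoundError:
        # Nothing to delete for this day.
        return False
    return True


if __name__ == "__main__":
    # Only prints the path for today; call remove_sa_datafile() explicitly
    # (as root) to actually apply the workaround.
    print(sa_datafile_path(datetime.date.today()))
```

The directory is a parameter so the helper can be tried against a scratch directory before it is pointed at /var/log/sa.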
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-30 06:11:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Shruti Sampat 2014-06-10 06:48:49 EDT
Description of problem:
------------------------

On one of the hosts being monitored using Nagios, the services CPU Utilization, Memory Utilization, and Swap Utilization moved to unknown, with "sadf command failed" as the status information followed by a long XML error. Network Utilization was also unknown, with the status information "UNKNOWN".

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Saw it once.

Steps to Reproduce:
Cannot provide steps for recreating this issue as I have not observed anything unusual.

Actual results:
Services moved to unknown.

Expected results:
Services should not move to unknown for no apparent reason.

Additional info:
Comment 1 Dusmant 2014-06-10 11:47:31 EDT
As per triage call on 10 June -- NON BLOCKER
Comment 2 Prasanth 2014-06-12 06:09:13 EDT
I've seen the same issue reported in this bug once with the latest build rhsc-nagios-release-denali-6 during my testing. 

Steps: 

1. Installed and configured RHSC + Nagios Server using http://rhsm.pad.engineering.redhat.com/rhsc-nagios-release-denali-6
2. Launched 3 fresh RHS VMs using the build RHSS-3.0-20140609.n.0-RHS-x86_64-DVD1.iso
3. Added these 3 RHS nodes to RHSC, created and started some volumes.
4. Ran the auto config script to import the cluster to Nagios
5. Waited for all the services to show up in Nagios UI

However, I noticed that the Status Information of "Network Utilization" on all 3 RHS nodes shows "UNKNOWN" forever.

Can you confirm whether this is the same issue as the one reported in this bug? If it is not, let me know if you want me to log a different BZ for this.

I can also share my test setup, if needed for debugging.
Comment 3 Darshan 2014-06-12 08:23:17 EDT
Looks like these two are different issues. 

   For this bug, all the plugins dependent on sadf were showing unknown. The reason: the sadf command was returning incomplete XML output, which was not readable by our plugins.

   Issue seen by Prasanth: only network was showing unknown, because the name of some NIC appeared in binary format in the XML output of the sadf command. Binary data is not valid in XML output, hence it was not readable by the plugin.
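Both failure modes can be illustrated with a small Python sketch that checks whether a blob of XML is well-formed. The function name `is_well_formed_xml` and the sample strings are made up for illustration; they are not real sadf output and this is not the actual plugin code.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data):
    """Return True if `data` (bytes or str) parses as well-formed XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, loosely modelled on the two cases described above.
complete = b"<sysstat><host nodename='node1'/></sysstat>"
# Incomplete output: the document ends before its elements are closed.
truncated = b"<sysstat><host nodename='node1'>"
# Control bytes in an interface name are not legal XML characters.
binary_name = b"<sysstat><net-dev iface='\x01\x02'/></sysstat>"

if __name__ == "__main__":
    for label, sample in [("complete", complete),
                          ("truncated", truncated),
                          ("binary_name", binary_name)]:
        print(label, is_well_formed_xml(sample))
```

A check like this only tells the two cases apart from valid output; it does not explain why sadf produced them.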
Comment 4 Shruti Sampat 2014-07-04 07:52:27 EDT
Saw it again in my setup. 2 out of 5 nodes being monitored in my setup are affected by this issue. Proposing for 3.0.z.

Maybe it should be documented for 3.0.
Comment 6 Darshan 2014-07-07 02:57:49 EDT
Have added doc text.
Comment 12 Shalaka 2014-09-20 12:31:48 EDT
Please review and sign-off edited doc text.
Comment 13 Darshan 2014-09-22 00:29:10 EDT
looks good
Comment 15 Ramesh N 2014-10-13 08:54:15 EDT
Moving this out of RHS 3.0.2
Comment 18 Sahina Bose 2018-01-30 06:11:39 EST
Thank you for your report. However, this bug is being closed as it's logged against gluster-nagios monitoring for which no further new development is being undertaken.
