Description of problem:
------------------------
On one of the hosts being monitored using Nagios, the services CPU Utilization, Memory Utilization, and Swap Utilization moved to UNKNOWN with "sadf command failed" as the status information and a long XML error. Network Utilization was UNKNOWN with the status information "UNKNOWN".

Version-Release number of selected component (if applicable):
gluster-nagios-addons-0.1.2-1.el6rhs.x86_64

How reproducible:
Saw it once.

Steps to Reproduce:
Cannot provide steps for recreating this issue, as I have not observed anything unusual.

Actual results:
Services moved to UNKNOWN.

Expected results:
Services should not move to UNKNOWN for no apparent reason.

Additional info:
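For context: the affected plugins read sysstat data through sadf and parse its XML output. Below is a minimal sketch of how such a check can guard against this failure mode, mapping any sadf error or unparseable XML to the Nagios UNKNOWN state. The sadf invocation ("sadf -x -- -u") and the check structure are illustrative assumptions, not the actual gluster-nagios-addons code.

#!/usr/bin/env python
# Minimal sketch: run sadf, parse its XML, and exit UNKNOWN on any failure
# instead of surfacing a raw traceback. Illustrative only.
import subprocess
import sys
import xml.etree.ElementTree as ET

UNKNOWN = 3  # standard Nagios exit code for the UNKNOWN state

def run_sadf():
    # 'sadf -x' requests XML output from sysstat's sadf;
    # '-- -u' passes the CPU-utilization report flag through to sar.
    try:
        proc = subprocess.Popen(["sadf", "-x", "--", "-u"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
    except OSError as e:
        print("UNKNOWN: could not execute sadf: %s" % e)
        sys.exit(UNKNOWN)
    out, err = proc.communicate()
    if proc.returncode != 0:
        print("UNKNOWN: sadf command failed: %s" % err.strip())
        sys.exit(UNKNOWN)
    try:
        # Incomplete or truncated XML (the failure in this bug) fails here.
        return ET.fromstring(out)
    except ET.ParseError as e:
        print("UNKNOWN: could not parse sadf output: %s" % e)
        sys.exit(UNKNOWN)

if __name__ == "__main__":
    root = run_sadf()
    print("OK: parsed sadf XML, root element <%s>" % root.tag)
    sys.exit(0)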
As per triage call on 10 June -- NON BLOCKER
I've seen the same issue reported in this bug once with the latest build rhsc-nagios-release-denali-6 during my testing.

Steps:
1. Installed and configured RHSC + Nagios Server using http://rhsm.pad.engineering.redhat.com/rhsc-nagios-release-denali-6
2. Launched 3 fresh RHS VMs using the build RHSS-3.0-20140609.n.0-RHS-x86_64-DVD1.iso
3. Added these 3 RHS nodes to RHSC, then created and started some volumes.
4. Ran the auto-config script to import the cluster into Nagios.
5. Waited for all the services to show up in the Nagios UI.

However, I noticed that the Status Information of "Network Utilization" on all 3 RHS nodes shows "UNKNOWN" forever. Can you confirm whether this is the same issue as reported in this bug? If so, let me know if you want me to log a separate BZ for it. I can also share my test setup if needed for debugging.
These look like two different issues.

For this bug: all the plugins dependent on sadf were showing UNKNOWN. The reason was that the sadf command was returning incomplete XML output, which was not readable by our plugins.

Issue seen by Prashanth: only Network Utilization was showing UNKNOWN, and the reason was that the name of some NIC appeared in binary form in the XML output of the sadf command. It is not valid to have binary data in XML output, hence the plugin could not read it.
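To illustrate the second failure mode: an XML parser rejects a document containing raw binary bytes outright rather than returning partial data, which is why the plugin could not read the output at all. A small standalone demonstration (the element name and the binary NIC name below are made up for illustration):

import xml.etree.ElementTree as ET

# A NIC name carrying raw binary bytes (here a NUL byte) makes the whole
# document ill-formed XML; the parser raises instead of returning data.
bad_xml = b'<net-dev iface="eth\x00\x01" rxpck="0.00"/>'
try:
    ET.fromstring(bad_xml)
except ET.ParseError as err:
    print("sadf output rejected: %s" % err)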
Saw it again: 2 out of the 5 nodes being monitored in my setup are affected by this issue. Proposing for 3.0.z. Maybe it should be documented as a known issue for 3.0.
Have added doc text.
Please review and sign off on the edited doc text.
Looks good.
Moving this out of RHS 3.0.2
Thank you for your report. However, this bug is being closed because it is logged against gluster-nagios monitoring, for which no further development is being undertaken.