Bug 1109843
Summary: | [Nagios] Volume utilization is unknown with status information "Invalid host name <hostname-of-RHS-node>" when glusterd is stopped | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shruti Sampat <ssampat> |
Component: | nagios-server-addons | Assignee: | Nishanth Thomas <nthomas> |
Status: | CLOSED ERRATA | QA Contact: | Shruti Sampat <ssampat> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.0 | CC: | asrivast, dpati, kmayilsa, knarra, nthomas, psriniva, rhsc-qe-bugs, rnachimu, sabose, sgraf, sharne |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.0.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | nagios-server-addons-0.1.9-1.el6rhs | Doc Type: | Bug Fix |
Doc Text: |
Previously, if the host that is used for discovery was detached from the Red Hat Storage trusted storage pool, then all the hosts would get removed from the Nagios configuration when an auto-discovery was performed. With this fix, auto-config does not remove any configuration detail if the host used for discovery is detached from the Red Hat Storage trusted storage pool.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2015-01-15 13:48:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1087818, 1136205 |
Description
Shruti Sampat
2014-06-16 13:14:38 UTC
This issue is unlikely to happen often after the fix for Bug 1109025, but it needs to be documented.

Please add doc text for the known issue.

FYI, this issue is seen even with the fix for BZ #1109025, and even when glusterd is running.

Please review and sign off on the edited doc text.

doc_text looks good.

Hi,
This issue is also seen with the volume quota monitoring service when the volume is stopped. Maybe the doc text needs to be changed to include this too; right now it seems specific to volume utilization.

Hi,
Another situation where I saw this issue was while testing the quota timeout value using the -t option (BZ #1094614). I performed the following steps to make the quota list command take longer than 1 second to return, so that the timeout would occur (the timeout was set to 1 second using the -t option):
1. Created 2000 directories on the mount of the volume.
2. Configured quota limits on all 2000 directories.
Now the quota list command takes over 1 second to return the information. While quota was being configured on the directories, the status of the quota service was UNKNOWN with the status information "Invalid host name rhs.5" (rhs.5 is one of the hosts in the cluster being monitored). After a while, the status of the service was CRITICAL with the status information "CHECK_NRPE: Socket timeout after 1 seconds."

This issue is also seen when quota is enabled for a volume and the volume is stopped. The status information of the quota status service displays "Invalid host name 'rhs.4' ", rhs.4 being the name of one of the hosts in the cluster.

Moving back to assigned state as there are some scenarios that are not covered in the bug.

Verified as fixed in nagios-server-addons-0.1.9-1.el6rhs.
Tested with RHS+Nagios in a cluster of 4 nodes. Verified in the following scenarios:
1. Stopped nrpe on one of the nodes.
2. Stopped glusterd on a couple of nodes.
3. Powered off one of the nodes.
In all of the above scenarios, volume utilization was unknown with the following status information:
UNKNOWN: Failed to get the Volume Utilization Data

Also tested with the volume quota service, as mentioned in Comment #6 and Comment #7:
1. The status of the volume quota service when the volume was stopped was warning, with the status information:
QUOTA: Quota status could not be determined. quota command failed : Volume is stopped, start volume before executing quota command.
2. Unable to reproduce with the scenario mentioned in Comment #7.
Marking as verified.

Hi Nishanth,
Can you please review the edited doc text for technical accuracy and sign off?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-0039.html

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
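
For readers unfamiliar with how a Nagios check ends up in the UNKNOWN state seen during verification, the following is a minimal Python sketch of the general pattern. It is not the nagios-server-addons code. The function name, the `fetch_utilization` callback, and the 70/90 percent thresholds are assumptions made purely for illustration. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, per the Nagios plugin convention) and the UNKNOWN message text quoted in this report are taken as given.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def volume_utilization_status(fetch_utilization, warn=70.0, crit=90.0):
    """Return (exit_code, message) for a hypothetical utilization check.

    fetch_utilization: caller-supplied function returning percent used.
    It is expected to raise if the data cannot be obtained, for example
    because glusterd or nrpe is stopped or the node is powered off.
    warn, crit: illustrative thresholds, not values from the product.
    """
    try:
        used = fetch_utilization()
    except Exception:
        # Report a clear UNKNOWN instead of a misleading host name error.
        return UNKNOWN, "UNKNOWN: Failed to get the Volume Utilization Data"

    if used >= crit:
        return CRITICAL, "CRITICAL: Utilization %.1f%%" % used
    if used >= warn:
        return WARNING, "WARNING: Utilization %.1f%%" % used
    return OK, "OK: Utilization %.1f%%" % used
```

A real plugin would print the message and exit with the returned code, which is how Nagios determines the service state.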