| Summary: | Heal info plugin shows Critical state when files are healing, which is misleading | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Sweta Anandpara <sanandpa> | ||||||
| Component: | nagios-server-addons | Assignee: | Sahina Bose <sabose> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Sweta Anandpara <sanandpa> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | rhgs-3.1 | CC: | asrivast, rhinduja, sankarshan | ||||||
| Target Milestone: | --- | Keywords: | ZStream | ||||||
| Target Release: | RHGS 3.1.3 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | gluster-nagios-addons-0.2.7-1 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-06-23 05:27:56 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1311817 | ||||||||
| Attachments: |
|
Description
Sweta Anandpara
2016-04-25 05:46:04 UTC
With the "heal info" command, there is no way to determine whether a heal is in progress. At any time, we can only determine the entries needing heal. Ideally, if the number of entries needing heal does not decrease over time, the plugin should go to Critical state. However, changing state based on trends is not possible; the admin has to monitor the plugin trend graph once the plugin state is Warning.

So, in effect, the states of the plugin are:
OK - no files need healing
WARNING - there are files requiring heal, or the command could not be executed due to nrpe/other errors
UNKNOWN - command execution failed due to a transaction in progress

Moving this out of 3.1.3 as per comment 2. Once we review the states expected, will either close it or implement changes.

After reviewing the current implementation of "gluster volume heal info": the output returns "Possibly undergoing heal" in 2 cases:
1. The file is actually undergoing heal.
2. The heal info command is executed simultaneously on 2 nodes, which acquires a lock on the file.

Moving the plugin state to "Critical" in such cases is misleading to the user. If files are undergoing heal, this is expected, and the user only needs to be warned of this case, similar to the warning about unsynced entries. The plugin status needs to be changed.

Created attachment 1156640 [details]
Server and client logs
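
For reference, the state mapping proposed in the comments above can be sketched roughly as follows. This is only an illustration and not the actual gluster-nagios-addons code: the function names, the sample output, and the status messages (other than "unsynced entries found") are assumptions, and it assumes the heal info output carries one "Number of entries: N" line per brick.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_unsynced(heal_info_output):
    """Sum the per-brick 'Number of entries: N' lines (assumed output format)."""
    counts = re.findall(r"^Number of entries:\s*(\d+)", heal_info_output, re.MULTILINE)
    return sum(int(n) for n in counts)


def heal_state(heal_info_output, command_failed=False, transaction_in_progress=False):
    """Map a heal info result to a (state, message) pair.

    Hypothetical helper: entries marked 'Possibly undergoing heal' are
    counted like any other unsynced entry, so they yield WARNING and
    never CRITICAL.
    """
    if transaction_in_progress:
        return UNKNOWN, "heal info failed: another transaction is in progress"
    if command_failed:
        return WARNING, "heal info could not be executed"
    entries = count_unsynced(heal_info_output)
    if entries > 0:
        return WARNING, "%d unsynced entries found" % entries
    return OK, "no unsynced entries found"


# Made-up sample output for a replica 2 volume while a file is being healed.
SAMPLE = """Brick server1:/bricks/b1
/bigfile - Possibly undergoing heal
Number of entries: 1

Brick server2:/bricks/b1
/bigfile - Possibly undergoing heal
Number of entries: 1
"""

state, message = heal_state(SAMPLE)
```

With the sample above, the result is WARNING, matching the behaviour requested in this bug. A trend-based Critical state is deliberately left out, since the plugin only sees a single snapshot of the entry count.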
Tested and verified this on the build glusterfs 3.7.9-4, with nagios-server-addons 0.2.5-1 and gluster-nagios-addons 0.2.7-1.

Had a replica 2 and a replica 3 volume, killed a brick using 'kill 15' and created large file(s) from an nfs/fuse mount. Verified that the 'volume heal info' service goes to 'warning', saying 'unsynced entries found'. The cli command 'gluster volume heal <volname> info' lists the number of files that are out of sync.

Started the volume using the force option, thereby restarting the brick process, in turn triggering self heal to heal the file(s) in the brick that had just come up. The nagios web UI continues to show the service 'volume heal info' as 'warning', as opposed to the 'critical' that used to be shown before. When the healing completes, the service transitions to green.

Moving this BZ to verified in 3.1.3. Detailed logs are attached.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1242 |