| Summary: | glusterd restart is starting the offline shd daemon on another node in the cluster | |||
|---|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama> | |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> | |
| Status: | CLOSED ERRATA | QA Contact: | Vinayak Papnoi <vpapnoi> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.2 | CC: | asrivast, bsrirama, nchilaka, rhinduja, rhs-bugs, sasundar, storage-qa-internal, vbellur | |
| Target Milestone: | --- | |||
| Target Release: | RHGS 3.3.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.8.4-19 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1383893 (view as bug list) | Environment: | ||
| Last Closed: | 2017-09-21 04:28:23 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | 1383893, 1417042 | |||
| Bug Blocks: | 1417147 | |||
Description Byreddy 2016-10-05 06:54:08 UTC
RCA: This is not a regression; the behaviour has been present since server-side quorum was introduced. Unlike brick processes, daemon services are (re)started irrespective of the quorum state. In this particular case, when the glusterd instance on N1 was brought down and the shd service on N2 was explicitly killed, restarting the glusterd service on N1 caused N2 to receive a friend update request, which calls glusterd_restart_bricks() and eventually ends up spawning the shd daemon. If the same reproducer is applied to one of the brick processes, the brick does not come up, because for bricks the logic is to start the brick processes only if quorum is regained, and to skip them otherwise. To fix this behaviour, the other daemons should follow the same logic as the bricks.

Upstream mainline patch http://review.gluster.org/15626 posted for review.

Byreddy - I'd like to see if there are any other implications of the changes done in http://review.gluster.org/15626 through the upstream review process. IMHO, given it is neither a regression nor a severe bug, this bug can be fixed post 3.2.0 too. Please let us know your thoughts here.

(In reply to Atin Mukherjee from comment #4)
> Byreddy - I'd like to see if there are any other implications of the changes
> done in http://review.gluster.org/15626 through the upstream review process.
> IMHO, given it is neither a regression nor a severe bug, this bug can be
> fixed post 3.2.0 too. Please let us know your thoughts here.

I am OK with taking this in or out of 3.2.0; there is no functionality loss without this fix.

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/101298/

Build - 3.8.4-26

Followed the steps to reproduce provided in the description: killed shd on one node and restarted glusterd on another node. This did not cause the offline shd daemon to start on the node where it had been killed. Hence, moving to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
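The RCA above comes down to a quorum gate that bricks honoured on a friend update but daemon services such as shd did not. Below is a minimal, self-contained C sketch of that gating; every name, the quorum formula, and the control flow are illustrative assumptions, not the actual glusterd implementation (the real change is in http://review.gluster.org/15626).

```c
/*
 * Illustrative sketch only -- not glusterd source. It models the decision
 * described in the RCA: before the fix, daemon services (shd, nfs, quotad,
 * ...) were (re)started on a friend update regardless of server-side quorum,
 * while bricks were started only if quorum was regained. The fix applies the
 * brick rule to the daemons as well. All identifiers here are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int active_peers; /* peers currently connected, including this node */
    int total_peers;  /* peers in the trusted storage pool              */
} cluster_state_t;

/* Assumed server-side quorum rule: a strict majority of peers must be up. */
static bool server_quorum_met(const cluster_state_t *cs)
{
    return cs->active_peers * 2 > cs->total_peers;
}

/* Models the friend-update handling path (glusterd_restart_bricks() and
 * friends in the real code). `fixed` toggles pre-fix vs post-fix behaviour. */
static void handle_friend_update(const cluster_state_t *cs, bool fixed)
{
    if (server_quorum_met(cs)) {
        printf("quorum met: start bricks and daemon services (shd, ...)\n");
        return;
    }

    /* Quorum is not met. */
    printf("quorum lost: skip starting bricks\n");
    if (fixed)
        printf("quorum lost: also skip (re)starting shd and other daemons\n");
    else
        printf("pre-fix behaviour: shd is spawned anyway (this bug)\n");
}

int main(void)
{
    cluster_state_t quorum_lost = { .active_peers = 1, .total_peers = 3 };

    /* Pre-fix behaviour reported in this bug: shd spawned despite no quorum. */
    handle_friend_update(&quorum_lost, false);

    /* Post-fix behaviour: daemons are gated the same way as bricks. */
    handle_friend_update(&quorum_lost, true);
    return 0;
}
```

Running the sketch with quorum lost prints the two outcomes side by side: the pre-fix path spawns shd unconditionally, while the post-fix path skips both bricks and daemons, which is the behaviour the QA verification on 3.8.4-26 confirms.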