Bug 1278332
Summary: | nfs-ganesha server does not enter grace period during failover/failback
---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage
Component: | gluster-nfs
Version: | rhgs-3.1
Hardware: | All
OS: | All
Status: | CLOSED ERRATA
Severity: | medium
Priority: | unspecified
Reporter: | Soumya Koduri <skoduri>
Assignee: | Kaleb KEITHLEY <kkeithle>
QA Contact: | Shashank Raj <sraj>
CC: | akhakhar, annair, asrivast, dblack, jbuchta, jthottan, kkeithle, mzywusko, ndevos, nlevinki, rabhat, rcyriac, rhinduja, rhs-bugs, rnalakka, sankarshan, sashinde, skoduri, storage-qa-internal
Keywords: | ZStream
Flags: | sankarshan: needinfo+
Target Milestone: | ---
Target Release: | RHGS 3.1.3
Fixed In Version: | glusterfs-3.7.9-1
Doc Type: | Bug Fix
Clones: | 1290865 (view as bug list)
Last Closed: | 2016-06-23 04:56:24 UTC
Type: | Bug
Bug Depends On: | 1290865, 1317424
Bug Blocks: | 1299184

Doc Text:

NFS-Ganesha servers were not always able to fail over gracefully when a node was shut down. NFS clients connected to such a node could not recover their state after the shutdown, because that state had not been gracefully handed off to another node, and the client mount point would hang. The failover process has been updated so that NFS-Ganesha servers observe a grace period in which to hand off state before shutdown, allowing clients to continue accessing data and to reclaim any lost state.
Description

Soumya Koduri 2015-11-05 09:57:56 UTC
The fix is posted upstream for review - http://review.gluster.org/13275

Correcting a backwards dependency chain.

(In reply to Soumya Koduri from comment #3)
> The fix is posted upstream for review -
> http://review.gluster.org/13275

Sorry, I had given the wrong link. The fix provided upstream is http://review.gluster.org/12964

This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

FWIW, I have tested RHGS 3.1.2 with Kaleb's patches, and it corrects the VIP failover problem for me. In my VM lab, I can pause one of the nodes in a two-node ganesha-ha configuration, and the VIP quickly fails over to the other node. Prior to this patch, the VIP would not fail over; in fact, the VIP on the remaining 'up' node would quickly disappear, causing a complete failure of the HA system.

Verified this bug with the latest 3.1.3 build. The original issue, where nfs-ganesha did not enter the grace period during failover/failback, can no longer be reproduced. However, other grace-related bugs were observed during verification and can be tracked separately:

>>>> Bug 1329887 - Unexpected behavior observed when nfs-ganesha enters grace period. (https://bugzilla.redhat.com/show_bug.cgi?id=1329887)
Description: During failover/failback, nfs-ganesha enters the grace period for only 60 seconds, and I/O stops for roughly 70-75 seconds.

>>>> Bug 1330218 - Shutting down I/O serving node, takes 15-20 mins for IO to resume from failed over node. (https://bugzilla.redhat.com/show_bug.cgi?id=1330218)
Description: After shutting down the I/O-serving node, it takes 15-20 minutes for I/O to resume from the failed-over node.

Since the originally reported issue is no longer reproducible and appears to work correctly with the latest ganesha builds, I am marking this bug as Verified.

Providing PM approval for the accelerated fix.

Do we have a build with the required fixes (both)? Kindly post the brew link on the bug so that we can pick it up for verification.

We are seeing many regressions related to failover/failback with the 3.1.3 build, and there are a couple of open bugs for 3.1.3 as of now. To verify this bug for the hotfix we would first need to address those other existing/open bugs, which does not look like a good idea at this point. So, if everyone agrees, can we drop this bug (and its related patches) from the hotfix build and provide a new build that contains only the fixes for these two bugs:

https://review.gerrithub.io/#/c/263358/ (BZ#1306691 crash fix)
http://review.gluster.org/13459 (BZ#1301542)

doc_text looks good to me.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
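The verification steps discussed in the comments (pause a node, then confirm the surviving node enters the NFS grace period) can be sketched as a small shell check against the ganesha log. This is an illustrative sketch, not a script from this bug: the default log path and the "NFS Server Now IN GRACE" message text are assumptions about common NFS-Ganesha deployments, and the built-in sample log lines are fabricated for demonstration, not captured from this bug.

```shell
#!/bin/sh
# Sketch: after pausing one node in a ganesha-ha cluster, check whether
# the surviving node entered the NFS grace period by scanning its log.
# Pass the real log path as $1; with no argument, an illustrative sample
# log is generated so the script can be run standalone.
LOG="$1"
if [ -z "$LOG" ]; then
    LOG="$(mktemp)"
    # Fabricated sample entries mimicking typical NFS-Ganesha grace logging.
    cat > "$LOG" <<'EOF'
05/11/2015 10:02:11 : ganesha.nfsd-1234[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
05/11/2015 10:03:11 : ganesha.nfsd-1234[main] nfs_try_lift_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
EOF
fi

if grep -q 'Now IN GRACE' "$LOG"; then
    echo "grace period observed"
else
    echo "no grace period entry found"
fi
```

On a real cluster this would be paired with a check that the VIP actually moved (e.g. via the cluster status command of the HA stack in use); that step is omitted here because it requires a running pacemaker cluster.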