Bug 1478710
Summary: | when gluster pod is restarted, bricks from the restarted pod fails to connect to fuse, self-heal etc | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Mohit Agrawal <moagrawa> | |
Component: | glusterd | Assignee: | Mohit Agrawal <moagrawa> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | mainline | CC: | akhakhar, amukherj, annair, bugs, hchiramm, jarrpa, kramdoss, madam, mliyazud, mzywusko, pprakash, rcyriac, rhs-bugs, rkavunga, rreddy, rtalur, storage-qa-internal | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.13.0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | 1477024 | |||
: | 1479662 | Environment: | |
Last Closed: | 2017-12-08 17:37:20 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1477024 | |||
Bug Blocks: | 1479662 |
Description
Mohit Agrawal
2017-08-06 11:50:09 UTC
REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint error while brick mux is on) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint error while brick mux is on) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint error while brick mux is on) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint error while brick mux is on) posted (#4) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#5) for review on master by Atin Mukherjee (amukherj)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#6) for review on master by Atin Mukherjee (amukherj)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#7) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#8) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#9) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#10) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#11) for review on master by MOHIT AGRAWAL (moagrawa)

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#12) for review on master by MOHIT AGRAWAL (moagrawa)

COMMIT: https://review.gluster.org/17984 committed in master by Jeff Darcy (jeff.us)
------
commit c13d69babc228a2932994962d6ea8afe2cdd620a
Author: Mohit Agrawal <moagrawa>
Date:   Tue Aug 8 14:36:17 2017 +0530

    glusterd: Block brick attach request till the brick's ctx is set

    Problem: In multiplexing setup in a container environment we hit a race
    where before the first brick finishes its handshake with glusterd, the
    subsequent attach requests went through and they actually failed and
    glusterd has no mechanism to realize it. This resulted into all the such
    bricks not to be active resulting into clients not able to connect.

    Solution: Introduce a new flag port_registered in glusterd_brickinfo to
    make sure about pmap_signin finish before the subsequent attach bricks
    can be processed.

    Test: To reproduce the issue followed below steps
    1) Create 100 volumes on 3 nodes(1x3) in CNS environment
    2) Enable brick multiplexing
    3) Reboot one container
    4) Run below command
       for v in `gluster v list`
       do
         glfsheal $v | grep -i "transport"
       done
    After apply the patch command should not fail.

    Note: A big thanks to Atin for suggest the fix.

    BUG: 1478710
    Change-Id: I8e1bd6132122b3a5b0dd49606cea564122f2609b
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17984
    Reviewed-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/
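
For anyone re-running the reproduction check from step 4 of the commit message, the loop can be wrapped in a small function that counts affected volumes and returns a non-zero status if any are found. This is only a sketch, not part of the patch: the function name check_volumes and the LIST_CMD and HEAL_CMD override variables are assumptions added here so the loop can be exercised without a live cluster. The default commands are the ones quoted in the commit message.

```shell
#!/bin/sh
# Sketch of the step 4 check. check_volumes, LIST_CMD and HEAL_CMD are
# hypothetical names introduced for illustration; the defaults are the
# commands quoted in the commit message.

check_volumes() {
    failed=0
    # List the volumes, then run the heal check against each one.
    for v in $(${LIST_CMD:-gluster v list}); do
        # A "transport" line in the output is treated as a failed connection.
        if ${HEAL_CMD:-glfsheal} "$v" | grep -qi "transport"; then
            echo "FAIL: $v"
            failed=$((failed + 1))
        fi
    done
    echo "volumes with transport errors: $failed"
    # Succeed only when no volume reported a transport error.
    [ "$failed" -eq 0 ]
}
```

After sourcing the function, something like `check_volumes || echo "some bricks are not connected"` should give a one-line verdict once the restarted container is back up.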