Bug 1479662

Summary:	when gluster pod is restarted, bricks from the restarted pod fails to connect to fuse, self-heal etc
Product:	[Community] GlusterFS	Reporter:	Atin Mukherjee <amukherj>
Component:	glusterd	Assignee:	Atin Mukherjee <amukherj>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	high
Version:	3.12	CC:	akhakhar, amukherj, annair, bugs, hchiramm, jarrpa, kramdoss, madam, mliyazud, moagrawa, mzywusko, pasik, pprakash, rcyriac, rhs-bugs, rkavunga, rreddy, rtalur, storage-qa-internal
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-3.12.0	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1478710	Environment:
Last Closed:	2017-09-05 17:38:34 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1478710
Bug Blocks:

Description Atin Mukherjee 2017-08-09 05:26:04 UTC

+++ This bug was initially created as a clone of Bug #1478710 +++

+++ This bug was initially created as a clone of Bug #1477024 +++

+++ This bug was initially created as a clone of Bug #1477020 +++

Description of problem:
when one gluster pod is restarted on a CNS deployment with 3 gluster pods with around 100 volumes mounted to 100 app pods, brick from the restarted pod fails to connect to mount, self-healing daemons.

As a result, Any new write to the mount fails to get written on the new brick. 

This issue is seen on all the 100 volumes in the Trusted Storage Pool.

Following error messages are seen in the brick logs.

[2017-08-01 02:59:35.247187] E [server-helpers.c:388:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7effacfdb8c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x289cb) [0x7eff8dc659cb] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe064) [0x7eff8dc4b064] ) 0-server: invalid argument: client [Invalid argument]
[2017-08-01 02:59:35.247334] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2017-08-01 04:39:29.200776] E [server-helpers.c:388:server_alloc_frame] (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7effacfdb8c5] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x289cb) [0x7eff8dc659cb] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0xe064) [0x7eff8dc4b064] ) 0-server: invalid argument: client [Invalid argument]
[2017-08-01 04:39:29.200829] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

gluster vol status shows that all bricks are up.

gluster v status vol_fe3995a5e9b186486e7d01a326b296d4 
Status of volume: vol_fe3995a5e9b186486e7d01a326b296d4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.201:/var/lib/heketi/mounts/v
g_57416b0c6c42778c9fcc913f3e1aa6a0/brick_53
ae1a82ee4d7858018d1c53f3c61865/brick        49152     0          Y       810  
Brick 10.70.46.203:/var/lib/heketi/mounts/v
g_e57848f756a6fd3b559c7ab5d0f026ed/brick_49
9e2845415a6d0337871206664c55b3/brick        49152     0          Y       1017 
Brick 10.70.46.197:/var/lib/heketi/mounts/v
g_6fb7232af84e00b7c23ffdf9a825e355/brick_f7
7473532b0f3f483fbe7f5ac5c67811/brick        49152     0          Y       1041 
Self-heal Daemon on localhost               N/A       N/A        Y       819  
Self-heal Daemon on 10.70.46.203            N/A       N/A        Y       57006
Self-heal Daemon on 10.70.46.197            N/A       N/A        Y       57409
 
Task Status of Volume vol_fe3995a5e9b186486e7d01a326b296d4
------------------------------------------------------------------------------
There are no active volume tasks

In the above test, gluster pod running on node 10.70.46.201 was restarted.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-35.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. create a cns setup with 100 app pods consuming 100 pvc
2. restart one of the three gluster pod

Actual results:
brick process fails to connect to fuse mount or self-heal 

Expected results:
brick process should connect to fuse mount, self-heal should get triggered automatically

Additional info:
Logs shall be attached shortly

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-08-01 01:37:46 EDT ---

This bug is automatically being proposed for the current release of Container-Native Storage under active development, by setting the release flag 'cns‑3.6.0' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-08-01 01:42:02 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs‑3.3.0' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from krishnaram Karthick on 2017-08-01 01:50:44 EDT ---

setup is left in the same state for dev to have a look. Please ping me offline for setup details.

This bug could also be related to https://bugzilla.redhat.com/show_bug.cgi?id=1476828. I'd leave it to dev to confirm if the issues seen in both the cases are same.

Proposing this as a blocker as this is a common use case and gluster pod failure in cns should not cause all the volumes to get into this state.

--- Additional comment from Atin Mukherjee on 2017-08-01 03:58:02 EDT ---

Karthick - please do not attach this bug to the in-flight tracker. I know in CNS this is a bit different process. In RHGS, while we ack this bug, the tracker is attached.

--- Additional comment from Rejy M Cyriac on 2017-08-02 03:46:44 EDT ---

At the 'RHGS 3.3.0 - Release Blocker Bug Triage and Status Check' meeting on 02 August, it was decided to ACCEPT this BZ for fix at the RHGS 3.3.0 release

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-08-02 03:46:56 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'

--- Additional comment from Worker Ant on 2017-08-06 08:36:06 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint           error while brick mux is on) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-06 08:42:44 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint           error while brick mux is on) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-06 10:04:20 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint           error while brick mux is on) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-06 11:40:55 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Sometime on cns after pod is restarted client is getting Transport endpoint           error while brick mux is on) posted (#4) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-06 13:29:52 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-08-06 13:57:52 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#6) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-08-07 01:16:57 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#7) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-07 01:31:19 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#8) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-08 05:26:18 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#9) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-08 06:03:48 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#10) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-08 08:16:44 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#11) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-08 11:29:14 EDT ---

REVIEW: https://review.gluster.org/17984 (glusterd: Block brick attach request till the brick's ctx is set) posted (#12) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-08-08 18:28:34 EDT ---

COMMIT: https://review.gluster.org/17984 committed in master by Jeff Darcy (jeff.us) 
------
commit c13d69babc228a2932994962d6ea8afe2cdd620a
Author: Mohit Agrawal <moagrawa>
Date:   Tue Aug 8 14:36:17 2017 +0530

    glusterd: Block brick attach request till the brick's ctx is set
    
    Problem: In multiplexing setup in a container environment we hit a race
    where before the first brick finishes its handshake with glusterd, the
    subsequent attach requests went through and they actually failed and
    glusterd has no mechanism to realize it. This resulted into all the such
    bricks not to be active resulting into clients not able to connect.
    
    Solution: Introduce a new flag port_registered in glusterd_brickinfo
              to make sure about pmap_signin finish before the subsequent
              attach bricks can be processed.
    
    Test:     To reproduce the issue followed below steps
              1) Create 100 volumes on 3 nodes(1x3) in CNS environment
              2) Enable brick multiplexing
              3) Reboot one container
              4) Run below command
                 for v in ‛gluster v list‛
                 do
                   glfsheal $v | grep -i "transport"
                 done
              After apply the patch command should not fail.
    
    Note:   A big thanks to Atin for suggest the fix.
    
    BUG: 1478710
    Change-Id: I8e1bd6132122b3a5b0dd49606cea564122f2609b
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17984
    Reviewed-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>

Comment 1 Worker Ant 2017-08-09 05:26:52 UTC

REVIEW: https://review.gluster.org/18004 (glusterd: Block brick attach request till the brick's ctx is set) posted (#1) for review on release-3.12 by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2017-08-12 13:35:30 UTC

COMMIT: https://review.gluster.org/18004 committed in release-3.12 by Shyamsundar Ranganathan (srangana) 
------
commit d66af9ac76f84faa33ecb2eb390656f5637e6fee
Author: Mohit Agrawal <moagrawa>
Date:   Tue Aug 8 14:36:17 2017 +0530

    glusterd: Block brick attach request till the brick's ctx is set
    
    Problem: In multiplexing setup in a container environment we hit a race
    where before the first brick finishes its handshake with glusterd, the
    subsequent attach requests went through and they actually failed and
    glusterd has no mechanism to realize it. This resulted into all the such
    bricks not to be active resulting into clients not able to connect.
    
    Solution: Introduce a new flag port_registered in glusterd_brickinfo
              to make sure about pmap_signin finish before the subsequent
              attach bricks can be processed.
    
    Test:     To reproduce the issue followed below steps
              1) Create 100 volumes on 3 nodes(1x3) in CNS environment
              2) Enable brick multiplexing
              3) Reboot one container
              4) Run below command
                 for v in ‛gluster v list‛
                 do
                   glfsheal $v | grep -i "transport"
                 done
              After apply the patch command should not fail.
    
    Note:   A big thanks to Atin for suggest the fix.
    
    >Reviewed-on: https://review.gluster.org/17984
    >Reviewed-by: Atin Mukherjee <amukherj>
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Jeff Darcy <jeff.us>
    >(cherry picked from commit c13d69babc228a2932994962d6ea8afe2cdd620a)
    
    BUG: 1479662
    Change-Id: I8e1bd6132122b3a5b0dd49606cea564122f2609b
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/18004
    Tested-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 3 Shyamsundar 2017-09-05 17:38:34 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/