Bug 1442787

Summary:	Brick Multiplexing: During Remove brick when glusterd of a node is stopped, the brick process gets disconnected from glusterd purview and hence losing multiplexing feature
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Nag Pavan Chilakam <nchilaka>
Component:	glusterd	Assignee:	Samikshan Bairagya <sbairagy>
Status:	CLOSED ERRATA	QA Contact:	Nag Pavan Chilakam <nchilaka>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.3	CC:	amukherj, gyadav, moagrawa, nchilaka, rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Milestone:	---
Target Release:	RHGS 3.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	brick-multiplexing
Fixed In Version:	glusterfs-3.8.4-26	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1451559 (view as bug list)		Environment:
Last Closed:	2017-09-21 04:37:54 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1451559
Bug Blocks:	1417151

Description Nag Pavan Chilakam 2017-04-17 14:11:55 UTC

Description of problem:
======================
When a glusterd on a node goes down during a remove brick operation, the context of the brick is lost from glusterd even on restart of glusterd and the volume status shows as the brick process to be down.
However the brick is still up (which glusterd can't capture) as IOs were still progressing.
Now, if we create a new volume, then the volume takes a new brick process hence losing multiplexing feature

Version-Release number of selected component (if applicable):
========
3.8.4-22

How reproducible:
===================
2/2

Steps to Reproduce:
1.have a 6 node setup say n1..n6
2.created about 20 volume all 1x3 say v1..v20 and started the volumes...note that the bricks used were from n1..n3(say b1,b2,b3)
3.now added bricks from n4..n6(say b4,b5,b6) to make all vols 2x3
4. now triggered a rebalance -->successful(note that there were no IOs going on and nothing to rebalance actually)
5. now triggered remove-brick for all volumes in a loop for removing b1,b2,b3(ie from n1,n2,n3 resp)
6.while this was going on, I stopped glusterd of n3

following was the error message

[root@dhcp35-45 ~]# for i in $(gluster v list);do gluster v remove-brick $i dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick3/$i dhcp35-130.lab.eng.blr.redhat.com:/rhs/brick3/$i dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/$i start;done
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_41. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_42. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_43. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_44. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_45. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_46. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_47. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_48. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_49
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_51
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_52



7. Now, post that when we start the glusterd , the volume still shows as the b3 process as down

[root@dhcp35-45 ~]# gluster v status cross3_50
^[[AStatus of volume: cross3_50
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs
/brick3/cross3_50                           49152     0          Y       21951
Brick dhcp35-130.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          49152     0          Y       25284
Brick dhcp35-122.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          N/A       N/A        N       N/A  
Brick dhcp35-23.lab.eng.blr.redhat.com:/rhs
/brick3/cross3_50                           49152     0          Y       32125
Brick dhcp35-112.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          49152     0          Y       17448
Brick dhcp35-138.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          49152     0          Y       31773
Self-heal Daemon on localhost               N/A       N/A        Y       26567
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       2581 
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       20279
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       2215 
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       27797
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       29034
 
Task Status of Volume cross3_50
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 4736e5c1-1263-4a05-b1b8-354d3b84b966
Status               : completed           




8. However the brick process is still available on the node n3:
[root@dhcp35-122 glusterfs]# ps -ef|grep glusterfsd
root     25352     1  0 19:04 ?        00:00:06 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id cross3_41.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick3-cross3_41 -p /var/lib/glusterd/vols/cross3_41/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick3-cross3_41.pid -S /var/lib/glusterd/vols/cross3_41/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick3/cross3_41 -l /var/log/glusterfs/bricks/rhs-brick3-cross3_41.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49152 --xlator-option cross3_41-server.listen-port=49152


I created a directory and it was created successfully even on b3

Now when We create a new volume with n3 as part of it, a New brick prcoess is spawned hence we lose the brick multiplexing feature





[root@dhcp35-122 glusterfs]# ps -ef|grep glusterfsd
root     25352     1  0 19:04 ?        00:00:06 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id cross3_41.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick3-cross3_41 -p /var/lib/glusterd/vols/cross3_41/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick3-cross3_41.pid -S /var/lib/glusterd/vols/cross3_41/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick3/cross3_41 -l /var/log/glusterfs/bricks/rhs-brick3-cross3_41.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49152 --xlator-option cross3_41-server.listen-port=49152
root     28623     1  0 19:22 ?        00:00:00 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id test.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick2-test -p /var/lib/glusterd/vols/test/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick2-test.pid -S /var/lib/glusterd/vols/test/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick2/test -l /var/log/glusterfs/bricks/rhs-brick2-test.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49153 --xlator-option test-server.listen-port=49153
root     29206 19064  0 19:26 pts/0    00:00:00 grep --color=auto glusterfsd




[root@dhcp35-45 ~]# gluster v status test
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-122.lab.eng.blr.redhat.com:/rh
s/brick2/test                               49153     0          Y       28623
Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs
/brick2/test                                49152     0          Y       21951
Self-heal Daemon on localhost               N/A       N/A        Y       26567
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       2215 
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       29034
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       27797
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       2581 
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       20279
 
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-45 ~]# gluster v info test
 
Volume Name: test
Type: Replicate
Volume ID: 86a53f44-5a29-4778-90c6-b5d65693ba47
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick2/test
Brick2: dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick2/test
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

Comment 2 Nag Pavan Chilakam 2017-04-17 14:16:35 UTC

logs at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1442787/

refer 35.45 for command logs

Comment 3 Gaurav Yadav 2017-04-18 08:50:39 UTC

Not able to access the logs file, while accessing we are getting error 
"You don't have permission to access /sosreports/nchilaka/bug.1442787/log/glusterfs/glusterd.log on this server."

Please update the permission so that we can access the logs.

Comment 4 Atin Mukherjee 2017-04-21 06:38:01 UTC

refer https://bugzilla.redhat.com/show_bug.cgi?id=1443991#c6 for initial analysis.

Comment 5 Atin Mukherjee 2017-04-24 03:54:29 UTC

upstream patch : https://review.gluster.org/#/c/17101/

Comment 8 Atin Mukherjee 2017-05-09 06:35:35 UTC

Upstream patches : https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:

https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/

Comment 10 Nag Pavan Chilakam 2017-05-16 09:47:14 UTC

Still seeing the problem on 3.8.4-25

[root@dhcp35-45 ~]# for i in {1..10};do VOL=aus;echo $VOL;gluster  v remove-brick $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i start;done
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Commit failed on 10.70.35.130. Please check log file for details. ===============>it failed for volume "aus-5"
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick6/aus-6 is down
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick7/aus-7 is down
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick8/aus-8 is down
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
[root@dhcp35-45 ~]# gluster v status



created a new vol post glusterd start., it can be seen that a new pid is started for brick processs


[root@dhcp35-45 ~]# for i in 11;do VOL=aus;echo $VOL;gluster  v create $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i;doneaus
volume create: aus-11: success: please start the volume to access data
[root@dhcp35-45 ~]# gluster v start aus-11
gluster v statuvolume start: aus-11: success
[root@dhcp35-45 ~]# gluster v status aus-11
Status of volume: aus-11
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick11/aus-11       49152     0          Y       31813
Brick 10.70.35.130:/rhs/brick11/aus-11      49153     0          Y       5451 
Brick 10.70.35.122:/rhs/brick11/aus-11      49152     0          Y       3650 
Self-heal Daemon on localhost               N/A       N/A        Y       1494 
Self-heal Daemon on 10.70.35.23             N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.35.130            N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       4741 
 
Task Status of Volume aus-11
------------------------------------------------------------------------------
There are no active volume tasks



old vols:
Status of volume: aus-9
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick9/aus-9         49152     0          Y       31813
Brick 10.70.35.130:/rhs/brick9/aus-9        49152     0          Y       3907 
Brick 10.70.35.122:/rhs/brick9/aus-9        49152     0          Y       3650 
Self-heal Daemon on localhost               N/A       N/A        Y       382  
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       32740
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       4458 
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       5199 
 
Task Status of Volume aus-9
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-45 ~]#

Comment 12 Atin Mukherjee 2017-05-16 19:58:14 UTC

So our initial RCA was wrong as we were suspecting this to be a pidfile issue.

I've an easier reproducer:

1. Enable brick mux.
2. Create two volumes and start them
3. Restart GlusterD
4. Create 3rd volume and start it, here the brick for 3rd volume will pick up a new pid but following that any new bricks would continue to get attached to the brick process which was spawned post glusterd restart.

So in short, post restart of glusterd, the volume which gets created get new pid for its brick and then brick mux starts working onwards.


Here the issue is post restart the flag (brickinfo->started_here) with which we were finding out the compatible brick is lost and is never set to true when glusterd has to just connect to a brick in case if brick process is already running.

Comment 13 Atin Mukherjee 2017-05-19 03:58:26 UTC

Upstream patch : https://review.gluster.org/#/c/17307/

Comment 14 Atin Mukherjee 2017-05-22 07:36:45 UTC

downstream patch : downstream patch : https://code.engineering.redhat.com/gerrit/#/c/106803/

Comment 15 Nag Pavan Chilakam 2017-06-10 08:10:19 UTC

validation on 3.8.4-27

retired steps mentioned as when this bug was raised(in summary)
and also steps in comment#12

I don't see the problem anymore . hence moving to verified

Comment 17 errata-xmlrpc 2017-09-21 04:37:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774