Bug 1451559 - Brick Multiplexing: During Remove brick when glusterd of a node is stopped, the brick process gets disconnected from glusterd purview and hence losing multiplexing feature
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard: brick-multiplexing
Depends On:
Blocks: 1442787
 
Reported: 2017-05-17 04:32 UTC by Atin Mukherjee
Modified: 2018-08-29 03:18 UTC (History)
9 users (show)

Fixed In Version: glusterfs-4.1.3 (or higher)
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1442787
Environment:
Last Closed: 2018-08-29 03:18:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2017-05-17 04:32:55 UTC
+++ This bug was initially created as a clone of Bug #1442787 +++

Description of problem:
======================
When glusterd on a node goes down during a remove-brick operation, glusterd loses the context of that brick even after glusterd is restarted, and volume status reports the brick process as down.
However, the brick is actually still up (which glusterd cannot detect), since I/O continues to make progress.
Now, if we create a new volume, its brick gets a new brick process, so the multiplexing feature is lost.

Version-Release number of selected component (if applicable):
========
3.8.4-22

How reproducible:
===================
2/2

Steps to Reproduce:
1. Have a 6-node setup, say n1..n6.
2. Create about 20 volumes, all 1x3, say v1..v20, and start them. Note that the bricks used were from n1..n3 (say b1, b2, b3).
3. Add bricks from n4..n6 (say b4, b5, b6) to make all volumes 2x3.
4. Trigger a rebalance --> successful (note that there were no I/Os going on and nothing to actually rebalance).
5. Trigger remove-brick for all volumes in a loop to remove b1, b2, b3 (i.e. from n1, n2, n3 respectively).
6. While this was going on, I stopped glusterd on n3.

The following error messages were seen:

[root@dhcp35-45 ~]# for i in $(gluster v list);do gluster v remove-brick $i dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick3/$i dhcp35-130.lab.eng.blr.redhat.com:/rhs/brick3/$i dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/$i start;done
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_41. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_42. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_43. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_44. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_45. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_46. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_47. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_48. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_49
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_51
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_52



7. Now, when glusterd is started again, the volume status still shows the b3 brick process as down:

[root@dhcp35-45 ~]# gluster v status cross3_50
Status of volume: cross3_50
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs
/brick3/cross3_50                           49152     0          Y       21951
Brick dhcp35-130.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          49152     0          Y       25284
Brick dhcp35-122.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          N/A       N/A        N       N/A  
Brick dhcp35-23.lab.eng.blr.redhat.com:/rhs
/brick3/cross3_50                           49152     0          Y       32125
Brick dhcp35-112.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          49152     0          Y       17448
Brick dhcp35-138.lab.eng.blr.redhat.com:/rh
s/brick3/cross3_50                          49152     0          Y       31773
Self-heal Daemon on localhost               N/A       N/A        Y       26567
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       2581 
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       20279
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       2215 
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       27797
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       29034
 
Task Status of Volume cross3_50
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 4736e5c1-1263-4a05-b1b8-354d3b84b966
Status               : completed           




8. However, the brick process is still running on node n3:
[root@dhcp35-122 glusterfs]# ps -ef|grep glusterfsd
root     25352     1  0 19:04 ?        00:00:06 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id cross3_41.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick3-cross3_41 -p /var/lib/glusterd/vols/cross3_41/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick3-cross3_41.pid -S /var/lib/glusterd/vols/cross3_41/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick3/cross3_41 -l /var/log/glusterfs/bricks/rhs-brick3-cross3_41.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49152 --xlator-option cross3_41-server.listen-port=49152


I created a directory and it was created successfully even on b3.

Now, when we create a new volume with n3 as part of it, a new brick process is spawned, so we lose the brick multiplexing feature:





[root@dhcp35-122 glusterfs]# ps -ef|grep glusterfsd
root     25352     1  0 19:04 ?        00:00:06 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id cross3_41.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick3-cross3_41 -p /var/lib/glusterd/vols/cross3_41/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick3-cross3_41.pid -S /var/lib/glusterd/vols/cross3_41/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick3/cross3_41 -l /var/log/glusterfs/bricks/rhs-brick3-cross3_41.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49152 --xlator-option cross3_41-server.listen-port=49152
root     28623     1  0 19:22 ?        00:00:00 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id test.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick2-test -p /var/lib/glusterd/vols/test/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick2-test.pid -S /var/lib/glusterd/vols/test/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick2/test -l /var/log/glusterfs/bricks/rhs-brick2-test.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49153 --xlator-option test-server.listen-port=49153
root     29206 19064  0 19:26 pts/0    00:00:00 grep --color=auto glusterfsd
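The two glusterfsd PIDs in the process listing above are the symptom: with multiplexing working, all bricks on a node should live in a single glusterfsd process. A minimal sketch of that check, parsing a saved `gluster v status`-style snippet (the sample lines, node names, and PIDs below are hypothetical, not taken from this cluster):

```shell
#!/bin/sh
# Count distinct brick PIDs for one node from saved status output.
# With brick multiplexing working, this count should be 1 per node;
# a higher count means bricks stopped sharing a process (this bug).
status='Brick n3:/rhs/brick3/vol1       49152 0 Y 25352
Brick n3:/rhs/brick2/vol2       49153 0 Y 28623
Brick n4:/rhs/brick3/vol1       49152 0 Y 31813'

node=n3
# PIDs of all bricks hosted on $node (last field of each matching Brick line)
pids=$(printf '%s\n' "$status" | awk -v n="Brick $node:" '$0 ~ n {print $NF}' | sort -u)
count=$(printf '%s\n' "$pids" | wc -l | tr -d ' ')
echo "distinct brick PIDs on $node: $count"
```

On a live cluster the same idea applies to real `gluster v status` output; here node n3 shows two distinct PIDs, i.e. multiplexing was lost.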




[root@dhcp35-45 ~]# gluster v status test
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-122.lab.eng.blr.redhat.com:/rh
s/brick2/test                               49153     0          Y       28623
Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs
/brick2/test                                49152     0          Y       21951
Self-heal Daemon on localhost               N/A       N/A        Y       26567
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       2215 
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       29034
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       27797
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       2581 
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       20279
 
Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-45 ~]# gluster v info test
 
Volume Name: test
Type: Replicate
Volume ID: 86a53f44-5a29-4778-90c6-b5d65693ba47
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick2/test
Brick2: dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick2/test
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-04-17 10:12:01 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from nchilaka on 2017-04-17 10:16:35 EDT ---

logs at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1442787/

refer 35.45 for command logs

--- Additional comment from Gaurav Yadav on 2017-04-18 04:50:39 EDT ---

We are not able to access the log files; while accessing them we get the error
"You don't have permission to access /sosreports/nchilaka/bug.1442787/log/glusterfs/glusterd.log on this server."

Please update the permissions so that we can access the logs.

--- Additional comment from Atin Mukherjee on 2017-04-21 02:38:01 EDT ---

refer https://bugzilla.redhat.com/show_bug.cgi?id=1443991#c6 for initial analysis.

--- Additional comment from Atin Mukherjee on 2017-04-23 23:54:29 EDT ---

upstream patch : https://review.gluster.org/#/c/17101/

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-04-25 00:34:28 EDT ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs-3.3.0', the current release of Red Hat Gluster Storage 3 under active development, having been appropriately marked for the release, and having been provided ACK from Development and QE.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-04-25 07:26:47 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'

--- Additional comment from Atin Mukherjee on 2017-05-09 02:35:35 EDT ---

Upstream patches : https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:

https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/

--- Additional comment from errata-xmlrpc on 2017-05-10 10:01:47 EDT ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHEA-2017:27703-03
https://errata.devel.redhat.com/advisory/27703

--- Additional comment from nchilaka on 2017-05-16 05:47:14 EDT ---

Still seeing the problem on 3.8.4-25

[root@dhcp35-45 ~]# for i in {1..10};do VOL=aus;echo $VOL;gluster  v remove-brick $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i start;done
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Commit failed on 10.70.35.130. Please check log file for details. ===============>it failed for volume "aus-5"
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick6/aus-6 is down
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick7/aus-7 is down
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick8/aus-8 is down
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
[root@dhcp35-45 ~]# gluster v status



Created a new volume after glusterd started; it can be seen that a new PID is used for the brick process:


[root@dhcp35-45 ~]# for i in 11;do VOL=aus;echo $VOL;gluster  v create $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i;done
aus
volume create: aus-11: success: please start the volume to access data
[root@dhcp35-45 ~]# gluster v start aus-11
volume start: aus-11: success
[root@dhcp35-45 ~]# gluster v status aus-11
Status of volume: aus-11
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick11/aus-11       49152     0          Y       31813
Brick 10.70.35.130:/rhs/brick11/aus-11      49153     0          Y       5451 
Brick 10.70.35.122:/rhs/brick11/aus-11      49152     0          Y       3650 
Self-heal Daemon on localhost               N/A       N/A        Y       1494 
Self-heal Daemon on 10.70.35.23             N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.35.130            N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       4741 
 
Task Status of Volume aus-11
------------------------------------------------------------------------------
There are no active volume tasks



Old volumes:
Status of volume: aus-9
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick9/aus-9         49152     0          Y       31813
Brick 10.70.35.130:/rhs/brick9/aus-9        49152     0          Y       3907 
Brick 10.70.35.122:/rhs/brick9/aus-9        49152     0          Y       3650 
Self-heal Daemon on localhost               N/A       N/A        Y       382  
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       32740
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       4458 
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       5199 
 
Task Status of Volume aus-9
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-45 ~]#

--- Additional comment from Atin Mukherjee on 2017-05-16 13:30:38 EDT ---

You can mark this as FailedQA; the fix is not working.

--- Additional comment from Atin Mukherjee on 2017-05-16 15:58:14 EDT ---

So our initial RCA was wrong: we suspected this to be a pidfile issue.

I've an easier reproducer:

1. Enable brick mux.
2. Create two volumes and start them
3. Restart GlusterD
4. Create a 3rd volume and start it. Here the brick for the 3rd volume picks up a new PID, but from then on any new bricks continue to get attached to the brick process that was spawned after the glusterd restart.

So in short, after a glusterd restart, the first volume created gets a new PID for its brick, and only from that point onwards does brick multiplexing start working again.


The issue here is that after a restart, the flag (brickinfo->started_here) with which we were finding a compatible brick is lost: it is never set to true when glusterd only has to reconnect to a brick process that is already running.
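The effect of that lost in-memory flag can be sketched with a toy model (the record format, the `find_compatible` helper, and the flag handling below are illustrative assumptions, not glusterd's actual code):

```shell
#!/bin/sh
# Toy model of the compatible-brick search described above.
# Record format is brick:running:started_here (all fields illustrative).
bricks='b1:1:1
b2:1:1'

find_compatible() {
    # First brick that is running AND was started by this glusterd instance
    printf '%s\n' "$1" | awk -F: '$2==1 && $3==1 {print $1; exit}'
}

before=$(find_compatible "$bricks")
echo "before restart: ${before:-none}"

# Simulated glusterd restart: brick processes keep running, but the in-memory
# started_here flag resets to 0, so no brick is considered compatible any more
after_restart=$(printf '%s\n' "$bricks" | awk -F: -v OFS=: '{$3=0; print}')
after=$(find_compatible "$after_restart")
echo "after restart: ${after:-none}"
```

With the flag cleared, the search finds no compatible brick, which corresponds to glusterd spawning a fresh glusterfsd instead of attaching to the one it reconnected to.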

Comment 1 Atin Mukherjee 2017-08-09 07:43:54 UTC
https://review.gluster.org/#/c/17307/ fixes this issue.

