Description of problem:
======================
When glusterd on a node goes down during a remove-brick operation, the context of the brick is lost from glusterd even after a restart of glusterd, and the volume status shows the brick process as down. However, the brick is actually still up (which glusterd cannot detect), as I/Os were still progressing. Now, if we create a new volume, the volume gets a new brick process, hence losing the brick-multiplexing feature.

Version-Release number of selected component (if applicable):
========
3.8.4-22

How reproducible:
===================
2/2

Steps to Reproduce:
1. Have a 6-node setup, say n1..n6.
2. Created about 20 volumes, all 1x3, say v1..v20, and started the volumes. Note that the bricks used were from n1..n3 (say b1, b2, b3).
3. Now added bricks from n4..n6 (say b4, b5, b6) to make all volumes 2x3.
4. Now triggered a rebalance --> successful (note that there were no I/Os going on and nothing to rebalance actually).
5. Now triggered remove-brick for all volumes in a loop, removing b1, b2, b3 (i.e. from n1, n2, n3 respectively).
6. While this was going on, I stopped glusterd on n3. The following was the error message:

[root@dhcp35-45 ~]# for i in $(gluster v list);do gluster v remove-brick $i dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick3/$i dhcp35-130.lab.eng.blr.redhat.com:/rhs/brick3/$i dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/$i start;done
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_41. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_42. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_43. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_44. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_45. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_46. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_47. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: An earlier remove-brick task exists for volume cross3_48. Either commit it or stop it before starting a new task.
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_49
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_51
volume remove-brick start: failed: Staging failed on 10.70.35.122. Error: Found stopped brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_52
7.
Now, after we start glusterd again, the volume status still shows the b3 brick process as down:

[root@dhcp35-45 ~]# gluster v status cross3_50
Status of volume: cross3_50
Gluster process                                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50   49152     0          Y       21951
Brick dhcp35-130.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50  49152     0          Y       25284
Brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50  N/A       N/A        N       N/A
Brick dhcp35-23.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50   49152     0          Y       32125
Brick dhcp35-112.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50  49152     0          Y       17448
Brick dhcp35-138.lab.eng.blr.redhat.com:/rhs/brick3/cross3_50  49152     0          Y       31773
Self-heal Daemon on localhost                                  N/A       N/A        Y       26567
Self-heal Daemon on 10.70.35.23                                N/A       N/A        Y       2581
Self-heal Daemon on 10.70.35.112                               N/A       N/A        Y       20279
Self-heal Daemon on 10.70.35.138                               N/A       N/A        Y       2215
Self-heal Daemon on 10.70.35.130                               N/A       N/A        Y       27797
Self-heal Daemon on 10.70.35.122                               N/A       N/A        Y       29034

Task Status of Volume cross3_50
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 4736e5c1-1263-4a05-b1b8-354d3b84b966
Status               : completed

8. However, the brick process is still running on node n3:

[root@dhcp35-122 glusterfs]# ps -ef|grep glusterfsd
root 25352 1 0 19:04 ?
00:00:06 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id cross3_41.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick3-cross3_41 -p /var/lib/glusterd/vols/cross3_41/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick3-cross3_41.pid -S /var/lib/glusterd/vols/cross3_41/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick3/cross3_41 -l /var/log/glusterfs/bricks/rhs-brick3-cross3_41.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49152 --xlator-option cross3_41-server.listen-port=49152

I created a directory and it was created successfully even on b3.

Now, when we create a new volume with n3 as part of it, a new brick process is spawned, hence we lose the brick-multiplexing feature:

[root@dhcp35-122 glusterfs]# ps -ef|grep glusterfsd
root 25352 1 0 19:04 ? 00:00:06 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id cross3_41.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick3-cross3_41 -p /var/lib/glusterd/vols/cross3_41/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick3-cross3_41.pid -S /var/lib/glusterd/vols/cross3_41/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick3/cross3_41 -l /var/log/glusterfs/bricks/rhs-brick3-cross3_41.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49152 --xlator-option cross3_41-server.listen-port=49152
root 28623 1 0 19:22 ?
00:00:00 /usr/sbin/glusterfsd -s dhcp35-122.lab.eng.blr.redhat.com --volfile-id test.dhcp35-122.lab.eng.blr.redhat.com.rhs-brick2-test -p /var/lib/glusterd/vols/test/run/dhcp35-122.lab.eng.blr.redhat.com-rhs-brick2-test.pid -S /var/lib/glusterd/vols/test/run/daemon-dhcp35-122.lab.eng.blr.redhat.com.socket --brick-name /rhs/brick2/test -l /var/log/glusterfs/bricks/rhs-brick2-test.log --xlator-option *-posix.glusterd-uuid=b83b6b2a-ad07-4b76-bada-8ab0434169dd --brick-port 49153 --xlator-option test-server.listen-port=49153
root 29206 19064 0 19:26 pts/0 00:00:00 grep --color=auto glusterfsd

[root@dhcp35-45 ~]# gluster v status test
Status of volume: test
Gluster process                                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick2/test  49153     0          Y       28623
Brick dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick2/test   49152     0          Y       21951
Self-heal Daemon on localhost                             N/A       N/A        Y       26567
Self-heal Daemon on 10.70.35.138                          N/A       N/A        Y       2215
Self-heal Daemon on 10.70.35.122                          N/A       N/A        Y       29034
Self-heal Daemon on 10.70.35.130                          N/A       N/A        Y       27797
Self-heal Daemon on 10.70.35.23                           N/A       N/A        Y       2581
Self-heal Daemon on 10.70.35.112                          N/A       N/A        Y       20279

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-45 ~]# gluster v info test
Volume Name: test
Type: Replicate
Volume ID: 86a53f44-5a29-4778-90c6-b5d65693ba47
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dhcp35-122.lab.eng.blr.redhat.com:/rhs/brick2/test
Brick2: dhcp35-45.lab.eng.blr.redhat.com:/rhs/brick2/test
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
Logs at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1442787/; refer to 35.45 for the command logs.
Not able to access the log files; while accessing, we get the error "You don't have permission to access /sosreports/nchilaka/bug.1442787/log/glusterfs/glusterd.log on this server." Please update the permissions so that we can access the logs.
refer https://bugzilla.redhat.com/show_bug.cgi?id=1443991#c6 for initial analysis.
Upstream patch: https://review.gluster.org/#/c/17101/
Upstream patches: https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:
https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/
Still seeing the problem on 3.8.4-25:

[root@dhcp35-45 ~]# for i in {1..10};do VOL=aus;echo $VOL;gluster v remove-brick $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i start;done
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Commit failed on 10.70.35.130. Please check log file for details. ===============> it failed for volume "aus-5"
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick6/aus-6 is down
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick7/aus-7 is down
aus
volume remove-brick start: failed: Host node of the brick 10.70.35.130:/rhs/brick8/aus-8 is down
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
aus
volume remove-brick start: failed: Removing bricks from replicate configuration is not allowed without reducing replica count explicitly.
Created a new volume after starting glusterd; it can be seen that a new pid is started for the brick process:

[root@dhcp35-45 ~]# for i in 11;do VOL=aus;echo $VOL;gluster v create $VOL-$i rep 3 10.70.35.45:/rhs/brick$i/$VOL-$i 10.70.35.130:/rhs/brick$i/$VOL-$i 10.70.35.122:/rhs/brick$i/$VOL-$i;done
aus
volume create: aus-11: success: please start the volume to access data
[root@dhcp35-45 ~]# gluster v start aus-11
volume start: aus-11: success
[root@dhcp35-45 ~]# gluster v status aus-11
Status of volume: aus-11
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick11/aus-11   49152     0          Y       31813
Brick 10.70.35.130:/rhs/brick11/aus-11  49153     0          Y       5451
Brick 10.70.35.122:/rhs/brick11/aus-11  49152     0          Y       3650
Self-heal Daemon on localhost           N/A       N/A        Y       1494
Self-heal Daemon on 10.70.35.23         N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.130        N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.122        N/A       N/A        Y       4741

Task Status of Volume aus-11
------------------------------------------------------------------------------
There are no active volume tasks

Old volumes:

Status of volume: aus-9
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick9/aus-9     49152     0          Y       31813
Brick 10.70.35.130:/rhs/brick9/aus-9    49152     0          Y       3907
Brick 10.70.35.122:/rhs/brick9/aus-9    49152     0          Y       3650
Self-heal Daemon on localhost           N/A       N/A        Y       382
Self-heal Daemon on 10.70.35.23         N/A       N/A        Y       32740
Self-heal Daemon on 10.70.35.122        N/A       N/A        Y       4458
Self-heal Daemon on 10.70.35.130        N/A       N/A        Y       5199

Task Status of Volume aus-9
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-45 ~]#
So our initial RCA was wrong; we were suspecting this to be a pidfile issue. I have an easier reproducer:

1. Enable brick mux.
2. Create two volumes and start them.
3. Restart glusterd.
4. Create a 3rd volume and start it. Here the brick for the 3rd volume will pick up a new pid, but following that, any new bricks will continue to get attached to the brick process which was spawned post glusterd restart.

So in short, post restart of glusterd, the first volume that gets created gets a new pid for its brick, and only from then onwards does brick mux start working again. The issue is that post restart, the flag (brickinfo->started_here) with which we were finding the compatible brick is lost, and it is never set to true when glusterd only has to reconnect to a brick whose process is already running.
Upstream patch : https://review.gluster.org/#/c/17307/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/106803/
Validation on 3.8.4-27: retried the steps mentioned when this bug was raised (in the summary) and also the steps in comment#12. I don't see the problem anymore, hence moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774