+++ This bug was initially created as a clone of Bug #1704851 +++

Description of problem:
=======================
On upgrade to glusterfs master, the self-heal daemon (shd) fails to come up on a brick-mux setup.

Version-Release number of selected component (if applicable):
============================================================
master

How reproducible:
================
The issue is not consistent; it was seen in 3 out of 6 attempts.

Steps to Reproduce:
==================
1. Upgraded node from 4 to master
2. Started glusterd
3. shd fails to come up

Another way to reproduce
==========================
On a 3.5.0 setup with brick-mux enabled:
1. pkill glusterfsd
2. pkill glusterfs
3. systemctl stop glusterd
4. systemctl start glusterd

Actual results:
===============
The self-heal daemon does not come up.

Expected results:
================
The self-heal daemon should come up.

Additional info:
================
[root@dhcp43-102 ~]# gluster v status
Status of volume: disperse-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.44:/gluster/brick1/ec1       49152     0          Y       32720
Brick 10.70.42.80:/gluster/brick1/ec2       49152     0          Y       31885
Brick 10.70.43.116:/gluster/brick1/ec3      49152     0          Y       24287
Brick 10.70.43.211:/gluster/brick1/ec4      49152     0          Y       445
Brick 10.70.35.15:/gluster/brick1/ec5       49152     0          Y       4430
Brick 10.70.43.102:/gluster/brick1/ec6      49152     0          Y       1773
Brick 10.70.43.44:/gluster/brick1/ec7       49152     0          Y       32720
Brick 10.70.42.80:/gluster/brick1/ec8       49152     0          Y       31885
Brick 10.70.43.116:/gluster/brick1/ec9      49152     0          Y       24287
Brick 10.70.43.211:/gluster/brick1/ec10     49152     0          Y       445
Brick 10.70.35.15:/gluster/brick1/ec11      49152     0          Y       4430
Brick 10.70.43.102:/gluster/brick1/ec12     49152     0          Y       1773
Brick 10.70.43.44:/gluster/brick1/ec13      49152     0          Y       32720
Brick 10.70.42.80:/gluster/brick1/ec14      49152     0          Y       31885
Brick 10.70.43.116:/gluster/brick1/ec15     49152     0          Y       24287
Brick 10.70.43.211:/gluster/brick1/ec16     49152     0          Y       445
Brick 10.70.35.15:/gluster/brick1/ec17      49152     0          Y       4430
Brick 10.70.43.102:/gluster/brick1/ec18     49152     0          Y       1773
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on 10.70.42.80             N/A       N/A        Y       695
Self-heal Daemon on 10.70.43.211            N/A       N/A        Y       434
Self-heal Daemon on dhcp35-15.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       5441
Self-heal Daemon on 10.70.43.116            N/A       N/A        Y       1738
Self-heal Daemon on 10.70.43.44             N/A       N/A        Y       302

Task Status of Volume disperse-vol
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp43-102 ~]# rpm -qa|grep gluster
[root@dhcp43-102 ~]#
[root@dhcp43-102 ~]#
[root@dhcp43-102 ~]# gluster v info

Volume Name: disperse-vol
Type: Distributed-Disperse
Volume ID: 6d36d014-8c14-4866-9e39-4d8e42a8b657
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.70.43.44:/gluster/brick1/ec1
Brick2: 10.70.42.80:/gluster/brick1/ec2
Brick3: 10.70.43.116:/gluster/brick1/ec3
Brick4: 10.70.43.211:/gluster/brick1/ec4
Brick5: 10.70.35.15:/gluster/brick1/ec5
Brick6: 10.70.43.102:/gluster/brick1/ec6
Brick7: 10.70.43.44:/gluster/brick1/ec7
Brick8: 10.70.42.80:/gluster/brick1/ec8
Brick9: 10.70.43.116:/gluster/brick1/ec9
Brick10: 10.70.43.211:/gluster/brick1/ec10
Brick11: 10.70.35.15:/gluster/brick1/ec11
Brick12: 10.70.43.102:/gluster/brick1/ec12
Brick13: 10.70.43.44:/gluster/brick1/ec13
Brick14: 10.70.42.80:/gluster/brick1/ec14
Brick15: 10.70.43.116:/gluster/brick1/ec15
Brick16: 10.70.43.211:/gluster/brick1/ec16
Brick17: 10.70.35.15:/gluster/brick1/ec17
Brick18: 10.70.43.102:/gluster/brick1/ec18
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp43-102 ~]#

I shared the setup with Rafi. He looked into it and gave me a custom build to test, in which I am not seeing the issue.
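Since the failure is intermittent, it may help to script the second reproduction path and check the result mechanically. The following is only a minimal sketch: the function names are hypothetical, the settle time is an arbitrary assumption, and the check assumes the `gluster v status` row layout shown in this report (Online is the second-to-last column). It should only be run on a disposable test node.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of glusterfs.

# Reads `gluster v status` output on stdin. Exits 0 only if at least one
# "Self-heal Daemon on localhost" row exists and every such row shows Y
# in the Online column (second-to-last field in the layout above).
shd_online_on_localhost() {
    awk '/^Self-heal Daemon on localhost/ {
             found = 1
             if ($(NF-1) != "Y") bad = 1
         }
         END { exit !(found && !bad) }'
}

# Steps 1-4 from "Another way to reproduce". Destructive, so it is only
# defined here and never called automatically.
restart_gluster_stack() {
    pkill glusterfsd
    pkill glusterfs
    systemctl stop glusterd
    systemctl start glusterd
}

# Example use on a test node (uncomment to run):
#   restart_gluster_stack
#   sleep 10   # arbitrary settle time, an assumption
#   gluster v status | shd_online_on_localhost || echo "shd is down on localhost"
```

Repeating the commented-out sequence several times and counting the failures would give a rough reproduction rate to compare before and after a fix.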
REVIEW: https://review.gluster.org/22667 (shd/glusterd: Serialize shd manager to prevent race condition) posted (#2) for review on master by mohammed rafi kc
REVIEW: https://review.gluster.org/22667 (shd/glusterd: Serialize shd manager to prevent race condition) merged (#7) on master by Atin Mukherjee