Bug 1707081 - Self heal daemon not coming up after upgrade to glusterfs-6.0-2 (intermittently) on a brick mux setup
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact: Mohammed Rafi KC
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-06 18:25 UTC by Mohammed Rafi KC
Modified: 2019-06-13 08:27 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1704851
Environment:
Last Closed: 2019-05-10 14:20:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Links
System: Gluster.org Gerrit
ID: 22667
Private: 0
Priority: None
Status: Merged
Summary: shd/glusterd: Serialize shd manager to prevent race condition
Last Updated: 2019-05-10 14:20:02 UTC

Description Mohammed Rafi KC 2019-05-06 18:25:27 UTC
+++ This bug was initially created as a clone of Bug #1704851 +++

Description of problem:
=======================
On upgrade to glusterfs master, the self-heal daemon (shd) fails to come up on a brick-mux setup.

Version-Release number of selected component (if applicable):
============================================================
master


How reproducible:
================
The issue is intermittent; I saw it in 3 of 6 attempts.


Steps to Reproduce:
==================
1.Upgraded node from 4 to master
2.Started glusterd
3.shd fails to come up

Another way to reproduce
==========================
On a 3.5.0 setup with brick-mux enabled
1.pkill glusterfsd
2.pkill glusterfs
3.systemctl stop glusterd
4.systemctl start glusterd
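
Since the failure is intermittent, the restart sequence above can be looped to estimate how often shd stays down. The following is only a sketch under assumptions: the helper names (restart_gluster, local_shd_online, count_failures) and the 30-second settle time are hypothetical, and it assumes the local daemon is listed as "localhost" in the status output, as in the report below. It kills every Gluster process on the node, so it should be run as root on a test node only.

```shell
#!/bin/sh
# Hypothetical helpers for repeating the restart sequence above.
# WARNING: kills every Gluster process on this node; test nodes only.

restart_gluster() {
    pkill glusterfsd            # brick processes
    pkill glusterfs             # remaining glusterfs processes, including shd
    systemctl stop glusterd
    systemctl start glusterd
}

# Reads `gluster volume status` text on stdin.
# Succeeds only if the "Self-heal Daemon on localhost" row has Online = Y.
local_shd_online() {
    grep -Eq '^Self-heal Daemon on localhost[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+Y[[:space:]]'
}

# Usage: count_failures VOLNAME RUNS [WAIT_SECS]
# Repeats the restart RUNS times and counts the runs where shd stayed down.
count_failures() {
    vol=$1
    runs=$2
    wait_secs=${3:-30}          # assumed settle time; tune for the cluster
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        restart_gluster
        sleep "$wait_secs"
        gluster volume status "$vol" | local_shd_online || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures/$runs runs left shd down"
}
```

For example, `count_failures disperse-vol 6` would repeat the sequence six times against the volume from this report and print a summary line.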

Actual results:
===============
The self-heal daemon does not come up.


Expected results:
================
The self-heal daemon should come up.

Additional info:
================

[root@dhcp43-102 ~]# gluster v status
Status of volume: disperse-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.44:/gluster/brick1/ec1       49152     0          Y       32720
Brick 10.70.42.80:/gluster/brick1/ec2       49152     0          Y       31885
Brick 10.70.43.116:/gluster/brick1/ec3      49152     0          Y       24287
Brick 10.70.43.211:/gluster/brick1/ec4      49152     0          Y       445  
Brick 10.70.35.15:/gluster/brick1/ec5       49152     0          Y       4430 
Brick 10.70.43.102:/gluster/brick1/ec6      49152     0          Y       1773 
Brick 10.70.43.44:/gluster/brick1/ec7       49152     0          Y       32720
Brick 10.70.42.80:/gluster/brick1/ec8       49152     0          Y       31885
Brick 10.70.43.116:/gluster/brick1/ec9      49152     0          Y       24287
Brick 10.70.43.211:/gluster/brick1/ec10     49152     0          Y       445  
Brick 10.70.35.15:/gluster/brick1/ec11      49152     0          Y       4430 
Brick 10.70.43.102:/gluster/brick1/ec12     49152     0          Y       1773 
Brick 10.70.43.44:/gluster/brick1/ec13      49152     0          Y       32720
Brick 10.70.42.80:/gluster/brick1/ec14      49152     0          Y       31885
Brick 10.70.43.116:/gluster/brick1/ec15     49152     0          Y       24287
Brick 10.70.43.211:/gluster/brick1/ec16     49152     0          Y       445  
Brick 10.70.35.15:/gluster/brick1/ec17      49152     0          Y       4430 
Brick 10.70.43.102:/gluster/brick1/ec18     49152     0          Y       1773 
Self-heal Daemon on localhost               N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.42.80             N/A       N/A        Y       695  
Self-heal Daemon on 10.70.43.211            N/A       N/A        Y       434  
Self-heal Daemon on dhcp35-15.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       5441 
Self-heal Daemon on 10.70.43.116            N/A       N/A        Y       1738 
Self-heal Daemon on 10.70.43.44             N/A       N/A        Y       302  
 
Task Status of Volume disperse-vol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp43-102 ~]# rpm -qa|grep gluster
[root@dhcp43-102 ~]# 
[root@dhcp43-102 ~]# 
[root@dhcp43-102 ~]# gluster v info
 
Volume Name: disperse-vol
Type: Distributed-Disperse
Volume ID: 6d36d014-8c14-4866-9e39-4d8e42a8b657
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.70.43.44:/gluster/brick1/ec1
Brick2: 10.70.42.80:/gluster/brick1/ec2
Brick3: 10.70.43.116:/gluster/brick1/ec3
Brick4: 10.70.43.211:/gluster/brick1/ec4
Brick5: 10.70.35.15:/gluster/brick1/ec5
Brick6: 10.70.43.102:/gluster/brick1/ec6
Brick7: 10.70.43.44:/gluster/brick1/ec7
Brick8: 10.70.42.80:/gluster/brick1/ec8
Brick9: 10.70.43.116:/gluster/brick1/ec9
Brick10: 10.70.43.211:/gluster/brick1/ec10
Brick11: 10.70.35.15:/gluster/brick1/ec11
Brick12: 10.70.43.102:/gluster/brick1/ec12
Brick13: 10.70.43.44:/gluster/brick1/ec13
Brick14: 10.70.42.80:/gluster/brick1/ec14
Brick15: 10.70.43.116:/gluster/brick1/ec15
Brick16: 10.70.43.211:/gluster/brick1/ec16
Brick17: 10.70.35.15:/gluster/brick1/ec17
Brick18: 10.70.43.102:/gluster/brick1/ec18
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp43-102 ~]# 
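
To spot the failing node without reading the whole table, the status output above can be filtered for Self-heal Daemon rows whose Online column is not "Y". This is a minimal sketch: offline_shd is a hypothetical helper name, not part of the gluster CLI, and it assumes the column layout shown above, including the way the CLI wraps a long hostname onto a second line.

```shell
#!/bin/sh
# offline_shd: hypothetical helper, not part of the gluster CLI.
# Reads `gluster v status` text on stdin and prints the host of every
# Self-heal Daemon row whose Online column is not "Y".
offline_shd() {
    awk '
        # A long hostname is wrapped onto the next line by the CLI;
        # glue the saved first half onto the continuation line.
        pending != "" {
            $0 = pending $0
            pending = ""
        }
        /^Self-heal Daemon on / {
            # A complete row has 8 fields:
            # Self-heal Daemon on HOST TCP RDMA ONLINE PID
            if (NF < 8) { pending = $0; next }
            if ($(NF - 1) != "Y") print $4
        }
    '
}
```

Piping the output above through it (`gluster v status disperse-vol | offline_shd`) would print `localhost`, the only daemon reported offline.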


I shared the setup with Rafi.
He looked into it and gave me a custom build to test; I am not seeing the issue with that build.

Comment 1 Worker Ant 2019-05-06 18:29:03 UTC
REVIEW: https://review.gluster.org/22667 (shd/glusterd: Serialize shd manager to prevent race condition) posted (#2) for review on master by Mohammed Rafi KC

Comment 2 Worker Ant 2019-05-10 14:20:03 UTC
REVIEW: https://review.gluster.org/22667 (shd/glusterd: Serialize shd manager to prevent race condition) merged (#7) on master by Atin Mukherjee

