Description of problem:
glustershd status is not correctly shown in the output of "gluster v status <volname>".

Version-Release number of selected component (if applicable): 3.12.3

How reproducible:
Isolate the sn-0 node by dropping all packets coming in/out to/from the other sn nodes for a while, then restore the network.

Steps to Reproduce:
1. Isolate sn-0.
2. Wait 10 seconds.
3. Restore the network.
4. Execute "gluster v status <volname>".

Actual results:

Status of volume: export
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/export/brick   49154     0          Y       15425
Brick sn-1.local:/mnt/bricks/export/brick   49154     0          Y       3218
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume export
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: log
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/log/brick      49155     0          Y       4067
Brick sn-1.local:/mnt/bricks/log/brick      49155     0          Y       3509
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume log
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: mstate
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/mstate/brick   49153     0          Y       3500
Brick sn-1.local:/mnt/bricks/mstate/brick   49153     0          Y       2970
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume mstate
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: services
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/services/brick 49156     0          Y       15442
Brick sn-1.local:/mnt/bricks/services/brick 49152     0          Y       2618
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume services

[root@sn-2:/root]
# ps -ef | grep glustershd
root     11142     1  0 14:30 ?        00:00:00 /usr/sbin/glusterfs -s sn-2.local --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/31d6e90b5e65aededb7ada7278c7181a.socket --xlator-option *replicate*.node-uuid=7321b551-5b98-4583-bc0b-887ebae4ba2a
root     21017 16286  0 15:25 pts/2    00:00:00 grep --color=auto glustershd
[root@sn-2:/root]

Expected results:
"gluster v status" should show the glustershd status correctly: glustershd is running on localhost (sn-2), as the ps output shows, yet it is reported as offline (Online = N).

Additional info:
Sanju - Can you please backport https://review.gluster.org/20131 to the release-3.12 branch?
upstream patch: https://review.gluster.org/#/c/20429/
COMMIT: https://review.gluster.org/20429 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message-

glusterd: gluster v status is showing wrong status for glustershd

When we restart the bricks, connect and disconnect events happen for glustershd. glusterd uses two threads to handle disconnect and connect events from glustershd. When we restart the bricks we get both a disconnect and a connect event, so both threads compete for the big lock. We want the disconnect event to finish before the connect event. But if the connect thread gets the big lock first, it sets svc->online to true, and then the disconnect thread sets svc->online to false. So glustershd ends up marked as disconnected from glusterd and the wrong status is shown.

After killing shd, glusterd sleeps for 1 second. To avoid the problem, if glusterd releases the lock before the sleep and re-acquires it after the sleep, the disconnect thread gets a chance to handle glusterd_svc_common_rpc_notify before the other thread completes the connect event.

>Change-Id: Ie82e823fdfc936feb7c0ae10599297b050ee9986
>Signed-off-by: Sanju Rakonde <srakonde>

Change-Id: Ie82e823fdfc936feb7c0ae10599297b050ee9986
fixes: bz#1582443
Signed-off-by: Sanju Rakonde <srakonde>
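For illustration, below is a minimal, self-contained pthread sketch of the unlock/sleep/lock pattern the commit message describes. It is a stand-in written under assumptions, not the actual glusterd source: the names big_lock, svc_online, disconnect_notify and connect_notify are hypothetical, and the connect notification is artificially delayed to model glustershd coming back after the restart.

/*
 * Minimal sketch (not glusterd source) of the locking pattern described
 * in the commit message above. A single "big lock" serializes
 * notification handling. The stop path kills shd and then waits 1 second;
 * dropping the lock around that wait gives the pending disconnect
 * notification a chance to be handled before the stop path continues, so
 * only a later (re)connect marks the service online again.
 * All names are hypothetical. Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool svc_online = false;

/* Stand-in for the DISCONNECT case of the rpc notify handler. */
static void *disconnect_notify(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    svc_online = false;
    printf("disconnect handled, svc_online=%d\n", svc_online);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Stand-in for the CONNECT notification after shd has been restarted;
 * delayed here to model the daemon reconnecting later. */
static void *connect_notify(void *arg)
{
    (void)arg;
    sleep(2);
    pthread_mutex_lock(&big_lock);
    svc_online = true;
    printf("connect handled, svc_online=%d\n", svc_online);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

int main(void)
{
    pthread_t d, c;

    pthread_mutex_lock(&big_lock);   /* the stop path runs under the big lock */

    /* ... shd is killed here; notifications start arriving ... */
    pthread_create(&d, NULL, disconnect_notify, NULL);
    pthread_create(&c, NULL, connect_notify, NULL);

    /*
     * The pattern from the commit message: drop the lock for the 1-second
     * wait so the pending disconnect can be processed, then take the lock
     * back before the stop path continues.
     */
    pthread_mutex_unlock(&big_lock);
    sleep(1);
    pthread_mutex_lock(&big_lock);

    printf("after wait, svc_online=%d\n", svc_online);
    pthread_mutex_unlock(&big_lock);

    pthread_join(d, NULL);
    pthread_join(c, NULL);
    printf("final svc_online=%d\n", svc_online);
    return 0;
}

This only models the window: in the real bug both notifications race for the lock, and the point of the patch is that the disconnect gets processed inside the 1-second window instead of after the connect.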
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.12, please open a new bug report.

glusterfs-3.12.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-July/000105.html
[2] https://www.gluster.org/pipermail/gluster-users/