Description of problem:
glustershd status is not correctly shown in the output of "gluster v status <volname>".

Version-Release number of selected component (if applicable): 3.12.3

How reproducible:
Isolate the sn-0 node by dropping all packets coming in/out to/from the other sn nodes for a while, then restore the network.

Steps to Reproduce:
1. Isolate sn-0.
2. Wait 10 seconds.
3. Restore the network.
4. Execute "gluster v status <volname>".

Actual results:

Status of volume: export
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/export/brick   49154     0          Y       15425
Brick sn-1.local:/mnt/bricks/export/brick   49154     0          Y       3218
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume export
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: log
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/log/brick      49155     0          Y       4067
Brick sn-1.local:/mnt/bricks/log/brick      49155     0          Y       3509
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume log
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: mstate
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/mstate/brick   49153     0          Y       3500
Brick sn-1.local:/mnt/bricks/mstate/brick   49153     0          Y       2970
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume mstate
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: services
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sn-0.local:/mnt/bricks/services/brick 49156     0          Y       15442
Brick sn-1.local:/mnt/bricks/services/brick 49152     0          Y       2618
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on sn-0.local              N/A       N/A        Y       15568
Self-heal Daemon on sn-1.local              N/A       N/A        Y       13719

Task Status of Volume services

[root@sn-2:/root]
# ps -ef | grep glustershd
root     11142     1  0 14:30 ?        00:00:00 /usr/sbin/glusterfs -s sn-2.local --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/31d6e90b5e65aededb7ada7278c7181a.socket --xlator-option *replicate*.node-uuid=7321b551-5b98-4583-bc0b-887ebae4ba2a
root     21017 16286  0 15:25 pts/2    00:00:00 grep --color=auto glustershd
[root@sn-2:/root]

Expected results:
"gluster v status" should show the glustershd status correctly: glustershd is running on localhost (sn-2), as the ps output shows, yet it is reported as offline (Online = N).

Additional info:
Sanju - Can you please backport https://review.gluster.org/20131 to the release-3.12 branch?
upstream patch: https://review.gluster.org/#/c/20429/
COMMIT: https://review.gluster.org/20429 committed in release-3.12 by "jiffin tony Thottan" <jthottan> with a commit message-

glusterd: gluster v status is showing wrong status for glustershd

When we restart the bricks, connect and disconnect events happen for glustershd. glusterd uses two threads to handle disconnect and connect events from glustershd. When we restart the bricks we get both a disconnect and a connect event, so both threads compete for the big lock. We want the disconnect event to finish before the connect event. But if the connect thread gets the big lock first, it sets svc->online to true, and then the disconnect thread sets svc->online to false. So glustershd ends up marked as disconnected from glusterd and the wrong status is shown.

After killing shd, glusterd sleeps for 1 second. To avoid the problem, if glusterd releases the lock before the sleep and re-acquires it after the sleep, the disconnect thread gets a chance to handle glusterd_svc_common_rpc_notify before the other thread completes the connect event.

>Change-Id: Ie82e823fdfc936feb7c0ae10599297b050ee9986
>Signed-off-by: Sanju Rakonde <srakonde>

Change-Id: Ie82e823fdfc936feb7c0ae10599297b050ee9986
fixes: bz#1582443
Signed-off-by: Sanju Rakonde <srakonde>
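For illustration, below is a minimal, self-contained pthread sketch of the unlock/sleep/lock pattern the commit message describes. It is a stand-in written under assumptions, not the actual glusterd source: the names big_lock, svc_online, disconnect_notify and connect_notify are hypothetical, and the connect notification is artificially delayed to model glustershd coming back after the restart.

/*
 * Minimal sketch (not glusterd source) of the locking pattern described
 * in the commit message above. A single "big lock" serializes
 * notification handling. The stop path kills shd and then waits 1 second;
 * dropping the lock around that wait gives the pending disconnect
 * notification a chance to be handled before the stop path continues, so
 * only a later (re)connect marks the service online again.
 * All names are hypothetical. Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool svc_online = false;

/* Stand-in for the DISCONNECT case of the rpc notify handler. */
static void *disconnect_notify(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    svc_online = false;
    printf("disconnect handled, svc_online=%d\n", svc_online);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Stand-in for the CONNECT notification after shd has been restarted;
 * delayed here to model the daemon reconnecting later. */
static void *connect_notify(void *arg)
{
    (void)arg;
    sleep(2);
    pthread_mutex_lock(&big_lock);
    svc_online = true;
    printf("connect handled, svc_online=%d\n", svc_online);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

int main(void)
{
    pthread_t d, c;

    pthread_mutex_lock(&big_lock);   /* the stop path runs under the big lock */

    /* ... shd is killed here; notifications start arriving ... */
    pthread_create(&d, NULL, disconnect_notify, NULL);
    pthread_create(&c, NULL, connect_notify, NULL);

    /*
     * The pattern from the commit message: drop the lock for the 1-second
     * wait so the pending disconnect can be processed, then take the lock
     * back before the stop path continues.
     */
    pthread_mutex_unlock(&big_lock);
    sleep(1);
    pthread_mutex_lock(&big_lock);

    printf("after wait, svc_online=%d\n", svc_online);
    pthread_mutex_unlock(&big_lock);

    pthread_join(d, NULL);
    pthread_join(c, NULL);
    printf("final svc_online=%d\n", svc_online);
    return 0;
}

This only models the window: in the real bug both notifications race for the lock, and the point of the patch is that the disconnect gets processed inside the 1-second window instead of after the connect.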
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.12, please open a new bug report.

glusterfs-3.12.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-July/000105.html
[2] https://www.gluster.org/pipermail/gluster-users/