Bug 1659708 - Optimize by not stopping (restarting) the self-heal daemon (shd) when a volume is stopped, unless it is the last volume
Summary: Optimize by not stopping (restarting) the self-heal daemon (shd) when a volume is stopped, unless it is the last volume
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-15 14:51 UTC by Mohammed Rafi KC
Modified: 2019-07-15 17:22 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1471742
Environment:
Last Closed: 2019-04-11 03:48:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21868 0 None Abandoned shd/glusterd: Implement shd daemon as per volume basis 2019-04-10 07:16:34 UTC
Gluster.org Gerrit 21907 0 None Abandoned glusterd/shd: implement mux logic for shd 2019-04-10 07:16:11 UTC
Gluster.org Gerrit 21960 0 None Abandoned mgmt/shd: Add support for multiplexed volfile management 2019-04-03 04:29:15 UTC
Gluster.org Gerrit 21989 0 None Abandoned afr/shd: Cleanup self heal daemon resources during afr fini 2019-02-14 18:38:24 UTC
Gluster.org Gerrit 22072 0 None Open performance/readdir-ahead: Fix deadlock in readdir ahead. 2019-01-23 16:01:29 UTC
Gluster.org Gerrit 22075 0 None Merged mgmt/shd: Implement multiplexing in self heal daemon 2019-04-01 03:45:01 UTC
Gluster.org Gerrit 22087 0 None Open clnt/rpc: ref leak during disconnect. 2019-02-12 11:58:04 UTC
Gluster.org Gerrit 22150 0 None Abandoned afr/shd: Cleanup self heal daemon resources during afr fini 2019-02-04 11:58:39 UTC
Gluster.org Gerrit 22151 0 None Open afr/shd: Cleanup self heal daemon resources during afr fini 2019-02-12 15:10:22 UTC
Gluster.org Gerrit 22266 0 None Open rpc/transport: Missing a ref on dict while creating transport object 2019-03-20 13:25:26 UTC
Gluster.org Gerrit 22468 0 None Merged client/fini: return fini after rpc cleanup 2019-04-11 03:48:24 UTC

Description Mohammed Rafi KC 2018-12-15 14:51:49 UTC
+++ This bug was initially created as a clone of Bug #1471742 +++

Description of problem:
=======================
When there is more than one volume in a cluster, stopping one volume stops and restarts the self-heal daemons (shd) on all nodes.
This is completely unnecessary and a waste of resources.
shd is restarted immediately because other volumes are still present, which momentarily interrupts the heal of any file that was being healed, so that file takes a bit longer to heal.

Before stopping the heal daemon as part of a volume stop, we should check whether any other volumes are still associated with it, and avoid stopping and regenerating a new shd process when that is not required (see the sketch below).
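
A minimal, purely illustrative C-style sketch of that check, assuming hypothetical structure and helper names (volinfo_t, other_volumes_need_shd, and so on); this is not the actual glusterd code, just the shape of the proposed logic:

/* Illustrative sketch only -- not actual glusterd code.  volinfo_t and
 * other_volumes_need_shd() are hypothetical stand-ins.  The idea: when a
 * volume is stopped, stop shd only if no other started volume still needs
 * self-heal; otherwise just regenerate the shd volfile without a restart. */

#include <stdbool.h>

typedef struct volinfo {
    bool started;              /* volume is in the "Started" state      */
    bool needs_selfheal;       /* replicate/disperse volumes use shd    */
    struct volinfo *next;
} volinfo_t;

static bool
other_volumes_need_shd(volinfo_t *volumes, volinfo_t *stopping)
{
    for (volinfo_t *v = volumes; v; v = v->next) {
        if (v != stopping && v->started && v->needs_selfheal)
            return true;       /* another volume still needs shd */
    }
    return false;
}

/* On "volume stop":
 *   if other_volumes_need_shd(all_volumes, this_volume) -> keep shd running
 *       and only regenerate its volfile without this volume;
 *   else -> stop shd, since this was the last volume using it.
 */
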
Version-Release number of selected component (if applicable):


How reproducible:
========
always


Steps to Reproduce:
1. Create three volumes, all 1x3 (replica 3).
2. Note down the shd PIDs.
3. Stop v1.
4. Check the shd PIDs; it can be seen that they are restarted.
5. Stop v2; again the shd PIDs are restarted. (A command-line form of these steps is sketched below.)
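
A command-line form of the steps above, as a sketch that assumes hypothetical server names (server1-3) and brick paths (/bricks/...):

# Create and start three replica-3 volumes (hostnames and brick paths are examples).
gluster volume create v1 replica 3 server1:/bricks/v1 server2:/bricks/v1 server3:/bricks/v1
gluster volume create v2 replica 3 server1:/bricks/v2 server2:/bricks/v2 server3:/bricks/v2
gluster volume create v3 replica 3 server1:/bricks/v3 server2:/bricks/v3 server3:/bricks/v3
gluster volume start v1; gluster volume start v2; gluster volume start v3

# Note the self-heal daemon PID on each node.
pgrep -f glustershd

# Stop one volume (confirm with 'y' when prompted) and check again:
# the PID changes, i.e. shd was restarted.
gluster volume stop v1
pgrep -f glustershd

# Stop a second volume: shd is restarted once more.
gluster volume stop v2
pgrep -f glustershd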

Actual results:
===========
The shd processes are stopped and restarted on every volume stop, even though other volumes still need them.

Expected results:
Do not stop and restart shd on a volume stop unless it is the last volume using it.


--- Additional comment from Ravishankar N on 2017-07-18 09:37:10 UTC ---

This behaviour is by design. Not killing and restarting shd with the new graph (containing only the active volumes) would mean performing a graph switch in shd. Unlike other processes (say a FUSE mount or a brick process), which have an inode table maintained by the topmost xlator, in glustershd each afr xlator maintains its own inode table, so all of them would have to be migrated, which is not worth the effort.

Also, there is not much loss of state due to a restart, because shd only re-runs heals of files that were not completed earlier.
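
For context, a simplified and purely illustrative sketch of the layout being described; the structures below are hypothetical stand-ins, not the real xlator_t or inode_table_t definitions:

/* Illustrative only -- simplified stand-ins for GlusterFS internals. */

struct inode_table;                  /* opaque in this sketch */

struct xlator {
    const char *name;                /* e.g. "v1-replicate-0" (example name) */
    struct inode_table *itable;      /* non-NULL only if this xlator owns a table */
    struct xlator *next;             /* sibling xlators in the same graph */
};

/* FUSE mount / brick graph: a single inode table, owned by the topmost xlator.
 * glustershd graph:         one afr xlator per replicated volume, and each afr
 *                           owns its own inode table -- so swapping in a new
 *                           graph would mean migrating one table per volume. */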

--- Additional comment from nchilaka on 2017-07-18 10:08:20 UTC ---

I still see this as an issue. For example, take a case where a 100 GB file was 99 GB healed; after a restart we scan and heal it from the beginning.
If the fix is not worth the effort, then we need to move it to "won't fix" or "can't fix" instead of "not a bug".

--- Additional comment from Pranith Kumar K on 2017-07-18 10:32:14 UTC ---

(In reply to nchilaka from comment #3)
> I still see this as an issue. For example, take a case where a 100 GB file
> was 99 GB healed; after a restart we scan and heal it from the beginning.
> If the fix is not worth the effort, then we need to move it to "won't fix"
> or "can't fix" instead of "not a bug".

yeah I agree, let's move it to WONTFIX. It is a design tradeoff.

--- Additional comment from Ravishankar N on 2017-07-18 10:34:57 UTC ---

(In reply to nchilaka from comment #3)
> I still see this as an issue. For example, take a case where a 100 GB file
> was 99 GB healed; after a restart we scan and heal it from the beginning.

For the record, this is something which can be fixed by introducing granular data-self-heal.

--- Additional comment from Pranith Kumar K on 2017-07-18 10:59:49 UTC ---

(In reply to Ravishankar N from comment #5)
> (In reply to nchilaka from comment #3)
> > I still see this as an issue. For example, take a case where a 100 GB file
> > was 99 GB healed; after a restart we scan and heal it from the beginning.
> 
> For the record, this is something which can be fixed by introducing granular
> data-self-heal.

Or supporting general purpose sharding, which seems to be needed for DHT as well.

--- Additional comment from Ravishankar N on 2017-07-18 11:10:23 UTC ---

(In reply to Pranith Kumar K from comment #6)
> Or supporting general purpose sharding, which seems to be needed for DHT as
> well.

Yes, that would work too. I guess whether we should *not* be doing granular data self-heal just because we can piggyback on files being striped is a discussion we need to have upstream. I can imagine users who wouldn't want to stripe their files just because faster heal time is a by-product of the striping, if granular data-heal can guarantee faster heal times without it.

Comment 1 Worker Ant 2018-12-15 14:54:12 UTC
REVIEW: https://review.gluster.org/21868 (shd/glusterd: Implement shd daemon as per volume basis) posted (#4) for review on master by mohammed rafi  kc

Comment 2 Worker Ant 2018-12-20 14:30:00 UTC
REVIEW: https://review.gluster.org/21907 (glusterd/shd: implement mux logic for shd) posted (#1) for review on master by mohammed rafi  kc

Comment 3 Worker Ant 2019-01-03 21:44:11 UTC
REVIEW: https://review.gluster.org/21989 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 4 Worker Ant 2019-01-22 07:31:02 UTC
REVIEW: https://review.gluster.org/22072 (performance/readdir-ahead: Fix deadlock in readdir ahead.) posted (#3) for review on master by mohammed rafi  kc

Comment 5 Worker Ant 2019-01-22 09:34:54 UTC
REVIEW: https://review.gluster.org/22075 (mgmt/shd: Implement multiplexing in self heal daemon) posted (#1) for review on master by mohammed rafi  kc

Comment 6 Worker Ant 2019-01-23 16:01:31 UTC
REVIEW: https://review.gluster.org/22072 (performance/readdir-ahead: Fix deadlock in readdir ahead.) merged (#11) on master by Amar Tumballi

Comment 7 Worker Ant 2019-02-04 11:48:57 UTC
REVIEW: https://review.gluster.org/22150 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 8 Worker Ant 2019-02-04 12:07:03 UTC
REVIEW: https://review.gluster.org/22151 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 9 Worker Ant 2019-02-12 11:58:06 UTC
REVIEW: https://review.gluster.org/22087 (clnt/rpc: ref leak during disconnect.) merged (#17) on master by mohammed rafi  kc

Comment 10 Worker Ant 2019-02-12 15:10:24 UTC
REVIEW: https://review.gluster.org/22151 (afr/shd: Cleanup self heal daemon resources during afr fini) merged (#2) on master by Ravishankar N

Comment 11 Worker Ant 2019-02-26 12:40:16 UTC
REVIEW: https://review.gluster.org/22266 (rpc/transport: Mixing a ref on dictionary while assigning to transport object) posted (#1) for review on master by mohammed rafi  kc

Comment 12 Worker Ant 2019-03-20 13:25:27 UTC
REVIEW: https://review.gluster.org/22266 (rpc/transport: Missing a ref on dict while creating transport object) merged (#8) on master by Amar Tumballi

Comment 13 Shyamsundar 2019-03-25 16:32:41 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 14 Worker Ant 2019-04-01 03:45:02 UTC
REVIEW: https://review.gluster.org/22075 (mgmt/shd: Implement multiplexing in self heal daemon) merged (#25) on master by Amar Tumballi

Comment 15 Worker Ant 2019-04-01 09:29:55 UTC
REVIEW: https://review.gluster.org/22468 (client/fini: return fini after rpc cleanup) posted (#1) for review on master by mohammed rafi  kc

Comment 16 Worker Ant 2019-04-11 03:48:25 UTC
REVIEW: https://review.gluster.org/22468 (client/fini: return fini after rpc cleanup) merged (#6) on master by Raghavendra G

