Bug 1659708 - Optimize by not stopping (restarting) the self-heal daemon (shd) when a volume is stopped, unless it is the last volume
Summary: Optimize by not stopping (restarting) the self-heal daemon (shd) when a volume is stopped, unless it is the last volume
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-15 14:51 UTC by Mohammed Rafi KC
Modified: 2019-07-15 17:22 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1471742
Environment:
Last Closed: 2019-04-11 03:48:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21868 0 None Abandoned shd/glusterd: Implement shd daemon as per volume basis 2019-04-10 07:16:34 UTC
Gluster.org Gerrit 21907 0 None Abandoned glusterd/shd: implement mux logic for shd 2019-04-10 07:16:11 UTC
Gluster.org Gerrit 21960 0 None Abandoned mgmt/shd: Add support for multiplexed volfile management 2019-04-03 04:29:15 UTC
Gluster.org Gerrit 21989 0 None Abandoned afr/shd: Cleanup self heal daemon resources during afr fini 2019-02-14 18:38:24 UTC
Gluster.org Gerrit 22072 0 None Open performance/readdir-ahead: Fix deadlock in readdir ahead. 2019-01-23 16:01:29 UTC
Gluster.org Gerrit 22075 0 None Merged mgmt/shd: Implement multiplexing in self heal daemon 2019-04-01 03:45:01 UTC
Gluster.org Gerrit 22087 0 None Open clnt/rpc: ref leak during disconnect. 2019-02-12 11:58:04 UTC
Gluster.org Gerrit 22150 0 None Abandoned afr/shd: Cleanup self heal daemon resources during afr fini 2019-02-04 11:58:39 UTC
Gluster.org Gerrit 22151 0 None Open afr/shd: Cleanup self heal daemon resources during afr fini 2019-02-12 15:10:22 UTC
Gluster.org Gerrit 22266 0 None Open rpc/transport: Missing a ref on dict while creating transport object 2019-03-20 13:25:26 UTC
Gluster.org Gerrit 22468 0 None Merged client/fini: return fini after rpc cleanup 2019-04-11 03:48:24 UTC

Description Mohammed Rafi KC 2018-12-15 14:51:49 UTC
+++ This bug was initially created as a clone of Bug #1471742 +++

Description of problem:
=======================
When there is more than one volume in a cluster, stopping one volume stops and restarts the self-heal daemons (shd) on all nodes.
This is completely unnecessary and a waste of resources.
shd is restarted immediately because other volumes are still present, which momentarily interrupts the heal of any file that was being healed, so that file takes a bit longer to heal.

Before stopping the heal daemon as part of a volume stop, we should check whether any other volumes are still associated with it, and avoid stopping and regenerating a new shd process when that is not required (see the sketch below).
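
A minimal, purely illustrative C-style sketch of that check, assuming hypothetical structure and helper names (volinfo_t, other_volumes_need_shd, and so on); this is not the actual glusterd code, just the shape of the proposed logic:

/* Illustrative sketch only -- not actual glusterd code.  volinfo_t and
 * other_volumes_need_shd() are hypothetical stand-ins.  The idea: when a
 * volume is stopped, stop shd only if no other started volume still needs
 * self-heal; otherwise just regenerate the shd volfile without a restart. */

#include <stdbool.h>

typedef struct volinfo {
    bool started;              /* volume is in the "Started" state      */
    bool needs_selfheal;       /* replicate/disperse volumes use shd    */
    struct volinfo *next;
} volinfo_t;

static bool
other_volumes_need_shd(volinfo_t *volumes, volinfo_t *stopping)
{
    for (volinfo_t *v = volumes; v; v = v->next) {
        if (v != stopping && v->started && v->needs_selfheal)
            return true;       /* another volume still needs shd */
    }
    return false;
}

/* On "volume stop":
 *   if other_volumes_need_shd(all_volumes, this_volume) -> keep shd running
 *       and only regenerate its volfile without this volume;
 *   else -> stop shd, since this was the last volume using it.
 */
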
Version-Release number of selected component (if applicable):


How reproducible:
========
always


Steps to Reproduce:
1. Create three volumes, all 1x3 (replica 3).
2. Note down the shd PIDs.
3. Stop v1.
4. Check the shd PIDs; it can be seen that they are restarted.
5. Stop v2; again the shd PIDs are restarted. (A command-line form of these steps is sketched below.)
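
A command-line form of the steps above, as a sketch that assumes hypothetical server names (server1-3) and brick paths (/bricks/...):

# Create and start three replica-3 volumes (hostnames and brick paths are examples).
gluster volume create v1 replica 3 server1:/bricks/v1 server2:/bricks/v1 server3:/bricks/v1
gluster volume create v2 replica 3 server1:/bricks/v2 server2:/bricks/v2 server3:/bricks/v2
gluster volume create v3 replica 3 server1:/bricks/v3 server2:/bricks/v3 server3:/bricks/v3
gluster volume start v1; gluster volume start v2; gluster volume start v3

# Note the self-heal daemon PID on each node.
pgrep -f glustershd

# Stop one volume (confirm with 'y' when prompted) and check again:
# the PID changes, i.e. shd was restarted.
gluster volume stop v1
pgrep -f glustershd

# Stop a second volume: shd is restarted once more.
gluster volume stop v2
pgrep -f glustershd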

Actual results:
===========
The shd processes are stopped and restarted on every volume stop, even though other volumes still need them.

Expected results:
Do not stop and restart shd on a volume stop unless it is the last volume using it.


--- Additional comment from Ravishankar N on 2017-07-18 09:37:10 UTC ---

This behaviour is by design. Not killing and restarting shd with the new graph (containing only the active volumes) would mean performing a graph switch in shd. Unlike other processes (say a FUSE mount or a brick process), which have an inode table maintained by the topmost xlator, in glustershd each afr xlator maintains its own inode table, so all of them would have to be migrated, which is not worth the effort.

Also, there is not much loss of state due to a restart, because shd only re-runs heals of files that were not completed earlier.
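
For context, a simplified and purely illustrative sketch of the layout being described; the structures below are hypothetical stand-ins, not the real xlator_t or inode_table_t definitions:

/* Illustrative only -- simplified stand-ins for GlusterFS internals. */

struct inode_table;                  /* opaque in this sketch */

struct xlator {
    const char *name;                /* e.g. "v1-replicate-0" (example name) */
    struct inode_table *itable;      /* non-NULL only if this xlator owns a table */
    struct xlator *next;             /* sibling xlators in the same graph */
};

/* FUSE mount / brick graph: a single inode table, owned by the topmost xlator.
 * glustershd graph:         one afr xlator per replicated volume, and each afr
 *                           owns its own inode table -- so swapping in a new
 *                           graph would mean migrating one table per volume. */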

--- Additional comment from nchilaka on 2017-07-18 10:08:20 UTC ---

I still see this as an issue. For example, take a case where a 100 GB file was 99 GB healed; after a restart we scan and heal it from the beginning.
If the fix is not worth the effort, then we need to move it to "won't fix" or "can't fix" instead of "not a bug".

--- Additional comment from Pranith Kumar K on 2017-07-18 10:32:14 UTC ---

(In reply to nchilaka from comment #3)
> I still see this as an issue. For example, take a case where a 100 GB file
> was 99 GB healed; after a restart we scan and heal it from the beginning.
> If the fix is not worth the effort, then we need to move it to "won't fix"
> or "can't fix" instead of "not a bug".

yeah I agree, let's move it to WONTFIX. It is a design tradeoff.

--- Additional comment from Ravishankar N on 2017-07-18 10:34:57 UTC ---

(In reply to nchilaka from comment #3)
> I still see this as an issue. For example, take a case where a 100 GB file
> was 99 GB healed; after a restart we scan and heal it from the beginning.

For the record, this is something which can be fixed by introducing granular data-self-heal.

--- Additional comment from Pranith Kumar K on 2017-07-18 10:59:49 UTC ---

(In reply to Ravishankar N from comment #5)
> (In reply to nchilaka from comment #3)
> > I still see this as an issue. For example, take a case where a 100 GB file
> > was 99 GB healed; after a restart we scan and heal it from the beginning.
> 
> For the record, this is something which can be fixed by introducing granular
> data-self-heal.

Or supporting general purpose sharding, which seems to be needed for DHT as well.

--- Additional comment from Ravishankar N on 2017-07-18 11:10:23 UTC ---

(In reply to Pranith Kumar K from comment #6)
> Or supporting general purpose sharding, which seems to be needed for DHT as
> well.

Yes, that would work too. I guess whether we should *not* be doing granular data self-heal just because we can piggyback on files being striped is a discussion we need to have upstream. I can imagine users who wouldn't want to stripe their files just because faster heal time is a by-product of the striping, if granular data-heal can guarantee faster heal times without it.

Comment 1 Worker Ant 2018-12-15 14:54:12 UTC
REVIEW: https://review.gluster.org/21868 (shd/glusterd: Implement shd daemon as per volume basis) posted (#4) for review on master by mohammed rafi  kc

Comment 2 Worker Ant 2018-12-20 14:30:00 UTC
REVIEW: https://review.gluster.org/21907 (glusterd/shd: implement mux logic for shd) posted (#1) for review on master by mohammed rafi  kc

Comment 3 Worker Ant 2019-01-03 21:44:11 UTC
REVIEW: https://review.gluster.org/21989 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 4 Worker Ant 2019-01-22 07:31:02 UTC
REVIEW: https://review.gluster.org/22072 (performance/readdir-ahead: Fix deadlock in readdir ahead.) posted (#3) for review on master by mohammed rafi  kc

Comment 5 Worker Ant 2019-01-22 09:34:54 UTC
REVIEW: https://review.gluster.org/22075 (mgmt/shd: Implement multiplexing in self heal daemon) posted (#1) for review on master by mohammed rafi  kc

Comment 6 Worker Ant 2019-01-23 16:01:31 UTC
REVIEW: https://review.gluster.org/22072 (performance/readdir-ahead: Fix deadlock in readdir ahead.) merged (#11) on master by Amar Tumballi

Comment 7 Worker Ant 2019-02-04 11:48:57 UTC
REVIEW: https://review.gluster.org/22150 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 8 Worker Ant 2019-02-04 12:07:03 UTC
REVIEW: https://review.gluster.org/22151 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 9 Worker Ant 2019-02-12 11:58:06 UTC
REVIEW: https://review.gluster.org/22087 (clnt/rpc: ref leak during disconnect.) merged (#17) on master by mohammed rafi  kc

Comment 10 Worker Ant 2019-02-12 15:10:24 UTC
REVIEW: https://review.gluster.org/22151 (afr/shd: Cleanup self heal daemon resources during afr fini) merged (#2) on master by Ravishankar N

Comment 11 Worker Ant 2019-02-26 12:40:16 UTC
REVIEW: https://review.gluster.org/22266 (rpc/transport: Mixing a ref on dictionary while assigning to transport object) posted (#1) for review on master by mohammed rafi  kc

Comment 12 Worker Ant 2019-03-20 13:25:27 UTC
REVIEW: https://review.gluster.org/22266 (rpc/transport: Missing a ref on dict while creating transport object) merged (#8) on master by Amar Tumballi

Comment 13 Shyamsundar 2019-03-25 16:32:41 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 14 Worker Ant 2019-04-01 03:45:02 UTC
REVIEW: https://review.gluster.org/22075 (mgmt/shd: Implement multiplexing in self heal daemon) merged (#25) on master by Amar Tumballi

Comment 15 Worker Ant 2019-04-01 09:29:55 UTC
REVIEW: https://review.gluster.org/22468 (client/fini: return fini after rpc cleanup) posted (#1) for review on master by mohammed rafi  kc

Comment 16 Worker Ant 2019-04-11 03:48:25 UTC
REVIEW: https://review.gluster.org/22468 (client/fini: return fini after rpc cleanup) merged (#6) on master by Raghavendra G

