Bug 1659708

Summary: Optimize by not stopping (restarting) the self-heal daemon (shd) when a volume is stopped, unless it is the last volume
Product: [Community] GlusterFS
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Keywords: Reopened
Reporter: Mohammed Rafi KC <rkavunga>
Assignee: Mohammed Rafi KC <rkavunga>
CC: bugs, guillaume.pavese
Clone Of: 1471742
Last Closed: 2019-04-11 03:48:25 UTC

Description Mohammed Rafi KC 2018-12-15 14:51:49 UTC
+++ This bug was initially created as a clone of Bug #1471742 +++

Description of problem:
=======================
When there is more than one volume in a cluster, stopping a single volume also stops and restarts the self-heal daemons (shd) on all nodes.
This is completely unnecessary and a waste of resources.
shd is restarted immediately because other volumes are still available, which means we momentarily interrupt the heal of any file that was being healed, and that file takes a bit longer to complete.

Before stopping the heal daemon as part of a volume stop, we must check whether any other volumes are still served by the heal daemon, and avoid stopping and spawning a new shd process when it is not required.
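
A rough sketch of the intended decision, expressed here as shell pseudo-logic rather than the actual glusterd implementation (the real change lives in glusterd's C code, see the reviews below). It relies only on the standard 'gluster volume info' output, which prints one "Status: Started" line per started volume:

    # Hedged sketch: should glustershd stay up after this volume stop?
    # Count the volumes that are currently started; if the volume being
    # stopped is the only one, taking shd down along with it is fine.
    # A complete check would also verify that the remaining volumes are
    # of a type that actually needs shd (replicate/disperse).
    started=$(gluster volume info | grep -c '^Status: Started')
    if [ "$started" -gt 1 ]; then
        echo "other started volumes remain; leave glustershd running"
    else
        echo "last started volume; stopping glustershd is acceptable"
    fi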
Version-Release number of selected component (if applicable):


How reproducible:
========
always


Steps to Reproduce:
1. Create three 1x3 (replica 3) volumes.
2. Note down the shd PIDs.
3. Stop v1.
4. Check the shd PIDs; they have been restarted.
5. Stop v2; the shd PIDs are restarted again (a command-line sketch of these steps follows).
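
For reference, a minimal command-line walk-through of the steps above. This is a sketch only: host names h1/h2/h3 and brick paths are placeholders, and 'pgrep' stands in for however the shd PIDs are noted:

    # create three replica-3 (1x3) volumes; hosts and brick paths are placeholders
    gluster volume create v1 replica 3 h1:/bricks/v1 h2:/bricks/v1 h3:/bricks/v1
    gluster volume create v2 replica 3 h1:/bricks/v2 h2:/bricks/v2 h3:/bricks/v2
    gluster volume create v3 replica 3 h1:/bricks/v3 h2:/bricks/v3 h3:/bricks/v3
    gluster volume start v1; gluster volume start v2; gluster volume start v3

    pgrep -f glustershd          # note the self-heal daemon PID
    gluster volume stop v1       # answer the confirmation prompt with 'y'
    pgrep -f glustershd          # the PID has changed: shd was restarted
    gluster volume stop v2
    pgrep -f glustershd          # the PID changes again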

Actual results:
===========
The shd process is restarted on every volume stop.

Expected results:
Do not stop/restart shd unless the stopped volume is the last one.


--- Additional comment from Ravishankar N on 2017-07-18 09:37:10 UTC ---

This behaviour is by design. Not killing and restarting shd with the new graph (containing only the active volumes) would require performing a graph switch in shd. Unlike other processes (say a fuse mount or a brick process), which have an inode table maintained by the topmost xlator, in glustershd each AFR instance maintains its own inode table, so we would need to migrate all of them, which is not worth it.

Also, there is not much loss of state due to a restart, because shd only restarts heals of files that were not completed earlier.

--- Additional comment from nchilaka on 2017-07-18 10:08:20 UTC ---

I still see this as an issue. For example, take a case where 99 GB of a 100 GB file was already healed; after the restart we scan and heal it from the beginning.
If the fix is not worth the effort, then this should be moved to WONTFIX or CANTFIX instead of "not a bug".

--- Additional comment from Pranith Kumar K on 2017-07-18 10:32:14 UTC ---

(In reply to nchilaka from comment #3)
> I still see this as an issue. For example, take a case where 99 GB of a
> 100 GB file was already healed; after the restart we scan and heal it
> from the beginning.
> If the fix is not worth the effort, then this should be moved to WONTFIX
> or CANTFIX instead of "not a bug".

Yeah, I agree, let's move it to WONTFIX. It is a design tradeoff.

--- Additional comment from Ravishankar N on 2017-07-18 10:34:57 UTC ---

(In reply to nchilaka from comment #3)
> I still see this as an issue. For example, take a case where 99 GB of a
> 100 GB file was already healed; after the restart we scan and heal it
> from the beginning.

For the record, this is something which can be fixed by introducing granular data-self-heal.

--- Additional comment from Pranith Kumar K on 2017-07-18 10:59:49 UTC ---

(In reply to Ravishankar N from comment #5)
> (In reply to nchilaka from comment #3)
> > I still see this as an issue. For example, take a case where 99 GB of a
> > 100 GB file was already healed; after the restart we scan and heal it
> > from the beginning.
> 
> For the record, this is something which can be fixed by introducing granular
> data-self-heal.

Or supporting general purpose sharding, which seems to be needed for DHT as well.

--- Additional comment from Ravishankar N on 2017-07-18 11:10:23 UTC ---

(In reply to Pranith Kumar K from comment #6)
> Or supporting general purpose sharding, which seems to be needed for DHT as
> well.

Yes, that would work too. I guess whether we should *not* be doing granular data self-heal just because we can piggyback on files being striped is a discussion we need to have upstream. I can imagine users who would not want to stripe their files just because faster heal time is a by-product of the striping, if granular data-heal can guarantee faster heal times without it.

Comment 1 Worker Ant 2018-12-15 14:54:12 UTC
REVIEW: https://review.gluster.org/21868 (shd/glusterd: Implement shd daemon as per volume basis) posted (#4) for review on master by mohammed rafi  kc

Comment 2 Worker Ant 2018-12-20 14:30:00 UTC
REVIEW: https://review.gluster.org/21907 (glusterd/shd: implement mux logic for shd) posted (#1) for review on master by mohammed rafi  kc

Comment 3 Worker Ant 2019-01-03 21:44:11 UTC
REVIEW: https://review.gluster.org/21989 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 4 Worker Ant 2019-01-22 07:31:02 UTC
REVIEW: https://review.gluster.org/22072 (performance/readdir-ahead: Fix deadlock in readdir ahead.) posted (#3) for review on master by mohammed rafi  kc

Comment 5 Worker Ant 2019-01-22 09:34:54 UTC
REVIEW: https://review.gluster.org/22075 (mgmt/shd: Implement multiplexing in self heal daemon) posted (#1) for review on master by mohammed rafi  kc

Comment 6 Worker Ant 2019-01-23 16:01:31 UTC
REVIEW: https://review.gluster.org/22072 (performance/readdir-ahead: Fix deadlock in readdir ahead.) merged (#11) on master by Amar Tumballi

Comment 7 Worker Ant 2019-02-04 11:48:57 UTC
REVIEW: https://review.gluster.org/22150 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 8 Worker Ant 2019-02-04 12:07:03 UTC
REVIEW: https://review.gluster.org/22151 (afr/shd: Cleanup self heal daemon resources during afr fini) posted (#1) for review on master by mohammed rafi  kc

Comment 9 Worker Ant 2019-02-12 11:58:06 UTC
REVIEW: https://review.gluster.org/22087 (clnt/rpc: ref leak during disconnect.) merged (#17) on master by mohammed rafi  kc

Comment 10 Worker Ant 2019-02-12 15:10:24 UTC
REVIEW: https://review.gluster.org/22151 (afr/shd: Cleanup self heal daemon resources during afr fini) merged (#2) on master by Ravishankar N

Comment 11 Worker Ant 2019-02-26 12:40:16 UTC
REVIEW: https://review.gluster.org/22266 (rpc/transport: Mixing a ref on dictionary while assigning to transport object) posted (#1) for review on master by mohammed rafi  kc

Comment 12 Worker Ant 2019-03-20 13:25:27 UTC
REVIEW: https://review.gluster.org/22266 (rpc/transport: Missing a ref on dict while creating transport object) merged (#8) on master by Amar Tumballi

Comment 13 Shyamsundar 2019-03-25 16:32:41 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 14 Worker Ant 2019-04-01 03:45:02 UTC
REVIEW: https://review.gluster.org/22075 (mgmt/shd: Implement multiplexing in self heal daemon) merged (#25) on master by Amar Tumballi

Comment 15 Worker Ant 2019-04-01 09:29:55 UTC
REVIEW: https://review.gluster.org/22468 (client/fini: return fini after rpc cleanup) posted (#1) for review on master by mohammed rafi  kc

Comment 16 Worker Ant 2019-04-11 03:48:25 UTC
REVIEW: https://review.gluster.org/22468 (client/fini: return fini after rpc cleanup) merged (#6) on master by Raghavendra G