Bug 1515266 - Prevent ec from continuing to process heal operations after PARENT_DOWN
Summary: Prevent ec from continuing to process heal operations after PARENT_DOWN
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1505570 1522646
 
Reported: 2017-11-20 13:11 UTC by Xavi Hernandez
Modified: 2018-08-07 10:37 UTC
CC List: 3 users

Fixed In Version: glusterfs-4.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1522646 (view as bug list)
Environment:
Last Closed: 2018-03-15 11:21:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Xavi Hernandez 2017-11-20 13:11:20 UTC
Description of problem:

EC delays PARENT_DOWN propagation until all pending requests have completed, but heal operations are ignored. This can cause unexpected results when a heal operation is running while the volume is being unmounted.
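
To illustrate the race described above, here is a minimal, self-contained C sketch. All names (toy_ec, fop_start, can_propagate_parent_down) are hypothetical and are not the real ec code; the point is only that heals bypass the counter that gates PARENT_DOWN propagation.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the problem: regular fops bump a pending counter that
 * gates PARENT_DOWN propagation, but heals never touch it. */
struct toy_ec {
    int  pending_fops;   /* regular fops in flight */
    bool shutdown;       /* set when PARENT_DOWN is received */
};

static void fop_start(struct toy_ec *ec) { ec->pending_fops++; }
static void fop_done(struct toy_ec *ec)  { ec->pending_fops--; }

/* PARENT_DOWN is propagated as soon as the *counted* fops drain;
 * heals are not counted, so they are not waited for. */
static bool can_propagate_parent_down(struct toy_ec *ec)
{
    return ec->shutdown && ec->pending_fops == 0;
}

int main(void)
{
    struct toy_ec ec = { 0, false };

    fop_start(&ec);       /* a regular fop is counted ... */
    fop_done(&ec);        /* ... and therefore waited for */

    /* a heal is also running, but it never called fop_start() */
    ec.shutdown = true;   /* PARENT_DOWN arrives */

    if (can_propagate_parent_down(&ec))
        printf("PARENT_DOWN propagated while a heal is still running\n");
    return 0;
}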

Version-Release number of selected component (if applicable): master


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Ashish Pandey 2017-11-22 09:51:00 UTC
Xavi,

This issue is true for afr as well. Do you think we can fix it in some common place like the synctask infra, or do we have to deal with it separately?

Ashish

Comment 2 Worker Ant 2017-11-22 10:33:32 UTC
REVIEW: https://review.gluster.org/18840 (cluster/ec: Prevent self-heal to work after PARENT_DOWN) posted (#1) for review on master by Xavier Hernandez

Comment 3 Xavi Hernandez 2017-11-22 18:23:31 UTC
(In reply to Ashish Pandey from comment #1)
> Xavi,
> 
> This issue is true for afr as well. Do you think we can fix it in some
> common place like the synctask infra, or do we have to deal with it
> separately?
> 
> Ashish

I'm not sure how synctask could help here. It would need access to some information telling it whether the xlator that initiated the operation is shutting down (I don't think we have this). But even then, aborting a single operation doesn't guarantee that the caller won't attempt another synctask operation (for example, healing the next entry of a directory), which would still delay the shutdown and cause multiple failures on fops that haven't really failed (it would probably add noise to the logs).

I think this is better handled inside the xlator itself. If AFR already tracks ongoing regular operations, it should be relatively easy to include heals in those checks, though I haven't looked at it.
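
To make that suggestion concrete, here is a minimal C sketch of the idea (handling it inside the xlator by routing heals through the same accounting as regular fops). The names toy_xl, op_begin and op_end are hypothetical and the real ec/afr bookkeeping is more involved; the principle is simply that PARENT_DOWN is only propagated once the shared in-flight counter drains, whether the last operation was a fop or a heal.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_xl {
    pthread_mutex_t lock;
    int             in_flight;   /* regular fops AND heals */
    bool            shutdown;    /* PARENT_DOWN received   */
};

static void op_begin(struct toy_xl *xl)
{
    pthread_mutex_lock(&xl->lock);
    xl->in_flight++;
    pthread_mutex_unlock(&xl->lock);
}

/* Returns true when the last in-flight operation finishes after a
 * shutdown was requested, i.e. when PARENT_DOWN may be propagated. */
static bool op_end(struct toy_xl *xl)
{
    bool last;

    pthread_mutex_lock(&xl->lock);
    xl->in_flight--;
    last = (xl->shutdown && xl->in_flight == 0);
    pthread_mutex_unlock(&xl->lock);

    return last;
}

int main(void)
{
    struct toy_xl xl = { PTHREAD_MUTEX_INITIALIZER, 0, false };

    op_begin(&xl);       /* a heal starts: it is counted too   */
    xl.shutdown = true;  /* PARENT_DOWN arrives in the meantime */

    if (op_end(&xl))     /* heal finishes: now it is safe      */
        printf("all work drained, PARENT_DOWN can be propagated\n");
    return 0;
}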

Comment 4 Worker Ant 2017-11-28 09:12:20 UTC
COMMIT: https://review.gluster.org/18840 committed in master by "Xavier Hernandez" <jahernan> with a commit message- cluster/ec: Prevent self-heal to work after PARENT_DOWN

When the volume is being stopped, a PARENT_DOWN event is received.
This instructs EC to wait until all pending operations have completed
before declaring itself down. However, heal operations were ignored
and allowed to continue even after EC had declared itself down.

This may cause unexpected results and crashes.

To solve this, heal operations are now treated exactly like any
other operation, and EC won't propagate PARENT_DOWN until all
operations, including heals, have completed. To avoid long delays
when this happens in the middle of a big heal, a check has been
added to quit the current heal if a shutdown is detected.

Change-Id: I26645e236ebd115eb22c7ad4972461111a2d2034
BUG: 1515266
Signed-off-by: Xavier Hernandez <jahernan>
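
The commit message describes two changes: heals are accounted for like any other operation, and a long-running heal bails out early once a shutdown is detected. The sketch below illustrates only the second part in self-contained C; toy_heal_ctx, heal_directory and heal_one_entry are illustrative names, not the actual ec-heal.c functions.

#include <stdbool.h>
#include <stdio.h>

struct toy_heal_ctx {
    volatile bool *shutdown;   /* set when PARENT_DOWN is received */
    int            healed;     /* entries repaired so far          */
};

static void heal_one_entry(struct toy_heal_ctx *ctx, int entry)
{
    (void)entry;               /* the real code would repair the entry */
    ctx->healed++;
}

/* Walk the entries of a directory, but give up as soon as a shutdown
 * is detected so that PARENT_DOWN is not delayed by a big heal. */
static int heal_directory(struct toy_heal_ctx *ctx, int nentries)
{
    for (int i = 0; i < nentries; i++) {
        if (*ctx->shutdown)
            return -1;         /* remaining work is left for a later heal */
        heal_one_entry(ctx, i);
    }
    return 0;
}

int main(void)
{
    volatile bool shutdown = false;
    struct toy_heal_ctx ctx = { &shutdown, 0 };

    shutdown = true;           /* PARENT_DOWN arrives mid-heal */
    if (heal_directory(&ctx, 1000) < 0)
        printf("heal aborted after %d entries due to shutdown\n",
               ctx.healed);
    return 0;
}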

Comment 5 Shyamsundar 2018-03-15 11:21:35 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/

