
Bug 1297172 - Client self-heals block the FOP that triggered the heals

Summary: Client self-heals block the FOP that triggered the heals

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Ravishankar N
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1292314 1293412 1300875 1313312

Reported:	2016-01-10 07:18 UTC by Ravishankar N
Modified:	2016-06-16 13:54 UTC
CC List:	3 users
Fixed In Version:	glusterfs-3.8rc2
Clone Of:
Clones:	1300875 1313312
Environment:
Last Closed:	2016-06-16 13:54:13 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:


Description Ravishankar N 2016-01-10 07:18:34 UTC

Description of problem:
If a lookup or a read-transaction FOP triggers an inode refresh that finds the file needs healing, the FOP does not return until the heal completes. For VM workloads, this can make the VM appear unresponsive until the heal finishes.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a 1x2 replica volume, FUSE-mount it and create a file.
2. Disable the self-heal daemon.
3. Kill one brick and `dd` a few GB of data into the file.
4. Bring the brick back up and run hexdump on the file from the mount.
5. hexdump stalls without printing any data until the data heal completes (as seen in the mount log).

Actual results:
The FOP blocks until the heal is done.

Expected results:
The FOP should not wait for heals; they could be run in the background instead.

Comment 1 Vijay Bellur 2016-01-10 07:21:14 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 2 Vijay Bellur 2016-01-12 10:42:45 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#2) for review on master by Ravishankar N (ravishankar)

Comment 3 Brad Hubbard 2016-01-22 00:29:38 UTC

Raising severity based on the bugs depending on this one.

Comment 4 Vijay Bellur 2016-02-01 11:52:28 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#3) for review on master by Ravishankar N (ravishankar)

Comment 5 Vijay Bellur 2016-02-03 06:02:15 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#4) for review on master by Ravishankar N (ravishankar)

Comment 6 Vijay Bellur 2016-02-05 12:27:17 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#5) for review on master by Ravishankar N (ravishankar)

Comment 7 Vijay Bellur 2016-02-25 08:22:55 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#6) for review on master by Ravishankar N (ravishankar)

Comment 8 Vijay Bellur 2016-02-26 02:13:01 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#7) for review on master by Ravishankar N (ravishankar)

Comment 9 Vijay Bellur 2016-02-28 04:06:59 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#8) for review on master by Ravishankar N (ravishankar)

Comment 10 Vijay Bellur 2016-02-28 12:14:43 UTC

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#9) for review on master by Ravishankar N (ravishankar)

Comment 11 Vijay Bellur 2016-03-01 11:23:32 UTC

COMMIT: http://review.gluster.org/13207 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 8210ca1a5c0e78e91c6fab7df7e002e39660b706
Author: Ravishankar N <ravishankar>
Date:   Sun Jan 10 09:19:34 2016 +0530

    afr: Add throttled background client-side heals
    
    If a heal is needed after inode refresh (lookup, read_txn), launch it in
    the background instead of blocking the fop (that triggered refresh) until the
    heal happens.
    
    afr_replies_interpret() is modified such that the heal is
    launched only if at least one sink brick is up.
    
    The maximum number of heals that can run in parallel is configurable via
    the 'background-self-heal-count' volume option. Heals beyond that limit
    are put in a wait queue whose length is configurable via the
    'heal-wait-queue-length' volume option. If the wait queue is also full,
    further heals are ignored.
    
    Default values:  background-self-heal-count=8, heal-wait-queue-length=128
    
    Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70
    BUG: 1297172
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/13207
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>

Comment 12 Ravishankar N 2016-03-22 12:43:45 UTC

Moving it back to POST for a follow-up patch that adjusts the op-version.

Comment 13 Vijay Bellur 2016-03-22 13:02:41 UTC

REVIEW: http://review.gluster.org/13810 (glusterd/ afr: Fix op-version for background client-side heals) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 14 Vijay Bellur 2016-03-23 02:19:18 UTC

COMMIT: http://review.gluster.org/13810 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit b6edcbd6948f0252785672fde3db37cec6353d11
Author: Ravishankar N <root@ravi2.(none)>
Date:   Tue Mar 22 12:56:41 2016 +0000

    glusterd/ afr: Fix op-version for background client-side heals
    
    http://review.gluster.org/13207 tied cluster.heal-wait-queue-length to
    GD_OP_VERSION_3_7_9 but the patch will be merged in release-3.7 branch
    (http://review.gluster.org/#/c/13564/) only for 3.7.10.
    Hence change it on master also for uniformity.
    
    Change-Id: Id581695e58b0765f5652016cc2045f05e36b768f
    BUG: 1297172
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/13810
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 15 Niels de Vos 2016-06-16 13:54:13 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
