Bug 1313312 - Client self-heals block the FOP that triggered the heals
Summary: Client self-heals block the FOP that triggered the heals
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1297172
Blocks: 1292314 1293412 1300875
 
Reported: 2016-03-01 11:28 UTC by Ravishankar N
Modified: 2016-04-19 06:58 UTC

Fixed In Version: glusterfs-3.7.10
Clone Of: 1297172
Environment:
Last Closed: 2016-04-19 06:58:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2016-03-01 11:28:31 UTC
+++ This bug was initially created as a clone of Bug #1297172 +++

Description of problem:
If a lookup or a read-transaction FOP triggers an inode refresh that finds a heal is needed, the FOP does not return until the heal completes. For VM use cases, this can make the VM appear unresponsive until the heal completes.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a 1x2 replica volume, FUSE-mount it, and create a file.
2. Disable the self-heal daemon.
3. Kill a brick and `dd` a few gigabytes into the file.
4. Bring the brick back up and run hexdump on the file from the mount.
5. Hexdump stalls and prints no data until the data heal completes (as seen in the mount log).
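
The steps above could be scripted roughly as follows. This is only a sketch: the volume name, host, brick directories, mount point, and file name are hypothetical placeholders, and the function only builds the command lines without running them. Killing the brick is left as a manual step, since the brick PID has to be read from the `gluster volume status` output.

```python
from typing import List


def reproduction_commands(volume: str, host: str, brick_dirs: List[str],
                          mount_point: str, gigabytes: int) -> List[List[str]]:
    """Return the reproduction steps as argv lists; nothing is executed."""
    bricks = [f"{host}:{d}" for d in brick_dirs]
    test_file = f"{mount_point}/testfile"  # hypothetical file name
    return [
        # Step 1: create and start a 1x2 replica volume, mount it, create a file.
        # "force" is an assumption here, used because both bricks share one host.
        ["gluster", "volume", "create", volume, "replica",
         str(len(bricks)), *bricks, "force"],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount_point],
        ["touch", test_file],
        # Step 2: disable the self-heal daemon so only client-side heals run.
        ["gluster", "volume", "set", volume, "cluster.self-heal-daemon", "off"],
        # Step 3: look up the brick PID, kill one brick by hand, then write data.
        ["gluster", "volume", "status", volume],
        ["dd", "if=/dev/urandom", f"of={test_file}", "bs=1M",
         f"count={gigabytes * 1024}"],
        # Step 4: restart the killed brick and read the file from the mount.
        ["gluster", "volume", "start", volume, "force"],
        ["hexdump", test_file],
    ]
```

Returning argv lists rather than running the commands keeps the sketch safe to inspect or print before anything touches a real cluster.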

Actual results:
FOP blocks until heal is done.

Expected results:
The FOP should not wait for heals; they could be made to happen in the background.
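
The expected behaviour can be illustrated with a small threading sketch. It is not AFR code: the function name and the two events are hypothetical stand-ins, with `heal_gate` simulating a long-running data heal. The point is only that the read returns before the heal has finished.

```python
import threading
from typing import Tuple


def read_with_background_heal(data: bytes,
                              heal_gate: threading.Event,
                              healed: threading.Event
                              ) -> Tuple[bytes, threading.Thread]:
    """Return the data immediately and run the heal in a separate thread."""
    def heal() -> None:
        heal_gate.wait()  # stands in for the time a data heal takes
        healed.set()      # mark the (simulated) heal as complete

    worker = threading.Thread(target=heal, daemon=True)
    worker.start()
    # The caller gets its reply without waiting for the heal thread.
    return data, worker
```

A caller would see the reply first and could later join the returned thread, or simply ignore it, which mirrors a FOP that is no longer tied to the heal's completion.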

--- Additional comment from Vijay Bellur on 2016-01-10 02:21:14 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-01-12 05:42:45 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Brad Hubbard on 2016-01-21 19:29:38 EST ---

Raising severity based on the bugs depending on this one.

--- Additional comment from Vijay Bellur on 2016-02-01 06:52:28 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-02-03 01:02:15 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#4) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-02-05 07:27:17 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#5) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-02-25 03:22:55 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#6) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-02-25 21:13:01 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#7) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-02-27 23:06:59 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#8) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-02-28 07:14:43 EST ---

REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#9) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Vijay Bellur on 2016-03-01 06:23:32 EST ---

COMMIT: http://review.gluster.org/13207 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 8210ca1a5c0e78e91c6fab7df7e002e39660b706
Author: Ravishankar N <ravishankar>
Date:   Sun Jan 10 09:19:34 2016 +0530

    afr: Add throttled background client-side heals
    
    If a heal is needed after inode refresh (lookup, read_txn), launch it in
    the background instead of blocking the fop (that triggered refresh) until the
    heal happens.
    
    afr_replies_interpret() is modified such that the heal is
    launched only if at least one sink brick is up.
    
    The maximum number of heals that can happen in parallel is configurable
    via the 'background-self-heal-count' volume option. Any heals beyond that
    number are put in a wait queue whose length is configurable via the
    'heal-wait-queue-length' volume option. If the wait queue is also full,
    further heals will be ignored.
    
    Default values:  background-self-heal-count=8, heal-wait-queue-length=128
    
    Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70
    BUG: 1297172
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/13207
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
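
The throttling described in the commit message above can be sketched as a small admission policy. This is an illustrative model, not the AFR implementation: the class, method, and parameter names are hypothetical, the parameters merely mirror the two volume options and their stated defaults, and details such as de-duplicating heals for the same inode are out of scope.

```python
from collections import deque
from typing import Deque, Optional, Set


class HealThrottle:
    """Toy model: bounded set of running heals plus a bounded wait queue."""

    def __init__(self, background_self_heal_count: int = 8,
                 heal_wait_queue_length: int = 128) -> None:
        self.max_running = background_self_heal_count
        self.max_waiting = heal_wait_queue_length
        self.running: Set[str] = set()
        self.waiting: Deque[str] = deque()

    def submit(self, inode: str) -> str:
        """Admit a heal request and report what happened to it."""
        if len(self.running) < self.max_running:
            self.running.add(inode)
            return "running"
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(inode)
            return "queued"
        # Both the running slots and the wait queue are full.
        return "ignored"

    def finish(self, inode: str) -> Optional[str]:
        """Complete a heal; promote and return the oldest waiter, if any."""
        self.running.discard(inode)
        if self.waiting and len(self.running) < self.max_running:
            promoted = self.waiting.popleft()
            self.running.add(promoted)
            return promoted
        return None
```

With limits of 2 running and 1 waiting, four submissions would come back as running, running, queued, and ignored; finishing one running heal promotes the queued one.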

Comment 1 Vijay Bellur 2016-03-01 11:58:46 UTC
REVIEW: http://review.gluster.org/13564 (afr: Add throttled background client-side heals) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)

Comment 2 Vijay Bellur 2016-03-22 09:05:46 UTC
REVIEW: http://review.gluster.org/13564 (afr: Add throttled background client-side heals) posted (#2) for review on release-3.7 by Ravishankar N (ravishankar)

Comment 3 Vijay Bellur 2016-03-23 02:21:44 UTC
COMMIT: http://review.gluster.org/13564 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 4a5d8f65b9b04385dcae8b16a650f4e8ed357f8b
Author: Ravishankar N <ravishankar>
Date:   Tue Mar 22 14:26:32 2016 +0530

    afr: Add throttled background client-side heals
    
    Backport of: http://review.gluster.org/13207
    
    If a heal is needed after inode refresh (lookup, read_txn), launch it in
    the background instead of blocking the fop (that triggered refresh)
    until the heal happens.
    
    afr_replies_interpret() is modified such that the heal is
    launched only if at least one sink brick is up.
    
    The maximum number of heals that can happen in parallel is configurable
    via the 'background-self-heal-count' volume option. Any heals beyond that
    number are put in a wait queue whose length is configurable via the
    'heal-wait-queue-length' volume option. If the wait queue is also full,
    further heals will be ignored.
    
    Default values:  background-self-heal-count=8, heal-wait-queue-length=128
    
    Change-Id: I9a134b2c29d66b70b7b1278811bd504963aabacc
    BUG: 1313312
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/13564
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 4 Kaushal 2016-04-19 06:58:43 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.10, please open a new bug report.

glusterfs-3.7.10 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-April/026164.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

