Description of problem:
If a lookup or a read transaction FOP triggers an inode refresh, the FOP does not return until the heal completes. For VM use cases, this can make the VM appear unresponsive until the heal completes.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create a 1x2 replica volume, FUSE-mount it, and create a file.
2. Disable the self-heal daemon.
3. Kill a brick and `dd` a few gigabytes into the file.
4. Bring the brick back up and run hexdump on the file from the mount.
5. Hexdump stalls while printing data until the data heal completes (as seen in the mount log).

Actual results:
The FOP blocks until the heal is done.

Expected results:
The FOP should not wait for heals; they could be made to happen in the background.
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#1) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#2) for review on master by Ravishankar N (ravishankar)
Raising severity based on the bugs depending on this one.
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#3) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#4) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#5) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#6) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#7) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#8) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13207 (afr: Add throttled background client-side heals) posted (#9) for review on master by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/13207 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 8210ca1a5c0e78e91c6fab7df7e002e39660b706
Author: Ravishankar N <ravishankar>
Date:   Sun Jan 10 09:19:34 2016 +0530

    afr: Add throttled background client-side heals

    If a heal is needed after inode refresh (lookup, read_txn), launch it in
    the background instead of blocking the fop (that triggered refresh) until
    the heal happens.

    afr_replies_interpret() is modified such that the heal is launched only
    if at least one sink brick is up.

    The maximum number of heals that can happen in parallel is configurable
    via the 'background-self-heal-count' volume option. Any heal beyond that
    number is put in a wait queue whose length is configurable via the
    'heal-wait-queue-length' volume option. If the wait queue is also full,
    further heals will be ignored.

    Default values: background-self-heal-count=8, heal-wait-queue-length=128

    Change-Id: I1d4a52814cdfd43d90591b6d2ad7b6219937ce70
    BUG: 1297172
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/13207
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
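The throttling policy described in the commit message above can be illustrated with a small model. This is only a sketch in Python, not the AFR implementation (which is written in C): the class `HealThrottle` and its methods `submit` and `finish` are hypothetical names invented for illustration. Only the two limits and their defaults (8 and 128) come from the commit message. The sketch also assumes that a waiting heal is promoted in FIFO order when a running heal finishes, which the commit message does not state.

```python
from collections import deque


class HealThrottle:
    """Hypothetical model of throttled background heals (not the AFR code)."""

    def __init__(self, background_self_heal_count=8, heal_wait_queue_length=128):
        # Limits named after the volume options in the commit message.
        self.max_active = background_self_heal_count
        self.max_waiting = heal_wait_queue_length
        self.active = set()      # heals currently running in the background
        self.waiting = deque()   # heals waiting for a free slot

    def submit(self, inode):
        """Request a heal; the caller never blocks on the result."""
        if len(self.active) < self.max_active:
            self.active.add(inode)
            return "started"
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(inode)
            return "queued"
        # Both the active set and the wait queue are full: drop the request.
        return "ignored"

    def finish(self, inode):
        """Mark a heal as done and promote the oldest waiting heal, if any.

        FIFO promotion is an assumption made for this sketch.
        """
        self.active.discard(inode)
        if self.waiting and len(self.active) < self.max_active:
            promoted = self.waiting.popleft()
            self.active.add(promoted)
            return promoted
        return None


# Usage with deliberately small limits so every outcome is visible.
throttle = HealThrottle(background_self_heal_count=2, heal_wait_queue_length=1)
results = [throttle.submit(name) for name in ("a", "b", "c", "d")]
promoted = throttle.finish("a")
```

With limits of 2 and 1, the first two requests start immediately, the third is queued, and the fourth is ignored; finishing "a" frees a slot for "c". In every case `submit` returns at once, which is the point of the fix: the FOP that triggered the refresh does not wait for the heal.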
Moving it back to POST for a follow-up patch that adjusts the op-version.
REVIEW: http://review.gluster.org/13810 (glusterd/ afr: Fix op-version for background client-side heals) posted (#1) for review on master by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/13810 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit b6edcbd6948f0252785672fde3db37cec6353d11
Author: Ravishankar N <root@ravi2.(none)>
Date:   Tue Mar 22 12:56:41 2016 +0000

    glusterd/ afr: Fix op-version for background client-side heals

    http://review.gluster.org/13207 tied cluster.heal-wait-queue-length to
    GD_OP_VERSION_3_7_9 but the patch will be merged in release-3.7 branch
    (http://review.gluster.org/#/c/13564/) only for 3.7.10. Hence change it
    on master also for uniformity.

    Change-Id: Id581695e58b0765f5652016cc2045f05e36b768f
    BUG: 1297172
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/13810
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user