Bug 1399024
Summary: | performance.read-ahead on results in processes on client stuck in IO wait | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Poornima G <pgurusid> |
Component: | core | Assignee: | bugs <bugs> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.7.17 | CC: | bugs, bugzilla, jahernan, janlam7, ksubrahm, pgurusid, ravishankar, rgowdapp, sarumuga |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.7.19 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1388292 | Environment: | |
Last Closed: | 2017-01-18 13:39:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | fuse |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1388292 | ||
Bug Blocks: | 1379228, 1392299, 1399015, 1399018, 1399023 |
Description
Poornima G
2016-11-28 05:06:42 UTC
REVIEW: http://review.gluster.org/15935 (libglusterfs: Fix a read hang) posted (#1) for review on release-3.7 by Poornima G (pgurusid)

REVIEW: http://review.gluster.org/15935 (libglusterfs: Fix a read hang) posted (#2) for review on release-3.7 by Poornima G (pgurusid)

REVIEW: http://review.gluster.org/15935 (libglusterfs: Fix a read hang) posted (#3) for review on release-3.7 by Poornima G (pgurusid)

COMMIT: http://review.gluster.org/15935 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 8c93cbb4ba307e47f61c5551023bd82842f184ff
Author: Poornima G <pgurusid>
Date: Mon Nov 21 19:57:08 2016 +0530

    libglusterfs: Fix a read hang

    Backport of http://review.gluster.org/#/c/15923/

    Issue:
    =====
    In certain cases, a read was never unwound from the read-ahead
    xlator, resulting in a hang.

    RCA:
    ====
    In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead
    of STACK_WIND(). One such case is when the inode_ctx for that file
    is not present (this can happen if readdirp was called, populated
    md-cache, and all the lookups were then served from the cache).

    Consider the following graph:
    ...
    io-cache (parent)
       |
    readdir-ahead
       |
    read-ahead
    ...

    Below is the code snippet of ioc_readv() calling STACK_WIND_TAIL():
    ioc_readv()
    {
    ...
        if (!inode_ctx)
            STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this),
                             FIRST_CHILD (frame->this)->fops->readv, fd,
                             size, offset, flags, xdata);
            /* Ideally, this stack_wind should wind to readdir-ahead:readv()
               but it winds to read-ahead:readv(). See below for
               explanation.
             */
    ...
    }

    STACK_WIND_TAIL (frame, obj, fn, ...)
    {
        frame->this = obj;
        /* for the above mentioned graph, frame->this will be readdir-ahead
         * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which
         * is as expected
         */
        ...
        THIS = obj;
        /* THIS will be read-ahead instead of readdir-ahead!, as obj expands
         * to "FIRST_CHILD (frame->this)" and frame->this was pointing
         * to readdir-ahead in the previous statement.
         */
        ...
        fn (frame, obj, params);
        /* fn will call read-ahead:readv() instead of readdir-ahead:readv()!
         * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and
         * frame->this was pointing to readdir-ahead in the first statement
         */
        ...
    }

    Thus, readdir-ahead's readv() implementation is skipped, and
    ra_readv() is called with frame->this = "readdir-ahead" and
    this = "read-ahead". This can lead to corruption / hang / other
    problems. In this particular case, when the 'frame->this' and 'this'
    passed to ra_readv() don't match, ra_readv() ends up calling
    ra_readv() again. Thus the logic of read-ahead's readv() falls apart
    and leads to a hang.

    Solution:
    =========
    Modify STACK_WIND_TAIL() as:
    STACK_WIND_TAIL (frame, obj, fn, ...)
    {
        next_xl = obj;
        /* resolve obj, as the variables passed in the obj macro argument
           can be overwritten by the later instructions */
        next_xl_fn = fn;
        /* resolve fn and store it in a tmp variable, before modifying
           any variables */
        frame->this = next_xl;
        ...
        THIS = next_xl;
        ...
        next_xl_fn (frame, next_xl, params);
        ...
    }

    >Reviewed-on: http://review.gluster.org/15923
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Reviewed-by: Rajesh Joseph <rjoseph>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Raghavendra G <rgowdapp>
    (Cherry picked from commit 8943c19a2ef51b6e4fa66cb57211d469fe558579)

    BUG: 1399024
    Change-Id: Ie662ac8f18fa16909376f1e59387bc5b886bd0f9
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15935
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.19, please open a new bug report.
glusterfs-3.7.19 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/gluster-users/2017-January/029623.html
[2] https://www.gluster.org/pipermail/gluster-users/
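
The macro-argument re-evaluation described in the RCA can be reproduced outside GlusterFS. The following is only a minimal sketch under stated assumptions: `xlator_t`, `frame_t`, `WIND_TAIL_BUGGY`, `WIND_TAIL_FIXED`, and the recording helpers are hypothetical stand-ins, not the real libglusterfs types or the real STACK_WIND_TAIL() (which also sets THIS and forwards the fop parameters). It models only the io-cache / readdir-ahead / read-ahead graph from the commit message and records which xlator's readv ends up running.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for xlator_t and call_frame_t. */
typedef struct xlator xlator_t;
typedef struct frame frame_t;
typedef void (*readv_fn_t)(frame_t *frame, xlator_t *this_xl);

struct xlator {
    const char *name;
    xlator_t   *first_child;
    readv_fn_t  readv;
};

struct frame {
    xlator_t *this;
};

#define FIRST_CHILD(xl) ((xl)->first_child)

/* Pre-fix shape: obj and fn are expanded again after frame->this changes. */
#define WIND_TAIL_BUGGY(fr, obj, fn)   \
    do {                               \
        (fr)->this = (obj);            \
        (fn)((fr), (obj));             \
    } while (0)

/* Post-fix shape: resolve obj and fn once, before touching frame->this. */
#define WIND_TAIL_FIXED(fr, obj, fn)   \
    do {                               \
        xlator_t  *next_xl = (obj);    \
        readv_fn_t next_xl_fn = (fn);  \
        (fr)->this = next_xl;          \
        next_xl_fn((fr), next_xl);     \
    } while (0)

static const char *ran_readv;       /* whose readv() actually ran */
static const char *seen_this_arg;   /* name of the 'this' argument it got */
static const char *seen_frame_this; /* name of frame->this at that point */

static void record(const char *who, frame_t *frame, xlator_t *this_xl)
{
    ran_readv = who;
    seen_this_arg = this_xl->name;
    seen_frame_this = frame->this->name;
}

static void rda_readv(frame_t *frame, xlator_t *this_xl)
{
    record("readdir-ahead", frame, this_xl);
}

static void ra_readv(frame_t *frame, xlator_t *this_xl)
{
    record("read-ahead", frame, this_xl);
}

/* Graph from the commit message: io-cache -> readdir-ahead -> read-ahead. */
static xlator_t read_ahead    = { "read-ahead",    NULL,           ra_readv  };
static xlator_t readdir_ahead = { "readdir-ahead", &read_ahead,    rda_readv };
static xlator_t io_cache      = { "io-cache",      &readdir_ahead, NULL      };

/* Winds from io-cache with the pre-fix macro; returns whose readv ran. */
const char *wind_buggy(void)
{
    frame_t f = { &io_cache };
    frame_t *frame = &f;
    WIND_TAIL_BUGGY(frame, FIRST_CHILD(frame->this),
                    FIRST_CHILD(frame->this)->readv);
    return ran_readv;
}

/* Same call site, but with the arguments resolved up front. */
const char *wind_fixed(void)
{
    frame_t f = { &io_cache };
    frame_t *frame = &f;
    WIND_TAIL_FIXED(frame, FIRST_CHILD(frame->this),
                    FIRST_CHILD(frame->this)->readv);
    return ran_readv;
}

/* Asserts the behaviour the RCA describes for both variants. */
void check_wind_tail(void)
{
    /* Buggy: readdir-ahead is skipped and frame->this != this. */
    assert(strcmp(wind_buggy(), "read-ahead") == 0);
    assert(strcmp(seen_frame_this, "readdir-ahead") == 0);
    assert(strcmp(seen_this_arg, "read-ahead") == 0);

    /* Fixed: readdir-ahead's readv runs and frame->this == this. */
    assert(strcmp(wind_fixed(), "readdir-ahead") == 0);
    assert(strcmp(seen_frame_this, seen_this_arg) == 0);
}
```

Calling `check_wind_tail()` from a `main()` exercises both variants. The only difference between the two macros is when the `obj` and `fn` argument expressions are evaluated, which is the point of the fix: storing them in temporaries makes the macro behave like a function call with respect to its arguments.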