Bug 1392299
| Summary: | [SAMBA-mdcache] Read hangs and leads to disconnect of samba share while creating IOs from one client & reading from another client | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vivek Das <vdas> |
| Component: | read-ahead | Assignee: | Poornima G <pgurusid> |
| Status: | CLOSED ERRATA | QA Contact: | Vivek Das <vdas> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.2 | CC: | amukherj, nbalacha, pgurusid, rcyriac, rhs-bugs, rhs-smb, sbhaloth, storage-qa-internal |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-3.8.4-6 | Doc Type: | Bug Fix |
| Doc Text: | In some situations, read operations were skipped by the io-cache translator, which led to a hung client mount. This has been corrected so that the client mount process works as expected for read operations. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-23 06:16:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1388292, 1399015, 1399018, 1399023, 1399024 | ||
| Bug Blocks: | 1351528, 1351530 | ||
Description
Vivek Das
2016-11-07 07:02:52 UTC
Sosreports and samba logs are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1392299

From comment#3: the bug is not reproducible when read-ahead is disabled for the volume, i.e. everything works fine after `gluster volume set volname read-ahead off`.

This looks related to read-ahead, not readdir-ahead. Updating the component.

Fix posted upstream: http://review.gluster.org/15901

RCA:
====
In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead
of STACK_WIND(). One such case is when the inode_ctx for that file
is not present (this can happen if readdirp was called and populated
md-cache, which then serves all the lookups from its cache).
Consider the following graph:
...
io-cache (parent)
|
readdir-ahead
|
read-ahead
...
Below is the code snippet of ioc_readv calling STACK_WIND_TAIL:
ioc_readv()
{
...
if (!inode_ctx)
STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this),
FIRST_CHILD (frame->this)->fops->readv, fd,
size, offset, flags, xdata);
        /* Ideally, this stack_wind should wind to readdir-ahead:readv(),
         * but it winds to read-ahead:readv(). See below for an
         * explanation.
         */
...
}
STACK_WIND_TAIL (frame, obj, fn, ...)
{
frame->this = obj;
/* For the above-mentioned graph, this expands to
 * frame->this = FIRST_CHILD (frame->this), i.e. readdir-ahead,
 * which is as expected.
 */
...
THIS = obj;
/* THIS becomes read-ahead instead of readdir-ahead, as obj expands
 * to "FIRST_CHILD (frame->this)" and frame->this was already set
 * to readdir-ahead by the previous statement.
 */
...
fn (frame, obj, params);
/* fn calls read-ahead:readv() instead of readdir-ahead:readv(),
 * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and
 * frame->this was set to readdir-ahead by the first statement.
 */
...
}
Thus, readdir-ahead's readv() implementation is skipped, and
ra_readv() is called with frame->this = "readdir-ahead" and
this = "read-ahead". This mismatch can lead to corruption, hangs, or
other problems. In this particular case, because the 'frame->this' and
'this' passed to ra_readv() do not match, ra_readv() ends up calling
ra_readv() again. The read-ahead readv() logic thus falls apart and
the read hangs.
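To make the pitfall concrete outside of GlusterFS, here is a minimal, self-contained C sketch. The xl/frame types and the WIND_TAIL macros below are simplified stand-ins invented for illustration, not the real STACK_WIND_TAIL; the point is only that a macro which re-evaluates its obj argument after mutating frame->this resolves one translator too deep, while a variant that evaluates obj exactly once keeps the intended target.

#include <stdio.h>

/* Simplified stand-ins for illustration only -- not the real
 * GlusterFS xlator_t / call_frame_t or STACK_WIND_TAIL. */
struct xl {
        const char *name;
        struct xl  *child;
};

struct frame {
        struct xl *this;
};

#define FIRST_CHILD(xl) ((xl)->child)

/* Buggy variant: 'obj' is evaluated again after frame->this has
 * already been advanced, so the second use lands one level deeper. */
#define WIND_TAIL_BUGGY(frame, obj)                                  \
        do {                                                         \
                (frame)->this = (obj);                               \
                printf ("winding to %s\n", (obj)->name);             \
        } while (0)

/* Fixed variant: evaluate 'obj' exactly once into a local. */
#define WIND_TAIL_FIXED(frame, obj)                                  \
        do {                                                         \
                struct xl *next_xl__ = (obj);                        \
                (frame)->this = next_xl__;                           \
                printf ("winding to %s\n", next_xl__->name);         \
        } while (0)

int
main (void)
{
        /* Same three-translator graph as in the RCA above. */
        struct xl ra  = { "read-ahead",    NULL };
        struct xl rda = { "readdir-ahead", &ra  };
        struct xl ioc = { "io-cache",      &rda };

        struct frame f1 = { &ioc };
        WIND_TAIL_BUGGY (&f1, FIRST_CHILD (f1.this));
        /* prints "winding to read-ahead": readdir-ahead is skipped */

        struct frame f2 = { &ioc };
        WIND_TAIL_FIXED (&f2, FIRST_CHILD (f2.this));
        /* prints "winding to readdir-ahead", as intended */

        return 0;
}

Running this prints read-ahead for the buggy macro and readdir-ahead for the fixed one, matching the mis-wind described in the RCA. The single-evaluation idea is only a sketch of this class of fix; see http://review.gluster.org/#/c/15923/ for what was actually changed.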
Have posted another patch for review: http://review.gluster.org/#/c/15923/. Once it is merged, this patch will be backported downstream. http://review.gluster.org/15901, which is already merged, also fixes the issue, but the right way to fix it is http://review.gluster.org/#/c/15923/.

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/91496/

Followed the steps to reproduce with glusterfs-3.8.4-6 and read-ahead on; everything works fine, so moving this to the Verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.