Bug 1226880 - Fix infinite looping in shard_readdir(p) on '/'
Summary: Fix infinite looping in shard_readdir(p) on '/'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1222379
Blocks:
 
Reported: 2015-06-01 11:27 UTC by Krutika Dhananjay
Modified: 2015-06-20 09:49 UTC

Fixed In Version: glusterfs-3.7.2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1222379
Environment:
Last Closed: 2015-06-20 09:49:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2015-06-01 11:27:08 UTC
+++ This bug was initially created as a clone of Bug #1222379 +++

Description of problem:

Readdir(p) on '/' in a sharded volume can sometimes lead to an endless series of readdirp calls over the same set of offsets, circling back to offset = 0 again and again.

RCA:
DHT performs readdirp one subvol at a time, and the entries are ordered by their offsets in ascending order. At some point, when "/.shard" is the last of the several entries read and DHT unwinds the call to the shard xlator, shard deletes the entry corresponding to "/.shard" from the list, since it is not supposed to be exposed on the mount. The shard xlator then unwinds the call to its parent xlator with the remaining entries. When the readdirp result reaches the readdir-ahead translator, it winds the next readdirp at the last entry's offset, which is less than that of "/.shard". In this iteration DHT fetches "/.shard" again; the shard xlator ignores it and unwinds with no entries. In that case, readdir-ahead creates a new stub for readdirp with offset = 0. When the call is resumed, the same sequence of events repeats forever, causing the mount to appear hung.
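
The cycle can be reproduced outside GlusterFS with a small self-contained C simulation (illustrative only; the entry names, offsets and helper functions below are invented and are not GlusterFS code). A "server" hands out directory entries above a given offset, a shard-like filter hides ".shard", and the caller resumes from the d_off of the last entry it was actually shown, restarting at offset 0 whenever a non-empty batch filters down to nothing, which mirrors the readdir-ahead stub behaviour described above.

#include <stdio.h>
#include <string.h>

struct entry {
        const char *name;
        int         d_off;
};

/* Simulated listing of '/', sorted by d_off; ".shard" has the highest
 * offset, as in the problematic case described above. */
static struct entry dir[] = {
        { "file-a", 10 },
        { "file-b", 20 },
        { ".shard", 30 },
};
static const int ndir = sizeof (dir) / sizeof (dir[0]);

/* "Server" side: return up to max entries with d_off greater than offset. */
static int
read_batch (int offset, int max, struct entry *out)
{
        int n = 0;
        for (int i = 0; i < ndir && n < max; i++)
                if (dir[i].d_off > offset)
                        out[n++] = dir[i];
        return n;
}

int
main (void)
{
        int offset = 0;

        for (int iter = 0; iter < 6; iter++) { /* capped; really loops forever */
                struct entry batch[3];
                int n = read_batch (offset, 3, batch);

                if (n == 0)
                        break;                 /* genuine end of directory */

                /* Shard-like filtering: hide ".shard" from the layers above. */
                int kept = 0, last_off = offset;
                for (int i = 0; i < n; i++) {
                        if (strcmp (batch[i].name, ".shard") == 0)
                                continue;
                        last_off = batch[i].d_off;
                        kept++;
                }

                /* The caller resumes from the d_off of the last entry it was
                 * shown; a batch that filters down to nothing makes it restart
                 * at offset 0, like readdir-ahead's new stub. */
                offset = kept ? last_off : 0;
                printf ("iteration %d: %d entries shown, resume at offset %d\n",
                        iter, kept, offset);
        }
        return 0;
}

With the data above, the resume offset alternates between 20 and 0 forever, which is the "hang" perceived on the mount.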



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Anand Avati on 2015-05-18 13:06:10 EDT ---

REVIEW: http://review.gluster.org/10809 (features/shard: Fix issue with readdir(p) fop) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Anand Avati on 2015-05-28 06:36:51 EDT ---

REVIEW: http://review.gluster.org/10809 (features/shard: Fix issue with readdir(p) fop) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Anand Avati on 2015-05-29 02:08:58 EDT ---

REVIEW: http://review.gluster.org/10809 (features/shard: Fix issue with readdir(p) fop) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Anand Avati on 2015-05-30 06:33:24 EDT ---

REVIEW: http://review.gluster.org/10809 (features/shard: Fix issue with readdir(p) fop) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

Comment 1 Anand Avati 2015-06-01 11:29:47 UTC
REVIEW: http://review.gluster.org/11031 (features/shard: Fix issue with readdir(p) fop) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)

Comment 2 Anand Avati 2015-06-03 03:46:42 UTC
COMMIT: http://review.gluster.org/11031 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 9d74710581262a570547f1dc6bba4e1750871864
Author: Krutika Dhananjay <kdhananj>
Date:   Mon May 18 18:06:32 2015 +0530

    features/shard: Fix issue with readdir(p) fop
    
            Backport of: http://review.gluster.org/10809
    
    Problem:
    
    When readdir(p) is performed on '/' and ".shard" happens to be
    the last of the entries read in a given iteration of dht_readdir(p)
    (in other words the entry with the highest offset in the dirent list
    sorted in ascending order of d_offs), shard xlator would delete this
    entry as part of handling the call so as to avoid exposing its presence
    to the application. This would cause xlators above (like fuse,
    readdir-ahead etc) to wind the next readdirp as part of the same req
    at an offset which is (now) the highest d_off (post deletion of .shard)
    from the previously unwound list of entries. This offset would be less
    than that of ".shard" and therefore cause /.shard to be read once again.
    If by any chance this happens to be the only entry until end-of-directory,
    shard xlator would delete this entry and unwind with 0 entries, causing the
    xlator(s) above to think there is nothing more to readdir and the fop is
    complete. This would prevent DHT from gathering entries from the rest of
    its subvolumes, causing some entries to disappear.
    
    Fix:
    
    At the level of shard xlator, if ".shard" happens to be the last entry,
    make shard xlator wind another readdirp at offset equal to d_off of
    ".shard". That way, if ".shard" happens to be the only other entry under '/'
    until end-of-directory, DHT would receive an op_ret=0. This would enable it
    to wind readdir(p) on the rest of its subvols and gather the complete picture.
    
    Also fixed a bug in shard_lookup_cbk() wherein file_size needs to be fetched
    unconditionally in the cbk, since it is set unconditionally in the wind path;
    without this, lookup would be unwound with ia_size and ia_blocks equal only
    to those of the base file.
    
    Change-Id: I0ff0b48b6c9c12edbef947b6840a77a54c131650
    BUG: 1226880
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/11031
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: NetBSD Build System <jenkins.org>
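
The fix described in the commit message above can be sketched by extending the earlier toy simulation (again purely illustrative pseudo-logic, not the actual shard xlator code; it reuses struct entry and read_batch() from the first snippet). Whenever ".shard" is the last raw entry of a batch, the filter keeps reading from the d_off of ".shard" itself instead of handing the truncated batch upward, so the caller's resume offset can never fall below ".shard" and an empty result is returned only at the true end of the directory.

/* out is assumed large enough to hold every entry that survives filtering. */
static int
read_batch_filtered (int offset, int max, struct entry *out)
{
        int kept = 0;

        for (;;) {
                struct entry raw[8];
                int n = read_batch (offset, max < 8 ? max : 8, raw);

                if (n == 0)
                        return kept;    /* true end of directory */

                for (int i = 0; i < n; i++)
                        if (strcmp (raw[i].name, ".shard") != 0)
                                out[kept++] = raw[i];

                /* If ".shard" was not the last raw entry, the caller's resume
                 * offset (the last visible d_off) already lies beyond it. */
                if (strcmp (raw[n - 1].name, ".shard") != 0)
                        return kept;

                /* ".shard" was last: read once more from its own d_off so the
                 * caller never resumes from an offset below ".shard". */
                offset = raw[n - 1].d_off;
        }
}

With this in place, the caller from the first snippet can simply stop when it receives zero entries; the batch that used to contain only ".shard" never reaches the layers above, so the 20 -> 0 -> 20 cycle cannot arise, and in the real stack this lets DHT receive op_ret=0 and wind readdir(p) on the rest of its subvolumes, as the commit message explains.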

Comment 3 Niels de Vos 2015-06-20 09:49:17 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.2, please reopen this bug report.

glusterfs-3.7.2 has been announced on the Gluster Packaging mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/packaging/2015-June/000006.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

