Bug 1149727 - An 'ls' can return invalid contents on a dispersed volume before self-heal repairs a damaged directory
Summary: An 'ls' can return invalid contents on a dispersed volume before self-heal repairs a damaged directory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Duplicates: 1159485
Depends On: 1149726
Blocks: glusterfs-3.6.0
 
Reported: 2014-10-06 14:27 UTC by Xavi Hernandez
Modified: 2014-11-10 15:14 UTC (History)
CC: 3 users

Fixed In Version: glusterfs-3.6.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1149726
Environment:
Last Closed: 2014-11-10 15:14:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Xavi Hernandez 2014-10-06 14:27:18 UTC
+++ This bug was initially created as a clone of Bug #1149726 +++

Description of problem:

When a directory on a dispersed volume is damaged (for example, its contents are lost), an 'ls' on that directory issued before a self-heal is forced sometimes returns an empty directory.

Version-Release number of selected component (if applicable): master


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2014-10-20 13:21:03 UTC
REVIEW: http://review.gluster.org/8946 (ec: Fix self-heal issues) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 2 Anand Avati 2014-10-21 14:17:24 UTC
REVIEW: http://review.gluster.org/8946 (ec: Fix self-heal issues) posted (#2) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 3 Anand Avati 2014-10-21 18:59:24 UTC
REVIEW: http://review.gluster.org/8946 (ec: Fix self-heal issues) posted (#3) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 4 Anand Avati 2014-10-22 08:05:23 UTC
COMMIT: http://review.gluster.org/8946 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit 4522acc20bdd1ca17c053969ef7edce1bb6ede76
Author: Xavier Hernandez <xhernandez>
Date:   Wed Oct 8 09:20:11 2014 +0200

    ec: Fix self-heal issues
    
    Problem: Doing an 'ls' of a directory that has been modified while one
             of the bricks was down, sometimes returns the old directory
             contents.
    
    Cause: Directories are not marked when they are modified as files are.
           The ec xlator balances requests amongst available and healthy
           bricks. Since there is no way to detect that a directory is
           out of date in one of the bricks, it is used from time to time
           to return the directory contents.
    
    Solution: Use versioning information for directories as well, just as
              is done for files. However, some additional changes have
              been necessary.
    
    Changes:
    
     * Use directory versioning:
    
         This requires locking the full directory instead of a single entry
         for all requests that add or remove entries from it, so that the
         version can be updated atomically. This affects the following fops:
    
             create, mkdir, mknod, link, symlink, rename, unlink, rmdir
    
         Another side effect is that opendir must perform a preliminary
         lookup to obtain versioning information and discard out-of-date
         bricks for subsequent readdir(p) calls.
    
     * Restrict directory self-heal:
    
         Until now, when a discrepancy was found in lookup, a self-heal
         was started automatically. This healed the versioning information
         of a bad directory instantly, making the original problem
         reappear.
    
         To solve this, when a missing directory is detected in one or more
         bricks on lookup or opendir fops, only a partial self-heal is
         performed on it. A partial self-heal only creates the directory
         and does not restore any additional information.
    
         This prevents an 'ls' from repairing the directory and causing
         the problem to happen again, so the output of 'ls' is always
         consistent. Since the directory has been created in the brick,
         any other operation on it (creating new files, for example) can
         succeed on all bricks without adding work to the self-heal
         process.
    
         To force a full self-heal of a directory, any other operation
         must be performed on it, for example a getxattr.
    
         With these changes, the healing procedure that avoids
         inconsistent directory browsing consists of a post-order
         traversal of the directories being healed. This way, the
         contents of a directory are healed before the directory itself.
    
     * Additional changes to fix self-heal errors
    
         - Don't use fop->fd to decide between fd/loc.
    
             open, opendir and create have an fd, but the correct data is in
             loc.
    
         - Fix incorrect management of bad bricks per inode/fd.
    
         - Fix incorrect selection of fop's target bricks when there are bad
           bricks involved.
    
         - Improved ec_loc_parent() to always return a parent loc as
           complete as possible.
    
    This is a backport of http://review.gluster.org/8916/
    
    Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad
    BUG: 1149727
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/8946
    Reviewed-by: Dan Lambright <dlambrig>
    Tested-by: Gluster Build System <jenkins.com>
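
The post-order heal order described in the commit can be sketched with a plain shell traversal. This is an illustrative sketch, not part of the fix: the throwaway directory tree and the trigger_heal stand-in are assumptions for demonstration; on a real mounted volume the trigger would be an operation such as a getxattr on each directory.

```shell
# Build a small throwaway tree to demonstrate the traversal order.
root=$(mktemp -d)
mkdir -p "$root/a/b"

# Hypothetical stand-in for an operation that forces a full self-heal
# of a directory (e.g. a getxattr on the mounted volume path).
trigger_heal() { echo "healing: $1"; }

# 'find -depth' performs a post-order traversal: a directory's contents
# are listed (and thus healed) before the directory itself.
find "$root" -depth -type d | while read -r d; do
    trigger_heal "$d"
done

rm -rf "$root"
```

Healing bottom-up this way means that by the time a parent directory's versioning information is repaired, its contents are already consistent, so an 'ls' never observes a healed parent with unhealed children.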

Comment 5 Niels de Vos 2014-11-04 12:35:04 UTC
*** Bug 1159485 has been marked as a duplicate of this bug. ***

Comment 6 Niels de Vos 2014-11-10 15:14:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

