Bug 1101143 - link file under .glusterfs directory not found for a directory
Summary: link file under .glusterfs directory not found for a directory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Duplicates: 1152956
Depends On: 920199
Blocks:
Reported: 2014-05-26 09:13 UTC by Pranith Kumar K
Modified: 2015-02-03 06:15 UTC (History)
CC List: 9 users

Fixed In Version: glusterfs-3.6.0beta1
Clone Of: 920199
Environment:
Last Closed: 2014-11-11 08:33:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Pranith Kumar K 2014-05-26 09:39:36 UTC
Found the root cause for the bug:
    A directory rename while a brick is down can cause the gfid handle of that directory to be deleted; the handle stays missing until the next lookup happens on that directory.
    
    *) Self-heal cannot detect renames at the moment. So it has to delete the
    directory 'd' using special flags, because it has to perform 'rm -rf' of that
    directory as it is not empty. The posix xlator implements this by renaming the
    deleted directory into the 'landfill' directory in '.glusterfs', where the
    janitor thread performs the actual rm -rf by traversing the directory. The
    janitor thread wakes up every 10 minutes to check whether there are any
    directories to delete and deletes them. As part of deleting, it also deletes
    the gfid-handles.
    
    Steps to hit the problem:
    1) On a replicate volume create a directory 'd', file in 'd' called 'f' so the
       directory 'd' is not empty.
    
    2) Bring one of the bricks down (let's call it brick-a; the other one is brick-b).
    
    3) Rename d to d1
    
    4) When brick-a comes online again, self-heal deletes directory 'd' and creates
       directory 'd1' on brick-a for performing self-heal. So on brick-a, the
       gfid-handle of 'd', which pointed to 'd', is deleted and recreated to point
       to 'd1'.
    
    5) The old directory 'd' with all its directory hierarchy (for now just the
       file 'f') will be under the 'landfill' directory.
    
    6) When the janitor thread wakes up, it deletes directory 'd' and the
       gfid-handle of 'd' without realizing that the handle is now pointing to
       'd1'. Thus 'd1' loses its gfid-handle.

Comment 2 Anand Avati 2014-05-26 17:23:22 UTC
REVIEW: http://review.gluster.org/7879 (storage/posix: Janitor should guard against dir renames) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2014-06-02 10:10:19 UTC
REVIEW: http://review.gluster.org/7879 (storage/posix: Janitor should guard against dir renames) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2014-06-02 10:17:23 UTC
REVIEW: http://review.gluster.org/7879 (storage/posix: Janitor should guard against dir renames) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2014-06-02 10:20:12 UTC
REVIEW: http://review.gluster.org/7879 (storage/posix: Janitor should guard against dir renames) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2014-06-11 09:29:37 UTC
REVIEW: http://review.gluster.org/7879 (storage/posix: Janitor should guard against dir renames.) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Anand Avati 2014-06-11 11:12:08 UTC
REVIEW: http://review.gluster.org/7879 (storage/posix: Janitor should guard against dir renames.) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Anand Avati 2014-06-12 08:10:54 UTC
COMMIT: http://review.gluster.org/7879 committed in master by Anand Avati (avati) 
------
commit d240958fb36e652a2b910fe79414fb8b934e6158
Author: Pranith Kumar K <pkarampu>
Date:   Fri May 23 12:51:28 2014 +0530

    storage/posix: Janitor should guard against dir renames.
    
    Problem:
    A directory rename while a brick is down can cause the gfid handle of that
    directory to be deleted; the handle stays missing until the next lookup
    happens on that directory.
    
    *) Self-heal cannot detect renames at the moment. So it has to delete the
    directory 'd' using special flags, because it has to perform 'rm -rf' of that
    directory as it is not empty. The posix xlator implements this by renaming the
    deleted directory into the 'landfill' directory in '.glusterfs', where the
    janitor thread performs the actual rm -rf by traversing the directory. The
    janitor thread wakes up every 10 minutes to check whether there are any
    directories to delete and deletes them. As part of deleting, it also deletes
    the gfid-handles.
    
    Steps to hit the problem:
    1) On a replicate volume create a directory 'd', file in 'd' called 'f' so the
       directory 'd' is not empty.
    
    2) Bring one of the bricks down (let's call it brick-a; the other one is brick-b).
    
    3) Rename d to d1
    
    4) When brick-a comes online again, self-heal deletes directory 'd' and creates
       directory 'd1' on brick-a for performing self-heal. So on brick-a, the
       gfid-handle of 'd', which pointed to 'd', is deleted and recreated to point
       to 'd1'.
    
    5) The old directory 'd' with all its directory hierarchy (for now just the
       file 'f') will be under the 'landfill' directory.
    
    6) When the janitor thread wakes up, it deletes directory 'd' and the
       gfid-handle of 'd' without realizing that the handle is now pointing to
       'd1'. Thus 'd1' loses its gfid-handle.
    
    Fix:
    Delete gfid-handle for a directory only when the gfid-handle is stale.
    
    Change-Id: I21265b3bd3852f0967d916aaa21108ae5c9e7373
    BUG: 1101143
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/7879
    Reviewed-by: Niels de Vos <ndevos>
    Reviewed-by: Xavier Hernandez <xhernandez>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
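
The rule stated under "Fix" in the commit message above (delete the gfid-handle for a directory only when it is stale) can be illustrated with a small sketch. This is not the actual posix xlator patch. It assumes that "stale" means the handle is a symlink whose target no longer resolves, and the function names are hypothetical.

```python
import os
import shutil
import tempfile


def handle_is_stale(handle):
    """Assumed meaning of 'stale': a symlink whose target does not resolve."""
    # os.path.exists() follows symlinks, so it is False for a dangling link.
    return os.path.islink(handle) and not os.path.exists(handle)


def unset_handle_if_stale(handle):
    """Remove the handle only when it is stale; report whether it was removed."""
    if handle_is_stale(handle):
        os.unlink(handle)
        return True
    return False


def demo():
    root = tempfile.mkdtemp()
    live_dir = os.path.join(root, "d1")
    os.mkdir(live_dir)
    live = os.path.join(root, "live-handle")
    stale = os.path.join(root, "stale-handle")
    os.symlink(live_dir, live)                  # re-pointed handle, still valid
    os.symlink(os.path.join(root, "d"), stale)  # 'd' does not exist: dangling
    removed = (unset_handle_if_stale(live), unset_handle_if_stale(stale))
    remaining = (os.path.lexists(live), os.path.lexists(stale))
    shutil.rmtree(root)
    return removed, remaining
```

In `demo()`, the handle that resolves to the live directory 'd1' is kept, and only the dangling one is removed. Under this assumption, a janitor pass over the landfill would no longer take away the handle that self-heal re-pointed to 'd1'.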

Comment 9 Anand Avati 2014-06-13 03:14:55 UTC
REVIEW: http://review.gluster.org/8055 (tests: Revert janitor link file removal test) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 10 Anand Avati 2014-06-13 03:35:37 UTC
COMMIT: http://review.gluster.org/8055 committed in master by Vijay Bellur (vbellur) 
------
commit 333d2a60eb121a4f09a0c3c0d35d8585af86bdd1
Author: Pranith Kumar K <pkarampu>
Date:   Fri Jun 13 08:42:25 2014 +0530

    tests: Revert janitor link file removal test
    
    I found that the order of execution in afr-v2 self-heal is causing
    the links to disappear sometimes. I need to fix that issue
    and then submit this test again.
    
    Change-Id: Ia886feb796b7854645813f486b7b7ac4e944ed17
    BUG: 1101143
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8055
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Vijay Bellur <vbellur>

Comment 11 Niels de Vos 2014-09-22 12:41:00 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether the release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 12 Niels de Vos 2014-11-11 08:33:13 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

Comment 13 Pranith Kumar K 2015-02-03 06:15:36 UTC
*** Bug 1152956 has been marked as a duplicate of this bug. ***

