Bug 1159498 - when replace one brick on disperse volume, ls sometimes goes wrong
Summary: when replace one brick on disperse volume, ls sometimes goes wrong
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.6.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On: 1163760
Blocks: glusterfs-3.6.2
Reported: 2014-11-01 09:06 UTC by lidi
Modified: 2015-01-28 14:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1163760
Environment:
Last Closed: 2015-01-28 14:27:51 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description lidi 2014-11-01 09:06:30 UTC
Steps to Reproduce:
1. gluster vol create test disperse 3 redundancy 1 10.10.21.20:/sdb 10.10.21.21:/sdb 10.10.21.22:/sdb force
2. Start the volume and mount it on /cluster2/test
3. cd /cluster2/test
4. mkdir a b c
5. touch a/1 b/2 c/3
6. gluster vol replace-brick test 10.10.21.22:/sdb 10.10.21.23:/sdb commit force
7. Execute 'ls /cluster2/test/a' multiple times


Actual results:
Sometimes 'ls /cluster2/test/a' does not list the file 1.

Comment 1 Xavi Hernandez 2014-11-04 12:39:51 UTC
Are you using 3.6.0beta3? If so, this problem should already be solved in the latest version (see bug #1149727).

Comment 2 lidi 2014-11-05 02:07:29 UTC
I used the official 3.6.0 release for this test.

Comment 3 Xavi Hernandez 2014-11-10 15:43:11 UTC
I've tried to reproduce this bug by repeating your steps on version 3.6.0 and I'm not able to see the problem. There was a bug in 3.6.0beta3 that caused this problem, but it should be solved.

Can you reproduce this problem with 3.6.0 on a volume newly created with that version?

Comment 4 lidi 2014-11-11 03:43:19 UTC
I got the source code from git://forge.gluster.org/glusterfs-core/glusterfs.git, branch release-3.6.

I reformatted all the disks, created a new volume and tested again.

Then I found I had made a mistake in my previous description.

Step 7 should be: run "ls a" multiple times; "cd a"; then run "ls" multiple times. Then you'll see what I described.
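
Because the failure is intermittent, it may help to count how often the listing comes back incomplete instead of checking by eye. The following is only a sketch of the corrected step 7; the helper name count_missing and the number of tries are illustrative and not part of the original report.

```shell
#!/bin/sh
# count_missing DIR NAME TRIES
# Hypothetical helper: list DIR TRIES times and print how many of those
# listings did not contain an entry named exactly NAME.
count_missing() {
    dir=$1
    name=$2
    tries=$3
    missing=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -x: match the whole line, -F: treat NAME as a fixed string
        if ! ls "$dir" 2>/dev/null | grep -qxF -- "$name"; then
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    echo "$missing"
}
```

After the replace-brick in step 6, one could run "count_missing a 1 20" from /cluster2/test, then "cd a" and run "count_missing . 1 20". A non-zero result would mean that at least one listing missed file 1.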

Comment 5 Anand Avati 2014-11-13 13:06:26 UTC
REVIEW: http://review.gluster.org/9118 (ec: Avoid self-heal on directories on (f)stat calls) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)

Comment 6 Anand Avati 2014-11-15 18:01:36 UTC
COMMIT: http://review.gluster.org/9118 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit b01660c5d7cf4a59a85a8edc3c816e4585aa211b
Author: Xavier Hernandez <xhernandez>
Date:   Thu Nov 13 13:55:36 2014 +0100

    ec: Avoid self-heal on directories on (f)stat calls
    
    To avoid inconsistent directory listings, a full self-heal
    cannot happen on a directory until all its contents have
    been healed. This is controlled by a manual command using
    getfattr recursively and in post-order.
    
    While navigating the directories, sometimes an (f)stat fop
    can be sent. This fop caused a full self-heal of the directory.
    
    This patch makes that (f)stat only initiates a partial self-heal.
    
    This is a backport of http://review.gluster.org/9117/
    
    Change-Id: I0a92bda8f4f9e43c1acbceab2d7926944a8a4d9a
    BUG: 1159498
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/9118
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
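
The "recursively and in post-order" traversal mentioned in the commit message can be sketched with find -depth, which visits the contents of a directory before the directory itself. This is only an illustration: the function name heal_walk is hypothetical, and the default command (getfattr on the trusted.ec.heal attribute) is an assumption about the manual heal command for 3.6, since the commit message does not spell it out.

```shell
#!/bin/sh
# heal_walk ROOT [CMD ARGS...]
# Hypothetical helper: run CMD once per entry under ROOT, children
# before their parent directory (post-order).
heal_walk() {
    root=$1
    shift
    if [ "$#" -eq 0 ]; then
        # Assumed default: read the heal-triggering xattr on each entry.
        set -- getfattr -h -n trusted.ec.heal
    fi
    # -depth makes find process a directory's contents before the
    # directory itself, so a directory is touched only after its entries.
    find "$root" -depth -exec "$@" {} \;
}
```

For example, "heal_walk /cluster2/test echo" only prints the visiting order without touching any attribute, which is a harmless way to check that a/1 is reached before a.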

