Bug 1032927 - Remove-brick with self-heal causes data loss
Summary: Remove-brick with self-heal causes data loss
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On: 1032558
Blocks:
 
Reported: 2013-11-21 09:19 UTC by Pranith Kumar K
Modified: 2014-04-17 11:51 UTC
CC: 5 users

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1032558
Environment:
Last Closed: 2014-04-17 11:51:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Pranith Kumar K 2013-11-21 09:20:53 UTC
Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume using 3 nodes in the cluster.
2. Fill the volume with files and deep directories up to depth 5.
3. Start remove-brick of one of the replica pairs using
 gluster volume remove-brick <vol> <b1> <b2> start
4. While remove-brick is in progress, reboot one of the nodes so that self-heal is triggered after it comes back up.
5. After remove-brick completes, check the arequal checksum on the mount (see the command sketch below).
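
A rough shell sketch of the sequence above. Host names, brick paths, the volume name and the mount point are placeholders; only the remove-brick syntax and the arequal tool path come from this report:

    # 12 bricks laid out across 3 nodes -> 6 replica pairs
    gluster volume create testvol replica 2 \
        node1:/bricks/b1 node2:/bricks/b1 \
        node2:/bricks/b2 node3:/bricks/b2 \
        node3:/bricks/b3 node1:/bricks/b3 \
        node1:/bricks/b4 node2:/bricks/b4 \
        node2:/bricks/b5 node3:/bricks/b5 \
        node3:/bricks/b6 node1:/bricks/b6
    gluster volume start testvol
    mkdir -p /mnt/testvol
    mount -t glusterfs node1:/testvol /mnt/testvol

    # ... create files and directories up to depth 5, then record the checksum:
    /opt/qa/tools/arequal-checksum /mnt/testvol

    # Remove one replica pair; reboot one node while migration is running.
    gluster volume remove-brick testvol node1:/bricks/b1 node2:/bricks/b1 start
    gluster volume remove-brick testvol node1:/bricks/b1 node2:/bricks/b1 status

    # Once status reports "completed", compare the checksum on the mount:
    /opt/qa/tools/arequal-checksum /mnt/testvol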

Actual results:

A few files are lost from the mount point.

arequal before
--------------
 [root@rhs-client4 lo]# /opt/qa/tools/arequal-checksum .


Entry counts
Regular files   : 9330
Directories     : 9331
Symbolic links  : 0
Other           : 0
Total           : 18661

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 00
Directories     : 10000002e01
Symbolic links  : 0
Other           : 0
Total           : 10000002e01


arequal after
-------------
[root@rhs-client4 lo]# /opt/qa/tools/arequal-checksum  .

Entry counts
Regular files   : 9327
Directories     : 9331
Symbolic links  : 0
Other           : 0
Total           : 18658

Metadata checksums
Regular files   : 4bc885
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : d26028d3461d4fd7daf74b8d71b5ed7c
Directories     : 362e656c4767
Symbolic links  : 0
Other           : 0
Total           : 897557052c4e5cc

Comment 2 Anand Avati 2013-11-26 06:01:58 UTC
REVIEW: http://review.gluster.org/6332 (cluster/afr: Provide HA for pathinfo getxattr) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2013-11-26 08:34:23 UTC
COMMIT: http://review.gluster.org/6332 committed in master by Vijay Bellur (vbellur) 
------
commit 1d554b179f63a5a56ae447f2a5b0044c49ae2642
Author: Pranith Kumar K <pkarampu>
Date:   Thu Nov 21 16:17:32 2013 +0530

    cluster/afr: Provide HA for pathinfo getxattr
    
    Problem:
    afr_[f]getxattr_pathinfo_cbks fail the fop even when it succeeded on
    one of the bricks. This can happen if the last response to pathinfo
    [f]getxattr is a failure.
    
    Fix:
    Remember if any of the [f]getxattr_pathinfos are successful and send
    that as the op_ret/op_errno value to the xlators above.
    
    Note:
    Winding fop to a client xlator that is not connected to server produces
    an error log. Preventing that by not even winding fop when client xlator
    is DOWN.
    
    Change-Id: I846e8c47423ffcfa2eabffe8924534781a36841a
    BUG: 1032927
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/6332
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Gluster Build System <jenkins.com>
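
The aggregation pattern this commit describes can be modelled in a few lines of standalone C. The sketch below is illustrative only, not the actual afr_[f]getxattr_pathinfo_cbk code; the struct, function and brick names are invented for the example. The point is that with replies from several bricks arriving in any order, the fop's result must be "success if any brick succeeded" rather than "whatever the last reply said":

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    struct pathinfo_local {
        int  call_count;            /* replies still outstanding */
        int  op_ret;                /* aggregated result, starts as failure */
        int  op_errno;
        char pathinfo[256];
    };

    /* Called once per brick reply; the last reply "unwinds" the result. */
    static void pathinfo_cbk(struct pathinfo_local *local,
                             int op_ret, int op_errno, const char *value)
    {
        if (op_ret >= 0) {
            /* Remember that at least one brick succeeded. */
            local->op_ret   = 0;
            local->op_errno = 0;
            strncpy(local->pathinfo, value, sizeof(local->pathinfo) - 1);
        } else if (local->op_ret < 0) {
            /* Keep the error only while nothing has succeeded yet; the bug
             * was letting a late failure overwrite an earlier success. */
            local->op_errno = op_errno;
        }

        if (--local->call_count == 0)
            printf("unwind: op_ret=%d op_errno=%d pathinfo=%s\n",
                   local->op_ret, local->op_errno, local->pathinfo);
    }

    int main(void)
    {
        struct pathinfo_local local = {
            .call_count = 2, .op_ret = -1, .op_errno = ENOTCONN,
        };

        /* One brick succeeds, the rebooted brick fails last: with the fix
         * the fop still reports success instead of ENOTCONN. */
        pathinfo_cbk(&local, 0, 0, "<POSIX:/bricks/b1>");
        pathinfo_cbk(&local, -1, ENOTCONN, "");
        return 0;
    }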

Comment 4 Anand Avati 2013-11-26 18:37:54 UTC
COMMIT: http://review.gluster.org/6341 committed in master by Vijay Bellur (vbellur) 
------
commit 2f218e1335d5fdab0b41716cc5c8976b20c367f6
Author: Pranith Kumar K <pkarampu>
Date:   Fri Nov 22 12:09:53 2013 +0530

    cluster/dht: Handle Link-info getxattr failure in rebalance
    
    When getxattr fails with errno other than ENODATA fail rebalance
    on that file. Log the reason for error.
    
    Change-Id: Ia519870b88e6e6dd464d1c0415411aa999f80bc9
    BUG: 1032927
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/6341
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shishir Gowda <sgowda>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>
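
The rule in this commit can likewise be sketched standalone. Again an illustrative model with invented names and paths, not the actual dht-rebalance code: ENODATA simply means the file has no linkto xattr and migration proceeds, while any other getxattr error is logged and fails rebalance for that file:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Decide whether migration of one file may proceed after the
     * link-info getxattr returned (ret, err). */
    static int migrate_file(const char *path, int ret, int err)
    {
        if (ret < 0 && err != ENODATA) {
            /* Real failure (e.g. brick down): log the reason and fail
             * rebalance for this file rather than risking data loss. */
            fprintf(stderr, "%s: link-info getxattr failed: %s\n",
                    path, strerror(err));
            return -1;
        }
        /* ENODATA just means there is no linkto xattr: a plain data file. */
        printf("%s: migrating\n", path);
        return 0;
    }

    int main(void)
    {
        migrate_file("/mnt/testvol/a", -1, ENODATA);   /* proceeds */
        migrate_file("/mnt/testvol/b", -1, ENOTCONN);  /* fails this file */
        return 0;
    }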

Comment 5 Niels de Vos 2014-04-17 11:51:04 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

