Bug 1031971 - DHT:SELF-HEAL:Remove-brick with self-heal causes data loss
Status: CLOSED DUPLICATE of bug 1032558
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Bug Updates Notification Mailing List
QA Contact: Sudhir D
Reported: 2013-11-19 04:26 EST by shylesh
Modified: 2013-11-20 06:58 EST

Doc Type: Bug Fix
Last Closed: 2013-11-20 06:58:24 EST
Type: Bug

Attachments: None
Description shylesh 2013-11-19 04:26:57 EST
Description of problem:
If self-heal is triggered after remove-brick has been started on a distributed-replicate volume, data is lost.

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.44rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.44rhs-1.el6rhs.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Create a 5x2 distributed-replicate volume and enable quota with a 1TB limit.
2. Kill one brick of any replica pair.
3. Create data with a deep directory hierarchy on the mount point.
4. Start a remove-brick operation on any replica pair other than the one with the killed brick:
 gluster v remove-brick <vol> <brick1> <brick2> start
5. While migration is in progress, force-start the volume so that all bricks come up and self-heal starts:
 gluster volume start <vol> force
6. Check the remove-brick status until migration completes.
7. Once migration is complete, commit the remove-brick operation:
 gluster v remove-brick <vol> <brick1> <brick2> commit
8. Check the number of files on the mount point.
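
One way to make step 8 concrete is to snapshot the file list on the mount point before step 4 and again after step 7, then diff the two. The following is a minimal sketch, not part of the original report; the helper names `list_files` and `missing_after` are hypothetical, and the mount point path is whatever the volume is mounted on (/mnt1 in this report).

```python
import os


def list_files(root):
    """Return the set of regular-file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_after(before, after):
    """Return files present in the earlier snapshot but absent later."""
    return sorted(before - after)


# Intended use (paths are assumptions, adjust to the actual mount point):
#   before = list_files("/mnt1")   # taken before "remove-brick ... start"
#   after = list_files("/mnt1")    # taken after "remove-brick ... commit"
#   print(missing_after(before, after))
```

An empty result means no file disappeared between the two snapshots; any path listed is a candidate for the data loss described here.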

Actual results:
Data is lost: some of the files are missing from the mount point.

For every missing file, the rebalance log shows a self-heal message:

[2013-11-19 04:27:49.791226] I [dht-common.c:2644:dht_setxattr] 0-rebal-dht: fixing the layout of /5/2/5/1
[2013-11-19 04:27:49.795429] I [dht-rebalance.c:1116:gf_defrag_migrate_data] 0-rebal-dht: migrate data called on /5/2/5/1
[2013-11-19 04:27:49.816696] I [afr-self-heal-common.c:2843:afr_log_self_heal_completion_status] 0-rebal-replicate-0:  on /5/2/5/1
[2013-11-19 04:27:49.866120] I [afr-self-heal-common.c:2843:afr_log_self_heal_completion_status] 0-rebal-replicate-0:  gfid or missing entry self heal  is successfully completed, on /5/2/5/1/file.1
[2013-11-19 04:27:49.867329] I [dht-rebalance.c:1333:gf_defrag_migrate_data] 0-rebal-dht: Migration operation on dir /5/2/5/1 took 0.07 secs
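
To correlate missing files with self-heal activity, the rebalance log can be scanned for the completion message shown above. This is a rough sketch: the regular expression is an assumption derived only from the sample lines in this report, and `healed_paths` is a hypothetical helper name.

```python
import re

# Pattern inferred from the log excerpt in this report (an assumption, not a
# documented log format): the function name in the bracketed source location,
# followed by the "self heal ... successfully completed, on <path>" message.
HEAL_RE = re.compile(
    r"afr_log_self_heal_completion_status\].*?"
    r"self heal\s+is successfully completed, on (?P<path>\S+)"
)


def healed_paths(lines):
    """Return the paths reported as successfully self-healed, in log order."""
    paths = []
    for line in lines:
        match = HEAL_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths
```

Intersecting the returned paths with the list of files missing from the mount point shows which losses coincide with a self-heal during migration.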


[root@rhs-client4 mnt1]# ll 5/2/5/1/file.1
ls: cannot access 5/2/5/1/file.1: No such file or directory


pair1
------
[root@rhs-client4 1]# getfattr -d -m . -e hex /home/rebal0/5/2/5/1                                                                                
getfattr: Removing leading '/' from absolute path names
# file: home/rebal0/5/2/5/1
trusted.afr.rebal-client-0=0x000000000000000000000000
trusted.afr.rebal-client-1=0x000000000000000000000000
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000400000
trusted.glusterfs.quota.size=0x0000000000400000


[root@rhs-client9 ~]# getfattr -d -m . -e hex /home/rebal1/5/2/5/1
getfattr: Removing leading '/' from absolute path names
# file: home/rebal1/5/2/5/1
trusted.afr.rebal-client-0=0x000000000000000000000000
trusted.afr.rebal-client-1=0x000000000000000000000000
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000400000
trusted.glusterfs.quota.size=0x0000000000400000
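
The trusted.afr.* values above can be decoded to confirm that no heal is pending on this directory. The sketch below assumes the commonly described AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry); `decode_afr_changelog` is a hypothetical helper name.

```python
import struct


def decode_afr_changelog(hex_value):
    """Decode a 12-byte trusted.afr.* value into its pending counters.

    Assumes three big-endian 32-bit counters in the order
    data, metadata, entry.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(digits))
    return {"data": data, "metadata": metadata, "entry": entry}
```

Under that assumption, the all-zero values on both bricks of pair1 mean neither brick records pending operations against the other for /5/2/5/1.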


pair2
-------
[root@rhs-client39 ~]#  getfattr -d -m . -e hex /home/rebal2/5/2/5/1
getfattr: Removing leading '/' from absolute path names
# file: home/rebal2/5/2/5/1
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000100000
trusted.glusterfs.quota.size=0x0000000000100000

[root@rhs-client4 1]# getfattr -d -m . -e hex /home/rebal3/5/2/5/1                                                                                
getfattr: Removing leading '/' from absolute path names
# file: home/rebal3/5/2/5/1
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000100000
trusted.glusterfs.quota.size=0x0000000000100000


pair3
------
[root@rhs-client9 ~]# getfattr -d -m . -e hex /home/rebal4/5/2/5/1
getfattr: Removing leading '/' from absolute path names
# file: home/rebal4/5/2/5/1
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000000000
trusted.glusterfs.quota.size=0x0000000000000000

[root@rhs-client39 ~]# getfattr -d -m . -e hex /home/rebal5/5/2/5/1
getfattr: Removing leading '/' from absolute path names
# file: home/rebal5/5/2/5/1
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000000000
trusted.glusterfs.quota.size=0x0000000000000000


pair4
------
[root@rhs-client4 1]# getfattr -d -m . -e hex /home/rebal6/5/2/5/1                                                                                
getfattr: Removing leading '/' from absolute path names
# file: home/rebal6/5/2/5/1
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000000000
trusted.glusterfs.quota.size=0x0000000000000000

[root@rhs-client9 ~]# getfattr -d -m . -e hex /home/rebal7/5/2/5/1
getfattr: Removing leading '/' from absolute path names
# file: home/rebal7/5/2/5/1
trusted.gfid=0xd21bafa9759c417d8e147e3444aa38e3
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f096b5c9-2558-4985-a570-fb2596026c1f.contri=0x0000000000000000
trusted.glusterfs.quota.size=0x0000000000000000
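
The trusted.glusterfs.dht values above can be decoded to check how the hash space for /5/2/5/1 is split across the remaining pairs. This is a minimal sketch that assumes the 16-byte value holds four big-endian 32-bit words, with the last two being the start and end of the hash range; that reading is inferred from the values in this report, and the helper names are hypothetical.

```python
import struct


def decode_dht_layout(hex_value):
    """Return (start, stop) of the hash range stored in trusted.glusterfs.dht.

    Assumes four big-endian 32-bit words; the first two are treated as a
    count and a hash type and are ignored here.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    _count, _hash_type, start, stop = struct.unpack(">IIII", bytes.fromhex(digits))
    return start, stop


def covers_full_ring(ranges):
    """True if the ranges tile 0x00000000..0xffffffff with no gap or overlap."""
    expected = 0
    for start, stop in sorted(ranges):
        if start != expected:
            return False
        expected = stop + 1
    return expected == 0x100000000
```

Applied to the four distinct values shown for pair1 through pair4, the ranges tile the whole 32-bit hash space, which is consistent with the layout having been fixed to exclude the decommissioned pair.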


More info
----------

[root@rhs-client4 mnt1]# gluster v info rebal
 
Volume Name: rebal
Type: Distributed-Replicate
Volume ID: d29ec985-e908-4f0d-9e51-39ed79bf24f2
Status: Started
Number of Bricks: 5 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/rebal0
Brick2: rhs-client9.lab.eng.blr.redhat.com:/home/rebal1
Brick3: rhs-client39.lab.eng.blr.redhat.com:/home/rebal2
Brick4: rhs-client4.lab.eng.blr.redhat.com:/home/rebal3
Brick5: rhs-client9.lab.eng.blr.redhat.com:/home/rebal4
Brick6: rhs-client39.lab.eng.blr.redhat.com:/home/rebal5
Brick7: rhs-client4.lab.eng.blr.redhat.com:/home/rebal6
Brick8: rhs-client9.lab.eng.blr.redhat.com:/home/rebal7
Brick9: rhs-client39.lab.eng.blr.redhat.com:/home/rebal8------>decommissioned
Brick10: rhs-client4.lab.eng.blr.redhat.com:/home/rebal9------>decommissioned
Options Reconfigured:
features.quota: on
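
The pair1 to pair4 labels used earlier follow from the brick order in this listing. As a sketch, assuming the usual GlusterFS convention that consecutive bricks form a replica set, the grouping can be derived as follows; `replica_sets` is a hypothetical helper name.

```python
def replica_sets(bricks, replica_count=2):
    """Group an ordered brick list into replica sets of replica_count bricks."""
    if len(bricks) % replica_count != 0:
        raise ValueError("brick count is not a multiple of the replica count")
    return [
        bricks[i:i + replica_count]
        for i in range(0, len(bricks), replica_count)
    ]
```

For the ten bricks listed here this yields five sets, the last of which (rebal8, rebal9) is the decommissioned pair.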




Cluster info
-----------
rhs-client9.lab.eng.blr.redhat.com
rhs-client39.lab.eng.blr.redhat.com
rhs-client4.lab.eng.blr.redhat.com

Mounted on
-----------
rhs-client4.lab.eng.blr.redhat.com:/mnt1


The sosreports are attached.
Comment 3 shylesh 2013-11-20 06:58:24 EST

*** This bug has been marked as a duplicate of bug 1032558 ***
