Bug 1035182 - DHT + rebalance : after rebalance crash many Directory has overlapping hash layout
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286166
Reported: 2013-11-27 09:07 UTC by Rachana Patel
Modified: 2015-11-27 12:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1286166
Environment:
Last Closed: 2015-11-27 12:11:49 UTC
Target Upstream Version:



Description Rachana Patel 2013-11-27 09:07:00 UTC
Description of problem:
The rebalance process crashed on all servers.
It is possible that those rebalance processes were in the middle of fixing the layout (setting the new hash layout) and could not complete on all nodes because of the crash. Even in that case, the number of directories with overlapping layouts should not exceed the number of rebalance processes.

But we found many directories with overlapping layouts, e.g.:

1)
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x000000010000000099999999cccccccb   <-------------------


[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd


2)
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x00000001000000003333333366666665          <---------------------

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc


3)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x00000001000000006666666699999998   <-----------------------

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc


4)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x0000000100000000ccccccccffffffff   <----------------------

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff


[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc


5)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x00000001000000000000000000000000

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x00000001000000000000000033333332    <------------------


[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff


[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9


6)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x00000001000000000000000000000000

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x00000001000000000000000033333332   <---------------

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

7)
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x00000001000000006666666699999998   <----------------

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x00000001000000000000000000000000

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x00000001000000000000000055555554


There were many directories like this. Only a few are listed here, to show that the number of directories in this state was greater than the number of rebalance processes.
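The overlap in the first listing (mvetc1) can be checked by decoding the trusted.glusterfs.dht values. The following is a minimal sketch, not part of the original report: it assumes the value is four big-endian 32-bit words whose last two words are the start and stop of the brick's hash range. The function names and brick labels are hypothetical.

```python
import struct
from itertools import combinations

def parse_dht_xattr(hex_value):
    """Return (start, stop) from a hex-encoded trusted.glusterfs.dht value."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    # Assumption: four big-endian 32-bit words; the last two are the range bounds.
    _word0, _word1, start, stop = struct.unpack(">IIII", raw)
    return start, stop

def find_overlaps(layout):
    """layout maps a brick label to its xattr value.

    Returns a list of (brick_a, brick_b, first_hash, last_hash) tuples,
    one per pair of bricks whose ranges share at least one hash value.
    """
    ranges = {brick: parse_dht_xattr(value) for brick, value in layout.items()}
    overlaps = []
    for (a, (a_start, a_stop)), (b, (b_start, b_stop)) in combinations(ranges.items(), 2):
        lo, hi = max(a_start, b_start), min(a_stop, b_stop)
        if lo <= hi:
            overlaps.append((a, b, lo, hi))
    return overlaps

# Values copied from the mvetc1 listing; the labels are for illustration only.
mvetc1 = {
    "7-VM4:/rhs/brick1": "0x00000001000000007ffffffebffffffc",
    "7-VM4:/rhs/brick2": "0x0000000100000000bffffffdffffffff",
    "7-VM4:/rhs/brick4": "0x000000010000000099999999cccccccb",
    "7-VM1:/rhs/brick1": "0x0000000100000000000000003ffffffe",
    "7-VM3:/rhs/brick1": "0x00000001000000003fffffff7ffffffd",
}

for a, b, lo, hi in find_overlaps(mvetc1):
    print(f"{a} overlaps {b} on 0x{lo:08x}-0x{hi:08x}")
```

For mvetc1 this reports two overlaps, both involving brick4 on 7-VM4: with brick1 on 0x99999999-0xbffffffc and with brick2 on 0xbffffffd-0xcccccccb.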

Version-Release number of selected component (if applicable):
============================================
3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
==================
Not tried.


Steps to Reproduce:
====================
1. Create and mount a DHT volume. Create data from the mount point (directory depth was 10).
2. Add a brick to the volume and start rebalance.
3. While rebalance is in progress, perform rename operations on directories and files.
4. After 44+ hours, the rebalance process crashed on all nodes and the rebalance status was 'failed':

[root@7-VM1 core]#  gluster volume rebalance flat status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost           832000        13.7GB       5344344             1           228         failed        159836.00
                            10.70.36.133          1009405        15.7GB       5362837             2           206         failed        159836.00
                            10.70.36.132           823206        12.9GB       5416604             1           233         failed        159836.00
                            10.70.36.131                0        0Bytes       5227829             0             0         failed        159836.00
volume rebalance: flat: success: 

5. Verify the hash layout of the directories to find overlaps or holes. As mentioned above, overlaps were found for many directories.
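The verification in the last step can be sketched as a single sweep over the sorted ranges of one directory. This is an illustration, not the tool used for this report. It assumes ranges are inclusive (start, stop) pairs over a 32-bit hash space and that an entry with start and stop both zero means no range is assigned to that brick. The function name is hypothetical.

```python
HASH_MAX = 0xFFFFFFFF

def layout_anomalies(ranges):
    """Return (holes, overlaps), each a list of inclusive (first, last) hash ranges."""
    # Assumption: a (0, 0) entry is an unassigned (zeroed) range, so it is skipped.
    assigned = sorted(r for r in ranges if r != (0, 0))
    holes, overlaps = [], []
    covered_up_to = -1  # highest hash value covered so far
    for start, stop in assigned:
        if start > covered_up_to + 1:
            # Nothing covers the hashes between the previous ranges and this one.
            holes.append((covered_up_to + 1, start - 1))
        elif start <= covered_up_to:
            # This range begins inside hash space that is already covered.
            overlaps.append((start, min(stop, covered_up_to)))
        covered_up_to = max(covered_up_to, stop)
    if covered_up_to < HASH_MAX:
        holes.append((covered_up_to + 1, HASH_MAX))
    return holes, overlaps

# Start/stop words taken from the mvetc5 listing in the description.
mvetc5 = [
    (0x00000000, 0x00000000),  # 7-VM4 brick1
    (0x00000000, 0x55555554),  # 7-VM4 brick2
    (0x00000000, 0x33333332),  # 7-VM4 brick4
    (0xAAAAAAAA, 0xFFFFFFFF),  # 7-VM3 brick1
    (0x55555555, 0xAAAAAAA9),  # 7-VM1 brick1
]

holes, overlaps = layout_anomalies(mvetc5)
print("holes:", [(hex(a), hex(b)) for a, b in holes])
print("overlaps:", [(hex(a), hex(b)) for a, b in overlaps])
```

For mvetc5 the sweep finds no holes and one overlap, 0x0-0x33333332, which is the range held by both brick2 and brick4 on 7-VM4.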

Actual results:
Overlaps in the hash layout of many directories.
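One pattern in the listings may help when reading them: each range marked with an arrow (on brick4) is a slice of an even five-way split of the hash space, while the other bricks of the same directory hold slices of a four-way or three-way split. The sketch below checks this arithmetic. The split formula is an assumption chosen because it reproduces the values in the listings; it is not taken from the DHT source, and the function names are hypothetical.

```python
HASH_MAX = 0xFFFFFFFF

def even_split(brick_count):
    """Split the 32-bit hash space into brick_count inclusive ranges.

    Assumption: every range is HASH_MAX // brick_count wide and the last
    range is extended to HASH_MAX.
    """
    chunk = HASH_MAX // brick_count
    ranges = []
    for i in range(brick_count):
        start = i * chunk
        stop = HASH_MAX if i == brick_count - 1 else start + chunk - 1
        ranges.append((start, stop))
    return ranges

def matching_splits(observed, max_bricks=8):
    """Return every brick count whose even split contains the observed range."""
    return [n for n in range(1, max_bricks + 1) if observed in even_split(n)]

# Ranges taken from the listings in the description.
print(matching_splits((0x99999999, 0xCCCCCCCB)))  # mvetc1, brick4 on 7-VM4
print(matching_splits((0x7FFFFFFE, 0xBFFFFFFC)))  # mvetc1, brick1 on 7-VM4
print(matching_splits((0x55555555, 0xAAAAAAA9)))  # mvetc5, brick1 on 7-VM1
```

Under this assumption the three calls return [5], [4] and [3] respectively, which would be consistent with the newly added brick receiving a range from the new layout while the older bricks still carry ranges from the previous one.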

Expected results:
Even though the rebalance process crashed, the number of directories with overlaps or holes caused by in-progress layout fixes should not exceed the number of rebalance processes.




Additional info:

Comment 6 Susant Kumar Palai 2015-11-27 12:11:49 UTC
Cloning this to 3.1, to be fixed in a future release.

