Bug 1035182

Summary: DHT + rebalance: after a rebalance crash, many directories have an overlapping hash layout
Product: Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: distribute Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED DEFERRED QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.1 CC: mzywusko, spalai, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1286166 Environment:
Last Closed: 2015-11-27 12:11:49 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1286166    

Description Rachana Patel 2013-11-27 09:07:00 UTC
Description of problem:
The rebalance process crashed on all servers.
It is possible that the rebalance processes were in the middle of fixing the layout (setting the new hash layout) and could not complete on all nodes because of the crash. Even in that case, the number of directories with an overlapping layout cannot be more than the number of rebalance processes.

However, we found many directories with overlapping layouts, e.g.:

1)
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x000000010000000099999999cccccccb   <-------------------


[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd
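
The overlap in the listing above can be checked mechanically. The following is a minimal sketch, not the DHT implementation: it assumes the xattr value is four big-endian 32-bit words and that the last two are the inclusive start and stop of the hash range, which is how the values in this report read. The brick labels and the names `decode_dht` and `overlapping_pairs` are made up for illustration.

```python
import struct
from itertools import combinations

# Values copied from the getfattr output for mvs1/mvetc1.
# The "host:brick" labels are illustrative only.
MVETC1 = {
    "7-VM4:brick1": "0x00000001000000007ffffffebffffffc",
    "7-VM4:brick2": "0x0000000100000000bffffffdffffffff",
    "7-VM4:brick4": "0x000000010000000099999999cccccccb",
    "7-VM1:brick1": "0x0000000100000000000000003ffffffe",
    "7-VM3:brick1": "0x00000001000000003fffffff7ffffffd",
}


def decode_dht(hex_value):
    """Return (start, stop) from a trusted.glusterfs.dht hex string.

    Assumption: four big-endian 32-bit words, the last two being the
    inclusive start and stop of the hash range.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    _word0, _word1, start, stop = struct.unpack(">4I", raw)
    return start, stop


def overlapping_pairs(layouts):
    """Return pairs of brick labels whose inclusive ranges intersect."""
    ranges = sorted((brick, decode_dht(value)) for brick, value in layouts.items())
    pairs = []
    for (a, (a_start, a_stop)), (b, (b_start, b_stop)) in combinations(ranges, 2):
        # Two inclusive ranges intersect when each starts before the other ends.
        if a_start <= b_stop and b_start <= a_stop:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for a, b in overlapping_pairs(MVETC1):
        print(f"{a} overlaps {b}")
```

Under these assumptions the range on brick4 (0x99999999 to 0xcccccccb) intersects the ranges on both brick1 and brick2 of 7-VM4, which matches the line marked with the arrow.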


2)
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x00000001000000003333333366666665          <---------------------

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc3
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc3
trusted.gfid=0xc0386717b0a64231aa83d260db2670cd
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc


3)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x00000001000000006666666699999998   <-----------------------

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc2
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc2
trusted.gfid=0x71bdf4e4058b4923818dcde827a30b5a
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc


4)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x0000000100000000ccccccccffffffff   <----------------------

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff


[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc4
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc4
trusted.gfid=0x200ecbd616694d8180985c3f68d3d86e
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc


5)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x00000001000000000000000000000000

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x00000001000000000000000033333332    <------------------


[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff


[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc5
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc5
trusted.gfid=0xb6b2ff31d7f14a5a94275eeccf637faa
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9


6)

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x00000001000000000000000000000000

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x00000001000000000000000055555554

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x00000001000000000000000033333332   <---------------

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc8
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc8
trusted.gfid=0x90036c7183ab40f09b5e21fa795d0619
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

7)
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x00000001000000006666666699999998   <----------------

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick2/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x00000001000000000000000000000000

[root@7-VM3 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[root@7-VM1 ~]# getfattr -d -m . -e hex /rhs/brick1/f/mvs1/mvetc9
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/f/mvs1/mvetc9
trusted.gfid=0xc767aeb557094f4497556d7f2d7969b8
trusted.glusterfs.dht=0x00000001000000000000000055555554


There were many directories like this; only a few are listed here, to show that the number of directories in this condition was greater than the number of rebalance processes.
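
Since many directories are affected, collecting the values by hand does not scale. The sketch below pulls the `trusted.glusterfs.dht` value for each path out of pasted `getfattr` output. It recognises only the two line shapes visible in this report; the function name `parse_getfattr_dump` is hypothetical.

```python
def parse_getfattr_dump(text):
    """Map each '# file:' path to its trusted.glusterfs.dht hex value.

    Prompts, warnings and other xattrs are ignored.
    """
    layouts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            current = line[len("# file: "):]
        elif line.startswith("trusted.glusterfs.dht=") and current is not None:
            # Keep only the value; drop anything after it, such as an arrow
            # added by hand to mark the line.
            layouts[current] = line.split("=", 1)[1].split()[0]
    return layouts


SAMPLE = """\
[root@7-VM4 ~]# getfattr -d -m . -e hex /rhs/brick4/f/mvs1/mvetc1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick4/f/mvs1/mvetc1
trusted.gfid=0x5c3c20c395304ab1ad59e4969d84ca0b
trusted.glusterfs.dht=0x000000010000000099999999cccccccb   <-------------------
"""

if __name__ == "__main__":
    print(parse_getfattr_dump(SAMPLE))
```

The path alone is not unique across servers (rhs/brick1/f/mvs1/mvetc1 exists on 7-VM1, 7-VM3 and 7-VM4), so output from each server would need to be parsed separately or keyed by host.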

Version-Release number of selected component (if applicable):
============================================
3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
==================
Not tried.


Steps to Reproduce:
====================
1. Create and mount a DHT volume. Create data from the mount point (directory depth was 10).
2. Add a brick to the volume and start rebalance.
3. While rebalance is in progress, rename directories and files.
4. After 44+ hours, the rebalance process crashed on all nodes and the rebalance status was 'failed'.

[root@7-VM1 core]#  gluster volume rebalance flat status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost           832000        13.7GB       5344344             1           228         failed        159836.00
                            10.70.36.133          1009405        15.7GB       5362837             2           206         failed        159836.00
                            10.70.36.132           823206        12.9GB       5416604             1           233         failed        159836.00
                            10.70.36.131                0        0Bytes       5227829             0             0         failed        159836.00
volume rebalance: flat: success: 

5. Verify the hash layout of the directories to find overlaps or holes. As mentioned above, overlaps were found for many directories.
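
The verification in the last step can be sketched as a single sweep over the sorted ranges of one directory, reporting both holes and overlaps. This is an illustration under stated assumptions, not the check DHT itself performs. The (start, stop) pairs are read from the last two 32-bit words of the xattr values reported for mvs1/mvetc5. An entry whose start and stop are both zero is treated as "no range assigned", which is an assumption. The name `layout_anomalies` is made up.

```python
HASH_MAX = 0xFFFFFFFF

# (start, stop) pairs read from the trusted.glusterfs.dht values
# reported for mvs1/mvetc5.
MVETC5 = [
    (0x00000000, 0x00000000),  # 7-VM4 brick1
    (0x00000000, 0x55555554),  # 7-VM4 brick2
    (0x00000000, 0x33333332),  # 7-VM4 brick4
    (0xAAAAAAAA, 0xFFFFFFFF),  # 7-VM3 brick1
    (0x55555555, 0xAAAAAAA9),  # 7-VM1 brick1
]


def layout_anomalies(ranges):
    """Return (holes, overlaps) as lists of inclusive (start, stop) spans.

    Assumes start <= stop for every range.
    """
    holes, overlaps = [], []
    next_free = 0  # lowest hash value not yet covered by any range
    # Assumption: a 0/0 entry means "no range assigned" and is skipped.
    for start, stop in sorted(r for r in ranges if r != (0, 0)):
        if start > next_free:
            holes.append((next_free, start - 1))
        elif start < next_free:
            overlaps.append((start, min(stop, next_free - 1)))
        next_free = max(next_free, stop + 1)
    if next_free <= HASH_MAX:
        holes.append((next_free, HASH_MAX))
    return holes, overlaps


if __name__ == "__main__":
    holes, overlaps = layout_anomalies(MVETC5)
    print("holes:", [(hex(a), hex(b)) for a, b in holes])
    print("overlaps:", [(hex(a), hex(b)) for a, b in overlaps])
```

For mvetc5 this reports no holes and one overlapping span, 0x0 to 0x33333332, where the ranges on brick2 and brick4 of 7-VM4 both claim the same hash values.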

Actual results:
Overlapping hash layouts for many directories.

Expected results:
Even though the rebalance process crashed, the number of directories with overlaps or holes caused by in-progress work should not be more than the number of rebalance processes.




Additional info:

Comment 6 Susant Kumar Palai 2015-11-27 12:11:49 UTC
Cloning this to 3.1, to be fixed in a future release.