Bug 987327

Summary: Rebalance: Files missing on mount point after rebalance on Distribute-Replicate volume
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: senaik
Component: distribute
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED DEFERRED
QA Contact: Anoop <annair>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: nsathyan, rhs-bugs, rwheeler, spalai, ssaha, surs, vagarwal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1286093 (view as bug list)
Environment:
Last Closed: 2015-11-27 10:43:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1093043, 1286093, 1286585

Description senaik 2013-07-23 09:09:31 UTC
Description of problem:
=========================
On a Distributed-Replicate volume, renaming some files while the rebalance process was running resulted in a few files missing from the mount point after the rebalance process completed.


Version-Release number of selected component (if applicable):
=================================================================== 
3.4.0.12rhs.beta5-2.el6rhs.x86_64

How reproducible:
================= 



Steps to Reproduce:
================== 

1. Create a 2x2 distributed-replicate volume and start it
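A sketch of the create and start commands, using the brick paths later shown as Brick1 through Brick4 in the gluster v i output under Additional info:

gluster volume create Vol4 replica 2 10.70.34.85:/rhs/brick1/E1 10.70.34.86:/rhs/brick1/E2 10.70.34.87:/rhs/brick1/E3 10.70.34.88:/rhs/brick1/E4
gluster volume start Vol4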

2. Mount the volume and create some files:
for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3. Calculate the arequal checksum on the mount point before starting rebalance:
[root@RHEL6 Vol4]# /opt/qa/tools/arequal-checksum /mnt/Vol4/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 42c1a6fabb8c9a6e4729e4047d1810e1
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : 5e842fef6a5a08f

4. Add 2 bricks and start rebalance 
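A sketch of the add-brick and rebalance commands (the two new bricks appear as Brick5 and Brick6 in the gluster v i output under Additional info):

gluster volume add-brick Vol4 10.70.34.85:/rhs/brick1/E5 10.70.34.86:/rhs/brick1/E6
gluster volume rebalance Vol4 start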

5. While rebalance is in progress, rename some files on the mount point:

gluster v rebalance Vol4 status

Node  Rebalanced-files  size     scanned    failures  status run time in secs
----- ----------------  -----    -------    --------  ------ -----------------
localhost      1        10.0MB       3         0      in progress     4.00
10.70.34.88    0        0Bytes     282         0      in progress     3.00
10.70.34.86    0        0Bytes     283         0      in progress     4.00
10.70.34.87    0        0Bytes     271         2      in progress     4.00

On the mount point:
--------------------
for i in {11..400} ; do mv f"$i" files"$i" ; done


6. After rebalance completes, calculate the arequal checksum on the mount point:

[root@RHEL6 Vol4]# /opt/qa/tools/arequal-checksum /mnt/Vol4/

Entry counts
Regular files   : 490
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 491

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e0d372ae49b9765121e4d799eca3b6f6
Directories     : 3a0d090000302f00
Symbolic links  : 0
Other           : 0
Total           : fb3aac37a52aefa7

The file count has changed from 500 to 490 after rebalance.

Files missing on the mount point:
-----------------------------------
files113
files143
files178
files201
files259
files288
files344
files386
files79
files86
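Since only f11 through f400 were renamed, the list above can be regenerated from the mount point with a check along these lines (a sketch):

for i in {11..400} ; do [ -e files"$i" ] || echo files"$i" ; done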


Actual results:
=============== 
A few files are missing from the mount point after the rebalance process.

Expected results:
================= 
No files should be missing from the mount point after the rebalance process.


Additional info:
================
[root@boost ~]# gluster v i Vol4
 
Volume Name: Vol4
Type: Distributed-Replicate
Volume ID: c66bae64-1ee2-4e0f-a339-429b7892d052
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/E1
Brick2: 10.70.34.86:/rhs/brick1/E2
Brick3: 10.70.34.87:/rhs/brick1/E3
Brick4: 10.70.34.88:/rhs/brick1/E4
Brick5: 10.70.34.85:/rhs/brick1/E5
Brick6: 10.70.34.86:/rhs/brick1/E6

Comment 4 shishir gowda 2013-07-23 12:02:52 UTC
This works as expected in a pure distribute volume. Bug 976755 is tracking the distribute-only bug.

Comment 5 shishir gowda 2013-07-24 03:46:49 UTC
Could you please provide a dump of all the xattrs (from all the bricks) of the files that are missing from the mount?

Comment 6 senaik 2013-07-24 06:17:29 UTC
[root@jay brick1]# ls -l */files113
---------T. 2 root root        0 Jul 23 13:54 E2/files113

# file: E2/files113
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0xd34689a687f94f46ab0c67957dc7c038
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

============================================================
[root@jay brick1]# ls -l */files143

---------T. 2 root root        0 Jul 23 13:54 E2/files143

# file: E2/files143
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x472a391ef9dd4ae68b9d90035317832b
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

===================================================================== 
[root@jay brick1]# ls -l */files178
---------T. 2 root root        0 Jul 23 13:54 E2/files178

# file: E2/files178
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0xf3138df26a354cb3975a54bc0a9a29a2
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

==================================================================== 
[root@jay brick1]# ls -l */files201
---------T. 2 root root        0 Jul 23 13:54 E2/files201

# file: E2/files201
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x6c557eb4152c459681c375a058d7ce7b
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

====================================================================== 
[root@jay brick1]# ls -l */files259

---------T. 2 root root        0 Jul 23 13:54 E2/files259

# file: E2/files259
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x42599c8016f74f468b6424dcf89c831d
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

====================================================================== 
[root@jay brick1]# ls -l */files288
---------T. 2 root root        0 Jul 23 13:54 E2/files288

# file: E2/files288
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x6a6d3190d364416faa9761034b9d3ae8
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

================================================================= 
[root@jay brick1]# ls -l */files344

---------T. 2 root root        0 Jul 23 13:54 E2/files344

# file: E2/files344
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x1b22588b535a46ec96d7bdd348b6722d
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200

================================================================ 

[root@jay brick1]# ls -l */files386

---------T. 2 root root        0 Jul 23 13:54 E2/files386

# file: E2/files386
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x1077737010084bdbad3edeeb614cf361
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200


================================================================== 
[root@jay brick1]# ls -l */files79
---------T. 2 root root        0 Jul 23 13:53 E2/files79


# file: E2/files79
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0x839665ddf0f74b38880fbd351ec5c876
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200


===================================================================== 
[root@jay brick1]# ls -l */files86
---------T. 2 root root        0 Jul 23 13:54 E2/files86

# file: E2/files86
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.Vol4-client-0=0x000000000000000000000000
trusted.afr.Vol4-client-1=0x000000000000000000000000
trusted.gfid=0xd0156f0d1ec84e8bb3688cecb114355b
trusted.glusterfs.dht.linkto=0x566f6c342d7265706c69636174652d3200
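
For reference, dumps like the ones above can be produced on each brick with getfattr; a sketch, run from /rhs/brick1 on the brick host:

getfattr -d -m . -e hex */files113

The zero-byte mode-T ('---------T') entries are DHT link-to files; the linkto xattr names the subvolume that holds the actual data.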

Comment 7 shishir gowda 2013-07-24 07:00:19 UTC
Please dump the linkto xattrs as text.

Comment 8 senaik 2013-07-24 07:21:00 UTC
The dht.linkto value is the same across all the missing files.

Below is the text format:
trusted.glusterfs.dht.linkto=Vol4-replicate-2
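
For reference: the hex value 0x566f6c342d7265706c69636174652d3200 is the ASCII string "Vol4-replicate-2" followed by a NUL byte. The text form can also be read directly with getfattr on the brick; a sketch:

getfattr -n trusted.glusterfs.dht.linkto -e text E2/files113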

Comment 9 shishir gowda 2013-07-24 10:00:39 UTC
Would an unmount and remount make the files visible again?

Comment 10 senaik 2013-07-26 09:27:39 UTC
Shishir,
I tried unmounting and remounting the volume; the files were still not visible.
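
For reference, a sketch of the remount; the server address used for mounting is an assumption (any node in the cluster could serve the mount):

umount /mnt/Vol4
mount -t glusterfs 10.70.34.85:/Vol4 /mnt/Vol4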

Comment 11 Amar Tumballi 2013-07-30 11:54:13 UTC
With a rebalance (or remove-brick start) operation in progress, and 'rename' (i.e., mv) operations being performed on the files getting rebalanced, we hit certain race conditions which cause this particular bug.

This bug has existed since day 0 of the rebalance process and is not a regression. Development needs time to find the root cause and then take the appropriate action. Hence, requesting that the 'blocker' flag be taken down (please set it back if this comment is not sufficient to agree).

Comment 12 Nagaprasad Sathyanarayana 2014-05-27 12:33:51 UTC
Question: How does one stop the rebalance process completely?
Ans:
[Raghavendra G]
Currently we start the rebalance process automatically only on execution of remove-brick; otherwise it has to be started manually. So we don't have to do anything explicitly to "stop" the rebalance process if we have not started it ourselves.

If there is a rebalance process already running, we can use:
# gluster volume rebalance <VOLNAME> stop

And the usage format for the rebalance command is:
[root@unused src]# gluster volume rebalance
Usage: volume rebalance <VOLNAME> {{fix-layout start} | {start [force]|stop|status}}