Bug 987422 - Rebalance : NFS mount : few files missing on mount point after rebalance process
Status: CLOSED DEFERRED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
Whiteboard: dht-file-access
Depends On: 969298 1130888 1138395 1139998 1140348 1146895
Blocks: 1286059
Reported: 2013-07-23 07:15 EDT by senaik
Modified: 2015-11-27 05:26 EST
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Renaming files while a rebalance or remove-brick operation is running can leave files missing from mount points. Consequence: A few renamed files are not listed on the mount, so applications might see failures. The files remain available on the backend. Fix: RCA in progress. Result:
Story Points: ---
Clone Of:
Clones: 1286059
Environment:
Last Closed: 2015-11-27 05:25:20 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description senaik 2013-07-23 07:15:14 EDT
Description of problem:
=========================
Renaming some files while the rebalance process was running resulted in a few files missing from the mount point after the rebalance process completed.

Version-Release number of selected component (if applicable):
============================================================= 
3.4.0.12rhs.beta5-2.el6rhs.x86_64


How reproducible:
=================== 

Steps to Reproduce:
=================== 
1. Create a distribute volume with 4 bricks and start it.

2. NFS-mount the volume and create some files:
for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3. Calculate the arequal checksum on the mount point before starting rebalance:

[root@RHEL6 vOL5]# /opt/qa/tools/arequal-checksum /mnt/vOL5/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e9e46cb6bd7afeb5e1695e1113e67b5
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : e7f2f9579c75b300

4. Add bricks and start rebalance.

5. While rebalance is running, rename some files.

Node         Rebalanced-files  size    scanned  failures  status       run time in secs
-----------  ----------------  ------  -------  --------  -----------  ----------------
localhost    6                 60.0MB  7        0         in progress  3.00
10.70.34.88  0                 0Bytes  163      0         in progress  3.00
10.70.34.86  0                 0Bytes  144      0         in progress  3.00
10.70.34.87  0                 0Bytes  166      0         in progress  3.00

volume rebalance: Vol5: success:
On the mount point:
-------------------- 
for i in {11..400} ; do mv f"$i" files"$i" ; done

6. After rebalance has completed, calculate the arequal checksum on the mount point:

[root@RHEL6 vOL5]# /opt/qa/tools/arequal-checksum /mnt/vOL5/

Entry counts
Regular files   : 496
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 497

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a372879cf5eadea220d2d1b5c0cd3dcc
Directories     : 3c0b060000302f00
Symbolic links  : 0
Other           : 0
Total           : bfab50293517cc6e

The regular file count changed from 500 to 496 after the rebalance process.
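
Steps 1, 4 and 5 above, plus the NFS mount from step 2, can be sketched as a script. This is only an illustration under stated assumptions: the hosts and brick paths come from the volume info at the end of this report, but which four bricks formed the original volume, the NFS server address, and the helper names (run, create_and_mount, expand_and_rebalance, rename_files) are assumptions. By default it only prints the commands, so they can be reviewed before being run on a test cluster with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of reproduction steps 1, 2 (mount only), 4 and 5.
# ASSUMPTION: bricks F1-F4 formed the original volume, F5-F6 were added later.
VOL=Vol5
MNT=/mnt/vOL5
DRY_RUN=${DRY_RUN:-1}   # default: print the commands instead of running them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-2: create and start the volume, then NFS-mount it.
create_and_mount() {
    run gluster volume create "$VOL" \
        10.70.34.85:/rhs/brick1/F1 10.70.34.86:/rhs/brick1/F2 \
        10.70.34.87:/rhs/brick1/F3 10.70.34.88:/rhs/brick1/F4
    run gluster volume start "$VOL"
    # Gluster's built-in NFS server speaks NFSv3.
    run mount -t nfs -o vers=3 10.70.34.85:/"$VOL" "$MNT"
}

# Step 4: add two bricks and start rebalance.
expand_and_rebalance() {
    run gluster volume add-brick "$VOL" \
        10.70.34.85:/rhs/brick1/F5 10.70.34.86:/rhs/brick1/F6
    run gluster volume rebalance "$VOL" start
}

# Step 5: rename f11..f400 to files11..files400 while rebalance is running.
rename_files() {
    i=11
    while [ "$i" -le 400 ]; do
        run mv "$MNT/f$i" "$MNT/files$i"
        i=$((i + 1))
    done
}
```

Calling create_and_mount, creating the files with the dd loop from step 2, then calling expand_and_rebalance followed immediately by rename_files approximates the sequence described above.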

Files missing on the mount point:
--------------------------------- 
files193 files244 files233 files248
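
One way to find which names are missing is to compare the directory listing with the names expected after the rename loop. The sketch below assumes the naming from the steps above (f1..f10 and f401..f500 untouched, f11..f400 renamed to files11..files400); the function names are hypothetical. It compares against the listing rather than testing each path, because the symptom reported here is that the files are not listed on the mount.

```shell
#!/bin/sh
# Print the 500 names expected after the rename: f1..f10, files11..files400,
# f401..f500.
expected_names() {
    i=1
    while [ "$i" -le 500 ]; do
        if [ "$i" -ge 11 ] && [ "$i" -le 400 ]; then
            echo "files$i"
        else
            echo "f$i"
        fi
        i=$((i + 1))
    done
}

# Print every expected name that does not appear in the listing of directory $1.
missing_files() {
    tmp=$(mktemp)
    ls -1 "$1" > "$tmp"
    # -F fixed strings, -x whole-line match, -v invert: keep the expected
    # names that are absent from the listing. grep exits 1 when nothing is
    # missing, which is not an error here.
    expected_names | grep -Fxv -f "$tmp" || true
    rm -f "$tmp"
}

# Example: missing_files /mnt/vOL5
```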

Actual results:
================
A few files are missing from the mount point after the rebalance process.

Expected results:
================== 
No files should be missing from the mount point after the rebalance process; the regular file count should remain 500.


Additional info:
================= 
gluster volume info Vol5
 
Volume Name: Vol5
Type: Distribute
Volume ID: 73357b50-2e5d-4902-8564-d9f09404da46
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/F1
Brick2: 10.70.34.86:/rhs/brick1/F2
Brick3: 10.70.34.87:/rhs/brick1/F3
Brick4: 10.70.34.88:/rhs/brick1/F4
Brick5: 10.70.34.85:/rhs/brick1/F5
Brick6: 10.70.34.86:/rhs/brick1/F6
Comment 4 shishir gowda 2013-07-23 08:44:54 EDT
Could you please provide a dump of all the xattrs (from all the bricks) of the files that are missing from the mount?
Comment 5 senaik 2013-07-24 02:23:53 EDT
[root@kori brick1]#  ls -l */files193

---------T. 2 root root        0 Jul 23 13:30 F4/files193

# file: F4/files193
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0xf07281d9849348a9926f810cb96deef1
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3000

============================================================== 
[root@boost brick1]# ls -l */files244

---------T. 2 root root        0 Jul 23 15:53 F1/files244

# file: F1/files244
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x0159a793be6744599f4206c39996925b
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3400

================================================================== 
[root@boost brick1]# ls -l */files233

---------T. 2 root root        0 Jul 23 15:53 F1/files233
# file: F1/files233
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x948eebd63ef14baabd51846517f7f223
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3400


===================================================================== 
[root@boost brick1]# ls -l */files248
---------T. 2 root root        0 Jul 23 15:53 F1/files248

# file: F1/files248
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x6082534e6ea5492793b64f076df87c65
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3400

====================================================================
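
Each entry above is a zero-byte file with mode ---------T that carries a trusted.glusterfs.dht.linkto xattr, which is how DHT link files appear on a brick. A minimal sketch for listing such candidates under a brick directory follows; the function name is hypothetical, and skipping the brick's internal .glusterfs directory is a choice made for this example.

```shell
#!/bin/sh
# List candidate DHT link files under brick directory $1: regular files whose
# mode is exactly 1000 (shown by ls as ---------T) and whose size is zero.
# The brick's internal .glusterfs directory is skipped.
find_linkto_candidates() {
    find "$1" -name .glusterfs -prune -o -type f -perm 1000 -size 0 -print
}

# Example (on a brick server, as root; assumes getfattr is installed):
#   find_linkto_candidates /rhs/brick1/F1 | while read -r f; do
#       getfattr -n trusted.glusterfs.dht.linkto -e text "$f"
#   done
```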
Comment 6 shishir gowda 2013-07-24 03:00:44 EDT
Please dump the linkto xattrs as text.
Comment 7 senaik 2013-07-24 03:25:25 EDT
The dht.linkto value is the same across all the missing files.

In text format:
Vol5-client-4
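
For reference, the hex values posted in comment 5 can be converted to text without access to the bricks. The sketch below is a plain POSIX shell helper (the name decode_xattr_hex is hypothetical) that strips the 0x prefix, drops the trailing NUL byte, and prints the remaining bytes as characters.

```shell
#!/bin/sh
# Decode a hex-encoded xattr value, as printed in comment 5, to text.
decode_xattr_hex() {
    hex=${1#0x}                      # drop the leading 0x
    out=
    while [ -n "$hex" ]; do
        rest=${hex#??}               # everything after the first two digits
        [ "$rest" = "$hex" ] && break   # odd trailing digit: stop
        byte=${hex%"$rest"}          # the first two hex digits
        hex=$rest
        # Skip the NUL terminator stored with the string.
        [ "$byte" = "00" ] && continue
        # Convert the byte to a three-digit octal escape and print it.
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
}

decode_xattr_hex 0x566f6c352d636c69656e742d3400   # prints Vol5-client-4
decode_xattr_hex 0x566f6c352d636c69656e742d3000   # prints Vol5-client-0
```

Applied to the dumps in comment 5, the values for files244, files233 and files248 decode to Vol5-client-4, while the value posted for files193 decodes to Vol5-client-0.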
Comment 8 Amar Tumballi 2013-07-30 07:53:51 EDT
When a rebalance (or remove-brick start) operation is in progress and someone renames (i.e., runs mv on) files that are being rebalanced, certain race conditions occur, and these are causing this particular bug.

This bug has existed since day 0 of the rebalance process and is not a regression. Development needs time to find the root cause and then take appropriate action. Hence, requesting that the 'blocker' flag be taken down (please set it back if this comment is not sufficient to agree).
Comment 11 Susant Kumar Palai 2015-11-27 05:25:37 EST
Cloning this bug in 3.1. To be fixed in a future release.
