Bug 976755 - Rebalance : Few files missing on mount point after rebalance process [FUSE Mount]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: shishir gowda
QA Contact: senaik
URL:
Whiteboard:
Depends On:
Blocks: 983399
Reported: 2013-06-21 11:30 UTC by senaik
Modified: 2015-09-01 12:23 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.4.0.12rhs.beta5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 983399
Environment:
Last Closed: 2013-09-23 22:35:39 UTC
Embargoed:



Description senaik 2013-06-21 11:30:31 UTC
Description of problem:
---------------------------
Renaming some files while the rebalance process was running resulted in a few files missing on the mount point after the rebalance process completed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.4.0.11rhs-1.el6rhs.x86_64

How reproducible:
-----------------

Steps to Reproduce:
--------------------
1) Create a distributed volume and start it 

2) Mount the volume and create some files 
for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3) Calculate the arequal checksum on the mount point before starting rebalance

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 5fa4a71a9e8f6b2dc7e8920215d3a3f
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : d984c351b884e68d

4) Add brick and start rebalance
gluster v rebalance Vol1 start
volume rebalance: Vol1: success: Starting rebalance on volume Vol1 has been successful.
ID: 4eaff74b-18b4-4920-ad45-c778566ab3ee

5) While rebalance is running, rename some files

Node          Rebalanced-files  size     scanned  failures  status       run time in secs
------------  ----------------  -------  -------  --------  -----------  ----------------
localhost     0                 0Bytes   529      80        completed    2.00
10.70.34.86   21                210.0MB  32       10        in progress  3.00
10.70.34.85   16                160.0MB  206      0         in progress  3.00

volume rebalance: Vol1: success: 

On the mount point : 
-------------------- 
for i in {11..400} ; do mv f"$i" files"$i" ; done

6) After rebalance is completed, calculate the arequal checksum on the mount point

Entry counts
Regular files   : 490
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 491

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e2ba899cb42cc0f81a43b363db9a4ec8
Directories     : 3d06060000302f00
Symbolic links  : 0
Other           : 0
Total           : c5ff3cff6f86a130

The regular files count changed from 500 to 490 after the rebalance process.

Files missing on the mount point : 
----------------------------------- 
210 320 228 242 265 281 193
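
A list like the one above can be produced with a small helper. This is only a sketch: the function name list_missing and the /path/to/mount placeholder are made up for illustration, and it assumes every file is visible in the directory listing as either f<N> or files<N>. It checks the listing rather than looking each name up, since the count mismatch was seen through directory traversal.

```shell
#!/bin/bash
# Print every index N in FIRST..LAST for which neither f<N> nor files<N>
# appears in the directory listing of DIR.
# Usage: list_missing DIR [FIRST] [LAST]   (defaults: 1 and 500)
list_missing() {
    local dir=$1 first=${2:-1} last=${3:-500} i listing
    listing=$(ls -1 "$dir")
    for i in $(seq "$first" "$last"); do
        # -x forces a whole-line match, so "f1" does not match "f10"
        if ! printf '%s\n' "$listing" | grep -qxE "(f|files)$i"; then
            echo "$i"
        fi
    done
}
```

Example: list_missing /path/to/mount 1 500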
 
Actual results:
--------------- 
A few files are missing on the mount point after the rebalance process.

Expected results:
----------------- 
There should be no files missing on the mount point after rebalance process 

Additional info:
=======================

Missing file info : 
--------------------- 

[root@jay brick1]# ls -l */files281
---------T. 2 root root 10485760 Jun 21 12:47 a1/files281

[root@jay brick1]# getfattr -m . -d -e text */files281
# file: a1/files281
trusted.glusterfs.dht.linkto="Vol1-client-4"
trusted.gfid=0x0bb2a3ceeaaf449ca00d6a2c7a2fd1e6

----------------------------------------------------------------------
[root@fillmore brick1]# ls -l */files281
---------T. 2 root root 0 Jun 21 14:25 a3/files281
[root@fillmore brick1]# getfattr -m . -d -e text */files281
# file: a3/files281
trusted.glusterfs.dht.linkto="Vol1-client-0"
trusted.gfid=0x0bb2a3ceeaaf449ca00d6a2c7a2fd1e6
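
To look for other files in the same state, something like the following could be run as root on each brick. It is a sketch, not a supported tool: the function names are made up, and it assumes that linkto files always carry the exact mode ---------T (octal 1000) seen in the listings above. The .glusterfs directory is skipped because it holds the brick's internal hard links.

```shell
#!/bin/bash
# List regular files under a brick whose mode is exactly ---------T
# (octal 1000), skipping the brick's internal .glusterfs tree.
find_linkto_candidates() {
    local brick=$1
    find "$brick" -name .glusterfs -prune -o -type f -perm 1000 -print
}

# Print the linkto target recorded on one file. trusted.* xattrs are
# normally readable only by root, so run this as root on the brick host.
show_linkto() {
    getfattr -n trusted.glusterfs.dht.linkto -e text "$1"
}
```

Example: find_linkto_candidates /rhs/brick1 | while read -r f; do show_linkto "$f"; done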


Volume Info : 
---------------- 
gluster v i
 
Volume Name: Vol1
Type: Distribute
Volume ID: da4ff732-34a1-44e1-beac-4c6da139af46
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.86:/rhs/brick1/a1
Brick2: 10.70.34.85:/rhs/brick1/a2
Brick3: 10.70.34.105:/rhs/brick1/a3
Brick4: 10.70.34.85:/rhs/brick1/a4
Brick5: 10.70.34.86:/rhs/brick1/a5
Brick6: 10.70.34.85:/rhs/brick1/a6

Comment 3 shishir gowda 2013-06-24 08:00:15 UTC
The issue seems to be this:
1. Rebalance identifies file f281 to be migrated
2. Migration of file f281 starts
3. File f281 gets renamed to files281
4. Rebalance (migration and truncation) completes, as these are mostly fop-based operations
5. The unlink call on f281 fails, as the file is now renamed to files281
6. files281 has linkto xattrs set on both src and dst, pointing to each other

Solution: make the unlink after migration gfid-based?
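
The difference between a path-based and an identity-based unlink can be shown on a local filesystem. This is an analogy only, not GlusterFS code and not the actual fix: the inode number stands in for the gfid (an identity that survives a rename), and all function names are made up for illustration.

```shell
#!/bin/bash
# Return the inode number of a file (stand-in for the gfid).
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

# Remove whatever name currently refers to the given inode inside DIR.
unlink_by_inode() {
    local dir=$1 inode=$2
    find "$dir" -xdev -inum "$inode" -delete
}

demo_rename_race() {
    local dir inode
    dir=$(mktemp -d) || return 1
    echo data > "$dir/f281"
    inode=$(inode_of "$dir/f281")     # identity captured when "migration" starts
    mv "$dir/f281" "$dir/files281"    # the file is renamed mid-migration
    # Path-based unlink uses the old name, which no longer exists.
    if rm "$dir/f281" 2>/dev/null; then
        echo "path-based unlink: ok"
    else
        echo "path-based unlink: failed"
    fi
    # Identity-based unlink finds the file under its new name.
    unlink_by_inode "$dir" "$inode"
    if [ -e "$dir/files281" ]; then
        echo "identity-based unlink: failed"
    else
        echo "identity-based unlink: ok"
    fi
    rm -rf "$dir"
}
```

Running demo_rename_race prints "path-based unlink: failed" followed by "identity-based unlink: ok", which mirrors steps 3 and 5 above.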

Comment 5 senaik 2013-07-25 09:26:45 UTC
Verified in version: 3.4.0.12rhs.beta6-1.el6rhs.x86_64

Verification Steps : 
==================== 
1) Create a distributed volume and start it 

2) FUSE mount the volume and create files on mount point 
for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3) Calculate the arequal checksum before starting rebalance
/opt/qa/tools/arequal-checksum /mnt/Vol9/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : fcfe440c40a06a0023bf5120513d713c
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : df41152c21ac313c

4) Add 2 bricks and start rebalance 
5) While rebalance is in progress, rename some files

Node          Rebalanced-files  size     scanned  failures  status       run time in secs
------------  ----------------  -------  -------  --------  -----------  ----------------
localhost     29                290.0MB  33       3         in progress  5.00
10.70.34.86   24                240.0MB  206      41        in progress  5.00
10.70.34.87   0                 0Bytes   518      80        completed    3.00
10.70.34.88   5                 50.0MB   524      27        completed    3.00
volume rebalance: Vol9: success: 

 
for i in {11..400} ; do mv f"$i" files"$i" ; done

6) After rebalance is complete, calculate the arequal checksum again

/opt/qa/tools/arequal-checksum /mnt/Vol9/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : fcfe440c40a06a0023bf5120513d713c
Directories     : 3001050000302f00
Symbolic links  : 0
Other           : 0
Total           : ef40102c11ad343c


The regular files count is the same before and after rebalance.
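
The count comparison can also be scripted if the two arequal-checksum outputs are saved to files. A minimal sketch, assuming the output layout shown above, where the first "Regular files" line is the entry count and the later ones belong to the checksum sections. The function names and the before.txt/after.txt file names are made up for illustration.

```shell
#!/bin/bash
# Extract the "Regular files" entry count from saved arequal-checksum
# output. Only the first matching line is used; later "Regular files"
# lines are checksums, not counts.
regular_file_count() {
    awk -F: '/^Regular files/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1"
}

# Compare the counts from two saved outputs; return non-zero on mismatch.
compare_counts() {
    local before after
    before=$(regular_file_count "$1")
    after=$(regular_file_count "$2")
    if [ "$before" = "$after" ]; then
        echo "OK: $before regular files before and after"
    else
        echo "MISMATCH: $before before, $after after"
        return 1
    fi
}
```

Example: compare_counts before.txt after.txt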

Marking as 'Verified'

Comment 6 Scott Haines 2013-09-23 22:35:39 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

