Bug 962400

Summary: Rebalance :: Renaming of files while rebalance is in progress fails with message : "File exists"
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: senaik
Component: glusterfs Assignee: shishir gowda <sgowda>
Status: CLOSED ERRATA QA Contact: senaik
Severity: medium Docs Contact:
Priority: high    
Version: 2.1 CC: amarts, nsathyan, rhs-bugs, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.10rhs Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 966858 Environment:
Last Closed: 2013-09-23 22:35:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 966858    
Attachments:
Description Flags
Logs none

Description senaik 2013-05-13 11:05:52 UTC
Description of problem:
========================== 
While rebalance is in progress, renaming files fails for a few of them with the message "File exists".


Version-Release number of selected component (if applicable):
=============================================================== 
3.4.0.6rhs-1.el6rhs.x86_64


How reproducible:
=============== 
Quite Often 

Steps to Reproduce:
=========================== 
1. Create a distribute volume and start it.

2. Mount the volume and create some files:
for i in {1..500} ; do touch f"$i"; done

3. Add a brick and start rebalance.

4. While rebalance is in progress, rename some files on the mount point.

 gluster v rebalance sample status

Node      Rebalanced-files size   scanned failures status    run time in secs 
localhost     30           0Bytes  652      117     in progress   3.00
localhost     30           0Bytes  652      117     in progress   3.00
localhost     30           0Bytes  652      117     in progress   3.00
10.70.34.86   30           0Bytes  652      117     in progress   3.00


[root@RHEL6 sample]# for i in {11..400} ; do mv f"$i" files"$i" ; done
mv: cannot move `f22' to `files22': File exists
mv: cannot move `f52' to `files52': File exists
mv: cannot move `f77' to `files77': File exists
mv: cannot move `f84' to `files84': File exists
mv: cannot move `f99' to `files99': File exists
mv: cannot move `f104' to `files104': File exists
mv: cannot move `f147' to `files147': File exists
mv: cannot move `f167' to `files167': File exists
mv: cannot move `f190' to `files190': File exists
mv: cannot move `f215' to `files215': File exists
mv: cannot move `f219' to `files219': File exists
mv: cannot move `f228' to `files228': File exists
mv: cannot move `f244' to `files244': File exists
mv: cannot move `f258' to `files258': File exists
mv: cannot move `f265' to `files265': File exists
mv: cannot move `f336' to `files336': File exists
mv: cannot move `f390' to `files390': File exists

The files listed above are not renamed; mv reports that the target file already exists.
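To make the failure rate easier to report, the rename loop above could be wrapped in a small helper that counts failed renames and keeps the error messages. This is only a sketch: `rename_batch` and the log file name are hypothetical, not part of gluster, and the demonstration runs on a local scratch directory rather than a gluster mount.

```shell
#!/bin/sh
# rename_batch DIR FIRST LAST LOG  (hypothetical helper, not part of gluster)
# Renames DIR/f<i> to DIR/files<i> for i in FIRST..LAST, appends mv's error
# messages to LOG, and prints the number of renames that failed.
rename_batch() {
    dir=$1; first=$2; last=$3; log=$4
    failed=0
    i=$first
    while [ "$i" -le "$last" ]; do
        if ! mv "$dir/f$i" "$dir/files$i" 2>>"$log"; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}

# Local demonstration on a scratch directory (no gluster volume involved):
# f1..f5 exist, f6 does not, so exactly one rename fails.
demo_dir=$(mktemp -d)
for n in 1 2 3 4 5; do touch "$demo_dir/f$n"; done
failures=$(rename_batch "$demo_dir" 1 6 "$demo_dir/rename-errors.log")
renamed=$(ls "$demo_dir" | grep -c '^files')
echo "failed=$failures renamed=$renamed"
rm -rf "$demo_dir"
```

On the mount point from this report, the equivalent call would be something like `rename_batch . 11 400 /tmp/rename-errors.log`, run while rebalance is in progress.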


Actual results:
=================
A few files do not get renamed, and mv reports "File exists".

Expected results:
======================
Renaming of files while rebalance is in progress should work.

Additional info:

Comment 4 shishir gowda 2013-05-16 06:16:30 UTC
This looks like an issue with dht rename not being atomic.

Suspected flow:

1. A layout change triggers creation of a linkfile (via lookup or rebalance) on subvolume S.
2. A rename op that has the same hashed subvolume as S (or, in this case, as the src) tries to create a linkfile on subvolume S. (This src file is renamed to the target file later.)
3. If step 1 succeeds before step 2, the rename op fails with an EEXIST error.

We have 2 possible approaches:

1. If linkfile creation during rename fails with EEXIST, do a subsequent lookup on subvolume S to check whether the two are the same file (by gfid). If they are the same, proceed with the rename.

2. Bring in namespace locks across dht.

Additionally, dht-rename needs to be migration aware, like other fops.
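Approach 1 can be illustrated with a toy model in shell. This is not the dht code: here a "linkfile" is just a regular file in a scratch directory whose content is a made-up gfid string, and `create_linkfile` is a hypothetical helper. The point is only the control flow: attempt an exclusive create, and if it fails because the entry exists, look up the existing entry and proceed only when the gfids match.

```shell
#!/bin/sh
# Toy model only: the gfid is stored as file content so the example runs on
# any filesystem. Real dht linkfiles keep this information in extended
# attributes on the brick.

# create_linkfile SUBVOL NAME GFID  (hypothetical helper)
# Returns 0 if the linkfile was created, or if it already existed with the
# same gfid; returns 1 if it exists with a different gfid.
create_linkfile() {
    subvol=$1; name=$2; gfid=$3
    # noclobber makes the redirect fail when the file already exists,
    # which stands in for the exclusive-create failure (EEXIST).
    if ( set -o noclobber; printf '%s\n' "$gfid" > "$subvol/$name" ) 2>/dev/null; then
        return 0
    fi
    # Creation failed: look up the existing entry and compare gfids.
    existing=$(cat "$subvol/$name") || return 1
    [ "$existing" = "$gfid" ]
}

S=$(mktemp -d)                           # scratch directory standing in for subvolume S
create_linkfile "$S" f22 gfid-aaaa       # step 1: lookup/rebalance creates the linkfile
if create_linkfile "$S" f22 gfid-aaaa; then r1=proceed; else r1=fail; fi   # same file
if create_linkfile "$S" f22 gfid-bbbb; then r2=proceed; else r2=fail; fi   # different file
echo "$r1 $r2"
rm -rf "$S"
```

In this model, the rename that races with an already-created linkfile for the same file proceeds, while a genuine name collision with a different file still fails. The check and the later rename are still not atomic with respect to each other, which is what approach 2 (namespace locks) would address.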

Comment 5 senaik 2013-05-16 06:43:30 UTC
Created attachment 748616
Logs

Comment 6 senaik 2013-05-16 06:44:34 UTC
[root@fillmore ~]# gluster v info sample
 
Volume Name: sample
Type: Distribute
Volume ID: 936e0e38-7c6f-46c1-88a2-99e5eb579012
Status: Stopped
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/o1
Brick2: 10.70.34.86:/rhs/brick1/o2
Brick3: 10.70.34.105:/rhs/brick1/o3
Brick4: 10.70.34.86:/rhs/brick1/o4
Brick5: 10.70.34.105:/rhs/brick1/o5
Brick6: 10.70.34.86:/rhs/brick1/o6
Brick7: 10.70.34.105:/rhs/brick1/o7
Brick8: 10.70.34.86:/rhs/brick1/p4
Brick9: 10.70.34.105:/rhs/brick1/p5

Comment 9 senaik 2013-06-21 11:46:52 UTC
Version : 
------------
glusterfs-3.4.0.11rhs-1.el6rhs.x86_64

Renaming of files while rebalance is in progress works fine.

Steps : 
------ 
1) Create a distributed volume and start it 
2) Mount the volume and create files 
for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done
3) Add brick and start rebalance 
gluster v rebalance Vol1 start
volume rebalance: Vol1: success: Starting rebalance on volume Vol1 has been successful.
ID: 4eaff74b-18b4-4920-ad45-c778566ab3ee

4) While rebalance is running, rename some files


Node   Rebalanced-files  size   scanned  failures   status   run time in secs
-----  ---------------- ------ ---------  --------  -------  ----------------
 
localhost     0         0Bytes    529     80       completed      2.00
10.70.34.86   21        210.0MB   32      10       in progress    3.00
10.70.34.85   16        160.0MB   206     0        in progress    3.00

volume rebalance: Vol1: success: 

On the mount point : 
-------------------- 
for i in {11..400} ; do mv f"$i" files"$i" ; done

Files are successfully renamed with no error message.

Error found while verifying the bug : 
=======================================

A few files were missing on the mount point after the rebalance process.

are-equal checksum shows a count of 500 files before rebalance; after the rebalance process, the file count is reduced to 490.
Raised another bug to track this issue [#976755].

Keeping this bug open, as I am blocked by bug #976755.
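A quick way to spot this kind of loss, independent of the checksum tool, is to compare the number of regular files on the mount point before and after rebalance. The sketch below assumes plain file names (no embedded newlines); `count_files` and `check_no_loss` are hypothetical helpers, and the demonstration uses a local scratch directory in place of the mount point. Unlike a checksum, it compares only counts, not file contents.

```shell
#!/bin/sh
# count_files DIR  (hypothetical helper): number of regular files under DIR.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# check_no_loss BEFORE AFTER  (hypothetical helper)
# Succeeds only if the two counts match.
check_no_loss() {
    if [ "$1" -eq "$2" ]; then
        echo "ok: $2 files"
    else
        echo "MISMATCH: $1 files before, $2 after"
        return 1
    fi
}

# Local demonstration on a scratch directory standing in for the mount point.
mnt=$(mktemp -d)
for n in 1 2 3 4 5; do touch "$mnt/f$n"; done
before=$(count_files "$mnt")
mv "$mnt/f1" "$mnt/files1"      # a rename must not change the count
rm "$mnt/f2"                    # simulate a file going missing
after=$(count_files "$mnt")
status=0
result=$(check_no_loss "$before" "$after") || status=$?
echo "$result"
rm -rf "$mnt"
```

For the scenario in this comment, `before` would be taken after creating the 500 files and `after` once rebalance status shows completed on all nodes.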

Comment 10 shishir gowda 2013-07-11 07:03:51 UTC
Fix https://code.engineering.redhat.com/gerrit/10053 for bug 976755 has been sent for review

Comment 11 shishir gowda 2013-07-25 05:16:51 UTC
Fix for bug 976755 has been merged downstream.

Comment 12 shishir gowda 2013-07-25 05:18:00 UTC
Fix for bug 976755 is available in release glusterfs-3.4.0.12rhs.beta5.

Comment 13 senaik 2013-07-25 09:35:36 UTC
Version : 3.4.0.12rhs.beta6-1.el6rhs.x86_64
========

Renaming of files while rebalance is in progress succeeds without any error message.

Bug 976755, which was marked as a blocker to this bug, has been verified.

Marking this bug as 'verified'.

Comment 14 Scott Haines 2013-09-23 22:35:32 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html