Bug 962400 - Rebalance :: Renaming of files while rebalance is in progress fails with message : "File exists"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: shishir gowda
QA Contact: senaik
URL:
Whiteboard:
Depends On:
Blocks: 966858
Reported: 2013-05-13 11:05 UTC by senaik
Modified: 2015-09-01 12:23 UTC (History)

Fixed In Version: glusterfs-3.4.0.10rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 966858 (view as bug list)
Environment:
Last Closed: 2013-09-23 22:35:32 UTC
Embargoed:


Attachments
Logs (26.07 MB, text/x-log)
2013-05-16 06:43 UTC, senaik

Description senaik 2013-05-13 11:05:52 UTC
Description of problem:
========================== 
While rebalance is in progress, renaming files on the mount point fails for a few files with the message "File exists".


Version-Release number of selected component (if applicable):
=============================================================== 
3.4.0.6rhs-1.el6rhs.x86_64


How reproducible:
=============== 
Quite Often 

Steps to Reproduce:
=========================== 
1. Create a distribute volume and start it.

2. Mount the volume and create some files:
for i in {1..500} ; do touch f"$i"; done

3. Add a brick and start rebalance.

4. While rebalance is in progress, rename some files on the mount point.
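
Steps 1 and 3 could be scripted roughly as follows. This is only a sketch, not the script used for this report: the function names, the DRY_RUN switch and the brick arguments (host:/path) are placeholders, and only the basic gluster volume create, start, add-brick and rebalance subcommands are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and the DRY_RUN switch are illustrative only.

# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# setup_volume VOLNAME BRICK...
# Step 1: create a plain distribute volume from the given bricks and start it.
setup_volume() {
    local vol=$1
    shift
    run gluster volume create "$vol" "$@"
    run gluster volume start "$vol"
}

# expand_and_rebalance VOLNAME NEW_BRICK
# Step 3: add one brick and start rebalance.
expand_and_rebalance() {
    local vol=$1 brick=$2
    run gluster volume add-brick "$vol" "$brick"
    run gluster volume rebalance "$vol" start
}
```

With DRY_RUN=1 set, the helpers only print the commands, which makes it easy to review them before running anything against a real cluster.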

 gluster v rebalance sample status

Node      Rebalanced-files size   scanned failures status    run time in secs 
localhost     30           0Bytes  652      117     in progress   3.00
localhost     30           0Bytes  652      117     in progress   3.00
localhost     30           0Bytes  652      117     in progress   3.00
10.70.34.86   30           0Bytes  652      117     in progress   3.00


[root@RHEL6 sample]# for i in {11..400} ; do mv f"$i" files"$i" ; done
mv: cannot move `f22' to `files22': File exists
mv: cannot move `f52' to `files52': File exists
mv: cannot move `f77' to `files77': File exists
mv: cannot move `f84' to `files84': File exists
mv: cannot move `f99' to `files99': File exists
mv: cannot move `f104' to `files104': File exists
mv: cannot move `f147' to `files147': File exists
mv: cannot move `f167' to `files167': File exists
mv: cannot move `f190' to `files190': File exists
mv: cannot move `f215' to `files215': File exists
mv: cannot move `f219' to `files219': File exists
mv: cannot move `f228' to `files228': File exists
mv: cannot move `f244' to `files244': File exists
mv: cannot move `f258' to `files258': File exists
mv: cannot move `f265' to `files265': File exists
mv: cannot move `f336' to `files336': File exists
mv: cannot move `f390' to `files390': File exists

The files listed above are not renamed; mv reports that the target file already exists.
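
To make it easier to match failed renames against the client and rebalance logs, the rename loop could record which source files failed. A minimal sketch, assuming a hypothetical helper name (rename_batch) and an optional RENAME_ERRLOG variable for the mv error text; neither is part of the original test:

```shell
#!/usr/bin/env bash
# rename_batch OLD_PREFIX NEW_PREFIX FIRST LAST
# Renames OLD_PREFIX<i> to NEW_PREFIX<i> in the current directory for
# i = FIRST..LAST and prints the source name of every rename that failed.
# mv's own error text is appended to $RENAME_ERRLOG if set, else discarded.
rename_batch() {
    local old=$1 new=$2 i=$3 last=$4
    while [ "$i" -le "$last" ]; do
        if ! mv -- "$old$i" "$new$i" 2>>"${RENAME_ERRLOG:-/dev/null}"; then
            printf '%s\n' "$old$i"
        fi
        i=$((i + 1))
    done
}

# Example (run on the mount point while rebalance is in progress):
#   RENAME_ERRLOG=/tmp/rename.err rename_batch f files 11 400 > /tmp/failed.txt
```

The list of failed names is written to stdout, so it can be saved and compared between runs or versions.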


Actual results:
=================
A few files are not renamed; mv reports "File exists".

Expected results:
======================
Renaming of files while rebalance is in progress should work.

Additional info:

Comment 4 shishir gowda 2013-05-16 06:16:30 UTC
This looks like a consequence of dht rename not being atomic.

Suspected flow:

1. A layout change triggers the creation of a linkfile (via lookup/rebalance) on subvolume S.
2. A rename op whose hashed subvolume is also S (or, in this case, the src subvolume) tries to create a linkfile on subvolume S. (We rename this src file to the target file later.)
3. If step 1 succeeds before step 2, the rename op fails with an EEXIST error.

We have 2 possible approaches:

1. If linkfile creation fails with EEXIST during rename, do a subsequent lookup on subvolume S to check whether the existing linkfile has the same gfid. If it does, proceed with the rename.

2. Bring in namespace locks across dht.

Additionally, dht-rename needs to be migration-aware, like other fops.
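
The logic of approach 1 can be illustrated with a small local-filesystem model. This is not dht code and not the eventual fix: here a "linkfile" is just a regular file whose content stands in for the gfid, and create_linkfile is a hypothetical name. The exclusive create fails if the path already exists (the EEXIST case), and the follow-up read plays the role of the lookup on subvolume S.

```shell
#!/usr/bin/env bash
# create_linkfile PATH GFID   (illustrative model only)
# Returns 0 if the linkfile was created, or if one already exists and
# carries the same id; returns non-zero if it belongs to a different id.
create_linkfile() {
    local path=$1 gfid=$2 existing
    # noclobber makes '>' refuse to overwrite an existing file, which
    # approximates an exclusive create failing with EEXIST.
    if ( set -o noclobber; printf '%s\n' "$gfid" > "$path" ) 2>/dev/null; then
        return 0
    fi
    # Create failed: look up what is already there and compare ids.
    existing=$(cat -- "$path" 2>/dev/null) || return 1
    [ "$existing" = "$gfid" ]
}
```

In this model a second creation attempt for the same id is treated as success, so a rename that lost the race could still proceed, while a linkfile with a different id remains an error. The model does not remove the race between create and lookup, which is the kind of gap approach 2 (namespace locks) would address.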

Comment 5 senaik 2013-05-16 06:43:30 UTC
Created attachment 748616
Logs

Comment 6 senaik 2013-05-16 06:44:34 UTC
[root@fillmore ~]# gluster v info sample
 
Volume Name: sample
Type: Distribute
Volume ID: 936e0e38-7c6f-46c1-88a2-99e5eb579012
Status: Stopped
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/o1
Brick2: 10.70.34.86:/rhs/brick1/o2
Brick3: 10.70.34.105:/rhs/brick1/o3
Brick4: 10.70.34.86:/rhs/brick1/o4
Brick5: 10.70.34.105:/rhs/brick1/o5
Brick6: 10.70.34.86:/rhs/brick1/o6
Brick7: 10.70.34.105:/rhs/brick1/o7
Brick8: 10.70.34.86:/rhs/brick1/p4
Brick9: 10.70.34.105:/rhs/brick1/p5

Comment 9 senaik 2013-06-21 11:46:52 UTC
Version : 
------------
glusterfs-3.4.0.11rhs-1.el6rhs.x86_64

Renaming of files while rebalance is in progress works fine . 

Steps : 
------ 
1) Create a distributed volume and start it 
2) Mount the volume and create files 
for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done
3) Add brick and start rebalance 
gluster v rebalance Vol1 start
volume rebalance: Vol1: success: Starting rebalance on volume Vol1 has been successful.
ID: 4eaff74b-18b4-4920-ad45-c778566ab3ee

4) While rebalance is running , rename some files 


Node   Rebalanced-files  size   scanned  failures   status   run time in secs
-----  ---------------- ------ ---------  --------  -------  ----------------
 
localhost     0         0Bytes    529     80       completed      2.00
10.70.34.86   21        210.0MB   32      10       in progress    3.00
10.70.34.85   16        160.0MB   206     0        in progress    3.00

volume rebalance: Vol1: success: 

On the mount point : 
-------------------- 
for i in {11..400} ; do mv f"$i" files"$i" ; done

Files are successfully renamed with no error message . 

Error found while verifying the bug : 
=======================================

A few files were missing on the mount point after the rebalance process.

The are-equal checksum shows a count of 500 files before rebalance; after the rebalance process, the file count had dropped to 490.
Raised another bug to track this issue [#976755].

Keeping this open as I am blocked by bug #976755
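
Independently of the checksum tool, the file-count comparison described above could be approximated with standard utilities. A minimal sketch, assuming hypothetical helper names (count_files, snapshot_names, missing_after) and that the mount point is passed as a directory argument:

```shell
#!/usr/bin/env bash
# count_files DIR: number of regular files under DIR.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# snapshot_names DIR: sorted list of regular files, relative to DIR.
snapshot_names() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# missing_after BEFORE_LIST AFTER_LIST: names present in the first
# snapshot file but absent from the second.
missing_after() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Example:
#   snapshot_names /mnt/vol > /tmp/before.txt   # before rebalance
#   snapshot_names /mnt/vol > /tmp/after.txt    # after rebalance completes
#   missing_after /tmp/before.txt /tmp/after.txt
```

Files that were renamed between the two snapshots also show up as missing under their old names, so the snapshots are best taken with no renames in between, or the output filtered accordingly.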

Comment 10 shishir gowda 2013-07-11 07:03:51 UTC
Fix https://code.engineering.redhat.com/gerrit/10053 for bug 976755 has been sent for review

Comment 11 shishir gowda 2013-07-25 05:16:51 UTC
Fix for bug 976755 has been merged downstream.

Comment 12 shishir gowda 2013-07-25 05:18:00 UTC
The fix for bug 976755 is available in release glusterfs-3.4.0.12rhs.beta5.

Comment 13 senaik 2013-07-25 09:35:36 UTC
Version : 3.4.0.12rhs.beta6-1.el6rhs.x86_64
========

Renaming of files while rebalance is in progress succeeds without any error message.

Bug 976755, which was marked as a blocker for this bug, has been verified.

Marking this bug as 'verified'

Comment 14 Scott Haines 2013-09-23 22:35:32 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

