Description of problem:
==========================
While rebalance is in progress, when we try to rename files, we get the following message for a few files: "File exists"

Version-Release number of selected component (if applicable):
===============================================================
3.4.0.6rhs-1.el6rhs.x86_64

How reproducible:
===============
Quite often

Steps to Reproduce:
===========================
1. Create a distribute volume and start it
2. Mount the volume and create some files

   for i in {1..500} ; do touch f"$i"; done

3. Add a brick and start rebalance
4. While rebalance is in progress, rename some files on the mount point

gluster v rebalance sample status

Node           Rebalanced-files    size      scanned    failures    status         run time in secs
localhost      30                  0Bytes    652        117         in progress    3.00
localhost      30                  0Bytes    652        117         in progress    3.00
localhost      30                  0Bytes    652        117         in progress    3.00
10.70.34.86    30                  0Bytes    652        117         in progress    3.00

[root@RHEL6 sample]# for i in {11..400} ; do mv f"$i" files"$i" ; done
mv: cannot move `f22' to `files22': File exists
mv: cannot move `f52' to `files52': File exists
mv: cannot move `f77' to `files77': File exists
mv: cannot move `f84' to `files84': File exists
mv: cannot move `f99' to `files99': File exists
mv: cannot move `f104' to `files104': File exists
mv: cannot move `f147' to `files147': File exists
mv: cannot move `f167' to `files167': File exists
mv: cannot move `f190' to `files190': File exists
mv: cannot move `f215' to `files215': File exists
mv: cannot move `f219' to `files219': File exists
mv: cannot move `f228' to `files228': File exists
mv: cannot move `f244' to `files244': File exists
mv: cannot move `f258' to `files258': File exists
mv: cannot move `f265' to `files265': File exists
mv: cannot move `f336' to `files336': File exists
mv: cannot move `f390' to `files390': File exists

The above files are not getting renamed, as mv says the file already exists.
Actual results:
=================
A few files do not get renamed; mv reports "File exists".

Expected results:
======================
Renaming of files while rebalance is in progress should work.

Additional info:
sosreports are at: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/962400/
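
For anyone re-running the reproduction, the rename loop from step 4 can be wrapped so that the failing names are collected instead of scrolling past. This is only a sketch: the function name rename_range is made up for illustration, and it assumes it is run from the mount point with files named f<i> as in the steps above.

```shell
# rename_range FIRST LAST  (hypothetical helper, not part of any gluster tooling)
# Renames f<i> to files<i> in the current directory, like the loop in step 4,
# and prints the source name of every file that mv could not rename.
rename_range() {
    local i="$1" last="$2"
    while [ "$i" -le "$last" ]; do
        # mv's own error text is discarded; only the failing name is reported.
        if ! mv -- "f$i" "files$i" 2>/dev/null; then
            printf 'f%s\n' "$i"
        fi
        i=$((i + 1))
    done
}
```

Typical use while rebalance is running would be something like `rename_range 11 400 > failed.txt`, after which failed.txt holds one name per failed rename. Note that it reports every mv failure, not only "File exists".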
Looks like an issue with dht rename not being atomic. I suspect this flow:

1. A layout change triggers the creation of a linkfile (lookup/rebalance) on subvolume S.
2. A rename op that has the same hash subvolume as S (or, in this case, as src) tries to create a linkfile on subvolume S. (We rename this src file to the target file later.)
3. If step 1 succeeds before step 2, the rename op fails with an EEXIST error.

We have 2 possible approaches:

1. If linkfile creation fails with EEXIST during rename, do a subsequent lookup on subvolume S to check whether the existing linkfile and the file being renamed are the same (same gfid). If they are, proceed with the rename.
2. Bring in namespace locks across dht.

Additionally, dht-rename needs to be migration aware like other fops.
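
The first approach can be illustrated with a local-filesystem analogy. This is not GlusterFS code: as an assumption for illustration only, a hard link stands in for the DHT linkfile and the inode (compared with the shell's -ef test) stands in for the gfid. The function name create_link_tolerant is hypothetical.

```shell
# create_link_tolerant SRC LINK  (hypothetical; local analogy, not DHT code)
# Tries to create LINK as a hard link to SRC. If creation fails because LINK
# already exists, the existing entry is compared with SRC; when both refer to
# the same file (same device and inode, standing in for "same gfid"), the
# failure is treated as success so the caller can carry on.
create_link_tolerant() {
    local src="$1" link="$2"
    if ln "$src" "$link" 2>/dev/null; then
        return 0    # link created normally
    fi
    if [ -e "$link" ] && [ "$src" -ef "$link" ]; then
        return 0    # someone else already created the same link
    fi
    return 1        # LINK is a different file, or ln failed for another reason
}
```

The point of the sketch is only the control flow: an "already exists" failure is not fatal when the existing entry turns out to be the one the operation wanted to create. It does nothing about the atomicity concern that the namespace-lock approach is meant to address.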
Created attachment 748616: Logs
[root@fillmore ~]# gluster v info sample

Volume Name: sample
Type: Distribute
Volume ID: 936e0e38-7c6f-46c1-88a2-99e5eb579012
Status: Stopped
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/o1
Brick2: 10.70.34.86:/rhs/brick1/o2
Brick3: 10.70.34.105:/rhs/brick1/o3
Brick4: 10.70.34.86:/rhs/brick1/o4
Brick5: 10.70.34.105:/rhs/brick1/o5
Brick6: 10.70.34.86:/rhs/brick1/o6
Brick7: 10.70.34.105:/rhs/brick1/o7
Brick8: 10.70.34.86:/rhs/brick1/p4
Brick9: 10.70.34.105:/rhs/brick1/p5
Version:
------------
glusterfs-3.4.0.11rhs-1.el6rhs.x86_64

Renaming of files while rebalance is in progress works fine.

Steps:
------
1) Create a distributed volume and start it
2) Mount the volume and create files

   for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3) Add a brick and start rebalance

   gluster v rebalance Vol1 start
   volume rebalance: Vol1: success: Starting rebalance on volume Vol1 has been successful.
   ID: 4eaff74b-18b4-4920-ad45-c778566ab3ee

4) While rebalance is running, rename some files

Node           Rebalanced-files    size       scanned    failures    status         run time in secs
-----          ----------------    ------     ---------  --------    -------        ----------------
localhost      0                   0Bytes     529        80          completed      2.00
10.70.34.86    21                  210.0MB    32         10          in progress    3.00
10.70.34.85    16                  160.0MB    206        0           in progress    3.00
volume rebalance: Vol1: success:

On the mount point:
--------------------
for i in {11..400} ; do mv f"$i" files"$i" ; done

Files are successfully renamed with no error message.

Error found while verifying the bug:
=======================================
A few files were missing on the mount point after the rebalance process. The are-equal checksum shows a count of 500 files before rebalance; after the rebalance process, the file count dropped to 490.

Raised another bug to track this issue [#976755].

Keeping this open as I am blocked by bug #976755.
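
To see which files went missing rather than just the count, a check along the following lines could be run on the mount point after rebalance completes. This is a sketch and not the are-equal checksum tool mentioned above: the function name missing_files is hypothetical, and it assumes the f<i> / files<i> naming used in the steps.

```shell
# missing_files DIR FIRST LAST  (hypothetical helper)
# Prints each index i in FIRST..LAST for which neither DIR/f<i> nor
# DIR/files<i> exists, i.e. a file that is visible under neither its
# original name nor its renamed name.
missing_files() {
    local dir="$1" i="$2" last="$3"
    while [ "$i" -le "$last" ]; do
        if [ ! -e "$dir/f$i" ] && [ ! -e "$dir/files$i" ]; then
            printf '%s\n' "$i"
        fi
        i=$((i + 1))
    done
}
```

For the run above this would be called as `missing_files . 1 500` from the mount point; piping the output through `wc -l` gives the number of missing files.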
Fix https://code.engineering.redhat.com/gerrit/10053 for bug 976755 has been sent for review
Fix for bug 976755 has been merged downstream.
Fix for bug 976755 is available in release glusterfs-3.4.0.12rhs.beta5.
Version: 3.4.0.12rhs.beta6-1.el6rhs.x86_64
========
Renaming of files while rebalance is in progress succeeds without any error message.

Bug 976755, which was marked as a blocker to this bug, has been verified.

Marking this bug as 'verified'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html