Description of problem:
---------------------------
Renaming some files while the rebalance process was running resulted in a few files missing on the mount point after the rebalance process completed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.4.0.11rhs-1.el6rhs.x86_64

How reproducible:
-----------------

Steps to Reproduce:
--------------------
1) Create a distributed volume and start it

2) Mount the volume and create some files

for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3) Calculate the arequal checksum on the mount point before starting rebalance

Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 5fa4a71a9e8f6b2dc7e8920215d3a3f
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : d984c351b884e68d

4) Add a brick and start rebalance

gluster v rebalance Vol1 start
volume rebalance: Vol1: success: Starting rebalance on volume Vol1 has been successful.
ID: 4eaff74b-18b4-4920-ad45-c778566ab3ee

5) While rebalance is running, rename some files

Node          Rebalanced-files   size      scanned    failures   status        run time in secs
-----         ----------------   ------    ---------  --------   -------       ----------------
localhost     0                  0Bytes    529        80         completed     2.00
10.70.34.86   21                 210.0MB   32         10         in progress   3.00
10.70.34.85   16                 160.0MB   206        0          in progress   3.00
volume rebalance: Vol1: success:

On the mount point :
--------------------
for i in {11..400} ; do mv f"$i" files"$i" ; done

6) After rebalance is completed, calculate the arequal checksum on the mount point

Entry counts
Regular files   : 490
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 491

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e2ba899cb42cc0f81a43b363db9a4ec8
Directories     : 3d06060000302f00
Symbolic links  : 0
Other           : 0
Total           : c5ff3cff6f86a130

The regular files count has changed from 500 to 490 after the rebalance process.

Files missing on the mount point :
-----------------------------------
210 320 228 242 265 281 193

Actual results:
---------------
A few files are missing on the mount point after the rebalance process.

Expected results:
-----------------
There should be no files missing on the mount point after the rebalance process.

Additional info:
=======================

Missing file info :
---------------------
[root@jay brick1]# ls -l */files281
---------T. 2 root root 10485760 Jun 21 12:47 a1/files281

[root@jay brick1]# getfattr -m . -d -e text */files281
# file: a1/files281
trusted.glusterfs.dht.linkto="Vol1-client-4"
trusted.gfid=0x0bb2a3ceeaaf449ca00d6a2c7a2fd1e6

----------------------------------------------------------------------

[root@fillmore brick1]# ls -l */files281
---------T. 2 root root 0 Jun 21 14:25 a3/files281

[root@fillmore brick1]# getfattr -m . -d -e text */files281
# file: a3/files281
trusted.glusterfs.dht.linkto="Vol1-client-0"
trusted.gfid=0x0bb2a3ceeaaf449ca00d6a2c7a2fd1e6

Volume Info :
----------------
gluster v i

Volume Name: Vol1
Type: Distribute
Volume ID: da4ff732-34a1-44e1-beac-4c6da139af46
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.86:/rhs/brick1/a1
Brick2: 10.70.34.85:/rhs/brick1/a2
Brick3: 10.70.34.105:/rhs/brick1/a3
Brick4: 10.70.34.85:/rhs/brick1/a4
Brick5: 10.70.34.86:/rhs/brick1/a5
Brick6: 10.70.34.85:/rhs/brick1/a6
sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/976755/
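The report does not say how the missing files were identified. One possible way, sketched here under assumptions, is to walk the index range and print every index that exists under neither its original name (f<N>) nor its renamed name (files<N>). The function name list_missing and the mount path in the usage line are hypothetical and not taken from the report.

```shell
# Hypothetical helper (not part of the original report).
# Prints each index N in FIRST..LAST for which neither DIR/fN
# nor DIR/filesN exists.
# Usage: list_missing DIR FIRST LAST
list_missing() {
    dir=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        # A file counts as present under either its old or its new name.
        if [ ! -e "$dir/f$i" ] && [ ! -e "$dir/files$i" ]; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}
```

Run from a client as, for example, list_missing /mnt/Vol1 1 500, where /mnt/Vol1 is an assumed mount path. The check only covers the two name patterns used in the steps above.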
The issue seems to be this:

1. Rebalance identifies file f281 to be migrated
2. Migration of file f281 starts
3. File f281 gets renamed to files281
4. Rebalance (migration and truncation) completes, as these are mostly fop based ops
5. The unlink call on f281 fails, as the file is now renamed to files281
6. files281 has linkto xattrs set on both src and dst, pointing to each other

Solution: make unlink after migration gfid based?
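The difference between a path-based and an identity-based unlink can be illustrated on a local filesystem. The sketch below is only an analogy and is not GlusterFS code: the inode number stands in for the gfid, find -inum stands in for a gfid-based lookup, and the function name demo_race and the directory layout are hypothetical. It does not model linkto xattrs or how the destination copy is named.

```shell
# Local-filesystem analogy only; this is NOT how GlusterFS implements migration.
# Assumption: the inode number plays the role of the gfid.
demo_race() {
    work=$(mktemp -d)
    mkdir "$work/src" "$work/dst"
    echo data > "$work/src/f281"

    # "Migration" starts: record the file's identity and copy its data.
    inum=$(ls -i "$work/src/f281" | awk '{print $1}')
    cp "$work/src/f281" "$work/dst/f281"

    # A client renames the file while the migration is in flight.
    mv "$work/src/f281" "$work/src/files281"

    # Path-based cleanup fails: the old name no longer exists.
    if rm "$work/src/f281" 2>/dev/null; then
        echo "path-based unlink: ok"
    else
        echo "path-based unlink: failed"
    fi

    # Identity-based cleanup still finds the source copy under its new name.
    find "$work/src" -inum "$inum" -exec rm {} \;
    if [ -e "$work/src/files281" ]; then
        echo "inode-based unlink: failed"
    else
        echo "inode-based unlink: ok"
    fi

    rm -rf "$work"
}

demo_race
```

The path-based removal fails because it depends on a name that the rename invalidated, while the lookup by inode number is unaffected by the rename. That is the property the proposed gfid-based unlink would rely on.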
Verified in Version : 3.4.0.12rhs.beta6-1.el6rhs.x86_64

Verification Steps :
====================
1) Create a distributed volume and start it

2) FUSE mount the volume and create files on the mount point

for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3) Calculate the arequal checksum before starting rebalance

/opt/qa/tools/arequal-checksum /mnt/Vol9/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : fcfe440c40a06a0023bf5120513d713c
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : df41152c21ac313c

4) Add 2 bricks and start rebalance

5) While rebalance is in progress, rename some files

Node          Rebalanced-files   size      scanned    failures   status        run time in secs
-----         ----------------   ------    ---------  --------   -------       ----------------
localhost     29                 290.0MB   33         3          in progress   5.00
10.70.34.86   24                 240.0MB   206        41         in progress   5.00
10.70.34.87   0                  0Bytes    518        80         completed     3.00
10.70.34.88   5                  50.0MB    524        27         completed     3.00
volume rebalance: Vol9: success:

for i in {11..400} ; do mv f"$i" files"$i" ; done

6) After rebalance is complete, calculate the arequal checksum again

/opt/qa/tools/arequal-checksum /mnt/Vol9/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : fcfe440c40a06a0023bf5120513d713c
Directories     : 3001050000302f00
Symbolic links  : 0
Other           : 0
Total           : ef40102c11ad343c

The regular files count is the same before and after rebalance.

Marking as 'Verified'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html