Description of problem:
=========================
Renaming files while a rebalance was running resulted in a few files missing from the mount point after the rebalance completed.

Version-Release number of selected component (if applicable):
=============================================================
3.4.0.12rhs.beta5-2.el6rhs.x86_64

How reproducible:
===================

Steps to Reproduce:
===================
1. Create a distribute volume with 4 bricks and start it.

2. NFS mount the volume and create some files:

for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3. Calculate the arequal checksum on the mount point before starting rebalance:

[root@RHEL6 vOL5]# /opt/qa/tools/arequal-checksum /mnt/vOL5/

Entry counts
Regular files   : 500
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 501

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e9e46cb6bd7afeb5e1695e1113e67b5
Directories     : 30312a00
Symbolic links  : 0
Other           : 0
Total           : e7f2f9579c75b300

4. Add brick and start rebalance.

5. While rebalance is running, rename some files.

Node          Rebalanced-files   size     scanned   failures   status        run time in secs
-----         ----------------   ------   -------   --------   -------       ----------------
localhost     6                  60.0MB   7         0          in progress   3.00
10.70.34.88   0                  0Bytes   163       0          in progress   3.00
10.70.34.86   0                  0Bytes   144       0          in progress   3.00
10.70.34.87   0                  0Bytes   166       0          in progress   3.00
volume rebalance: Vol5: success:

On the mount point:
--------------------
for i in {11..400} ; do mv f"$i" files"$i" ; done

6. After rebalance is completed, calculate the arequal checksum on the mount point:

[root@RHEL6 vOL5]# /opt/qa/tools/arequal-checksum /mnt/vOL5/

Entry counts
Regular files   : 496
Directories     : 1
Symbolic links  : 0
Other           : 0
Total           : 497

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a372879cf5eadea220d2d1b5c0cd3dcc
Directories     : 3c0b060000302f00
Symbolic links  : 0
Other           : 0
Total           : bfab50293517cc6e

The regular files count has changed from 500 to 496 after the rebalance process.

Files missing on mount point:
---------------------------------
files193
files244
files233
files248

Actual results:
================
A few files are missing from the mount point after the rebalance process.

Expected results:
==================
No files should go missing; the regular files count should remain 500 after the rebalance process.

Additional info:
=================
gluster v i Vol5

Volume Name: Vol5
Type: Distribute
Volume ID: 73357b50-2e5d-4902-8564-d9f09404da46
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/F1
Brick2: 10.70.34.86:/rhs/brick1/F2
Brick3: 10.70.34.87:/rhs/brick1/F3
Brick4: 10.70.34.88:/rhs/brick1/F4
Brick5: 10.70.34.85:/rhs/brick1/F5
Brick6: 10.70.34.86:/rhs/brick1/F6
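The missing names can also be found without comparing checksums. The following is a minimal sketch, assuming the creation loop (f1..f500) and the rename loop (11..400) from the steps above both ran to completion. The helper names `expected_names` and `missing_files` are hypothetical and are not part of any GlusterFS tool.

```python
def expected_names(total=500, renamed_from=11, renamed_to=400):
    """Names expected after the mv loop: f1..f10, files11..files400, f401..f500.

    The defaults mirror the loops in the steps to reproduce (an assumption).
    """
    names = set()
    for i in range(1, total + 1):
        prefix = "files" if renamed_from <= i <= renamed_to else "f"
        names.add(f"{prefix}{i}")
    return names


def missing_files(present, expected=None):
    """Return the expected names absent from `present`, sorted by numeric suffix."""
    if expected is None:
        expected = expected_names()

    def suffix(name):
        # Numeric part of names such as "f7" or "files193".
        return int("".join(ch for ch in name if ch.isdigit()))

    return sorted(set(expected) - set(present), key=suffix)
```

On the client, `missing_files(os.listdir("/mnt/vOL5"))` (after `import os`) would list the names that are no longer visible on the mount point. It checks only the expected post-rename names, so a file still present under its old name would also be reported as missing.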
sosreports at : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/987422/
Could you please provide a dump of all the xattrs (from all the bricks) of the files that are missing from the mount?
[root@kori brick1]# ls -l */files193
---------T. 2 root root 0 Jul 23 13:30 F4/files193

# file: F4/files193
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0xf07281d9849348a9926f810cb96deef1
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3000

==============================================================

[root@boost brick1]# ls -l */files244
---------T. 2 root root 0 Jul 23 15:53 F1/files244

# file: F1/files244
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x0159a793be6744599f4206c39996925b
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3400

==============================================================

[root@boost brick1]# ls -l */files233
---------T. 2 root root 0 Jul 23 15:53 F1/files233

# file: F1/files233
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x948eebd63ef14baabd51846517f7f223
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3400

==============================================================

[root@boost brick1]# ls -l */files248
---------T. 2 root root 0 Jul 23 15:53 F1/files248

# file: F1/files248
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0x6082534e6ea5492793b64f076df87c65
trusted.glusterfs.dht.linkto=0x566f6c352d636c69656e742d3400

==============================================================
Please dump the linkto xattrs as text.
The dht.linkto value in text format is Vol5-client-4 for files244, files233 and files248. The value dumped above for files193 ends in 0x3000 rather than 0x3400, so it decodes to Vol5-client-0.
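The hex values in the xattr dump above can be converted to text offline. The following is a minimal sketch, assuming the values are plain ASCII strings with a trailing NUL byte, as they appear in the dump. The helper name `decode_xattr` is hypothetical. On a brick, `getfattr -n trusted.glusterfs.dht.linkto -e text <file>` should print the text form directly.

```python
def decode_xattr(hex_value):
    """Decode a getfattr hex-encoded value ("0x...") to text, dropping trailing NULs."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return bytes.fromhex(digits).rstrip(b"\x00").decode("ascii")


# trusted.glusterfs.dht.linkto values copied from the dump above.
LINKTO = {
    "files193": "0x566f6c352d636c69656e742d3000",
    "files244": "0x566f6c352d636c69656e742d3400",
    "files233": "0x566f6c352d636c69656e742d3400",
    "files248": "0x566f6c352d636c69656e742d3400",
}

for name, value in LINKTO.items():
    print(name, decode_xattr(value))
# files193 Vol5-client-0
# files244 Vol5-client-4
# files233 Vol5-client-4
# files248 Vol5-client-4
```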
When a rebalance (or remove-brick start) operation is in progress and 'rename' (i.e., mv) operations are performed on the files being rebalanced, certain race conditions cause this bug. The bug has existed since day 0 of the rebalance process and is not a regression. Development needs time to find the root cause and then take appropriate action. Hence requesting removal of the 'blocker' flag (please set it back if this comment is not sufficient justification).
Cloning this bug in 3.1. To be fixed in a future release.