Description of problem:
=======================
Files are missing on the mount point after stopping and restarting rebalance
while the rebalance process was running.

Version-Release number of selected component (if applicable):
=============================================================
3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
===================
1. Create a distribute volume and start it.

2. Fuse mount the volume and create some files:

   for i in {1..500} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3. Calculate the arequal checksum:

   [root@RHEL6 Volume1]# /opt/qa/tools/arequal-checksum /mnt/Volume1

   Entry counts
   Regular files   : 500
   Directories     : 1
   Symbolic links  : 0
   Other           : 0
   Total           : 501

   Metadata checksums
   Regular files   : 3e9
   Directories     : 24d74c
   Symbolic links  : 3e9
   Other           : 3e9

   Checksums
   Regular files   : eb1faeaf1dc9dcfc68a6cb49ce3231f0
   Directories     : 30312a00
   Symbolic links  : 0
   Other           : 0
   Total           : 83b965e6e3cac70c

4. Add 2 bricks and start rebalance.

5. While rebalance is running, stop the rebalance process:

   gluster v rebalance Volume1 status
   Node           Rebalanced-files    size      scanned   failures   status        run time in secs
   localhost                    29    290.0MB        30          0   in progress               6.00
   10.70.34.86                  28    280.0MB       232          0   in progress               6.00
   10.70.34.87                   0    0Bytes        517         80   completed                 2.00
   10.70.34.88                   6    60.0MB        529         26   completed                 3.00
   volume rebalance: Volume1: success:

   gluster v rebalance Volume1 stop
   Node           Rebalanced-files    size      scanned   failures   status        run time in secs
   localhost                    63    630.0MB        72          8   stopped                  12.00
   10.70.34.86                  28    420.0MB       588          0   completed                10.00
   10.70.34.87                   0    0Bytes        517         80   completed                 2.00
   10.70.34.88                   6    60.0MB        529         26   completed                 3.00
   volume rebalance: Volume1: success: rebalance process may be in the middle of a file
   migration. The process will be fully stopped once the migration of the file is complete.
   Please check rebalance process for completion before doing any further brick related
   tasks on the volume.

6. Execute the rebalance stop command 3-4 times.

7. Start rebalance again.

8. Check rebalance status.

9. After rebalance completes, check the arequal checksum again:

   [root@RHEL6 Volume1]# /opt/qa/tools/arequal-checksum /mnt/Volume1

   Entry counts
   Regular files   : 499
   Directories     : 1
   Symbolic links  : 0
   Other           : 0
   Total           : 500

   Metadata checksums
   Regular files   : 486e85
   Directories     : 24d74c
   Symbolic links  : 3e9
   Other           : 3e9

   Checksums
   Regular files   : d2975638243ab0d13de03c5cfa867cb0
   Directories     : 30021b66
   Symbolic links  : 0
   Other           : 0
   Total           : ef776a64eebed707

The regular file count has changed from 500 to 499.

Note: Tried to unmount and remount the volume; the file was still missing.

Actual results:
===============
Files are missing on the mount point.

Expected results:
=================
There should be no files missing on the mount point.

Additional info:
================
Missing file info:
------------------
f13

[root@jay brick1]# ls -l */f13
---------T. 2 root root 0 Jul 25 18:58 d6/f13

[root@jay brick1]# getfattr -m . -d -e hex /rhs/brick1/d6/f13
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/d6/f13
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0xaab1adad2103416eb1e54b598264fd55
trusted.glusterfs.dht.linkto=0x566f6c756d65312d636c69656e742d3500

[root@jay brick1]# getfattr -m . -d -e text /rhs/brick1/d6/f13
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/d6/f13
security.selinux="unconfined_u:object_r:file_t:s0"
trusted.gfid="����!An��KY�d�U"
trusted.glusterfs.dht.linkto="Volume1-client-5"
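The zero-byte size, the sticky-bit-only mode "---------T", and the
trusted.glusterfs.dht.linkto xattr mark d6/f13 as a DHT linkfile, i.e. a pointer
to the subvolume that should hold the data, not the data file itself. A minimal
sketch to check every brick for f13 and classify each hit (assuming the brick
list from "gluster v i Volume1" below and passwordless SSH to the nodes):

   # Look for f13 on every brick; a trusted.glusterfs.dht.linkto xattr means
   # the entry is only a DHT linkfile, not the real data file.
   for b in 10.70.34.85:/rhs/brick1/d1 10.70.34.86:/rhs/brick1/d2 \
            10.70.34.87:/rhs/brick1/d3 10.70.34.88:/rhs/brick1/d4 \
            10.70.34.85:/rhs/brick1/d5 10.70.34.86:/rhs/brick1/d6 ; do
       host=${b%%:*} path=${b#*:}
       ssh "$host" "[ -e $path/f13 ] || exit 0
           ls -l $path/f13
           if getfattr -n trusted.glusterfs.dht.linkto $path/f13 >/dev/null 2>&1 ; then
               echo '$b: DHT linkfile only'
           else
               echo '$b: data file'
           fi"
   done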
[root@boost brick1]# gluster v i Volume1

Volume Name: Volume1
Type: Distribute
Volume ID: ca804585-b8bc-4804-8484-928442bbc698
Status: Started
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/d1
Brick2: 10.70.34.86:/rhs/brick1/d2
Brick3: 10.70.34.87:/rhs/brick1/d3
Brick4: 10.70.34.88:/rhs/brick1/d4
Brick5: 10.70.34.85:/rhs/brick1/d5
Brick6: 10.70.34.86:/rhs/brick1/d6
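To decode the linkto value: assuming that, for a plain distribute volume, client
subvolumes are numbered in brick order starting from 0 (the protocol/client
volfile snippet quoted in the analysis below is consistent with this),
Volume1-client-5 maps to Brick6, 10.70.34.86:/rhs/brick1/d6 — the very brick the
f13 linkfile resides on. A quick sketch to print the mapping:

   # Assumes client subvolume indices follow brick order, starting at 0.
   gluster volume info Volume1 \
       | awk '/^Brick[0-9]+:/ { printf "Volume1-client-%d -> %s\n", n++, $2 }'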
sosreports at: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/988419/
There are no errors reported in the logs for the missing file.

[shishirng@sgowda new]$ grep "f13 " */var/log/glusterfs/Volume1-rebalance.log
jay-2013072520201374763856/var/log/glusterfs/Volume1-rebalance.log:[2013-07-25 13:45:31.918180] I [dht-rebalance.c:872:dht_migrate_file] 0-Volume1-dht: completed migration of /f13 from subvolume Volume1-client-1 to Volume1-client-5
junior-2013072517581374755297/var/log/glusterfs/Volume1-rebalance.log:[2013-07-25 11:22:05.295182] I [dht-common.c:1051:dht_lookup_everywhere_cbk] 0-Volume1-dht: deleting stale linkfile /f13 on Volume1-client-5
kori-2013072517591374755347/var/log/glusterfs/Volume1-rebalance.log:[2013-07-25 11:22:05.778611] I [dht-common.c:1051:dht_lookup_everywhere_cbk] 0-Volume1-dht: deleting stale linkfile /f13 on Volume1-client-5

The only anomaly seen is the linkfile pointing to itself:

volume Volume1-client-5
    type protocol/client
    option remote-host 10.70.34.86
    option remote-subvolume /rhs/brick1/d6
    option transport-type socket
    option username 77bc7b46-d99e-4695-82be-3ec1251d3904
    option password fa57f711-4283-4d13-865b-7e2be9a0f6d9
end-volume

[root@jay brick1]# getfattr -m . -d -e text /rhs/brick1/d6/f13
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/d6/f13
security.selinux="unconfined_u:object_r:file_t:s0"      <=== selinux is on
trusted.gfid=0xaab1adad2103416eb1e54b598264fd55
trusted.glusterfs.dht.linkto="Volume1-client-5"         <=== pointing to itself
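A linkfile whose linkto names the subvolume it already lives on is a dead end
for lookups, which is consistent with the stale-linkfile deletions in the grep
output above. A minimal sketch to scan a brick for such self-pointing linkfiles
(run on the brick node itself; the script name, brick path, and the brick's own
client subvolume name are hypothetical arguments):

   #!/bin/bash
   # Usage: ./find-self-linkto.sh /rhs/brick1/d6 Volume1-client-5
   # Linkfiles are zero-byte files with only the sticky bit set (---------T).
   brick=$1 ; self=$2
   find "$brick" -type f -size 0 -perm -1000 | while read -r f ; do
       # The linkto value carries a trailing NUL byte, which the shell's
       # command substitution discards.
       linkto=$(getfattr --only-values -n trusted.glusterfs.dht.linkto "$f" 2>/dev/null)
       [ "$linkto" = "$self" ] && echo "self-pointing linkfile: $f"
   done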
Additionally, errors were reported in the logs:

[2013-07-25 13:51:02.779050] I [dht-layout.c:749:dht_layout_dir_mismatch] 0-Volume1-dht: subvol: Volume1-client-0; inode layout - 0 - 1073741822; disk layout - 2147483646 - 2863311527
[2013-07-25 13:51:02.779095] I [dht-common.c:654:dht_revalidate_cbk] 0-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.779393] I [dht-layout.c:749:dht_layout_dir_mismatch] 0-Volume1-dht: subvol: Volume1-client-1; inode layout - 1073741823 - 2147483645; disk layout - 1431655764 - 2147483645
[2013-07-25 13:51:02.779413] I [dht-common.c:654:dht_revalidate_cbk] 0-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.781766] I [dht-layout.c:749:dht_layout_dir_mismatch] 0-Volume1-dht: subvol: Volume1-client-2; inode layout - 2147483646 - 3221225468; disk layout - 2863311528 - 3579139409
[2013-07-25 13:51:02.781867] I [dht-common.c:654:dht_revalidate_cbk] 0-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.781945] I [dht-layout.c:749:dht_layout_dir_mismatch] 0-Volume1-dht: subvol: Volume1-client-3; inode layout - 3221225469 - 4294967295; disk layout - 3579139410 - 4294967295
[2013-07-25 13:51:02.781961] I [dht-common.c:654:dht_revalidate_cbk] 0-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.783531] I [dht-layout.c:636:dht_layout_normalize] 0-Volume1-dht: found anomalies in /. holes=1 overlaps=0 missing=0 down=0 misc=0
[2013-07-25 13:51:02.902111] I [dht-layout.c:749:dht_layout_dir_mismatch] 1-Volume1-dht: subvol: Volume1-client-1; inode layout - 1431655764 - 2147483645; disk layout - 1073741823 - 2147483645
[2013-07-25 13:51:02.902150] I [dht-common.c:654:dht_revalidate_cbk] 1-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.902196] I [dht-layout.c:749:dht_layout_dir_mismatch] 1-Volume1-dht: subvol: Volume1-client-0; inode layout - 2147483646 - 2863311527; disk layout - 0 - 1073741822
[2013-07-25 13:51:02.902212] I [dht-common.c:654:dht_revalidate_cbk] 1-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.902323] I [dht-layout.c:749:dht_layout_dir_mismatch] 1-Volume1-dht: subvol: Volume1-client-2; inode layout - 2863311528 - 3579139409; disk layout - 2147483646 - 3221225468
[2013-07-25 13:51:02.902341] I [dht-common.c:654:dht_revalidate_cbk] 1-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.902390] I [dht-layout.c:749:dht_layout_dir_mismatch] 1-Volume1-dht: subvol: Volume1-client-3; inode layout - 3579139410 - 4294967295; disk layout - 3221225469 - 4294967295
[2013-07-25 13:51:02.902403] I [dht-common.c:654:dht_revalidate_cbk] 1-Volume1-dht: mismatching layouts for /
[2013-07-25 13:51:02.905684] I [dht-layout.c:636:dht_layout_normalize] 1-Volume1-dht: found anomalies in /. holes=0 overlaps=2 missing=0 down=0 misc=0
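The holes/overlaps anomalies come from comparing the in-memory layout against
the hash ranges stored in the trusted.glusterfs.dht xattr on each brick's copy
of the directory. A sketch to dump the on-disk layout of / from every brick for
manual comparison (same brick list and SSH assumption as before; by my reading
of the DHT disk-layout format, the last two 4-byte words of the value are the
start and end of the brick's hash range — verify against the DHT source):

   for b in 10.70.34.85:/rhs/brick1/d1 10.70.34.86:/rhs/brick1/d2 \
            10.70.34.87:/rhs/brick1/d3 10.70.34.88:/rhs/brick1/d4 \
            10.70.34.85:/rhs/brick1/d5 10.70.34.86:/rhs/brick1/d6 ; do
       echo "== $b =="
       ssh "${b%%:*}" "getfattr -n trusted.glusterfs.dht -e hex ${b#*:}"
   done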
Re-tested the scenario on version 3.4.0.17rhs-1.el6rhs.x86_64; could not
reproduce the issue.
(In reply to senaik from comment #5)
> Re-tested the scenario on version 3.4.0.17rhs-1.el6rhs.x86_64; could not
> reproduce the issue.

The issue seems to have been magically fixed. Moving this to CLOSED WORKSFORME.
Reopen if this regresses in future builds.