+++ This bug was initially created as a clone of Bug #1104653 +++

Description of problem:
======================
The rebalance process crashed on all nodes after running for 44+ hours. No I/O was being done on the mount point when it crashed.

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-11-23 03:17:58
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.44rhs
/lib64/libc.so.6[0x3b4d032960]
/lib64/libc.so.6[0x3b4d07f92a]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_layout_entry_cmp_volname+0x2a)[0x7f9288a31dca]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_layout_sort_volname+0x3d)[0x7f9288a31e1d]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_fix_layout_of_directory+0xeb)[0x7f9288a3ad0b]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_fix_directory_layout+0x49)[0x7f9288a3c5b9]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(dht_setxattr+0x9d6)[0x7f9288a4e726]
/usr/lib64/libglusterfs.so.0(syncop_setxattr+0x1a1)[0x3ab304fbd1]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(gf_defrag_fix_layout+0x4b4)[0x7f9288a374b4]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(gf_defrag_fix_layout+0x4d8)[0x7f9288a374d8]
/usr/lib64/glusterfs/3.4.0.44rhs/xlator/cluster/distribute.so(gf_defrag_start_crawl+0x286)[0x7f9288a379d6]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3ab3049ad2]
/lib64/libc.so.6[0x3b4d043bb0]
---------

A core was generated:

[root@7-VM1 core]# ls -l
total 1018116
-rw------- 1 root root 1042546688 Nov 23 09:14 core.20128.1385176679.dump.1
[root@7-VM1 core]# file core.20128.1385176679.dump.1
core.20128.1385176679.dump.1: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from
'/usr/sbin/glusterfs -s localhost --volfile-id flat --xlator-option *dht.use-rea'

How reproducible:
==================
Seen twice (different volumes, same cluster). On the other volume, the rebalance process had been running for more than 7 days.

Steps to Reproduce:
====================
1. Create and mount a DHT volume. Create data from the mount point (directory depth was 10).
2. Add a brick to the volume and start rebalance.
3. After 44+ hours the rebalance process crashed on all nodes and the rebalance status was 'failed'.

[root@7-VM1 core]# gluster volume rebalance flat status
Node          Rebalanced-files  size     scanned  failures  skipped  status  run time in secs
------------  ----------------  -------  -------  --------  -------  ------  ----------------
localhost     832000            13.7GB   5344344  1         228      failed  159836.00
10.70.36.133  1009405           15.7GB   5362837  2         206      failed  159836.00
10.70.36.132  823206            12.9GB   5416604  1         233      failed  159836.00
10.70.36.131  0                 0Bytes   5227829  0         0        failed  159836.00
volume rebalance: flat: success:
[root@7-VM1 core]# ps auxw | grep rebalance
root 31760 0.0 0.0 103244 804 pts/1 R+ 10:49 0:00 grep rebalance

Actual results:
================
The rebalance process crashed.

Expected results:

Additional info:

--- Additional comment from Rachana Patel on 2013-11-25 04:32:54 EST ---

volume info:-
Volume Name: flat
Type: Distribute
Volume ID: e305c335-0859-46e3-acf9-ee245f205a99
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 10.70.36.130:/rhs/brick1/f
Brick2: 10.70.36.132:/rhs/brick1/f
Brick3: 10.70.36.133:/rhs/brick1/f
Brick4: 10.70.36.133:/rhs/brick2/f
Brick5: 10.70.36.133:/rhs/brick4/f

mount info -
[root@rhs-client22 ~]# mount | grep flat
10.70.36.130:/flat on /mnt/flat-nfs type nfs (rw,addr=10.70.36.130)
10.70.36.130:/flat on /mnt/flat type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

will upload sosreport soon

--- Additional comment from Rachana Patel on 2013-11-26 05:27:50 EST ---

updating the same bug as this behaviour is noticed on volume where rebalance was
crashed.

From the mount point, many directories that are present on the backend are not visible (present on 3 sub-vols and not present on 2 sub-vols, etc.; one of them is the newly added sub-vol). The directories contain files and other directories, so it's data loss.

+ lookup is not healing the directory from the mount point

--- Additional comment from Rachana Patel on 2013-11-26 05:40:32 EST ---

updating the same bug as this behaviour is noticed on volume where rebalance was crashed.

1) From the mount point, many directories that are present on the backend are not visible (directories are present on some sub-vols and not present on others). The directories contain files and other directories, so it's data loss.

2) lookup is not healing the directory on the sub-volumes.
e.g. etc65 is present on # sub-vol and not present on 2 sub-vol. It is not shown on the mount point. It has data, and lookup is not healing that directory.

from mount point:-
[root@7-VM2 mvs1]# pwd
/mnt/flat/mvs1
[root@7-VM2 mvs1]# ls
etc100 mvetc10 mvetc12 mvetc14 mvetc17 mvetc19 mvetc20 mvetc22 mvetc25 mvetc4 mvetc63 mvetc9
etc100 mvetc10 mvetc12 mvetc14 mvetc17 mvetc19 mvetc20 mvetc22 mvetc25 mvetc4 mvetc63 mvetc9
mvetc1 mvetc11 mvetc13 mvetc15 mvetc18 mvetc2 mvetc21 mvetc24 mvetc3 mvetc5 mvetc8
mvetc1 mvetc11 mvetc13 mvetc15 mvetc18 mvetc2 mvetc21 mvetc24 mvetc3 mvetc5 mvetc8

Backend:-
[root@7-VM1 ~]# ls /rhs/brick1/f/mvs1/
ddf79 f112 f268 f432 f560 f698 f845 f994 mvetc24 ddf84 f116 f269 f435 f561 f70 f846 f996 mvetc25 ddf86 f118 f273 f441 f563 f703 f848 f999 mvetc27 ddf89 f125 f274 f442 f564 f704 f849 mvddf1 mvetc28 ddf92 f130 f278 f443 f570 f711 f85 mvddf10 mvetc29 ddf96 f134 f280 f449 f573 f712 f851 mvddf11 mvetc3 ddf98 f135 f282 f456 f578 f714 f853 mvddf15 mvetc30 etc100 f145 f284 f46 f589 f716 f858 mvddf17 mvetc31 etc65 f148 f287 f463 f59 f722 f864 mvddf2 mvetc35 etc67 f155 f295 f47 f591 f724 f868 mvddf20 mvetc36 etc68 f162 f297 f470 f596 f728 f871 mvddf21 mvetc37 etc70 f163 f299 f474 f597 f737 f874 mvddf3 mvetc39 etc71 f170 f301
f481 f606 f739 f880 mvddf30 mvetc4 etc72 f171 f307 f484 f608 f741 f882 mvddf32 mvetc40 etc73 f176 f309 f487 f610 f747 f884 mvddf34 mvetc41 etc74 f18 f310 f489 f611 f756 f885 mvddf36 mvetc42 etc75 f186 f314 f491 f612 f759 f890 mvddf37 mvetc43 etc76 f189 f320 f494 f614 f761 f891 mvddf38 mvetc45 etc78 f19 f323 f499 f618 f769 f893 mvddf42 mvetc46 etc79 f2 f327 f500 f621 f770 f894 mvddf44 mvetc49 etc80 f20 f328 f501 f624 f773 f896 mvddf47 mvetc5 etc81 f206 f329 f502 f625 f778 f897 mvddf48 mvetc50 etc82 f212 f332 f507 f63 f783 f903 mvddf49 mvetc54 etc83 f217 f345 f508 f631 f784 f911 mvddf53 mvetc57 etc84 f218 f346 f51 f645 f786 f917 mvddf56 mvetc58 etc85 f219 f357 f511 f648 f789 f922 mvddf57 mvetc59 etc86 f22 f361 f514 f657 f791 f925 mvddf6 mvetc60 etc89 f225 f362 f516 f664 f792 f93 mvddf60 mvetc61 etc90 f226 f363 f518 f666 f801 f933 mvetc1 mvetc62 etc91 f227 f364 f520 f669 f803 f936 mvetc10 mvetc63 etc92 f23 f373 f522 f674 f805 f951 mvetc11 mvetc64 etc95 f231 f375 f524 f678 f807 f955 mvetc12 mvetc8 etc96 f237 f379 f525 f679 f808 f957 mvetc13 mvetc9 etc97 f239 f380 f526 f68 f811 f96 mvetc14 s2 etc98 f240 f393 f531 f680 f813 f962 mvetc15 etc99 f245 f402 f538 f681 f82 f967 mvetc17 f1 f260 f403 f542 f682 f823 f970 mvetc18 f10 f261 f409 f543 f685 f825 f977 mvetc19 f103 f262 f410 f544 f688 f827 f978 mvetc2 f106 f263 f419 f554 f69 f83 f980 mvetc20 f107 f264 f422 f555 f691 f830 f987 mvetc21 f110 f266 f426 f56 f692 f839 f988 mvetc22 [root@7-VM4 ~]# ls /rhs/brick4/f/mvs1/ mvetc1 mvetc13 mvetc18 mvetc21 mvetc3 mvetc8 mvetc10 mvetc14 mvetc19 mvetc22 mvetc4 mvetc9 mvetc11 mvetc15 mvetc2 mvetc24 mvetc5 mvetc12 mvetc17 mvetc20 mvetc25 mvetc54 [root@7-VM4 ~]# ls /rhs/brick2/f/mvs1/ ddf100 etc97 f243 f383 f576 f735 f898 mvddf19 mvetc28 ddf68 etc98 f249 f396 f583 f743 f90 mvddf20 mvetc29 ddf70 etc99 f254 f399 f592 f75 f900 mvddf27 mvetc3 ddf73 f11 f272 f401 f598 f754 f902 mvddf28 mvetc30 ddf76 f111 f277 f408 f603 f757 f904 mvddf29 mvetc31 ddf77 f114 f279 f411 f605 f758 f908 mvddf33 
mvetc35 ddf80 f115 f283 f413 f607 f762 f91 mvddf34 mvetc36 ddf82 f117 f285 f415 f615 f765 f912 mvddf4 mvetc37 ddf83 f120 f286 f417 f620 f766 f913 mvddf40 mvetc39 ddf93 f123 f29 f42 f623 f768 f916 mvddf44 mvetc4 ddf95 f128 f3 f423 f627 f771 f919 mvddf45 mvetc40 ddf97 f132 f302 f427 f628 f775 f92 mvddf48 mvetc41 etc100 f137 f304 f434 f629 f777 f921 mvddf49 mvetc42 etc65 f14 f306 f445 f636 f779 f932 mvddf5 mvetc43 etc67 f15 f31 f45 f637 f78 f935 mvddf53 mvetc45 etc68 f150 f311 f452 f64 f806 f937 mvddf54 mvetc46 etc70 f157 f312 f453 f641 f810 f939 mvddf59 mvetc49 etc71 f161 f313 f455 f642 f815 f94 mvddf6 mvetc5 etc72 f173 f315 f464 f644 f816 f940 mvddf60 mvetc50 etc73 f175 f319 f465 f647 f820 f941 mvddf63 mvetc54 etc74 f181 f321 f467 f649 f824 f947 mvddf8 mvetc57 etc75 f183 f324 f472 f651 f836 f948 mvetc1 mvetc58 etc76 f185 f33 f473 f653 f837 f950 mvetc10 mvetc59 etc78 f187 f338 f497 f66 f840 f956 mvetc11 mvetc60 etc79 f196 f340 f498 f662 f841 f958 mvetc12 mvetc61 etc80 f200 f347 f5 f67 f842 f959 mvetc13 mvetc62 etc81 f202 f349 f521 f670 f847 f960 mvetc14 mvetc63 etc82 f207 f351 f523 f672 f857 f965 mvetc15 mvetc64 etc83 f208 f352 f535 f673 f859 f969 mvetc17 mvetc8 etc84 f209 f354 f545 f676 f87 f971 mvetc18 mvetc9 etc85 f211 f355 f547 f686 f872 f972 mvetc19 s2 etc86 f216 f368 f548 f690 f873 f976 mvetc2 etc89 f220 f37 f549 f71 f879 f979 mvetc20 etc90 f222 f372 f556 f717 f88 f992 mvetc21 etc91 f223 f374 f567 f727 f888 f995 mvetc22 etc92 f224 f378 f568 f73 f89 f997 mvetc24 etc95 f238 f38 f569 f732 f892 mvddf12 mvetc25 etc96 f241 f382 f57 f734 f895 mvddf18 mvetc27 [root@7-VM4 ~]# ls /rhs/brick1/f/mvs1/ etc65 f201 f330 f418 f54 f660 f76 f875 mvddf17 mvetc29 f1000 f205 f331 f421 f541 f663 f760 f876 mvddf18 mvetc3 f101 f21 f334 f424 f550 f665 f764 f883 mvddf19 mvetc30 f102 f213 f337 f425 f558 f667 f767 f887 mvddf2 mvetc31 f108 f214 f34 f429 f559 f668 f772 f9 mvddf3 mvetc35 f113 f221 f342 f43 f565 f675 f774 f909 mvddf32 mvetc36 f121 f232 f343 f431 f566 f684 f776 f910 mvddf33 
mvetc37 f127 f234 f344 f436 f571 f687 f782 f914 mvddf35 mvetc39 f129 f235 f348 f437 f575 f694 f79 f918 mvddf45 mvetc4 f13 f247 f35 f440 f577 f696 f790 f923 mvddf51 mvetc40 f131 f25 f356 f444 f579 f699 f795 f924 mvddf56 mvetc41 f133 f256 f358 f446 f584 f7 f796 f928 mvddf8 mvetc42 f136 f257 f36 f457 f586 f707 f797 f938 mvddf9 mvetc43 f138 f259 f360 f458 f590 f709 f80 f942 mvetc1 mvetc45 f139 f26 f367 f479 f595 f710 f802 f943 mvetc10 mvetc46 f143 f267 f371 f480 f599 f718 f81 f945 mvetc11 mvetc49 f144 f27 f376 f482 f6 f719 f814 f946 mvetc12 mvetc5 f153 f271 f377 f485 f600 f720 f817 f952 mvetc13 mvetc50 f158 f275 f386 f49 f609 f723 f818 f954 mvetc14 mvetc54 f16 f276 f388 f490 f613 f725 f826 f961 mvetc15 mvetc57 f165 f288 f389 f496 f616 f729 f828 f963 mvetc17 mvetc58 f168 f291 f391 f50 f617 f733 f833 f966 mvetc18 mvetc59 f169 f292 f392 f504 f622 f738 f844 f968 mvetc19 mvetc60 f178 f294 f395 f505 f630 f740 f850 f974 mvetc2 mvetc61 f180 f298 f398 f506 f633 f744 f852 f975 mvetc20 mvetc62 f182 f30 f40 f515 f639 f746 f854 f98 mvetc21 mvetc63 f191 f303 f404 f527 f643 f748 f855 f981 mvetc22 mvetc64 f194 f308 f405 f529 f65 f751 f86 f984 mvetc24 mvetc8 f197 f316 f407 f530 f650 f752 f860 f985 mvetc25 mvetc9 f198 f32 f412 f532 f654 f753 f867 f990 mvetc27 f199 f325 f416 f537 f655 f755 f869 f993 mvetc28 [root@7-VM3 ~]# ls /rhs/brick1/f/mvs1/ ddf65 f12 f248 f406 f53 f671 f821 f983 mvetc22 ddf67 f122 f250 f41 f533 f677 f822 f986 mvetc24 ddf72 f124 f251 f414 f534 f683 f829 f989 mvetc25 ddf75 f126 f252 f420 f536 f689 f831 f99 mvetc27 ddf78 f140 f253 f428 f539 f693 f832 f991 mvetc28 ddf85 f141 f255 f430 f540 f695 f834 f998 mvetc29 ddf87 f142 f258 f433 f546 f697 f835 mvddf10 mvetc3 ddf88 f146 f265 f438 f55 f700 f838 mvddf12 mvetc30 ddf91 f147 f270 f439 f551 f701 f84 mvddf16 mvetc31 etc100 f149 f28 f44 f552 f702 f843 mvddf21 mvetc35 etc65 f151 f281 f447 f553 f705 f856 mvddf22 mvetc36 etc67 f152 f289 f448 f557 f706 f861 mvddf23 mvetc37 etc68 f154 f290 f450 f562 f708 f862 mvddf25 mvetc39 
etc70 f156 f293 f451 f572 f713 f863 mvddf27 mvetc4 etc71 f159 f296 f454 f574 f715 f865 mvddf29 mvetc40 etc72 f160 f300 f459 f58 f72 f866 mvddf30 mvetc41 etc73 f164 f305 f460 f580 f721 f870 mvddf35 mvetc42 etc74 f166 f317 f461 f581 f726 f877 mvddf36 mvetc43 etc75 f167 f318 f462 f582 f730 f878 mvddf37 mvetc45 etc76 f17 f322 f466 f585 f731 f881 mvddf39 mvetc46 etc78 f172 f326 f468 f587 f736 f886 mvddf4 mvetc49 etc79 f174 f333 f469 f588 f74 f889 mvddf40 mvetc5 etc80 f177 f335 f471 f593 f742 f899 mvddf42 mvetc50 etc81 f179 f336 f475 f594 f745 f901 mvddf47 mvetc54 etc82 f184 f339 f476 f60 f749 f905 mvddf5 mvetc57 etc83 f188 f341 f477 f601 f750 f906 mvddf51 mvetc58 etc84 f190 f350 f478 f602 f763 f907 mvddf54 mvetc59 etc85 f192 f353 f48 f604 f77 f915 mvddf58 mvetc60 etc86 f193 f359 f483 f61 f780 f920 mvddf63 mvetc61 etc89 f195 f365 f486 f619 f781 f926 mvddf9 mvetc62 etc90 f203 f366 f488 f62 f785 f927 mvetc1 mvetc63 etc91 f204 f369 f492 f626 f787 f929 mvetc10 mvetc64 etc92 f210 f370 f493 f632 f788 f930 mvetc11 mvetc8 etc95 f215 f381 f495 f634 f793 f931 mvetc12 mvetc9 etc96 f228 f384 f503 f635 f794 f934 mvetc13 s2 etc97 f229 f385 f509 f638 f798 f944 mvetc14 etc98 f230 f387 f510 f640 f799 f949 mvetc15 etc99 f233 f39 f512 f646 f8 f95 mvetc17 f100 f236 f390 f513 f652 f800 f953 mvetc18 f104 f24 f394 f517 f656 f804 f964 mvetc19 f105 f242 f397 f519 f658 f809 f97 mvetc2 f109 f244 f4 f52 f659 f812 f973 mvetc20 f119 f246 f400 f528 f661 f819 f982 mvetc21 data from one of the brick:- [root@7-VM3 ~]# ls /rhs/brick1/f/mvs1/etc65 | wc 149 149 1294 ########################################################### 3) now try to create same Dir from mount point and it will be successful and after that lookup will show 2 Dir with same name mount:- [root@7-VM2 mvs1]# mkdir etc65 [root@7-VM2 mvs1]# ls etc100 mvetc1 mvetc11 mvetc13 mvetc15 mvetc18 mvetc2 mvetc21 mvetc24 mvetc3 mvetc5 mvetc8 etc100 mvetc1 mvetc11 mvetc13 mvetc15 mvetc18 mvetc2 mvetc21 mvetc24 mvetc3 mvetc5 mvetc8 etc65 mvetc10 mvetc12 
mvetc14 mvetc17 mvetc19 mvetc20 mvetc22 mvetc25 mvetc4 mvetc63 mvetc9
etc65 mvetc10 mvetc12 mvetc14 mvetc17 mvetc19 mvetc20 mvetc22 mvetc25 mvetc4 mvetc63 mvetc9

Backend:-
[root@7-VM4 ~]# ls /rhs/brick1/f/mvs1/etc65 | wc
0 0 0
[root@7-VM4 ~]# ls /rhs/brick4/f/mvs1/etc65 | wc
ls: cannot access /rhs/brick4/f/mvs1/etc65: No such file or directory
0 0 0
[root@7-VM4 ~]# ls /rhs/brick2/f/mvs1/etc65 | wc
141 141 1203
[root@7-VM1 ~]# ls /rhs/brick1/f/mvs1/etc65 | wc
155 155 1426
[root@7-VM3 ~]# ls /rhs/brick1/f/mvs1/etc65 | wc
149 149 1294

--> The directory has not been healed on all sub-volumes, and mkdir did not return an error even though the directory already exists.
--> The directory is also listed twice.
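
The per-brick comparison above was done by hand. A small helper along the following lines could flag every directory that exists on some bricks but not on others. This is only a sketch: `find_partial_dirs` and the `host:/path` brick labels are hypothetical names for illustration, not part of GlusterFS, and it assumes the directory names of each brick have already been collected (for example from `ls` on each node). The sample input is a two-name excerpt of the backend listings in this report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_partial_dirs(brick_listings: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return {directory name: sorted list of bricks that lack it}.

    Only directories missing from at least one brick are reported.
    Input must contain directory names only: on a distribute volume,
    regular files are expected to live on a single brick.
    """
    all_bricks: Set[str] = set(brick_listings)
    holders: Dict[str, Set[str]] = defaultdict(set)

    # Record which bricks hold each directory name.
    for brick, names in brick_listings.items():
        for name in names:
            holders[name].add(brick)

    # Keep only the names that are absent from at least one brick.
    return {
        name: sorted(all_bricks - present)
        for name, present in holders.items()
        if present != all_bricks
    }


if __name__ == "__main__":
    # Excerpt of the /mvs1 listings above (brick labels are illustrative).
    listings = {
        "7-VM1:/rhs/brick1/f": ["etc65", "mvetc1"],
        "7-VM3:/rhs/brick1/f": ["etc65", "mvetc1"],
        "7-VM4:/rhs/brick1/f": ["etc65", "mvetc1"],
        "7-VM4:/rhs/brick2/f": ["etc65", "mvetc1"],
        "7-VM4:/rhs/brick4/f": ["mvetc1"],
    }
    print(find_partial_dirs(listings))
    # {'etc65': ['7-VM4:/rhs/brick4/f']}
```

Taking plain listings as input rather than walking the bricks directly keeps the sketch independent of how the per-node listings are gathered, since the bricks live on different hosts.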
This is already fixed in the release 3.6 branch: the fix went in before branching, as part of bug #1104653. Closing this bug as NOTABUG.