Description of problem:
A crash happened in inodelk when .glusterfs was removed from one of the back-end export directories.

Version-Release number of selected component (if applicable):
git master with head at 347b4d48cba3cc1e00d40ec50e62497d65a27c84

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a 2*2 dist-rep volume.
2. Create some data on the mountpoint, for example by untarring the Linux kernel.
3. After the untar, rm -rf .glusterfs on one of the back-end export bricks.
4. Run find . on the mountpoint.

Actual results:
The fuse client crashed with the following backtrace.

Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0 0x000000334ca32905 in raise () from /lib64/libc.so.6
#1 0x000000334ca340e5 in abort () from /lib64/libc.so.6
#2 0x000000334ca2b9be in __assert_fail_base () from /lib64/libc.so.6
#3 0x000000334ca2ba80 in __assert_fail () from /lib64/libc.so.6
#4 0x00007f22d872bf94 in client3_1_inodelk (frame=0x7f22dbadb560, this=0x2566bc0, data=0x7fff7ed6c600) at client3_1-fops.c:4616
#5 0x00007f22d8712571 in client_inodelk (frame=0x7f22dbadb560, this=0x2566bc0, volume=0x256aa00 "hosdu-replicate-0", loc=0x7f22d1658084, cmd=6, lock=0x7fff7ed6cb70) at client.c:1592
#6 0x00007f22d84d1c57 in afr_nonblocking_inodelk (frame=0x7f22db8d5190, this=0x256af00) at afr-lk-common.c:1515
#7 0x00007f22d84bdaa4 in afr_sh_metadata_lock (frame=0x7f22db8d5190, this=0x256af00) at afr-self-heal-metadata.c:584
#8 0x00007f22d84bdb0a in afr_self_heal_metadata (frame=0x7f22db8d5190, this=0x256af00) at afr-self-heal-metadata.c:600
#9 0x00007f22d84b6a6c in afr_sh_missing_entries_done (frame=0x7f22db8d5190, this=0x256af00) at afr-self-heal-common.c:924
#10 0x00007f22d84cd148 in afr_unlock_common_cbk (frame=0x7f22db8d5190, cookie=0x0, this=0x256af00, op_ret=0, op_errno=0) at afr-lk-common.c:544
#11 0x00007f22d84cdd60 in afr_unlock_entrylk_cbk (frame=0x7f22db8d5190, cookie=0x0, this=0x256af00, op_ret=0, op_errno=0) at afr-lk-common.c:706
#12 0x00007f22d871d432 in client3_1_entrylk_cbk (req=0x7f22d11fe060, iov=0x7f22d11fe0a0, count=1, myframe=0x7f22dbadaea8) at client3_1-fops.c:1307
#13 0x00007f22dca80928 in rpc_clnt_handle_reply (clnt=0x2610ba0, pollin=0x1809a5c0) at rpc-clnt.c:797
#14 0x00007f22dca80cc5 in rpc_clnt_notify (trans=0x26206e0, mydata=0x2610bd0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1809a5c0) at rpc-clnt.c:916
#15 0x00007f22dca7cda8 in rpc_transport_notify (this=0x26206e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1809a5c0) at rpc-transport.c:498
#16 0x00007f22d9564270 in socket_event_poll_in (this=0x26206e0) at socket.c:1686
#17 0x00007f22d95647f4 in socket_event_handler (fd=7, idx=1, data=0x26206e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#18 0x00007f22dccd7030 in event_dispatch_epoll_handler (event_pool=0x2552c20, events=0x2560e80, i=1) at event.c:794
#19 0x00007f22dccd7253 in event_dispatch_epoll (event_pool=0x2552c20) at event.c:856
#20 0x00007f22dccd75de in event_dispatch (event_pool=0x2552c20) at event.c:956
#21 0x0000000000407dcc in main (argc=4, argv=0x7fff7ed6d748) at glusterfsd.c:1612

Expected results:
There should not be any crashes.

Additional info:
Last few entries from the client log file.

[2012-03-06 03:35:24.638715] W [client3_1-fops.c:827:client3_1_setxattr_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory
[2012-03-06 03:35:24.638734] I [afr-self-heal-metadata.c:244:afr_sh_metadata_sync_cbk] 0-hosdu-replicate-0: setting attributes failed for /linux-3.0.1.tar.gz on hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.639610] W [client3_1-fops.c:375:client3_1_open_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /linux-3.0.1.tar.gz
[2012-03-06 03:35:24.639630] E [afr-self-heal-data.c:1239:afr_sh_data_open_cbk] 0-hosdu-replicate-0: open of /linux-3.0.1.tar.gz failed on child hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.639643] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal failed on /linux-3.0.1.tar.gz
[2012-03-06 03:35:24.640305] I [afr-common.c:1165:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of /kernel-source.
[2012-03-06 03:35:24.640327] I [afr-common.c:1290:afr_launch_self_heal] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal triggered. path: /kernel-source, reason: lookup detected pending operations
[2012-03-06 03:35:24.641214] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path / on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.641289] I [afr-self-heal-common.c:1681:afr_sh_find_fresh_parents] 0-hosdu-replicate-0: Parent dir missing for /kernel-source, in missing entry self-heal, aborting missing-entry self-heal
[2012-03-06 03:35:24.642035] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.642829] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.642904] I [afr-self-heal-entry.c:2332:afr_sh_entry_fix] 0-hosdu-replicate-0: /kernel-source: Performing conservative merge
[2012-03-06 03:35:24.643223] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source
[2012-03-06 03:35:24.643243] E [afr-self-heal-entry.c:2150:afr_sh_entry_opendir_cbk] 0-hosdu-replicate-0: opendir of /kernel-source failed on child hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.643467] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal failed on /kernel-source
[2012-03-06 03:35:24.643879] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source
[2012-03-06 03:35:24.647009] I [afr-common.c:1165:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of /kernel-source/linux-3.0.1.
[2012-03-06 03:35:24.647033] I [afr-common.c:1290:afr_launch_self_heal] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal triggered. path: /kernel-source/linux-3.0.1, reason: lookup detected pending operations
[2012-03-06 03:35:24.647610] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.647631] I [afr-self-heal-common.c:1681:afr_sh_find_fresh_parents] 0-hosdu-replicate-0: Parent dir missing for /kernel-source/linux-3.0.1, in missing entry self-heal, aborting missing-entry self-heal
[2012-03-06 03:35:24.648484] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source/linux-3.0.1 on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.649586] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source/linux-3.0.1 on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.649629] I [afr-self-heal-entry.c:2332:afr_sh_entry_fix] 0-hosdu-replicate-0: /kernel-source/linux-3.0.1: Performing conservative merge
[2012-03-06 03:35:24.650058] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source/linux-3.0.1
[2012-03-06 03:35:24.650080] E [afr-self-heal-entry.c:2150:afr_sh_entry_opendir_cbk] 0-hosdu-replicate-0: opendir of /kernel-source/linux-3.0.1 failed on child hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.650342] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal failed on /kernel-source/linux-3.0.1
[2012-03-06 03:35:24.650792] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source/linux-3.0.1
[2012-03-06 03:35:24.662835] I [afr-common.c:1165:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of /kernel-source/linux-3.0.1/.gitignore.
[2012-03-06 03:35:24.662865] I [afr-common.c:1290:afr_launch_self_heal] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal triggered. path: /kernel-source/linux-3.0.1/.gitignore, reason: lookup detected pending operations
[2012-03-06 03:35:24.663396] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source/linux-3.0.1 on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.663430] I [afr-self-heal-common.c:1681:afr_sh_find_fresh_parents] 0-hosdu-replicate-0: Parent dir missing for /kernel-source/linux-3.0.1/.gitignore, in missing entry self-heal, aborting missing-entry self-heal
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-03-06 03:35:24
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x334ca32980]
/lib64/libc.so.6(gsignal+0x35)[0x334ca32905]
/lib64/libc.so.6(abort+0x175)[0x334ca340e5]
/lib64/libc.so.6[0x334ca2b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x334ca2ba80]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_inodelk+0x157)[0x7f22d872bf94]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client_inodelk+0x18b)[0x7f22d8712571]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_nonblocking_inodelk+0xadc)[0x7f22d84d1c57]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_sh_metadata_lock+0xb6)[0x7f22d84bdaa4]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_self_heal_metadata+0x5f)[0x7f22d84bdb0a]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_sh_missing_entries_done+0x160)[0x7f22d84b6a6c]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(+0x55148)[0x7f22d84cd148]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(+0x55d60)[0x7f22d84cdd60]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_entrylk_cbk+0x2c3)[0x7f22d871d432]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211)[0x7f22dca80928]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3)[0x7f22dca80cc5]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130)[0x7f22dca7cda8]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7f22d9564270]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7f22d95647f4]
/usr/local/lib/libglusterfs.so.0(+0x4d030)[0x7f22dccd7030]
/usr/local/lib/libglusterfs.so.0(+0x4d253)[0x7f22dccd7253]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7f22dccd75de]
/usr/local/sbin/glusterfs(main+0x238)[0x407dcc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x334ca1ecdd]
/usr/local/sbin/glusterfs[0x403f79]
---------

I followed the same steps as in bug-788049, but the crash happened in a different place. I have archived all the logs and the core.
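When triaging a client log like the one above, it can help to count entries per severity and reporting function to see which self-heal paths fail most often. The following is only a minimal sketch, assuming every entry follows the "[timestamp] LEVEL [file:line:function] xlator: message" layout seen in this report; the names parse_entry and summarize are hypothetical helpers and not part of glusterfs.

```python
import re
from collections import Counter

# Assumed layout, taken from the entries quoted in this report:
# [timestamp] LEVEL [file:line:function] xlator: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>[^:\s]+): (?P<msg>.*)$"
)


def parse_entry(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per (level, function); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            counts[(entry["level"], entry["func"])] += 1
    return counts


# Three entries copied from the log in this report.
SAMPLE = [
    "[2012-03-06 03:35:24.638715] W [client3_1-fops.c:827:client3_1_setxattr_cbk] "
    "0-hosdu-client-0: remote operation failed: No such file or directory",
    "[2012-03-06 03:35:24.642035] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] "
    "0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)",
    "[2012-03-06 03:35:24.642829] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] "
    "0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)",
]

if __name__ == "__main__":
    for (level, func), count in summarize(SAMPLE).most_common():
        print(level, func, count)
```

Lines that do not match the assumed layout, such as the "pending frames:" block and the raw stack addresses, are ignored rather than treated as errors.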
Created attachment 567967: test case script to reproduce the bug.
Please set the proper IP address and mountpoint in the script.
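The attachment itself is not reproduced in this report. As a rough illustration only, the four steps to reproduce could be expressed as the command lists below. This is a sketch under assumptions: the host names, brick paths, tarball location, and the helper name reproduction_commands are all hypothetical, and the commands are only built, not executed. Note that the rm step has to run on the server that hosts the chosen brick.

```python
import shlex


def reproduction_commands(volume, bricks, mount_host, mountpoint, tarball, victim_brick):
    """Build (but do not run) the commands for the reproduction steps.

    bricks are "host:/path" strings; victim_brick is the local path of the
    brick whose .glusterfs directory gets removed. All values are placeholders.
    """
    if len(bricks) != 4:
        raise ValueError("a 2*2 dist-rep volume needs exactly four bricks")
    return [
        # Step 1: create and start the distributed-replicate volume.
        ["gluster", "volume", "create", volume, "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        # Step 2: mount the volume and create data by untarring the kernel.
        ["mount", "-t", "glusterfs", f"{mount_host}:/{volume}", mountpoint],
        ["tar", "-xf", tarball, "-C", mountpoint],
        # Step 3: remove .glusterfs on one back-end brick (run on that server).
        ["rm", "-rf", f"{victim_brick}/.glusterfs"],
        # Step 4: walk the mountpoint to trigger lookups and self-heal.
        ["find", mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical hosts and paths; adjust before using anything like this.
    example = reproduction_commands(
        volume="hosdu",
        bricks=[
            "server1:/export/brick1",
            "server2:/export/brick2",
            "server1:/export/brick3",
            "server2:/export/brick4",
        ],
        mount_host="server1",
        mountpoint="/mnt/hosdu",
        tarball="/tmp/linux-3.0.1.tar.gz",
        victim_brick="/export/brick1",
    )
    for argv in example:
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try on a machine that has no gluster installation.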
*** Bug 803616 has been marked as a duplicate of this bug. ***
With glusterfs-3.3.0qa39, the crash doesn't happen. But if I delete .glusterfs from two back ends belonging to separate replica pairs, each directory is listed twice.
Removing 3.3.0beta as the crash does not happen any more.
Please re-open if observed again.