Bug 800291

Summary: [347b4d48cba3cc1e00d40ec50e62497d65a27c84] - crash in inodelk when .glusterfs is removed from the back-end.
Product: [Community] GlusterFS
Reporter: M S Vishwanath Bhat <vbhat>
Component: replicate
Assignee: Vijay Bellur <vbellur>
Status: CLOSED WORKSFORME
QA Contact:
Severity: medium
Docs Contact:
Priority: low
Version: pre-release
CC: gluster-bugs, mzywusko, vbellur, vinaraya
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-02-22 10:49:52 UTC
Attachments:
test case script to reproduce the bug (flags: none)

Description M S Vishwanath Bhat 2012-03-06 08:54:04 UTC
Description of problem:
The FUSE client crashes in inodelk when .glusterfs is removed from one of the back-end export directories.

Version-Release number of selected component (if applicable):
git master with head at 347b4d48cba3cc1e00d40ec50e62497d65a27c84

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a 2x2 distributed-replicate volume.
2. Create some data on the mount point, for example by untarring the Linux kernel.
3. After the untar completes, run rm -rf .glusterfs on one of the back-end export bricks.
4. Run find . on the mount point.
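
The steps above could be scripted roughly as follows. This is only a dry-run sketch, not the attached test case: the volume name hosdu is taken from the client log, but the host names, brick path, mount point, tarball location, and the use of ssh to reach the brick are all assumptions made for illustration. By default the script only prints the commands; setting DRY_RUN=0 would execute them, including the destructive rm -rf on the brick.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4. Everything below except the volume name is a
# placeholder: host names, brick path, mount point and tarball location are
# assumptions, not values taken from the report.
VOL=hosdu                          # volume name as seen in the client log
HOSTS="host1 host2 host3 host4"    # hypothetical servers, one brick each
BRICK=/export/brick                # hypothetical brick directory
MNT=/mnt/hosdu                     # hypothetical FUSE mount point
TARBALL=/tmp/linux-3.0.1.tar.gz    # hypothetical location of the kernel tarball
DRY_RUN=${DRY_RUN:-1}              # 1 = print commands only, 0 = execute them

# Print the command, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

reproduce() {
    # Build the brick list "host:/path ..." and remember the first host.
    bricks=""
    for h in $HOSTS; do
        bricks="$bricks $h:$BRICK"
    done
    set -- $HOSTS
    first=$1

    # Step 1: four bricks with replica 2 give a 2x2 distributed-replicate volume.
    run gluster volume create "$VOL" replica 2 $bricks
    run gluster volume start "$VOL"

    # Step 2: mount the volume and untar a kernel tree onto it.
    run mkdir -p "$MNT"
    run mount -t glusterfs "$first:/$VOL" "$MNT"
    run mkdir -p "$MNT/kernel-source"
    run tar -xzf "$TARBALL" -C "$MNT/kernel-source"

    # Step 3: remove .glusterfs directly on one back-end brick (destructive).
    run ssh "$first" rm -rf "$BRICK/.glusterfs"

    # Step 4: walk the mount point; equivalent to running "find ." inside it.
    run find "$MNT"
}

reproduce
```

The first brick is chosen for step 3 because the log entries in this report show failures on hosdu-client-0; any single brick should serve the same purpose.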
  
Actual results:
The FUSE client crashed with the following backtrace:

Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x000000334ca32905 in raise () from /lib64/libc.so.6
#1  0x000000334ca340e5 in abort () from /lib64/libc.so.6
#2  0x000000334ca2b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000334ca2ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f22d872bf94 in client3_1_inodelk (frame=0x7f22dbadb560, this=0x2566bc0, data=0x7fff7ed6c600) at client3_1-fops.c:4616
#5  0x00007f22d8712571 in client_inodelk (frame=0x7f22dbadb560, this=0x2566bc0, volume=0x256aa00 "hosdu-replicate-0", loc=0x7f22d1658084, cmd=6, lock=0x7fff7ed6cb70) at client.c:1592
#6  0x00007f22d84d1c57 in afr_nonblocking_inodelk (frame=0x7f22db8d5190, this=0x256af00) at afr-lk-common.c:1515
#7  0x00007f22d84bdaa4 in afr_sh_metadata_lock (frame=0x7f22db8d5190, this=0x256af00) at afr-self-heal-metadata.c:584
#8  0x00007f22d84bdb0a in afr_self_heal_metadata (frame=0x7f22db8d5190, this=0x256af00) at afr-self-heal-metadata.c:600
#9  0x00007f22d84b6a6c in afr_sh_missing_entries_done (frame=0x7f22db8d5190, this=0x256af00) at afr-self-heal-common.c:924
#10 0x00007f22d84cd148 in afr_unlock_common_cbk (frame=0x7f22db8d5190, cookie=0x0, this=0x256af00, op_ret=0, op_errno=0) at afr-lk-common.c:544
#11 0x00007f22d84cdd60 in afr_unlock_entrylk_cbk (frame=0x7f22db8d5190, cookie=0x0, this=0x256af00, op_ret=0, op_errno=0) at afr-lk-common.c:706
#12 0x00007f22d871d432 in client3_1_entrylk_cbk (req=0x7f22d11fe060, iov=0x7f22d11fe0a0, count=1, myframe=0x7f22dbadaea8) at client3_1-fops.c:1307
#13 0x00007f22dca80928 in rpc_clnt_handle_reply (clnt=0x2610ba0, pollin=0x1809a5c0) at rpc-clnt.c:797
#14 0x00007f22dca80cc5 in rpc_clnt_notify (trans=0x26206e0, mydata=0x2610bd0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1809a5c0) at rpc-clnt.c:916
#15 0x00007f22dca7cda8 in rpc_transport_notify (this=0x26206e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1809a5c0) at rpc-transport.c:498
#16 0x00007f22d9564270 in socket_event_poll_in (this=0x26206e0) at socket.c:1686
#17 0x00007f22d95647f4 in socket_event_handler (fd=7, idx=1, data=0x26206e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#18 0x00007f22dccd7030 in event_dispatch_epoll_handler (event_pool=0x2552c20, events=0x2560e80, i=1) at event.c:794
#19 0x00007f22dccd7253 in event_dispatch_epoll (event_pool=0x2552c20) at event.c:856
#20 0x00007f22dccd75de in event_dispatch (event_pool=0x2552c20) at event.c:956
#21 0x0000000000407dcc in main (argc=4, argv=0x7fff7ed6d748) at glusterfsd.c:1612


Expected results:
There should not be any crashes.

Additional info:

Last few entries from the client log file:


[2012-03-06 03:35:24.638715] W [client3_1-fops.c:827:client3_1_setxattr_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory
[2012-03-06 03:35:24.638734] I [afr-self-heal-metadata.c:244:afr_sh_metadata_sync_cbk] 0-hosdu-replicate-0: setting attributes failed for /linux-3.0.1.tar.gz on hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.639610] W [client3_1-fops.c:375:client3_1_open_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /linux-3.0.1.tar.gz
[2012-03-06 03:35:24.639630] E [afr-self-heal-data.c:1239:afr_sh_data_open_cbk] 0-hosdu-replicate-0: open of /linux-3.0.1.tar.gz failed on child hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.639643] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /linux-3.0.1.tar.gz
[2012-03-06 03:35:24.640305] I [afr-common.c:1165:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of /kernel-source.
[2012-03-06 03:35:24.640327] I [afr-common.c:1290:afr_launch_self_heal] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /kernel-source, reason: lookup detected pending operations
[2012-03-06 03:35:24.641214] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path / on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.641289] I [afr-self-heal-common.c:1681:afr_sh_find_fresh_parents] 0-hosdu-replicate-0: Parent dir missing for /kernel-source, in missing entry self-heal, aborting missing-entry self-heal
[2012-03-06 03:35:24.642035] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.642829] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.642904] I [afr-self-heal-entry.c:2332:afr_sh_entry_fix] 0-hosdu-replicate-0: /kernel-source: Performing conservative merge
[2012-03-06 03:35:24.643223] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source
[2012-03-06 03:35:24.643243] E [afr-self-heal-entry.c:2150:afr_sh_entry_opendir_cbk] 0-hosdu-replicate-0: opendir of /kernel-source failed on child hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.643467] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /kernel-source
[2012-03-06 03:35:24.643879] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source
[2012-03-06 03:35:24.647009] I [afr-common.c:1165:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of /kernel-source/linux-3.0.1.
[2012-03-06 03:35:24.647033] I [afr-common.c:1290:afr_launch_self_heal] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /kernel-source/linux-3.0.1, reason: lookup detected pending operations
[2012-03-06 03:35:24.647610] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.647631] I [afr-self-heal-common.c:1681:afr_sh_find_fresh_parents] 0-hosdu-replicate-0: Parent dir missing for /kernel-source/linux-3.0.1, in missing entry self-heal, aborting missing-entry self-heal
[2012-03-06 03:35:24.648484] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source/linux-3.0.1 on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.649586] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source/linux-3.0.1 on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.649629] I [afr-self-heal-entry.c:2332:afr_sh_entry_fix] 0-hosdu-replicate-0: /kernel-source/linux-3.0.1: Performing conservative merge
[2012-03-06 03:35:24.650058] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source/linux-3.0.1
[2012-03-06 03:35:24.650080] E [afr-self-heal-entry.c:2150:afr_sh_entry_opendir_cbk] 0-hosdu-replicate-0: opendir of /kernel-source/linux-3.0.1 failed on child hosdu-client-0 (No such file or directory)
[2012-03-06 03:35:24.650342] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /kernel-source/linux-3.0.1
[2012-03-06 03:35:24.650792] W [client3_1-fops.c:2100:client3_1_opendir_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory. Path: /kernel-source/linux-3.0.1
[2012-03-06 03:35:24.662835] I [afr-common.c:1165:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of /kernel-source/linux-3.0.1/.gitignore.
[2012-03-06 03:35:24.662865] I [afr-common.c:1290:afr_launch_self_heal] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /kernel-source/linux-3.0.1/.gitignore, reason: lookup detected pending operations
[2012-03-06 03:35:24.663396] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-0: path /kernel-source/linux-3.0.1 on subvolume hosdu-client-0 => -1 (No such file or directory)
[2012-03-06 03:35:24.663430] I [afr-self-heal-common.c:1681:afr_sh_find_fresh_parents] 0-hosdu-replicate-0: Parent dir missing for /kernel-source/linux-3.0.1/.gitignore, in missing entry self-heal, aborting missing-entry self-heal
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-03-06 03:35:24
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x334ca32980]
/lib64/libc.so.6(gsignal+0x35)[0x334ca32905]

/lib64/libc.so.6(abort+0x175)[0x334ca340e5]
/lib64/libc.so.6[0x334ca2b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x334ca2ba80]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_inodelk+0x157)[0x7f22d872bf94]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client_inodelk+0x18b)[0x7f22d8712571]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_nonblocking_inodelk+0xadc)[0x7f22d84d1c57]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_sh_metadata_lock+0xb6)[0x7f22d84bdaa4]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_self_heal_metadata+0x5f)[0x7f22d84bdb0a]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_sh_missing_entries_done+0x160)[0x7f22d84b6a6c]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(+0x55148)[0x7f22d84cd148]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(+0x55d60)[0x7f22d84cdd60]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_entrylk_cbk+0x2c3)[0x7f22d871d432]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211)[0x7f22dca80928]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3)[0x7f22dca80cc5]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130)[0x7f22dca7cda8]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7f22d9564270]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7f22d95647f4]
/usr/local/lib/libglusterfs.so.0(+0x4d030)[0x7f22dccd7030]
/usr/local/lib/libglusterfs.so.0(+0x4d253)[0x7f22dccd7253]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7f22dccd75de]
/usr/local/sbin/glusterfs(main+0x238)[0x407dcc]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x334ca1ecdd]
/usr/local/sbin/glusterfs[0x403f79]
---------


I followed the same steps as in bug 788049, but the crash happened in a different place. I have archived all the logs and the core.

Comment 1 M S Vishwanath Bhat 2012-03-06 14:30:44 UTC
Created attachment 567967
test case script to reproduce the bug

Please set the correct IP address and mount point in the script before running it.

Comment 2 Pranith Kumar K 2012-03-16 17:47:50 UTC
*** Bug 803616 has been marked as a duplicate of this bug. ***

Comment 3 M S Vishwanath Bhat 2012-05-07 08:08:32 UTC
With glusterfs-3.3.0qa39, the crash no longer happens. However, if I delete .glusterfs from two back-end bricks belonging to separate replica pairs, each directory is listed twice.

Comment 4 Vijay Bellur 2012-05-15 05:16:00 UTC
Removing 3.3.0beta as the crash does not happen any more.

Comment 5 Vijay Bellur 2013-02-22 10:49:52 UTC
Please reopen if this is observed again.