Bug 788049

Summary: crash in afr when .glusterfs directory is removed from one of the bricks
Product: [Community] GlusterFS
Reporter: M S Vishwanath Bhat <vbhat>
Component: core
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED UPSTREAM
Severity: high
Priority: high
Version: mainline
CC: gluster-bugs, mzywusko, shwetha.h.panduranga
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-05-11 08:13:23 UTC
Attachments: glusterfs client log (flags: none)

Description M S Vishwanath Bhat 2012-02-07 10:51:22 UTC
Created attachment 559914: glusterfs client log

Description of problem:
The FUSE client crashed when the .glusterfs directory was removed from one of the bricks in a 4-node replicate volume.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa21

Steps to Reproduce:
1. Create and start a 4-node replicate volume.
2. Download the Linux kernel source tarball and start untarring it.
3. While the untar is in progress, add 4 more bricks to the volume so that it becomes a 2x4 distributed-replicate volume.
4. Run rm -rf .glusterfs on one of the bricks on the back end.
  
Actual results:
The glusterfs client crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=hosdu --volfile-server=10.1.11.137 /mnt/'.
Program terminated with signal 6, Aborted.
#0  0x000000334ca32905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x000000334ca32905 in raise () from /lib64/libc.so.6
#1  0x000000334ca340e5 in abort () from /lib64/libc.so.6
#2  0x000000334ca2b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000334ca2ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f92aea3276e in afr_self_heal_parent_entrylk (frame=0x7f92b1d84bac, this=0x2437d50, lock_cbk=0x7f92aea32331 <afr_sh_post_nb_entrylk_conflicting_sh_cbk>) at afr-self-heal-common.c:1894
#5  0x00007f92aea327eb in afr_self_heal_conflicting_entries (frame=0x7f92b1d84bac, this=0x2437d50) at afr-self-heal-common.c:1905
#6  0x00007f92aea332e5 in afr_self_heal (frame=0x7f92b2dbfbb8, this=0x2437d50, inode=0x7f92a43ca04c) at afr-self-heal-common.c:2146
#7  0x00007f92aea53474 in afr_launch_self_heal (frame=0x7f92b2dbfbb8, this=0x2437d50, inode=0x7f92a43ca04c, background=_gf_true, ia_type=IA_IFDIR, reason=0x7f92aea688b8 "lookup detected pending operations", 
    gfid_sh_success_cbk=0x7f92aea54016 <afr_post_gfid_sh_success>, unwind=0x7f92aea53da4 <afr_self_heal_lookup_unwind>) at afr-common.c:1290
#8  0x00007f92aea543c2 in afr_lookup_perform_self_heal (frame=0x7f92b2dbfbb8, this=0x2437d50, sh_launched=0x7ffffac11dc8) at afr-common.c:1583
#9  0x00007f92aea549ca in afr_lookup_done (frame=0x7f92b2dbfbb8, this=0x2437d50) at afr-common.c:1733
#10 0x00007f92aea5529a in afr_lookup_cbk (frame=0x7f92b2dbfbb8, cookie=0x3, this=0x2437d50, op_ret=0, op_errno=0, inode=0x7f92a43ca04c, buf=0x7ffffac11f80, xattr=0x2cec180, postparent=0x7ffffac11f10) at afr-common.c:1904
#11 0x00007f92aec95138 in client3_1_lookup_cbk (req=0x7f929c8d81c4, iov=0x7f929c8d8204, count=1, myframe=0x7f92b2dbf908) at client3_1-fops.c:2292
#12 0x00007f92b3f696a4 in rpc_clnt_handle_reply (clnt=0x1e65f00, pollin=0x2026f20) at rpc-clnt.c:790
#13 0x00007f92b3f69a2b in rpc_clnt_notify (trans=0x1e6c120, mydata=0x1e65f30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2026f20) at rpc-clnt.c:909
#14 0x00007f92b3f65c08 in rpc_transport_notify (this=0x1e6c120, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2026f20) at rpc-transport.c:498
#15 0x00007f92afad523d in socket_event_poll_in (this=0x1e6c120) at socket.c:1675
#16 0x00007f92afad57c1 in socket_event_handler (fd=15, idx=5, data=0x1e6c120, poll_in=1, poll_out=0, poll_err=0) at socket.c:1790
#17 0x00007f92b41be76c in event_dispatch_epoll_handler (event_pool=0x1d3bb80, events=0x1d41050, i=3) at event.c:794
#18 0x00007f92b41be98f in event_dispatch_epoll (event_pool=0x1d3bb80) at event.c:856
#19 0x00007f92b41bed1a in event_dispatch (event_pool=0x1d3bb80) at event.c:956
#20 0x0000000000407c2e in main (argc=4, argv=0x7ffffac12678) at glusterfsd.c:1601
(gdb) f 4
#4  0x00007f92aea3276e in afr_self_heal_parent_entrylk (frame=0x7f92b1d84bac, this=0x2437d50, lock_cbk=0x7f92aea32331 <afr_sh_post_nb_entrylk_conflicting_sh_cbk>) at afr-self-heal-common.c:1894
1894            GF_ASSERT (local->loc.parent);
(gdb) p local
$1 = (afr_local_t *) 0x1e76a90
(gdb) p local->loc.parent
$2 = (inode_t *) 0x0
(gdb) 
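
The gdb session above shows GF_ASSERT (local->loc.parent) firing because loc.parent is NULL. The client log reports that the self-heal was launched for path "/", which is consistent with this, since the volume root has no parent inode. The following is only a minimal sketch of a guarded check that returns an error instead of aborting; it is not the actual GlusterFS code or the upstream fix. All names (sketch_inode_t, sketch_loc_t, sketch_loc_is_root, sketch_parent_entrylk_check) are hypothetical stand-ins for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the GlusterFS inode_t and loc_t. */
typedef struct sketch_inode {
    int unused;
} sketch_inode_t;

typedef struct sketch_loc {
    const char     *path;    /* path of the entry being healed */
    sketch_inode_t *parent;  /* parent inode; NULL when there is none */
} sketch_loc_t;

/* Returns 1 when the location refers to the volume root "/", 0 otherwise. */
static int sketch_loc_is_root(const sketch_loc_t *loc)
{
    return loc != NULL && loc->path != NULL && strcmp(loc->path, "/") == 0;
}

/*
 * Guarded precondition for taking an entry lock on the parent directory.
 * Returns 0 when a parent exists and the lock can be attempted, and
 * -EINVAL when there is no parent, instead of asserting and aborting.
 */
static int sketch_parent_entrylk_check(const sketch_loc_t *loc)
{
    if (loc == NULL || loc->parent == NULL)
        return -EINVAL;
    return 0;
}
```

With a check like this, a caller could skip the parent entry lock (or fail the self-heal cleanly) for a location such as { "/", NULL } rather than taking the whole client process down with signal 6.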



Expected results:
There should be no crashes.

Additional info:

The last few entries from the client log:

[2012-02-07 04:34:22.463668] W [client3_1-fops.c:1350:client3_1_entrylk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.464403] W [client3_1-fops.c:1350:client3_1_entrylk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.465747] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.466044] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-mi2020.c
[2012-02-07 04:34:22.467034] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.467410] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-mi2020.c
[2012-02-07 04:34:22.468715] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.469054] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-mi2020.c
[2012-02-07 04:34:22.469944] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.470543] W [client3_1-fops.c:739:client3_1_flush_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.471229] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.472626] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.473413] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.474630] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.475382] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.478208] W [client3_1-fops.c:1350:client3_1_entrylk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.479006] W [client3_1-fops.c:1350:client3_1_entrylk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.480481] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.480799] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-ov2640.c
[2012-02-07 04:34:22.482034] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.482474] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-ov2640.c
[2012-02-07 04:34:22.483501] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.483968] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-ov2640.c
[2012-02-07 04:34:22.484707] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.485497] W [client3_1-fops.c:739:client3_1_flush_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.486090] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.487319] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.488113] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.489331] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.490097] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.493178] W [client3_1-fops.c:1350:client3_1_entrylk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.493909] W [client3_1-fops.c:1350:client3_1_entrylk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.495355] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.495820] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-ov9655.c
[2012-02-07 04:34:22.496767] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.497227] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-ov9655.c
[2012-02-07 04:34:22.498242] I [afr-inode-write.c:430:afr_open_fd_fix] 1-hosdu-replicate-0: Opening fd 0x7f929ab3b104
[2012-02-07 04:34:22.498663] W [client3_1-fops.c:373:client3_1_open_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory. Path: /linux-3.0.2/drivers/media/video/gspca/gl860/gl860-ov9655.c
[2012-02-07 04:34:22.499526] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.500442] W [client3_1-fops.c:739:client3_1_flush_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.501019] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.502178] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.503094] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.504378] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:22.505156] W [client3_1-fops.c:1273:client3_1_inodelk_cbk] 1-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-07 04:34:23.863924] I [afr-common.c:1163:afr_detect_self_heal_by_lookup_status] 1-hosdu-replicate-0: entries are missing in lookup of /.
[2012-02-07 04:34:23.863981] I [afr-common.c:1288:afr_launch_self_heal] 1-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /, reason: lookup detected pending operations
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)

patchset: git://git.gluster.com/glusterfs.git

signal received: 6
time of crash: 2012-02-07 04:34:23
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa21
/lib64/libc.so.6[0x334ca32980]
/lib64/libc.so.6(gsignal+0x35)[0x334ca32905]
/lib64/libc.so.6(abort+0x175)[0x334ca340e5]
/lib64/libc.so.6[0x334ca2b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x334ca2ba80]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(+0x3f76e)[0x7f92aea3276e]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(+0x3f7eb)[0x7f92aea327eb]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(afr_self_heal+0x45e)[0x7f92aea332e5]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(afr_launch_self_heal+0x228)[0x7f92aea53474]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(+0x613c2)[0x7f92aea543c2]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(+0x619ca)[0x7f92aea549ca]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(afr_lookup_cbk+0xed)[0x7f92aea5529a]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/protocol/client.so(client3_1_lookup_cbk+0x6ff)[0x7f92aec95138]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211)[0x7f92b3f696a4]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2bd)[0x7f92b3f69a2b]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130)[0x7f92b3f65c08]
/usr/local/lib/glusterfs/3.3.0qa21/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7f92afad523d]
/usr/local/lib/glusterfs/3.3.0qa21/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7f92afad57c1]
/usr/local/lib/libglusterfs.so.0(+0x4b76c)[0x7f92b41be76c]
/usr/local/lib/libglusterfs.so.0(+0x4b98f)[0x7f92b41be98f]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7f92b41bed1a]
/usr/local/sbin/glusterfs(main+0x238)[0x407c2e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x334ca1ecdd]
/usr/local/sbin/glusterfs[0x403e99]
---------


I have attached the client log and archived the other logs.

Comment 1 M S Vishwanath Bhat 2012-02-13 12:34:14 UTC
*** Bug 789989 has been marked as a duplicate of this bug. ***

Comment 2 M S Vishwanath Bhat 2012-05-11 08:13:23 UTC
I can no longer reproduce this crash, so I am moving the bug to CLOSED UPSTREAM.