Bug 764229 (GLUSTER-2497)

Summary: client crashes
Product: [Community] GlusterFS
Reporter: Benjamin Henrion <bh>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.1.2
CC: bh, gluster-bugs, rabhat, vijay, visonge
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Attachments: log file of the client

Description Benjamin Henrion 2011-03-08 07:09:07 UTC
Created attachment 450

Comment 1 Benjamin Henrion 2011-03-08 10:08:34 UTC
The client crashes while I am shutting down some servers.

The directory mounted at /glusterfs is unresponsive and reports:

root@machine / [92]# cd /glusterfs 
-bash: cd: /glusterfs: Transport endpoint is not connected
root@machine / [93]# l
ls: cannot access glusterfs: Transport endpoint is not connected

See attached for the full log.

Comment 2 Pranith Kumar K 2011-03-10 06:30:02 UTC
Steps to reproduce, and the backtrace with symbols:
1. Mount a replicated volume.
2. Start a dd writing to a file.
3. Bring one of the bricks down.
4. Remove the file from the mount point while the dd is still in progress.
5. Bring the brick back up.

The next write call triggers an openfd self-heal, but the file's loc is all zeros, so loc_copy() calls gf_strdup() on a NULL path and the client crashes.

#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1  0x00007f8312b8e015 in gf_strdup (src=0x0) at ../../../libglusterfs/src/mem-pool.h:89
#2  0x00007f8312b9187c in loc_copy (dst=0x7f83080024d0, src=0x7f83080008e8) at ../../../libglusterfs/src/xlator.c:1055
#3  0x00007f830f539bb8 in afr_build_parent_loc (parent=0x7f83080024d0, child=0x7f83080008e8) at ../../../../../xlators/cluster/afr/src/afr-dir-write.c:57
#4  0x00007f830f55fd45 in afr_self_heal_missing_entries (frame=0x7f83112fd94c, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1419
#5  0x00007f830f5607cb in afr_self_heal (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1629
#6  0x00007f830f55177b in afr_openfd_sh (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-open.c:437
#7  0x00007f830f556d4f in afr_internal_lock_finish (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:1169
#8  0x00007f830f55657b in afr_post_blocking_inodelk_cbk (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:954
#9  0x00007f830f57441b in afr_lock_blocking (frame=0x7f8311526e10, this=0x665200, child_index=2) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:992
#10 0x00007f830f573932 in afr_lock_cbk (frame=0x7f8311526e10, cookie=0x1, this=0x665200, op_ret=-1, op_errno=77)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:756
#11 0x00007f830f5739a5 in afr_blocking_inodelk_cbk (frame=0x7f8311526e10, cookie=0x1, this=0x665200, op_ret=-1, op_errno=77)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:770
#12 0x00007f830f7bfb49 in client3_1_finodelk (frame=0x7f83115270a4, this=0x664600, data=0x7fffb1ca1110)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:4529
#13 0x00007f830f7a94c2 in client_finodelk (frame=0x7f83115270a4, this=0x664600, volume=0x664400 "vol-replicate-0", fd=0x7f830dd8d024, cmd=7, lock=0x7fffb1ca1260)
    at ../../../../../xlators/protocol/client/src/client.c:1290
#14 0x00007f830f574701 in afr_lock_blocking (frame=0x7f8311526e10, this=0x665200, child_index=1) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:1005
#15 0x00007f830f573932 in afr_lock_cbk (frame=0x7f8311526e10, cookie=0x0, this=0x665200, op_ret=0, op_errno=0)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:756
#16 0x00007f830f5739a5 in afr_blocking_inodelk_cbk (frame=0x7f8311526e10, cookie=0x0, this=0x665200, op_ret=0, op_errno=0)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:770
#17 0x00007f830f7b2458 in client3_1_finodelk_cbk (req=0x7f830e42d4ac, iov=0x7f830e42d4ec, count=1, myframe=0x7f83115268e8)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1101
#18 0x00007f831296a51d in rpc_clnt_handle_reply (clnt=0x66e3f0, pollin=0x7f8308006170) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:757
#19 0x00007f831296a87c in rpc_clnt_notify (trans=0x66e510, mydata=0x66e420, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f8308006170)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:870
#20 0x00007f8312967c75 in rpc_transport_notify (this=0x66e510, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f8308006170)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:1027
#21 0x00007f83103e7e44 in socket_event_poll_in (this=0x66e510) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1623
#22 0x00007f83103e81f7 in socket_event_handler (fd=7, idx=1, data=0x66e510, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1737
#23 0x00007f8312bbc333 in event_dispatch_epoll_handler (event_pool=0x659330, events=0x65d9a0, i=0) at ../../../libglusterfs/src/event.c:812
#24 0x00007f8312bbc543 in event_dispatch_epoll (event_pool=0x659330) at ../../../libglusterfs/src/event.c:876
#25 0x00007f8312bbc8ab in event_dispatch (event_pool=0x659330) at ../../../libglusterfs/src/event.c:984
#26 0x00000000004067e8 in main (argc=5, argv=0x7fffb1ca1998) at ../../../glusterfsd/src/glusterfsd.c:1442

Comment 3 Vijay Bellur 2011-03-22 08:20:24 UTC
PATCH: http://patches.gluster.com/patch/6400 in master (cluster/afr: skip openfd flush when the file is already deleted)

Comment 4 Vijay Bellur 2011-03-29 08:14:42 UTC
PATCH: http://patches.gluster.com/patch/6611 in master (features/marker: check for op_ret before doing any operations in lookup callback)

Comment 5 Anand Avati 2011-04-11 09:44:22 UTC
PATCH: http://patches.gluster.com/patch/6546 in release-3.1 (cluster/afr: skip openfd flush when the file is already deleted)

Comment 6 Raghavendra Bhat 2011-04-13 08:15:25 UTC
Tried the steps mentioned by Pranith. The client crashed with 3.1.3; with master it did not crash.

Comment 7 Pranith Kumar K 2011-07-27 06:40:54 UTC
*** Bug 3261 has been marked as a duplicate of this bug. ***