Bug 764229 (GLUSTER-2497) - client crashes
Summary: client crashes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2497
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Duplicates: GLUSTER-3261
Depends On:
Blocks:
Reported: 2011-03-08 10:08 UTC by Benjamin Henrion
Modified: 2011-07-27 09:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
log file of the client (60.59 KB, text/x-log)
2011-03-08 07:09 UTC, Benjamin Henrion

Description Benjamin Henrion 2011-03-08 07:09:07 UTC
Created attachment 450

Comment 1 Benjamin Henrion 2011-03-08 10:08:34 UTC
The client crashes while I am shutting down some servers.

The mounted directory over /glusterfs is unresponsive, and reports:

root@machine / [92]# cd /glusterfs 
-bash: cd: /glusterfs: Transport endpoint is not connected
root@machine / [93]# l
ls: cannot access glusterfs: Transport endpoint is not connected

See attached for the full log.

Comment 2 Pranith Kumar K 2011-03-10 06:30:02 UTC
Steps to reproduce, followed by the back trace with symbols:
1. Mount a replicated volume.
2. Start a dd writing to a file on the mount point.
3. Bring one of the bricks down.
4. Remove the file from the mount point while the dd is still in progress.
5. Bring the brick back up.

The next write call triggers an openfd self-heal, but the loc is all zeros, so loc_copy crashes the client.

#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1  0x00007f8312b8e015 in gf_strdup (src=0x0) at ../../../libglusterfs/src/mem-pool.h:89
#2  0x00007f8312b9187c in loc_copy (dst=0x7f83080024d0, src=0x7f83080008e8) at ../../../libglusterfs/src/xlator.c:1055
#3  0x00007f830f539bb8 in afr_build_parent_loc (parent=0x7f83080024d0, child=0x7f83080008e8) at ../../../../../xlators/cluster/afr/src/afr-dir-write.c:57
#4  0x00007f830f55fd45 in afr_self_heal_missing_entries (frame=0x7f83112fd94c, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1419
#5  0x00007f830f5607cb in afr_self_heal (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1629
#6  0x00007f830f55177b in afr_openfd_sh (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-open.c:437
#7  0x00007f830f556d4f in afr_internal_lock_finish (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:1169
#8  0x00007f830f55657b in afr_post_blocking_inodelk_cbk (frame=0x7f8311526e10, this=0x665200) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:954
#9  0x00007f830f57441b in afr_lock_blocking (frame=0x7f8311526e10, this=0x665200, child_index=2) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:992
#10 0x00007f830f573932 in afr_lock_cbk (frame=0x7f8311526e10, cookie=0x1, this=0x665200, op_ret=-1, op_errno=77)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:756
#11 0x00007f830f5739a5 in afr_blocking_inodelk_cbk (frame=0x7f8311526e10, cookie=0x1, this=0x665200, op_ret=-1, op_errno=77)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:770
#12 0x00007f830f7bfb49 in client3_1_finodelk (frame=0x7f83115270a4, this=0x664600, data=0x7fffb1ca1110)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:4529
#13 0x00007f830f7a94c2 in client_finodelk (frame=0x7f83115270a4, this=0x664600, volume=0x664400 "vol-replicate-0", fd=0x7f830dd8d024, cmd=7, lock=0x7fffb1ca1260)
    at ../../../../../xlators/protocol/client/src/client.c:1290
#14 0x00007f830f574701 in afr_lock_blocking (frame=0x7f8311526e10, this=0x665200, child_index=1) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:1005
#15 0x00007f830f573932 in afr_lock_cbk (frame=0x7f8311526e10, cookie=0x0, this=0x665200, op_ret=0, op_errno=0)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:756
#16 0x00007f830f5739a5 in afr_blocking_inodelk_cbk (frame=0x7f8311526e10, cookie=0x0, this=0x665200, op_ret=0, op_errno=0)
    at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:770
#17 0x00007f830f7b2458 in client3_1_finodelk_cbk (req=0x7f830e42d4ac, iov=0x7f830e42d4ec, count=1, myframe=0x7f83115268e8)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1101
#18 0x00007f831296a51d in rpc_clnt_handle_reply (clnt=0x66e3f0, pollin=0x7f8308006170) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:757
#19 0x00007f831296a87c in rpc_clnt_notify (trans=0x66e510, mydata=0x66e420, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f8308006170)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:870
#20 0x00007f8312967c75 in rpc_transport_notify (this=0x66e510, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f8308006170)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:1027
#21 0x00007f83103e7e44 in socket_event_poll_in (this=0x66e510) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1623
#22 0x00007f83103e81f7 in socket_event_handler (fd=7, idx=1, data=0x66e510, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1737
#23 0x00007f8312bbc333 in event_dispatch_epoll_handler (event_pool=0x659330, events=0x65d9a0, i=0) at ../../../libglusterfs/src/event.c:812
#24 0x00007f8312bbc543 in event_dispatch_epoll (event_pool=0x659330) at ../../../libglusterfs/src/event.c:876
#25 0x00007f8312bbc8ab in event_dispatch (event_pool=0x659330) at ../../../libglusterfs/src/event.c:984
#26 0x00000000004067e8 in main (argc=5, argv=0x7fffb1ca1998) at ../../../glusterfsd/src/glusterfsd.c:1442
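
For illustration only, here is a minimal sketch of the failure mode visible in frames #0 to #2: a string-duplication helper called with a NULL pointer ends up in strlen(NULL). The names demo_loc, demo_strdup and demo_loc_copy are hypothetical stand-ins, not the GlusterFS types or functions, and this is not the patch that was applied. It only shows how a NULL check turns the crash into an error return.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a location structure.
 * An "all zeros" instance has path == NULL. */
struct demo_loc {
    char *path;
};

/* Duplicate a string, tolerating NULL instead of calling strlen(NULL),
 * which is the call that faults in frame #0 of the back trace. */
static char *demo_strdup(const char *src)
{
    if (src == NULL)
        return NULL;

    size_t len = strlen(src) + 1;
    char *dup = malloc(len);
    if (dup != NULL)
        memcpy(dup, src, len);
    return dup;
}

/* Copy src into dst. Returns 0 on success, -1 if src carries no path
 * (for example a zeroed structure) or if allocation fails. */
static int demo_loc_copy(struct demo_loc *dst, const struct demo_loc *src)
{
    assert(dst != NULL);

    if (src == NULL || src->path == NULL)
        return -1;

    dst->path = demo_strdup(src->path);
    return dst->path != NULL ? 0 : -1;
}
```

With a zeroed demo_loc as the source, demo_loc_copy returns -1 and leaves the destination untouched. With a populated source it returns 0 and the caller owns the copied path. A caller in the position of the self-heal code could use such an error return to skip the operation rather than dereference a NULL path.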

Comment 3 Vijay Bellur 2011-03-22 08:20:24 UTC
PATCH: http://patches.gluster.com/patch/6400 in master (cluster/afr: skip openfd flush when the file is already deleted)

Comment 4 Vijay Bellur 2011-03-29 08:14:42 UTC
PATCH: http://patches.gluster.com/patch/6611 in master (features/marker: check for op_ret before doing any operations in lookup callback)

Comment 5 Anand Avati 2011-04-11 09:44:22 UTC
PATCH: http://patches.gluster.com/patch/6546 in release-3.1 (cluster/afr: skip openfd flush when the file is already deleted)

Comment 6 Raghavendra Bhat 2011-04-13 08:15:25 UTC
Tried the steps mentioned by Pranith. The client crashed on 3.1.3; with master it did not crash.

Comment 7 Pranith Kumar K 2011-07-27 06:40:54 UTC
*** Bug 3261 has been marked as a duplicate of this bug. ***

