Bug 765473 (GLUSTER-3741) - [glusterfs-3.2.5qa1] glusterfs client process crashed
Summary: [glusterfs-3.2.5qa1] glusterfs client process crashed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3741
Product: GlusterFS
Classification: Community
Component: rdma
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Duplicates: 797742
Depends On:
Blocks: 849125 895528
 
Reported: 2011-10-19 14:21 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 849125
Environment:
Last Closed: 2013-07-24 17:46:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfs client log (232.76 KB, text/x-log), attached 2011-10-19 11:21 UTC by M S Vishwanath Bhat

Description M S Vishwanath Bhat 2011-10-19 14:21:02 UTC
Created a volume with rdma transport type and ran the sanity scripts. The client process crashed with the following backtrace:

#0  0x00007f4911d266ca in wb_sync_cbk (frame=0x7f491416a848, cookie=0x7f49143d51d4, this=0xcfcd20, op_ret=-1, op_errno=116, prebuf=0x0, postbuf=0x0) at write-behind.c:375
#1  0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d51d4, cookie=0x7f49143d5278, this=0xd08da0, op_ret=-1, op_errno=116, prebuf=0x0, postbuf=0x0) at dht-common.c:2670
#2  0x00007f491219e49c in client3_1_writev (frame=0x7f49143d5278, this=0xd07f00, data=0x7fff24c43f80) at client3_1-fops.c:3623
#3  0x00007f4912187f09 in client_writev (frame=0x7f49143d5278, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26b5870, count=1, off=9891840, iobref=0x26b5920) at client.c:817
#4  0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d51d4, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26b5870, count=1, off=9891840, iobref=0x26b5920) at dht-common.c:2706
#5  0x00007f4911d27128 in wb_sync (frame=0x7f491416a378, file=0x7f4905dadf10, winds=0x7fff24c442d0) at write-behind.c:548
#6  0x00007f4911d2d29e in wb_do_ops (frame=0x7f491416a378, file=0x7f4905dadf10, winds=0x7fff24c442d0, unwinds=0x7fff24c442c0, other_requests=0x7fff24c442b0) at write-behind.c:1859
#7  0x00007f4911d2db12 in wb_process_queue (frame=0x7f491416a378, file=0x7f4905dadf10) at write-behind.c:2048
#8  0x00007f4911d26859 in wb_sync_cbk (frame=0x7f491416a378, cookie=0x7f49143d508c, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c444d0, postbuf=0x7fff24c44460) at write-behind.c:405
#9  0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d508c, cookie=0x7f49143d5130, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c444d0, postbuf=0x7fff24c44460) at dht-common.c:2670
#10 0x00007f491219297f in client3_1_writev_cbk (req=0x7f4911085094, iov=0x0, count=0, myframe=0x7f49143d5130) at client3_1-fops.c:685
#11 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c44820, proghdrcount=1, progpayload=0x26a5f30, progpayloadcount=1, iobref=0x26b57b0,
    frame=0x7f49143d5130, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#12 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c448c0, frame=0x7f49143d5130, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x26a5f30, payloadcnt=1,
    iobref=0x26b5070, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#13 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d5130, this=0xd07f00, data=0x7fff24c44980) at client3_1-fops.c:3613
#14 0x00007f4912187f09 in client_writev (frame=0x7f49143d5130, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26a5f30, count=1, off=7335936, iobref=0x26b5070) at client.c:817
#15 0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d508c, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26a5f30, count=1, off=7335936, iobref=0x26b5070) at dht-common.c:2706
#16 0x00007f4911d27128 in wb_sync (frame=0x7f491416a5e0, file=0x7f4905dadf10, winds=0x7fff24c44cd0) at write-behind.c:548
#17 0x00007f4911d2d29e in wb_do_ops (frame=0x7f491416a5e0, file=0x7f4905dadf10, winds=0x7fff24c44cd0, unwinds=0x7fff24c44cc0, other_requests=0x7fff24c44cb0) at write-behind.c:1859
#18 0x00007f4911d2db12 in wb_process_queue (frame=0x7f491416a5e0, file=0x7f4905dadf10) at write-behind.c:2048
#19 0x00007f4911d26859 in wb_sync_cbk (frame=0x7f491416a5e0, cookie=0x7f49143d4f44, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c44ed0, postbuf=0x7fff24c44e60) at write-behind.c:405
#20 0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d4f44, cookie=0x7f49143d4fe8, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c44ed0, postbuf=0x7fff24c44e60) at dht-common.c:2670
#21 0x00007f491219297f in client3_1_writev_cbk (req=0x7f4911084e50, iov=0x0, count=0, myframe=0x7f49143d4fe8) at client3_1-fops.c:685
#22 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c45220, proghdrcount=1, progpayload=0x26a5680, progpayloadcount=1, iobref=0x26a5e70,
    frame=0x7f49143d4fe8, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#23 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c452c0, frame=0x7f49143d4fe8, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x26a5680, payloadcnt=1,
    iobref=0x26a5730, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#24 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d4fe8, this=0xd07f00, data=0x7fff24c45380) at client3_1-fops.c:3613
#25 0x00007f4912187f09 in client_writev (frame=0x7f49143d4fe8, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26a5680, count=1, off=2670592, iobref=0x26a5730) at client.c:817
#26 0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d4f44, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26a5680, count=1, off=2670592, iobref=0x26a5730) at dht-common.c:2706
#27 0x00007f4911d27128 in wb_sync (frame=0x7f4914169fdc, file=0x7f4905dadf10, winds=0x7fff24c456d0) at write-behind.c:548
#28 0x00007f4911d2d29e in wb_do_ops (frame=0x7f4914169fdc, file=0x7f4905dadf10, winds=0x7fff24c456d0, unwinds=0x7fff24c456c0, other_requests=0x7fff24c456b0) at write-behind.c:1859
#29 0x00007f4911d2db12 in wb_process_queue (frame=0x7f4914169fdc, file=0x7f4905dadf10) at write-behind.c:2048
#30 0x00007f4911d26859 in wb_sync_cbk (frame=0x7f4914169fdc, cookie=0x7f49143d4dfc, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c458d0, postbuf=0x7fff24c45860) at write-behind.c:405
#31 0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d4dfc, cookie=0x7f49143d4ea0, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c458d0, postbuf=0x7fff24c45860) at dht-common.c:2670
#32 0x00007f491219297f in client3_1_writev_cbk (req=0x7f4911084c0c, iov=0x0, count=0, myframe=0x7f49143d4ea0) at client3_1-fops.c:685
#33 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c45c20, proghdrcount=1, progpayload=0x26a1d50, progpayloadcount=1, iobref=0x26a1f30,
    frame=0x7f49143d4ea0, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#34 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c45cc0, frame=0x7f49143d4ea0, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x26a1d50, payloadcnt=1,
    iobref=0x26a1e00, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#35 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d4ea0, this=0xd07f00, data=0x7fff24c45d80) at client3_1-fops.c:3613
#36 0x00007f4912187f09 in client_writev (frame=0x7f49143d4ea0, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26a1d50, count=1, off=5947392, iobref=0x26a1e00) at client.c:817
#37 0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d4dfc, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26a1d50, count=1, off=5947392, iobref=0x26a1e00) at dht-common.c:2706
#38 0x00007f4911d27128 in wb_sync (frame=0x7f4914169d74, file=0x7f4905dadf10, winds=0x7fff24c460d0) at write-behind.c:548
#39 0x00007f4911d2d29e in wb_do_ops (frame=0x7f4914169d74, file=0x7f4905dadf10, winds=0x7fff24c460d0, unwinds=0x7fff24c460c0, other_requests=0x7fff24c460b0) at write-behind.c:1859
#40 0x00007f4911d2db12 in wb_process_queue (frame=0x7f4914169d74, file=0x7f4905dadf10) at write-behind.c:2048
#41 0x00007f4911d26859 in wb_sync_cbk (frame=0x7f4914169d74, cookie=0x7f49143d4cb4, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c462d0, postbuf=0x7fff24c46260) at write-behind.c:405
#42 0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d4cb4, cookie=0x7f49143d4d58, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c462d0, postbuf=0x7fff24c46260) at dht-common.c:2670
#43 0x00007f491219297f in client3_1_writev_cbk (req=0x7f49110849c8, iov=0x0, count=0, myframe=0x7f49143d4d58) at client3_1-fops.c:685
#44 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c46620, proghdrcount=1, progpayload=0x269ddc0, progpayloadcount=1, iobref=0x26a1c90,
    frame=0x7f49143d4d58, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#45 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c466c0, frame=0x7f49143d4d58, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x269ddc0, payloadcnt=1,
    iobref=0x269de70, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#46 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d4d58, this=0xd07f00, data=0x7fff24c46780) at client3_1-fops.c:3613
#47 0x00007f4912187f09 in client_writev (frame=0x7f49143d4d58, this=0xd07f00, fd=0x7f490844b0d4, vector=0x269ddc0, count=1, off=3321856, iobref=0x269de70) at client.c:817


There are more frames in the core file.

I have attached the client log and archived the core file.

Comment 1 Raghavendra G 2011-10-24 04:30:25 UTC
It's a case of stack overflow. Since the transport is not connected, the write frame is unwound immediately from protocol/client, which results in an indirect recursion through wb_process_queue. The number of queued write requests is large enough for that recursion to overflow the stack.
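
(For illustration only: a minimal standalone C sketch of the recursion pattern described above. None of these names are from the GlusterFS sources; they are hypothetical stand-ins. A write is submitted, the dead transport fails it synchronously, the completion callback re-enters the queue-flushing routine, and the stack grows by one frame per queued request. The fd->bad branch only sketches the general direction of the eventual fix referenced in comment 4, http://review.gluster.org/4515: remember the failure so the remaining queued writes are failed iteratively rather than re-submitted.)

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct pending_write {
    struct pending_write *next;
    size_t                offset;
};

struct fd_state {
    struct pending_write *queue;             /* writes buffered "behind" the application */
    bool                  transport_connected;
    bool                  bad;               /* set once a buffered write fails */
};

static void flush_queue(struct fd_state *fd);

/* Completion callback for one submitted write. */
static void write_cbk(struct fd_state *fd, int op_ret)
{
    if (op_ret < 0)
        fd->bad = true;   /* remember the failure (direction of the later fix) */

    /* This re-entry is the indirect recursion: without fd->bad it runs
     * once per queued request, each call adding another stack frame. */
    flush_queue(fd);
}

/* Submit one write. A disconnected transport fails it synchronously,
 * just as protocol/client unwound the frame in the report above. */
static void submit_write(struct fd_state *fd, struct pending_write *w)
{
    (void) w;
    if (!fd->transport_connected) {
        write_cbk(fd, -1);
        return;
    }
    write_cbk(fd, 0);     /* real I/O would happen here */
}

static void flush_queue(struct fd_state *fd)
{
    while (fd->queue != NULL) {
        struct pending_write *w = fd->queue;
        fd->queue = w->next;

        if (fd->bad) {    /* fail the rest iteratively, no re-submission */
            free(w);
            continue;
        }

        submit_write(fd, w);
        free(w);
        return;           /* one submission per pass, like the original flow */
    }
}

int main(void)
{
    struct fd_state fd = { .queue = NULL, .transport_connected = false, .bad = false };

    /* Buffer a handful of writes against a transport that is already down. */
    for (int i = 0; i < 5; i++) {
        struct pending_write *w = malloc(sizeof (*w));
        w->next   = fd.queue;
        w->offset = (size_t) i * 4096;
        fd.queue  = w;
    }

    flush_queue(&fd);
    printf("fd marked bad after failed write-behind flush: %s\n",
           fd.bad ? "yes" : "no");
    return 0;
}

With the fd->bad check removed, the recursion depth equals the number of queued writes, which is the overflow seen in the backtrace; with it, the first failure drains the queue in a bounded number of frames.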

Comment 2 Amar Tumballi 2012-10-11 10:03:10 UTC
http://review.gluster.org/3874 should help to fix this; it needs a rebase.

Comment 3 Amar Tumballi 2012-10-11 10:10:08 UTC
*** Bug 797742 has been marked as a duplicate of this bug. ***

Comment 4 Vijay Bellur 2013-02-18 07:33:52 UTC
CHANGE: http://review.gluster.org/4515 (performance/write-behind: mark fd bad if any written behind writes fail.) merged in master by Anand Avati (avati)

Comment 5 Raghavendra G 2013-02-22 07:28:37 UTC
*** Bug 890472 has been marked as a duplicate of this bug. ***

Comment 6 Vijay Bellur 2013-02-22 20:12:45 UTC
CHANGE: http://review.gluster.org/4559 (tests: move common funtion definitions to include.rc) merged in master by Anand Avati (avati)

Comment 7 Vijay Bellur 2013-02-28 23:22:14 UTC
CHANGE: http://review.gluster.org/4560 (performance/write-behind: Add test case for fd being marked bad after write failures.) merged in master by Anand Avati (avati)

Comment 8 Vijay Bellur 2013-03-07 05:13:23 UTC
CHANGE: http://review.gluster.org/4631 (tests: move common funtion definitions to include.rc) merged in release-3.4 by Anand Avati (avati)

Comment 9 Vijay Bellur 2013-03-07 05:18:15 UTC
CHANGE: http://review.gluster.org/4630 (performance/write-behind: mark fd bad if any written behind writes fail) merged in release-3.4 by Anand Avati (avati)

Comment 10 Vijay Bellur 2013-03-07 06:37:37 UTC
CHANGE: http://review.gluster.org/4632 (performance/write-behind: Add test case for fd being marked bad after write failures.) merged in release-3.4 by Vijay Bellur (vbellur)

