Bug 849125

Summary: [glusterfs-3.2.5qa1] glusterfs client process crashed
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: rdma
Version: 2.0
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Reporter: Vidya Sakar <vinaraya>
Assignee: Raghavendra G <rgowdapp>
QA Contact: Ujjwala <ujjwala>
CC: aavati, amarts, gluster-bugs, rfortier, sdharane, vbhat
Doc Type: Bug Fix
Clone Of: GLUSTER-3741
Last Closed: 2012-10-17 12:08:54 UTC
Bug Depends On: 765473

Description Vidya Sakar 2012-08-17 11:45:51 UTC
+++ This bug was initially created as a clone of Bug #765473 +++

Created a volume with the rdma transport type and ran sanity scripts. The client process crashed with the following backtrace:

#0  0x00007f4911d266ca in wb_sync_cbk (frame=0x7f491416a848, cookie=0x7f49143d51d4, this=0xcfcd20, op_ret=-1, op_errno=116, prebuf=0x0, postbuf=0x0) at write-behind.c:375
#1  0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d51d4, cookie=0x7f49143d5278, this=0xd08da0, op_ret=-1, op_errno=116, prebuf=0x0, postbuf=0x0) at dht-common.c:2670
#2  0x00007f491219e49c in client3_1_writev (frame=0x7f49143d5278, this=0xd07f00, data=0x7fff24c43f80) at client3_1-fops.c:3623
#3  0x00007f4912187f09 in client_writev (frame=0x7f49143d5278, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26b5870, count=1, off=9891840, iobref=0x26b5920) at client.c:817
#4  0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d51d4, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26b5870, count=1, off=9891840, iobref=0x26b5920) at dht-common.c:2706
#5  0x00007f4911d27128 in wb_sync (frame=0x7f491416a378, file=0x7f4905dadf10, winds=0x7fff24c442d0) at write-behind.c:548
#6  0x00007f4911d2d29e in wb_do_ops (frame=0x7f491416a378, file=0x7f4905dadf10, winds=0x7fff24c442d0, unwinds=0x7fff24c442c0, other_requests=0x7fff24c442b0) at write-behind.c:1859
#7  0x00007f4911d2db12 in wb_process_queue (frame=0x7f491416a378, file=0x7f4905dadf10) at write-behind.c:2048
#8  0x00007f4911d26859 in wb_sync_cbk (frame=0x7f491416a378, cookie=0x7f49143d508c, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c444d0, postbuf=0x7fff24c44460) at write-behind.c:405
#9  0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d508c, cookie=0x7f49143d5130, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c444d0, postbuf=0x7fff24c44460) at dht-common.c:2670
#10 0x00007f491219297f in client3_1_writev_cbk (req=0x7f4911085094, iov=0x0, count=0, myframe=0x7f49143d5130) at client3_1-fops.c:685
#11 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c44820, proghdrcount=1, progpayload=0x26a5f30, progpayloadcount=1, iobref=0x26b57b0,
    frame=0x7f49143d5130, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#12 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c448c0, frame=0x7f49143d5130, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x26a5f30, payloadcnt=1,
    iobref=0x26b5070, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#13 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d5130, this=0xd07f00, data=0x7fff24c44980) at client3_1-fops.c:3613
#14 0x00007f4912187f09 in client_writev (frame=0x7f49143d5130, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26a5f30, count=1, off=7335936, iobref=0x26b5070) at client.c:817
#15 0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d508c, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26a5f30, count=1, off=7335936, iobref=0x26b5070) at dht-common.c:2706
#16 0x00007f4911d27128 in wb_sync (frame=0x7f491416a5e0, file=0x7f4905dadf10, winds=0x7fff24c44cd0) at write-behind.c:548
#17 0x00007f4911d2d29e in wb_do_ops (frame=0x7f491416a5e0, file=0x7f4905dadf10, winds=0x7fff24c44cd0, unwinds=0x7fff24c44cc0, other_requests=0x7fff24c44cb0) at write-behind.c:1859
#18 0x00007f4911d2db12 in wb_process_queue (frame=0x7f491416a5e0, file=0x7f4905dadf10) at write-behind.c:2048
#19 0x00007f4911d26859 in wb_sync_cbk (frame=0x7f491416a5e0, cookie=0x7f49143d4f44, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c44ed0, postbuf=0x7fff24c44e60) at write-behind.c:405
#20 0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d4f44, cookie=0x7f49143d4fe8, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c44ed0, postbuf=0x7fff24c44e60) at dht-common.c:2670
#21 0x00007f491219297f in client3_1_writev_cbk (req=0x7f4911084e50, iov=0x0, count=0, myframe=0x7f49143d4fe8) at client3_1-fops.c:685
#22 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c45220, proghdrcount=1, progpayload=0x26a5680, progpayloadcount=1, iobref=0x26a5e70,
    frame=0x7f49143d4fe8, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#23 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c452c0, frame=0x7f49143d4fe8, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x26a5680, payloadcnt=1,
    iobref=0x26a5730, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#24 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d4fe8, this=0xd07f00, data=0x7fff24c45380) at client3_1-fops.c:3613
#25 0x00007f4912187f09 in client_writev (frame=0x7f49143d4fe8, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26a5680, count=1, off=2670592, iobref=0x26a5730) at client.c:817
#26 0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d4f44, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26a5680, count=1, off=2670592, iobref=0x26a5730) at dht-common.c:2706
#27 0x00007f4911d27128 in wb_sync (frame=0x7f4914169fdc, file=0x7f4905dadf10, winds=0x7fff24c456d0) at write-behind.c:548
#28 0x00007f4911d2d29e in wb_do_ops (frame=0x7f4914169fdc, file=0x7f4905dadf10, winds=0x7fff24c456d0, unwinds=0x7fff24c456c0, other_requests=0x7fff24c456b0) at write-behind.c:1859
#29 0x00007f4911d2db12 in wb_process_queue (frame=0x7f4914169fdc, file=0x7f4905dadf10) at write-behind.c:2048
#30 0x00007f4911d26859 in wb_sync_cbk (frame=0x7f4914169fdc, cookie=0x7f49143d4dfc, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c458d0, postbuf=0x7fff24c45860) at write-behind.c:405
#31 0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d4dfc, cookie=0x7f49143d4ea0, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c458d0, postbuf=0x7fff24c45860) at dht-common.c:2670
#32 0x00007f491219297f in client3_1_writev_cbk (req=0x7f4911084c0c, iov=0x0, count=0, myframe=0x7f49143d4ea0) at client3_1-fops.c:685
#33 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c45c20, proghdrcount=1, progpayload=0x26a1d50, progpayloadcount=1, iobref=0x26a1f30,
    frame=0x7f49143d4ea0, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#34 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c45cc0, frame=0x7f49143d4ea0, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x26a1d50, payloadcnt=1,
    iobref=0x26a1e00, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#35 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d4ea0, this=0xd07f00, data=0x7fff24c45d80) at client3_1-fops.c:3613
#36 0x00007f4912187f09 in client_writev (frame=0x7f49143d4ea0, this=0xd07f00, fd=0x7f490844b0d4, vector=0x26a1d50, count=1, off=5947392, iobref=0x26a1e00) at client.c:817
#37 0x00007f4911f5f0f9 in dht_writev (frame=0x7f49143d4dfc, this=0xd08da0, fd=0x7f490844b0d4, vector=0x26a1d50, count=1, off=5947392, iobref=0x26a1e00) at dht-common.c:2706
#38 0x00007f4911d27128 in wb_sync (frame=0x7f4914169d74, file=0x7f4905dadf10, winds=0x7fff24c460d0) at write-behind.c:548
#39 0x00007f4911d2d29e in wb_do_ops (frame=0x7f4914169d74, file=0x7f4905dadf10, winds=0x7fff24c460d0, unwinds=0x7fff24c460c0, other_requests=0x7fff24c460b0) at write-behind.c:1859
#40 0x00007f4911d2db12 in wb_process_queue (frame=0x7f4914169d74, file=0x7f4905dadf10) at write-behind.c:2048
#41 0x00007f4911d26859 in wb_sync_cbk (frame=0x7f4914169d74, cookie=0x7f49143d4cb4, this=0xd0a010, op_ret=-1, op_errno=107, prebuf=0x7fff24c462d0, postbuf=0x7fff24c46260) at write-behind.c:405
#42 0x00007f4911f5ec4c in dht_writev_cbk (frame=0x7f49143d4cb4, cookie=0x7f49143d4d58, this=0xd08da0, op_ret=-1, op_errno=107, prebuf=0x7fff24c462d0, postbuf=0x7fff24c46260) at dht-common.c:2670
#43 0x00007f491219297f in client3_1_writev_cbk (req=0x7f49110849c8, iov=0x0, count=0, myframe=0x7f49143d4d58) at client3_1-fops.c:685
#44 0x00007f49150f78ea in rpc_clnt_submit (rpc=0xd1acd0, prog=0x7f49123b5260, procnum=13, cbkfn=0x7f4912192639 <client3_1_writev_cbk>, proghdr=0x7fff24c46620, proghdrcount=1, progpayload=0x269ddc0, progpayloadcount=1, iobref=0x26a1c90,
    frame=0x7f49143d4d58, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1450
#45 0x00007f4912190148 in client_submit_vec_request (this=0xd07f00, req=0x7fff24c466c0, frame=0x7f49143d4d58, prog=0x7f49123b5260, procnum=13, cbk=0x7f4912192639 <client3_1_writev_cbk>, payload=0x269ddc0, payloadcnt=1,
    iobref=0x269de70, sfunc=0x7f4914ed8a6b <xdr_from_writev_req>) at client3_1-fops.c:95
#46 0x00007f491219e30f in client3_1_writev (frame=0x7f49143d4d58, this=0xd07f00, data=0x7fff24c46780) at client3_1-fops.c:3613
#47 0x00007f4912187f09 in client_writev (frame=0x7f49143d4d58, this=0xd07f00, fd=0x7f490844b0d4, vector=0x269ddc0, count=1, off=3321856, iobref=0x269de70) at client.c:817


There are more frames in the core file.

I have attached the client log and archived the core file.

--- Additional comment from raghavendra on 2011-10-24 00:30:25 EDT ---

It's a case of stack overflow. Since the transport is not connected, the write frame is unwound synchronously from protocol/client, which results in an indirect recursion through wb_process_queue. The number of queued write requests is large enough for this recursion to overflow the stack.
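
To make the failure mode concrete, below is a minimal, self-contained C sketch of that indirect recursion (assumptions: this is not GlusterFS source; queue_t, submit_write, write_done_cbk and process_queue are hypothetical names standing in for the wb_sync / wb_process_queue / wb_sync_cbk cycle in the backtrace). It shows how a synchronous failure in the wind path invokes the completion callback on the same stack, and how the callback re-enters queue processing, so the recursion depth grows with the length of the write queue.

/* Illustrative sketch only, not GlusterFS source. It mimics the cycle seen
 * in the backtrace: wb_sync_cbk -> wb_process_queue -> wb_sync -> ... ->
 * wb_sync_cbk. All identifiers below are hypothetical. */
#include <stdio.h>

typedef struct {
    int pending;   /* queued write requests waiting to be synced */
    int depth;     /* current call depth, tracked for demonstration */
    int max_depth; /* deepest point reached */
} queue_t;

static void process_queue(queue_t *q);

/* Completion callback: on failure it re-enters queue processing directly,
 * on the same call stack. */
static void write_done_cbk(queue_t *q, int op_ret)
{
    if (op_ret < 0)
        process_queue(q);
}

/* With a disconnected transport the request fails immediately, so the
 * callback runs before submit_write() returns instead of later from the
 * event loop. This is what turns the callback cycle into real recursion. */
static void submit_write(queue_t *q)
{
    int transport_connected = 0;   /* the condition behind this bug */

    if (!transport_connected)
        write_done_cbk(q, -1);
}

static void process_queue(queue_t *q)
{
    q->depth++;
    if (q->depth > q->max_depth)
        q->max_depth = q->depth;

    if (q->pending > 0) {
        q->pending--;
        submit_write(q);           /* wind the next request */
    }

    q->depth--;
}

int main(void)
{
    /* 1000 requests already produce a roughly 1000-deep recursion; a backlog
     * of hundreds of thousands of writes overflows the thread stack, which is
     * the crash reported here. */
    queue_t q = { .pending = 1000, .depth = 0, .max_depth = 0 };

    process_queue(&q);
    printf("recursion depth reached %d for %d queued writes\n",
           q.max_depth, 1000);
    return 0;
}

Compiled with "gcc -o wb-recursion wb-recursion.c" and run, the sketch reports a recursion depth that grows linearly with the number of queued writes. A common way to break such a cycle is to fail queued requests iteratively (for example by draining the queue in a loop, or deferring the unwinds to the event loop) instead of re-entering the queue-processing function from the callback.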

Comment 2 Amar Tumballi 2012-08-23 06:45:01 UTC
This bug is not seen in the current master branch (which will be branched as RHS 2.1.0 soon). Before considering it for a fix, we want to confirm that the bug still exists on RHS servers. If it cannot be reproduced, we would like to close it.

Comment 3 Vidya Sakar 2012-10-17 12:08:54 UTC
Closing as WORKSFORME; please reopen if the issue is seen in RHS.