Bug 763665 (GLUSTER-1933)

Summary: Segfault while expansion of volume from distributed mirror
Product: [Community] GlusterFS
Reporter: Harshavardhana <fharshav>
Component: core
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: urgent
Version: 3.1.0
CC: anush, cww, gluster-bugs, jacob, lakshmipathi, rabhat, vijay, vraman
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Harshavardhana 2010-10-12 19:14:37 UTC
(gdb) bt
#0  0x00007fd7ade49659 in io_stats_writev_cbk (frame=0x7fd7afc3f8e8, cookie=0x7fd7afc3f96c, 
    this=0x692128, op_ret=-1, op_errno=11, prebuf=0x7fd79c02dab0, postbuf=0x7fd79c02db20)
    at io-stats.c:521
#1  0x00007fd7ae05cee0 in qr_writev_cbk (frame=0x7fd7afc3f96c, cookie=0x7fd7afc3ff9c, this=0x690f48, 
    op_ret=-1, op_errno=11, prebuf=0x7fd79c02dab0, postbuf=0x7fd79c02db20) at quick-read.c:1163
#2  0x00007fd7ae26ff2f in ioc_writev_cbk (frame=0x7fd7afc3ff9c, cookie=0x7fd7afc40c80, 
    this=0x68fe08, op_ret=-1, op_errno=11, prebuf=0x7fd79c02dab0, postbuf=0x7fd79c02db20)
    at io-cache.c:1186
#3  0x00007fd7ae47ec42 in ra_writev_cbk (frame=0x7fd7afc40c80, cookie=0x7fd7afc428dc, this=0x68ebe8, 
    op_ret=-1, op_errno=11, prebuf=0x7fd79c02dab0, postbuf=0x7fd79c02db20) at read-ahead.c:628
#4  0x00007fd7ae68d09b in wb_writev_cbk (frame=0x7fd7afc428dc, cookie=0x7fd7afc42960, this=0x68d9c8, 
    op_ret=-1, op_errno=11, prebuf=0x7fd79c02dab0, postbuf=0x7fd79c02db20) at write-behind.c:1939
#5  0x00007fd7ae8b5ca6 in dht_writev_cbk (frame=0x7fd7afc42960, cookie=0x7fd7afc3d3c8, 
    this=0x68c7b8, op_ret=-1, op_errno=11, prebuf=0x7fd79c02dab0, postbuf=0x7fd79c02db20)
    at dht-common.c:2251
#6  0x00007fd7aeaeb2d5 in afr_writev_unwind (frame=0x7fd7afa13598, this=0x68aa88)
    at afr-inode-write.c:72
#7  0x00007fd7aeaeba9c in afr_writev_done (frame=0x7fd7afa13598, this=0x68aa88)
    at afr-inode-write.c:185
#8  0x00007fd7aeaf8797 in afr_post_blocking_inodelk_cbk (frame=0x7fd7afa13598, this=0x68aa88)
    at afr-transaction.c:907
#9  0x00007fd7aeb141c4 in afr_unlock_inodelk (frame=0x7fd7afa13598, this=0x68aa88)
    at afr-lk-common.c:601
#10 0x00007fd7aeb18a61 in afr_unlock (frame=0x7fd7afa13598, this=0x68aa88) at afr-lk-common.c:1666
#11 0x00007fd7aeb1583d in afr_lock_blocking (frame=0x7fd7afa13598, this=0x68aa88, child_index=2)
    at afr-lk-common.c:973
#12 0x00007fd7aeb14dbc in afr_lock_cbk (frame=0x7fd7afa13598, cookie=0x1, this=0x68aa88, op_ret=-1, 
    op_errno=107) at afr-lk-common.c:756
#13 0x00007fd7aeb14e2f in afr_blocking_inodelk_cbk (frame=0x7fd7afa13598, cookie=0x1, this=0x68aa88, 
    op_ret=-1, op_errno=107) at afr-lk-common.c:770
#14 0x00007fd7aed4eff0 in client3_1_finodelk_cbk (req=0x7fd7a91fb6e4, iov=0x7fd7af96cde0, count=1, 
    myframe=0x7fd7afc3f444) at client3_1-fops.c:1084
#15 0x00007fd7b1062bea in saved_frames_unwind (saved_frames=0x6a1638) at rpc-clnt.c:342
#16 0x00007fd7b1062ca2 in saved_frames_destroy (frames=0x6a1638) at rpc-clnt.c:358
#17 0x00007fd7b1063207 in rpc_clnt_connection_cleanup (conn=0x6a22a8) at rpc-clnt.c:506
#18 0x00007fd7b1065657 in rpc_clnt_destroy (rpc=0x6a2278) at rpc-clnt.c:1479
#19 0x00007fd7b106573d in rpc_clnt_unref (rpc=0x6a2278) at rpc-clnt.c:1507
#20 0x00007fd7aed490a9 in fini (this=0x687f68) at client.c:2036
#21 0x000000000040440f in cleanup_and_exit ()
#22 0x000000000040598f in glusterfs_sigwaiter ()
#23 0x00007fd7b0a1f85a in start_thread () from /lib64/libpthread.so.0
#24 0x00007fd7b078922d in clone () from /lib64/libc.so.6
#25 0x0000000000000000 in ?? ()

(gdb) p *frame
$2 = {root = 0x7fd7afa10c18, parent = 0x7fd7afa10ca0, next = 0x0, prev = 0x7fd7afc3f96c, 
  local = 0x0, this = 0x692128, ret = 0x7fd7adc005d6 <nfs_fop_writev_cbk>, ref_count = 0, lock = 1, 
  cookie = 0x692128, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, 
  end = {tv_sec = 0, tv_usec = 0}}
(gdb)

Comment 1 Amar Tumballi 2010-10-13 02:31:16 UTC
This happened in the 'cleanup_and_exit()' path, i.e., a SIGTERM had already been issued to the process from glusterd. Hence it is not considered a blocker at the moment. There are several known crashes in the 'cleanup_and_exit()' path; we are taking them up as part of the 3.1.1 release. No need to panic.

Comment 2 Harshavardhana 2010-10-13 02:50:36 UTC
(In reply to comment #1)
> This happened in the 'cleanup_and_exit()' path, i.e., a SIGTERM had already
> been issued to the process from glusterd. Hence it is not considered a
> blocker at the moment. There are several known crashes in the
> 'cleanup_and_exit()' path; we are taking them up as part of the 3.1.1
> release. No need to panic.

Is that a rare case? In the scenario where we encountered this, the Platform node was completely hosed, and access to the mount point hung from the NFS client.

Comment 3 Junaid 2010-10-18 10:40:14 UTC
*** Bug 1908 has been marked as a duplicate of this bug. ***

Comment 4 Junaid 2010-10-18 10:41:26 UTC
*** Bug 1892 has been marked as a duplicate of this bug. ***

Comment 5 Anand Avati 2010-10-26 07:01:59 UTC
PATCH: http://patches.gluster.com/patch/5501 in master (io-stats: handle the case of 'cleanup_and_exit()' properly)

Comment 6 Lakshmipathi G 2010-11-20 06:38:01 UTC
Verified with 3.1.1qa9.