Bug 1341452 - gluster coredump when multiple CLI commands are called on an unstable network
Summary: gluster coredump when multiple CLI commands are called on an unstable network
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.6.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-01 05:52 UTC by George
Modified: 2016-08-01 04:43 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-08-01 04:42:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description George 2016-06-01 05:52:47 UTC
Description of problem:

glusterd crashed and produced a core dump.
Version-Release number of selected component (if applicable):


How reproducible:

Run "gluster volume heal vol_name info" while the network is unstable.
Steps to Reproduce:
1. Repeatedly run the CLI command "gluster volume heal vol_name info".
2. Make the network between the replicated VMs unstable.
3. Sometimes glusterd fails with a core dump.
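Step 1 can be automated with a small driver. This is a hedged sketch, not part of the original report: run_repeatedly is a hypothetical helper that hands the command string to the shell via system() and counts the runs that do not exit with status 0. The network disruption in step 2 still has to be induced separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper for step 1: runs `cmd` through the shell
 * `iterations` times and returns how many runs did not exit with status 0
 * (including runs where the shell could not be started at all). */
static int run_repeatedly(const char *cmd, int iterations)
{
    assert(cmd != NULL && iterations >= 0);
    int failures = 0;
    for (int i = 0; i < iterations; i++) {
        int status = system(cmd);
        /* system() returns -1 on error, otherwise a wait status. */
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

For example, calling run_repeatedly("gluster volume heal vol_name info", 1000) from a main() while the network is being disrupted repeats step 1; whether glusterd actually crashed still has to be confirmed from the core dump and the glusterd log.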

Actual results:


Expected results:
glusterd exits with a failure and writes a core dump.
The backtrace from the core dump is listed below:

#0  0x00007f5c9dd38177 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f5c9b16d700 (LWP 5032))]
(gdb) bt
#0  0x00007f5c9dd38177 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f5c9dd395fa in __GI_abort () at abort.c:89
#2  0x00007f5c9dd3115d in __assert_fail_base (fmt=0x7f5c9de68768 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f5c9a7397c8 "iov",
    file=file@entry=0x7f5c9a73970c "glusterd-syncop.c", line=line@entry=417, function=function@entry=0x7f5c9a73a080 "gd_syncop_mgmt_v3_unlock_cbk_fn")
    at assert.c:92
#3  0x00007f5c9dd31212 in __GI___assert_fail (assertion=0x7f5c9a7397c8 "iov", file=0x7f5c9a73970c "glusterd-syncop.c", line=417,
    function=0x7f5c9a73a080 "gd_syncop_mgmt_v3_unlock_cbk_fn") at assert.c:101
#4  0x00007f5c9a6e3a22 in gd_syncop_mgmt_v3_unlock_cbk_fn () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#5  0x00007f5c9a68c3a8 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#6  0x00007f5c9a6e3b8d in gd_syncop_mgmt_v3_unlock_cbk () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#7  0x00007f5c9eb498ed in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
#8  0x00007f5c9a6e331a in gd_syncop_submit_request () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#9  0x00007f5c9a6e3cfc in gd_syncop_mgmt_v3_unlock () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#10 0x00007f5c9a6e618f in gd_unlock_op_phase () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#11 0x00007f5c9a6e6dd6 in gd_sync_task_begin () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#12 0x00007f5c9a6e6f47 in glusterd_op_begin_synctask () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#13 0x00007f5c9a646ede in __glusterd_handle_set_volume () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#14 0x00007f5c9a6424ec in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#15 0x00007f5c9a646fc9 in glusterd_handle_set_volume () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#16 0x00007f5c9edac062 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#17 0x00007f5c9dd48ee0 in ?? () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()
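Frames #0 to #3 are the generic C assert path: __assert_fail prints the message and calls abort(), which raises signal 6 (SIGABRT on Linux) and produces the core dump. The sketch below reproduces that chain in isolation, assuming, as the __assert_fail frame indicates, that GF_ASSERT(iov) behaved like assert(iov) in this build. check_iov and signal_from_check are hypothetical names; the check runs in a forked child so the caller can observe the terminating signal.

```c
#define _POSIX_C_SOURCE 200809L
#undef NDEBUG              /* keep assert() active even in -DNDEBUG builds */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the failing check at glusterd-syncop.c:417,
 * assuming GF_ASSERT(iov) expands to assert(iov) as the backtrace shows. */
static void check_iov(const void *iov)
{
    assert(iov);
}

/* Runs check_iov(iov) in a child process. Returns the signal that killed
 * the child, 0 if the child exited normally, or -1 on fork/wait errors. */
static int signal_from_check(const void *iov)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: disable core files so the demo leaves nothing behind. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);
        check_iov(iov);   /* a NULL iov fails the assertion and aborts */
        _exit(0);         /* reached only when the assertion holds */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

signal_from_check(NULL) returns SIGABRT, matching sig=6 in frame #0, while any non-NULL pointer returns 0.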



Additional info:
After investigating the error log and the core dump backtrace, the root cause appears to be the following:

1) The log contains this warning:
[2016-05-31 04:30:01.400773] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-management: failed to submit rpc-request (XID: 0x9 Program: glusterd mgmt v3, ProgVers: 3,        Proc: 6) to rpc-transport (management)

2) When rpc_clnt_submit fails for any reason, it reaches the following line (line 1597 of rpc-clnt.c):
    cbkfn (rpcreq, NULL, 0, frame);

3) Here cbkfn is gd_syncop_mgmt_v3_unlock_cbk_fn, which in source file "glusterd-syncop.c", line 417, contains:
    GF_ASSERT(iov);

4) iov is the second parameter, and cbkfn is called with NULL for it, so the assertion fails and glusterd aborts with a core dump.
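The failure path in items 2) to 4) can be modeled in a few lines of C. This is a simplified sketch, not GlusterFS code: struct model_req, model_submit and model_unlock_cbk are hypothetical names, and the rpc_status field is an assumed way of recording that submission failed. It shows that a callback reached from this error path must tolerate iov == NULL and count == 0, and one defensive alternative to GF_ASSERT(iov): return an error (ENOTCONN and EINVAL are illustrative choices) so the caller can unwind instead of aborting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, heavily simplified stand-ins; not the real GlusterFS types. */
struct model_req {
    int rpc_status;   /* assumption: -1 marks a request that was never sent */
};

typedef int (*model_cbk_t)(struct model_req *req, struct iovec *iov,
                           int count, void *frame);

/* Models the rpc_clnt_submit error path from item 2: if the transport
 * refuses the request, the callback still runs, with iov == NULL and
 * count == 0. On success the real callback runs later, when the reply
 * arrives; here it is called directly to keep the sketch synchronous. */
static int model_submit(struct model_req *req, int transport_ok,
                        struct iovec *reply, model_cbk_t cbkfn, void *frame)
{
    assert(req != NULL && cbkfn != NULL);
    if (!transport_ok) {
        req->rpc_status = -1;
        cbkfn(req, NULL, 0, frame);
        return -1;
    }
    req->rpc_status = 0;
    cbkfn(req, reply, 1, frame);
    return 0;
}

/* Defensive unlock callback: instead of asserting on iov, it records an
 * error code (stored through `frame` in this model) and returns -1. */
static int model_unlock_cbk(struct model_req *req, struct iovec *iov,
                            int count, void *frame)
{
    int *op_errno = frame;
    if (req->rpc_status == -1) {      /* request never reached the peer */
        *op_errno = ENOTCONN;
        return -1;
    }
    if (iov == NULL || count < 1) {   /* no reply payload to decode */
        *op_errno = EINVAL;
        return -1;
    }
    *op_errno = 0;                    /* a real callback would decode iov */
    return 0;
}
```

With this shape, a connection lost during the unlock phase becomes an error the caller can handle rather than a crash; how glusterd should then release any remaining locks is outside this sketch.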

Comment 1 Atin Mukherjee 2016-08-01 04:42:27 UTC
This is not a security bug, and we are not going to fix it in 3.6.x because of
http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html

Comment 2 Atin Mukherjee 2016-08-01 04:43:47 UTC
If the issue persists in the latest releases, please feel free to clone this bug.

