Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1341452

Summary: gluster coredump when multiple CLI command called at unstable network
Product: [Community] GlusterFS
Component: glusterd
Version: 3.6.9
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Reporter: George <george.lian>
Assignee: bugs <bugs>
CC: bugs, george.lian
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Type: Bug
Last Closed: 2016-08-01 04:42:27 UTC

Description George 2016-06-01 05:52:47 UTC
Description of problem:

glusterd failed with a coredump.

Version-Release number of selected component (if applicable):

3.6.9


How reproducible:

Intermittent; run "gluster volume heal vol_name info" while the network is unstable.

Steps to Reproduce:
1. Repeatedly run the CLI command "gluster volume heal vol_name info".
2. Make the network unstable between the replicated VMs.
3. Sometimes glusterd fails with a coredump.

Actual results:

glusterd exits with a failure and dumps core; the backtrace of the coredump is listed below.

Expected results:

The CLI command should complete or fail gracefully without glusterd dumping core.

#0  0x00007f5c9dd38177 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f5c9b16d700 (LWP 5032))]
(gdb) bt
#0  0x00007f5c9dd38177 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f5c9dd395fa in __GI_abort () at abort.c:89
#2  0x00007f5c9dd3115d in __assert_fail_base (fmt=0x7f5c9de68768 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f5c9a7397c8 "iov",
    file=file@entry=0x7f5c9a73970c "glusterd-syncop.c", line=line@entry=417, function=function@entry=0x7f5c9a73a080 "gd_syncop_mgmt_v3_unlock_cbk_fn")
    at assert.c:92
#3  0x00007f5c9dd31212 in __GI___assert_fail (assertion=0x7f5c9a7397c8 "iov", file=0x7f5c9a73970c "glusterd-syncop.c", line=417,
    function=0x7f5c9a73a080 "gd_syncop_mgmt_v3_unlock_cbk_fn") at assert.c:101
#4  0x00007f5c9a6e3a22 in gd_syncop_mgmt_v3_unlock_cbk_fn () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#5  0x00007f5c9a68c3a8 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#6  0x00007f5c9a6e3b8d in gd_syncop_mgmt_v3_unlock_cbk () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#7  0x00007f5c9eb498ed in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
#8  0x00007f5c9a6e331a in gd_syncop_submit_request () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#9  0x00007f5c9a6e3cfc in gd_syncop_mgmt_v3_unlock () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#10 0x00007f5c9a6e618f in gd_unlock_op_phase () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#11 0x00007f5c9a6e6dd6 in gd_sync_task_begin () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#12 0x00007f5c9a6e6f47 in glusterd_op_begin_synctask () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#13 0x00007f5c9a646ede in __glusterd_handle_set_volume () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#14 0x00007f5c9a6424ec in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#15 0x00007f5c9a646fc9 in glusterd_handle_set_volume () from /usr/lib64/glusterfs/3.6.9/xlator/mgmt/glusterd.so
#16 0x00007f5c9edac062 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#17 0x00007f5c9dd48ee0 in ?? () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()



Additional info:
After some investigation of the error log and the backtrace of the coredump, the root cause seems clear, as below FYI:

1) There is a warning log:
[2016-05-31 04:30:01.400773] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-management: failed to submit rpc-request (XID: 0x9 Program: glusterd mgmt v3, ProgVers: 3, Proc: 6) to rpc-transport (management)

2) When rpc_clnt_submit fails for some reason, it falls through to the following line (file rpc-clnt.c):
01597                         cbkfn (rpcreq, NULL, 0, frame);

3) The cbkfn here is gd_syncop_mgmt_v3_unlock_cbk_fn, and line 417 of the source file "glusterd-syncop.c" reads:
0417         GF_ASSERT(iov);

4) iov is the second parameter, which cbkfn is called with as NULL, so the assertion fails and glusterd dumps core.

Comment 1 Atin Mukherjee 2016-08-01 04:42:27 UTC
This is not a security bug, not going to fix this in 3.6.x because of
http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html

Comment 2 Atin Mukherjee 2016-08-01 04:43:47 UTC
If the issue persists in the latest releases, please feel free to clone this bug against them.