Bug 783913

Summary: [glusterfs-3.2.6qa1]: glusterd crashed when parallel operations were executed
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: glusterd
Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: gluster-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 815038 (view as bug list)
Environment:
Last Closed: 2013-07-24 18:04:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.2.6qa2
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 811632, 815038, 817967

Description Raghavendra Bhat 2012-01-23 08:40:27 UTC
Description of problem:
On a GlusterFS peer, volume set commands were being executed in a loop. On the same machine, a command was run to remove the quota limit set on a directory, followed by another command to set a limit on the same directory again. glusterd crashed with the backtrace below.

Core was generated by `glusterd'.
Program terminated with signal 6, Aborted.
#0  0x00000034c2232905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x00000034c2232905 in raise () from /lib64/libc.so.6
#1  0x00000034c22340e5 in abort () from /lib64/libc.so.6
#2  0x00000034c222b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000034c222ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fc2e1d04042 in glusterd_op_send_cli_response (op=GD_OP_SET_VOLUME, op_ret=16, op_errno=0, req=0x7fc2e1bfe560, op_ctx=0x0, 
    op_errstr=0x7fc2e1d1ed17 "operation failed") at ../../../../../xlators/mgmt/glusterd/src/glusterd-rpc-ops.c:61
#5  0x00007fc2e1cd513c in glusterd_handle_set_volume (req=0x7fc2e1bfe560) at ../../../../../xlators/mgmt/glusterd/src/glusterd-handler.c:1975
#6  0x00007fc2e3982890 in rpcsvc_handle_rpc_call (svc=0x6ada90, trans=0x6b2c40, msg=0x6b5dc0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:480
#7  0x00007fc2e3982c33 in rpcsvc_notify (trans=0x6b2c40, mydata=0x6ada90, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x6b5dc0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:576
#8  0x00007fc2e39886ac in rpc_transport_notify (this=0x6b2c40, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x6b5dc0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:919
#9  0x00007fc2e19f2af9 in socket_event_poll_in (this=0x6b2c40) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1647
#10 0x00007fc2e19f307d in socket_event_handler (fd=15, idx=9, data=0x6b2c40, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1762
#11 0x00007fc2e3be57e8 in event_dispatch_epoll_handler (event_pool=0x6a63f0, events=0x6b0cb0, i=0) at ../../../libglusterfs/src/event.c:794
#12 0x00007fc2e3be5a0b in event_dispatch_epoll (event_pool=0x6a63f0) at ../../../libglusterfs/src/event.c:856
#13 0x00007fc2e3be5d96 in event_dispatch (event_pool=0x6a63f0) at ../../../libglusterfs/src/event.c:956
#14 0x000000000040700c in main (argc=1, argv=0x7fff0dabbf18) at ../../../glusterfsd/src/glusterfsd.c:1509
(gdb) f 4
#4  0x00007fc2e1d04042 in glusterd_op_send_cli_response (op=GD_OP_SET_VOLUME, op_ret=16, op_errno=0, req=0x7fc2e1bfe560, op_ctx=0x0, 
    op_errstr=0x7fc2e1d1ed17 "operation failed") at ../../../../../xlators/mgmt/glusterd/src/glusterd-rpc-ops.c:61
61              GF_ASSERT (op_ctx);
(gdb) p op_ctx
$1 = (void *) 0x0
(gdb) l
56              void            *cli_rsp = NULL;
57              dict_t          *ctx = NULL;
58              char            *free_ptr = NULL;
59              glusterd_conf_t *conf = NULL;
60
61              GF_ASSERT (op_ctx);
62              GF_ASSERT (THIS);
63
64              conf = THIS->private;
65
(gdb) up
#5  0x00007fc2e1cd513c in glusterd_handle_set_volume (req=0x7fc2e1bfe560) at ../../../../../xlators/mgmt/glusterd/src/glusterd-handler.c:1975
1975                    ret = glusterd_op_send_cli_response (cli_op, ret, 0, req,
(gdb) l
1970            glusterd_op_sm ();
1971
1972            if (ret) {
1973                    if (dict)
1974                            dict_unref (dict);
1975                    ret = glusterd_op_send_cli_response (cli_op, ret, 0, req,
1976                                                         NULL, "operation failed");
1977                    if (!lock_fail)
1978                            (void) glusterd_opinfo_unlock ();
1979            }
(gdb) l glusterd_handle_set_volume
1885            return ret;
1886    }
1887
1888    int
1889    glusterd_handle_set_volume (rpcsvc_request_t *req)
1890    {
1891            int32_t                         ret = -1;
1892            gf1_cli_set_vol_req             cli_req = {0,};
1893            dict_t                          *dict = NULL;
1894            int                             lock_fail = 0;
(gdb) 
1895            glusterd_op_t                   cli_op = GD_OP_SET_VOLUME;
1896            char                            *key = NULL;
1897            char                            *value = NULL;
1898            char                            *volname = NULL;
1899
1900            GF_ASSERT (req);
1901
1902            ret = glusterd_op_set_cli_op (cli_op);
1903            if (ret) {
1904                    gf_log ("", GF_LOG_ERROR, "Unable to set cli op: %d",
(gdb) l
1905                            ret);
1906                    lock_fail = 1;
1907                    goto out;
1908            }
1909
1910            ret = -1;
1911            if (!gf_xdr_to_cli_set_vol_req (req->msg[0], &cli_req)) {
1912                    //failed to decode msg;
1913                    req->rpc_err = GARBAGE_ARGS;
1914                    goto out;
(gdb)  p ret
$1 = 16
(gdb) p cli_op
$2 = GD_OP_SET_VOLUME
(gdb) p lock_fail
$3 = 1
(gdb) 
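
Reading the frames together: glusterd_op_set_cli_op () returned 16 (EBUSY on Linux), most likely because the cluster-wide operation lock was still held by the looping volume set command, so the handler set lock_fail and jumped to its error path. That path calls glusterd_op_send_cli_response () with op_ctx = NULL (frame #5), and the unconditional GF_ASSERT (op_ctx) in frame #4 ends in abort (), which is the SIGABRT (signal 6) seen in the log below. The following is a minimal standalone sketch of that control flow, using simplified stand-in names for the glusterd functions involved; it is not glusterfs code, only an illustration of why the error path aborts the daemon.

/* Standalone sketch of the failure mode -- simplified stand-ins for
 * glusterd_op_set_cli_op (), glusterd_handle_set_volume () and
 * glusterd_op_send_cli_response (); this is not glusterfs code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Models the cluster-wide operation lock already held by the looping
 * "volume set" command running in parallel. */
static int op_in_progress = 1;

static int
set_cli_op (void)
{
        if (op_in_progress)
                return EBUSY;   /* 16 -- matches "Unable to set cli op: 16" */
        op_in_progress = 1;
        return 0;
}

static int
send_cli_response (int op_ret, void *op_ctx, const char *op_errstr)
{
        /* Models GF_ASSERT (op_ctx): a NULL context aborts the whole
         * daemon with SIGABRT instead of just failing this request. */
        assert (op_ctx != NULL);
        (void) op_ret;
        (void) op_errstr;
        /* ... serialize op_ctx and send the reply to the CLI ... */
        return 0;
}

static int
handle_set_volume (void)
{
        int ret = set_cli_op ();

        if (ret)
                /* The error path has no response dictionary, so it passes
                 * NULL -- exactly what frame #5 shows. */
                return send_cli_response (ret, NULL, "operation failed");

        /* ... normal request processing ... */
        return 0;
}

int
main (void)
{
        return handle_set_volume ();   /* dies with signal 6 (SIGABRT) */
}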





Additional info:

[2012-01-23 03:31:18.791635] I [glusterd-utils.c:2297:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2012-01-23 03:31:18.791816] I [glusterd-utils.c:2302:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2012-01-23 03:31:18.792040] I [glusterd-utils.c:2307:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2012-01-23 03:31:18.810502] I [glusterd-op-sm.c:6854:glusterd_op_ac_send_commit_op] 0-glusterd: Sent op req to 2 peers
[2012-01-23 03:31:18.810850] E [glusterd-handler.c:1905:glusterd_handle_set_volume] 0-: Unable to set cli op: 16
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-01-23 03:31:18
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.6qa1
/lib64/libc.so.6[0x34c2232980]
/lib64/libc.so.6(gsignal+0x35)[0x34c2232905]
/lib64/libc.so.6(abort+0x175)[0x34c22340e5]
/lib64/libc.so.6[0x34c222b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x34c222ba80]
/usr/local/lib/glusterfs/3.2.6qa1/xlator/mgmt/glusterd.so(glusterd_op_send_cli_response+0x8b)[0x7fc2e1d04042]
/usr/local/lib/glusterfs/3.2.6qa1/xlator/mgmt/glusterd.so(glusterd_handle_set_volume+0x4a0)[0x7fc2e1cd513c]
/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x344)[0x7fc2e3982890]
/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x181)[0x7fc2e3982c33]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130)[0x7fc2e39886ac]
/usr/local/lib/glusterfs/3.2.6qa1/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7fc2e19f2af9]
/usr/local/lib/glusterfs/3.2.6qa1/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7fc2e19f307d]
/usr/local/lib/libglusterfs.so.0(+0x4f7e8)[0x7fc2e3be57e8]
/usr/local/lib/libglusterfs.so.0(+0x4fa0b)[0x7fc2e3be5a0b]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7fc2e3be5d96]
glusterd(main+0x1b7)[0x40700c]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x34c221ecdd]
glusterd[0x403709]

Comment 1 Anand Avati 2012-01-24 16:37:09 UTC
CHANGE: http://review.gluster.com/2678 (mgmt/glusterd: do not assert if op_ctx is NULL) merged in release-3.2 by Anand Avati (avati)
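
The change title indicates the assertion was relaxed so that a NULL op_ctx is tolerated on this error path. As a rough sketch only, continuing the simplified model from the description above (this is not the actual patch; see http://review.gluster.com/2678 for the merged change), the guard would look something like:

/* Sketch of the guard the change title describes, applied to the
 * simplified model above: tolerate a missing context instead of
 * asserting on it. */
static int
send_cli_response (int op_ret, void *op_ctx, const char *op_errstr)
{
        if (op_ctx != NULL) {
                /* ... serialize op_ctx into the reply ... */
        }
        /* ... send op_ret / op_errstr back to the CLI either way, so a
         * busy glusterd returns an error instead of aborting ... */
        (void) op_ret;
        (void) op_errstr;
        return 0;
}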

Comment 2 Raghavendra Bhat 2012-02-13 05:56:03 UTC
This is fixed now. Parallel gluster CLI operations on the same machine no longer crash glusterd. Tested with glusterfs-3.2.6qa2.