Bug 799265 - [glusterfs-3.3.0qa25]: glustershd process asserted since the dictionary for sending the reply was NULL
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified Unspecified
Priority: high Severity: high
Assigned To: Pranith Kumar K
Depends On:
Blocks: 817967
Reported: 2012-03-02 05:18 EST by Raghavendra Bhat
Modified: 2015-12-01 11:45 EST (History)

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:09:04 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Raghavendra Bhat 2012-03-02 05:18:09 EST
Description of problem:

Same setup as bug 799262 (i.e. a 2-replica volume with 2 more bricks added to make it a 2x2 distributed-replicate volume), with 1 fuse client and 1 nfs client. On the fuse mount, ran fs-perf-test with 4444 fds as the argument. While the test was running, brought a brick down and brought it back up after some time, then ran gluster volume heal <volname> to trigger self-heal. glustershd crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /etc/'.
Program terminated with signal 6, Aborted.
#0  0x000000390f432905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x000000390f432905 in raise () from /lib64/libc.so.6
#1  0x000000390f4340e5 in abort () from /lib64/libc.so.6
#2  0x000000390f42b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000390f42ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000408d5b in glusterfs_xlator_op_response_send (req=0x1623d9c, op_ret=-1, msg=0x40e2d8 "", output=0x0)
    at ../../../glusterfsd/src/glusterfsd-mgmt.c:328
#5  0x000000000040a26d in glusterfs_handle_translator_op (data=0x1623d9c) at ../../../glusterfsd/src/glusterfsd-mgmt.c:731
#6  0x00007f87d3b18753 in synctask_wrap (old_task=0x17302a0) at ../../../libglusterfs/src/syncop.c:144
#7  0x000000390f443690 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) f 4
#4  0x0000000000408d5b in glusterfs_xlator_op_response_send (req=0x1623d9c, op_ret=-1, msg=0x40e2d8 "", output=0x0)
    at ../../../glusterfsd/src/glusterfsd-mgmt.c:328
328             GF_ASSERT (output);
(gdb) p output
$1 = (dict_t *) 0x0
(gdb) f 5
#5  0x000000000040a26d in glusterfs_handle_translator_op (data=0x1623d9c) at ../../../glusterfsd/src/glusterfsd-mgmt.c:731
731             glusterfs_xlator_op_response_send (req, ret, "", output);
(gdb) p output
$2 = (dict_t *) 0x0
(gdb) l glusterfs_handle_translator_op
651             return NULL;
652     }
653
654     int
655     glusterfs_handle_translator_op (void *data)
656     {
657             int32_t                  ret     = -1;
658             gd1_mgmt_brick_op_req    xlator_req = {0,};
659             dict_t                   *input    = NULL;
660             xlator_t                 *xlator = NULL;
(gdb) 
661             xlator_t                 *any = NULL;
662             dict_t                   *output = NULL;
663             char                     key[2048] = {0};
664             char                    *xname = NULL;
665             glusterfs_ctx_t          *ctx = NULL;
666             glusterfs_graph_t        *active = NULL;
667             xlator_t                 *this = NULL;
668             int                      i = 0;
669             int                      count = 0;
670             rpcsvc_request_t         *req = data;
(gdb) 
671
672             GF_ASSERT (req);
673             this = THIS;
674             GF_ASSERT (this);
675
676             if (!xdr_to_generic (req->msg[0], &xlator_req,
677                                  (xdrproc_t)xdr_gd1_mgmt_brick_op_req)) {
678                     //failed to decode msg;
679                     req->rpc_err = GARBAGE_ARGS;
680                     goto out;
(gdb) 
681             }
682
683             ctx = glusterfs_ctx_get ();
684             active = ctx->active;
685             any = active->first;
686             input = dict_new ();
687             ret = dict_unserialize (xlator_req.input.input_val,
688                                     xlator_req.input.input_len,
689                                     &input);
690             if (ret < 0) {
(gdb) 
691                     gf_log (this->name, GF_LOG_ERROR,
692                             "failed to "
693                             "unserialize req-buffer to dictionary");
694                     goto out;
695             } else {
696                     input->extra_stdfree = xlator_req.input.input_val;
697             }
698
699             ret = dict_get_int32 (input, "count", &count);
700
(gdb) 

 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replicate volume, start it and mount it.
2. Run fs-perf-test with number of fds as argument (4444 in this case)
3. Bring a brick down, bring it up after some time and trigger self-heal via gluster cli command
  
Actual results:

gluster self-heal daemon crashed.

Expected results:

gluster self-heal-daemon should not crash.

Additional info:

In glusterfs_handle_translator_op, if the dictionary sent by glusterd cannot be unserialized, the function jumps to out and calls glusterfs_xlator_op_response_send, which asserts that the output dictionary is present. glusterfs_handle_translator_op creates the output dictionary only after it has unserialized the input dictionary received from glusterd, so output is still NULL on this error path and the GF_ASSERT at glusterfsd-mgmt.c:328 aborts the process. The log below shows dict_unserialize failing with "buf is null!" immediately before the crash.

[2012-03-02 02:43:27.951724] I [afr-self-heal-common.c:2028:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background  meta-data data self-heal completed on
[2012-03-02 02:43:27.953170] I [afr-common.c:1290:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-02 02:43:33.736881] I [afr-self-heal-algorithm.c:131:sh_loop_driver_done] 0-mirror-replicate-1: diff self-heal on : completed. (134 blocks of 278 were different (48.20%))
[2012-03-02 02:43:33.739047] I [afr-self-heal-common.c:2028:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background  meta-data data self-heal completed on
[2012-03-02 02:43:33.742508] I [afr-common.c:1290:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-02 02:43:39.836988] I [afr-self-heal-algorithm.c:131:sh_loop_driver_done] 0-mirror-replicate-1: diff self-heal on : completed. (134 blocks of 278 were different (48.20%))
[2012-03-02 02:43:39.839189] I [afr-self-heal-common.c:2028:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background  meta-data data self-heal completed on
[2012-03-02 02:43:39.842079] I [afr-common.c:1290:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-02 02:43:42.506148] I [afr-self-heald.c:890:afr_find_child_position] 0-mirror-replicate-0: child mirror-client-1 is remote
[2012-03-02 02:43:42.522395] W [dict.c:2578:dict_unserialize] (-->/lib64/libc.so.6() [0x390f443690] (-->/usr/local/lib/libglusterfs.so.0(synctask_wrap+0x38) [0x7f87d3b18753] (-->/usr/local/sbin/glusterfs(glusterfs_handle_translator_op+0x1a8) [0x409f55]))) 0-dict: buf is null!
[2012-03-02 02:43:42.522427] E [glusterfsd-mgmt.c:693:glusterfs_handle_translator_op] 0-glusterfs: failed to unserialize req-buffer to dictionary
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-03-02 02:43:42
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa25
/lib64/libc.so.6[0x390f432980]
/lib64/libc.so.6(gsignal+0x35)[0x390f432905]
/lib64/libc.so.6(abort+0x175)[0x390f4340e5]
/lib64/libc.so.6[0x390f42b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x390f42ba80]
Comment 1 Amar Tumballi 2012-03-12 05:46:14 EDT
Please update these bugs with respect to 3.3.0qa27; they need to be worked on as per the target milestone set.
Comment 2 Anand Avati 2012-03-18 02:24:27 EDT
CHANGE: http://review.gluster.com/2961 (glusterfsd: Handle errors in response send) merged in master by Anand Avati (avati@redhat.com)
Comment 3 Raghavendra Bhat 2012-04-05 06:33:25 EDT
Not seen with glusterfs-3.3.0qa33.
