Bug 851109

Summary: glusterd core dumps when statedump is taken under heavy load
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: 2.0
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Sachidananda Urs <sac>
Assignee: Raghavendra Bhat <rabhat>
QA Contact: Sudhir D <sdharane>
CC: amarts, gluster-bugs, kparthas, shaines, vbellur, vinaraya
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 844802
Last Closed: 2012-09-11 14:24:06 UTC
Attachments:
  Gluster core dump (flags: none)
  New core (flags: none)
  Logs (flags: none)

Description Sachidananda Urs 2012-08-23 09:28:49 UTC
Version-Release number of selected component (if applicable):
2.0.z 

How reproducible:
Rarely, but reproducible.

Steps to Reproduce:
1. Load the filesystem with multiple kernel compiles and benchmarks.
2. Run `sosreport gluster` repeatedly on the servers.


Additional info: 

Backtrace of the core dump:

(gdb) bt
#0  0x00007f1d9653d6b7 in glusterd_serialize_reply (req=0x13e8faa0, arg=0x7fff0d15ed30, outmsg=0x7fff0d15ecc0, 
    xdrproc=0x7f1d9a43bfa0 <xdr_gd1_mgmt_commit_op_rsp>) at glusterd-utils.c:435
#1  0x00007f1d9653d80a in glusterd_submit_reply (req=0x13e8faa0, arg=<value optimized out>, payload=0x0, payloadcount=0, iobref=0x13e8f020, 
    xdrproc=<value optimized out>) at glusterd-utils.c:489
#2  0x00007f1d96514185 in glusterd_op_commit_send_resp (req=0x13e8faa0, op=0, status=-1, op_errstr=<value optimized out>, rsp_dict=0x0)
    at glusterd-handler.c:1454
#3  0x00007f1d965291ee in glusterd_op_ac_send_commit_failed (event=<value optimized out>, ctx=0xd898a10) at glusterd-op-sm.c:2841
#4  0x00007f1d9652e906 in glusterd_op_sm () at glusterd-op-sm.c:4620
#5  0x00007f1d9654af6e in glusterd3_1_brick_op_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7f1d9891b008) at glusterd-rpc-ops.c:1865
#6  0x00007f1d9a653ea4 in saved_frames_unwind (saved_frames=0x204b380) at rpc-clnt.c:378
#7  0x00007f1d9a653f3e in saved_frames_destroy (frames=0x204b380) at rpc-clnt.c:396
#8  0x00007f1d9a6544d0 in rpc_clnt_connection_cleanup (conn=0x204b480) at rpc-clnt.c:558
#9  0x00007f1d9a654818 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x204b480, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:861
#10 0x00007f1d9a650018 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:489
#11 0x00007f1d95cae9e4 in socket_event_poll_err (this=0x204c970) at socket.c:685
#12 0x00007f1d95cb5a9e in socket_event_handler (fd=<value optimized out>, idx=5, data=0x204c970, poll_in=1, poll_out=0, 
    poll_err=<value optimized out>) at socket.c:1799
#13 0x00007f1d9a89ad14 in event_dispatch_epoll_handler (event_pool=0x200f630) at event.c:785
#14 event_dispatch_epoll (event_pool=0x200f630) at event.c:847
#15 0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fff0d15f2d8) at glusterfsd.c:1689

Comment 1 Sachidananda Urs 2012-08-23 09:31:45 UTC
Created attachment 606486 [details]
Gluster core dump

Comment 3 krishnan parthasarathi 2012-08-29 08:49:46 UTC
Fixed on master - http://review.gluster.com/3864

Comment 4 Vijay Bellur 2012-08-30 10:30:01 UTC
CHANGE: http://review.gluster.org/3876 (glusterd: Fixed incorrect assumptions in rpcsvc actors of glusterd) merged in release-3.3 by Vijay Bellur (vbellur)

Comment 6 krishnan parthasarathi 2012-09-03 06:47:00 UTC
Root Cause:
------------
Some rpc procedure handlers (esp. statedump) of glusterd and glusterfsd return -1
even when a reply may have already been submitted to the remote caller. This has the side effect of the rpc layer sending an error reply using the _same_ request object, potentially resulting in a double free.

Fix:
-----
RPC handler functions are modified to return 0 (success, as far as the rpc layer is concerned) once a reply has been submitted, irrespective of other kinds of failures (ENOMEM, dict set failures, etc.) that may happen before returning.
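
For illustration, a minimal standalone sketch (hypothetical names, not glusterd source) of the contract described above between an rpc handler and the rpc layer:

#include <stdio.h>
#include <stdlib.h>

struct request {
        int id;
};

/* Stand-in for glusterd_submit_reply(): sends the reply and consumes
 * (frees) the request object. */
static void
submit_reply (struct request *req, int op_status)
{
        printf ("reply sent, op status %d\n", op_status);
        free (req);
}

/* Handler following the fixed pattern: the operation's failure is carried
 * in the reply itself, and the handler reports success (0) to the rpc
 * layer.  Returning -1 here would make the rpc layer build a second error
 * reply around the same (already freed) request, i.e. the double free. */
static int
example_handler (struct request *req)
{
        int op_status = -1;   /* pretend the statedump/commit op failed */

        submit_reply (req, op_status);
        return 0;
}

int
main (void)
{
        struct request *req = calloc (1, sizeof (*req));

        if (!req)
                return 1;
        return example_handler (req);
}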

Comment 7 Sachidananda Urs 2012-09-07 15:55:16 UTC
Crashes yet again. This time sooner than expected, under moderate load.


#0  0x00007f99a9491f7b in getmntent_r () from /lib64/libc.so.6
#1  0x00007f99a5e0c1d3 in glusterd_add_brick_mount_details (volinfo=<value optimized out>, 
    brickinfo=0x1aef4a0, dict=0x7f99a80585c0, count=0) at glusterd-utils.c:3788
#2  glusterd_add_brick_detail_to_dict (volinfo=<value optimized out>, brickinfo=0x1aef4a0, 
    dict=0x7f99a80585c0, count=0) at glusterd-utils.c:3906
#3  0x00007f99a5df8c1a in glusterd_op_status_volume (op=<value optimized out>, 
    dict=<value optimized out>, op_errstr=<value optimized out>, rsp_dict=<value optimized out>)
    at glusterd-op-sm.c:1529
#4  glusterd_op_commit_perform (op=<value optimized out>, dict=<value optimized out>, 
    op_errstr=<value optimized out>, rsp_dict=<value optimized out>) at glusterd-op-sm.c:3039
#5  0x00007f99a5df9d34 in glusterd_op_ac_send_commit_op (event=<value optimized out>, 
    ctx=<value optimized out>) at glusterd-op-sm.c:2348
#6  0x00007f99a5df6906 in glusterd_op_sm () at glusterd-op-sm.c:4620
#7  0x00007f99a5e1400f in glusterd3_1_stage_op_cbk (req=<value optimized out>, 
    iov=<value optimized out>, count=<value optimized out>, myframe=0x7f99a81e1fc0)
    at glusterd-rpc-ops.c:923
#8  0x00007f99a9f1c0c5 in rpc_clnt_handle_reply (clnt=0x1adee20, pollin=0x1b06be0)
    at rpc-clnt.c:788
#9  0x00007f99a9f1c8c0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1adee50, 
    event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#10 0x00007f99a9f18018 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#11 0x00007f99a557d954 in socket_event_poll_in (this=0x1aebd40) at socket.c:1677
#12 0x00007f99a557da37 in socket_event_handler (fd=<value optimized out>, idx=4, data=0x1aebd40, 
    poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#13 0x00007f99aa162d84 in event_dispatch_epoll_handler (event_pool=0x1ac1630) at event.c:785
#14 event_dispatch_epoll (event_pool=0x1ac1630) at event.c:847
#15 0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fff27810308)
    at glusterfsd.c:1689
(gdb)

Comment 8 Sachidananda Urs 2012-09-07 15:56:59 UTC
Created attachment 610787 [details]
New core

Comment 9 Sachidananda Urs 2012-09-07 15:58:35 UTC
Created attachment 610788 [details]
Logs

Comment 10 Raghavendra Bhat 2012-09-08 09:03:06 UTC
In glusterd_add_brick_mount_details, setmntent is called on /etc/mtab to get the mount options. It returns a file pointer that getmntent then uses to read the information about all mounted filesystems. If setmntent fails, it returns a NULL file pointer, and if that is passed to getmntent without being checked, the process segfaults. That is what happened in this case. I was able to reproduce the bug by renaming the /etc/mtab file; it produced the same backtrace.

It can be fixed by checking whether the file pointer is NULL after setmntent is called; if it is NULL, return from the function with a -1 return value. The errno can also be logged.
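
For reference, a minimal standalone program (not glusterd code) showing the checked pattern just described; the same NULL check and errno logging appear in the glusterd diff below:

#include <errno.h>
#include <mntent.h>
#include <paths.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
        struct mntent *entry = NULL;
        FILE          *mtab  = NULL;

        mtab = setmntent (_PATH_MOUNTED, "r");
        if (!mtab) {
                /* Without this check, passing the NULL handle to
                 * getmntent() segfaults, as seen in the backtrace above. */
                fprintf (stderr, "setmntent of %s failed (%s)\n",
                         _PATH_MOUNTED, strerror (errno));
                return 1;
        }

        while ((entry = getmntent (mtab)) != NULL)
                printf ("%s on %s type %s (%s)\n", entry->mnt_fsname,
                        entry->mnt_dir, entry->mnt_type, entry->mnt_opts);

        endmntent (mtab);
        return 0;
}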

The diff below can fix the issue.
git diff
diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
index 9112458..ce7e74a 100644
--- a/xlators/mgmt/glusterd/src/glusterd-utils.c
+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
@@ -3785,6 +3785,12 @@ glusterd_add_brick_mount_details (glusterd_brickinfo_t *brickinfo,
                 goto out;
 
         mtab = setmntent (_PATH_MOUNTED, "r");
+        if (!mtab) {
+                gf_log ("glusterd", GF_LOG_ERROR, "setmnt of %s failed (%s)",
+                        _PATH_MOUNTED, strerror (errno));
+                ret = -1;
+                goto out;
+        }
         entry = getmntent (mtab);
 
         while (1) {
@@ -5239,8 +5245,8 @@ glusterd_nfs_statedump (char *options, int option_cnt, char **op_errstr)
 
         pidfile = fopen (pidfile_path, "r");
         if (!pidfile) {
-                gf_log ("", GF_LOG_ERROR, "Unable to open pidfile: %s",
-                        pidfile_path);
+                gf_log ("", GF_LOG_ERROR, "Unable to open pidfile: %s (%s)",
+                        pidfile_path, strerror (errno));
                 ret = -1;
                 goto out;
         }


But the question is why setmntent failed.

Comment 11 krishnan parthasarathi 2012-09-08 16:36:46 UTC
Raghavendra Bhat,

The reason why setmntent failed is most likely too many open fds. setmntent(3) 'opens' /etc/mtab every time glusterd_add_brick_mount_details is called. We should have 'closed' the handle on exit using endmntent(3). Because this was missed, running "volume status" in a loop results in too many open fds, which can cause setmntent(3) to return NULL (FILE *) and eventually results in glusterd crashing with the backtrace observed.

See http://review.gluster.com/3920
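
As a sketch only (hypothetical helper names, not the actual patch at review.gluster.com/3920), the leak-free pattern looks roughly like this: every call that opens the mtab handle also closes it with endmntent(3) on its way out:

#include <mntent.h>
#include <paths.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for glusterd_add_brick_mount_details(): looks up
 * the mount entry backing a brick path.  The important part is the single
 * exit path that always calls endmntent(), so repeated "volume status"
 * commands do not leak one fd per call. */
static int
add_mount_details (const char *brick_path)
{
        int            ret   = -1;
        FILE          *mtab  = NULL;
        struct mntent *entry = NULL;

        mtab = setmntent (_PATH_MOUNTED, "r");
        if (!mtab)
                goto out;                 /* NULL check from comment 10 */

        while ((entry = getmntent (mtab)) != NULL) {
                if (strcmp (entry->mnt_dir, brick_path) == 0) {
                        printf ("%s mounted on %s with options %s\n",
                                entry->mnt_fsname, entry->mnt_dir,
                                entry->mnt_opts);
                        ret = 0;
                        break;
                }
        }

out:
        if (mtab)
                endmntent (mtab);         /* the missing cleanup */
        return ret;
}

int
main (void)
{
        return add_mount_details ("/") == 0 ? 0 : 1;
}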

Comment 12 Amar Tumballi 2012-09-10 16:19:38 UTC
Pushed into downstream; needs one more round of testing on 3.3.0rhs-28.

Comment 14 errata-xmlrpc 2012-09-11 14:24:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1253.html