Bug 1395245 - glusterd crash
Summary: glusterd crash
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.7.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On: 1322262
Blocks:
Reported: 2016-11-15 14:03 UTC by mail
Modified: 2016-12-13 07:20 UTC (History)

Fixed In Version: glusterfs-3.7.18
Clone Of:
Environment:
Last Closed: 2016-12-13 07:20:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Core dump (15.42 MB, application/x-xz)
2016-11-23 08:44 UTC, mail

Description mail 2016-11-15 14:03:07 UTC
Description of problem:
glusterd occasionally crashes with a core dump, roughly once a month. The bricks keep running fine. As far as I'm aware, nothing special is done to trigger this.

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.6-1.el7.x86_64

How reproducible:
Not aware of a way to trigger this; it hits roughly once a month.


Steps to Reproduce:
N/A

Actual results:
Glusterd crash

Expected results:
Glusterd keeps running :)

Additional info:

logs:
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 
2016-11-09 23:23:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f788bb45012]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f788bb614dd]
/lib64/libc.so.6(+0x35670)[0x7f788a233670]
/lib64/libc.so.6(gsignal+0x37)[0x7f788a2335f7]
/lib64/libc.so.6(abort+0x148)[0x7f788a234ce8]
/lib64/libc.so.6(+0x75317)[0x7f788a273317]
/lib64/libc.so.6(+0x7d023)[0x7f788a27b023]
/lib64/libglusterfs.so.0(data_destroy+0x55)[0x7f788bb3ce55]
/lib64/libglusterfs.so.0(dict_destroy+0x40)[0x7f788bb3d5d0]
/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(_gd_syncop_commit_op_cbk+0x187)[0x7f7880743d37]
/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7f78806e948c]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f788b913b80]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f788b913e3f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f788b90f983]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f787db3c506]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f787db3f3f4]
/lib64/libglusterfs.so.0(+0x878ea)[0x7f788bba68ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7f788a9addc5]
/lib64/libc.so.6(clone+0x6d)[0x7f788a2f428d]

Core dump available if needed.

Comment 1 mail 2016-11-23 08:44:16 UTC
Created attachment 1223060
Core dump

Comment 2 mail 2016-11-23 08:55:42 UTC
Exact dates of crashes:
21 mar
13 jul
19 sep
10 nov

Comment 3 Atin Mukherjee 2016-11-23 12:04:02 UTC
I think we missed backporting http://review.gluster.org/13854 to the 3.7 release branch, which is what is causing this crash.

Rafi - could you please backport this patch?

Comment 4 Worker Ant 2016-11-23 14:00:59 UTC
REVIEW: http://review.gluster.org/15917 (glusterd/syncop: double free of frame stack) posted (#1) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 5 Mohammed Rafi KC 2016-11-23 14:03:41 UTC
For now I have backported the patch to the 3.7 release branch: http://review.gluster.org/15917

To confirm that this is what causes the reported issue, could you please paste the backtrace (bt)? I am having some trouble extracting the core file. If you don't have the backtrace, I will try again with the attached core file.

Comment 6 Worker Ant 2016-11-28 10:17:27 UTC
COMMIT: http://review.gluster.org/15917 committed in release-3.7 by Atin Mukherjee (amukherj) 
------
commit 1acb99bc78e827a34592dd1c41f3fd4cea11b14f
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Mar 30 17:42:44 2016 +0530

    glusterd/syncop: double free of frame stack
    
    Backport of http://review.gluster.org/13854
    
    If rpc message from glusterd during brick op phase
    fails without sending, then frame was freed from
    the caller function and call back function.
    
    >Change-Id: I63cb3be30074e9a074f6895faa25b3d091f5b6a5
    >BUG: 1322262
    >Signed-off-by: Mohammed Rafi KC <rkavunga>
    >Reviewed-on: http://review.gluster.org/13854
    >Smoke: Gluster Build System <jenkins.com>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.com>
    >Reviewed-by: Jeff Darcy <jdarcy>
    
    Change-Id: I39b32f64fd66ee8a6d30c60bb0a42faa45e78814
    BUG: 1395245
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/15917
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
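
The double free described in the commit message above can be illustrated with a small, self-contained sketch. This is not the GlusterFS patch: demo_frame_t, demo_submit, demo_cbk, buggy_caller and fixed_caller are hypothetical names invented for illustration, and the "destroy" step only counts calls instead of calling free() twice, so the example stays well defined. It assumes a submit function that, when the send fails, invokes the callback itself before returning an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call frame (NOT the GlusterFS call_frame_t). */
typedef struct demo_frame {
    int destroy_calls; /* how many times frame_destroy() ran on this frame */
} demo_frame_t;

typedef void (*demo_cbk_t)(demo_frame_t *frame, int op_ret);

/* Stand-in for freeing the frame; a real second call would be a double free. */
static void frame_destroy(demo_frame_t *frame)
{
    frame->destroy_calls++;
}

/* The callback owns the frame and always destroys it. */
static void demo_cbk(demo_frame_t *frame, int op_ret)
{
    (void)op_ret;
    frame_destroy(frame);
}

/* Assumed submit behaviour: on a failed send, the callback is invoked
 * with an error before -1 is returned; on success the reply is simulated
 * immediately. */
static int demo_submit(demo_frame_t *frame, demo_cbk_t cbk, bool send_ok)
{
    if (!send_ok) {
        cbk(frame, -1);
        return -1;
    }
    cbk(frame, 0);
    return 0;
}

/* Buggy pattern: the caller also destroys the frame on failure, although
 * the callback has already done so. */
static int buggy_caller(demo_frame_t *frame, bool send_ok)
{
    int ret = demo_submit(frame, demo_cbk, send_ok);
    if (ret != 0)
        frame_destroy(frame); /* second destroy of the same frame */
    return ret;
}

/* Fixed pattern: ownership stays with the callback on every path. */
static int fixed_caller(demo_frame_t *frame, bool send_ok)
{
    return demo_submit(frame, demo_cbk, send_ok);
}
```

With a failed send, buggy_caller destroys the frame twice while fixed_caller destroys it once. In a real program the second destroy would be a second free() of the same memory, which glibc typically detects and answers with abort(), consistent with the signal 6 in the log above.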

Comment 7 mail 2016-11-28 10:25:01 UTC
I have no backtrace available.
If you have problems with the attached core file, please contact me and I'll get the core to you some other way.

Comment 8 Kaushal 2016-12-13 07:20:01 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.18, please open a new bug report.

glusterfs-3.7.18 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-December/029427.html
[2] https://www.gluster.org/pipermail/gluster-users/

