Description of problem:
Sometimes glusterd crashes with a coredump, roughly once a month. The bricks keep running fine. As far as I'm aware, nothing special is done to trigger this.

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.6-1.el7.x86_64

How reproducible:
Not aware of a way to trigger it; it hits roughly once a month.

Steps to Reproduce:
N/A

Actual results:
glusterd crashes.

Expected results:
glusterd keeps running :)

Additional info:
Log excerpt from the time of the crash:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2016-11-09 23:23:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f788bb45012]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f788bb614dd]
/lib64/libc.so.6(+0x35670)[0x7f788a233670]
/lib64/libc.so.6(gsignal+0x37)[0x7f788a2335f7]
/lib64/libc.so.6(abort+0x148)[0x7f788a234ce8]
/lib64/libc.so.6(+0x75317)[0x7f788a273317]
/lib64/libc.so.6(+0x7d023)[0x7f788a27b023]
/lib64/libglusterfs.so.0(data_destroy+0x55)[0x7f788bb3ce55]
/lib64/libglusterfs.so.0(dict_destroy+0x40)[0x7f788bb3d5d0]
/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(_gd_syncop_commit_op_cbk+0x187)[0x7f7880743d37]
/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7f78806e948c]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f788b913b80]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f788b913e3f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f788b90f983]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f787db3c506]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f787db3f3f4]
/lib64/libglusterfs.so.0(+0x878ea)[0x7f788bba68ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7f788a9addc5]
/lib64/libc.so.6(clone+0x6d)[0x7f788a2f428d]

Coredump available if needed.
Created attachment 1223060 [details] Core dump
Exact dates of crashes: 21 Mar, 13 Jul, 19 Sep, 10 Nov.
I think we missed backporting http://review.gluster.org/13854 to the 3.7 release branch, which is causing this crash. Rafi, could you please backport this patch?
REVIEW: http://review.gluster.org/15917 (glusterd/syncop: double free of frame stack) posted (#1) for review on release-3.7 by mohammed rafi kc (rkavunga)
For now I have backported the patch to the 3.7 release branch: http://review.gluster.org/15917. To confirm this is causing the reported issue, could you please paste the backtrace? I had some trouble extracting it from the core file. If you don't have the backtrace, I will try again with the attached core.
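In case it helps get the backtrace out of the core, this is the standard gdb recipe. The paths below are assumptions (adjust them to where your distribution stores the glusterd binary and the dump); the debuginfo packages for glusterfs need to be installed for the frames to resolve:

```shell
# Hypothetical core location; on EL7 check abrt (/var/spool/abrt) or
# the directory named by /proc/sys/kernel/core_pattern.
CORE=/var/spool/abrt/ccpp-2016-11-09/coredump

# Write a gdb command file that dumps every thread's full backtrace.
cat > /tmp/gdb-bt.cmds <<'EOF'
set pagination off
thread apply all bt full
quit
EOF

# Print the invocation rather than running it, so this sketch works
# even on a machine without the core present.
echo "Run: gdb /usr/sbin/glusterd $CORE -batch -x /tmp/gdb-bt.cmds > glusterd-bt.txt"
```

Running the printed gdb command redirects the full per-thread backtrace into `glusterd-bt.txt`, which can then be pasted into the bug.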
COMMIT: http://review.gluster.org/15917 committed in release-3.7 by Atin Mukherjee (amukherj)
------
commit 1acb99bc78e827a34592dd1c41f3fd4cea11b14f
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Mar 30 17:42:44 2016 +0530

    glusterd/syncop: double free of frame stack

    Backport of http://review.gluster.org/13854

    If rpc message from glusterd during brick op phase fails without
    sending, then frame was freed from the caller function and call
    back function.

    >Change-Id: I63cb3be30074e9a074f6895faa25b3d091f5b6a5
    >BUG: 1322262
    >Signed-off-by: Mohammed Rafi KC <rkavunga>
    >Reviewed-on: http://review.gluster.org/13854
    >Smoke: Gluster Build System <jenkins.com>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.com>
    >Reviewed-by: Jeff Darcy <jdarcy>

    Change-Id: I39b32f64fd66ee8a6d30c60bb0a42faa45e78814
    BUG: 1395245
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/15917
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
I have no backtrace available. If you have problems with the attached core file, please contact me and I'll get the core to you some other way.
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.18, please open a new bug report. glusterfs-3.7.18 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] https://www.gluster.org/pipermail/gluster-users/2016-December/029427.html [2] https://www.gluster.org/pipermail/gluster-users/