Description of problem:
During a rebalance, a random brick process dies. The backtrace and behavior seem to be identical to what was reported in bug 1536294, which should have been fixed in the 3.13.2 release.

pending frames:
frame : type(0) op(36)
frame : type(0) op(45)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(13)
frame : type(0) op(27)
frame : type(0) op(13)
frame : type(0) op(29)
frame : type(0) op(17)
frame : type(0) op(27)
frame : type(0) op(14)
frame : type(0) op(14)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-03-06 06:34:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.13.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7f02360f81ba]
/lib64/libglusterfs.so.0(gf_print_trace+0x2f7)[0x7f0236101e57]
/lib64/libc.so.6(+0x35950)[0x7f0234755950]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f0234f52e00]
/lib64/libglusterfs.so.0(dict_rename_key+0x6e)[0x7f02360f2fae]
/usr/lib64/glusterfs/3.13.2/xlator/features/selinux.so(+0x1f4d)[0x7f0224d2ff4d]
/usr/lib64/glusterfs/3.13.2/xlator/features/marker.so(+0x11db7)[0x7f0224b1adb7]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/usr/lib64/glusterfs/3.13.2/xlator/features/quota.so(+0xe067)[0x7f02244cf067]
/usr/lib64/glusterfs/3.13.2/xlator/debug/io-stats.so(+0x7376)[0x7f0224297376]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0x2ddee)[0x7f021fddfdee]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbd49)[0x7f021fdbdd49]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbde5)[0x7f021fdbdde5]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc78c)[0x7f021fdbe78c]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbe2e)[0x7f021fdbde2e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc62e)[0x7f021fdbe62e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc742)[0x7f021fdbe742]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbe0e)[0x7f021fdbde0e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc834)[0x7f021fdbe834]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0x2721e)[0x7f021fdd921e]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x9a)[0x7f0235ebba8a]
/lib64/libpthread.so.0(+0x773a)[0x7f0234f5073a]
/lib64/libc.so.6(clone+0x3f)[0x7f0234827e7f]

Version-Release number of selected component (if applicable):
GlusterFS 3.13.2

How reproducible:
We are running Gluster 3.13.2 on a ZFS raidz filesystem with 18 raidz bricks.

Steps to Reproduce:
1. gluster volume rebalance <volname> start

Actual results:
After ~36 hours, one of the brick processes dies and the rebalance ends up in a failed state.

Expected results:
The rebalance should complete successfully.

Additional info:
I was looking at the commit that closed bug 1536294: https://review.gluster.org/19240

Should the checks at lines 150 and 192 of xlators/features/selinux/src/selinux.c be

if (!priv->selinux_enabled || !dict)

instead of

if (!priv->selinux_enabled && !dict)

?
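With &&, the check is true only when SELinux is disabled and dict is also NULL, so a NULL dict is not caught when SELinux is enabled. It looks like this can result in a NULL pointer dereference when LOCK (&this->lock); is called in dict_rename_key, which matches the pthread_mutex_lock frame at the top of the backtrace. https://github.com/gluster/glusterfs/blob/303cc2b54797bc5371be742543ccb289010c92f2/libglusterfs/src/dict.c#L2553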
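
To make the difference between the two conditions concrete, here is a minimal standalone sketch. It is not the GlusterFS code: demo_dict_t, demo_priv_t and all function names are hypothetical stand-ins, and it only assumes that the check is meant to skip the rename whenever it evaluates to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real GlusterFS types (illustration only). */
typedef struct { int lock; } demo_dict_t;
typedef struct { bool selinux_enabled; } demo_priv_t;

/* Condition as quoted in the report: true only when BOTH hold. */
static bool skip_with_and(const demo_priv_t *priv, const demo_dict_t *dict)
{
    return !priv->selinux_enabled && !dict;
}

/* Proposed condition: true when EITHER holds. */
static bool skip_with_or(const demo_priv_t *priv, const demo_dict_t *dict)
{
    return !priv->selinux_enabled || !dict;
}

/* True when the rename would be reached with a NULL dict, i.e. the case
 * in which taking &dict->lock would dereference a NULL pointer. */
static bool null_dict_reaches_rename(bool skipped, const demo_dict_t *dict)
{
    return !skipped && dict == NULL;
}

/* Walks the cases relevant to the crash and asserts the outcome of each. */
static void check_guards(void)
{
    demo_priv_t enabled = { .selinux_enabled = true };
    demo_priv_t disabled = { .selinux_enabled = false };
    demo_dict_t dict = { 0 };

    /* SELinux enabled, dict NULL: && lets the NULL dict through. */
    assert(null_dict_reaches_rename(skip_with_and(&enabled, NULL), NULL));
    /* Same input with ||: the rename is skipped, so no dereference. */
    assert(!null_dict_reaches_rename(skip_with_or(&enabled, NULL), NULL));

    /* SELinux disabled, dict NULL: both conditions skip the rename. */
    assert(skip_with_and(&disabled, NULL));
    assert(skip_with_or(&disabled, NULL));

    /* SELinux enabled, valid dict: both conditions proceed to the rename. */
    assert(!skip_with_and(&enabled, &dict));
    assert(!skip_with_or(&enabled, &dict));
}
```

Calling check_guards() from a small main passes without tripping an assertion. The only input where the two conditions disagree in a harmful way is "SELinux enabled, dict NULL", which is consistent with the crash inside dict_rename_key shown in the backtrace.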
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, request that it be reopened and that the Version field be marked appropriately.