Bug 1552228 - Gluster brick process dies during rebalance
Summary: Gluster brick process dies during rebalance
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.13
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-06 18:26 UTC by matt.adams
Modified: 2018-06-20 18:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description matt.adams 2018-03-06 18:26:09 UTC
Description of problem:

During a rebalance, a random brick process dies. The backtrace and behavior seem to be identical to what was reported in bug 1536294, which should have been fixed in the 3.13.2 release.

pending frames:
frame : type(0) op(36)
frame : type(0) op(45)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(13)
frame : type(0) op(27)
frame : type(0) op(13)
frame : type(0) op(29)
frame : type(0) op(17)
frame : type(0) op(27)
frame : type(0) op(14)
frame : type(0) op(14)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-03-06 06:34:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.13.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7f02360f81ba]
/lib64/libglusterfs.so.0(gf_print_trace+0x2f7)[0x7f0236101e57]
/lib64/libc.so.6(+0x35950)[0x7f0234755950]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f0234f52e00]
/lib64/libglusterfs.so.0(dict_rename_key+0x6e)[0x7f02360f2fae]
/usr/lib64/glusterfs/3.13.2/xlator/features/selinux.so(+0x1f4d)[0x7f0224d2ff4d]
/usr/lib64/glusterfs/3.13.2/xlator/features/marker.so(+0x11db7)[0x7f0224b1adb7]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/usr/lib64/glusterfs/3.13.2/xlator/features/quota.so(+0xe067)[0x7f02244cf067]
/usr/lib64/glusterfs/3.13.2/xlator/debug/io-stats.so(+0x7376)[0x7f0224297376]
/lib64/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f023616b625]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0x2ddee)[0x7f021fddfdee]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbd49)[0x7f021fdbdd49]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbde5)[0x7f021fdbdde5]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc78c)[0x7f021fdbe78c]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbe2e)[0x7f021fdbde2e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc62e)[0x7f021fdbe62e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc742)[0x7f021fdbe742]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xbe0e)[0x7f021fdbde0e]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0xc834)[0x7f021fdbe834]
/usr/lib64/glusterfs/3.13.2/xlator/protocol/server.so(+0x2721e)[0x7f021fdd921e]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x9a)[0x7f0235ebba8a]
/lib64/libpthread.so.0(+0x773a)[0x7f0234f5073a]
/lib64/libc.so.6(clone+0x3f)[0x7f0234827e7f]


Version-Release number of selected component (if applicable):
GlusterFS 3.13.2


How reproducible:

We are running Gluster 3.13.2 on a ZFS raidz filesystem with 18 raidz bricks.  

Steps to Reproduce:
1. gluster volume rebalance <volname> start

Actual results:
After ~36 hours one of the brick processes will die and the rebalance will be in a failed state.

Expected results:

The rebalance should complete successfully.

Additional info:

Comment 1 matt.adams 2018-03-07 18:26:03 UTC
I was looking at the commit that closed bug 1536294 here:

https://review.gluster.org/19240

Should lines 150 and 192 of xlators/features/selinux/src/selinux.c be

if (!priv->selinux_enabled || !dict)

instead of

if (!priv->selinux_enabled && !dict)
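
To make the difference between the two conditions concrete, here is a minimal sketch in C. The function names should_skip_and and should_skip_or are hypothetical and do not appear in selinux.c; only the two boolean expressions are taken from the comment above. With &&, the early exit is taken only when SELinux is disabled and dict is NULL at the same time, so an enabled xlator that receives a NULL dict falls through the guard.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the guard discussed above; not GlusterFS code. */

/* Condition as quoted from the commit: skip only if BOTH hold. */
bool should_skip_and(bool selinux_enabled, const void *dict)
{
    return !selinux_enabled && !dict;
}

/* Proposed condition: skip if EITHER holds. */
bool should_skip_or(bool selinux_enabled, const void *dict)
{
    return !selinux_enabled || !dict;
}
```

In this sketch, calling should_skip_and(true, NULL) returns false (the NULL dict is not caught), while should_skip_or(true, NULL) returns true.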

Comment 2 Steve McDaniel 2018-03-07 18:35:04 UTC
It looks like this can result in a NULL pointer dereference when LOCK (&this->lock); is called in dict_rename_key with a NULL dict.

https://github.com/gluster/glusterfs/blob/303cc2b54797bc5371be742543ccb289010c92f2/libglusterfs/src/dict.c#L2553
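
As an illustration of the failure mode, the following is a minimal sketch that assumes a toy dictionary type. toy_dict_t and toy_dict_rename_key are hypothetical names, and the real dict_t in libglusterfs is more complex. Taking the lock through a NULL pointer means passing an address computed from NULL to pthread_mutex_lock, which is consistent with the crashing frame in the backtrace. Checking the pointer before locking is one defensive option; it is not necessarily the fix adopted upstream.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type holding a single key; not the libglusterfs dict_t. */
typedef struct toy_dict {
    pthread_mutex_t lock;
    char key[64];
} toy_dict_t;

/* Returns 0 on success, -1 on invalid arguments or if old_key is not found. */
int toy_dict_rename_key(toy_dict_t *this, const char *old_key,
                        const char *new_key)
{
    /* Validate before touching this->lock; without this check a NULL
     * 'this' would be dereferenced inside pthread_mutex_lock(). */
    if (!this || !old_key || !new_key)
        return -1;
    if (strlen(new_key) >= sizeof(this->key))
        return -1;

    int ret = -1;
    pthread_mutex_lock(&this->lock);
    if (strcmp(this->key, old_key) == 0) {
        strcpy(this->key, new_key);
        ret = 0;
    }
    pthread_mutex_unlock(&this->lock);
    return ret;
}
```

With this guard in place, toy_dict_rename_key(NULL, "a", "b") returns -1 instead of crashing.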

Comment 3 Shyamsundar 2018-06-20 18:28:08 UTC
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.


