1671556 – glusterfs FUSE client crashing every few days with 'Failed to dispatch handler'

Bug 1671556 - glusterfs FUSE client crashing every few days with 'Failed to dispatch handler'

Summary: glusterfs FUSE client crashing every few days with 'Failed to dispatch handler'

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	fuse
Sub Component:
Version:	5
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Nithya Balachandran
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1671014 (view as bug list)
Depends On:
Blocks:	glusterfs-5.4 1674406 1676356 Gluster_5_Affecting_oVirt_4.3 1678570 1691292
TreeView+	depends on / blocked

Reported:	2019-01-31 21:59 UTC by David E. Smith
Modified:	2019-03-27 13:44 UTC (History)
CC List:	12 users (show)
Fixed In Version:	glusterfs-5.5
Clone Of:
Clones:	1674406 (view as bug list)
Environment:
Last Closed:	2019-03-27 06:07:03 UTC
Regression:	---
Mount Type:	fuse
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gluster.org Gerrit	22228	0	None	Open	performance/write-behind: fix use-after-free in readdirp	2019-02-22 14:50:29 UTC
Gluster.org Gerrit	22229	0	None	Merged	performance/write-behind: handle call-stub leaks	2019-02-22 14:49:42 UTC

Description David E. Smith 2019-01-31 21:59:49 UTC

This is a re-post of my FUSE crash report from BZ1651246. That issue is for a crash in the FUSE client. Mine is too, but I was asked in that bug to open a new issue, so here you go. :)



My servers (two, in a 'replica 2' setup) publish two volumes. One is Web site content, about 110GB; the other is Web config files, only a few megabytes. (Wasn't worth building extra servers for that second volume.) FUSE clients have been crashing on the larger volume every three or four days. I can't reproduce this on-demand, unfortunately, but I've got several cores from previous crashes that may be of value to you.

I'm using Gluster 5.3 from the RPMs provided by the CentOS Storage SIG, on a Red Hat Enterprise Linux 7.x system.


The client's logs show many hundreds of instances of this (I don't know if it's related):
[2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) [0x7fa171ead384] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) [0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] ) 0-dict: dict is NULL [Invalid argument]

Then, when the client's glusterfs process crashes, this is logged:

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 871 times between [2019-01-29 08:12:48.390535] and [2019-01-29 08:14:17.100279]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-01-29 08:14:17
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84]
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280]
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d]
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2]
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead]
---------



Info on the volumes themselves, gathered from one of my servers:

[davidsmith@wuit-s-10889 ~]$ sudo gluster volume info all

Volume Name: web-config
Type: Replicate
Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-config
Brick2: 172.23.128.27:/data/web-config
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
cluster.min-free-disk: 1
cluster.quorum-count: 2
cluster.quorum-type: fixed
network.ping-timeout: 10
auth.allow: *
performance.readdir-ahead: on

Volume Name: web-content
Type: Replicate
Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-content
Brick2: 172.23.128.27:/data/web-content
Options Reconfigured:
network.ping-timeout: 10
cluster.quorum-type: fixed
cluster.quorum-count: 2
performance.readdir-ahead: on
auth.allow: *
cluster.min-free-disk: 1
client.event-threads: 4
server.event-threads: 4
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
performance.cache-size: 4GB



gluster> volume status all detail
Status of volume: web-config
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5612
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5540
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277

Status of volume: web-content
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5649
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5567
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277



I'll attach a couple of the core files generated by the crashing glusterfs instances, size limits permitting (they range from 3 to 8 GB). If I can't attach them, I'll find somewhere to host them.

Comment 1 Artem Russakovskii 2019-01-31 22:26:25 UTC

Also reposting my comment from https://bugzilla.redhat.com/show_bug.cgi?id=1651246.



I wish I saw this bug report before I updated from rock solid 4.1 to 5.3. Less than 24 hours after upgrading, I already got a crash and had to unmount, kill gluster, and remount:


[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
---------

Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, not too sure how to make it core dump.

If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing.

Thanks.

Comment 2 David E. Smith 2019-01-31 22:31:47 UTC

Actually, I ran the cores through strings and grepped for a few things like passwords -- as you'd expect from a memory dump from a Web server, there's a log of sensitive information in there. Is there a safe/acceptable way to send the cores only to developers that can use them, or otherwise not have to make them publicly available while still letting the Gluster devs benefit from analyzing them?

Comment 3 Ravishankar N 2019-02-01 05:51:19 UTC

(In reply to David E. Smith from comment #2)
> Actually, I ran the cores through strings and grepped for a few things like
> passwords -- as you'd expect from a memory dump from a Web server, there's a
> log of sensitive information in there. Is there a safe/acceptable way to
> send the cores only to developers that can use them, or otherwise not have
> to make them publicly available while still letting the Gluster devs benefit
> from analyzing them?

Perhaps you could upload it to a shared Dropbox folder with view/download access to the red hat email IDs I've CC'ed to this email (including me) to begin with. 

Note: I upgraded a 1x2 replica volume with 1 fuse client from v4.1.7 to v5.3 and did some basic I/O (kernel untar and iozone) and did not observe any crashes, so maybe this this something that is hit under extreme I/O or memory pressure. :-(

Comment 4 Artem Russakovskii 2019-02-02 20:17:15 UTC

The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.


monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program  = "/bin/mount  /mnt/glusterfs_data1"
  stop program  = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert


stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

Comment 5 David E. Smith 2019-02-05 02:59:24 UTC

I've added the five of you to our org's Box account; all of you should have invitations to a shared folder, and I'm uploading a few of the cores now. I hope they're of value to you.

The binaries are all from the CentOS Storage SIG repo at https://buildlogs.centos.org/centos/7/storage/x86_64/gluster-5/ . They're all current as of a few days ago:

[davidsmith@wuit-s-10882 ~]$ rpm -qa | grep gluster
glusterfs-5.3-1.el7.x86_64
glusterfs-client-xlators-5.3-1.el7.x86_64
glusterfs-fuse-5.3-1.el7.x86_64
glusterfs-libs-5.3-1.el7.x86_64

Comment 6 Nithya Balachandran 2019-02-05 11:00:04 UTC

(In reply to David E. Smith from comment #5)
> I've added the five of you to our org's Box account; all of you should have
> invitations to a shared folder, and I'm uploading a few of the cores now. I
> hope they're of value to you.
> 
> The binaries are all from the CentOS Storage SIG repo at
> https://buildlogs.centos.org/centos/7/storage/x86_64/gluster-5/ . They're
> all current as of a few days ago:
> 
> [davidsmith@wuit-s-10882 ~]$ rpm -qa | grep gluster
> glusterfs-5.3-1.el7.x86_64
> glusterfs-client-xlators-5.3-1.el7.x86_64
> glusterfs-fuse-5.3-1.el7.x86_64
> glusterfs-libs-5.3-1.el7.x86_64

Thanks. We will take a look and get back to you.

Comment 7 Nithya Balachandran 2019-02-05 16:43:45 UTC

David,

Can you try mounting the volume with the option  lru-limit=0 and let us know if you still see the crashes?

Regards,
Nithya

Comment 8 Nithya Balachandran 2019-02-06 07:23:49 UTC

Initial analysis of one of the cores:

[root@rhgs313-7 gluster-5.3]# gdb -c core.6014 /usr/sbin/glusterfs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --direct-io-mode=disable --fuse-mountopts=noatime,context="'.

Program terminated with signal 11, Segmentation fault.
#0  __inode_ctx_free (inode=inode@entry=0x7fa0d0349af8) at inode.c:410
410	            if (!xl->call_cleanup && xl->cbks->forget)

(gdb) bt
#0  __inode_ctx_free (inode=inode@entry=0x7fa0d0349af8) at inode.c:410
#1  0x00007fa1809e90a2 in __inode_destroy (inode=0x7fa0d0349af8) at inode.c:432
#2  inode_table_prune (table=table@entry=0x7fa15800c3c0) at inode.c:1696
#3  0x00007fa1809e9f96 in inode_forget_with_unref (inode=0x7fa0d0349af8, nlookup=128) at inode.c:1273
#4  0x00007fa177dae4e1 in do_forget (this=<optimized out>, unique=<optimized out>, nodeid=<optimized out>, nlookup=<optimized out>) at fuse-bridge.c:726
#5  0x00007fa177dae5bd in fuse_forget (this=<optimized out>, finh=0x7fa0a41da500, msg=<optimized out>, iobuf=<optimized out>) at fuse-bridge.c:741
#6  0x00007fa177dc5d7a in fuse_thread_proc (data=0x557a0e8ffe20) at fuse-bridge.c:5125
#7  0x00007fa17f83bdd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fa17f103ead in msync () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) f 0
#0  __inode_ctx_free (inode=inode@entry=0x7fa0d0349af8) at inode.c:410
410	            if (!xl->call_cleanup && xl->cbks->forget)
(gdb) l
405	    for (index = 0; index < inode->table->xl->graph->xl_count; index++) {
406	        if (inode->_ctx[index].value1 || inode->_ctx[index].value2) {
407	            xl = (xlator_t *)(long)inode->_ctx[index].xl_key;
408	            old_THIS = THIS;
409	            THIS = xl;
410	            if (!xl->call_cleanup && xl->cbks->forget)
411	                xl->cbks->forget(xl, inode);
412	            THIS = old_THIS;
413	        }
414	    }
(gdb) p *xl
Cannot access memory at address 0x0

(gdb) p index
$1 = 6

(gdb) p inode->table->xl->graph->xl_count
$3 = 13
(gdb) p inode->_ctx[index].value1
$4 = 0
(gdb) p inode->_ctx[index].value2
$5 = 140327960119304
(gdb) p/x inode->_ctx[index].value2
$6 = 0x7fa0a6370808


Based on the graph, the xlator with index = 6 is
(gdb) p ((xlator_t*)  inode->table->xl->graph->top)->next->next->next->next->next->next->next->name
$31 = 0x7fa16c0122e0 "web-content-read-ahead"
(gdb) p ((xlator_t*)  inode->table->xl->graph->top)->next->next->next->next->next->next->next->xl_id
$32 = 6

But read-ahead does not update the inode_ctx at all. There seems to be some sort of memory corruption happening here but that needs further analysis.

Comment 9 David E. Smith 2019-02-07 17:41:17 UTC

As of this morning, I've added the lru-limit mount option to /etc/fstab on my servers. Was on vacation, didn't see the request until this morning. For the sake of reference, here's the full fstab lines, edited only to remove hostnames and add placeholders. (Note that I've never had a problem with the 'web-config' volume, which is very low-traffic and only a few megabytes in size; the problems always are the much more heavily-used 'web-content' volume.)

gluster-server-1:/web-config  /etc/httpd/conf.d  glusterfs  defaults,_netdev,noatime,context=unconfined_u:object_r:httpd_config_t:s0,backupvolfile-server=gluster-server-2,direct-io-mode=disable,lru-limit=0 0 0
gluster-server-1:/web-content /var/www/html      glusterfs  defaults,_netdev,noatime,context=unconfined_u:object_r:httpd_sys_rw_content_t:s0,backupvolfile-server=gluster-server-2,direct-io-mode=disable,lru-limit=0 0 0

Comment 10 David E. Smith 2019-02-07 17:58:26 UTC

Ran a couple of the glusterfs logs through the print-backtrace script. They all start with what you'd normally expect (clone, start_thread) and all end with (_gf_msg_backtrace_nomem) but they're all doing different things in the middle. It looks sorta like a memory leak or other memory corruption. Since it started happening on both of my servers after upgrading to 5.2 (and continued with 5.3), I really doubt it's a hardware issue -- the FUSE clients are both VMs, on hosts a few miles apart, so the odds of host RAM going wonky in both places at exactly that same time are ridiculous.

Bit of a stretch, but do you think there would be value in my rebuilding the RPMs locally, to try to rule out anything on CentOS' end?


/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280] __restore_rt ??:0
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d] __inode_ctx_free ??:0
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2] inode_table_prune ??:0
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96] inode_forget_with_unref ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd] fuse_forget ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a] fuse_thread_proc ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead] __clone ??:0


/lib64/libglusterfs.so.0(+0x26610)[0x7f36aff72610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f36aff7cb84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7f36ae5d6280] __restore_rt ??:0
/lib64/libglusterfs.so.0(+0x36779)[0x7f36aff82779] __inode_unref ??:0
/lib64/libglusterfs.so.0(inode_unref+0x23)[0x7f36aff83203] inode_unref ??:0
/lib64/libglusterfs.so.0(gf_dirent_entry_free+0x2b)[0x7f36aff9ec4b] gf_dirent_entry_free ??:0
/lib64/libglusterfs.so.0(gf_dirent_free+0x2b)[0x7f36aff9ecab] gf_dirent_free ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x7480)[0x7f36a215b480] afr_readdir_cbk ??:0
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x60bca)[0x7f36a244dbca] client4_0_readdirp_cbk ??:0
/lib64/libgfrpc.so.0(+0xec70)[0x7f36afd3ec70] rpc_clnt_handle_reply ??:0
/lib64/libgfrpc.so.0(+0xf043)[0x7f36afd3f043] rpc_clnt_notify ??:0
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f36afd3af23] rpc_transport_notify ??:0
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7f36a492737b] socket_event_handler ??:0
/lib64/libglusterfs.so.0(+0x8aa49)[0x7f36affd6a49] event_dispatch_epoll_worker ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7f36aedd5dd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7f36ae69dead] __clone ??:0


/lib64/libglusterfs.so.0(+0x26610)[0x7f7e13de0610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f7e13deab84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7f7e12444280] __restore_rt ??:0
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f7e12c45c30] pthread_mutex_lock ??:0
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f7e13e0bc3c] __gf_free ??:0
/lib64/libglusterfs.so.0(+0x368ed)[0x7f7e13df08ed] __dentry_unset ??:0
/lib64/libglusterfs.so.0(+0x36b2b)[0x7f7e13df0b2b] __inode_retire ??:0
/lib64/libglusterfs.so.0(+0x36885)[0x7f7e13df0885] __inode_unref ??:0
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x36)[0x7f7e13df1f86] inode_forget_with_unref ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x857a)[0x7f7e0b1b657a] fuse_batch_forget ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7f7e0b1cdd7a] fuse_thread_proc ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7f7e12c43dd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7f7e1250bead] __clone ??:0

Comment 11 Nithya Balachandran 2019-02-08 03:03:20 UTC

(In reply to David E. Smith from comment #10)
> Ran a couple of the glusterfs logs through the print-backtrace script. They
> all start with what you'd normally expect (clone, start_thread) and all end
> with (_gf_msg_backtrace_nomem) but they're all doing different things in the
> middle. It looks sorta like a memory leak or other memory corruption. Since
> it started happening on both of my servers after upgrading to 5.2 (and
> continued with 5.3), I really doubt it's a hardware issue -- the FUSE
> clients are both VMs, on hosts a few miles apart, so the odds of host RAM
> going wonky in both places at exactly that same time are ridiculous.
> 
> Bit of a stretch, but do you think there would be value in my rebuilding the
> RPMs locally, to try to rule out anything on CentOS' end?

I don't think so. My guess is there is an error somewhere in the client code when handling inodes. It was never hit earlier because we never freed the inodes before 5.3. With the new inode invalidation feature, we appear to be accessing inodes that were already freed.

Did you see the same crashes in 5.2? If yes, something else might be going wrong.

I had a look at the coredumps you sent - most don't have any symbols (strangely). Of the ones that do, it looks like memory corruption and accessing already freed inodes. There are a few people looking at it but this going to take a while to figure out. In the meantime, let me know if you still see crashes with the lru-limit option.

Comment 12 Nithya Balachandran 2019-02-08 03:18:00 UTC

Another user has just reported that the lru-limit did not help with the crashes - let me know if that is your experience as well.

Comment 13 Nithya Balachandran 2019-02-08 12:57:50 UTC

We have found the cause of one crash but that has a different backtrace. Unfortunately we have not managed to reproduce the one you reported so we don't know if it is the same cause.

Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.

Comment 14 David E. Smith 2019-02-09 16:07:08 UTC

I did have some crashes with 5.2. (I went from 3.something straight to 5.2, so I'm not going to be too helpful in terms of narrowing down exactly when this issue first appeared, sorry.) I'll see if I still have any of those cores; they all were from several weeks ago, so I may have already cleaned them up.

This morning, one of my clients core dumped with the lru-limit option. It looks like it might be a different crash (in particular, this morning's crash was a SIGABRT, whereas previous crashes were SIGSEGV). I've uploaded that core to the same Box folder, in case it's useful. I'll paste its backtrace in below.

For the write-behind request, do you want me to set 'performance.flush-behind off' or so you mean something else?

Comment 15 David E. Smith 2019-02-09 16:07:49 UTC

Backtrace for 2/9/19 crash (as promised above, put it in a separate comment for clarity):

/lib64/libglusterfs.so.0(+0x26610)[0x7f3b31456610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f3b31460b84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7f3b2faba280] __restore_rt ??:0
/lib64/libc.so.6(gsignal+0x37)[0x7f3b2faba207] raise ??:0
/lib64/libc.so.6(abort+0x148)[0x7f3b2fabb8f8] abort ??:0
/lib64/libc.so.6(+0x78d27)[0x7f3b2fafcd27] __libc_message ??:0
/lib64/libc.so.6(+0x81489)[0x7f3b2fb05489] _int_free ??:0
/lib64/libglusterfs.so.0(+0x1a6e9)[0x7f3b3144a6e9] dict_destroy ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x8cf9)[0x7f3b23388cf9] dht_local_wipe ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x4ab90)[0x7f3b233cab90] dht_revalidate_cbk ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x709e5)[0x7f3b236a89e5] afr_lookup_done ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x71198)[0x7f3b236a9198] afr_lookup_metadata_heal_check ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x71cbb)[0x7f3b236a9cbb] afr_lookup_entry_heal ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x71f99)[0x7f3b236a9f99] afr_lookup_cbk ??:0
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x616d2)[0x7f3b239326d2] client4_0_lookup_cbk ??:0
/lib64/libgfrpc.so.0(+0xec70)[0x7f3b31222c70] rpc_clnt_handle_reply ??:0
/lib64/libgfrpc.so.0(+0xf043)[0x7f3b31223043] rpc_clnt_notify ??:0
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f3b3121ef23] rpc_transport_notify ??:0
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7f3b25e0b37b] socket_event_handler ??:0
/lib64/libglusterfs.so.0(+0x8aa49)[0x7f3b314baa49] event_dispatch_epoll_worker ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7f3b302b9dd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7f3b2fb81ead] __clone ??:0
[d

Comment 16 Raghavendra G 2019-02-09 17:15:55 UTC

(In reply to David E. Smith from comment #14)
> I did have some crashes with 5.2. (I went from 3.something straight to 5.2,
> so I'm not going to be too helpful in terms of narrowing down exactly when
> this issue first appeared, sorry.) I'll see if I still have any of those
> cores; they all were from several weeks ago, so I may have already cleaned
> them up.
> 
> This morning, one of my clients core dumped with the lru-limit option. It
> looks like it might be a different crash (in particular, this morning's
> crash was a SIGABRT, whereas previous crashes were SIGSEGV). I've uploaded
> that core to the same Box folder, in case it's useful. I'll paste its
> backtrace in below.
> 
> For the write-behind request, do you want me to set
> 'performance.flush-behind off' or so you mean something else?

gluster volume set <volname> performance.write-behind off

Comment 17 Nithya Balachandran 2019-02-11 04:44:08 UTC

Thanks David. I'm going to hold off on looking at the coredump until we hear back from you on whether disabling performance.write-behind works. The different backtraces could be symptoms of the same underlying issue where gluster tries to access already freed memory.

Comment 18 David E. Smith 2019-02-11 14:13:39 UTC

I've updated my volumes with that option, and shortly will be scheduling a reboot of the clients (so that we can be sure we have a clean slate, so to speak). After that, I suppose there's nothing to do except wait a few days to see if it crashes.

Sorry about not understanding that you wanted me to set write-behind; I did a quick scan of the docs at https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/ and didn't even see an option named that, so I went for the next-closest one.

Comment 19 Raghavendra G 2019-02-12 03:24:05 UTC

*** Bug 1671014 has been marked as a duplicate of this bug. ***

Comment 20 David E. Smith 2019-02-15 17:45:57 UTC

Obviously, in this case, the absence of a crash isn't necessarily proof of anything. But for what it's worth, my FUSE clients have been up for four days without crashing, which is probably the longest they've made it since I updated to 5.2 and 5.3. I'll report back in a few more days.

Comment 21 Amgad 2019-02-16 23:38:18 UTC

Hopefully a resolution would be available for R5.4. Assuming that setting "performance.write-behind off" is a workaround, what is the performance impacts for setting the option "off"?

Comment 22 Nithya Balachandran 2019-02-18 04:32:48 UTC

(In reply to Amgad from comment #21)
> Hopefully a resolution would be available for R5.4. Assuming that setting
> "performance.write-behind off" is a workaround, what is the performance
> impacts for setting the option "off"?


If the workload is write heavy, you will most likely see a perf regression.

Comment 23 Worker Ant 2019-02-19 05:57:05 UTC

REVIEW: https://review.gluster.org/22228 (performance/write-behind: fix use-after-free in readdirp) posted (#1) for review on release-5 by Raghavendra G

Comment 24 Worker Ant 2019-02-19 05:58:09 UTC

REVIEW: https://review.gluster.org/22229 (performance/write-behind: handle call-stub leaks) posted (#1) for review on release-5 by Raghavendra G

Comment 25 David E. Smith 2019-02-20 15:49:07 UTC

My two FUSE clients have now been up for nine days without a crash, after disabling the write-behind option. I'm pretty confident at this point that my crashes were related to the write-behind bugs being fixed here. Thank you!

Comment 26 Artem Russakovskii 2019-02-21 00:02:34 UTC

I wish I could say the same thing. Unfortunately, I had another crash on the same server yesterday, with performance.write-behind still set to off. I've emailed the core file privately to the relevant people.


[2019-02-19 19:50:39.511743] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f9598991329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f9598ba2af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f95a137d218] ) 2-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 95 times between [2019-02-19 19:49:07.655620] and [2019-02-19 19:50:39.499284]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-<SNIP>_data3-replicate-0: selecting local read_child <SNIP>_data3-client-3" repeated 56 times between [2019-02-19 19:49:07.602370] and [2019-02-19 19:50:42.912766]
pending frames:   
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-19 19:50:43
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f95a138864c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f95a1392cb6]
/lib64/libc.so.6(+0x36160)[0x7f95a054f160]
/lib64/libc.so.6(gsignal+0x110)[0x7f95a054f0e0]
/lib64/libc.so.6(abort+0x151)[0x7f95a05506c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f95a05476fa]
/lib64/libc.so.6(+0x2e772)[0x7f95a0547772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f95a08dd0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f95994f0c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f9599503ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f9599788f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f95a1153820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f95a1153b6f]  
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f95a1150063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f959aea00b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f95a13e64c3]
/lib64/libpthread.so.0(+0x7559)[0x7f95a08da559]
/lib64/libc.so.6(clone+0x3f)[0x7f95a061181f]
---------
[2019-02-19 19:51:34.425106] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data3 /mnt/<SNIP>_data3)
[2019-02-19 19:51:34.435206] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-19 19:51:34.450272] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-02-19 19:51:34.450394] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2019-02-19 19:51:34.450488] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3

Comment 27 Nithya Balachandran 2019-02-21 03:57:27 UTC

(In reply to Artem Russakovskii from comment #26)
> I wish I could say the same thing. Unfortunately, I had another crash on the
> same server yesterday, with performance.write-behind still set to off. I've
> emailed the core file privately to the relevant people.
> 


Thanks Artem. I have downloaded the coredump and will take a look sometime this week. Can you share any details about the workload that might help us narrow it down?

Comment 28 Artem Russakovskii 2019-02-22 04:47:20 UTC

(In reply to Nithya Balachandran from comment #27)
> (In reply to Artem Russakovskii from comment #26)
> > I wish I could say the same thing. Unfortunately, I had another crash on the
> > same server yesterday, with performance.write-behind still set to off. I've
> > emailed the core file privately to the relevant people.
> > 
> 
> 
> Thanks Artem. I have downloaded the coredump and will take a look sometime
> this week. Can you share any details about the workload that might help us
> narrow it down?

We have a blog with images stored on gluster. Let's say 400k average views per day, split across 4 web servers, but the majority is cached by Cloudflare. Probably a hundred file writes per day, just images for blog posts.

The 2nd site hosts Android apks and has quite a bit more traffic than the blog, but all apks are also cached by Cloudflare. Several hundred apks written per day.

4.1 didn't break a sweat with our setup, so I really don't know why I'm having so much trouble. And it's always the same web server of the 4 too, and they're supposed to be roughly identical.

Comment 29 Worker Ant 2019-02-22 14:49:43 UTC

REVIEW: https://review.gluster.org/22229 (performance/write-behind: handle call-stub leaks) merged (#2) on release-5 by Shyamsundar Ranganathan

Comment 30 Worker Ant 2019-02-22 14:50:30 UTC

REVIEW: https://review.gluster.org/22228 (performance/write-behind: fix use-after-free in readdirp) merged (#2) on release-5 by Shyamsundar Ranganathan

Comment 31 Artem Russakovskii 2019-03-19 14:57:36 UTC

I upgraded the node that was crashing to 5.5 yesterday. Today, it got another crash. This is a 1x4 replicate cluster, you can find the config mentioned in my previous reports, and Amar should have it as well. Here's the log:

==> mnt-<SNIP>_data1.log <==
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" repeated 4 times between [2019-03-19 14:40:50.741147] and [2019-03-19 14:40:56.874832]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-03-19 14:40:57
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.5
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7ff841f8364c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7ff841f8dd26]
/lib64/libc.so.6(+0x36160)[0x7ff84114a160]
/lib64/libc.so.6(gsignal+0x110)[0x7ff84114a0e0]
/lib64/libc.so.6(abort+0x151)[0x7ff84114b6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7ff8411426fa]
/lib64/libc.so.6(+0x2e772)[0x7ff841142772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7ff8414d80b8]
/usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x5de3d)[0x7ff839fbae3d]
/usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x70d51)[0x7ff839fcdd51]
/usr/lib64/glusterfs/5.5/xlator/protocol/client.so(+0x58e1f)[0x7ff83a252e1f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7ff841d4e820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7ff841d4eb6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff841d4b063]
/usr/lib64/glusterfs/5.5/rpc-transport/socket.so(+0xa0ce)[0x7ff83b9690ce]
/usr/lib64/libglusterfs.so.0(+0x85519)[0x7ff841fe1519]
/lib64/libpthread.so.0(+0x7559)[0x7ff8414d5559]
/lib64/libc.so.6(clone+0x3f)[0x7ff84120c81f]
---------

Comment 32 Nithya Balachandran 2019-03-27 06:07:03 UTC

Artem, I am closing this BZ as the crash the others reported (the one fixed by turning off write-behind) has been fixed. We will continue to track the crash you are seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1690769 as that looks like a different problem.

Comment 33 Shyamsundar 2019-03-27 13:44:02 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.5, please open a new bug report.

glusterfs-5.5 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000119.html
[2] https://www.gluster.org/pipermail/gluster-users/

Note You need to log in before you can comment on or make changes to this bug.