Bug 1674406 - glusterfs FUSE client crashing every few days with 'Failed to dispatch handler'
Summary: glusterfs FUSE client crashing every few days with 'Failed to dispatch handler'
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: write-behind
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1671556
Blocks: glusterfs-5.4 1676356 Gluster_5_Affecting_oVirt_4.3 1678570 1691292
 
Reported: 2019-02-11 09:46 UTC by Raghavendra G
Modified: 2019-03-21 11:04 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1671556
Clones: 1676356 1678570 1691292
Environment:
Last Closed: 2019-02-19 05:54:08 UTC
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links:
Gluster.org Gerrit 22189 (Open): performance/write-behind: fix use-after-free in readdirp (last updated 2019-02-19 05:53:44 UTC)
Gluster.org Gerrit 22227 (Merged): performance/write-behind: handle call-stub leaks (last updated 2019-02-19 05:54:07 UTC)

Description Raghavendra G 2019-02-11 09:46:26 UTC
+++ This bug was initially created as a clone of Bug #1671556 +++

This is a re-post of my FUSE crash report from BZ1651246. That issue is for a crash in the FUSE client. Mine is too, but I was asked in that bug to open a new issue, so here you go. :)



My servers (two, in a 'replica 2' setup) publish two volumes. One is Web site content, about 110GB; the other is Web config files, only a few megabytes. (Wasn't worth building extra servers for that second volume.) FUSE clients have been crashing on the larger volume every three or four days. I can't reproduce this on-demand, unfortunately, but I've got several cores from previous crashes that may be of value to you.

I'm using Gluster 5.3 from the RPMs provided by the CentOS Storage SIG, on a Red Hat Enterprise Linux 7.x system.


The client's logs show many hundreds of instances of this (I don't know if it's related):
[2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) [0x7fa171ead384] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) [0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] ) 0-dict: dict is NULL [Invalid argument]

Then, when the client's glusterfs process crashes, this is logged:

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 871 times between [2019-01-29 08:12:48.390535] and [2019-01-29 08:14:17.100279]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-01-29 08:14:17
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84]
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280]
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d]
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2]
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead]
---------



Info on the volumes themselves, gathered from one of my servers:

[davidsmith@wuit-s-10889 ~]$ sudo gluster volume info all

Volume Name: web-config
Type: Replicate
Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-config
Brick2: 172.23.128.27:/data/web-config
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
cluster.min-free-disk: 1
cluster.quorum-count: 2
cluster.quorum-type: fixed
network.ping-timeout: 10
auth.allow: *
performance.readdir-ahead: on

Volume Name: web-content
Type: Replicate
Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-content
Brick2: 172.23.128.27:/data/web-content
Options Reconfigured:
network.ping-timeout: 10
cluster.quorum-type: fixed
cluster.quorum-count: 2
performance.readdir-ahead: on
auth.allow: *
cluster.min-free-disk: 1
client.event-threads: 4
server.event-threads: 4
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
performance.cache-size: 4GB



gluster> volume status all detail
Status of volume: web-config
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5612
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5540
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277

Status of volume: web-content
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5649
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5567
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277



I'll attach a couple of the core files generated by the crashing glusterfs instances, size limits permitting (they range from 3 to 8 GB). If I can't attach them, I'll find somewhere to host them.
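
If attaching or hosting the full cores proves impractical, a text backtrace pulled from each core is usually enough for initial triage. A minimal sketch (package names and the core path are placeholders; assumes the matching debuginfo packages are available for the installed build):

# Install debug symbols matching the installed glusterfs packages (RHEL/CentOS 7, needs yum-utils)
sudo debuginfo-install glusterfs glusterfs-fuse glusterfs-libs

# Dump all thread backtraces from a core into a shareable text file
gdb -batch -ex 'thread apply all bt full' /usr/sbin/glusterfs /path/to/core > glusterfs-backtrace.txt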

--- Additional comment from Artem Russakovskii on 2019-01-31 22:26:25 UTC ---

Also reposting my comment from https://bugzilla.redhat.com/show_bug.cgi?id=1651246.



I wish I saw this bug report before I updated from rock solid 4.1 to 5.3. Less than 24 hours after upgrading, I already got a crash and had to unmount, kill gluster, and remount:


[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
---------

Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, not too sure how to make it core dump.

If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing.

Thanks.

--- Additional comment from David E. Smith on 2019-01-31 22:31:47 UTC ---

Actually, I ran the cores through strings and grepped for a few things like passwords -- as you'd expect from a memory dump from a Web server, there's a lot of sensitive information in there. Is there a safe/acceptable way to send the cores only to developers that can use them, or otherwise not have to make them publicly available, while still letting the Gluster devs benefit from analyzing them?

--- Additional comment from Ravishankar N on 2019-02-01 05:51:19 UTC ---

(In reply to David E. Smith from comment #2)
> Actually, I ran the cores through strings and grepped for a few things like
> passwords -- as you'd expect from a memory dump from a Web server, there's a
> lot of sensitive information in there. Is there a safe/acceptable way to
> send the cores only to developers that can use them, or otherwise not have
> to make them publicly available while still letting the Gluster devs benefit
> from analyzing them?

Perhaps you could upload it to a shared Dropbox folder with view/download access for the Red Hat email IDs I've CC'ed on this bug (including me), to begin with.

Note: I upgraded a 1x2 replica volume with 1 fuse client from v4.1.7 to v5.3 and did some basic I/O (kernel untar and iozone) and did not observe any crashes, so maybe this is something that is hit under extreme I/O or memory pressure. :-(

--- Additional comment from Artem Russakovskii on 2019-02-02 20:17:15 UTC ---

The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.
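
A minimal sketch of that reproduction, with the actual PID left as a placeholder since the right process depends on which mount you want to test:

# List FUSE client processes and note the one backing the target mount
pgrep -af 'glusterfs.*--process-name fuse'

# Simulate the crash / "Transport endpoint is not connected" state for that mount
sudo kill -9 <pid-of-that-glusterfs-client>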


monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program  = "/bin/mount  /mnt/glusterfs_data1"
  stop program  = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert


stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

--- Additional comment from David E. Smith on 2019-02-05 02:59:24 UTC ---

I've added the five of you to our org's Box account; all of you should have invitations to a shared folder, and I'm uploading a few of the cores now. I hope they're of value to you.

The binaries are all from the CentOS Storage SIG repo at https://buildlogs.centos.org/centos/7/storage/x86_64/gluster-5/ . They're all current as of a few days ago:

[davidsmith@wuit-s-10882 ~]$ rpm -qa | grep gluster
glusterfs-5.3-1.el7.x86_64
glusterfs-client-xlators-5.3-1.el7.x86_64
glusterfs-fuse-5.3-1.el7.x86_64
glusterfs-libs-5.3-1.el7.x86_64

--- Additional comment from Nithya Balachandran on 2019-02-05 11:00:04 UTC ---

(In reply to David E. Smith from comment #5)
> I've added the five of you to our org's Box account; all of you should have
> invitations to a shared folder, and I'm uploading a few of the cores now. I
> hope they're of value to you.
> 
> The binaries are all from the CentOS Storage SIG repo at
> https://buildlogs.centos.org/centos/7/storage/x86_64/gluster-5/ . They're
> all current as of a few days ago:
> 
> [davidsmith@wuit-s-10882 ~]$ rpm -qa | grep gluster
> glusterfs-5.3-1.el7.x86_64
> glusterfs-client-xlators-5.3-1.el7.x86_64
> glusterfs-fuse-5.3-1.el7.x86_64
> glusterfs-libs-5.3-1.el7.x86_64

Thanks. We will take a look and get back to you.

--- Additional comment from Nithya Balachandran on 2019-02-05 16:43:45 UTC ---

David,

Can you try mounting the volume with the option  lru-limit=0 and let us know if you still see the crashes?

Regards,
Nithya
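
(For reference, a sketch of passing that option on a manual mount; server, volume, and mountpoint are placeholders, and the same lru-limit=0 can be added to the options field in /etc/fstab:)

mount -t glusterfs -o lru-limit=0 <gluster-server>:/<volume> /<mountpoint>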

--- Additional comment from Nithya Balachandran on 2019-02-06 07:23:49 UTC ---

Initial analysis of one of the cores:

[root@rhgs313-7 gluster-5.3]# gdb -c core.6014 /usr/sbin/glusterfs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --direct-io-mode=disable --fuse-mountopts=noatime,context="'.

Program terminated with signal 11, Segmentation fault.
#0  __inode_ctx_free (inode=inode@entry=0x7fa0d0349af8) at inode.c:410
410	            if (!xl->call_cleanup && xl->cbks->forget)

(gdb) bt
#0  __inode_ctx_free (inode=inode@entry=0x7fa0d0349af8) at inode.c:410
#1  0x00007fa1809e90a2 in __inode_destroy (inode=0x7fa0d0349af8) at inode.c:432
#2  inode_table_prune (table=table@entry=0x7fa15800c3c0) at inode.c:1696
#3  0x00007fa1809e9f96 in inode_forget_with_unref (inode=0x7fa0d0349af8, nlookup=128) at inode.c:1273
#4  0x00007fa177dae4e1 in do_forget (this=<optimized out>, unique=<optimized out>, nodeid=<optimized out>, nlookup=<optimized out>) at fuse-bridge.c:726
#5  0x00007fa177dae5bd in fuse_forget (this=<optimized out>, finh=0x7fa0a41da500, msg=<optimized out>, iobuf=<optimized out>) at fuse-bridge.c:741
#6  0x00007fa177dc5d7a in fuse_thread_proc (data=0x557a0e8ffe20) at fuse-bridge.c:5125
#7  0x00007fa17f83bdd5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fa17f103ead in msync () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) f 0
#0  __inode_ctx_free (inode=inode@entry=0x7fa0d0349af8) at inode.c:410
410	            if (!xl->call_cleanup && xl->cbks->forget)
(gdb) l
405	    for (index = 0; index < inode->table->xl->graph->xl_count; index++) {
406	        if (inode->_ctx[index].value1 || inode->_ctx[index].value2) {
407	            xl = (xlator_t *)(long)inode->_ctx[index].xl_key;
408	            old_THIS = THIS;
409	            THIS = xl;
410	            if (!xl->call_cleanup && xl->cbks->forget)
411	                xl->cbks->forget(xl, inode);
412	            THIS = old_THIS;
413	        }
414	    }
(gdb) p *xl
Cannot access memory at address 0x0

(gdb) p index
$1 = 6

(gdb) p inode->table->xl->graph->xl_count
$3 = 13
(gdb) p inode->_ctx[index].value1
$4 = 0
(gdb) p inode->_ctx[index].value2
$5 = 140327960119304
(gdb) p/x inode->_ctx[index].value2
$6 = 0x7fa0a6370808


Based on the graph, the xlator with index = 6 is
(gdb) p ((xlator_t*)  inode->table->xl->graph->top)->next->next->next->next->next->next->next->name
$31 = 0x7fa16c0122e0 "web-content-read-ahead"
(gdb) p ((xlator_t*)  inode->table->xl->graph->top)->next->next->next->next->next->next->next->xl_id
$32 = 6

But read-ahead does not update the inode_ctx at all. There seems to be some sort of memory corruption happening here but that needs further analysis.

--- Additional comment from David E. Smith on 2019-02-07 17:41:17 UTC ---

As of this morning, I've added the lru-limit mount option to /etc/fstab on my servers. I was on vacation and didn't see the request until this morning. For the sake of reference, here are the full fstab lines, edited only to remove hostnames and add placeholders. (Note that I've never had a problem with the 'web-config' volume, which is very low-traffic and only a few megabytes in size; the problems are always with the much more heavily used 'web-content' volume.)

gluster-server-1:/web-config  /etc/httpd/conf.d  glusterfs  defaults,_netdev,noatime,context=unconfined_u:object_r:httpd_config_t:s0,backupvolfile-server=gluster-server-2,direct-io-mode=disable,lru-limit=0 0 0
gluster-server-1:/web-content /var/www/html      glusterfs  defaults,_netdev,noatime,context=unconfined_u:object_r:httpd_sys_rw_content_t:s0,backupvolfile-server=gluster-server-2,direct-io-mode=disable,lru-limit=0 0 0
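
A sketch of picking up the changed fstab entries without a reboot (assumes each mount can be taken offline briefly, e.g. with the web server stopped):

sudo umount /var/www/html
sudo mount /var/www/html    # re-mounts from /etc/fstab, now including lru-limit=0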

--- Additional comment from David E. Smith on 2019-02-07 17:58:26 UTC ---

Ran a couple of the glusterfs logs through the print-backtrace script. They all start with what you'd normally expect (clone, start_thread) and all end with (_gf_msg_backtrace_nomem) but they're all doing different things in the middle. It looks sorta like a memory leak or other memory corruption. Since it started happening on both of my servers after upgrading to 5.2 (and continued with 5.3), I really doubt it's a hardware issue -- the FUSE clients are both VMs, on hosts a few miles apart, so the odds of host RAM going wonky in both places at exactly that same time are ridiculous.

Bit of a stretch, but do you think there would be value in my rebuilding the RPMs locally, to try to rule out anything on CentOS' end?


/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280] __restore_rt ??:0
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d] __inode_ctx_free ??:0
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2] inode_table_prune ??:0
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96] inode_forget_with_unref ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd] fuse_forget ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a] fuse_thread_proc ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead] __clone ??:0


/lib64/libglusterfs.so.0(+0x26610)[0x7f36aff72610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f36aff7cb84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7f36ae5d6280] __restore_rt ??:0
/lib64/libglusterfs.so.0(+0x36779)[0x7f36aff82779] __inode_unref ??:0
/lib64/libglusterfs.so.0(inode_unref+0x23)[0x7f36aff83203] inode_unref ??:0
/lib64/libglusterfs.so.0(gf_dirent_entry_free+0x2b)[0x7f36aff9ec4b] gf_dirent_entry_free ??:0
/lib64/libglusterfs.so.0(gf_dirent_free+0x2b)[0x7f36aff9ecab] gf_dirent_free ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x7480)[0x7f36a215b480] afr_readdir_cbk ??:0
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x60bca)[0x7f36a244dbca] client4_0_readdirp_cbk ??:0
/lib64/libgfrpc.so.0(+0xec70)[0x7f36afd3ec70] rpc_clnt_handle_reply ??:0
/lib64/libgfrpc.so.0(+0xf043)[0x7f36afd3f043] rpc_clnt_notify ??:0
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f36afd3af23] rpc_transport_notify ??:0
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7f36a492737b] socket_event_handler ??:0
/lib64/libglusterfs.so.0(+0x8aa49)[0x7f36affd6a49] event_dispatch_epoll_worker ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7f36aedd5dd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7f36ae69dead] __clone ??:0


/lib64/libglusterfs.so.0(+0x26610)[0x7f7e13de0610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f7e13deab84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7f7e12444280] __restore_rt ??:0
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f7e12c45c30] pthread_mutex_lock ??:0
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f7e13e0bc3c] __gf_free ??:0
/lib64/libglusterfs.so.0(+0x368ed)[0x7f7e13df08ed] __dentry_unset ??:0
/lib64/libglusterfs.so.0(+0x36b2b)[0x7f7e13df0b2b] __inode_retire ??:0
/lib64/libglusterfs.so.0(+0x36885)[0x7f7e13df0885] __inode_unref ??:0
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x36)[0x7f7e13df1f86] inode_forget_with_unref ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x857a)[0x7f7e0b1b657a] fuse_batch_forget ??:0
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7f7e0b1cdd7a] fuse_thread_proc ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7f7e12c43dd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7f7e1250bead] __clone ??:0

--- Additional comment from Nithya Balachandran on 2019-02-08 03:03:20 UTC ---

(In reply to David E. Smith from comment #10)
> Ran a couple of the glusterfs logs through the print-backtrace script. They
> all start with what you'd normally expect (clone, start_thread) and all end
> with (_gf_msg_backtrace_nomem) but they're all doing different things in the
> middle. It looks sorta like a memory leak or other memory corruption. Since
> it started happening on both of my servers after upgrading to 5.2 (and
> continued with 5.3), I really doubt it's a hardware issue -- the FUSE
> clients are both VMs, on hosts a few miles apart, so the odds of host RAM
> going wonky in both places at exactly that same time are ridiculous.
> 
> Bit of a stretch, but do you think there would be value in my rebuilding the
> RPMs locally, to try to rule out anything on CentOS' end?

I don't think so. My guess is there is an error somewhere in the client code when handling inodes. It was never hit earlier because we never freed the inodes before 5.3. With the new inode invalidation feature, we appear to be accessing inodes that were already freed.

Did you see the same crashes in 5.2? If yes, something else might be going wrong.

I had a look at the coredumps you sent - most don't have any symbols (strangely). Of the ones that do, it looks like memory corruption and accessing already freed inodes. There are a few people looking at it, but this is going to take a while to figure out. In the meantime, let me know if you still see crashes with the lru-limit option.

--- Additional comment from Nithya Balachandran on 2019-02-08 03:18:00 UTC ---

Another user has just reported that the lru-limit did not help with the crashes - let me know if that is your experience as well.

--- Additional comment from Nithya Balachandran on 2019-02-08 12:57:50 UTC ---

We have found the cause of one crash but that has a different backtrace. Unfortunately we have not managed to reproduce the one you reported so we don't know if it is the same cause.

Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.

--- Additional comment from David E. Smith on 2019-02-09 16:07:08 UTC ---

I did have some crashes with 5.2. (I went from 3.something straight to 5.2, so I'm not going to be too helpful in terms of narrowing down exactly when this issue first appeared, sorry.) I'll see if I still have any of those cores; they all were from several weeks ago, so I may have already cleaned them up.

This morning, one of my clients core dumped with the lru-limit option. It looks like it might be a different crash (in particular, this morning's crash was a SIGABRT, whereas previous crashes were SIGSEGV). I've uploaded that core to the same Box folder, in case it's useful. I'll paste its backtrace in below.

For the write-behind request, do you want me to set 'performance.flush-behind off', or do you mean something else?

--- Additional comment from David E. Smith on 2019-02-09 16:07:49 UTC ---

Backtrace for 2/9/19 crash (as promised above, put it in a separate comment for clarity):

/lib64/libglusterfs.so.0(+0x26610)[0x7f3b31456610] _gf_msg_backtrace_nomem ??:0
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f3b31460b84] gf_print_trace ??:0
/lib64/libc.so.6(+0x36280)[0x7f3b2faba280] __restore_rt ??:0
/lib64/libc.so.6(gsignal+0x37)[0x7f3b2faba207] raise ??:0
/lib64/libc.so.6(abort+0x148)[0x7f3b2fabb8f8] abort ??:0
/lib64/libc.so.6(+0x78d27)[0x7f3b2fafcd27] __libc_message ??:0
/lib64/libc.so.6(+0x81489)[0x7f3b2fb05489] _int_free ??:0
/lib64/libglusterfs.so.0(+0x1a6e9)[0x7f3b3144a6e9] dict_destroy ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x8cf9)[0x7f3b23388cf9] dht_local_wipe ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x4ab90)[0x7f3b233cab90] dht_revalidate_cbk ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x709e5)[0x7f3b236a89e5] afr_lookup_done ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x71198)[0x7f3b236a9198] afr_lookup_metadata_heal_check ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x71cbb)[0x7f3b236a9cbb] afr_lookup_entry_heal ??:0
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x71f99)[0x7f3b236a9f99] afr_lookup_cbk ??:0
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x616d2)[0x7f3b239326d2] client4_0_lookup_cbk ??:0
/lib64/libgfrpc.so.0(+0xec70)[0x7f3b31222c70] rpc_clnt_handle_reply ??:0
/lib64/libgfrpc.so.0(+0xf043)[0x7f3b31223043] rpc_clnt_notify ??:0
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f3b3121ef23] rpc_transport_notify ??:0
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7f3b25e0b37b] socket_event_handler ??:0
/lib64/libglusterfs.so.0(+0x8aa49)[0x7f3b314baa49] event_dispatch_epoll_worker ??:0
/lib64/libpthread.so.0(+0x7dd5)[0x7f3b302b9dd5] start_thread ??:0
/lib64/libc.so.6(clone+0x6d)[0x7f3b2fb81ead] __clone ??:0

--- Additional comment from Raghavendra G on 2019-02-09 17:15:55 UTC ---

(In reply to David E. Smith from comment #14)
> I did have some crashes with 5.2. (I went from 3.something straight to 5.2,
> so I'm not going to be too helpful in terms of narrowing down exactly when
> this issue first appeared, sorry.) I'll see if I still have any of those
> cores; they all were from several weeks ago, so I may have already cleaned
> them up.
> 
> This morning, one of my clients core dumped with the lru-limit option. It
> looks like it might be a different crash (in particular, this morning's
> crash was a SIGABRT, whereas previous crashes were SIGSEGV). I've uploaded
> that core to the same Box folder, in case it's useful. I'll paste its
> backtrace in below.
> 
> For the write-behind request, do you want me to set
> 'performance.flush-behind off', or do you mean something else?

gluster volume set <volname> performance.write-behind off
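
Applied to the volume in this report, that would be for example (with a check that the option took effect; 'gluster volume get' is available in this release line):

gluster volume set web-content performance.write-behind off
gluster volume get web-content performance.write-behind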

--- Additional comment from Nithya Balachandran on 2019-02-11 04:44:08 UTC ---

Thanks David. I'm going to hold off on looking at the coredump until we hear back from you on whether disabling performance.write-behind works. The different backtraces could be symptoms of the same underlying issue where gluster tries to access already freed memory.

Comment 1 Worker Ant 2019-02-11 09:53:16 UTC
REVIEW: https://review.gluster.org/22189 (performance/write-behind: fix use-after-free in readdirp) posted (#1) for review on master by Raghavendra G

Comment 2 Worker Ant 2019-02-19 02:40:41 UTC
REVIEW: https://review.gluster.org/22227 (performance/write-behind: handle call-stub leaks) posted (#1) for review on master by Raghavendra G

Comment 3 Worker Ant 2019-02-19 05:53:46 UTC
REVIEW: https://review.gluster.org/22189 (performance/write-behind: fix use-after-free in readdirp) merged (#10) on master by Raghavendra G

Comment 4 Worker Ant 2019-02-19 05:54:08 UTC
REVIEW: https://review.gluster.org/22227 (performance/write-behind: handle call-stub leaks) merged (#2) on master by Raghavendra G

