Bug 1651246 - Failed to dispatch handler
Summary: Failed to dispatch handler
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 5
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Milind Changire
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: glusterfs-5.4 Gluster_5_Affecting_oVirt_4.3 1683900
 
Reported: 2018-11-19 14:15 UTC by waza123
Modified: 2019-03-27 13:44 UTC
CC List: 21 users

Fixed In Version: glusterfs-5.5
Clone Of:
: 1683900 (view as bug list)
Environment:
Last Closed: 2019-02-25 15:23:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Mount Log (212.37 KB, text/plain), 2019-01-30 16:08 UTC, Digiteyes


Links
Gluster.org Gerrit 22134 (Open): socket: fix issue when socket write return with EAGAIN (last updated 2019-02-04 14:50:29 UTC)
Gluster.org Gerrit 22135 (Open): socket: don't pass return value from protocol handler to event handler (last updated 2019-02-04 14:48:48 UTC)
Gluster.org Gerrit 22221 (Merged): socket: socket event handlers now return void (last updated 2019-02-18 02:46:11 UTC)
Gluster.org Gerrit 22237 (Merged): socket: socket event handlers now return void (last updated 2019-02-25 15:23:41 UTC)

Description waza123 2018-11-19 14:15:22 UTC
I have these errors in the logs, and glusterfs randomly crashes at mem-pool.c line 330 (in __gf_free, while freeing memory).
Gluster v5.1


The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 465 times between [2018-11-19 13:58:04.457516] and [2018-11-19 14:00:02.137868]
[2018-11-19 14:00:04.392558] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 389 times between [2018-11-19 14:00:04.392558] and [2018-11-19 14:01:58.224754]
[2018-11-19 14:02:04.412850] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 130 times between [2018-11-19 14:02:04.412850] and [2018-11-19 14:03:58.232651]
[2018-11-19 14:04:04.418550] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 110 times between [2018-11-19 14:04:04.418550] and [2018-11-19 14:06:04.468980]
[2018-11-19 14:06:04.627504] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 140 times between [2018-11-19 14:06:04.627504] and [2018-11-19 14:08:04.600312]
[2018-11-19 14:08:07.794521] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 95 times between [2018-11-19 14:08:07.794521] and [2018-11-19 14:10:04.345444]
[2018-11-19 14:10:07.569899] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 37 times between [2018-11-19 14:10:07.569899] and [2018-11-19 14:12:04.636673]
[2018-11-19 14:12:07.198290] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

Comment 1 waza123 2018-11-19 14:27:46 UTC
Also, maybe this can help: I have many errors, about 100 of them, with "No such file or directory".
How can this be fixed manually?


[2018-11-19 13:32:08.503733] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:6c0a8545-a83c-499c-a834-959b554e0094> (6c0a8545-a83c-499c-a834-959b554e0094) [No such file or directory]
[2018-11-19 13:32:08.504201] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:0fed92d4-fd54-462b-b658-fed053849237> (0fed92d4-fd54-462b-b658-fed053849237) [No such file or directory]
[2018-11-19 13:32:08.505653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:6c0a8545-a83c-499c-a834-959b554e0094> (6c0a8545-a83c-499c-a834-959b554e0094) [No such file or directory]
[2018-11-19 13:32:08.505912] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:6c0a8545-a83c-499c-a834-959b554e0094> (6c0a8545-a83c-499c-a834-959b554e0094) [No such file or directory]
[2018-11-19 13:32:08.506628] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:0fed92d4-fd54-462b-b658-fed053849237> (0fed92d4-fd54-462b-b658-fed053849237) [No such file or directory]
[2018-11-19 13:32:08.509315] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:0fed92d4-fd54-462b-b658-fed053849237> (0fed92d4-fd54-462b-b658-fed053849237) [No such file or directory]
[2018-11-19 13:37:38.681111] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:b4bf758f-9a7a-412f-b3bc-25ce3c43ea00> (b4bf758f-9a7a-412f-b3bc-25ce3c43ea00) [No such file or directory]
[2018-11-19 13:37:38.681199] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:b4bf758f-9a7a-412f-b3bc-25ce3c43ea00> (b4bf758f-9a7a-412f-b3bc-25ce3c43ea00) [No such file or directory]
[2018-11-19 13:37:38.681388] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:b4bf758f-9a7a-412f-b3bc-25ce3c43ea00> (b4bf758f-9a7a-412f-b3bc-25ce3c43ea00) [No such file or directory]
[2018-11-19 13:37:38.685464] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:f32ea22c-6705-4b33-9eeb-b6d6f9a03911> (f32ea22c-6705-4b33-9eeb-b6d6f9a03911) [No such file or directory]
[2018-11-19 13:37:38.685467] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:f32ea22c-6705-4b33-9eeb-b6d6f9a03911> (f32ea22c-6705-4b33-9eeb-b6d6f9a03911) [No such file or directory]
[2018-11-19 13:37:38.685727] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:f32ea22c-6705-4b33-9eeb-b6d6f9a03911> (f32ea22c-6705-4b33-9eeb-b6d6f9a03911) [No such file or directory]
[2018-11-19 13:37:38.686705] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:a168e718-68cb-4e66-83e6-128cd0b7374d> (a168e718-68cb-4e66-83e6-128cd0b7374d) [No such file or directory]
[2018-11-19 13:37:38.686740] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:a168e718-68cb-4e66-83e6-128cd0b7374d> (a168e718-68cb-4e66-83e6-128cd0b7374d) [No such file or directory]
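
One way to investigate these GFID warnings manually (a sketch only, not an official fix; it assumes the brick path /hadoop shown in the volume info below, and the GFID is taken from the log above): every file on a brick also has a handle under the brick's .glusterfs directory, laid out as <brick>/.glusterfs/<first two hex characters>/<next two>/<full gfid>, so you can check on each brick whether a handle still points at a real file.

# Sketch: run on a brick server; GFID and brick path are examples from this report
GFID=6c0a8545-a83c-499c-a834-959b554e0094
BRICK=/hadoop
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# for a regular file the handle is a hard link; this locates the real path, if any:
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path "*/.glusterfs/*" 2>/dev/null

If the handle is missing on only some bricks, these lookup warnings may simply reflect files that were deleted concurrently, which is usually harmless on its own.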

Comment 2 waza123 2018-11-19 14:28:59 UTC
# gluster volume info

Volume Name: hadoop_volume
Type: Disperse
Volume ID: 5ad3899f-cbc0-4032-b5ea-a0cf7c775d73
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: hdd1:/hadoop
Brick2: hdd2:/hadoop
Brick3: hdd3:/hadoop
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

Comment 3 waza123 2018-11-19 14:59:21 UTC
Here is where it crashed:


[2018-11-19 14:50:09.900071] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 152 times between [2018-11-19 14:50:09.900071] and [2018-11-19 14:52:07.480234]

[2018-11-19 14:52:09.382050] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

[2018-11-19 14:53:39.492132] E [mem-pool.c:322:__gf_free] (-->/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4) [0x7f0e23aec8a4] -->/usr/lib/libglusterfs.so.0(+0x1a24e) [0x7f0e2e3b424e] -->/usr/lib/libglusterfs.so.0(__gf_free+0x9b) [0x7f0e2e3e867b] ) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 124 times between [2018-11-19 14:52:09.382050] and [2018-11-19 14:53:39.233214]
pending frames:
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(READDIRP)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(FLUSH)
frame : type(1) op(CREATE)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(OPEN)
frame : type(1) op(FLUSH)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(OPENDIR)
frame : type(1) op(STAT)
frame : type(1) op(OPENDIR)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(FLUSH)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(STAT)
frame : type(1) op(FSTAT)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(READDIRP)
frame : type(1) op(OPENDIR)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(STAT)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(READDIRP)
frame : type(1) op(FLUSH)
frame : type(1) op(OPENDIR)
frame : type(1) op(OPENDIR)
frame : type(1) op(FSTAT)
frame : type(1) op(FLUSH)
frame : type(1) op(OPENDIR)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-11-19 14:53:39
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.1
/usr/lib/libglusterfs.so.0(+0x256ea)[0x7f0e2e3bf6ea]
/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7f0e2e3c9ab7]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f0e2d7ab4b0]
/usr/lib/libglusterfs.so.0(__gf_free+0xb0)[0x7f0e2e3e8690]
/usr/lib/libglusterfs.so.0(+0x1a24e)[0x7f0e2e3b424e]
/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4)[0x7f0e23aec8a4]
/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0x14528)[0x7f0e23af1528]
/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0x18b52)[0x7f0e23af5b52]
/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0x11fa3)[0x7f0e23aeefa3]
/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xb3c1)[0x7f0e23ae83c1]
/usr/lib/glusterfs/5.1/xlator/cluster/distribute.so(+0x7e70a)[0x7f0e238a070a]
/usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0x563a)[0x7f0e2802963a]
/usr/lib/libglusterfs.so.0(call_resume_keep_stub+0x75)[0x7f0e2e3e51f5]
/usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0x9289)[0x7f0e2802d289]
/usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0x939b)[0x7f0e2802d39b]
/usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0xa4d8)[0x7f0e2802e4d8]
/usr/lib/glusterfs/5.1/xlator/performance/read-ahead.so(+0x4113)[0x7f0e23616113]
/usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2]
/usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2]
/usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2]
/usr/lib/libglusterfs.so.0(default_flush_resume+0x1e5)[0x7f0e2e45df35]
/usr/lib/libglusterfs.so.0(call_resume+0x75)[0x7f0e2e3e5035]
/usr/lib/glusterfs/5.1/xlator/performance/open-behind.so(+0x4950)[0x7f0e22dd5950]
/usr/lib/glusterfs/5.1/xlator/performance/open-behind.so(+0x4db8)[0x7f0e22dd5db8]
/usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2]
/usr/lib/libglusterfs.so.0(default_flush_resume+0x1e5)[0x7f0e2e45df35]
/usr/lib/libglusterfs.so.0(call_resume+0x75)[0x7f0e2e3e5035]
/usr/lib/glusterfs/5.1/xlator/performance/io-threads.so(+0x5a18)[0x7f0e229a6a18]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f0e2db476ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0e2d87d41d]

Comment 4 waza123 2018-11-19 15:00:25 UTC
gdb report:

Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=hdd1 --volfile-id=/had'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f0e2e3e8690 in __gf_free (free_ptr=0x7f0e104b0758) at mem-pool.c:330
330         GF_ASSERT(GF_MEM_TRAILER_MAGIC ==
[Current thread is 1 (Thread 0x7f0e201e7700 (LWP 11458))]
(gdb)

Comment 5 waza123 2018-11-19 15:04:38 UTC
It crashed again, but with some different lines:

[2018-11-19 14:58:08.327343] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-11-19 14:59:29.845899] E [mem-pool.c:322:__gf_free] (-->/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4) [0x7f825d9d38a4] -->/usr/lib/libglusterfs.so.0(+0x1a24e) [0x7f826406024e] -->/usr/lib/libglusterfs.so.0(__gf_free+0x9b) [0x7f826409467b] ) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic
[2018-11-19 14:59:29.845965] E [mem-pool.c:331:__gf_free] (-->/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4) [0x7f825d9d38a4] -->/usr/lib/libglusterfs.so.0(+0x1a24e) [0x7f826406024e] -->/usr/lib/libglusterfs.so.0(__gf_free+0xf6) [0x7f82640946d6] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 280 times between [2018-11-19 14:58:08.327343] and [2018-11-19 14:59:28.480961]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
etc..

Comment 6 waza123 2018-11-20 13:56:29 UTC
I downgraded to 3.12.15 because 5.1 is not stable at all (clean install).

Downgrade documentation for anyone who needs it:

Back up your cluster data somewhere.

Remove all installation files:

gluster volume stop hadoop_volume
gluster volume delete hadoop_volume
killall glusterfs glusterfsd glusterd glustereventsd python

# remove all files from the bricks:
rm -rf /hadoop/* && rm -rf /hadoop/.glusterfs

# remove all configs:
rm -rf /usr/var/lib/glusterd && rm -rf /usr/var/log/glusterfs && rm -rf /usr/var/run/gluster && rm -rf /usr/etc/glusterfs

# install the older gluster version, mount, and copy all files back to the new cluster from the backup.

Comment 7 vanessa.haro 2019-01-08 21:56:47 UTC
We saw this as well in v5.1.1. The stack backtraces were:
(gdb) t a a bt

Thread 24 (LWP 20898):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0xc)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000000 in ?? ()

Thread 23 (LWP 20894):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0x8)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000004 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 22 (LWP 20897):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0xb)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000004 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 21 (LWP 20885):
#0  0x00007effe124da82 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:347
#1  0x00007effe2e02430 in ?? ()
#2  0x00007effe2e06050 in ?? ()
#3  0x00007effd7fbde60 in ?? ()
#4  0x00007effe2e06098 in ?? ()
#5  0x00007effe24258a8 in syncenv_task () from /lib64/libglusterfs.so.0
#6  0x00007effe24267f0 in syncenv_processor () from /lib64/libglusterfs.so.0
#7  0x00007effe1249dc5 in start_thread (arg=0x7effd7fbe700) at pthread_create.c:308
#8  0x00007effe0b1776d in putspent (p=0x0, stream=0x7effd7fbe700) at putspent.c:60
#9  0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---

Thread 20 (LWP 20880):
#0  0x00007effe124aef7 in pthread_join (threadid=139637260523264, thread_return=0x0) at pthread_join.c:64
#1  0x00007effe2449968 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x00007effe28f94cb in main ()

Thread 19 (LWP 20888):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0x2)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000000 in ?? ()

Thread 18 (LWP 20883):
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x100000007:

Thread 17 (LWP 20890):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=3, result=0x4)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000007 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 16 (LWP 20886):
#0  0x00007effe124da82 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:347
#1  0x00007effe2e02430 in ?? ()
#2  0x00007effe2e06050 in ?? ()
#3  0x00007effd77bce60 in ?? ()
#4  0x00007effe2e06098 in ?? ()
#5  0x00007effe24258a8 in syncenv_task () from /lib64/libglusterfs.so.0
#6  0x00007effe24267f0 in syncenv_processor () from /lib64/libglusterfs.so.0
#7  0x00007effe1249dc5 in start_thread (arg=0x7effd77bd700) at pthread_create.c:308
#8  0x00007effe0b1776d in putspent (p=0x0, stream=0x7effd77bd700) at putspent.c:60
#9  0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---

Thread 15 (LWP 20892):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=5, result=0x6)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000001 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 14 (LWP 20889):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0x3)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000000 in ?? ()

Thread 13 (LWP 20896):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0xa)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000004 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 12 (LWP 20895):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0x9)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000004 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 11 (LWP 20900):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0xe)
    at ../nss/getXXbyYY_r.c:297
---Type <return> to continue, or q <return> to quit---
#1  0x0000000000000004 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 10 (LWP 20906):
#0  0x00007effe124d6d5 in __pthread_cond_init (cond=0x7effe2e00ef4, cond_attr=0x80) at pthread_cond_init.c:40
#1  0x0000000000000000 in ?? ()

Thread 9 (LWP 20893):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=3, result=0x7)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000007 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 8 (LWP 20881):
#0  0x00007effe1250bdd in __recvmsg_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000000000 in ?? ()

Thread 7 (LWP 20891):
#0  0x00007effe12501bd in unwind_stop (version=2013313424, actions=<optimized out>, exc_class=2, exc_obj=0xffffffffffffffff,
    context=0x7eff7800b990, stop_parameter=0x519b) at unwind.c:98
#1  0x0000000000000000 in ?? ()

Thread 6 (LWP 20882):
#0  0x00007effe1251101 in __libc_tcdrain (fd=32511) at ../sysdeps/unix/sysv/linux/tcdrain.c:34
#1  0x0000000000000000 in ?? ()

Thread 5 (LWP 20905):
#0  0x00007effe0b0e5c0 in tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7eff7802dec0) at tsearch.c:640
#1  tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7effbe7fbe60) at tsearch.c:641
#2  tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7effe2df4e00) at tsearch.c:639
#3  tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7effe2dee590) at tsearch.c:641
---Type <return> to continue, or q <return> to quit---
#4  tdestroy_recurse (root=0x7effe2de52d8, freefct=0x7effe2df4d70) at tsearch.c:641
#5  0x00007effe2df4e00 in ?? ()
#6  0x00007effbe7fbe60 in ?? ()
#7  0x00007effd97e1b40 in fuse_thread_proc () from /usr/lib64/glusterfs/5.1/xlator/mount/fuse.so
#8  0x00007effe1249dc5 in start_thread (arg=0x7effbe7fc700) at pthread_create.c:308
#9  0x00007effe0b1776d in putspent (p=0x0, stream=0x7effbe7fc700) at putspent.c:60
#10 0x0000000000000000 in ?? ()

Thread 4 (LWP 20901):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0xf)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000000 in ?? ()

Thread 3 (LWP 20899):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=5, result=0xd)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000001 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 2 (LWP 20902):
#0  0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0,
    buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0x10)
    at ../nss/getXXbyYY_r.c:297
#1  0x0000000000000000 in ?? ()

Thread 1 (LWP 20887):
#0  0x00007effe2411775 in __gf_free () from /lib64/libglusterfs.so.0
#1  0x00007effe23da649 in dict_destroy () from /lib64/libglusterfs.so.0
#2  0x00007effd48288b4 in afr_local_cleanup () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#3  0x00007effd4802ab4 in afr_transaction_done () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
---Type <return> to continue, or q <return> to quit---
#4  0x00007effd480919a in afr_unlock () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#5  0x00007effd4800819 in afr_changelog_post_op_done () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#6  0x00007effd480362c in afr_changelog_post_op_now () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#7  0x00007effd4804f1b in afr_transaction_start () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#8  0x00007effd480537a in afr_transaction () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#9  0x00007effd47fd562 in afr_fsync () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#10 0x00007effd45971b8 in dht_fsync () from /usr/lib64/glusterfs/5.1/xlator/cluster/distribute.so
#11 0x00007effd42fd093 in wb_fsync_helper () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so
#12 0x00007effe240e1b5 in call_resume_keep_stub () from /lib64/libglusterfs.so.0
#13 0x00007effd43038b9 in wb_do_winds () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so
#14 0x00007effd43039cb in wb_process_queue () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so
#15 0x00007effd4303b5f in wb_fulfill_cbk () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so
#16 0x00007effd45855f9 in dht_writev_cbk () from /usr/lib64/glusterfs/5.1/xlator/cluster/distribute.so
#17 0x00007effd47f020e in afr_writev_unwind () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#18 0x00007effd47f07be in afr_writev_wind_cbk () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so
#19 0x00007effd4abdbc5 in client4_0_writev_cbk () from /usr/lib64/glusterfs/5.1/xlator/protocol/client.so
#20 0x00007effe21b2c70 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#21 0x00007effe21b3043 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#22 0x00007effe21aef23 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#23 0x00007effd6da937b in socket_event_handler () from /usr/lib64/glusterfs/5.1/rpc-transport/socket.so
#24 0x00007effe244a5f9 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#25 0x00007effe1249dc5 in start_thread (arg=0x7effd54f9700) at pthread_create.c:308
#26 0x00007effe0b1776d in putspent (p=0x0, stream=0x7effd54f9700) at putspent.c:60
#27 0x0000000000000000 in ?? ()

Comment 8 Amgad 2019-01-14 02:28:09 UTC
Any update on a resolution?
Is there any fix included in 5.3, or in a 5.1.x release?

Comment 9 Guillaume Pavese 2019-01-15 10:43:33 UTC
Similar problem on a newly provisioned oVirt 4.3 cluster (CentOS 7.6, gluster 5.2-1):


[2019-01-15 09:32:02.558598] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.2 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=ps-inf-int-kvm-fr-306-210.hostics.fr --volfile-server=10.199.211.7 --volfile-server=10.199.211.5 --volfile-id=/vmstore /rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore)
[2019-01-15 09:32:02.566701] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-15 09:32:02.581138] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-01-15 09:32:02.581272] I [MSGID: 114020] [client.c:2354:notify] 0-vmstore-client-0: parent translators are ready, attempting connect on transport
[2019-01-15 09:32:02.583283] I [MSGID: 114020] [client.c:2354:notify] 0-vmstore-client-1: parent translators are ready, attempting connect on transport
[2019-01-15 09:32:02.583911] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-vmstore-client-0: changing port to 49155 (from 0)
[2019-01-15 09:32:02.585505] I [MSGID: 114020] [client.c:2354:notify] 0-vmstore-client-2: parent translators are ready, attempting connect on transport
[2019-01-15 09:32:02.587413] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-15 09:32:02.587441] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-vmstore-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-01-15 09:32:02.587951] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-15 09:32:02.588685] I [MSGID: 114046] [client-handshake.c:1107:client_setvolume_cbk] 0-vmstore-client-0: Connected to vmstore-client-0, attached to remote volume '/gluster_bricks/vmstore/vmstore'.
[2019-01-15 09:32:02.588708] I [MSGID: 108005] [afr-common.c:5237:__afr_handle_child_up_event] 0-vmstore-replicate-0: Subvolume 'vmstore-client-0' came back up; going online.
Final graph:
+------------------------------------------------------------------------------+
  1: volume vmstore-client-0
  2:     type protocol/client
  3:     option opversion 50000
  4:     option clnt-lk-version 1
  5:     option volfile-checksum 0
  6:     option volfile-key /vmstore
  7:     option client-version 5.2
  8:     option process-name fuse
  9:     option process-uuid CTX_ID:e5dad97f-5289-4464-9e2f-36e9bb115118-GRAPH_ID:0-PID:39987-HOST:ps-inf-int-kvm-fr-307-210.hostics.fr-PC_NAME:vmstore-client-0-RECON_NO:-0
 10:     option fops-version 1298437
 11:     option ping-timeout 30
 12:     option remote-host 10.199.211.6
 13:     option remote-subvolume /gluster_bricks/vmstore/vmstore
 14:     option transport-type socket
 15:     option transport.address-family inet
 16:     option filter-O_DIRECT off
 17:     option transport.tcp-user-timeout 0
 18:     option transport.socket.keepalive-time 20
 19:     option transport.socket.keepalive-interval 2
 20:     option transport.socket.keepalive-count 9
 21:     option send-gids true
 22: end-volume
 23:  
 24: volume vmstore-client-1
 25:     type protocol/client
 26:     option ping-timeout 30
 27:     option remote-host 10.199.211.7
 28:     option remote-subvolume /gluster_bricks/vmstore/vmstore
 29:     option transport-type socket
 30:     option transport.address-family inet
 31:     option filter-O_DIRECT off
 32:     option transport.tcp-user-timeout 0
 33:     option transport.socket.keepalive-time 20
 34:     option transport.socket.keepalive-interval 2
 35:     option transport.socket.keepalive-count 9
 36:     option send-gids true
 37: end-volume
 38:  
 39: volume vmstore-client-2
 40:     type protocol/client
 41:     option ping-timeout 30
 42:     option remote-host 10.199.211.5
 43:     option remote-subvolume /gluster_bricks/vmstore/vmstore
 44:     option transport-type socket
 45:     option transport.address-family inet
 46:     option filter-O_DIRECT off
 47:     option transport.tcp-user-timeout 0
 48:     option transport.socket.keepalive-time 20
 49:     option transport.socket.keepalive-interval 2
 50:     option transport.socket.keepalive-count 9
 51:     option send-gids true
 52: end-volume
 53:  
 54: volume vmstore-replicate-0
 55:     type cluster/replicate
 56:     option afr-pending-xattr vmstore-client-0,vmstore-client-1,vmstore-client-2
 57:     option arbiter-count 1
 58:     option data-self-heal-algorithm full
 59:     option eager-lock enable
 60:     option quorum-type auto
 61:     option choose-local off
 62:     option shd-max-threads 8
 63:     option shd-wait-qlength 10000
 64:     option locking-scheme granular
 65:     option granular-entry-heal enable
 66:     option use-compound-fops off
 67:     subvolumes vmstore-client-0 vmstore-client-1 vmstore-client-2
 68: end-volume
 69:  
 70: volume vmstore-dht
 71:     type cluster/distribute
 72:     option lock-migration off
 73:     option force-migration off
 74:     subvolumes vmstore-replicate-0
 75: end-volume
 76:  
 77: volume vmstore-shard
 78:     type features/shard
 79:     subvolumes vmstore-dht
 80: end-volume
 81:  
 82: volume vmstore-write-behind
 83:     type performance/write-behind
 84:     option strict-O_DIRECT on
 85:     subvolumes vmstore-shard
 86: end-volume
 87:  
 88: volume vmstore-readdir-ahead
 89:     type performance/readdir-ahead
 90:     option parallel-readdir off
 91:     option rda-request-size 131072
 92:     option rda-cache-limit 10MB
 93:     subvolumes vmstore-write-behind
 94: end-volume
 95:  
 96: volume vmstore-open-behind
 97:     type performance/open-behind
 98:     subvolumes vmstore-readdir-ahead
 99: end-volume
100:  
101: volume vmstore-md-cache
102:     type performance/md-cache
103:     subvolumes vmstore-open-behind
104: end-volume
105:  
106: volume vmstore
107:     type debug/io-stats
108:     option log-level INFO
109:     option latency-measurement off
110:     option count-fop-hits off
111:     subvolumes vmstore-md-cache
112: end-volume
113:  
114: volume meta-autoload
115:     type meta
116:     subvolumes vmstore
117: end-volume
118:  
+------------------------------------------------------------------------------+
[2019-01-15 09:32:02.590376] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-vmstore-client-2: changing port to 49155 (from 0)
[2019-01-15 09:32:02.592649] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-15 09:32:02.593512] I [MSGID: 114046] [client-handshake.c:1107:client_setvolume_cbk] 0-vmstore-client-2: Connected to vmstore-client-2, attached to remote volume '/gluster_bricks/vmstore/vmstore'.
[2019-01-15 09:32:02.593528] I [MSGID: 108002] [afr-common.c:5588:afr_notify] 0-vmstore-replicate-0: Client-quorum is met
[2019-01-15 09:32:02.594714] I [fuse-bridge.c:4259:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-01-15 09:32:02.594746] I [fuse-bridge.c:4870:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-01-15 09:32:06.562678] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-15 09:32:09.435695] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.2/xlator/performance/open-behind.so(+0x3d7c) [0x7f5c279cfd7c] -->/usr/lib64/glusterfs/5.2/xlator/performance/open-behind.so(+0x3bd6) [0x7f5c279cfbd6] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7f5c340ae20d] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 7 times between [2019-01-15 09:32:06.562678] and [2019-01-15 09:32:27.578753]
[2019-01-15 09:32:29.966249] W [glusterfsd.c:1481:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f5c32f1ddd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55af1f5bad45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55af1f5babbb] ) 0-: received signum (15), shutting down
[2019-01-15 09:32:29.966265] I [fuse-bridge.c:5897:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore'.
[2019-01-15 09:32:29.985157] I [fuse-bridge.c:5134:fuse_thread_proc] 0-fuse: initating unmount of /rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore
[2019-01-15 09:32:29.985434] I [fuse-bridge.c:5902:fini] 0-fuse: Closing fuse connection to '/rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore'.

Comment 10 Amgad 2019-01-18 05:28:52 UTC
Per the 5.2 release notes:

     NOTE: Next minor release tentative date: Week of 10th January, 2019

This issue is urgent and is impacting customer deployments. Any projection on 5.3 availability and whether a fix will be included?

Comment 11 Emerson Gomes 2019-01-20 08:39:44 UTC
Still happening in 5.3.

Comment 12 Amgad 2019-01-21 01:52:54 UTC
(In reply to waza123 from comment #6)
> I downgraded to 3.12.15 because 5.1 is not stable at all (clean install).
> 
> Downgrade documentation for anyone who needs it:
> 
> Back up your cluster data somewhere.
> 
> Remove all installation files:
> 
> gluster volume stop hadoop_volume
> gluster volume delete hadoop_volume
> killall glusterfs glusterfsd glusterd glustereventsd python
> 
> # remove all files from the bricks:
> rm -rf /hadoop/* && rm -rf /hadoop/.glusterfs
> 
> # remove all configs:
> rm -rf /usr/var/lib/glusterd && rm -rf /usr/var/log/glusterfs && rm -rf /usr/var/run/gluster && rm -rf /usr/etc/glusterfs
> 
> # install the older gluster version, mount, and copy all files back to the new cluster from the backup.

3.12.13 has a memory leak in "readdir-ahead.c". I saw that it was fixed in 5.3; is it also fixed in 3.12.15?

Comment 13 Amgad 2019-01-21 01:54:37 UTC
(In reply to Emerson Gomes from comment #11)
> Still happening in 5.3.

Is anybody looking at this in 5.3? This is a long-awaited release!

Comment 14 Emerson Gomes 2019-01-21 06:56:22 UTC
(In reply to Amgad from comment #13)
> (In reply to Emerson Gomes from comment #11)
> > Still happening in 5.3.
> 
> Is anybody looking at it in 5.3? This is a release waited for!!!

Yes, I updated to 5.3 yesterday, and the issue is still there.

Comment 15 David E. Smith 2019-01-29 15:08:37 UTC
I'm having what appears to be the same issue. It started when I upgraded from 3.12 to 5.2 a few weeks back, and the subsequent upgrade to 5.3 did not resolve the problem.

My servers (two, in a 'replica 2' setup) publish two volumes. One is Web site content, about 110GB; the other is Web config files, only a few megabytes. (Wasn't worth building extra servers for that second volume.) FUSE clients have been crashing on the larger volume every three or four days.

The client's logs show many hundreds of instances of this (I don't know if it's related):
[2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) [0x7fa171ead384] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) [0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] ) 0-dict: dict is NULL [Invalid argument]

Then, when the client's glusterfs process crashes, this is logged:

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 871 times between [2019-01-29 08:12:48.390535] and [2019-01-29 08:14:17.100279]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-01-29 08:14:17
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84]
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280]
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d]
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2]
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead]
---------



Info on the volumes themselves, gathered from one of my servers:

[davidsmith@wuit-s-10889 ~]$ sudo gluster volume info all

Volume Name: web-config
Type: Replicate
Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-config
Brick2: 172.23.128.27:/data/web-config
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
cluster.min-free-disk: 1
cluster.quorum-count: 2
cluster.quorum-type: fixed
network.ping-timeout: 10
auth.allow: *
performance.readdir-ahead: on

Volume Name: web-content
Type: Replicate
Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-content
Brick2: 172.23.128.27:/data/web-content
Options Reconfigured:
network.ping-timeout: 10
cluster.quorum-type: fixed
cluster.quorum-count: 2
performance.readdir-ahead: on
auth.allow: *
cluster.min-free-disk: 1
client.event-threads: 4
server.event-threads: 4
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
performance.cache-size: 4GB



gluster> volume status all detail
Status of volume: web-config
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5612
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5540
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277

Status of volume: web-content
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5649
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5567
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277


I have a couple of core files that appear to be from this, but I'm not much of a developer (I haven't touched C in fifteen years), so I don't know what to do with them that would be of value in this case.
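
One hedged way to get something useful out of those core files (a sketch; it assumes the core was dumped by the FUSE client binary /usr/sbin/glusterfs and that the matching glusterfs debuginfo packages are installed; the core path below is a placeholder):

# Sketch: extract backtraces from a core dump in batch mode
gdb --batch \
    -ex "set pagination off" \
    -ex "bt full" \
    -ex "thread apply all bt" \
    /usr/sbin/glusterfs /path/to/core > glusterfs-backtrace.txt 2>&1

Attaching the "thread apply all bt" output (like the one in comment #7) is usually what developers need for crashes like these.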

Comment 16 Digiteyes 2019-01-30 16:03:47 UTC
I have the same issue, and my server crashes 4-5 times per day. We need an urgent bug fix; we can't work any more.

Comment 17 Digiteyes 2019-01-30 16:07:45 UTC
[2019-01-30 15:50:39.219564] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8853076771410540308.tmp (ba250583-e103-473e-92de-3e0d87afe8be) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0086.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:50:44.206312] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:50:44.350266] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-6758755102184008102.tmp (32dbb8cb-aec9-4bae-992b-fbd86cd50828) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0017.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:50:45.489090] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-6687721062137662117.tmp (62bbb010-16ff-462c-b0dd-718b0e62a8c7) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0018.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:50:45.551349] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 5 times between [2019-01-30 15:50:45.551349] and [2019-01-30 15:50:56.559333]
[2019-01-30 15:51:02.317536] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (15ba641a-cb3f-42d2-b9b5-b17f10e027c8) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:51:07.031853] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0086.exr (ba250583-e103-473e-92de-3e0d87afe8be) (hash=mothervolume-client-0/cache=mothervolume-client-1) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #2 of Seq_A_A_Sh010_comp_SH030208_v001.0086.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:51:07.109087] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8853076771410540308.tmp (d514f600-f3e6-4639-822a-05e057e1d83c) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0086.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:51:07.620516] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8568940611225156368.tmp (f7efca88-3886-4750-ad2f-4f793fd8487d) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0082.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:51:12.458961] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8711008691317848338.tmp (92ddf1a3-50a9-48b1-be22-7bfa359b5b65) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:51:15.629779] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 53 times between [2019-01-30 15:51:15.629779] and [2019-01-30 15:51:45.695496]
[2019-01-30 15:51:45.700709] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (f853a226-70fb-4537-a629-e1e2cefdcfe7) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:51:47.398973] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:51:47.588670] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:51:51.885883] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8568940611225156368.tmp (8f5bcab2-e3ba-478a-957c-fdf243216a4e) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0082.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:51:53.453191] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 13 times between [2019-01-30 15:51:53.453191] and [2019-01-30 15:51:56.196530]
[2019-01-30 15:51:56.510824] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8711008691317848338.tmp (45aaff0e-15b5-4b6b-8b41-c4caed57f881) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:51:57.207664] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 81 times between [2019-01-30 15:51:57.207664] and [2019-01-30 15:52:19.002777]
[2019-01-30 15:52:19.183448] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr (504238f8-7918-496e-835e-3246d16cf35e) (hash=mothervolume-client-1/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:52:19.257335] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1446751043051559505.tmp (3dabe7b2-9682-4b33-842e-b144533e97d4) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:52:19.574477] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 4 times between [2019-01-30 15:52:19.574477] and [2019-01-30 15:52:24.127146]
[2019-01-30 15:52:24.656623] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosavet (9ef65443-4745-448d-a8b4-fa3f3bbf7487) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosave ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:52:24.899131] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:52:27.431451] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (749b5a12-80a7-47d5-b5d2-2e60cbea57aa) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:52:30.891799] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:52:31.047076] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:52:32.939577] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8568940611225156368.tmp (952bcc2f-a486-4d61-ade5-329e1a6165a8) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0082.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:52:37.606502] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8711008691317848338.tmp (83ebc0c0-ba5c-4fdc-808e-f343e1ae28e2) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:52:43.967857] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:52:55.087185] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosavet (4f7b2158-c4b7-4579-8e8a-7ab3dc8d9b0d) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosave ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:17.204114] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2765564699243796980.tmp (e72fb096-c4ef-490a-a39c-47531608dd63) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0080.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:17.396151] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:53:22.458305] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2694530659197450995.tmp (e02e5460-3557-470b-a7f3-109a9914c692) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0081.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:26.229226] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2623496619151105010.tmp (2fa4c400-77a0-459e-b734-8e4f28926859) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0082.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:32.149207] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2552462579104759025.tmp (cb9b9130-fc7e-4c04-89e9-7b925c955669) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0083.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 18 times between [2019-01-30 15:53:17.396151] and [2019-01-30 15:53:34.167757]
[2019-01-30 15:53:37.062257] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0098.exr (f20e23cd-76ff-4371-a31b-0a9cf9022860) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #1 of Seq_A_A_Sh010_comp_SH030208_v001.0098.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:37.149778] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1730887203236943445.tmp (f4b57fa7-4de1-4de7-bc06-00e2741c5129) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0098.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:37.306807] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2481428539058413040.tmp (27286f68-e368-48c1-b497-0300cc3af4c7) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:38.961986] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr (3dabe7b2-9682-4b33-842e-b144533e97d4) (hash=mothervolume-client-1/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #1 of Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:39.053762] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1446751043051559505.tmp (3055dc73-8b71-4437-9dab-8219f7ea6189) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:43.220690] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2410394499012067055.tmp (41462049-f2d4-4638-b137-c7921219820b) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0085.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:44.188358] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 4 times between [2019-01-30 15:53:44.188358] and [2019-01-30 15:53:45.698529]
[2019-01-30 15:53:47.773401] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (def787fb-4c77-4049-9629-15db1b4acd36) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:48.345901] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2339360458965721070.tmp (53f63fd7-904e-412d-a789-c2170735a61f) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0086.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
[2019-01-30 15:53:49.291189] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:53:49.450504] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:53:53.495085] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2268326418919375085.tmp (5a339407-94ab-456e-9670-9488c43e5a9e) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0087.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:54.919809] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:53:56.335023] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-30 15:53:58.191979] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2197292378873029100.tmp (12a7836b-4e9d-46db-9f03-acd30edee2f1) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0088.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:53:58.920443] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 7 times between [2019-01-30 15:53:58.920443] and [2019-01-30 15:54:00.336410]
[2019-01-30 15:54:00.519418] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr (3055dc73-8b71-4437-9dab-8219f7ea6189) (hash=mothervolume-client-1/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #2 of Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) 
[2019-01-30 15:54:00.601804] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1446751043051559505.tmp (1ed49b72-c1d5-4704-9ebf-00814afdcb43) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) 
pending frames:
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 
2019-01-30 15:54:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7f30187de610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f30187e8b84]
/lib64/libc.so.6(+0x36280)[0x7f3016e42280]
/lib64/libc.so.6(gsignal+0x37)[0x7f3016e42207]
/lib64/libc.so.6(abort+0x148)[0x7f3016e438f8]
/lib64/libc.so.6(+0x78d27)[0x7f3016e84d27]
/lib64/libc.so.6(+0x81489)[0x7f3016e8d489]
/lib64/libglusterfs.so.0(+0x1a6e9)[0x7f30187d26e9]
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x8cf9)[0x7f300a9a7cf9]
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x4ab90)[0x7f300a9e9b90]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x616d2)[0x7f300acb86d2]
/lib64/libgfrpc.so.0(+0xec70)[0x7f30185aac70]
/lib64/libgfrpc.so.0(+0xf043)[0x7f30185ab043]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f30185a6f23]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7f300d19337b]
/lib64/libglusterfs.so.0(+0x8aa49)[0x7f3018842a49]
/lib64/libpthread.so.0(+0x7dd5)[0x7f3017641dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f3016f09ead]

Comment 18 Digiteyes 2019-01-30 16:08:55 UTC
Created attachment 1525090 [details]
Mount Log

Comment 19 tavis.paquette 2019-01-30 18:20:32 UTC
I'm also experiencing this issue; it began after an upgrade to 5.1 and has continued to occur through upgrades to 5.3.

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 447 times between [2019-01-30 18:13:29.742333] and [2019-01-30 18:15:27.890656]
[2019-01-30 18:15:34.980908] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 27 times between [2019-01-30 18:15:34.980908] and [2019-01-30 18:17:23.626256]
[2019-01-30 18:17:31.085125] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 31 times between [2019-01-30 18:17:31.085125] and [2019-01-30 18:19:27.231000]
[2019-01-30 18:19:38.782441] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

Comment 20 Artem Russakovskii 2019-01-30 20:41:23 UTC
I got a ton of these in my logs after upgrading from 4.1 to 5.3, in addition to a lot of the repeated messages reported in https://bugzilla.redhat.com/show_bug.cgi?id=1313567.



==> mnt-SITE_data1.log <==
[2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

==> mnt-SITE_data3.log <==
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]

==> mnt-SITE_data1.log <==
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]
[2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0

==> mnt-SITE_data3.log <==
[2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0

==> mnt-SITE_data1.log <==
[2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler

Comment 21 tavis.paquette 2019-01-30 21:15:46 UTC
I've seen this issue in about 20 different environments (large and small, all of which were upgraded from 3.x)

Comment 22 Digiteyes 2019-01-31 09:44:41 UTC
We have not upgraded from 3.x; we have a fresh install of 5.x and see the same issue.

Comment 23 Nithya Balachandran 2019-01-31 09:52:00 UTC
Corrected the version and assigned this to Milind to backport the relevant patches to release-5. As per an email discussion, he confirmed that the following patches are required to fix the flood of "Failed to dispatch handler" logs.

https://review.gluster.org/#/c/glusterfs/+/22044
https://review.gluster.org/#/c/glusterfs/+/22046/
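
Once these land in a release-5 update, the installed client build can be checked quickly (a minimal sketch; exact package names vary by distro):

# report the running client version
glusterfs --version
# on RPM-based systems, confirm which build is installed
rpm -q glusterfs glusterfs-fuse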

Comment 24 Nithya Balachandran 2019-01-31 09:55:22 UTC
(In reply to David E. Smith from comment #15)
> I'm having what appears to be the same issue. Started when I upgraded from
> 3.12 to 5.2 a few weeks back, and the subsequent upgrade to 5.3 did not
> resolve the problem.
> 
> My servers (two, in a 'replica 2' setup) publish two volumes. One is Web
> site content, about 110GB; the other is Web config files, only a few
> megabytes. (Wasn't worth building extra servers for that second volume.)
> FUSE clients have been crashing on the larger volume every three or four
> days.
> 
> The client's logs show many hundreds of instances of this (I don't know if
> it's related):
> [2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384)
> [0x7fa171ead384]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e)
> [0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad]
> ) 0-dict: dict is NULL [Invalid argument]
> 
> Then, when the client's glusterfs process crashes, this is logged:
> 
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 871 times between [2019-01-29 08:12:48.390535] and
> [2019-01-29 08:14:17.100279]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 11
> time of crash:
> 2019-01-29 08:14:17
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610]
> /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84]
> /lib64/libc.so.6(+0x36280)[0x7fa17f03c280]
> /lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d]
> /lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2]
> /lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
> /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd]
> /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a]
> /lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5]
> /lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead]
> ---------
> 
> 
> 
> Info on the volumes themselves, gathered from one of my servers:
> 
> [davidsmith@wuit-s-10889 ~]$ sudo gluster volume info all
> 
> Volume Name: web-config
> Type: Replicate
> Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 172.23.128.26:/data/web-config
> Brick2: 172.23.128.27:/data/web-config
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> server.event-threads: 4
> client.event-threads: 4
> cluster.min-free-disk: 1
> cluster.quorum-count: 2
> cluster.quorum-type: fixed
> network.ping-timeout: 10
> auth.allow: *
> performance.readdir-ahead: on
> 
> Volume Name: web-content
> Type: Replicate
> Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 172.23.128.26:/data/web-content
> Brick2: 172.23.128.27:/data/web-content
> Options Reconfigured:
> network.ping-timeout: 10
> cluster.quorum-type: fixed
> cluster.quorum-count: 2
> performance.readdir-ahead: on
> auth.allow: *
> cluster.min-free-disk: 1
> client.event-threads: 4
> server.event-threads: 4
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> performance.cache-size: 4GB
> 
> 
> 
> gluster> volume status all detail
> Status of volume: web-config
> -----------------------------------------------------------------------------
> -
> Brick                : Brick 172.23.128.26:/data/web-config
> TCP Port             : 49152
> RDMA Port            : 0
> Online               : Y
> Pid                  : 5612
> File System          : ext3
> Device               : /dev/sdb1
> Mount Options        : rw,seclabel,relatime,data=ordered
> Inode Size           : 256
> Disk Space Free      : 135.9GB
> Total Disk Space     : 246.0GB
> Inode Count          : 16384000
> Free Inodes          : 14962279
> -----------------------------------------------------------------------------
> -
> Brick                : Brick 172.23.128.27:/data/web-config
> TCP Port             : 49152
> RDMA Port            : 0
> Online               : Y
> Pid                  : 5540
> File System          : ext3
> Device               : /dev/sdb1
> Mount Options        : rw,seclabel,relatime,data=ordered
> Inode Size           : 256
> Disk Space Free      : 135.9GB
> Total Disk Space     : 246.0GB
> Inode Count          : 16384000
> Free Inodes          : 14962277
> 
> Status of volume: web-content
> -----------------------------------------------------------------------------
> -
> Brick                : Brick 172.23.128.26:/data/web-content
> TCP Port             : 49153
> RDMA Port            : 0
> Online               : Y
> Pid                  : 5649
> File System          : ext3
> Device               : /dev/sdb1
> Mount Options        : rw,seclabel,relatime,data=ordered
> Inode Size           : 256
> Disk Space Free      : 135.9GB
> Total Disk Space     : 246.0GB
> Inode Count          : 16384000
> Free Inodes          : 14962279
> -----------------------------------------------------------------------------
> -
> Brick                : Brick 172.23.128.27:/data/web-content
> TCP Port             : 49153
> RDMA Port            : 0
> Online               : Y
> Pid                  : 5567
> File System          : ext3
> Device               : /dev/sdb1
> Mount Options        : rw,seclabel,relatime,data=ordered
> Inode Size           : 256
> Disk Space Free      : 135.9GB
> Total Disk Space     : 246.0GB
> Inode Count          : 16384000
> Free Inodes          : 14962277
> 
> 
> I have a couple of core files that appear to be from this, but I'm not much
> of a developer (haven't touched C in fifteen years) so I don't know what to
> do with them that would be of value in this case.

Please file a separate BZ for the crashes and provide the bt and corefiles.

Comment 25 Worker Ant 2019-01-31 11:29:46 UTC
REVIEW: https://review.gluster.org/22134 (socket: fix issue when socket write return with EAGAIN) posted (#1) for review on release-5 by Milind Changire

Comment 26 Worker Ant 2019-01-31 11:31:03 UTC
REVIEW: https://review.gluster.org/22135 (socket: don't pass return value from protocol handler to event handler) posted (#1) for review on release-5 by Milind Changire

Comment 27 Artem Russakovskii 2019-01-31 18:08:42 UTC
I wish I had seen this bug report before I updated from the rock-solid 4.1 to 5.3. Less than 24 hours after upgrading, I already got a crash and had to unmount, kill gluster, and remount:


[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
---------

Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0, installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, and I'm not too sure how to make it produce a core dump.
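
One way to get a core from the fuse client is the classic ulimit/core_pattern route (a minimal sketch, assuming the mount is started from a shell where the limit applies; on distros that ship systemd-coredump, coredumpctl may already be collecting the cores):

# allow cores of unlimited size in the current shell, then remount the volume from this shell
ulimit -c unlimited
# write cores to a known location (assumption: /var/tmp has enough space)
sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p
# on systemd-coredump distros, check for already-captured cores instead:
coredumpctl list glusterfs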

If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing.

Thanks.

Comment 28 David E. Smith 2019-01-31 22:15:29 UTC
As requested, I opened a new bug report for my crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1671556. Links to cores will be added there Really Soon.

Comment 29 Artem Russakovskii 2019-02-02 20:16:52 UTC
The fuse crash happened again yesterday, on another volume. Are there any mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.
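
A minimal sketch of that reproduction and recovery, assuming the mount point from the monit check below and a matching fstab entry:

# find the FUSE client process for the affected mount
pgrep -af 'glusterfs --process-name fuse'
# simulate the crash (destructive; use on test mounts only)
kill -9 <pid from above>
# recover: lazy-unmount the dead mount point, then remount it
umount -l /mnt/glusterfs_data1
mount /mnt/glusterfs_data1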


monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program  = "/bin/mount  /mnt/glusterfs_data1"
  stop program  = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert


stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

Comment 30 Milind Changire 2019-02-03 03:07:11 UTC
The following line in the backtrace, which is the topmost frame pointing to gluster bits:

/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]

resolves to:

afr-common.c:2203
    intersection = alloca0(priv->child_count);                                                                                                    


-----
NOTE:
print-backtrace.sh isn't helping here because the naming convention of the RPMs has changed
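
For reference, offsets like the replicate.so(+0x5dc9d) frame can usually be resolved by hand with addr2line against the exact installed binary, assuming the matching debuginfo is installed (a sketch; package names vary by distro):

# install debug symbols for the exact glusterfs build first, e.g.
#   debuginfo-install glusterfs         (RHEL/CentOS)
#   zypper install glusterfs-debuginfo  (openSUSE; package name is an assumption)
# then resolve the offset from the backtrace line:
addr2line -f -C -e /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so 0x5dc9d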

Comment 31 Worker Ant 2019-02-04 14:48:49 UTC
REVIEW: https://review.gluster.org/22135 (socket: don't pass return value from protocol handler to event handler) merged (#2) on release-5 by Shyamsundar Ranganathan

Comment 32 Worker Ant 2019-02-04 14:50:30 UTC
REVIEW: https://review.gluster.org/22134 (socket: fix issue when socket write return with EAGAIN) merged (#2) on release-5 by Shyamsundar Ranganathan

Comment 33 James 2019-02-14 14:31:08 UTC
I'm also having problems with Gluster bricks going offline since upgrading to oVirt 4.3 yesterday (previously I never had a single issue with gluster, nor had a brick ever gone down). I suspect this will continue to happen daily, as some other users in this group have suggested. I was able to pull some logs from engine and gluster from around the time the brick dropped. My setup is 3-node HCI, and I was previously running the latest 4.2 updates (before upgrading to 4.3). My hardware has a lot of overhead and I'm on a 10GbE gluster backend (the servers were certainly not under any significant load when the brick went offline). To recover I had to place the host in maintenance mode and reboot (although I suspect I could have simply unmounted and remounted the gluster mounts).
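
For what it's worth, before resorting to a reboot, brick state can be checked and a downed brick process restarted from the gluster CLI (a sketch; the volume name below is just one of the affected volumes):

# check which brick processes are online for the affected volume
gluster volume status non_prod_b
# restart only the brick processes that are down, leaving running bricks alone
gluster volume start non_prod_b force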

grep "2019-02-14" engine.log-20190214 | grep "GLUSTER_BRICK_STATUS_CHANGED"
2019-02-14 02:41:48,018-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from UP to DOWN via cli.
2019-02-14 03:20:11,189-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/engine/engine of volume engine of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:14,819-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/prod_b/prod_b of volume prod_b of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:19,692-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/isos/isos of volume isos of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:25,022-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/prod_a/prod_a of volume prod_a of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:29,088-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:34,099-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_a/non_prod_a of volume non_prod_a of cluster Default from DOWN to UP via cli

glusterd.log

# grep -B20 -A20 "2019-02-14 02:41" glusterd.log
[2019-02-14 02:36:49.585034] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:36:49.597788] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:36:49.597788] and [2019-02-14 02:36:49.900505]
[2019-02-14 02:36:53.437539] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:36:53.452816] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:36:53.864153] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:36:53.875835] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:36:30.958649] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:36:35.322129] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:36:39.639645] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:36:45.301275] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:36:53.875835] and [2019-02-14 02:36:54.180780]
[2019-02-14 02:37:59.193409] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:38:44.065560] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:38:44.072680] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:38:44.077841] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:38:44.082798] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:38:44.088237] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:38:44.093518] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:37:59.193409] and [2019-02-14 02:38:44.100494]
[2019-02-14 02:41:58.649683] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 6 times between [2019-02-14 02:41:58.649683] and [2019-02-14 02:43:00.286999]
[2019-02-14 02:43:46.366743] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:43:46.373587] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:43:46.378997] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:43:46.384324] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:43:46.390310] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:43:46.397031] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:43:46.404083] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:45:47.302884] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:45:47.309697] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:45:47.315149] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a
[2019-02-14 02:45:47.320806] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:45:47.326865] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:45:47.332192] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:45:47.338991] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-14 02:46:47.789575] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b
[2019-02-14 02:46:47.795276] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a
[2019-02-14 02:46:47.800584] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b
[2019-02-14 02:46:47.770601] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine
[2019-02-14 02:46:47.778161] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos
[2019-02-14 02:46:47.784020] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a

engine.log

# grep -B20 -A20 "2019-02-14 02:41:48" engine.log-20190214
2019-02-14 02:41:43,495-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 172c9ee8
2019-02-14 02:41:43,609-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@479fcb69, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6443e68f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2b4cf035, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5864f06a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6119ac8c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1a9549be, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5614cf81, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@290c9289, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5dd26e8, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@35355754, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@452deeb4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8f8b442, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@647e29d3, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7bee4dff, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@511c4478, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c0bb0bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@92e325e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@260731, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@33aaacc9, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@72657c59, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@aa10c89], log id: 172c9ee8
2019-02-14 02:41:43,610-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 3a0e9d63
2019-02-14 02:41:43,703-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@5ca4a20f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@57a8a76, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@7bd1b14], log id: 3a0e9d63
2019-02-14 02:41:43,704-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 49966b05
2019-02-14 02:41:44,213-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 49966b05
2019-02-14 02:41:44,214-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 30db0ce2
2019-02-14 02:41:44,311-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@61a309b5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@ea9cb2e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@749d57bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c49f9d0, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@655eb54d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@256ee273, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3bd079dc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6804900f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@78e0a49f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2acfbc8a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12e92e96, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5ea1502c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2398c33b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7464102e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2f221daa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7b561852, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1eb29d18, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4a030b80, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@75739027, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3eac8253, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@34fc82c3], log id: 30db0ce2
2019-02-14 02:41:44,312-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 6671d0d7
2019-02-14 02:41:44,329-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:44,345-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:44,374-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:44,405-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@f6a9696, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@558e3332, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@5b449da], log id: 6671d0d7
2019-02-14 02:41:44,406-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 6d2bc6d3
2019-02-14 02:41:44,908-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 6d2bc6d3
2019-02-14 02:41:44,909-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVolumeAdvancedDetailsVDSCommand(HostName = Host0, GlusterVolumeAdvancedDetailsVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5', volumeName='non_prod_b'}), log id: 36ae23c6
2019-02-14 02:41:47,336-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:47,351-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:47,379-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:47,979-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVolumeAdvancedDetailsVDSCommand, return: org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeAdvancedDetails@7a4a787b, log id: 36ae23c6
2019-02-14 02:41:48,018-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from UP to DOWN via cli.
2019-02-14 02:41:48,046-04 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_DOWN(4,151), Status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b on cluster Default is down.
2019-02-14 02:41:48,139-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler1) [5ff5b093] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:48,140-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] START, GlusterServersListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: e1fb23
2019-02-14 02:41:48,911-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] FINISH, GlusterServersListVDSCommand, return: [10.12.0.220/24:CONNECTED, host1.replaced.domain.com:CONNECTED, host2.replaced.domain.com:CONNECTED], log id: e1fb23
2019-02-14 02:41:48,930-04 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler1) [5ff5b093] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}'
2019-02-14 02:41:48,931-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] START, GlusterVolumesListVDSCommand(HostName = Host0, GlusterVolumesListVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 68f1aecc
2019-02-14 02:41:49,366-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] FINISH, GlusterVolumesListVDSCommand, return: {6c05dfc6-4dc0-41e3-a12f-55b4767f1d35=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@1952a85, 3f8f6a0f-aed4-48e3-9129-18a2a3f64eef=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@2f6688ae, 71ff56d9-79b8-445d-b637-72ffc974f109=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@730210fb, 752a9438-cd11-426c-b384-bc3c5f86ed07=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@c3be510c, c3e7447e-8514-4e4a-9ff5-a648fe6aa537=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@450befac, 79e8e93c-57c8-4541-a360-726cec3790cf=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@1926e392}, log id: 68f1aecc
2019-02-14 02:41:49,489-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 38debe74
2019-02-14 02:41:49,581-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5e5a7925, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2cdf5c9e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@443cb62, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@49a3e880, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@443d23c0, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1250bc75, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8d27d86, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5e6363f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@73ed78db, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@64c9d1c7, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7fecbe95, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3a551e5f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2266926e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@88b380c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1209279e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3c6466, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@16df63ed, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@47456262, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c2b88c3, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7f57c074, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12fa0478], log id: 38debe74
2019-02-14 02:41:49,582-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 7ec02237
2019-02-14 02:41:49,660-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@3eedd0bc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@7f78e375, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@3d63e126], log id: 7ec02237
2019-02-14 02:41:49,661-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 42cdad27
2019-02-14 02:41:50,142-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 42cdad27
2019-02-14 02:41:50,143-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 12f5fdf2
2019-02-14 02:41:50,248-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2aaed792, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8e66930, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@276d599e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1aca2aec, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@46846c60, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7d103269, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@30fc25fc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7baae445, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1ea8603c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@62578afa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@33d58089, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1f71d27a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4205e828, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c5bbac8, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@395a002, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12664008, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7f4faec4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3e03d61f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1038e46d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@307e8062, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@32453127], log id: 12f5fdf2
2019-02-14 02:41:50,249-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 1256aa5e
2019-02-14 02:41:50,338-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@459a2ff5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@123cab4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@1af41fbe], log id: 1256aa5e
2019-02-14 02:41:50,339-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 3dd752e4
2019-02-14 02:41:50,847-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 3dd752e4
2019-02-14 02:41:50,848-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 29a6272c
2019-02-14 02:41:50,954-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@364f3ec6, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@c7cce5e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@b3bed47, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@13bc244b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5cca81f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@36aeba0d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@62ab384a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1047d628, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@188a30f5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5bb79f3b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@60e5956f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4e3df9cd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7796567, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@60d06cf4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2cd2d36c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@d80a4aa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@411eaa20, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@22cac93b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@18b927bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@101465f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@246f927c], log id: 29a6272c
2019-02-14 02:41:50,955-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 501814db
2019-02-14 02:41:51,044-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@1cd55aa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@32c5aba2, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@6ae123f4], log id: 501814db
2019-02-14 02:41:51,045-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 7acf4cbf
2019-02-14 02:41:51,546-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 7acf4cbf
2019-02-14 02:41:51,547-04 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVolumeAdvancedDetailsVDSCommand(HostName = Host0, GlusterVolumeAdvancedDetailsVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5', volumeName='non_prod_a'}), log id: 11c42649

Comment 34 Worker Ant 2019-02-15 07:46:34 UTC
REVIEW: https://review.gluster.org/22221 (socket: socket event handlers now return void) posted (#1) for review on master by Milind Changire

Comment 35 Emerson Gomes 2019-02-16 09:57:51 UTC
Please find below the GDB output from the crash.


  Id   Target Id         Frame
  12   Thread 0x7fea4ae43700 (LWP 26597) 0x00007fea530e2361 in sigwait () from /lib64/libpthread.so.0
  11   Thread 0x7fea54773780 (LWP 26595) 0x00007fea530dbf47 in pthread_join () from /lib64/libpthread.so.0
  10   Thread 0x7fea47392700 (LWP 26601) 0x00007fea530e14ed in __lll_lock_wait () from /lib64/libpthread.so.0
  9    Thread 0x7fea3f7fe700 (LWP 26604) 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6
  8    Thread 0x7fea3ffff700 (LWP 26603) 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6
  7    Thread 0x7fea3effd700 (LWP 26605) 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6
  6    Thread 0x7fea3dffb700 (LWP 26615) 0x00007fea530de965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7fea49640700 (LWP 26600) 0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7fea4a642700 (LWP 26598) 0x00007fea52969e2d in nanosleep () from /lib64/libc.so.6
  3    Thread 0x7fea49e41700 (LWP 26599) 0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7fea4b644700 (LWP 26596) 0x00007fea530e1e3d in nanosleep () from /lib64/libpthread.so.0
* 1    Thread 0x7fea3e7fc700 (LWP 26614) 0x00007fea45b62ff1 in ioc_inode_update () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so

Thread 12 (Thread 0x7fea4ae43700 (LWP 26597)):
#0  0x00007fea530e2361 in sigwait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x000055959d410e2b in glusterfs_sigwaiter ()
No symbol table info available.
#2  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 11 (Thread 0x7fea54773780 (LWP 26595)):
#0  0x00007fea530dbf47 in pthread_join () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fea542dadb8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x000055959d40d56b in main ()
No symbol table info available.

Thread 10 (Thread 0x7fea47392700 (LWP 26601)):
#0  0x00007fea530e14ed in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fea530dcdcb in _L_lock_883 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007fea530dcc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea45b62fb6 in ioc_inode_update () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so
No symbol table info available.
#4  0x00007fea45b6314a in ioc_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so
No symbol table info available.
#5  0x00007fea461a0343 in wb_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/performance/write-behind.so
No symbol table info available.
#6  0x00007fea463f2b79 in dht_revalidate_cbk () from /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so
No symbol table info available.
#7  0x00007fea466d09e5 in afr_lookup_done () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so
No symbol table info available.
#8  0x00007fea466d1198 in afr_lookup_metadata_heal_check () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so
No symbol table info available.
#9  0x00007fea466d1cbb in afr_lookup_entry_heal () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so
No symbol table info available.
#10 0x00007fea466d1f99 in afr_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so
No symbol table info available.
#11 0x00007fea4695a6d2 in client4_0_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/protocol/client.so
No symbol table info available.
#12 0x00007fea54043c70 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
No symbol table info available.
#13 0x00007fea54044043 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
No symbol table info available.
#14 0x00007fea5403ff23 in rpc_transport_notify () from /lib64/libgfrpc.so.0
No symbol table info available.
#15 0x00007fea48c2c37b in socket_event_handler () from /usr/lib64/glusterfs/5.3/rpc-transport/socket.so
No symbol table info available.
#16 0x00007fea542dba49 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
No symbol table info available.
#17 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#18 0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 9 (Thread 0x7fea3f7fe700 (LWP 26604)):
#0  0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fea542db790 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7fea3ffff700 (LWP 26603)):
#0  0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fea542db790 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 7 (Thread 0x7fea3effd700 (LWP 26605)):
#0  0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fea542db790 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 6 (Thread 0x7fea3dffb700 (LWP 26615)):
#0  0x00007fea530de965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fea4b64ddbb in notify_kernel_loop () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#2  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 5 (Thread 0x7fea49640700 (LWP 26600)):
#0  0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fea542b6cf8 in syncenv_task () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007fea542b7c40 in syncenv_processor () from /lib64/libglusterfs.so.0
No symbol table info available.
#3  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7fea4a642700 (LWP 26598)):
#0  0x00007fea52969e2d in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fea52969cc4 in sleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fea542a2e7d in pool_sweeper () from /lib64/libglusterfs.so.0
No symbol table info available.
#3  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x7fea49e41700 (LWP 26599)):
#0  0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fea542b6cf8 in syncenv_task () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007fea542b7c40 in syncenv_processor () from /lib64/libglusterfs.so.0
No symbol table info available.
#3  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 0x7fea4b644700 (LWP 26596)):
#0  0x00007fea530e1e3d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fea54285f76 in gf_timer_proc () from /lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fea3e7fc700 (LWP 26614)):
#0  0x00007fea45b62ff1 in ioc_inode_update () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so
No symbol table info available.
#1  0x00007fea45b634cb in ioc_readdirp_cbk () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so
No symbol table info available.
#2  0x00007fea45d7a69f in rda_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/readdir-ahead.so
No symbol table info available.
#3  0x00007fea45b5eb0e in ioc_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so
No symbol table info available.
#4  0x00007fea4594f8e7 in qr_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so
No symbol table info available.
#5  0x00007fea5430bfb1 in default_readdirp () from /lib64/libglusterfs.so.0
No symbol table info available.
#6  0x00007fea455333e6 in mdc_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/md-cache.so
No symbol table info available.
#7  0x00007fea452f7d32 in io_stats_readdirp () from /usr/lib64/glusterfs/5.3/xlator/debug/io-stats.so
No symbol table info available.
#8  0x00007fea5430bfb1 in default_readdirp () from /lib64/libglusterfs.so.0
No symbol table info available.
#9  0x00007fea450dc343 in meta_readdirp () from /usr/lib64/glusterfs/5.3/xlator/meta.so
No symbol table info available.
#10 0x00007fea4b659697 in fuse_readdirp_resume () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#11 0x00007fea4b64cc45 in fuse_resolve_all () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#12 0x00007fea4b64c958 in fuse_resolve () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#13 0x00007fea4b64cc8e in fuse_resolve_all () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#14 0x00007fea4b64bf23 in fuse_resolve_continue () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#15 0x00007fea4b64c8d6 in fuse_resolve () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#16 0x00007fea4b64cc6e in fuse_resolve_all () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#17 0x00007fea4b64ccb0 in fuse_resolve_and_resume () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#18 0x00007fea4b664d7a in fuse_thread_proc () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so
No symbol table info available.
#19 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#20 0x00007fea529a2ead in clone () from /lib64/libc.so.6
No symbol table info available.

Comment 36 Emerson Gomes 2019-02-16 10:02:26 UTC
Core dump: https://drive.google.com/open?id=1cEehuPAdXHIR7eG_-RsbJkmu8lJz80k6

Comment 37 Milind Changire 2019-02-16 10:48:48 UTC
1. crash listing in comment #3 points to the disperse xlator
   /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4)[0x7f0e23aec8a4]

2. crash listing in comment #15 points to inode_forget_with_unref from the fuse xlator
   /lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]

3. crash listing in comment #17 points to the distribute xlator
   /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x8cf9)[0x7f300a9a7cf9]

4. crash listing in comment #27 points to the replicate xlator
   /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]

5. crash listing in comment #29 points to the replicate xlator
   /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
   See comment #30 for a preliminary finding about this crash.

Ravi, could you please take a look at item #5 above?

Comment 38 Worker Ant 2019-02-18 02:46:12 UTC
REVIEW: https://review.gluster.org/22221 (socket: socket event handlers now return void) merged (#4) on master by Amar Tumballi
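
The patch titles above describe the shape of the fix rather than its details. As a rough, self-contained C illustration only (none of these names come from the GlusterFS sources; the real code lives in socket.c and event-epoll.c): the epoll worker used to log "Failed to dispatch handler" whenever an event handler returned a negative value, even for transient conditions such as EAGAIN on a partial socket write. Once the handlers return void and absorb such conditions themselves, the worker has no status left to misreport.

/* Hypothetical sketch -- not the actual GlusterFS code; all names are made up. */
#include <errno.h>
#include <stdio.h>

/* Old contract: the event handler returns an int and the epoll worker logs an
 * error for any negative value, even transient ones such as EAGAIN when a
 * socket write did not complete. */
static int handler_returning_status(int pending_bytes)
{
    if (pending_bytes > 0) {
        errno = EAGAIN;   /* transient: more data still to be written */
        return -1;        /* propagated up to the dispatcher */
    }
    return 0;
}

static void dispatch_old(int pending_bytes)
{
    if (handler_returning_status(pending_bytes) < 0)
        fprintf(stderr, "E [event-epoll] Failed to dispatch handler\n");
}

/* New contract (per the patch titles): the handler returns void and deals with
 * transient conditions itself, e.g. by leaving the request queued until the
 * next writable event, so the dispatcher has nothing to misreport. */
static void handler_returning_void(int pending_bytes)
{
    if (pending_bytes > 0)
        return;           /* wait for the next POLLOUT-style event */
}

static void dispatch_new(int pending_bytes)
{
    handler_returning_void(pending_bytes);   /* no status to log */
}

int main(void)
{
    dispatch_old(512);   /* prints the spurious error message */
    dispatch_new(512);   /* silent: handled internally */
    return 0;
}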

Comment 39 Nithya Balachandran 2019-02-18 02:53:15 UTC
(In reply to Emerson Gomes from comment #35)
> Find below GDB output from crash.
> 


Please use BZ#1671556 to report any FUSE client crashes. These look similar to an issue in the write-behind translator that we are working to fix. Try setting performance.write-behind to off and let us know if you still see the crashes.
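
For reference, the workaround suggested above can be applied with the standard volume-set command; this is a sketch, with VOLNAME standing in for the affected volume:

# Disable the write-behind translator on the volume (VOLNAME is a placeholder):
gluster volume set VOLNAME performance.write-behind off

# Confirm the current value of the option:
gluster volume get VOLNAME performance.write-behind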

Comment 40 Ravishankar N 2019-02-18 03:47:14 UTC
Clearing the need info on me based on comment #39.

Comment 41 Worker Ant 2019-02-20 06:44:40 UTC
REVIEW: https://review.gluster.org/22237 (socket: socket event handlers now return void) posted (#1) for review on release-5 by Milind Changire

Comment 42 Endre Karlson 2019-02-23 21:31:36 UTC
Any news on a patched version for oVirt 4.3? We keep seeing crashes like these too.

Comment 43 Worker Ant 2019-02-25 15:23:43 UTC
REVIEW: https://review.gluster.org/22237 (socket: socket event handlers now return void) merged (#4) on release-5 by Shyamsundar Ranganathan

Comment 44 Shyamsundar 2019-03-27 13:44:02 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-5.5, please open a new bug report.

glusterfs-5.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000119.html
[2] https://www.gluster.org/pipermail/gluster-users/

