I have these errors in the logs, and glusterfs randomly crashes at mem-pool.c line 330 (while freeing memory). Gluster v5.1.

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 465 times between [2018-11-19 13:58:04.457516] and [2018-11-19 14:00:02.137868]
[2018-11-19 14:00:04.392558] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 389 times between [2018-11-19 14:00:04.392558] and [2018-11-19 14:01:58.224754]
[2018-11-19 14:02:04.412850] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 130 times between [2018-11-19 14:02:04.412850] and [2018-11-19 14:03:58.232651]
[2018-11-19 14:04:04.418550] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 110 times between [2018-11-19 14:04:04.418550] and [2018-11-19 14:06:04.468980]
[2018-11-19 14:06:04.627504] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 140 times between [2018-11-19 14:06:04.627504] and [2018-11-19 14:08:04.600312]
[2018-11-19 14:08:07.794521] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 95 times between [2018-11-19 14:08:07.794521] and [2018-11-19 14:10:04.345444]
[2018-11-19 14:10:07.569899] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 37 times between [2018-11-19 14:10:07.569899] and [2018-11-19 14:12:04.636673]
[2018-11-19 14:12:07.198290] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
Also, maybe this helps: I have many errors (about 100) with "No such file or directory". How can these be fixed manually? (One possible approach is sketched below, after the log excerpt.)

[2018-11-19 13:32:08.503733] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:6c0a8545-a83c-499c-a834-959b554e0094> (6c0a8545-a83c-499c-a834-959b554e0094) [No such file or directory]
[2018-11-19 13:32:08.504201] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:0fed92d4-fd54-462b-b658-fed053849237> (0fed92d4-fd54-462b-b658-fed053849237) [No such file or directory]
[2018-11-19 13:32:08.505653] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:6c0a8545-a83c-499c-a834-959b554e0094> (6c0a8545-a83c-499c-a834-959b554e0094) [No such file or directory]
[2018-11-19 13:32:08.505912] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:6c0a8545-a83c-499c-a834-959b554e0094> (6c0a8545-a83c-499c-a834-959b554e0094) [No such file or directory]
[2018-11-19 13:32:08.506628] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:0fed92d4-fd54-462b-b658-fed053849237> (0fed92d4-fd54-462b-b658-fed053849237) [No such file or directory]
[2018-11-19 13:32:08.509315] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:0fed92d4-fd54-462b-b658-fed053849237> (0fed92d4-fd54-462b-b658-fed053849237) [No such file or directory]
[2018-11-19 13:37:38.681111] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:b4bf758f-9a7a-412f-b3bc-25ce3c43ea00> (b4bf758f-9a7a-412f-b3bc-25ce3c43ea00) [No such file or directory]
[2018-11-19 13:37:38.681199] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:b4bf758f-9a7a-412f-b3bc-25ce3c43ea00> (b4bf758f-9a7a-412f-b3bc-25ce3c43ea00) [No such file or directory]
[2018-11-19 13:37:38.681388] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:b4bf758f-9a7a-412f-b3bc-25ce3c43ea00> (b4bf758f-9a7a-412f-b3bc-25ce3c43ea00) [No such file or directory]
[2018-11-19 13:37:38.685464] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-0: remote operation failed. Path: <gfid:f32ea22c-6705-4b33-9eeb-b6d6f9a03911> (f32ea22c-6705-4b33-9eeb-b6d6f9a03911) [No such file or directory]
[2018-11-19 13:37:38.685467] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:f32ea22c-6705-4b33-9eeb-b6d6f9a03911> (f32ea22c-6705-4b33-9eeb-b6d6f9a03911) [No such file or directory]
[2018-11-19 13:37:38.685727] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:f32ea22c-6705-4b33-9eeb-b6d6f9a03911> (f32ea22c-6705-4b33-9eeb-b6d6f9a03911) [No such file or directory]
[2018-11-19 13:37:38.686705] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-1: remote operation failed. Path: <gfid:a168e718-68cb-4e66-83e6-128cd0b7374d> (a168e718-68cb-4e66-83e6-128cd0b7374d) [No such file or directory]
[2018-11-19 13:37:38.686740] W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-hadoop_volume-client-2: remote operation failed. Path: <gfid:a168e718-68cb-4e66-83e6-128cd0b7374d> (a168e718-68cb-4e66-83e6-128cd0b7374d) [No such file or directory]
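For anyone hitting the same warnings: one way to see which file a GFID refers to is to resolve it on one of the bricks through the .glusterfs index (for a regular file, .glusterfs/<aa>/<bb>/<gfid> is a hard link to the real file; for a directory it is a symlink). This is only a rough sketch, assuming the /hadoop brick path from the volume info below and one of the GFIDs from the log above:

# run on a brick node (e.g. hdd1, brick path /hadoop)
BRICK=/hadoop
GFID=6c0a8545-a83c-499c-a834-959b554e0094

# the .glusterfs entry is named after the GFID, bucketed by its first two byte pairs
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# find the real path on the brick that shares the same inode
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
    -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print

If the file was legitimately deleted, a leftover .glusterfs entry with link count 1 is just an orphan and can be cleaned up on the brick; otherwise `gluster volume heal hadoop_volume info` should show whether the entry is still pending heal, and in that case I would let self-heal deal with it rather than touching the bricks by hand.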
# gluster volume info

Volume Name: hadoop_volume
Type: Disperse
Volume ID: 5ad3899f-cbc0-4032-b5ea-a0cf7c775d73
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: hdd1:/hadoop
Brick2: hdd2:/hadoop
Brick3: hdd3:/hadoop
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
here it crashed: [2018-11-19 14:50:09.900071] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 152 times between [2018-11-19 14:50:09.900071] and [2018-11-19 14:52:07.480234] [2018-11-19 14:52:09.382050] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2018-11-19 14:53:39.492132] E [mem-pool.c:322:__gf_free] (-->/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4) [0x7f0e23aec8a4] -->/usr/lib/libglusterfs.so.0(+0x1a24e) [0x7f0e2e3b424e] -->/usr/lib/libglusterfs.so.0(__gf_free+0x9b) [0x7f0e2e3e867b] ) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 124 times between [2018-11-19 14:52:09.382050] and [2018-11-19 14:53:39.233214] pending frames: frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(READDIRP) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(FLUSH) frame : type(1) op(CREATE) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(OPEN) frame : type(1) op(FLUSH) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(OPENDIR) frame : type(1) op(STAT) frame : type(1) op(OPENDIR) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(FLUSH) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) 
op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(STAT) frame : type(1) op(FSTAT) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(READDIRP) frame : type(1) op(OPENDIR) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(STAT) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(READDIRP) frame : type(1) op(FLUSH) frame : type(1) op(OPENDIR) frame : type(1) op(OPENDIR) frame : type(1) op(FSTAT) frame : type(1) op(FLUSH) frame : type(1) op(OPENDIR) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(FLUSH) frame : type(1) op(OPEN) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2018-11-19 14:53:39 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.1 /usr/lib/libglusterfs.so.0(+0x256ea)[0x7f0e2e3bf6ea] /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7f0e2e3c9ab7] /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f0e2d7ab4b0] /usr/lib/libglusterfs.so.0(__gf_free+0xb0)[0x7f0e2e3e8690] /usr/lib/libglusterfs.so.0(+0x1a24e)[0x7f0e2e3b424e] /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4)[0x7f0e23aec8a4] /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0x14528)[0x7f0e23af1528] /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0x18b52)[0x7f0e23af5b52] /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0x11fa3)[0x7f0e23aeefa3] /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xb3c1)[0x7f0e23ae83c1] /usr/lib/glusterfs/5.1/xlator/cluster/distribute.so(+0x7e70a)[0x7f0e238a070a] /usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0x563a)[0x7f0e2802963a] /usr/lib/libglusterfs.so.0(call_resume_keep_stub+0x75)[0x7f0e2e3e51f5] /usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0x9289)[0x7f0e2802d289] /usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0x939b)[0x7f0e2802d39b] /usr/lib/glusterfs/5.1/xlator/performance/write-behind.so(+0xa4d8)[0x7f0e2802e4d8] /usr/lib/glusterfs/5.1/xlator/performance/read-ahead.so(+0x4113)[0x7f0e23616113] /usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2] /usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2] /usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2] 
/usr/lib/libglusterfs.so.0(default_flush_resume+0x1e5)[0x7f0e2e45df35] /usr/lib/libglusterfs.so.0(call_resume+0x75)[0x7f0e2e3e5035] /usr/lib/glusterfs/5.1/xlator/performance/open-behind.so(+0x4950)[0x7f0e22dd5950] /usr/lib/glusterfs/5.1/xlator/performance/open-behind.so(+0x4db8)[0x7f0e22dd5db8] /usr/lib/libglusterfs.so.0(default_flush+0xb2)[0x7f0e2e4470c2] /usr/lib/libglusterfs.so.0(default_flush_resume+0x1e5)[0x7f0e2e45df35] /usr/lib/libglusterfs.so.0(call_resume+0x75)[0x7f0e2e3e5035] /usr/lib/glusterfs/5.1/xlator/performance/io-threads.so(+0x5a18)[0x7f0e229a6a18] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f0e2db476ba] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0e2d87d41d]
gdb report:

Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=hdd1 --volfile-id=/had'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f0e2e3e8690 in __gf_free (free_ptr=0x7f0e104b0758) at mem-pool.c:330
330             GF_ASSERT(GF_MEM_TRAILER_MAGIC ==
[Current thread is 1 (Thread 0x7f0e201e7700 (LWP 11458))]
(gdb)
It crashed again, but with some different lines:

[2018-11-19 14:58:08.327343] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-11-19 14:59:29.845899] E [mem-pool.c:322:__gf_free] (-->/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4) [0x7f825d9d38a4] -->/usr/lib/libglusterfs.so.0(+0x1a24e) [0x7f826406024e] -->/usr/lib/libglusterfs.so.0(__gf_free+0x9b) [0x7f826409467b] ) 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic
[2018-11-19 14:59:29.845965] E [mem-pool.c:331:__gf_free] (-->/usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4) [0x7f825d9d38a4] -->/usr/lib/libglusterfs.so.0(+0x1a24e) [0x7f826406024e] -->/usr/lib/libglusterfs.so.0(__gf_free+0xf6) [0x7f82640946d6] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 280 times between [2018-11-19 14:58:08.327343] and [2018-11-19 14:59:28.480961]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
etc..
I downgraded to 3.12.15 because 5.1 is not stable at all (clean install).

Documentation for the downgrade, for anyone who needs it:

Back up your cluster data somewhere, then remove all installation files:

gluster volume stop hadoop_volume
gluster volume delete hadoop_volume
killall glusterfs glusterfsd glusterd glustereventsd python

# remove all files from the bricks:
rm -rf /hadoop/* && rm -rf /hadoop/.glusterfs

# remove all configs
rm -rf /usr/var/lib/glusterd && rm -rf /usr/var/log/glusterfs && rm -rf /usr/var/run/gluster && rm -rf /usr/etc/glusterfs

# install the new gluster, mount, copy all files back to the new cluster from the backup.
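The same per-node cleanup can be wrapped in a small script if you have several brick nodes. This is only a sketch of the steps above; the /hadoop brick path and the /usr/var, /usr/etc prefixes come from my build and will differ on packaged installs:

#!/bin/bash
# downgrade-cleanup.sh - wipe gluster state on this node before installing an older version.
# WARNING: destroys brick contents; make sure the backup is complete first,
# and run "gluster volume stop/delete" once from any node before this.
set -euo pipefail

read -r -p "All gluster data on this node's brick will be deleted. Continue? [y/N] " answer
[ "$answer" = "y" ] || exit 1

# stop all gluster processes still running on this node
killall glusterfs glusterfsd glusterd glustereventsd || true

# remove brick contents, including the hidden .glusterfs metadata tree
rm -rf /hadoop/* /hadoop/.glusterfs

# remove configuration, logs and runtime state (paths depend on how gluster was built/installed)
rm -rf /usr/var/lib/glusterd /usr/var/log/glusterfs /usr/var/run/gluster /usr/etc/glusterfs

echo "Node cleaned; install the older gluster packages and recreate the volume."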
We saw this as well in V5.1.1, The stack back traces were: (gdb) t a a bt Thread 24 (LWP 20898): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0xc) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000000 in ?? () Thread 23 (LWP 20894): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0x8) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000004 in ?? () #2 0x0000000000000000 in ?? () Thread 22 (LWP 20897): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0xb) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000004 in ?? () #2 0x0000000000000000 in ?? () Thread 21 (LWP 20885): #0 0x00007effe124da82 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:347 #1 0x00007effe2e02430 in ?? () #2 0x00007effe2e06050 in ?? () #3 0x00007effd7fbde60 in ?? () #4 0x00007effe2e06098 in ?? () #5 0x00007effe24258a8 in syncenv_task () from /lib64/libglusterfs.so.0 #6 0x00007effe24267f0 in syncenv_processor () from /lib64/libglusterfs.so.0 #7 0x00007effe1249dc5 in start_thread (arg=0x7effd7fbe700) at pthread_create.c:308 #8 0x00007effe0b1776d in putspent (p=0x0, stream=0x7effd7fbe700) at putspent.c:60 #9 0x0000000000000000 in ?? () ---Type <return> to continue, or q <return> to quit--- Thread 20 (LWP 20880): #0 0x00007effe124aef7 in pthread_join (threadid=139637260523264, thread_return=0x0) at pthread_join.c:64 #1 0x00007effe2449968 in event_dispatch_epoll () from /lib64/libglusterfs.so.0 #2 0x00007effe28f94cb in main () Thread 19 (LWP 20888): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0x2) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000000 in ?? () Thread 18 (LWP 20883): Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x100000007: Thread 17 (LWP 20890): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=3, result=0x4) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000007 in ?? () #2 0x0000000000000000 in ?? () Thread 16 (LWP 20886): #0 0x00007effe124da82 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:347 #1 0x00007effe2e02430 in ?? () #2 0x00007effe2e06050 in ?? () #3 0x00007effd77bce60 in ?? () #4 0x00007effe2e06098 in ?? () #5 0x00007effe24258a8 in syncenv_task () from /lib64/libglusterfs.so.0 #6 0x00007effe24267f0 in syncenv_processor () from /lib64/libglusterfs.so.0 #7 0x00007effe1249dc5 in start_thread (arg=0x7effd77bd700) at pthread_create.c:308 #8 0x00007effe0b1776d in putspent (p=0x0, stream=0x7effd77bd700) at putspent.c:60 #9 0x0000000000000000 in ?? () ---Type <return> to continue, or q <return> to quit--- Thread 15 (LWP 20892): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=5, result=0x6) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000001 in ?? () #2 0x0000000000000000 in ?? 
() Thread 14 (LWP 20889): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0x3) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000000 in ?? () Thread 13 (LWP 20896): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0xa) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000004 in ?? () #2 0x0000000000000000 in ?? () Thread 12 (LWP 20895): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0x9) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000004 in ?? () #2 0x0000000000000000 in ?? () Thread 11 (LWP 20900): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=4, result=0xe) at ../nss/getXXbyYY_r.c:297 ---Type <return> to continue, or q <return> to quit--- #1 0x0000000000000004 in ?? () #2 0x0000000000000000 in ?? () Thread 10 (LWP 20906): #0 0x00007effe124d6d5 in __pthread_cond_init (cond=0x7effe2e00ef4, cond_attr=0x80) at pthread_cond_init.c:40 #1 0x0000000000000000 in ?? () Thread 9 (LWP 20893): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=3, result=0x7) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000007 in ?? () #2 0x0000000000000000 in ?? () Thread 8 (LWP 20881): #0 0x00007effe1250bdd in __recvmsg_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x0000000000000000 in ?? () Thread 7 (LWP 20891): #0 0x00007effe12501bd in unwind_stop (version=2013313424, actions=<optimized out>, exc_class=2, exc_obj=0xffffffffffffffff, context=0x7eff7800b990, stop_parameter=0x519b) at unwind.c:98 #1 0x0000000000000000 in ?? () Thread 6 (LWP 20882): #0 0x00007effe1251101 in __libc_tcdrain (fd=32511) at ../sysdeps/unix/sysv/linux/tcdrain.c:34 #1 0x0000000000000000 in ?? () Thread 5 (LWP 20905): #0 0x00007effe0b0e5c0 in tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7eff7802dec0) at tsearch.c:640 #1 tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7effbe7fbe60) at tsearch.c:641 #2 tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7effe2df4e00) at tsearch.c:639 #3 tdestroy_recurse (freefct=0x7effe2df4d70, root=0x7effe2dee590) at tsearch.c:641 ---Type <return> to continue, or q <return> to quit--- #4 tdestroy_recurse (root=0x7effe2de52d8, freefct=0x7effe2df4d70) at tsearch.c:641 #5 0x00007effe2df4e00 in ?? () #6 0x00007effbe7fbe60 in ?? () #7 0x00007effd97e1b40 in fuse_thread_proc () from /usr/lib64/glusterfs/5.1/xlator/mount/fuse.so #8 0x00007effe1249dc5 in start_thread (arg=0x7effbe7fc700) at pthread_create.c:308 #9 0x00007effe0b1776d in putspent (p=0x0, stream=0x7effbe7fc700) at putspent.c:60 #10 0x0000000000000000 in ?? () Thread 4 (LWP 20901): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0xf) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000000 in ?? 
() Thread 3 (LWP 20899): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=5, result=0xd) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000001 in ?? () #2 0x0000000000000000 in ?? () Thread 2 (LWP 20902): #0 0x00007effe0b17d43 in __getspnam_r (name=0x0, resbuf=0x7effe2de60b0, buffer=0x7effe244a340 <event_dispatch_epoll_worker+384> "\205\300t\277\203\370\377\017\204\203\001", buflen=0, result=0x10) at ../nss/getXXbyYY_r.c:297 #1 0x0000000000000000 in ?? () Thread 1 (LWP 20887): #0 0x00007effe2411775 in __gf_free () from /lib64/libglusterfs.so.0 #1 0x00007effe23da649 in dict_destroy () from /lib64/libglusterfs.so.0 #2 0x00007effd48288b4 in afr_local_cleanup () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #3 0x00007effd4802ab4 in afr_transaction_done () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so ---Type <return> to continue, or q <return> to quit--- #4 0x00007effd480919a in afr_unlock () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #5 0x00007effd4800819 in afr_changelog_post_op_done () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #6 0x00007effd480362c in afr_changelog_post_op_now () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #7 0x00007effd4804f1b in afr_transaction_start () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #8 0x00007effd480537a in afr_transaction () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #9 0x00007effd47fd562 in afr_fsync () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #10 0x00007effd45971b8 in dht_fsync () from /usr/lib64/glusterfs/5.1/xlator/cluster/distribute.so #11 0x00007effd42fd093 in wb_fsync_helper () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so #12 0x00007effe240e1b5 in call_resume_keep_stub () from /lib64/libglusterfs.so.0 #13 0x00007effd43038b9 in wb_do_winds () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so #14 0x00007effd43039cb in wb_process_queue () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so #15 0x00007effd4303b5f in wb_fulfill_cbk () from /usr/lib64/glusterfs/5.1/xlator/performance/write-behind.so #16 0x00007effd45855f9 in dht_writev_cbk () from /usr/lib64/glusterfs/5.1/xlator/cluster/distribute.so #17 0x00007effd47f020e in afr_writev_unwind () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #18 0x00007effd47f07be in afr_writev_wind_cbk () from /usr/lib64/glusterfs/5.1/xlator/cluster/replicate.so #19 0x00007effd4abdbc5 in client4_0_writev_cbk () from /usr/lib64/glusterfs/5.1/xlator/protocol/client.so #20 0x00007effe21b2c70 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0 #21 0x00007effe21b3043 in rpc_clnt_notify () from /lib64/libgfrpc.so.0 #22 0x00007effe21aef23 in rpc_transport_notify () from /lib64/libgfrpc.so.0 #23 0x00007effd6da937b in socket_event_handler () from /usr/lib64/glusterfs/5.1/rpc-transport/socket.so #24 0x00007effe244a5f9 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 #25 0x00007effe1249dc5 in start_thread (arg=0x7effd54f9700) at pthread_create.c:308 #26 0x00007effe0b1776d in putspent (p=0x0, stream=0x7effd54f9700) at putspent.c:60 #27 0x0000000000000000 in ?? ()
Any update on a resolution? Is there a fix included in 5.3, or in a 5.1.x release?
Similar problem on a newly provisioned ovirt 4.3 cluster (centos 7.6, gluster 5.2-1) : [2019-01-15 09:32:02.558598] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.2 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=ps-inf-int-kvm-fr-306-210.hostics.fr --volfile-server=10.199.211.7 --volfile-server=10.199.211.5 --volfile-id=/vmstore /rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore) [2019-01-15 09:32:02.566701] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-01-15 09:32:02.581138] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2019-01-15 09:32:02.581272] I [MSGID: 114020] [client.c:2354:notify] 0-vmstore-client-0: parent translators are ready, attempting connect on transport [2019-01-15 09:32:02.583283] I [MSGID: 114020] [client.c:2354:notify] 0-vmstore-client-1: parent translators are ready, attempting connect on transport [2019-01-15 09:32:02.583911] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-vmstore-client-0: changing port to 49155 (from 0) [2019-01-15 09:32:02.585505] I [MSGID: 114020] [client.c:2354:notify] 0-vmstore-client-2: parent translators are ready, attempting connect on transport [2019-01-15 09:32:02.587413] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-15 09:32:02.587441] E [MSGID: 108006] [afr-common.c:5314:__afr_handle_child_down_event] 0-vmstore-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up. [2019-01-15 09:32:02.587951] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-15 09:32:02.588685] I [MSGID: 114046] [client-handshake.c:1107:client_setvolume_cbk] 0-vmstore-client-0: Connected to vmstore-client-0, attached to remote volume '/gluster_bricks/vmstore/vmstore'. [2019-01-15 09:32:02.588708] I [MSGID: 108005] [afr-common.c:5237:__afr_handle_child_up_event] 0-vmstore-replicate-0: Subvolume 'vmstore-client-0' came back up; going online. 
Final graph: +------------------------------------------------------------------------------+ 1: volume vmstore-client-0 2: type protocol/client 3: option opversion 50000 4: option clnt-lk-version 1 5: option volfile-checksum 0 6: option volfile-key /vmstore 7: option client-version 5.2 8: option process-name fuse 9: option process-uuid CTX_ID:e5dad97f-5289-4464-9e2f-36e9bb115118-GRAPH_ID:0-PID:39987-HOST:ps-inf-int-kvm-fr-307-210.hostics.fr-PC_NAME:vmstore-client-0-RECON_NO:-0 10: option fops-version 1298437 11: option ping-timeout 30 12: option remote-host 10.199.211.6 13: option remote-subvolume /gluster_bricks/vmstore/vmstore 14: option transport-type socket 15: option transport.address-family inet 16: option filter-O_DIRECT off 17: option transport.tcp-user-timeout 0 18: option transport.socket.keepalive-time 20 19: option transport.socket.keepalive-interval 2 20: option transport.socket.keepalive-count 9 21: option send-gids true 22: end-volume 23: 24: volume vmstore-client-1 25: type protocol/client 26: option ping-timeout 30 27: option remote-host 10.199.211.7 28: option remote-subvolume /gluster_bricks/vmstore/vmstore 29: option transport-type socket 30: option transport.address-family inet 31: option filter-O_DIRECT off 32: option transport.tcp-user-timeout 0 33: option transport.socket.keepalive-time 20 34: option transport.socket.keepalive-interval 2 35: option transport.socket.keepalive-count 9 36: option send-gids true 37: end-volume 38: 39: volume vmstore-client-2 40: type protocol/client 41: option ping-timeout 30 42: option remote-host 10.199.211.5 43: option remote-subvolume /gluster_bricks/vmstore/vmstore 44: option transport-type socket 45: option transport.address-family inet 46: option filter-O_DIRECT off 47: option transport.tcp-user-timeout 0 48: option transport.socket.keepalive-time 20 49: option transport.socket.keepalive-interval 2 50: option transport.socket.keepalive-count 9 51: option send-gids true 52: end-volume 53: 54: volume vmstore-replicate-0 55: type cluster/replicate 56: option afr-pending-xattr vmstore-client-0,vmstore-client-1,vmstore-client-2 57: option arbiter-count 1 58: option data-self-heal-algorithm full 59: option eager-lock enable 60: option quorum-type auto 61: option choose-local off 62: option shd-max-threads 8 63: option shd-wait-qlength 10000 64: option locking-scheme granular 65: option granular-entry-heal enable 66: option use-compound-fops off 67: subvolumes vmstore-client-0 vmstore-client-1 vmstore-client-2 68: end-volume 69: 70: volume vmstore-dht 71: type cluster/distribute 72: option lock-migration off 73: option force-migration off 74: subvolumes vmstore-replicate-0 75: end-volume 76: 77: volume vmstore-shard 78: type features/shard 79: subvolumes vmstore-dht 80: end-volume 81: 82: volume vmstore-write-behind 83: type performance/write-behind 84: option strict-O_DIRECT on 85: subvolumes vmstore-shard 86: end-volume 87: 88: volume vmstore-readdir-ahead 89: type performance/readdir-ahead 90: option parallel-readdir off 91: option rda-request-size 131072 92: option rda-cache-limit 10MB 93: subvolumes vmstore-write-behind 94: end-volume 95: 96: volume vmstore-open-behind 97: type performance/open-behind 98: subvolumes vmstore-readdir-ahead 99: end-volume 100: 101: volume vmstore-md-cache 102: type performance/md-cache 103: subvolumes vmstore-open-behind 104: end-volume 105: 106: volume vmstore 107: type debug/io-stats 108: option log-level INFO 109: option latency-measurement off 110: option count-fop-hits off 111: subvolumes 
vmstore-md-cache 112: end-volume 113: 114: volume meta-autoload 115: type meta 116: subvolumes vmstore 117: end-volume 118: +------------------------------------------------------------------------------+ [2019-01-15 09:32:02.590376] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-vmstore-client-2: changing port to 49155 (from 0) [2019-01-15 09:32:02.592649] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-15 09:32:02.593512] I [MSGID: 114046] [client-handshake.c:1107:client_setvolume_cbk] 0-vmstore-client-2: Connected to vmstore-client-2, attached to remote volume '/gluster_bricks/vmstore/vmstore'. [2019-01-15 09:32:02.593528] I [MSGID: 108002] [afr-common.c:5588:afr_notify] 0-vmstore-replicate-0: Client-quorum is met [2019-01-15 09:32:02.594714] I [fuse-bridge.c:4259:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 [2019-01-15 09:32:02.594746] I [fuse-bridge.c:4870:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-01-15 09:32:06.562678] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-15 09:32:09.435695] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.2/xlator/performance/open-behind.so(+0x3d7c) [0x7f5c279cfd7c] -->/usr/lib64/glusterfs/5.2/xlator/performance/open-behind.so(+0x3bd6) [0x7f5c279cfbd6] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7f5c340ae20d] ) 0-dict: dict is NULL [Invalid argument] The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 7 times between [2019-01-15 09:32:06.562678] and [2019-01-15 09:32:27.578753] [2019-01-15 09:32:29.966249] W [glusterfsd.c:1481:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f5c32f1ddd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55af1f5bad45] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55af1f5babbb] ) 0-: received signum (15), shutting down [2019-01-15 09:32:29.966265] I [fuse-bridge.c:5897:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore'. [2019-01-15 09:32:29.985157] I [fuse-bridge.c:5134:fuse_thread_proc] 0-fuse: initating unmount of /rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore [2019-01-15 09:32:29.985434] I [fuse-bridge.c:5902:fini] 0-fuse: Closing fuse connection to '/rhev/data-center/mnt/glusterSD/ps-inf-int-kvm-fr-306-210.hostics.fr:_vmstore'.
Per the 5.2 release notes: "NOTE: Next minor release tentative date: Week of 10th January, 2019". This issue is urgent and is impacting a customer deployment. Any projection on 5.3 availability, and whether a fix will be included?
Still happening in 5.3.
(In reply to waza123 from comment #6)
> I downgraded to 3.12.15 because 5.1 is not stable at all (clear install)
>
> Documentation for downgrade for someone who need this:
>
> backup your cluster data somewhere..
>
> remove all instalation files
>
> gluster volume stop hadoop_volume
> gluster volume delete hadoop_volume
> killall glusterfs glusterfsd glusterd glustereventsd python
>
> # remove all files from bricks:
>
> rm -rf /hadoop/* && rm -rf /hadoop/.glusterfs
>
> # remove all configs
> rm -rf /usr/var/lib/glusterd && rm -rf /usr/var/log/glusterfs && rm -rf
> /usr/var/run/gluster && rm -rf /usr/etc/glusterfs
>
> # install new gluster, mount, copy all files to new cluster from backup.

3.12.13 has a memory leak in "readdir-ahead.c". I saw it was fixed in 5.3; is it fixed in 3.12.15?
(In reply to Emerson Gomes from comment #11)
> Still happening in 5.3.

Is anybody looking at this in 5.3? This is a much-awaited release!
(In reply to Amgad from comment #13)
> (In reply to Emerson Gomes from comment #11)
> > Still happening in 5.3.
>
> Is anybody looking at this in 5.3? This is a much-awaited release!

Yes, I updated to 5.3 yesterday, and the issue is still there.
I'm having what appears to be the same issue. Started when I upgraded from 3.12 to 5.2 a few weeks back, and the subsequent upgrade to 5.3 did not resolve the problem. My servers (two, in a 'replica 2' setup) publish two volumes. One is Web site content, about 110GB; the other is Web config files, only a few megabytes. (Wasn't worth building extra servers for that second volume.) FUSE clients have been crashing on the larger volume every three or four days. The client's logs show many hundreds of instances of this (I don't know if it's related): [2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) [0x7fa171ead384] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) [0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] ) 0-dict: dict is NULL [Invalid argument] Then, when the client's glusterfs process crashes, this is logged: The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 871 times between [2019-01-29 08:12:48.390535] and [2019-01-29 08:14:17.100279] pending frames: frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(0) op(0) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-01-29 08:14:17 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.3 /lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84] /lib64/libc.so.6(+0x36280)[0x7fa17f03c280] /lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d] /lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2] /lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96] /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd] /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a] /lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5] /lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead] --------- Info on the volumes themselves, gathered from one of my servers: [davidsmith@wuit-s-10889 ~]$ sudo gluster volume info all Volume Name: web-config Type: Replicate Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 172.23.128.26:/data/web-config Brick2: 172.23.128.27:/data/web-config Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet server.event-threads: 4 client.event-threads: 4 cluster.min-free-disk: 1 cluster.quorum-count: 2 cluster.quorum-type: fixed network.ping-timeout: 10 auth.allow: * performance.readdir-ahead: on Volume Name: web-content Type: Replicate Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 172.23.128.26:/data/web-content Brick2: 172.23.128.27:/data/web-content Options Reconfigured: network.ping-timeout: 10 cluster.quorum-type: fixed cluster.quorum-count: 2 performance.readdir-ahead: on auth.allow: * cluster.min-free-disk: 1 client.event-threads: 4 server.event-threads: 4 transport.address-family: inet nfs.disable: on performance.client-io-threads: off performance.cache-size: 4GB gluster> volume status all detail Status of volume: web-config ------------------------------------------------------------------------------ Brick : Brick 172.23.128.26:/data/web-config 
TCP Port : 49152 RDMA Port : 0 Online : Y Pid : 5612 File System : ext3 Device : /dev/sdb1 Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 135.9GB Total Disk Space : 246.0GB Inode Count : 16384000 Free Inodes : 14962279 ------------------------------------------------------------------------------ Brick : Brick 172.23.128.27:/data/web-config TCP Port : 49152 RDMA Port : 0 Online : Y Pid : 5540 File System : ext3 Device : /dev/sdb1 Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 135.9GB Total Disk Space : 246.0GB Inode Count : 16384000 Free Inodes : 14962277 Status of volume: web-content ------------------------------------------------------------------------------ Brick : Brick 172.23.128.26:/data/web-content TCP Port : 49153 RDMA Port : 0 Online : Y Pid : 5649 File System : ext3 Device : /dev/sdb1 Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 135.9GB Total Disk Space : 246.0GB Inode Count : 16384000 Free Inodes : 14962279 ------------------------------------------------------------------------------ Brick : Brick 172.23.128.27:/data/web-content TCP Port : 49153 RDMA Port : 0 Online : Y Pid : 5567 File System : ext3 Device : /dev/sdb1 Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 135.9GB Total Disk Space : 246.0GB Inode Count : 16384000 Free Inodes : 14962277 I have a couple of core files that appear to be from this, but I'm not much of a developer (haven't touched C in fifteen years) so I don't know what to do with them that would be of value in this case.
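If it helps with the core files mentioned above: what the developers usually need is a full backtrace of every thread, which gdb can produce without any C knowledge. Something along these lines should work on CentOS, assuming the matching debuginfo repository for your gluster packages is enabled (package names here are illustrative):

# install matching debug symbols so the backtrace shows function names
sudo debuginfo-install -y glusterfs glusterfs-libs glusterfs-fuse

# dump a full backtrace of every thread from the core into a text file
gdb -batch -ex "set pagination off" -ex "thread apply all bt full" \
    /usr/sbin/glusterfs /path/to/core > gluster-core-bt.txt

Attaching the resulting text file (plus the exact glusterfs package versions) to this bug should make the cores useful to the maintainers.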
I have the same issue, and my server crashes 4-5 times per day. We need an urgent bug fix; we can't work any more.
[2019-01-30 15:50:39.219564] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8853076771410540308.tmp (ba250583-e103-473e-92de-3e0d87afe8be) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0086.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:50:44.206312] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:50:44.350266] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-6758755102184008102.tmp (32dbb8cb-aec9-4bae-992b-fbd86cd50828) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0017.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:50:45.489090] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-6687721062137662117.tmp (62bbb010-16ff-462c-b0dd-718b0e62a8c7) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0018.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:50:45.551349] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 5 times between [2019-01-30 15:50:45.551349] and [2019-01-30 15:50:56.559333] [2019-01-30 15:51:02.317536] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (15ba641a-cb3f-42d2-b9b5-b17f10e027c8) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:51:07.031853] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0086.exr (ba250583-e103-473e-92de-3e0d87afe8be) (hash=mothervolume-client-0/cache=mothervolume-client-1) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #2 of Seq_A_A_Sh010_comp_SH030208_v001.0086.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:51:07.109087] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8853076771410540308.tmp (d514f600-f3e6-4639-822a-05e057e1d83c) (hash=mothervolume-client-1/cache=mothervolume-client-1) => 
/work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0086.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:51:07.620516] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8568940611225156368.tmp (f7efca88-3886-4750-ad2f-4f793fd8487d) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0082.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:51:12.458961] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8711008691317848338.tmp (92ddf1a3-50a9-48b1-be22-7bfa359b5b65) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:51:15.629779] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 53 times between [2019-01-30 15:51:15.629779] and [2019-01-30 15:51:45.695496] [2019-01-30 15:51:45.700709] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (f853a226-70fb-4537-a629-e1e2cefdcfe7) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:51:47.398973] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:51:47.588670] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:51:51.885883] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8568940611225156368.tmp (8f5bcab2-e3ba-478a-957c-fdf243216a4e) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0082.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:51:53.453191] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 13 times between [2019-01-30 15:51:53.453191] and [2019-01-30 15:51:56.196530] [2019-01-30 15:51:56.510824] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8711008691317848338.tmp (45aaff0e-15b5-4b6b-8b41-c4caed57f881) 
(hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:51:57.207664] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 81 times between [2019-01-30 15:51:57.207664] and [2019-01-30 15:52:19.002777] [2019-01-30 15:52:19.183448] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr (504238f8-7918-496e-835e-3246d16cf35e) (hash=mothervolume-client-1/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:52:19.257335] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1446751043051559505.tmp (3dabe7b2-9682-4b33-842e-b144533e97d4) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:52:19.574477] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 4 times between [2019-01-30 15:52:19.574477] and [2019-01-30 15:52:24.127146] [2019-01-30 15:52:24.656623] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosavet (9ef65443-4745-448d-a8b4-fa3f3bbf7487) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosave ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:52:24.899131] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:52:27.431451] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (749b5a12-80a7-47d5-b5d2-2e60cbea57aa) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:52:30.891799] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:52:31.047076] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:52:32.939577] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming 
/work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8568940611225156368.tmp (952bcc2f-a486-4d61-ade5-329e1a6165a8) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0082.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:52:37.606502] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8711008691317848338.tmp (83ebc0c0-ba5c-4fdc-808e-f343e1ae28e2) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:52:43.967857] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:52:55.087185] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosavet (4f7b2158-c4b7-4579-8e8a-7ab3dc8d9b0d) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/nuke/Seq_A_A_Sh010_comp_scene_208.v002.nk.autosave ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:17.204114] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2765564699243796980.tmp (e72fb096-c4ef-490a-a39c-47531608dd63) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0080.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:17.396151] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:53:22.458305] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2694530659197450995.tmp (e02e5460-3557-470b-a7f3-109a9914c692) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0081.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:26.229226] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2623496619151105010.tmp (2fa4c400-77a0-459e-b734-8e4f28926859) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0082.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:32.149207] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2552462579104759025.tmp 
(cb9b9130-fc7e-4c04-89e9-7b925c955669) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0083.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 18 times between [2019-01-30 15:53:17.396151] and [2019-01-30 15:53:34.167757] [2019-01-30 15:53:37.062257] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0098.exr (f20e23cd-76ff-4371-a31b-0a9cf9022860) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #1 of Seq_A_A_Sh010_comp_SH030208_v001.0098.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:37.149778] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1730887203236943445.tmp (f4b57fa7-4de1-4de7-bc06-00e2741c5129) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0098.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:37.306807] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2481428539058413040.tmp (27286f68-e368-48c1-b497-0300cc3af4c7) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0084.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:38.961986] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr (3dabe7b2-9682-4b33-842e-b144533e97d4) (hash=mothervolume-client-1/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #1 of Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:39.053762] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1446751043051559505.tmp (3055dc73-8b71-4437-9dab-8219f7ea6189) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:43.220690] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2410394499012067055.tmp (41462049-f2d4-4638-b137-c7921219820b) (hash=mothervolume-client-0/cache=mothervolume-client-0) => 
/work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0085.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:44.188358] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 4 times between [2019-01-30 15:53:44.188358] and [2019-01-30 15:53:45.698529] [2019-01-30 15:53:47.773401] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/8497906571178810383.tmp (def787fb-4c77-4049-9629-15db1b4acd36) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0081.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:48.345901] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2339360458965721070.tmp (53f63fd7-904e-412d-a789-c2170735a61f) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0086.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) [2019-01-30 15:53:49.291189] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:53:49.450504] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:53:53.495085] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2268326418919375085.tmp (5a339407-94ab-456e-9670-9488c43e5a9e) (hash=mothervolume-client-1/cache=mothervolume-client-1) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0087.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:54.919809] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:53:56.335023] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-30 15:53:58.191979] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/-2197292378873029100.tmp (12a7836b-4e9d-46db-9f03-acd30edee2f1) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH010208/Seq_A_A_Sh010_comp_SH010208_v001.0088.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:53:58.920443] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 7 times between [2019-01-30 15:53:58.920443] and [2019-01-30 15:54:00.336410] [2019-01-30 15:54:00.519418] I 
[MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr (3055dc73-8b71-4437-9dab-8219f7ea6189) (hash=mothervolume-client-1/cache=mothervolume-client-0) => /.recycle/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Copy #2 of Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-0/cache=<nul>) [2019-01-30 15:54:00.601804] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-mothervolume-dht: renaming /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/1446751043051559505.tmp (1ed49b72-c1d5-4704-9ebf-00814afdcb43) (hash=mothervolume-client-0/cache=mothervolume-client-0) => /work_serveur/Peugeot_phev/sequences/Seq_A/A_Sh010/comp/work/images/EXR_Seq_A_A_Sh010_comp_v001/SH030208/Seq_A_A_Sh010_comp_SH030208_v001.0094.exr ((null)) (hash=mothervolume-client-1/cache=<nul>) pending frames: frame : type(0) op(0) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : 
type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(1) op(OPEN) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 6 time of crash: 2019-01-30 15:54:00 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.3 /lib64/libglusterfs.so.0(+0x26610)[0x7f30187de610] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f30187e8b84] /lib64/libc.so.6(+0x36280)[0x7f3016e42280] /lib64/libc.so.6(gsignal+0x37)[0x7f3016e42207] /lib64/libc.so.6(abort+0x148)[0x7f3016e438f8] /lib64/libc.so.6(+0x78d27)[0x7f3016e84d27] /lib64/libc.so.6(+0x81489)[0x7f3016e8d489] /lib64/libglusterfs.so.0(+0x1a6e9)[0x7f30187d26e9] /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x8cf9)[0x7f300a9a7cf9] /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x4ab90)[0x7f300a9e9b90] /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x616d2)[0x7f300acb86d2] /lib64/libgfrpc.so.0(+0xec70)[0x7f30185aac70] /lib64/libgfrpc.so.0(+0xf043)[0x7f30185ab043] /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f30185a6f23] /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7f300d19337b] /lib64/libglusterfs.so.0(+0x8aa49)[0x7f3018842a49] /lib64/libpthread.so.0(+0x7dd5)[0x7f3017641dd5] /lib64/libc.so.6(clone+0x6d)[0x7f3016f09ead]
Created attachment 1525090 [details] Mount Log
I'm also experiencing this issue; it began after an upgrade to 5.1 and has continued to occur through upgrades to 5.3.

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 447 times between [2019-01-30 18:13:29.742333] and [2019-01-30 18:15:27.890656]
[2019-01-30 18:15:34.980908] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 27 times between [2019-01-30 18:15:34.980908] and [2019-01-30 18:17:23.626256]
[2019-01-30 18:17:31.085125] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 31 times between [2019-01-30 18:17:31.085125] and [2019-01-30 18:19:27.231000]
[2019-01-30 18:19:38.782441] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
Got a ton of these in my logs after upgrading from 4.1 to 5.3, in addition to a lot of repeated messages here https://bugzilla.redhat.com/show_bug.cgi?id=1313567. ==> mnt-SITE_data1.log <== [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] ==> mnt-SITE_data3.log <== The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593] The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] ==> mnt-SITE_data1.log <== The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355] [2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0 ==> mnt-SITE_data3.log <== [2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0 ==> mnt-SITE_data1.log <== [2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
I've seen this issue in about 20 different environments (large and small, all of which were upgraded from 3.x)
We have not upgraded from 3.x; we have a fresh install of 5.x and see the same issue.
Corrected the version and assigned this to Milind to backport the relevant patches to release-5. As per an email discussion, he confirmed that the following patches are required to fix the flood of "Failed to dispatch handler" logs. https://review.gluster.org/#/c/glusterfs/+/22044 https://review.gluster.org/#/c/glusterfs/+/22046/
(In reply to David E. Smith from comment #15) > I'm having what appears to be the same issue. Started when I upgraded from > 3.12 to 5.2 a few weeks back, and the subsequent upgrade to 5.3 did not > resolve the problem. > > My servers (two, in a 'replica 2' setup) publish two volumes. One is Web > site content, about 110GB; the other is Web config files, only a few > megabytes. (Wasn't worth building extra servers for that second volume.) > FUSE clients have been crashing on the larger volume every three or four > days. > > The client's logs show many hundreds of instances of this (I don't know if > it's related): > [2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref] > (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) > [0x7fa171ead384] > -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) > [0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] > ) 0-dict: dict is NULL [Invalid argument] > > Then, when the client's glusterfs process crashes, this is logged: > > The message "E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler" repeated 871 times between [2019-01-29 08:12:48.390535] and > [2019-01-29 08:14:17.100279] > pending frames: > frame : type(1) op(LOOKUP) > frame : type(1) op(LOOKUP) > frame : type(0) op(0) > frame : type(0) op(0) > patchset: git://git.gluster.org/glusterfs.git > signal received: 11 > time of crash: > 2019-01-29 08:14:17 > configuration details: > argp 1 > backtrace 1 > dlfcn 1 > libpthread 1 > llistxattr 1 > setfsid 1 > spinlock 1 > epoll.h 1 > xattr.h 1 > st_atim.tv_nsec 1 > package-string: glusterfs 5.3 > /lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610] > /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84] > /lib64/libc.so.6(+0x36280)[0x7fa17f03c280] > /lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d] > /lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2] > /lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96] > /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd] > /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a] > /lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5] > /lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead] > --------- > > > > Info on the volumes themselves, gathered from one of my servers: > > [davidsmith@wuit-s-10889 ~]$ sudo gluster volume info all > > Volume Name: web-config > Type: Replicate > Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 172.23.128.26:/data/web-config > Brick2: 172.23.128.27:/data/web-config > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > server.event-threads: 4 > client.event-threads: 4 > cluster.min-free-disk: 1 > cluster.quorum-count: 2 > cluster.quorum-type: fixed > network.ping-timeout: 10 > auth.allow: * > performance.readdir-ahead: on > > Volume Name: web-content > Type: Replicate > Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 172.23.128.26:/data/web-content > Brick2: 172.23.128.27:/data/web-content > Options Reconfigured: > network.ping-timeout: 10 > cluster.quorum-type: fixed > cluster.quorum-count: 2 > performance.readdir-ahead: on > auth.allow: * > cluster.min-free-disk: 1 > client.event-threads: 4 > server.event-threads: 4 > transport.address-family: inet > 
nfs.disable: on > performance.client-io-threads: off > performance.cache-size: 4GB > > > > gluster> volume status all detail > Status of volume: web-config > ----------------------------------------------------------------------------- > - > Brick : Brick 172.23.128.26:/data/web-config > TCP Port : 49152 > RDMA Port : 0 > Online : Y > Pid : 5612 > File System : ext3 > Device : /dev/sdb1 > Mount Options : rw,seclabel,relatime,data=ordered > Inode Size : 256 > Disk Space Free : 135.9GB > Total Disk Space : 246.0GB > Inode Count : 16384000 > Free Inodes : 14962279 > ----------------------------------------------------------------------------- > - > Brick : Brick 172.23.128.27:/data/web-config > TCP Port : 49152 > RDMA Port : 0 > Online : Y > Pid : 5540 > File System : ext3 > Device : /dev/sdb1 > Mount Options : rw,seclabel,relatime,data=ordered > Inode Size : 256 > Disk Space Free : 135.9GB > Total Disk Space : 246.0GB > Inode Count : 16384000 > Free Inodes : 14962277 > > Status of volume: web-content > ----------------------------------------------------------------------------- > - > Brick : Brick 172.23.128.26:/data/web-content > TCP Port : 49153 > RDMA Port : 0 > Online : Y > Pid : 5649 > File System : ext3 > Device : /dev/sdb1 > Mount Options : rw,seclabel,relatime,data=ordered > Inode Size : 256 > Disk Space Free : 135.9GB > Total Disk Space : 246.0GB > Inode Count : 16384000 > Free Inodes : 14962279 > ----------------------------------------------------------------------------- > - > Brick : Brick 172.23.128.27:/data/web-content > TCP Port : 49153 > RDMA Port : 0 > Online : Y > Pid : 5567 > File System : ext3 > Device : /dev/sdb1 > Mount Options : rw,seclabel,relatime,data=ordered > Inode Size : 256 > Disk Space Free : 135.9GB > Total Disk Space : 246.0GB > Inode Count : 16384000 > Free Inodes : 14962277 > > > I have a couple of core files that appear to be from this, but I'm not much > of a developer (haven't touched C in fifteen years) so I don't know what to > do with them that would be of value in this case. Please file a separate BZ for the crashes and provide the bt and corefiles.
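For the cores mentioned above, a minimal sketch of producing the requested bt, assuming the matching glusterfs-debuginfo package is installed and the client binary is /usr/sbin/glusterfs (paths below are examples, not from the report):

gdb /usr/sbin/glusterfs /path/to/core.12345 -batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt full' > bt.txt

The resulting bt.txt can then be attached to the new BZ together with the core file.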
REVIEW: https://review.gluster.org/22134 (socket: fix issue when socket write return with EAGAIN) posted (#1) for review on release-5 by Milind Changire
REVIEW: https://review.gluster.org/22135 (socket: don't pass return value from protocol handler to event handler) posted (#1) for review on release-5 by Milind Changire
I wish I saw this bug report before I updated from rock solid 4.1 to 5.3. Less than 24 hours after upgrading, I already got a crash and had to unmount, kill gluster, and remount: [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993] pending frames: frame : type(1) op(READ) frame : type(1) op(OPEN) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 6 time of crash: 2019-01-31 09:38:04 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.3 /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] /lib64/libc.so.6(+0x36160)[0x7fccd622d160] /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] --------- Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, not too sure how to make it core dump. 
If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing. Thanks.
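On the core dump question: a minimal sketch for a systemd-based distribution such as openSUSE Leap 15.0, assuming systemd-coredump is in use (names and paths are examples, not taken from this thread):

coredumpctl list glusterfs     # list any cores captured for the fuse client
coredumpctl gdb glusterfs      # open the newest one in gdb, then run: thread apply all bt full
# without systemd-coredump, the core limit has to be raised in the shell that
# (re)starts the client, e.g.:
#   ulimit -c unlimited && mount /mnt/your_volume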
As requested, I opened a new bug report for my crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1671556. Links to cores will be added there Really Soon.
The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this? In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround. By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process. monit check: check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 start program = "/bin/mount /mnt/glusterfs_data1" stop program = "/bin/umount /mnt/glusterfs_data1" if space usage > 90% for 5 times within 15 cycles then alert else if succeeded for 10 cycles then alert stack trace: [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427] The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] pending frames: frame : type(1) op(LOOKUP) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 6 time of crash: 2019-02-01 23:22:03 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.3 /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
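To expand on the kill -9 reproduction mentioned above, a rough sketch (the mount point and process matching are just examples):

MOUNT=/mnt/glusterfs_data1
# find the fuse client serving this mount and kill it to simulate the crash
kill -9 "$(pgrep -f "glusterfs.*$MOUNT")"
ls "$MOUNT"                            # now fails with "Transport endpoint is not connected"
umount -l "$MOUNT" && mount "$MOUNT"   # the same recovery the monit task performs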
The following line in the backtrace is the topmost frame pointing to gluster bits:

/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]

It resolves to afr-common.c:2203:

intersection = alloca0(priv->child_count);

-----
NOTE: print-backtrace.sh isn't helping here because the naming convention of the rpms has changed.
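For reference, a minimal sketch of doing the same resolution by hand, assuming the matching glusterfs 5.3 debuginfo is installed (the offset and library path are the ones from the backtrace above):

addr2line -f -C -e /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so 0x5dc9d
# or, with gdb:
gdb -batch -ex 'info line *0x5dc9d' /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so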
REVIEW: https://review.gluster.org/22135 (socket: don't pass return value from protocol handler to event handler) merged (#2) on release-5 by Shyamsundar Ranganathan
REVIEW: https://review.gluster.org/22134 (socket: fix issue when socket write return with EAGAIN) merged (#2) on release-5 by Shyamsundar Ranganathan
I'm also having problems with Gluster bricks going offline since upgrading to oVirt 4.3 yesterday (previously I've never had a single issue with gluster nor have had a brick ever go down). I suspect this will continue to happen daily as some other users on this group have suggested. I was able to pull some logs from engine and gluster from around the time the brick dropped. My setup is 3 node HCI and I was previously running the latest 4.2 updates (before upgrading to 4.3). My hardware is has a lot of overhead and I'm on 10Gbe gluster backend (the servers were certainly not under any significant amount of load when the brick went offline). To recover I had to place the host in maintenance mode and reboot (although I suspect I could have simply unmounted and remounted gluster mounts). grep "2019-02-14" engine.log-20190214 | grep "GLUSTER_BRICK_STATUS_CHANGED" 2019-02-14 02:41:48,018-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from UP to DOWN via cli. 2019-02-14 03:20:11,189-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/engine/engine of volume engine of cluster Default from DOWN to UP via cli. 2019-02-14 03:20:14,819-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/prod_b/prod_b of volume prod_b of cluster Default from DOWN to UP via cli. 2019-02-14 03:20:19,692-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/isos/isos of volume isos of cluster Default from DOWN to UP via cli. 2019-02-14 03:20:25,022-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/prod_a/prod_a of volume prod_a of cluster Default from DOWN to UP via cli. 2019-02-14 03:20:29,088-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from DOWN to UP via cli. 
2019-02-14 03:20:34,099-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [760f7851] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_a/non_prod_a of volume non_prod_a of cluster Default from DOWN to UP via cli glusterd.log # grep -B20 -A20 "2019-02-14 02:41" glusterd.log [2019-02-14 02:36:49.585034] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b [2019-02-14 02:36:49.597788] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:36:49.597788] and [2019-02-14 02:36:49.900505] [2019-02-14 02:36:53.437539] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a [2019-02-14 02:36:53.452816] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-02-14 02:36:53.864153] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a [2019-02-14 02:36:53.875835] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-02-14 02:36:30.958649] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine [2019-02-14 02:36:35.322129] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b [2019-02-14 02:36:39.639645] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos [2019-02-14 02:36:45.301275] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:36:53.875835] and [2019-02-14 02:36:54.180780] [2019-02-14 02:37:59.193409] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-02-14 02:38:44.065560] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine [2019-02-14 02:38:44.072680] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos [2019-02-14 02:38:44.077841] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a [2019-02-14 02:38:44.082798] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b [2019-02-14 02:38:44.088237] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a [2019-02-14 02:38:44.093518] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b The message "E [MSGID: 101191] 
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 2 times between [2019-02-14 02:37:59.193409] and [2019-02-14 02:38:44.100494] [2019-02-14 02:41:58.649683] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 6 times between [2019-02-14 02:41:58.649683] and [2019-02-14 02:43:00.286999] [2019-02-14 02:43:46.366743] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine [2019-02-14 02:43:46.373587] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos [2019-02-14 02:43:46.378997] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a [2019-02-14 02:43:46.384324] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b [2019-02-14 02:43:46.390310] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a [2019-02-14 02:43:46.397031] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b [2019-02-14 02:43:46.404083] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-02-14 02:45:47.302884] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine [2019-02-14 02:45:47.309697] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume isos [2019-02-14 02:45:47.315149] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a [2019-02-14 02:45:47.320806] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b [2019-02-14 02:45:47.326865] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a [2019-02-14 02:45:47.332192] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b [2019-02-14 02:45:47.338991] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-02-14 02:46:47.789575] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_b [2019-02-14 02:46:47.795276] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_a [2019-02-14 02:46:47.800584] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume prod_b [2019-02-14 02:46:47.770601] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume engine [2019-02-14 02:46:47.778161] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req 
for volume isos [2019-02-14 02:46:47.784020] I [MSGID: 106499] [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: Received status volume req for volume non_prod_a engine.log # grep -B20 -A20 "2019-02-14 02:41:48" engine.log-20190214 2019-02-14 02:41:43,495-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 172c9ee8 2019-02-14 02:41:43,609-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@479fcb69, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6443e68f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2b4cf035, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5864f06a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6119ac8c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1a9549be, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5614cf81, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@290c9289, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5dd26e8, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@35355754, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@452deeb4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8f8b442, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@647e29d3, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7bee4dff, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@511c4478, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c0bb0bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@92e325e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@260731, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@33aaacc9, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@72657c59, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@aa10c89], log id: 172c9ee8 2019-02-14 02:41:43,610-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 3a0e9d63 2019-02-14 02:41:43,703-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@5ca4a20f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@57a8a76, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@7bd1b14], log id: 3a0e9d63 2019-02-14 02:41:43,704-04 INFO 
[org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 49966b05 2019-02-14 02:41:44,213-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 49966b05 2019-02-14 02:41:44,214-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 30db0ce2 2019-02-14 02:41:44,311-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@61a309b5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@ea9cb2e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@749d57bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c49f9d0, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@655eb54d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@256ee273, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3bd079dc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@6804900f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@78e0a49f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2acfbc8a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12e92e96, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5ea1502c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2398c33b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7464102e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2f221daa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7b561852, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1eb29d18, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4a030b80, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@75739027, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3eac8253, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@34fc82c3], log id: 30db0ce2 2019-02-14 02:41:44,312-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 6671d0d7 2019-02-14 02:41:44,329-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:44,345-04 INFO 
[org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:44,374-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:44,405-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@f6a9696, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@558e3332, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@5b449da], log id: 6671d0d7 2019-02-14 02:41:44,406-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 6d2bc6d3 2019-02-14 02:41:44,908-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 6d2bc6d3 2019-02-14 02:41:44,909-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVolumeAdvancedDetailsVDSCommand(HostName = Host0, GlusterVolumeAdvancedDetailsVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5', volumeName='non_prod_b'}), log id: 36ae23c6 2019-02-14 02:41:47,336-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:47,351-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:47,379-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler3) [7b9bd2d] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:47,979-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVolumeAdvancedDetailsVDSCommand, return: org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeAdvancedDetails@7a4a787b, log id: 36ae23c6 2019-02-14 02:41:48,018-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b of cluster Default from UP to DOWN via cli. 
2019-02-14 02:41:48,046-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID: GLUSTER_BRICK_STATUS_DOWN(4,151), Status of brick host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume non_prod_b on cluster Default is down. 2019-02-14 02:41:48,139-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler1) [5ff5b093] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:48,140-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] START, GlusterServersListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: e1fb23 2019-02-14 02:41:48,911-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] FINISH, GlusterServersListVDSCommand, return: [10.12.0.220/24:CONNECTED, host1.replaced.domain.com:CONNECTED, host2.replaced.domain.com:CONNECTED], log id: e1fb23 2019-02-14 02:41:48,930-04 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler1) [5ff5b093] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[a45fe964-9989-11e8-b3f7-00163e4bf18a=GLUSTER]', sharedLocks=''}' 2019-02-14 02:41:48,931-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] START, GlusterVolumesListVDSCommand(HostName = Host0, GlusterVolumesListVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 68f1aecc 2019-02-14 02:41:49,366-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler3) [7b9bd2d] FINISH, GlusterVolumesListVDSCommand, return: {6c05dfc6-4dc0-41e3-a12f-55b4767f1d35=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@1952a85, 3f8f6a0f-aed4-48e3-9129-18a2a3f64eef=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@2f6688ae, 71ff56d9-79b8-445d-b637-72ffc974f109=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@730210fb, 752a9438-cd11-426c-b384-bc3c5f86ed07=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@c3be510c, c3e7447e-8514-4e4a-9ff5-a648fe6aa537=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@450befac, 79e8e93c-57c8-4541-a360-726cec3790cf=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@1926e392}, log id: 68f1aecc 2019-02-14 02:41:49,489-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 38debe74 2019-02-14 02:41:49,581-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5e5a7925, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2cdf5c9e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@443cb62, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@49a3e880, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@443d23c0, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1250bc75, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8d27d86, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5e6363f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@73ed78db, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@64c9d1c7, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7fecbe95, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3a551e5f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2266926e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@88b380c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1209279e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3c6466, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@16df63ed, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@47456262, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c2b88c3, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7f57c074, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12fa0478], log id: 38debe74 2019-02-14 02:41:49,582-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 7ec02237 2019-02-14 02:41:49,660-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@3eedd0bc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@7f78e375, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@3d63e126], log id: 7ec02237 2019-02-14 02:41:49,661-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host0, VdsIdVDSCommandParametersBase:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5'}), log id: 42cdad27 2019-02-14 02:41:50,142-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 42cdad27 2019-02-14 02:41:50,143-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 12f5fdf2 2019-02-14 02:41:50,248-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2aaed792, 
org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@8e66930, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@276d599e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1aca2aec, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@46846c60, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7d103269, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@30fc25fc, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7baae445, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1ea8603c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@62578afa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@33d58089, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1f71d27a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4205e828, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1c5bbac8, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@395a002, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@12664008, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7f4faec4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@3e03d61f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1038e46d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@307e8062, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@32453127], log id: 12f5fdf2 2019-02-14 02:41:50,249-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 1256aa5e 2019-02-14 02:41:50,338-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@459a2ff5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@123cab4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@1af41fbe], log id: 1256aa5e 2019-02-14 02:41:50,339-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host1, VdsIdVDSCommandParametersBase:{hostId='fb1e62d5-1dc1-4ccc-8b2b-cf48f7077d0d'}), log id: 3dd752e4 2019-02-14 02:41:50,847-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 3dd752e4 2019-02-14 02:41:50,848-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalLogicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 29a6272c 2019-02-14 02:41:50,954-04 INFO 
[org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalLogicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@364f3ec6, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@c7cce5e, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@b3bed47, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@13bc244b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5cca81f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@36aeba0d, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@62ab384a, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@1047d628, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@188a30f5, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@5bb79f3b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@60e5956f, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@4e3df9cd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@7796567, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@60d06cf4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@2cd2d36c, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@d80a4aa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@411eaa20, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@22cac93b, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@18b927bd, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@101465f4, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalLogicalVolume@246f927c], log id: 29a6272c 2019-02-14 02:41:50,955-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterLocalPhysicalVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 501814db 2019-02-14 02:41:51,044-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalPhysicalVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterLocalPhysicalVolumeListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@1cd55aa, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@32c5aba2, org.ovirt.engine.core.common.businessentities.gluster.GlusterLocalPhysicalVolume@6ae123f4], log id: 501814db 2019-02-14 02:41:51,045-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVDOVolumeListVDSCommand(HostName = Host2, VdsIdVDSCommandParametersBase:{hostId='fd0752d8-2d41-45b0-887a-0ffacbb8a237'}), log id: 7acf4cbf 2019-02-14 02:41:51,546-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVDOVolumeListVDSCommand] (DefaultQuartzScheduler1) [5ff5b093] FINISH, GetGlusterVDOVolumeListVDSCommand, return: [], log id: 7acf4cbf 2019-02-14 02:41:51,547-04 INFO [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand] 
(DefaultQuartzScheduler1) [5ff5b093] START, GetGlusterVolumeAdvancedDetailsVDSCommand(HostName = Host0, GlusterVolumeAdvancedDetailsVDSParameters:{hostId='771c67eb-56e6-4736-8c67-668502d4ecf5', volumeName='non_prod_a'}), log id: 11c42649
REVIEW: https://review.gluster.org/22221 (socket: socket event handlers now return void) posted (#1) for review on master by Milind Changire
Find below GDB output from crash. Id Target Id Frame 12 Thread 0x7fea4ae43700 (LWP 26597) 0x00007fea530e2361 in sigwait () from /lib64/libpthread.so.0 11 Thread 0x7fea54773780 (LWP 26595) 0x00007fea530dbf47 in pthread_join () from /lib64/libpthread.so.0 10 Thread 0x7fea47392700 (LWP 26601) 0x00007fea530e14ed in __lll_lock_wait () from /lib64/libpthread.so.0 9 Thread 0x7fea3f7fe700 (LWP 26604) 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6 8 Thread 0x7fea3ffff700 (LWP 26603) 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6 7 Thread 0x7fea3effd700 (LWP 26605) 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6 6 Thread 0x7fea3dffb700 (LWP 26615) 0x00007fea530de965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 5 Thread 0x7fea49640700 (LWP 26600) 0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 4 Thread 0x7fea4a642700 (LWP 26598) 0x00007fea52969e2d in nanosleep () from /lib64/libc.so.6 3 Thread 0x7fea49e41700 (LWP 26599) 0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x7fea4b644700 (LWP 26596) 0x00007fea530e1e3d in nanosleep () from /lib64/libpthread.so.0 * 1 Thread 0x7fea3e7fc700 (LWP 26614) 0x00007fea45b62ff1 in ioc_inode_update () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so Thread 12 (Thread 0x7fea4ae43700 (LWP 26597)): #0 0x00007fea530e2361 in sigwait () from /lib64/libpthread.so.0 No symbol table info available. #1 0x000055959d410e2b in glusterfs_sigwaiter () No symbol table info available. #2 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 11 (Thread 0x7fea54773780 (LWP 26595)): #0 0x00007fea530dbf47 in pthread_join () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fea542dadb8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x000055959d40d56b in main () No symbol table info available. Thread 10 (Thread 0x7fea47392700 (LWP 26601)): #0 0x00007fea530e14ed in __lll_lock_wait () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fea530dcdcb in _L_lock_883 () from /lib64/libpthread.so.0 No symbol table info available. #2 0x00007fea530dcc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea45b62fb6 in ioc_inode_update () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so No symbol table info available. #4 0x00007fea45b6314a in ioc_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so No symbol table info available. #5 0x00007fea461a0343 in wb_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/performance/write-behind.so No symbol table info available. #6 0x00007fea463f2b79 in dht_revalidate_cbk () from /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so No symbol table info available. #7 0x00007fea466d09e5 in afr_lookup_done () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so No symbol table info available. #8 0x00007fea466d1198 in afr_lookup_metadata_heal_check () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so No symbol table info available. #9 0x00007fea466d1cbb in afr_lookup_entry_heal () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so No symbol table info available. #10 0x00007fea466d1f99 in afr_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so No symbol table info available. 
#11 0x00007fea4695a6d2 in client4_0_lookup_cbk () from /usr/lib64/glusterfs/5.3/xlator/protocol/client.so No symbol table info available. #12 0x00007fea54043c70 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0 No symbol table info available. #13 0x00007fea54044043 in rpc_clnt_notify () from /lib64/libgfrpc.so.0 No symbol table info available. #14 0x00007fea5403ff23 in rpc_transport_notify () from /lib64/libgfrpc.so.0 No symbol table info available. #15 0x00007fea48c2c37b in socket_event_handler () from /usr/lib64/glusterfs/5.3/rpc-transport/socket.so No symbol table info available. #16 0x00007fea542dba49 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 No symbol table info available. #17 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #18 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 9 (Thread 0x7fea3f7fe700 (LWP 26604)): #0 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6 No symbol table info available. #1 0x00007fea542db790 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 8 (Thread 0x7fea3ffff700 (LWP 26603)): #0 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6 No symbol table info available. #1 0x00007fea542db790 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 7 (Thread 0x7fea3effd700 (LWP 26605)): #0 0x00007fea529a3483 in epoll_wait () from /lib64/libc.so.6 No symbol table info available. #1 0x00007fea542db790 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 6 (Thread 0x7fea3dffb700 (LWP 26615)): #0 0x00007fea530de965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fea4b64ddbb in notify_kernel_loop () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #2 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 5 (Thread 0x7fea49640700 (LWP 26600)): #0 0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fea542b6cf8 in syncenv_task () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x00007fea542b7c40 in syncenv_processor () from /lib64/libglusterfs.so.0 No symbol table info available. #3 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 4 (Thread 0x7fea4a642700 (LWP 26598)): #0 0x00007fea52969e2d in nanosleep () from /lib64/libc.so.6 No symbol table info available. #1 0x00007fea52969cc4 in sleep () from /lib64/libc.so.6 No symbol table info available. 
#2 0x00007fea542a2e7d in pool_sweeper () from /lib64/libglusterfs.so.0 No symbol table info available. #3 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 3 (Thread 0x7fea49e41700 (LWP 26599)): #0 0x00007fea530ded12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fea542b6cf8 in syncenv_task () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x00007fea542b7c40 in syncenv_processor () from /lib64/libglusterfs.so.0 No symbol table info available. #3 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 2 (Thread 0x7fea4b644700 (LWP 26596)): #0 0x00007fea530e1e3d in nanosleep () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fea54285f76 in gf_timer_proc () from /lib64/libglusterfs.so.0 No symbol table info available. #2 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available. Thread 1 (Thread 0x7fea3e7fc700 (LWP 26614)): #0 0x00007fea45b62ff1 in ioc_inode_update () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so No symbol table info available. #1 0x00007fea45b634cb in ioc_readdirp_cbk () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so No symbol table info available. #2 0x00007fea45d7a69f in rda_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/readdir-ahead.so No symbol table info available. #3 0x00007fea45b5eb0e in ioc_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so No symbol table info available. #4 0x00007fea4594f8e7 in qr_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so No symbol table info available. #5 0x00007fea5430bfb1 in default_readdirp () from /lib64/libglusterfs.so.0 No symbol table info available. #6 0x00007fea455333e6 in mdc_readdirp () from /usr/lib64/glusterfs/5.3/xlator/performance/md-cache.so No symbol table info available. #7 0x00007fea452f7d32 in io_stats_readdirp () from /usr/lib64/glusterfs/5.3/xlator/debug/io-stats.so No symbol table info available. #8 0x00007fea5430bfb1 in default_readdirp () from /lib64/libglusterfs.so.0 No symbol table info available. #9 0x00007fea450dc343 in meta_readdirp () from /usr/lib64/glusterfs/5.3/xlator/meta.so No symbol table info available. #10 0x00007fea4b659697 in fuse_readdirp_resume () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #11 0x00007fea4b64cc45 in fuse_resolve_all () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #12 0x00007fea4b64c958 in fuse_resolve () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #13 0x00007fea4b64cc8e in fuse_resolve_all () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #14 0x00007fea4b64bf23 in fuse_resolve_continue () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #15 0x00007fea4b64c8d6 in fuse_resolve () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #16 0x00007fea4b64cc6e in fuse_resolve_all () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. 
#17 0x00007fea4b64ccb0 in fuse_resolve_and_resume () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #18 0x00007fea4b664d7a in fuse_thread_proc () from /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so No symbol table info available. #19 0x00007fea530dadd5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #20 0x00007fea529a2ead in clone () from /lib64/libc.so.6 No symbol table info available.
Core dump: https://drive.google.com/open?id=1cEehuPAdXHIR7eG_-RsbJkmu8lJz80k6
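For anyone who wants to inspect the linked core dump directly, a typical session looks roughly like the following; the binary and core file paths are placeholders, and installing the matching debuginfo packages is what removes the "No symbol table info available" lines seen in the backtrace above:

    debuginfo-install glusterfs glusterfs-fuse      # RHEL/CentOS 7, assuming the debuginfo repos are enabled
    gdb /usr/sbin/glusterfs /path/to/core.26595     # /usr/sbin/glusterfs is the FUSE client process; core path is a placeholder
    (gdb) info threads                              # thread overview, as in the listing above
    (gdb) thread apply all bt                       # full backtrace of every thread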
1. The crash listing in comment #3 points to the disperse xlator:
   /usr/lib/glusterfs/5.1/xlator/cluster/disperse.so(+0xf8a4)[0x7f0e23aec8a4]
2. The crash listing in comment #15 points to inode_forget_with_unref from the fuse xlator:
   /lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
3. The crash listing in comment #17 points to the distribute xlator:
   /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x8cf9)[0x7f300a9a7cf9]
4. The crash listing in comment #27 points to the replicate xlator:
   /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
5. The crash listing in comment #29 points to the replicate xlator:
   /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]

See comment #30 for a preliminary finding about this crash.

Ravi, could you please take a look at item #5 above?
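For reference, the (+0xOFFSET) values in the listings above are offsets into the named shared objects and can usually be resolved to a function name with addr2line, provided the matching debuginfo packages are installed (otherwise it prints "??"). Using item 5 as an example, on one of the affected clients:

    addr2line -f -C -e /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so 0x5dc9d    # -f prints the enclosing function name

That function name plus file:line is usually enough to match the frame against the source of the replicate xlator.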
REVIEW: https://review.gluster.org/22221 (socket: socket event handlers now return void) merged (#4) on master by Amar Tumballi
(In reply to Emerson Gomes from comment #35)
> Find below GDB output from crash.

Please use BZ#1671556 to report any Fuse client crashes. These look similar to an issue in the write-behind translator that we are working to fix. Try setting performance.write-behind to off and let us know if you still see the crashes.
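For anyone who wants to try that workaround, the option is set per volume from any node in the trusted pool (the volume name below is a placeholder); clients should pick up the regenerated volfile without a remount:

    gluster volume set <VOLNAME> performance.write-behind off
    gluster volume get <VOLNAME> performance.write-behind    # confirm the new value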
Clearing the need info on me based on comment #39.
REVIEW: https://review.gluster.org/22237 (socket: socket event handlers now return void) posted (#1) for review on release-5 by Milind Changire
Any news on a patched version for oVirt 4.3? We keep seeing crashes like these too.
REVIEW: https://review.gluster.org/22237 (socket: socket event handlers now return void) merged (#4) on release-5 by Shyamsundar Ranganathan
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.5, please open a new bug report.

glusterfs-5.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the gluster-users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000119.html
[2] https://www.gluster.org/pipermail/gluster-users/
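Once packages for your distribution are available, the upgrade itself is the usual package update plus a restart of the gluster processes. As a rough sketch for a RHEL/CentOS 7 node (package names and repositories are assumptions; check your distribution's update channel):

    yum clean metadata
    yum update 'glusterfs*'
    glusterfs --version              # should now report 5.5
    systemctl restart glusterd       # management daemon; brick and client processes keep running the old code until restarted/remounted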