Description of problem:
The following logs flood the log files:

[2016-03-01 10:45:51.688339] W [dict.c:1282:dict_foreach_match] (-->/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x65) [0x7ff139e1e5d5] -->/usr/lib64/glusterfs/3.7.8/xlator/features/index.so(+0x3950) [0x7ff12de49950] -->/usr/lib64/libglusterfs.so.0(dict_foreach_match+0xe1) [0x7ff139e1e651] ) 0-dict: dict|match|action is NULL [Invalid argument]

Version-Release number of selected component (if applicable):
glusterfs-3.7.8

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I have seen the older report https://bugzilla.redhat.com/show_bug.cgi?id=1289893, but since I am using the latest version (3.7.8), that fix should already be included. Could this be related to another part of index.c?
This looks like it refers to an upstream release (3.7.8). Changing the product to reflect this.
Is there any update on this? Thank you very much!
Moving this to Anuradha, who worked on the original patch.
Is there any update on this?
Hi Evangelos,

That fix was made in 3.7.5. You say you have seen the old issue. Did you upgrade from 3.7.5 to 3.7.8 and are now seeing this problem, or was this volume freshly created on 3.7.8?

As far as I know, all the fixes for "dict is NULL" in the index translator have been sent. But there is an issue when volfiles are not updated during an upgrade. If you upgraded the volume, could you please provide the steps that you used to upgrade?

Also, could you verify one thing for me from the brick volfiles of your volume? The brick volfiles are supposed to have the following lines:

volume test-index
    type features/index
    option xattrop-pending-watchlist trusted.afr.test-    <--------(1)
    option xattrop-dirty-watchlist trusted.afr.dirty      <--------(2)
    option index-base /export/test/brick2/.glusterfs/indices
    subvolumes test-barrier
end-volume

The two options marked above should exist; otherwise you will see this problem. You can find the volfiles at /var/lib/glusterd/vols/<volname>.

Thanks.
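If it helps, here is a quick way to check for those options (a minimal sketch; the grep pattern is only illustrative, the path is the one mentioned above, and <volname> is your volume's name):

# look for the two watchlist options in every brick volfile of the volume
grep -n "xattrop-pending-watchlist\|xattrop-dirty-watchlist" /var/lib/glusterd/vols/<volname>/*.vol

Each brick volfile should show both options; if either is missing, the volfile was not updated during the upgrade and you will see this log flood.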
Hi Anuradha, in the meantime, due to various issues we had, we decided to downgrade to the 3.6 branch, so currently I do not have a 3.7 deployment up and running. But thanks for the hint; I will keep this in mind for the future. For now you can close this Bugzilla. Thank you.
Hi Evangelos,

Thanks for the update. Closing this bug as per comment #7.

Thanks,
Anuradha.
This error is still reproducible in 5.3 when upgrading from a 3.x volume. I had to recreate the volume from scratch in 5.3 and copy the data back in order to avoid it.
I just started seeing this error after upgrading from 4.1 to 5.3.

[2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

And it floods like crazy with these.
I "solved" the issue after upgrading to 5.3 by creating a new volume and moving all data to it. Apparently something is missing on the volumes after upgrade.
I just sent a message to the gluster mailing list about this because that's not how this problem should be resolved. I'm curious to hear what they say.
Absolutely. That's the second big issue I had after upgrading. The first one is https://bugzilla.redhat.com/show_bug.cgi?id=1651246, which is still unsolved (open for more than 2 months now).
You know, I was *just* going to comment in a follow-up reply about whether the issue here is possibly related to the one you just linked. Seeing tons of those too, though at least the dupes are suppressed.

==> mnt-SITE_data1.log <==
[2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

==> mnt-SITE_data3.log <==
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]

==> mnt-SITE_data1.log <==
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]
[2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0

==> mnt-SITE_data3.log <==
[2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0

==> mnt-SITE_data1.log <==
[2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
Yeah, both arose after upgrading from 3.x to 5.1, and they persist in 5.2 and 5.3.

The other issue is even more critical. It causes crashes that make the mount point inaccessible ("Transport endpoint is not connected" error), requiring a manual umount/mount. For now I have a crontab entry doing this, but I will have to downgrade if a fix is not issued soon...
Damn, you weren't kidding. I wish I had seen these bug reports before I updated from the rock-solid 4.1. Less than 24 hours after upgrading, I already got the crash you referenced:

[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
---------
The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.

monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program = "/bin/mount /mnt/glusterfs_data1"
  stop program = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert

stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
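For anyone trying to reproduce it, a minimal sketch of that kill step (the mount point is the one from the monit check above; the ps pattern is illustrative, so confirm the PID before killing anything, since several glusterfs processes may be running):

# find the fuse client process backing the mount
ps aux | grep "[g]lusterfs --process-name fuse"
# kill it; the mount point then reports "Transport endpoint is not connected"
kill -9 <pid>
# recover manually (or let monit remount it)
umount /mnt/glusterfs_data1 && mount /mnt/glusterfs_data1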
(In reply to Artem Russakovskii from comment #17)
> The fuse crash happened again yesterday, to another volume. Are there any
> mount options that could help mitigate this?
> [...]

Please mount the volume using the option lru-limit=0 and see if the crashes go away.
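In case it is useful to others, a minimal sketch of what that looks like on the client side (server, volume, and mount-point names are placeholders; the same option should also work in the options column of an fstab entry):

# fuse mount with the inode LRU limit disabled
mount -t glusterfs -o lru-limit=0 <server>:/<volname> /mnt/<mountpoint>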
We are currently working on analysing some coredumps and will update once we have a fix.
(In reply to Nithya Balachandran from comment #18)

Before I make this change, what is the expected effect on performance on a busy system with lots of small and large files (millions)?

Also, is it possible to set the lru-limit to 0 via gluster volume options (network.inode-lru-limit?), or will that be sufficiently different from using the fuse mount option?

Do the gluster docs have clear instructions on how to capture a core dump? I haven't found any other than the instructions to provide one (but not how) in case of a crash.
(In reply to Artem Russakovskii from comment #19)
> (In reply to Nithya Balachandran from comment #18)
>
> Before I make this change, what is the expected effect on performance on a
> busy system with lots of small and large files (millions)?

This turns off a feature that was introduced in 5.3 to clear unused inodes from memory for the fuse client gluster process and prevent its memory usage from rising. Turning it off would cause the client to behave the same as in the earlier releases.

> Also, is it possible to set the lru-limit to 0 via gluster volume options
> (network.inode-lru-limit?) or will it be sufficiently different from using
> the fuse mount option?

Not at the moment. This is a mount option for the gluster fuse client. It is not the same as the network.inode-lru-limit option, which controls the inodes in the brick processes.

> Do gluster docs have clear instructions on how to capture a core dump
> because I haven't found any other than the instructions to provide one (but
> not how) in case of a crash?

To configure coredumps, you need to specify the size and location. The following link describes it well:
https://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-on-a-segmentation-fault

Make sure you have sufficient space in that location for a large coredump (it can run into several GBs).
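As a starting point, a minimal sketch of the usual Linux-side setup (the core directory and filename pattern are illustrative; for a systemd-managed service, LimitCORE=infinity in the unit file serves the same purpose as the ulimit call):

# allow cores of unlimited size in the current shell
ulimit -c unlimited
# write cores to a dedicated directory, with program name, pid and timestamp in the filename
mkdir -p /var/crash/cores
echo '/var/crash/cores/core.%e.%p.%t' > /proc/sys/kernel/core_pattern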
Also seeing this issue in 4.1.7 with the Samba VFS client logs. We have worked around it by setting the client log level to error ('gluster volume set vol01 diagnostics.client-log-level ERROR'). Could you please advise on a long-term solution?

Many thanks,
Ryan
Added core dump and gdb output here: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
Probably the same issue as this one.
Hello,

Is there any progress with this? We've had multiple systems consume their entire root volume because this log file fills it.
I confirm all issues are gone after upgrading to 5.4.