Bug 1644322
Summary: | flooding log with "glusterfs-fuse: read from /dev/fuse returned -1 (Operation not permitted)" | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Christian Lohmaier <lohmaier+rhbz> |
Component: | geo-replication | Assignee: | Csaba Henk <csaba> |
Status: | CLOSED NEXTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | mainline | CC: | atumball, bugs, lohmaier+rhbz, pasik, sacharya, sunkumar |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-08-08 06:11:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description — Christian Lohmaier, 2018-10-30 14:06:43 UTC
> Please confirm one thing: does the glusterfs client producing the "read from /dev/fuse returned -1 (Operation not permitted)" flood recover and get back to a normal operational state? I wonder whether it is a transient overloaded state in the kernel or a non-recoverable faulty state. (As far as I understand you, it should be the former; please let me know if my understanding is correct.) And if so, is there anything else that can be said about the circumstances? How often does it manage to recover, how long does the faulty state last, and is there anything you can observe about the system state when it sets in, while it holds, and when it ceases?

Yes, there is a possibility of it recovering, but never when it manages to fill up the ~60 GB of free disk space on /var first, which unfortunately is the case more often than not. If it fills the disk, the other geo-replication sessions also go to a faulty state. So if it cannot recover within 10-15 minutes, it likely won't (as the disk is filled up with the log spam). I'd say we hit it about once a week.

Nothing special about the system state as far as I can tell; at least no ramp-up of resource usage. If there is anything, it comes and goes in a flash. There is no effect on the rest of the system, apart from /var being full and the other geo-replication sessions suffering from that. The geo-replication sessions where it occurred last time were in history changelog mode, but I am not sure whether that is a coincidence.

I think bug #1643716 might be related, as the problem seems more likely to trigger after a failure caused by that, i.e. when the geo-replication session keeps relaunching a glusterfs mount and switches from Faulty to Initializing. But that might as well be a red herring, since the recovery method used so far is to truncate the logs. At least that was the case the last time, where I didn't throw away the whole log.
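Since the reporter's only recovery method so far is truncating the logs by hand, a logrotate rule capping them could serve as a stopgap until the fix lands. A minimal sketch; the log path, size, and rotation count here are illustrative assumptions, not taken from the report — adjust them to the actual layout:

```
/var/log/glusterfs/geo-replication/*.log {
    size 100M       # rotate as soon as a log exceeds 100 MB
    rotate 2        # keep at most two old copies
    compress
    copytruncate    # truncate in place; gluster keeps the log fd open
    missingok
}
```

`copytruncate` matters here: the gluster processes hold the log file open, so a plain rename-based rotation would leave them writing to the rotated file and /var would still fill up.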
The usage pattern on the geo-replicated volume is as follows: rsnapshot creates backups from other hosts via rsync, and those backups are then rotated using hardlinks in the directories .sync, daily.[0-6], and weekly.[0-3]. That is, it rsyncs to .sync, then rotates:

```shell
mv daily.6 _delete.$pid
mv daily.5 daily.6
# (...)
cp -al .sync daily.0
rm -r _delete.$pid
```

Thus most of the files are hardlinks. Unfortunately I cannot offer a 100% reliable way to trigger the problem. HTH.

REVIEW: https://review.gluster.org/22494 (fuse: rate limit reading from fuse device upon receiving EPERM) posted (#1) for review on master by Csaba Henk

REVIEW: https://review.gluster.org/22494 (fuse: rate limit reading from fuse device upon receiving EPERM) merged (#7) on master by Amar Tumballi

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
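The merged patch (review 22494) addresses the flood by rate-limiting reads from /dev/fuse after an EPERM, rather than retrying and logging at full speed. The actual fix lives in glusterfs's C fuse bridge; the sketch below only illustrates the general idea in Python, and all names and backoff values in it are illustrative assumptions, not glusterfs's: on EPERM, wait with a bounded exponential backoff before retrying, and reset the backoff after a successful read.

```python
import errno
import time

# Illustrative backoff parameters (not the values used by glusterfs).
BASE_DELAY = 0.01   # seconds to wait after the first EPERM
MAX_DELAY = 1.0     # cap so the reader stays responsive once the kernel recovers

def next_delay(current):
    """Double the wait after each consecutive EPERM, capped at MAX_DELAY."""
    if current == 0:
        return BASE_DELAY
    return min(current * 2, MAX_DELAY)

def read_loop(read_fn, handle_fn, max_iterations=100):
    """Read requests via read_fn, rate-limiting retries when it raises EPERM.

    read_fn stands in for reading from /dev/fuse (returns a request, or None
    at end of input); handle_fn processes one request. On EPERM we back off
    instead of spinning, which is what flooded the log in this bug.
    """
    delay = 0
    for _ in range(max_iterations):
        try:
            req = read_fn()
        except OSError as e:
            if e.errno == errno.EPERM:
                delay = next_delay(delay)
                time.sleep(delay)  # bounded backoff instead of a tight retry loop
                continue
            raise  # any other error is not rate-limited here
        delay = 0  # successful read: reset the backoff
        if req is None:
            break
        handle_fn(req)
```

With a reader that fails twice with EPERM and then yields one request, the loop sleeps briefly twice and still delivers the request, instead of logging an error for every failed read.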