Bug 1763208

Summary:	fuse log registers error with misleading "No such file or directory" when we interrupt a file copy
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Nag Pavan Chilakam <nchilaka>
Component:	fuse	Assignee:	Csaba Henk <csaba>
Status:	CLOSED ERRATA	QA Contact:	Bala Konda Reddy M <bmekala>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.5	CC:	bmekala, csaba, puebele, rhs-bugs, rkothiya, sheggodu, storage-qa-internal, ubansal, vdas
Target Milestone:	---
Target Release:	RHGS 3.5.z Batch Update 1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-6.0-26	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:
Clones:	1790997 (view as bug list)		Environment:
Last Closed:	2020-01-30 06:42:48 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1790997

Description Nag Pavan Chilakam 2019-10-18 13:42:14 UTC

Description of problem:
====================
on a disperse volume, if we interrupt or stop a copy of a file(was using dd), the fuse mount log throws below error which is misleading  by saying "No such file or directory"


[2019-10-18 13:36:20.318902] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f20d6711a9b] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x8201)[0x7f20cdabc201] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x8ada)[0x7f20cdabcada] (--> /lib64/libpthread.so.0(+0x7ea5)[0x7f20d554eea5] (--> /lib64/libc.so.6(clone+0x6d)[0x7f20d4e148cd] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory


Version-Release number of selected component (if applicable):
==========
4.0.15

How reproducible:
===========
quite consistent

Steps to Reproduce:
==================
1) 3x(4+2) volume with below options set
disperse.shd-max-threads: 8
server.event-threads: 8
client.event-threads: 8

2) fuse mount the volume and run an IO as below:
   created a file using dd, then copying this file to a new file using dd with unaligned block size
eg:  dd if=1_basefile_5203af56e3ece89811d53f597868112f of=dfile_dd-1-547 bs=547  

3. now while the copying is in progress, do a ctrl+c to stop it

Actual results:
===========
fuse mount log registers an error as below

[2019-10-18 13:36:20.318902] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f20d6711a9b] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x8201)[0x7f20cdabc201] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x8ada)[0x7f20cdabcada] (--> /lib64/libpthread.so.0(+0x7ea5)[0x7f20d554eea5] (--> /lib64/libc.so.6(clone+0x6d)[0x7f20d4e148cd] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory




Expected results:
================
seems a bit misleading by saying "no such file or dir" which may need to be corrected



volinfo
Volume Name: ecvol
Type: Distributed-Disperse
Volume ID: eb126eba-37c6-46e1-98ab-ea8992406977
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: dhcp35-190.lab.eng.blr.redhat.com:/gluster/brick1/ecvol
Brick2: dhcp35-237.lab.eng.blr.redhat.com:/gluster/brick1/ecvol
Brick3: dhcp35-208.lab.eng.blr.redhat.com:/gluster/brick1/ecvol
Brick4: dhcp35-32.lab.eng.blr.redhat.com:/gluster/brick1/ecvol
Brick5: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick1/ecvol
Brick6: dhcp35-157.lab.eng.blr.redhat.com:/gluster/brick1/ecvol
Brick7: dhcp35-190.lab.eng.blr.redhat.com:/gluster/brick2/ecvol
Brick8: dhcp35-237.lab.eng.blr.redhat.com:/gluster/brick2/ecvol
Brick9: dhcp35-208.lab.eng.blr.redhat.com:/gluster/brick2/ecvol
Brick10: dhcp35-32.lab.eng.blr.redhat.com:/gluster/brick2/ecvol
Brick11: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick2/ecvol
Brick12: dhcp35-157.lab.eng.blr.redhat.com:/gluster/brick2/ecvol
Brick13: dhcp35-190.lab.eng.blr.redhat.com:/gluster/brick3/ecvol
Brick14: dhcp35-237.lab.eng.blr.redhat.com:/gluster/brick3/ecvol
Brick15: dhcp35-208.lab.eng.blr.redhat.com:/gluster/brick3/ecvol
Brick16: dhcp35-32.lab.eng.blr.redhat.com:/gluster/brick3/ecvol
Brick17: dhcp35-29.lab.eng.blr.redhat.com:/gluster/brick3/ecvol
Brick18: dhcp35-157.lab.eng.blr.redhat.com:/gluster/brick3/ecvol
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
disperse.shd-max-threads: 8
server.event-threads: 8
client.event-threads: 8

Comment 3 Csaba Henk 2019-11-22 19:15:10 UTC

Quoting https://bugzilla.redhat.com/1699869#c61 :

> This is an error that can occur with a reverse entry invalidation request. Such a request needs to specify a directory inode and a name. The effect of the request is that the cached lookup data associating the name in the dir inode with some inode is dropped.
>
> There is no guarantee though that the nodeid via which the reverse entry invalidation request intends to specify the dir inode actually resolves to a dir inode: it can be the case that it does not resolve to an inode, or it does resolve to an inode but that's not the inode of a directory. In former case the write(2) to /dev/fuse on behalf of the request fails with ENOENT; in latter case, it fails with ENOTDIR.
>
> This is the latter case. Probably there is a race? I can imagine that kernel forgets the directory inode on its own (ie. on its own agenda and not due to an invalidation request) and simultaneously invalidation of some entry of this dir is triggered in glusterfs, taking effect earlier than processing the respective FORGET / BATCH_FORGET messages. If that's the case, then it's most likely a harmless phenomenon.

So the phenomenon is harmless and the error message is correct -- we do need to report "No such file or directory" if the system fails the syscall with ENOENT. (Note that this is admittedly odd, as write(2) / write(3p) does not specify ENOENT as possible errno for the write syscall; it's a FUSE idiosyncrasy to use ENOENT for indication of certain system states as part of communication through /dev/fuse.)

However, we acknowledge that flooding the logs with this message is annoying. The following upstream patches seek to mitigate this issue:

https://review.gluster.org/23187, "fuse: Set limit on invalidate queue size"
https://review.gluster.org/23461, "glusterfs/fuse: Reduce the default lru-limit value"

Please let me know what action plan to follow (should I provide a custom build with backports of these? Or wait while these patches hit downstream? Actually the first change is the important one, as the effect of the second can be reached also by setting a volume option).

Comment 4 Csaba Henk 2019-11-27 14:01:23 UTC

*** Bug 1773901 has been marked as a duplicate of this bug. ***

Comment 17 Csaba Henk 2020-01-07 17:46:11 UTC

I investigated the case. The conjecture put forward in comment #3 was not correct. It's not invalidation what happens, but interruption. If a system call is interrupted, glusterfs gets an INTERRUPT message. From that point on handling the interrupted message and handling the interrupt are in race (that's why number of threads makes a difference). If handling the interrupted message wins the race, then by the time the answer is sent to the interrupt the reference to the original message is dangling. This is indicated by the kernel by failing the write to /dev/fuse with ENOENT. It's a harmless phenomenon. So even if the analysis in comment #3 was incorrect, the conclusion is still valid: it's a FUSE idiosyncrasy.

To improve the UX, I'll will send a patch to upstream that degrades the log message from Error to Warning and conceals the errno string which feels counter-intuitive in this situation. There will be no need to backport the patch.

I will close the bug with NOTABUG on QA ack.

Comment 18 Csaba Henk 2020-01-07 19:06:19 UTC

Upstream change: https://review.gluster.org/23974

Comment 19 Nag Pavan Chilakam 2020-01-08 06:04:48 UTC

(In reply to Csaba Henk from comment #17)
> I investigated the case. The conjecture put forward in comment #3 was not
> correct. It's not invalidation what happens, but interruption. If a system
> call is interrupted, glusterfs gets an INTERRUPT message. From that point on
> handling the interrupted message and handling the interrupt are in race
> (that's why number of threads makes a difference). If handling the
> interrupted message wins the race, then by the time the answer is sent to
> the interrupt the reference to the original message is dangling. This is
> indicated by the kernel by failing the write to /dev/fuse with ENOENT. It's
> a harmless phenomenon. So even if the analysis in comment #3 was incorrect,
> the conclusion is still valid: it's a FUSE idiosyncrasy.
> 
> To improve the UX, I'll will send a patch to upstream that degrades the log
> message from Error to Warning and conceals the errno string which feels
> counter-intuitive in this situation. There will be no need to backport the
> patch.
> 
> I will close the bug with NOTABUG on QA ack.

Hi Csaba,
After putting all the effort required from engineering side to get the bug to onqa and QE spending test cycle to validate the fix, closing as "NOTABUG" is quite unproductive.
Also, I don't see a reason closing this as NOTABUG, as this is a bug given that we are seeing messages which can be false alarms.
If we believe these messages occur due to fuse limitations, then we should either suppress such messages, without suppressing genuinely expected messages.
For Now we will have to failqa the bug, and then we can take the necessary action

Comment 21 Nag Pavan Chilakam 2020-01-08 06:25:01 UTC

(In reply to Csaba Henk from comment #17)
> I investigated the case. The conjecture put forward in comment #3 was not
> correct. It's not invalidation what happens, but interruption. If a system
> call is interrupted, glusterfs gets an INTERRUPT message. From that point on
> handling the interrupted message and handling the interrupt are in race
> (that's why number of threads makes a difference). If handling the
> interrupted message wins the race, then by the time the answer is sent to
> the interrupt the reference to the original message is dangling. This is
> indicated by the kernel by failing the write to /dev/fuse with ENOENT. It's
> a harmless phenomenon. So even if the analysis in comment #3 was incorrect,
> the conclusion is still valid: it's a FUSE idiosyncrasy.
> 
> To improve the UX, I'll will send a patch to upstream that degrades the log
> message from Error to Warning and conceals the errno string which feels
> counter-intuitive in this situation. There will be no need to backport the
> patch.
Also, in the case of moving the logs to warning, are we making sure, that we won't be supressing genuine errors (ie in case of really missing files/dirs)?
If the answer is yes, then you can get the patch merged and this same bug can have qe  to retest, and see if the logs are being suppressed or not(ie moved to warning) as part of validation

> 
> I will close the bug with NOTABUG on QA ack.

Comment 23 Csaba Henk 2020-01-08 15:00:03 UTC

We decided to backport the change mentioned in comment #18. Then obviously the resolution will be something other than NOTABUG.

However, the patch needs to be revised/refined to address some concerns (of reviewers and of myself). When the new revision is submitted, I'll answer the remaining questions in light of that.

Comment 24 Csaba Henk 2020-01-09 10:13:53 UTC

New revision of patch addresses the following concerns:
- suppressing genuinely expected messages: The degradation of /dev/fuse write errors happens context sensitively. That is, caller of the convenience function can specify which errnos are to be treated as non-errors for glusterfs.
- flooding of logs: degraded log messages are degraded to DEBUG so they won't be seen under normal conditions. (Note that the frequency reported in comment #14 and the 1-1 mapping of discussed log messages with SIGINT-s does not sound to be a real flooding situation, but it will be even better now.)

Comment 32 errata-xmlrpc 2020-01-30 06:42:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0288