Bug 1702043 - Newly created files are inaccessible via FUSE
Summary: Newly created files are inaccessible via FUSE
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-22 19:45 UTC by bio.erikson
Modified: 2023-09-14 05:27 UTC
CC: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:34:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description bio.erikson 2019-04-22 19:45:11 UTC
Description of problem:
Newly created files/dirs become inaccessible via the local FUSE mount after file I/O is completed.

I recently started to experience this problem after upgrading to Gluster 6.0; it did not occur before the upgrade.
I have two nodes running GlusterFS, each with a FUSE mount pointed at localhost.

```
#/etc/fstab
localhost:/gv0 /data/ glusterfs lru-limit=0,defaults,_netdev,acl 0 0
```
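
(For reference, the equivalent manual mount would be something like the sketch below; volume name, mount point, and options are taken from the fstab line above.)

```
# Sketch of the equivalent manual mount (same volume and mount point as the fstab entry above).
mount -t glusterfs -o acl,lru-limit=0 localhost:/gv0 /data
```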

I have run into this problem with rsync, random file creation with dd, and mkdir/touch. Files remain accessible while they are being written to and become inaccessible once the I/O completes. It usually happens in 'chunks' of sequential files. After some period of time (>15 min) the problem resolves itself. The same files list fine with ls directly on the local bricks, and the problematic files/dirs are accessible via FUSE mounts on other machines. Heal doesn't report any problems. Small-file workloads seem to make the problem worse; overwriting existing files does not seem to produce problematic files.
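
A quick way to see this is to create a batch of small files and stat them immediately afterwards (a sketch; it assumes the volume is mounted at /data as above, and the test directory name is just an example):

```
# Create 100 small files on the FUSE mount, then immediately try to stat each one.
mkdir -p /data/test && cd /data/test
for i in $(seq -w 1 100); do
    dd if=/dev/urandom of=small.$i bs=1k count=1 status=none
done
for i in $(seq -w 1 100); do
    stat small.$i > /dev/null || echo "inaccessible: small.$i"
done
```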

*Gluster Info*
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: ...
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
...
Options Reconfigured:
cluster.self-heal-daemon: enable
server.ssl: on
client.ssl: on
auth.ssl-allow: *
transport.address-family: inet
nfs.disable: on
user.smb: disable
performance.write-behind: on
diagnostics.latency-measurement: off
diagnostics.count-fop-hits: off
cluster.lookup-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
cluster.readdir-optimize: on
storage.build-pgfid: off
diagnostics.brick-log-level: ERROR
diagnostics.brick-sys-log-level: ERROR
diagnostics.client-log-level: ERROR
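
(These options were applied with the usual gluster volume set syntax; a sketch of how to query or change one of them, using performance.nl-cache only as an example:)

```
# Query the current value of one of the options listed above.
gluster volume get gv0 performance.nl-cache
# The same syntax was used to set each entry under "Options Reconfigured".
gluster volume set gv0 performance.nl-cache on
```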

*Client Log*
The FUSE log is flooded with:
```
[2019-04-22 19:12:39.231654] D [MSGID: 0] [io-stats.c:2227:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f535ca5c728, gv0 returned -1 error: No such file or directory [No such file or directory]
```
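
To give a sense of how often this message appears, the occurrences can be counted per log file (a sketch; it assumes the default log directory /var/log/glusterfs):

```
# Count the ENOENT lookup callbacks in each glusterfs log file (default log directory assumed).
grep -c 'io_stats_lookup_cbk.*No such file or directory' /var/log/glusterfs/*.log
```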

Version-Release number of selected component (if applicable):

apt list | grep gluster

bareos-filedaemon-glusterfs-plugin/stable 16.2.4-3+deb9u2 amd64
bareos-storage-glusterfs/stable 16.2.4-3+deb9u2 amd64
glusterfs-client/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
glusterfs-common/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
glusterfs-dbg/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
glusterfs-server/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
tgt-glusterfs/stable 1:1.0.69-1 amd64
uwsgi-plugin-glusterfs/stable,stable 2.0.14+20161117-3+deb9u2 amd64


How reproducible:
Always 

Steps to Reproduce:
1. Upgrade from 5.6 to either 6.0 or 6.1, with the described configuration.
2. Run a small-file-intensive workload (a sketch of both steps follows below).
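
A sketch of these two steps on Debian, using the package names from the apt listing above (the repository setup for the 6.x packages is not shown):

```
# Step 1: upgrade the Gluster packages (names as installed on this system) and restart the daemon.
apt-get update
apt-get install --only-upgrade glusterfs-server glusterfs-client glusterfs-common
systemctl restart glusterd

# Step 2: run a small-file workload on the FUSE mount, e.g. the dd | split run shown under "Actual results".
cd /data/test
dd if=/dev/urandom bs=1024 count=10240 | split -a 4 -b 1k - file.
```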


Actual results:
```
dd if=/dev/urandom bs=1024 count=10240 | split -a 4 -b 1k - file.
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 18.3999 s, 57.0 kB/s

ls: cannot access 'file.abbd': No such file or directory
ls: cannot access 'file.aabb': No such file or directory
ls: cannot access 'file.aadh': No such file or directory
ls: cannot access 'file.aafq': No such file or directory
...

total 845
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaa
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaab
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaac
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaad
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaae
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaf
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaag
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaah
-????????? ? ?        ?           ?            ? file.aaai
-????????? ? ?        ?           ?            ? file.aaaj
-????????? ? ?        ?           ?            ? file.aaak
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaal
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaam
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaan
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaao
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaap
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaq
-????????? ? ?        ?           ?            ? file.aaar
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaas
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaat
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaau
-????????? ? ?        ?           ?            ? file.aaav
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaaw
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaax
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaay
-????????? ? ?        ?           ?            ? file.aaaz
-????????? ? ?        ?           ?            ? file.aaba
-????????? ? ?        ?           ?            ? file.aabb
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aabc
...

# Wait 10 mins
total 1024
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaa
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaab
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaac
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaad
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaae
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaf
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaag
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaah
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaai
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaj
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaak
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaal
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaam
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaan
...


Expected results:
All files to be accessible immediately. 

Additional info:
There was nothing of interest in the other logs when the log level was changed to INFO.
This seems similar to Bug 1647229.

Comment 1 Xavi Hernandez 2020-01-09 17:42:23 UTC
I tried to reproduce the issue with v6.0 but it doesn't happen on my setup. Could you reproduce it with the log level set to TRACE?

To set trace log level run these commands:

# gluster volume set <volname> brick-log-level TRACE
# gluster volume set <volname> client-log-level TRACE

Once the error happens, I would need all the brick logs and the mount log.
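
Something like this should be enough to collect them afterwards (a sketch; it assumes the default log directory /var/log/glusterfs, and the tarball name is only an example). The fully qualified option names from the volume info above can be used to restore the previous levels:

```
# After reproducing the error, bundle all brick logs and the FUSE mount log.
tar czf gluster-trace-logs.tar.gz /var/log/glusterfs/

# Restore the previous log levels (ERROR, per the reconfigured options in the report).
gluster volume set <volname> diagnostics.brick-log-level ERROR
gluster volume set <volname> diagnostics.client-log-level ERROR
```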

Comment 2 Mohit Agrawal 2020-02-24 04:42:53 UTC
Hi Erikson,

 Can you share an update if you have been able to reproduce it?

Thanks,
Mohit Agrawal

Comment 3 Worker Ant 2020-03-12 12:34:00 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/906 and will be tracked there from now on. Visit the GitHub issue URL for further details.

Comment 4 Red Hat Bugzilla 2023-09-14 05:27:22 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

