Description of problem:

Newly created files/dirs become inaccessible to the local FUSE mount once file IO on them is complete. I started experiencing this problem after upgrading to gluster 6.0 and did not experience it previously. I have two nodes running glusterfs, each with a FUSE mount pointed to localhost:

```
# /etc/fstab
localhost:/gv0 /data/ glusterfs lru-limit=0,defaults,_netdev,acl 0 0
```

I have run into this problem with rsync, random file creation with dd, and mkdir/touch. Files are accessible while being written to, and become inaccessible once the file IO is complete. It usually happens in 'chunks' of sequential files. After some period of time (>15 min) the problem resolves itself. The files ls just fine on the local bricks, and the problematic files/dirs are accessible via FUSE mounts on other machines. Heal doesn't report any problems. Small-file workloads seem to make the problem worse. Overwriting existing files does not seem to produce problematic files.

*Gluster Info*

```
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: ...
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks: ...
Options Reconfigured:
cluster.self-heal-daemon: enable
server.ssl: on
client.ssl: on
auth.ssl-allow: *
transport.address-family: inet
nfs.disable: on
user.smb: disable
performance.write-behind: on
diagnostics.latency-measurement: off
diagnostics.count-fop-hits: off
cluster.lookup-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
cluster.readdir-optimize: on
storage.build-pgfid: off
diagnostics.brick-log-level: ERROR
diagnostics.brick-sys-log-level: ERROR
diagnostics.client-log-level: ERROR
```

*Client Log*

The FUSE log is flooded with:

```
[2019-04-22 19:12:39.231654] D [MSGID: 0] [io-stats.c:2227:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f535ca5c728, gv0 returned -1 error: No such file or directory [No such file or directory]
```

Version-Release number of selected component (if applicable):

```
apt list | grep gluster
bareos-filedaemon-glusterfs-plugin/stable 16.2.4-3+deb9u2 amd64
bareos-storage-glusterfs/stable 16.2.4-3+deb9u2 amd64
glusterfs-client/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
glusterfs-common/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
glusterfs-dbg/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
glusterfs-server/unknown 6.1-1 amd64 [upgradable from: 6.0-1]
tgt-glusterfs/stable 1:1.0.69-1 amd64
uwsgi-plugin-glusterfs/stable,stable 2.0.14+20161117-3+deb9u2 amd64
```

How reproducible:
Always

Steps to Reproduce:
1. Upgrade from 5.6 to either 6.0 or 6.1, with the described configuration.
2. Run a small-file-intensive workload.

Actual results:

```
dd if=/dev/urandom bs=1024 count=10240 | split -a 4 -b 1k - file.
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 18.3999 s, 57.0 kB/s
ls: cannot access 'file.abbd': No such file or directory
ls: cannot access 'file.aabb': No such file or directory
ls: cannot access 'file.aadh': No such file or directory
ls: cannot access 'file.aafq': No such file or directory
...

total 845
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaa
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaab
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaac
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaad
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaae
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaf
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaag
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaah
-????????? ? ? ? ? ? file.aaai
-????????? ? ? ? ? ? file.aaaj
-????????? ? ? ? ? ? file.aaak
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaal
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaam
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaan
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaao
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaap
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaq
-????????? ? ? ? ? ? file.aaar
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaas
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaat
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaau
-????????? ? ? ? ? ? file.aaav
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaaw
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaax
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aaay
-????????? ? ? ? ? ? file.aaaz
-????????? ? ? ? ? ? file.aaba
-????????? ? ? ? ? ? file.aabb
-rw-r--r-- 1 someone someone 1024 Apr 22 12:07 file.aabc
...

# Wait 10 mins

total 1024
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaa
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaab
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaac
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaad
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaae
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaf
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaag
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaah
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaai
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaaj
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaak
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaal
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaam
-rw-r--r-- 1 someone someone 1024 Apr 22 12:06 file.aaan
...
```

Expected results:
All files to be accessible immediately.

Additional info:
There was nothing of interest in the other logs when changed to INFO. Seems similar to Bug 1647229.
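For quicker verification, the reproduction above can be scripted so that every file is stat'ed immediately after the writes finish. This is a minimal sketch, assuming the FUSE mount at /data from the fstab entry above; the test directory name and file count are illustrative.

```
#!/bin/sh
# Minimal reproduction sketch. Assumes the volume is FUSE-mounted at /data
# (as in the fstab entry above); directory name and sizes are illustrative.
mkdir -p /data/repro-test && cd /data/repro-test || exit 1

# Create many small files, as in the split-based reproduction above.
dd if=/dev/urandom bs=1024 count=1024 | split -a 4 -b 1k - file.

# Stat every file immediately after the writes complete; files hit by the
# bug show up as "No such file or directory" on this (local) mount.
for f in file.*; do
    stat "$f" > /dev/null 2>&1 || echo "inaccessible: $f"
done
```

Pointing the same loop at the brick directories, or at a FUSE mount on another machine, should report nothing, consistent with the observations above.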
I tried to reproduce the issue with v6.0, but it doesn't happen on my setup. Could you reproduce it with the log level set to trace?

To set the trace log level, run these commands:

```
# gluster volume set <volname> brick-log-level TRACE
# gluster volume set <volname> client-log-level TRACE
```

Once the error happens, I would need all brick logs and the mount log.
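As a concrete example of the collection step, here is a rough sketch assuming the volume name gv0 from the report and the default log directory /var/log/glusterfs; adjust both if the setup differs.

```
# Raise the log levels to TRACE (volume name gv0 assumed from the report).
gluster volume set gv0 brick-log-level TRACE
gluster volume set gv0 client-log-level TRACE

# ...reproduce the failure on the local FUSE mount...

# Bundle the brick logs and the mount log for attachment; /var/log/glusterfs
# is the default log location and may differ on these nodes.
tar czf gv0-trace-logs.tar.gz /var/log/glusterfs/

# Drop the log levels back to their previous value (ERROR in the report).
gluster volume set gv0 brick-log-level ERROR
gluster volume set gv0 client-log-level ERROR
```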
Hi Erikson,

Can you share an update on whether you are able to reproduce it?

Thanks,
Mohit Agrawal
This bug has been moved to https://github.com/gluster/glusterfs/issues/906 and will be tracked there from now on. Visit the GitHub issue URL for further details.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days