Bug 1732961 - shard file with different gfid in different subvolume causing VM to pause on stale file
Keywords:
Status: ASSIGNED
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Susant Kumar Palai
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-24 20:41 UTC by Olaf Buitelaar
Modified: 2019-12-05 19:41 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
ansible job to clear T file, showing shard's affected (7.95 KB, text/plain), 2019-08-27 13:02 UTC, Olaf Buitelaar
ansible job to clear T file, showing shard's affected (7.11 KB, text/plain), 2019-08-27 13:03 UTC, Olaf Buitelaar
qemu log (1.75 KB, text/plain), 2019-08-27 13:04 UTC, Olaf Buitelaar
qemu log (46.39 KB, text/plain), 2019-08-27 13:04 UTC, Olaf Buitelaar
qemu log (7.49 KB, text/plain), 2019-08-27 13:04 UTC, Olaf Buitelaar
qemu log (508 bytes, text/plain), 2019-08-27 13:05 UTC, Olaf Buitelaar
volume info (1.45 KB, text/plain), 2019-08-27 13:05 UTC, Olaf Buitelaar
volume info (1.63 KB, text/plain), 2019-08-27 13:06 UTC, Olaf Buitelaar
volume info (1.24 KB, text/plain), 2019-08-27 13:06 UTC, Olaf Buitelaar
volume info (1.39 KB, text/plain), 2019-08-27 13:07 UTC, Olaf Buitelaar
gluster logs (2.42 MB, application/zip), 2019-08-27 13:07 UTC, Olaf Buitelaar
gluster logs (296.39 KB, application/gzip), 2019-08-27 13:08 UTC, Olaf Buitelaar
gluster logs (143.04 KB, application/gzip), 2019-08-27 13:09 UTC, Olaf Buitelaar
gluster logs (646 bytes, application/gzip), 2019-08-27 13:09 UTC, Olaf Buitelaar
gluster logs (295.14 KB, application/gzip), 2019-08-27 13:09 UTC, Olaf Buitelaar
gluster logs (141.00 KB, application/gzip), 2019-08-27 13:10 UTC, Olaf Buitelaar
gluster logs (143.17 KB, application/gzip), 2019-08-27 13:11 UTC, Olaf Buitelaar
gluster logs (143.05 KB, application/gzip), 2019-08-27 13:11 UTC, Olaf Buitelaar
gluster logs (141.26 KB, application/gzip), 2019-08-27 13:12 UTC, Olaf Buitelaar
another instance of stale file log's with v6.5 (1.46 MB, application/zip), 2019-08-30 09:23 UTC, Olaf Buitelaar
different stale file (2.10 MB, application/zip), 2019-12-05 19:11 UTC, Olaf Buitelaar

Description Olaf Buitelaar 2019-07-24 20:41:38 UTC
Description of problem:
We're running oVirt (4.3.4) with gluster as backing storage. Since gluster 3.12.15 (we're currently running gluster 5.6), VMs randomly pause on a storage I/O error.
In the qemu log the following message is logged:
[2019-07-17 07:14:55.089368] E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 111923 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]
The message "E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 111923 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]" repeated 6 times between [2019-07-17 07:14:55.089368] and [2019-07-17 07:14:55.091061]

In the brick logs a message like the following appears:
[2019-07-17 01:21:52.768672] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.111756
[2019-07-17 01:21:52.768713] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.111756

When I inspect the underlying shard files, I always see that the shard file exists in 2 different sub-volumes: one carries the sticky bit and has zero bytes, while the other has data. The timestamps are also identical.
For example:
=== vol 0 ===

root@lease-03 ovirt-data]#  getfattr -d -m . -e hex /data/gfs/bricks/brick3/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
getfattr: Removing leading '/' from absolute path names
# file: data/gfs/bricks/brick3/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.gfid=0x055235bd970a4c23a92213af1ada1006
trusted.gfid2path.736c7b4605ab409c=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f32346435306530332d363339642d346430642d383333312d6632366535333437656639352e313534313333
trusted.glusterfs.dht.linkto=0x6f766972742d646174612d7265706c69636174652d3200

[root@lease-03 ovirt-data]# ls -l /data/gfs/bricks/brick3/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
---------T. 2 root root 0 Jul 19 04:30 /data/gfs/bricks/brick3/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133

[root@lease-07 ~]# getfattr -d -m . -e hex /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
getfattr: Removing leading '/' from absolute path names
# file: data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.gfid=0x055235bd970a4c23a92213af1ada1006
trusted.gfid2path.736c7b4605ab409c=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f32346435306530332d363339642d346430642d383333312d6632366535333437656639352e313534313333
trusted.glusterfs.dht.linkto=0x6f766972742d646174612d7265706c69636174652d3200

[root@lease-07 ~]# ls -la  /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
---------T. 2 root root 0 Jul 19 04:30 /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133

root@lease-08 ~#  getfattr -d -m . -e hex /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
getfattr: Removing leading '/' from absolute path names
# file: data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.gfid=0x055235bd970a4c23a92213af1ada1006
trusted.gfid2path.736c7b4605ab409c=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f32346435306530332d363339642d346430642d383333312d6632366535333437656639352e313534313333
trusted.glusterfs.dht.linkto=0x6f766972742d646174612d7265706c69636174652d3200

root@lease-08 ~# ls -la /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
---------T. 2 root root 0 Jul 19 04:30 /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133


=== vol 1 ===  

[root@lease-04 ~]# ls -la /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
ls: cannot access /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133: No such file or directory

[root@lease-05 ~]# ls -la /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
ls: cannot access /data/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133: No such file or directory

root@lease-06 ~# ls -la /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
ls: cannot access /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133: No such file or directory

=== vol 2 ===   

[root@lease-09 ~]# ls -la /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
-rw-rw----. 2 root root 1137152 Jul 19 04:30 /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133

[root@lease-09 ~]# getfattr -d -m . -e hex  /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
getfattr: Removing leading '/' from absolute path names
# file: data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x4c7f2bf981c44a7c867689b9bea9a8be
trusted.gfid2path.736c7b4605ab409c=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f32346435306530332d363339642d346430642d383333312d6632366535333437656639352e313534313333

[root@lease-11 ovirt-data]# ls -la  /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
-rw-rw----. 2 root root 1137152 Jul 19 04:30 /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133

[root@lease-11 ovirt-data]# getfattr -d -m . -e hex /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
getfattr: Removing leading '/' from absolute path names
# file: data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x4c7f2bf981c44a7c867689b9bea9a8be
trusted.gfid2path.736c7b4605ab409c=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f32346435306530332d363339642d346430642d383333312d6632366535333437656639352e313534313333

[root@lease-10 ~]# ls -la  /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
-rw-rw----. 2 root root 0 Jul 19 04:30 /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133

[root@lease-10 ~]#  getfattr -d -m . -e hex  /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
getfattr: Removing leading '/' from absolute path names
# file: data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x4c7f2bf981c44a7c867689b9bea9a8be
trusted.gfid2path.736c7b4605ab409c=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f32346435306530332d363339642d346430642d383333312d6632366535333437656639352e313534313333

=== vol 3 ===

[root@lease-08 ~]# ls -la /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
ls: cannot access /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133: No such file or directory

[root@lease-10 ~]#  ls -la /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
ls: cannot access /data0/gfs/bricks/brick1/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133: No such file or directory

[root@lease-11 ovirt-data]# ls -la /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
ls: cannot access /data/gfs/bricks/bricka/ovirt-data/.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133: No such file or directory

What mismatches is the gfid between vol0 and vol2 (0x055235bd970a4c23a92213af1ada1006 vs. 0x4c7f2bf981c44a7c867689b9bea9a8be).
The issue mainly seems to manifest on volumes with an arbiter brick, but it has also occurred on volumes with 3 full bricks.
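
For reference, this is roughly how the gfids can be compared across the bricks (hosts and brick paths below are taken from the listings above; ssh access to the brick hosts is assumed):

SHARD=.shard/24d50e03-639d-4d0d-8331-f26e5347ef95.154133
# print the gfid each brick reports for this shard (absent on a brick is fine)
for h in lease-03:/data/gfs/bricks/brick3 lease-07:/data/gfs/bricks/brick1 lease-09:/data0/gfs/bricks/brick1; do
    ssh "${h%%:*}" "getfattr -n trusted.gfid -e hex ${h#*:}/ovirt-data/$SHARD 2>/dev/null"
done
# trusted.gfid2path is just hex-encoded "<parent-gfid>/<shard-name>"; it can be decoded with:
echo "<hex value without the leading 0x>" | xxd -r -p; echo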

I've found a similar issue on the mailing list: https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
It suggests removing the sticky-bit files, which is exactly my workaround for this issue. Without removing the files I'm unable to run the VM again, since it will pause again on the same shard file.
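
Roughly, the workaround looks like this (the brick path is just an example; I verify the dht.linkto xattr on each candidate before removing anything):

BRICK=/data/gfs/bricks/brick1/ovirt-kube   # example brick path
# list the 0-byte, mode ---------T entries under .shard (the DHT linkto files)
find "$BRICK/.shard" -maxdepth 1 -type f -size 0 -perm 1000 | while read -r f; do
    getfattr -n trusted.glusterfs.dht.linkto "$f" >/dev/null 2>&1 && echo "$f"
done
# after double-checking the list, remove those files and start the VM again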


How reproducible:
I don't have exact steps to reproduce; the VM just seems to be running normally, with no particularly high load. However, it seems to surface under heavy file IO; this could be a simple yum install, operations such as xfs_fsr,
or something synthetic like: dd if=/dev/zero of=/data/test_file count=2024000

Comment 2 Susant Kumar Palai 2019-07-25 09:49:32 UTC
Olaf,
   Can you share the IO pattern of the application? With that, I will try to reproduce the issue. For now, if lookup-optimize is turned off you can delete the linkto file (the one with permission ---------T) on the backend directly and do a lookup from the mount (or a fresh mount).
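
For example (volume name and paths below are only placeholders):

# check whether lookup-optimize is enabled on the volume
gluster volume get <volname> cluster.lookup-optimize
# if it is off, the linkto file can be removed directly on the brick
rm /<brick-path>/<volname>/.shard/<base-gfid>.<shard-number>
# then force a fresh lookup by stat-ing the affected file from the mount (or a fresh mount)
stat /<mountpoint>/<path-to-vm-image>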

Comment 4 Nithya Balachandran 2019-08-27 12:02:42 UTC
Olaf,

Gluster 3.12.15 is an upstream release so it looks like you are using the upstream bits. I am modifying the BZ accordingly.

Regards,
Nithya

Comment 5 Nithya Balachandran 2019-08-27 12:23:51 UTC
Olaf,

Please provide the information Susant has asked for as well as the gluster volume info, the gluster mount and brick logs.

Regards,
Nithya

Comment 6 Olaf Buitelaar 2019-08-27 13:01:39 UTC
Dear Susant, 

Sorry, I completely missed your comment. I don't really think it's possible to characterize the IO pattern of the apps. They are essentially VMs (via KVM, managed by oVirt). The VMs run various apps via docker; probably the most IO-intensive are MariaDB databases.
I've recently updated to gluster 6.4 and am currently running gluster 6.5, and the issue still exists. However, I did notice a few things:
On VMs where this issue occurred before, I ran sparsify (https://ovirt.org/develop/release-management/features/virt/virt-sparsify.html); after that I've not observed the issue.
On 2 VMs the issue still exists; these VMs have snapshots and didn't allow me to run sparsify. They currently only seem to pause when running xfs_fsr, except once, while a brick replace was in progress (as we're migrating to a new environment).

@Nithya I've attached the logs sent earlier to the mailing list. If you require new logs, let me know exactly which ones you're interested in, since it shouldn't be hard to reproduce the issue here. I don't think the mount logs will add much, since the VMs use libgfapi.

Thanks Olaf

Comment 7 Olaf Buitelaar 2019-08-27 13:02:47 UTC
Created attachment 1608565 [details]
ansible job to clear T file, showing shard's affected

Comment 8 Olaf Buitelaar 2019-08-27 13:03:06 UTC
Created attachment 1608566 [details]
ansible job to clear T file, showing shard's affected

Comment 9 Olaf Buitelaar 2019-08-27 13:04:01 UTC
Created attachment 1608567 [details]
qemu log

Comment 10 Olaf Buitelaar 2019-08-27 13:04:18 UTC
Created attachment 1608568 [details]
qemu log

Comment 11 Olaf Buitelaar 2019-08-27 13:04:38 UTC
Created attachment 1608569 [details]
qemu log

Comment 12 Olaf Buitelaar 2019-08-27 13:05:07 UTC
Created attachment 1608570 [details]
qemu log

Comment 13 Olaf Buitelaar 2019-08-27 13:05:52 UTC
Created attachment 1608571 [details]
volume info

Comment 14 Olaf Buitelaar 2019-08-27 13:06:17 UTC
Created attachment 1608572 [details]
volume info

Comment 15 Olaf Buitelaar 2019-08-27 13:06:46 UTC
Created attachment 1608573 [details]
volume info

Comment 16 Olaf Buitelaar 2019-08-27 13:07:12 UTC
Created attachment 1608574 [details]
volume info

Comment 17 Olaf Buitelaar 2019-08-27 13:07:54 UTC
Created attachment 1608575 [details]
gluster logs

Comment 18 Olaf Buitelaar 2019-08-27 13:08:32 UTC
Created attachment 1608576 [details]
gluster logs

Comment 19 Olaf Buitelaar 2019-08-27 13:09:09 UTC
Created attachment 1608577 [details]
gluster logs

Comment 20 Olaf Buitelaar 2019-08-27 13:09:34 UTC
Created attachment 1608578 [details]
gluster logs

Comment 21 Olaf Buitelaar 2019-08-27 13:09:58 UTC
Created attachment 1608579 [details]
gluster logs

Comment 22 Olaf Buitelaar 2019-08-27 13:10:36 UTC
Created attachment 1608580 [details]
gluster logs

Comment 23 Olaf Buitelaar 2019-08-27 13:11:02 UTC
Created attachment 1608581 [details]
gluster logs

Comment 24 Olaf Buitelaar 2019-08-27 13:11:54 UTC
Created attachment 1608582 [details]
gluster logs

Comment 25 Olaf Buitelaar 2019-08-27 13:12:19 UTC
Created attachment 1608583 [details]
gluster logs

Comment 26 Olaf Buitelaar 2019-08-30 09:23:28 UTC
Created attachment 1609814 [details]
another instance of stale file log's with v6.5

Just adding another sample of the stale file error.
The entry you're probably interested in is in lease-08.dc01.adsolutions-data-gfs-bricks-bricka-ovirt-data.log,
with the line:
[2019-08-30 03:48:16.689485] E [MSGID: 113002] [posix-entry-ops.c:323:posix_lookup] 1-ovirt-data-posix: buf->ia_gfid is null for /data0/gfs/bricks/brick1/ovirt-data/.shard/8a27b91a-ff02-42dc-bd4c-caa019424de8.1682 [No data available]
[2019-08-30 03:48:16.689544] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-ovirt-data-server: 20674155: LOOKUP /.shard/8a27b91a-ff02-42dc-bd4c-caa019424de8.1682 (be318638-e8a0-4c6d-977d-7a937aa84806/8a27b91a-ff02-42dc-bd4c-caa019424de8.1682), client: CTX_ID:e00800f6-01d4-4d32-b374-1cd8f82dae57-GRAPH_ID:0-PID:18060-HOST:lease-07.dc01.adsolutions-PC_NAME:ovirt-data-client-9-RECON_NO:-0, error-xlator: ovirt-data-posix [No data available]
This error seems different from the other occurrences, which usually logged something like:
[2019-07-17 01:21:52.768672] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.111756
[2019-07-17 01:21:52.768713] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.111756

Comment 27 billycole 2019-12-03 18:34:03 UTC
Hello,

We're also seeing this issue with some frequency. I've been unable to pin down exact reproduction steps, but can attest that it seems to be more prevalent when the volume is closer to full; seemingly anywhere above about 80% or 90% usage, the issue appears.

Here's a log from our instance:

[2019-12-03 18:16:14.623264] E [MSGID: 109040] [dht-helper.c:1365:dht_migration_complete_check_task] 0-scratch-dht: c9b3d9cd-ffb2-4ea3-af8a-dff45151f86d: failed to lookup the file on scratch-dht [Stale file handle]
[2019-12-03 18:16:14.626687] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 189341727: READ => -1 gfid=4068dcae-a9d1-4429-aa86-621f9a242fd1 fd=0x7fe552460b88 (Stale file handle)
[2019-12-03 18:16:14.639266] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 189341728: READ => -1 gfid=4068dcae-a9d1-4429-aa86-621f9a242fd1 fd=0x7fe552460b88 (Stale file handle)
[2019-12-03 18:16:14.639405] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 189341729: READ => -1 gfid=4068dcae-a9d1-4429-aa86-621f9a242fd1 fd=0x7fe552460b88 (Stale file handle)
[2019-12-03 18:16:14.640918] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 189341741: READ => -1 gfid=4068dcae-a9d1-4429-aa86-621f9a242fd1 fd=0x7fe552460b88 (Stale file handle)
[2019-12-03 18:16:14.642831] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 189341730: READ => -1 gfid=4068dcae-a9d1-4429-aa86-621f9a242fd1 fd=0x7fe552460b88 (Stale file handle)
[2019-12-03 18:16:18.023463] W [fuse-bridge.c:2830:fuse_readv_cbk] 0-glusterfs-fuse: 189342328: READ => -1 gfid=4068dcae-a9d1-4429-aa86-621f9a242fd1 fd=0x7fe552460b88 (Stale file handle)

We're running Gluster 6.5 on Amazon Linux 2, which is a CentOS 7 variant.

Our shard size is set to 512MB, if that matters, and we have a 3x2 setup for this volume, totaling 14TB.
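
For reference, the shard size and the volume layout can be double-checked with something like this (volume name is a placeholder):

gluster volume get <volname> features.shard-block-size
gluster volume info <volname> | grep -E 'Type|Number of Bricks'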

Thanks!
Tim

Comment 28 billycole 2019-12-04 00:53:21 UTC
Another bit of detail on this: some of this, at least, seems to be persisting on the client as well.

I went through each of the servers and cleaned up the empty shard files, but one of my clients continues to show the above error, while another does not. I tried to clear the cache by running:

echo 3 | sudo tee /proc/sys/vm/drop_caches

But the problem still continued on that client.

So it does seem that there is perhaps some client-side-only issue as well? Or at least some confused state that doesn't get flushed once the issue on the server is corrected.

Comment 29 Olaf Buitelaar 2019-12-04 10:58:47 UTC
Hi Tim,

I'm sorry, I forgot to mention that the procedure changed for me with newer gluster versions.
At first it was enough to just clean up the stale files, but now, after cleaning the files, I always had to run a "gluster volume rebalance [volume name] start" before it was fixed.
So you might want to try a rebalance to see if it resolves the issue.
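
Roughly, the sequence that works for me now (volume name is a placeholder):

# 1) remove the stale linkto (---------T) shard files on the bricks, as before
# 2) then start a rebalance and wait for it to finish
gluster volume rebalance <volname> start
gluster volume rebalance <volname> status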

Best Olaf

Comment 30 Olaf Buitelaar 2019-12-05 19:11:10 UTC
Created attachment 1642453 [details]
different stale file

Today I found a completely different scenario in which stale files can occur.
I was investigating why the docker image of gitlab (https://hub.docker.com/r/gitlab/gitlab-ce) couldn't start on a regular mount; it just fails on the postgresql initialisation. I've attempted this several times before, all the way back on gluster 3.12.15 (the latest tests were performed on gluster 6.6). I decided to take a different route and create a new volume with the options:
network.remote-dio off
performance.strict-o-direct on
cluster.choose-local off
Interestingly enough, this allowed the postgresql initialisation to complete and the container to fully start up and be functional. To test further I also wanted to see if it would work with sharding, since none of my other docker containers work with a gluster volume that is sharded. So I created another volume, with sharding, and this also worked well, except that both volumes report a (the same) stale file on /gitlab/data/postgresql/data/pg_stat_tmp/global.stat

([fuse-bridge.c:1509:fuse_fd_cbk] 0-glusterfs-fuse: 13945691: OPEN() /gitlab/data/postgresql/data/pg_stat_tmp/global.stat => -1 (Stale file handle))

I couldn't directly find anything in the logs, but I've attached it all, including the docker stack file used to deploy gitlab (which is one of the most bloated containers I've seen, running many services inside).

For the record:
============================
On this volume gitlab wouldn't start, since it wouldn't get past the postgresql initialisation:

gluster v info docker2

Volume Name: docker2
Type: Distributed-Replicate
Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
Options Reconfigured:
performance.cache-size: 128MB
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
============================  
commands used to create the non-sharded volume:

gluster volume create docker2-dio replica 3 arbiter 1 10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio 10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio 10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio
gluster volume set docker2-dio network.remote-dio off
gluster volume set docker2-dio performance.strict-o-direct on
gluster volume set docker2-dio cluster.choose-local off
gluster v start docker2-dio

gluster v info docker2-dio

Volume Name: docker2-dio
Type: Replicate
Volume ID: 54b3a0dc-20a9-4d29-a7ea-bd7cd8500b91
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio
Brick2: 10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio
Brick3: 10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio (arbiter)
Options Reconfigured:
cluster.choose-local: off
performance.strict-o-direct: on
network.remote-dio: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on
============================    

commands used to create the sharded volume:


gluster volume create docker2-dio-shrd replica 3 arbiter 1 10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio-shrd 10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio-shrd 10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio-shrd
gluster volume set docker2-dio-shrd network.remote-dio off
gluster volume set docker2-dio-shrd performance.strict-o-direct on
gluster volume set docker2-dio-shrd cluster.choose-local off
gluster volume set docker2-dio-shrd features.shard on
gluster v start docker2-dio-shrd

gluster v info docker2-dio-shrd

Volume Name: docker2-dio-shrd
Type: Replicate
Volume ID: 50300e86-56d6-4e2d-be61-944f562e58f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio-shrd
Brick2: 10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio-shrd
Brick3: 10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio-shrd (arbiter)
Options Reconfigured:
features.shard: on
cluster.choose-local: off
performance.strict-o-direct: on
network.remote-dio: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on  
============================     
available space:

df -h|grep docker2-dio
10.201.0.5:/docker2-dio        5.4T  2.6T  2.9T  47% /mnt/docker2-dio
10.201.0.5:/docker2-dio-shrd   5.4T  2.6T  2.9T  47% /mnt/docker2-dio-shrd

Comment 31 Olaf Buitelaar 2019-12-05 19:41:35 UTC
I forgot to mention that I'm able to stat the file (/gitlab/data/postgresql/data/pg_stat_tmp/global.stat) and hexdump it, despite all the stale file warnings:

 stat /mnt/docker2-dio-shrd/gitlab/data/postgresql/data/pg_stat_tmp/global.stat
  File: ‘/mnt/docker2-dio-shrd/gitlab/data/postgresql/data/pg_stat_tmp/global.stat’
  Size: 977             Blocks: 2          IO Block: 131072 regular file
Device: 50h/80d Inode: 12028693131041868837  Links: 1
Access: (0600/-rw-------)  Uid: (  996/ gluster)   Gid: (  996/  chrony)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-12-05 19:39:05.173555251 +0000
Modify: 2019-12-05 19:39:05.195042075 +0000
Change: 2019-12-05 19:39:05.196744057 +0000

hexdump /mnt/docker2-dio-shrd/gitlab/data/postgresql/data/pg_stat_tmp/global.stat |head
0000000 bc9d 01a5 8b56 e066 3bf8 0002 0104 0000
0000010 0000 0000 0002 0000 0000 0000 006c 0000
0000020 0000 0000 0e5f 0000 0000 0000 000b 0000
0000030 0000 0000 0601 0000 0000 0000 0000 0000
0000040 0000 0000 148e 0000 0000 0000 0000 0000
0000050 0000 0000 5c53 0018 0000 0000 8d64 ae66
0000060 3be6 0002 0000 0000 0000 0000 0000 0000
0000070 0000 0000 0000 0000 0000 0000 0000 0000
*
00000e0 0000 0000 8d64 ae66 3be6 0002 0144 0000

