Bug 1568521

Summary: shard files present even after deleting vm from ovirt UI
Product: [Community] GlusterFS Reporter: Krutika Dhananjay <kdhananj>
Component: sharding Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED CURRENTRELEASE QA Contact: bugs <bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: abhishku, bugs, gusevmk.uni, kdhananj, pkarampu, rhinduja, rhs-bugs, sasundar, storage-qa-internal, vharihar
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-5.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1520882 Environment:
Last Closed: 2018-06-20 18:05:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1520882, 1522624    
Attachments:
  gluster client log (flags: none)
  gluster server log (flags: none)
  glustershd.log (flags: none)
  glusterd.log (flags: none)
  mount point (client) log (flags: none)

Description Krutika Dhananjay 2018-04-17 16:47:41 UTC
+++ This bug was initially created as a clone of Bug #1520882 +++

Description of problem:

Ghost shard files remain present even after deleting the VM from RHEV, resulting in more space being used than should actually be in use.

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-18.6.el7rhgs.x86_64 

Actual results:

Shard files remain even after deleting the VM.

Expected results:

Shard files should have been deleted after the VM deletion.

--- Additional comment from Abhishek Kumar on 2017-12-05 07:11:29 EST ---

Root cause of the issue :

The presence of 3 orphaned shard sets in the .shard directory caused reduced free space on the volume.

Together, these 3 shard sets consume around 630 GB of space.

Reason these shards were not deleted after the VM was deleted:
~~~
[2017-11-20 11:52:48.788824] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 1725: UNLINK() /1d7b3d24-88d4-4eba-a0cf-9bd3625acbf6/images/_remove_me_d9a3fbce-1abf-46b4-ab88-ae38a61bb9f9/e47a4bfe-84d8-4f29-af78-7730e7ec1008 => -1 (Transport endpoint is not connected)
~~~
Given this log entry and the many errors of the kind 'Transport endpoint is not connected' in the FUSE mount logs from both replicas, it is clear that the UNLINK operation (which is effectively the disk deletion) failed midway, before all shards could be cleaned up.
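
For illustration only: the pre-fix behavior amounts to a synchronous, one-by-one shard cleanup, so a disconnect partway through leaves the remaining shards orphaned on the bricks. A minimal sketch of that failure mode (this is not GlusterFS source; the brick path, gfid and shard count are hypothetical):
~~~
/* Illustrative sketch only: a synchronous loop deleting one shard at a time.
 * If the connection drops midway, the loop aborts and every shard not yet
 * reached stays behind as an orphan consuming space. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *gfid = "e47a4bfe-84d8-4f29-af78-7730e7ec1008"; /* hypothetical gfid */
    char path[4096];

    for (int i = 1; i <= 10000; i++) {                        /* hypothetical shard count */
        snprintf(path, sizeof(path), "/bricks/brick1/.shard/%s.%d", gfid, i);
        if (unlink(path) == -1) {
            /* e.g. ENOTCONN ("Transport endpoint is not connected"): shards
             * i..10000 are never deleted. */
            perror(path);
            return 1;
        }
    }
    return 0;
}
~~~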

Comment 1 Worker Ant 2018-04-17 16:49:17 UTC
REVIEW: https://review.gluster.org/19892 (features/shard: Make operations on internal directories generic) posted (#1) for review on master by Krutika Dhananjay

Comment 2 Krutika Dhananjay 2018-04-17 16:51:18 UTC
(In reply to Worker Ant from comment #1)
> REVIEW: https://review.gluster.org/19892 (features/shard: Make operations on
> internal directories generic) posted (#1) for review on master by Krutika
> Dhananjay

This is still only the first of many patches needed to fix this bug. It is being split into multiple patches for easier review and incremental testing.

Comment 3 Worker Ant 2018-04-18 15:50:55 UTC
COMMIT: https://review.gluster.org/19892 committed in master by "Pranith Kumar Karampuri" <pkarampu> with a commit message- features/shard: Make operations on internal directories generic

Change-Id: Iea7ad2102220c6d415909f8caef84167ce2d6818
updates: bz#1568521
Signed-off-by: Krutika Dhananjay <kdhananj>

Comment 4 Worker Ant 2018-04-22 16:48:25 UTC
REVIEW: https://review.gluster.org/19915 (features/shard: Add option to barrier parallel lookup and unlink of shards) posted (#1) for review on master by Krutika Dhananjay

Comment 5 Worker Ant 2018-04-23 15:53:54 UTC
COMMIT: https://review.gluster.org/19915 committed in master by "Pranith Kumar Karampuri" <pkarampu> with a commit message- features/shard: Add option to barrier parallel lookup and unlink of shards

Also move the common parallel unlink callback for GF_FOP_TRUNCATE and
GF_FOP_FTRUNCATE into a separate function.

Change-Id: Ib0f90a5f62abdfa89cda7bef9f3ff99f349ec332
updates: bz#1568521
Signed-off-by: Krutika Dhananjay <kdhananj>

Comment 6 Worker Ant 2018-04-23 15:55:28 UTC
REVIEW: https://review.gluster.org/19927 (libglusterfs/syncop: Handle barrier_{init/destroy} in error cases) posted (#1) for review on master by Pranith Kumar Karampuri

Comment 7 Worker Ant 2018-04-24 04:06:46 UTC
REVIEW: https://review.gluster.org/19929 (features/shard: Introducing .shard_remove_me for atomic shard deletion (part 1)) posted (#1) for review on master by Krutika Dhananjay

Comment 8 Worker Ant 2018-04-25 01:53:32 UTC
COMMIT: https://review.gluster.org/19927 committed in master by "Pranith Kumar Karampuri" <pkarampu> with a commit message- libglusterfs/syncop: Handle barrier_{init/destroy} in error cases

BUG: 1568521
updates: bz#1568521
Change-Id: I53e60cfcaa7f8edfa5eca47307fa99f10ee64505
Signed-off-by: Pranith Kumar K <pkarampu>

Comment 9 Worker Ant 2018-04-25 09:28:45 UTC
REVIEW: https://review.gluster.org/19937 (features/shard: Make sure .shard_remove_me is not exposed to mount point) posted (#1) for review on master by Krutika Dhananjay

Comment 10 Worker Ant 2018-05-07 10:56:20 UTC
REVIEW: https://review.gluster.org/19970 (features/shard: Perform shards deletion in the background) posted (#2) for review on master by Krutika Dhananjay

Comment 11 Worker Ant 2018-06-13 09:57:43 UTC
COMMIT: https://review.gluster.org/19929 committed in master by "Pranith Kumar Karampuri" <pkarampu> with a commit message- features/shard: Introducing ".shard/.remove_me" for atomic shard deletion (part 1)

PROBLEM:
Shards are deleted synchronously when a sharded file is unlinked or
when a sharded file participating as the dst in a rename() is going to
be replaced. The problem with this approach is that it makes the operation
really slow, sometimes causing the application to time out, especially
with large files.

SOLUTION:
To make this operation atomic, we introduce a ".remove_me" directory.
Now renames and unlinks will simply involve two steps:
1. creating an empty file under .remove_me named after the gfid of the file
participating in unlink/rename
2. carrying out the actual rename/unlink
A synctask is created (more on that in part 2) to scan this directory
after every unlink/rename operation (or upon a volume mount) and clean
up all shards associated with it. All of this happens in the background.
The task takes care to delete the shards associated with the gfid in
.remove_me only if this gfid no longer exists in the backend, ensuring that the
file was successfully renamed/unlinked and its shards can now be safely
discarded.

Change-Id: Ia1d238b721a3e99f951a73abbe199e4245f51a3a
updates: bz#1568521
Signed-off-by: Krutika Dhananjay <kdhananj>
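
A minimal sketch of the two-step scheme described in this commit message, with purely illustrative paths and gfid (this is not the shard translator's actual code; in GlusterFS the marker is created internally and .shard/.remove_me is not exposed on the mount):
~~~
/* Sketch only: record the gfid under .shard/.remove_me first, then unlink
 * the base file. Paths and gfid below are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int mark_then_unlink(const char *vol_root, const char *gfid, const char *path)
{
    char marker[4096];
    snprintf(marker, sizeof(marker), "%s/.shard/.remove_me/%s", vol_root, gfid);

    /* Step 1: create an empty marker file named after the base file's gfid. */
    int fd = open(marker, O_CREAT | O_WRONLY, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    /* Step 2: unlink the base file itself. The background task later deletes
     * the shards for this gfid and removes the marker, so the heavy cleanup
     * no longer blocks the application's unlink/rename. */
    return unlink(path);
}

int main(void)
{
    return mark_then_unlink("/data/vol-root",                         /* hypothetical */
                            "25280164-20ac-47cf-b3ab-fa52edd1b795",   /* hypothetical */
                            "/data/vol-root/vm-disk.img") == 0 ? 0 : 1;
}
~~~
The point of the ordering is that even if something fails after step 1, the marker still records which gfid's shards need cleaning, so a later background pass can finish the job.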

Comment 12 Worker Ant 2018-06-20 15:04:22 UTC
COMMIT: https://review.gluster.org/19970 committed in master by "Krutika Dhananjay" <kdhananj> with a commit message- features/shard: Perform shards deletion in the background

A synctask is created that scans the indices from
.shard/.remove_me and deletes the shards associated with the
gfid corresponding to each index bname. The rate of deletion
is controlled by the option features.shard-deletion-rate, whose
default value is 100.
The task is launched on two occasions:
1. when shard receives its first-ever lookup on the volume
2. when a rename or unlink deletes an inode

Change-Id: Ia83117230c9dd7d0d9cae05235644f8475e97bc3
updates: bz#1568521
Signed-off-by: Krutika Dhananjay <kdhananj>
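
A rough sketch of the background cleanup this commit describes, assuming a hypothetical directory path and a placeholder delete_shards_for_gfid() helper (the real work is done by a synctask inside the shard translator, not a standalone program):
~~~
/* Sketch only: scan .shard/.remove_me and process entries in batches whose
 * size corresponds to features.shard-deletion-rate (default 100). */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void delete_shards_for_gfid(const char *gfid)
{
    /* Placeholder: the real task builds the shard paths for this gfid,
     * unlinks them, and then removes the .remove_me entry. */
    printf("cleaning shards of gfid %s\n", gfid);
}

int main(void)
{
    const int deletion_rate = 100;   /* batch size, per the option's default */
    DIR *dir = opendir("/data/vol-root/.shard/.remove_me"); /* hypothetical path */
    if (!dir)
        return 1;

    struct dirent *entry;
    int in_batch = 0;
    while ((entry = readdir(dir)) != NULL) {
        if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
            continue;
        delete_shards_for_gfid(entry->d_name);  /* entry name is the base file's gfid */
        if (++in_batch == deletion_rate) {
            /* The real implementation waits for the current batch to finish
             * before starting the next one; the sketch just resets the counter. */
            in_batch = 0;
        }
    }
    closedir(dir);
    return 0;
}
~~~
The batch size maps to features.shard-deletion-rate, which can be tuned per volume with gluster volume set <volname> features.shard-deletion-rate <N>.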

Comment 13 Shyamsundar 2018-06-20 18:05:09 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 14 Shyamsundar 2018-10-23 15:06:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 15 Mikhail Gusev 2020-12-03 16:06:49 UTC
Hello, I think my problem is related to this issue.
After removing files from the client (rm -rf /mountpoint/*), I get many errors in the gluster mount log (client side):
E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-<volume name>-shard: Failed to clean up shards of gfid 4b5afa49-5446-49e2-a7ba-1b4f2ffadb12 [Stale file handle]

The files in the mount point were deleted, but no space became available after this operation.
I checked: there are no files in the .glusterfs/unlink directory, but there are many files in .shard/.remove_me/.

As far as I understand, the glusterd server process has to check the files in .shard/.remove_me/ and, if there is no mapping in .glusterfs/unlink, the shards must be removed. But it seems this is not working.


# glusterd --version 
glusterfs 8.2

# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.3 (Ootpa)

gluster volume type: distributed-disperse (sharding is enabled)

Comment 16 Vinayak Hariharmath 2020-12-04 06:32:52 UTC
Hello,
A bit of background about sharded file deletion: a sharded file has 2 parts: 1. the base file (the 1st shard, or reference shard) 2. the shards of the base file, stored as GFID.index

When we delete a sharded file:
1. first, an entry for the base file is created (named after its GFID) under .shard/.remove_me
2. next, the base file is unlinked
3. in the background, the associated shards are cleaned up, and finally the reference entry under .shard/.remove_me is removed
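
As an aside on the base-file/GFID.index layout mentioned at the top of this comment, a minimal sketch of how a file's size maps to the shards that have to be cleaned up, assuming the default 64 MB features.shard-block-size and a hypothetical brick path and gfid:
~~~
/* Sketch only: a sharded file is the base file plus ceil(size / block_size) - 1
 * shards named <gfid>.<index> under .shard (assuming size > block_size). */
#include <stdio.h>

int main(void)
{
    const long long block_size = 64LL * 1024 * 1024;          /* 64 MB default (assumed) */
    const long long file_size  = 256LL * 1024 * 1024 * 1024;  /* 256 GB, as in comment 20 */
    const char *gfid = "25280164-20ac-47cf-b3ab-fa52edd1b795"; /* hypothetical */

    long long shard_count = (file_size + block_size - 1) / block_size - 1; /* excludes base file */
    printf("shards to clean up for %s: %lld\n", gfid, shard_count);
    printf("e.g. /data/brick1/.shard/%s.1 .. %s.%lld\n", gfid, gfid, shard_count);
    return 0;
}
~~~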

The reference created under .shard/.remove_me is always used to build the paths of the associated shards to delete. So the background thread picks up the .shard/.remove_me entries, builds the shard paths, and deletes them.

Based on your description, it looks like steps 1 and 2 were done, but the background thread is getting ESTALE while cleaning up those .shard/.remove_me entries, so the shards are left undeleted and the space is not freed up.

It is strange that you are getting ESTALE even though the entry is present in .shard/.remove_me. Can you please post the complete logs from the time you performed the 1st deletion? A history of events would also be helpful for analyzing the issue.

Regards
Vh

Comment 17 Mikhail Gusev 2020-12-04 08:16:02 UTC
Created attachment 1736356 [details]
gluster client log

Comment 18 Mikhail Gusev 2020-12-04 08:16:34 UTC
Created attachment 1736357 [details]
gluster server log

Comment 19 Mikhail Gusev 2020-12-04 08:17:41 UTC
Events:
1) Generated many test-* files from the client side in /mnt/glusterfs-mountpoint (via for i in .. ; do dd if=/dev/zero .. ). File size = 100 GB or 1 TB (38 TB in total; the gluster volume is 48 TB)
2) Started rm -rf /mnt/glusterfs-mountpoint/test-*
3) There are no /mnt/glusterfs-mountpoint/test-* files left, but the shards are still on the brick filesystems.
Logs of the glusterfs server and client (server-log.txt and client-log.txt), covering the period of the rm -rf operation, are attached:
https://bugzilla.redhat.com/attachment.cgi?id=1736356
https://bugzilla.redhat.com/attachment.cgi?id=1736357

Comment 20 Mikhail Gusev 2020-12-04 08:36:16 UTC
I have now tried removing a file (size = 256 GB); there are many errors on the client side like:

[2020-12-04 08:27:22.461946] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3108: gfid different on data file on gv0-disperse-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 433844c5-b0ec-4e31-91b5-af844d466a25  
[2020-12-04 08:27:22.463406] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3108: gfid differs on subvolume gv0-disperse-0, gfid local = bdf3dc58-4364-44dd-be81-a4fd83f5c85c, gfid node = 433844c5-b0ec-4e31-91b5-af844d466a25 
[2020-12-04 08:27:22.464209] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3108: gfid differs on subvolume gv0-disperse-5, gfid local = 433844c5-b0ec-4e31-91b5-af844d466a25, gfid node = bdf3dc58-4364-44dd-be81-a4fd83f5c85c 
[2020-12-04 08:27:22.464377] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114: gfid different on data file on gv0-disperse-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 4a19b4b6-6c02-4a2c-babe-4203a7c0667f  
[2020-12-04 08:27:22.466356] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3108 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-04 08:27:22.467072] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114: gfid differs on subvolume gv0-disperse-0, gfid local = e61e9167-0c6f-4328-a90d-1a1f238d95b7, gfid node = 4a19b4b6-6c02-4a2c-babe-4203a7c0667f 
[2020-12-04 08:27:22.469243] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3114 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-04 08:27:22.481961] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3213: gfid different on data file on gv0-disperse-10, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 25a0cf4f-7d37-4243-a6a2-4e69766669ed  
[2020-12-04 08:27:22.485468] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3222: gfid different on data file on gv0-disperse-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = e6e7e106-62b2-41fb-9f0d-0e40d864ff55  
[2020-12-04 08:27:22.485552] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3213: gfid differs on subvolume gv0-disperse-10, gfid local = 8ca9e1a6-33ca-4ce8-a7bb-5226fdc70e45, gfid node = 25a0cf4f-7d37-4243-a6a2-4e69766669ed 
[2020-12-04 08:27:22.485611] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3213 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-04 08:27:22.489397] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3222: gfid differs on subvolume gv0-disperse-0, gfid local = 87c28849-ce01-4d88-942c-3ca11d92d278, gfid node = e6e7e106-62b2-41fb-9f0d-0e40d864ff55 
[2020-12-04 08:27:22.489598] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3222: gfid differs on subvolume gv0-disperse-5, gfid local = e6e7e106-62b2-41fb-9f0d-0e40d864ff55, gfid node = 87c28849-ce01-4d88-942c-3ca11d92d278 
[2020-12-04 08:27:22.490539] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3224: gfid different on data file on gv0-disperse-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 1c07c691-e394-4bd2-ba09-1ddebffe07c3  
[2020-12-04 08:27:22.492180] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3222 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-04 08:27:22.493505] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3238: gfid different on data file on gv0-disperse-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 9e2d6430-62bb-45b2-8306-c0d4d294a6f1  
[2020-12-04 08:27:22.496633] W [MSGID: 109009] [dht-common.c:2957:dht_lookup_linkfile_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3246: gfid different on data file on gv0-disperse-0, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = c9b3d7af-b770-4096-8082-b717922f2294  
[2020-12-04 08:27:22.498543] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3224: gfid differs on subvolume gv0-disperse-0, gfid local = b53ecf59-4f98-4089-ae46-9e57fe9e3369, gfid node = 1c07c691-e394-4bd2-ba09-1ddebffe07c3 
[2020-12-04 08:27:22.498969] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3224: gfid differs on subvolume gv0-disperse-5, gfid local = 1c07c691-e394-4bd2-ba09-1ddebffe07c3, gfid node = b53ecf59-4f98-4089-ae46-9e57fe9e3369 
[2020-12-04 08:27:22.499004] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3224 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]



and like:
[2020-12-04 08:27:27.096514] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 7b79e9cc-e648-4bc7-abc6-25d5daf51915 [Stale file handle]
[2020-12-04 08:27:27.101842] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 3ed7518d-b846-4dc8-bc67-2fcc89f98064 [Stale file handle]
[2020-12-04 08:27:27.106891] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid eef9d331-2a63-44df-9d68-cf51b07d9458 [Stale file handle]
[2020-12-04 08:27:27.111908] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 8fb6d139-9bda-4f50-806d-14bc55cd7091 [Stale file handle]
[2020-12-04 08:27:27.117283] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid e09dab41-abd0-4c76-8e4d-d0eaf23d999d [Stale file handle]
[2020-12-04 08:27:27.121697] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 1cfed72b-4a04-4fa8-a1dd-44e4be5ee3c1 [Stale file handle]
[2020-12-04 08:27:27.125805] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid c0f11f8c-6f61-490b-823c-e4d353dcd72d [Stale file handle]
[2020-12-04 08:27:27.129995] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 6031e10c-8e4e-4a8c-b303-7bb5570ce914 [Stale file handle]
[2020-12-04 08:27:27.134135] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 28c0c968-4530-46d2-9f7f-0c9db7476a35 [Stale file handle]
[2020-12-04 08:27:27.138357] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid f18218c4-6c06-4964-bdb5-c185cfcdeebf [Stale file handle]
[2020-12-04 08:27:27.143152] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 6d99232f-dad3-4104-bc85-ba367444a846 [Stale file handle]
[2020-12-04 08:27:27.148389] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 57c50b21-362a-42e8-96fc-7cca889578a2 [Stale file handle]
[2020-12-04 08:27:27.154078] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 0249319a-dad6-446f-902e-26bda7ccc746 [Stale file handle]
[2020-12-04 08:27:27.159567] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 932f6044-43ca-44e6-b44a-1584723e38c4 [Stale file handle]
[2020-12-04 08:27:27.164102] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 9bd305d6-90d1-41ce-886f-adc00e9f0065 [Stale file handle]
[2020-12-04 08:27:27.171360] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 95f20db2-0ba9-4eab-b915-c3470655222a [Stale file handle]
[2020-12-04 08:27:27.176355] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid a1123086-81f8-4069-84e8-e0fa26e209bc [Stale file handle]
[2020-12-04 08:27:27.180846] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid b89f7ccb-77c2-4848-a428-66023fd4003e [Stale file handle]
[2020-12-04 08:27:27.185352] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 3bd6f1da-36e1-4a63-aa89-df1eb048a7e6 [Stale file handle]
[2020-12-04 08:27:27.189919] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 50d80b18-31f1-4a01-9bb0-3948feb4a08c [Stale file handle]
[2020-12-04 08:27:27.194483] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid f43023a5-bfc6-421f-a017-773311c2c38e [Stale file handle]
[2020-12-04 08:27:27.199047] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 18e740d1-6fe0-4cf0-b48c-8cbc36a9f7a6 [Stale file handle]
[2020-12-04 08:27:27.203427] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid fc3d058a-1f39-461b-9c89-d170ad797c1f [Stale file handle]
[2020-12-04 08:27:27.207865] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 3b6f5249-4c04-49a4-b510-8c45f57024cb [Stale file handle]
[2020-12-04 08:27:27.214162] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 99e5091e-b330-4191-823f-d16fb47ed875 [Stale file handle]
[2020-12-04 08:27:27.218764] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 85af3050-0350-4b2a-a947-f6e606835268 [Stale file handle]
[2020-12-04 08:27:27.223314] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 2ca6d5b8-2619-4c4b-b19e-7eb58a6b9c96 [Stale file handle]
[2020-12-04 08:27:27.228213] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 51244f79-8b83-4e6b-bb36-c7fade1c7cfe [Stale file handle]

Comment 21 Vinayak Hariharmath 2020-12-04 10:00:01 UTC
Yes. Whenever you delete any file from the gluster mount, the entries in .shard/.remove_me are checked and deletion of the corresponding shards is attempted.

The issue seems to be related to link-to files, and your last comment/logs are quite useful for analyzing the situation. I don't see similar information in the attached logs.
Can you please provide the full set of the above-attached logs? Also, could you please check your brick status to see whether any of them are completely full?

Regards
Vh

Comment 22 Mikhail Gusev 2020-12-04 14:07:14 UTC
Created attachment 1736420 [details]
glustershd.log

Comment 23 Mikhail Gusev 2020-12-04 14:09:42 UTC
Created attachment 1736421 [details]
glusterd.log

Comment 24 Mikhail Gusev 2020-12-04 14:10:29 UTC
Created attachment 1736422 [details]
mount point (client) log

Comment 25 Mikhail Gusev 2020-12-04 14:14:51 UTC
The full log files of the server and client have been uploaded as attachments.
I checked all brick filesystems on all gluster nodes: 38% of space is available on each one.

Comment 26 Vinayak Hariharmath 2020-12-07 07:31:06 UTC
Hello Mikhail,

Thanks for the logs. I need some more time to go through them and will get back to you. Meanwhile, I just want to let you know that the issue you have reported is not related to this one, and I feel it's better to open a separate bug for it.

Regards
Vh

Comment 27 Mikhail Gusev 2020-12-07 09:22:00 UTC
Should I open the new bug, or will you do it yourself?

Comment 28 Mikhail Gusev 2020-12-08 05:11:19 UTC
It is already open: https://bugzilla.redhat.com/show_bug.cgi?id=1904330