Bug 1904330

Summary: Shards listed in .shard/.remove_me are not removed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Mikhail Gusev <gusevmk.uni>
Component: sharding
Assignee: Vinayak Hariharmath <vharihar>
Status: CLOSED NOTABUG
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: unspecified
CC: gusevmk.uni, rhs-bugs, sajmoham, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-25 05:18:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mikhail Gusev 2020-12-04 05:45:40 UTC
Hello, I think my problem is related to this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1568521
After removing files from the client (rm -rf /mountpoint/*), I get many errors in the gluster mount log (client side):
E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-<volume name>-shard: Failed to clean up shards of gfid 4b5afa49-5446-49e2-a7ba-1b4f2ffadb12 [Stale file handle]
 
The files in the mountpoint were deleted, but no space was freed by the operation.
I checked: there are no files in the .glusterfs/unlink directory, but there are many files in .shard/.remove_me/.
 
As far as I understand, the glusterd server process is supposed to check the files in .shard/.remove_me/ and, if there is no mapping in .glusterfs/unlink, remove the shards. But this does not seem to be working.
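For reference, this state can be verified directly on each brick with something like the following (a rough sketch; $BRICK is a placeholder for an actual brick path such as /bricks/brick1):

# ls -la $BRICK/.glusterfs/unlink           <- no files here
# ls -la $BRICK/.shard/.remove_me | wc -l   <- but many entries here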
 
# glusterd --version 
glusterfs 8.2
 
# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.3 (Ootpa)
 
gluster volume type: distributed-disperse (sharding is enabled)

Comment 1 Vinayak Hariharmath 2020-12-08 13:39:31 UTC
Hello Mikhail,

Quick questions:

1. Have you made any changes to the Gluster volume recently? I mean adding bricks, removing bricks, rebalancing, or anything similar.

2. Could you please share the output of "ls -la $BRICK_PATH{1..n}/.shard/.remove_me", where n is the number of bricks?

3. The output of "gluster vol info".
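For convenience, points 2 and 3 can be collected in one pass with something like this (a sketch only; the brick paths are placeholders and must be replaced with your actual brick directories):

for b in $BRICK_PATH1 $BRICK_PATH2 $BRICK_PATHn; do   # placeholder brick paths, one per brick
    echo "== $b =="
    ls -la "$b"/.shard/.remove_me
done
gluster vol info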

Regards
Vh

Comment 2 Vinayak Hariharmath 2020-12-23 08:04:46 UTC
Hello Mikhail,

Also, along with comment 1, please provide:

4. The xattrs of the shards that are hitting the issue.
For example, /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114 is one such shard; please fetch the xattrs of a few other such files as well for analysis.
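The standard getfattr dump should work here, run on the bricks rather than on the mount (a sketch; $BRICK_PATH is a placeholder for the brick directory):

# getfattr -d -m . -e hex $BRICK_PATH/.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114

The same command applies to the entries under .shard/.remove_me asked for in point 5 below.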

Copied from the client logs:

[2020-12-03 09:49:13.043519] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3108: gfid differs on subvolume gv0-disperse-0, gfid local = bdf3dc58-4364-44dd-be81-a4fd83f5c85c, gfid node = 433844c5-b0ec-4e31-91b5-af844d466a25 
[2020-12-03 09:49:13.043798] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3108: gfid differs on subvolume gv0-disperse-5, gfid local = 433844c5-b0ec-4e31-91b5-af844d466a25, gfid node = bdf3dc58-4364-44dd-be81-a4fd83f5c85c 
[2020-12-03 09:49:13.044192] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114: gfid differs on subvolume gv0-disperse-0, gfid local = e61e9167-0c6f-4328-a90d-1a1f238d95b7, gfid node = 4a19b4b6-6c02-4a2c-babe-4203a7c0667f 
[2020-12-03 09:49:13.044614] W [MSGID: 109009] [dht-common.c:2712:dht_lookup_everywhere_cbk] 0-gv0-dht: /.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114: gfid differs on subvolume gv0-disperse-5, gfid local = 4a19b4b6-6c02-4a2c-babe-4203a7c0667f, gfid node = e61e9167-0c6f-4328-a90d-1a1f238d95b7 
[2020-12-03 09:49:13.045125] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3108 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-03 09:49:13.045283] E [MSGID: 133010] [shard.c:2413:shard_common_lookup_shards_cbk] 0-gv0-shard: Lookup on shard 3114 failed. Base file gfid = 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-03 09:49:13.045335] E [MSGID: 133020] [shard.c:3030:shard_post_lookup_shards_unlink_handler] 0-gv0-shard: failed to delete shards of 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-03 09:49:13.045917] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 25280164-20ac-47cf-b3ab-fa52edd1b795 [Stale file handle]
[2020-12-03 09:49:13.050320] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 65ecc93b-92e4-42a5-85b5-37ed3b009190 [Stale file handle]
[2020-12-03 09:49:13.054813] E [MSGID: 133021] [shard.c:3761:shard_delete_shards] 0-gv0-shard: Failed to clean up shards of gfid 869733ac-c0fb-4d3b-8f45-333defeb5a37 [Stale file handle]
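The "gfid differs on subvolume" warnings above indicate that the same shard path resolves to two different gfids on gv0-disperse-0 and gv0-disperse-5, which is why the subsequent lookups return ESTALE. One way to confirm this directly on the bricks (a sketch; $BRICK_PATH is a placeholder) is to compare the trusted.gfid xattr of the shard across subvolumes:

# getfattr -n trusted.gfid -e hex $BRICK_PATH{1..n}/.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114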


5. The xattrs of the base-file entries created under /.shard/.remove_me.

6. An "ls -la" on the shards that are returning ESTALE, e.g. "ls -la $BRICK_PATH{1..n}/.shard/25280164-20ac-47cf-b3ab-fa52edd1b795.3114".

7. Have you tried unmounting and remounting the volume and then deleting the files again? Please try and let us know the result.
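For example (a sketch; the server name and mount point are placeholders, and the volume name gv0 is taken from the logs above):

# umount /mountpoint
# mount -t glusterfs <server>:/gv0 /mountpoint
# rm -rf /mountpoint/*   <- retry the deletion and watch the mount log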

Regards,
Vh

Comment 3 Vinayak Hariharmath 2021-03-25 05:18:55 UTC
Closing this issue since there has been no response from the reporter.

Comment 4 Red Hat Bugzilla 2023-09-15 00:52:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days