Bug 1569074
| Summary: | fatal: unable to access file: Stale file handle | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Frank Rühlemann <ruehlemann> |
| Component: | distribute | Assignee: | bugs <bugs> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.12 | CC: | atumball, bugs, g.amedick, nbalacha, ruehlemann |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-10 10:25:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
**Description** (Frank Rühlemann, 2018-04-18 14:54:58 UTC)
Hi, according to our users, this affects exclusively git projects.

Today I found something interesting in the logs of our rebalances. They contain the following lines (the path varies):

```
[2018-04-25 16:44:45.443553] E [MSGID: 109023] [dht-rebalance.c:2669:gf_defrag_migrate_single_file] 0-$vol-dht: Migrate file failed: /$somepath/.git/config lookup failed [Stale file handle]
[2018-06-19 22:50:55.621752] I [dht-rebalance.c:3437:gf_defrag_process_dir] 0-$vol-dht: Migration operation on dir /$somepath/.git took 0.03 secs
```

Side note: this does not count as a failure in the rebalance status. I'm not sure whether that is intended, since it is classified as an error.

The path varies, but it's always something in a .git folder. Mostly the config, sometimes other files in a .git folder. This goes back to February. Before February, the logs contain messages like this:

```
[2017-12-15 23:53:02.919615] W [MSGID: 109023] [dht-rebalance.c:2503:gf_defrag_get_entry] 0-$vol-dht: lookup failed for file:/$somepath/.git/config [Stale file handle]
[2018-06-19 22:50:55.621752] I [dht-rebalance.c:3437:gf_defrag_process_dir] 0-$vol-dht: Migration operation on dir /$somepath/.git took 0.03 secs
```

Similar, but different. In February we migrated from 3.8 to 3.12, so perhaps the two versions treated the same condition slightly differently.
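For anyone hitting similar messages, one way to list the affected paths is to filter the rebalance log for the MSGID 109023 lines quoted above. This is a minimal sketch, not an official tool; the log location is an assumption, so adjust it for your deployment (and use `zgrep` for rotated `.gz` logs):

```shell
#!/bin/bash
# Minimal sketch: list unique paths whose migration failed with
# "Stale file handle" (MSGID 109023) in a GlusterFS rebalance log.
failed_migrations() {  # $1 = path to rebalance log
  grep 'MSGID: 109023' "$1" \
    | grep 'Stale file handle' \
    | sed -n 's/.*Migrate file failed: \(.*\) lookup failed.*/\1/p' \
    | sort -u
}

# Example (assumed log location, substitute your volume name):
# failed_migrations /var/log/glusterfs/myvol-rebalance.log
```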
Searching the rebalance log for one of those files showed the following (both before and after the upgrade to 3.12):

```
$ zgrep "$somepath/.git/config" $vol-rebalance.log.5
[2018-05-01 12:01:00.633768] W [MSGID: 109009] [dht-common.c:2210:dht_lookup_linkfile_cbk] 0-$vol-dht: /$somepath/.git/config: gfid different on data file on $vol-client-22, gfid local = f61909e4-cfbf-4802-af00-70ffb9e79d28, gfid node = f61909e4-cfbf-4802-af00-70ffb9e79d28
[2018-05-01 12:01:00.634506] W [MSGID: 109009] [dht-common.c:1949:dht_lookup_everywhere_cbk] 0-$vol-dht: /$somepath/.git/config: gfid differs on subvolume $vol-client-22, gfid local = 070f37a1-de0c-439b-97dd-ce5311988737, gfid node = f61909e4-cfbf-4802-af00-70ffb9e79d28
[2018-05-01 12:01:00.645447] E [MSGID: 109023] [dht-rebalance.c:2669:gf_defrag_migrate_single_file] 0-$vol-dht: Migrate file failed: /$somepath/.git/config lookup failed [Stale file handle]
```

It looks like the file somehow has two different GFIDs. dht_lookup_linkfile_cbk actually reports identical GFIDs, but dht_lookup_everywhere_cbk finds two different ones. Could that be the reason for the stale file handles? And if yes, why does this happen (and why only to .git files), and how can I fix it?
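When comparing GFIDs across subvolumes directly on the bricks, `getfattr -n trusted.gfid -e hex` prints the GFID as `0x` followed by 32 hex digits, while the mount reports `glusterfs.gfid.string` in UUID form. A small helper (a sketch, assuming bash) can convert between the two so the values can be compared:

```shell
#!/bin/bash
# Minimal sketch: convert the hex form of trusted.gfid (as printed by
# `getfattr -n trusted.gfid -e hex` on a brick) into the canonical UUID
# string, for comparison with glusterfs.gfid.string from the mount.
gfid_hex_to_uuid() {
  local h="${1#0x}"  # strip the 0x prefix
  printf '%s-%s-%s-%s-%s\n' \
    "${h:0:8}" "${h:8:4}" "${h:12:4}" "${h:16:4}" "${h:20:12}"
}

# Example, using the GFID from the log above:
# gfid_hex_to_uuid 0xf61909e4cfbf4802af0070ffb9e79d28
# -> f61909e4-cfbf-4802-af00-70ffb9e79d28
```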
Additional info: I dug around a bit with the GFIDs, with the following result (these are the GFIDs found in the log file above):

```
# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/.gfid/f61909e4-cfbf-4802-af00-70ffb9e79d28
# file: mnt/.gfid/f61909e4-cfbf-4802-af00-70ffb9e79d28
trusted.glusterfs.pathinfo="(<DISTRIBUTE:$vol-dht> <POSIX(/srv/glusterfs/bricks/DATA109/data):gluster01:/srv/glusterfs/bricks/DATA109/data/.glusterfs/f6/19/f61909e4-cfbf-4802-af00-70ffb9e79d28>)"

# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/.gfid/070f37a1-de0c-439b-97dd-ce5311988737
getfattr: mnt/.gfid/070f37a1-de0c-439b-97dd-ce5311988737: No such file or directory

# getfattr -n glusterfs.gfid.string /mnt/$somepath
# file: mnt/$somepath
glusterfs.gfid.string="1d4a5ccc-986f-4051-aee9-d31dd2ca3775"

# getfattr -n glusterfs.gfid.string /mnt/$somepath/.git
# file: mnt/$somepath/.git
glusterfs.gfid.string="ca2dfd69-7466-4949-911a-b8874c364987"

# getfattr -n glusterfs.gfid.string /mnt/$somepath/.git/config
getfattr: /mnt/$somepath/.git/config: Stale file handle

# cat /mnt/$somepath/.git/config
cat: /mnt/$somepath/.git/config: Stale file handle

# cd /mnt/$somepath/.git/
/mnt/$somepath/.git# ls
branches  COMMIT_EDITMSG  config  description  FETCH_HEAD  HEAD  hooks  index  info  logs  objects  ORIG_HEAD  packed-refs  refs

/mnt/$somepath/.git# cat config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
...
[branch "master"]
        remote = origin
        merge = refs/heads/master

/mnt/$somepath/.git# getfattr -n glusterfs.gfid.string /mnt/$somepath/.git/config
# file: mnt/$somepath/.git/config
glusterfs.gfid.string="f61909e4-cfbf-4802-af00-70ffb9e79d28"

/mnt/$somepath/.git# cd
# getfattr -n glusterfs.gfid.string /mnt/$somepath/.git/config
# file: mnt/$somepath/.git/config
glusterfs.gfid.string="f61909e4-cfbf-4802-af00-70ffb9e79d28"

# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/.gfid/f61909e4-cfbf-4802-af00-70ffb9e79d28
# file: mnt/.gfid/f61909e4-cfbf-4802-af00-70ffb9e79d28
trusted.glusterfs.pathinfo="(<DISTRIBUTE:$vol-dht> <POSIX(/srv/glusterfs/bricks/DATA109/data):gluster01:/srv/glusterfs/bricks/DATA109/data/$somepath/.git/config>)"
```

It seems that cat-ing the file via a relative path fixes the GFID.

So I went and looked where those two GFIDs come from. All those config files have a linkfile, and this linkfile has a different GFID than the original. I checked a few other linkfiles and they seem to have the same GFID as the original file.

Can I fix my broken files? And if yes, how?

> Can I fix my broken files? And if yes, how?
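To check whether other files are affected, the linkfiles on a brick can be enumerated directly. As I understand it, DHT linkfiles are zero-byte files whose mode is exactly `---------T` (only the sticky bit set) and which carry a `trusted.glusterfs.dht.linkto` xattr naming the subvolume that holds the data. A minimal sketch (the brick path in the example is taken from the pathinfo output above and is only illustrative):

```shell
#!/bin/bash
# Minimal sketch: list DHT linkfile candidates on a brick, i.e.
# zero-byte files with mode exactly 1000 (sticky bit only), skipping
# the .glusterfs backend directory.
find_linkfile_candidates() {  # $1 = brick root
  find "$1" -path "$1/.glusterfs" -prune -o \
       -type f -size 0 -perm 1000 -print
}

# Example (illustrative brick path). Each candidate can then be
# confirmed with:
#   getfattr -n trusted.glusterfs.dht.linkto -e text <file>
# find_linkfile_candidates /srv/glusterfs/bricks/DATA109/data
```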
I recommend removing the linkfile. Linkfiles are auto-created by glusterfs when needed.
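A note on cleanup, based on my understanding rather than anything stated in this report: every file on a brick also has a hardlink under `.glusterfs/<aa>/<bb>/<gfid>` (visible in the pathinfo output above), so when a linkfile is removed directly on a brick, the matching `.glusterfs` entry should be considered too, since both names reference the same inode. A small helper to compute that backend path:

```shell
#!/bin/bash
# Minimal sketch: build the .glusterfs backend path for a GFID,
# matching the layout seen in trusted.glusterfs.pathinfo, where the
# first two path components are the first two byte-pairs of the GFID.
gfid_backend_path() {  # $1 = brick root, $2 = GFID (UUID string)
  printf '%s/.glusterfs/%s/%s/%s\n' "$1" "${2:0:2}" "${2:2:2}" "$2"
}

# Example with the GFID from the pathinfo output above:
# gfid_backend_path /srv/glusterfs/bricks/DATA109/data f61909e4-cfbf-4802-af00-70ffb9e79d28
# -> /srv/glusterfs/bricks/DATA109/data/.glusterfs/f6/19/f61909e4-cfbf-4802-af00-70ffb9e79d28
```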
Are you still seeing this problem?

Unknown. The previously "stale'd" files were still stale'd. Removing the linkfile fixed them, thanks for the tip. I checked the GFIDs; they now match.

Our users didn't report another stale file handle, which doesn't necessarily mean that there isn't one. The only way I know to make sure there isn't any is a rebalance, and there hasn't been a need for one in a while.

We'll switch from a pure distributed to a distributed-dispersed volume soon. I'm not sure whether that influences this behaviour, but if it does, this problem might solve itself for us.

(In reply to g.amedick from comment #6)

Thanks. Please let us know if you see this again. Can I close this BZ? We can reopen it if you see it again.

Hi, yes, this can be closed.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.