| Summary: | Mismatched link/target gfid and ESTALE/ENOENT | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Jeff Darcy <jdarcy> |
| Component: | distribute | Assignee: | shishir gowda <sgowda> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | gluster-bugs, joe, lakshmipathi, mohitanchlia, nsathyan |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
|
||||||||
Hi Jeff,

This seems to be a dup of bug 764254, related to dht-rename. We are also aware of the bug in the io-stats xlator, and disabling stat-prefetch is a workaround for the time being. The mismatching gfid is the bug, which causes invalid args.

*** This bug has been marked as a duplicate of bug 2522 ***

Created attachment 498
The Kickstart configuration file being used (made by mkkickstart)

Created attachment 499

It's highly unfortunate that I (and other interested parties) can't view bug 764254, so it's impossible to know what progress is being made toward a fix. During my own investigation, I discovered two things which led to the two attached patches.

(1) The client translator returns a spurious ESTALE when it detects that the GFID has changed. As explained in the comment to the attached gfid.patch, the value we get back is authoritative, and if the value we have cached in the inode differs then it should simply be updated. Running the test procedure I've described with this patch applied generates only the transient errors I'd expect while renames are in progress, avoiding the persistent ESTALE/ENOENT errors that are the subject of this bug. Note that the patch doesn't address the issue of the inode number also changing in this scenario, which could be the source of other problems.

(2) The DHT rename code is, to put it delicately, strange. There seems to be little justification for the creation of extra linkfiles and hardlinks as part of a rename. In several places the code that decides which subvolume should receive a particular request seems quite wrong, and even inconsistent with other places which should be making related decisions the same way. The attached rename.patch provides an alternate rename path which seems much more likely to yield correct results in all of the weird src/dst/hashed/cached cases. It is also noticeably more efficient and, because it creates no new objects at all, verifiably incapable of creating bogus linkfiles and hardlinks that might lead to errors and/or need to be cleaned up. I made it so that the old and new paths can coexist, chosen by translator option, so users can choose which behavior they'll get if they consider either broken.

PATCH: http://patches.gluster.com/patch/7241 in master (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached gfid for a path.)

PATCH: http://patches.gluster.com/patch/7321 in release-3.1 (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached)

PATCH: http://patches.gluster.com/patch/7262 in release-3.2 (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached gfid for a path.)

(In reply to comment #7)
> PATCH: http://patches.gluster.com/patch/7262 in release-3.2
> (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as
> cached gfid for a path.)

Thanks! Is this going to be part of the next 3.2.1 release?

Verified that with git-current (including Raghavendra G's commit 411aa2902d304495a4a374a09b767e588b330e88) the problem no longer occurs on my systems.

Tested with 3.2.1.qa2 using the qa-rename scripts. |
I was trying to reproduce #2921 and I'm not sure if I succeeded, but I certainly did find something that seems to be in the same general vicinity. I'll log it separately just in case it is separate, but if you decide they're related then we can treat one as a dup.

What I did, using my 3.2git build from last Friday, was set up a simple three-way distribute and create a directory with 100 files in it. From one node (not a server) I did:

    for x in $(seq 0 99); do
        for f in file??; do
            sed -i 's/foo/bar/' $f
        done
        echo $x
    done

Simultaneously, from another node (this one was one of the servers) I did:

    for i in $(seq 0 99); do
        mount -t glusterfs localhost:test20 /mnt/test
        ls -alR /mnt/test/fu > /dev/null
        umount /mnt/test
        echo $i
    done

Within a few seconds, I'd start seeing ESTALE errors on the second node. Once we get into this state, it's persistent. After a mount, I get ESTALE the first time and then ENOENT thereafter, for a consistent set of files. This remains true even through client remounts and server restarts.

Looking at the files, I see that there's one real file and one linkfile in each case (as I would expect based on "sed -i" using rename). On the real file, I see the following xattrs:

    trusted.gfid=0x067a402005c14507a0fbef3ac4b997b3

On the linkfile:

    trusted.gfid=0x7868c6df737e48469db5d7ec77e59f30
    trusted.glusterfs.dht.linkto="test20-client-2

Now for the real fun part. If I disable stat-prefetch the problem *does not happen*. I went back and forth three times to be sure. I don't fully understand the nature of the race here, but it seems pretty easy to hit with this method. Let me know if there's anything else I can do to help debug.
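
For anyone who wants to check their own bricks for the same symptom, here is a rough sketch of how the real-file/linkfile gfid comparison described above could be scripted. It is not part of the original report. The helper names (`extract_gfid`, `gfids_match`) and the brick paths in the usage comment are hypothetical, and it assumes the xattrs are dumped with `getfattr -d -m . -e hex`, run as root on the servers.

```shell
# Print the trusted.gfid value found in `getfattr -d -m . -e hex` style
# output read from stdin (one "name=value" pair per line).
extract_gfid() {
    sed -n 's/^trusted\.gfid=//p'
}

# Return 0 when both xattr dumps carry the same non-empty gfid, 1 otherwise.
gfids_match() {
    local a b
    a=$(printf '%s\n' "$1" | extract_gfid)
    b=$(printf '%s\n' "$2" | extract_gfid)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Hypothetical usage; the brick paths below are made up for illustration:
#   real=$(getfattr -d -m . -e hex /bricks/b1/fu/file07 2>/dev/null)
#   link=$(getfattr -d -m . -e hex /bricks/b2/fu/file07 2>/dev/null)
#   gfids_match "$real" "$link" || echo "gfid mismatch for file07"
```

With the two xattr dumps quoted above, `gfids_match` would report a mismatch, which is the condition this bug is about.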