Bug 764659 (GLUSTER-2927)

Summary: Mismatched link/target gfid and ESTALE/ENOENT
Product: [Community] GlusterFS
Reporter: Jeff Darcy <jdarcy>
Component: distribute
Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
CC: gluster-bugs, joe, lakshmipathi, mohitanchlia, nsathyan
Hardware: x86_64
OS: Linux

Doc Type: Bug Fix

Attachments:

gfid.patch to xlators/protocol/client (flags: none)
rename.patch to xlators/cluster/dht (flags: none)

Description Jeff Darcy 2011-05-23 19:47:11 UTC

I was trying to reproduce #2921 and I'm not sure if I succeeded, but I certainly did find something that seems to be in the same general vicinity. I'll log it separately just in case it is separate, but if you decide they're related then we can treat one as a dup.

What I did, using my 3.2git build from last Friday, was set up a simple three-way distribute and create a directory with 100 files in it. From one node (not a server) I did:

for x in $(seq 0 99); do
    for f in file??; do
        sed -i 's/foo/bar/' "$f"
    done
    echo $x
done

Simultaneously, from another node (this one was one of the servers) I did:

for i in $(seq 0 99); do
    mount -t glusterfs localhost:test20 /mnt/test
    ls -alR /mnt/test/fu > /dev/null
    umount /mnt/test
    echo $i
done

Within a few seconds, I'd start seeing ESTALE errors on the second node. Once we get into this state, it's persistent. After a mount, I get ESTALE the first time and then ENOENT thereafter, for a consistent set of files. This remains true even through client remounts and server restarts. Looking at the files, I see that there's one real file and one linkfile in each case (as I would expect based on "sed -i" using rename). On the real file, I see the following xattrs:

trusted.gfid=0x067a402005c14507a0fbef3ac4b997b3

On the linkfile:

trusted.gfid=0x7868c6df737e48469db5d7ec77e59f30
trusted.glusterfs.dht.linkto="test20-client-2"
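
The comparison above (real file versus linkfile gfid) can be scripted. The following is a minimal sketch, not a tool shipped with GlusterFS: the helper names gfid_from_dump and compare_gfids, the brick paths, and the file name file07 are all hypothetical. It assumes getfattr is run as root directly on each server's brick directory, since trusted.* xattrs are not visible to ordinary users, and that the real file and the linkfile live on different bricks, so the two values have to be collected separately.

```shell
#!/bin/sh
# Hypothetical helper: read `getfattr -d -m . -e hex FILE` output on stdin
# and print only the value of trusted.gfid (empty output if it is absent).
gfid_from_dump() {
    sed -n 's/^trusted\.gfid=//p'
}

# Hypothetical helper: compare two gfid strings.
# Prints "match" and returns 0, or prints a MISMATCH line and returns 1.
compare_gfids() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH: $1 vs $2"
        return 1
    fi
}

# Example usage (brick paths and file name are placeholders):
#   on the server holding the real file:
#     real=$(getfattr -d -m . -e hex /bricks/b3/fu/file07 2>/dev/null | gfid_from_dump)
#   on the server holding the linkfile:
#     link=$(getfattr -d -m . -e hex /bricks/b1/fu/file07 2>/dev/null | gfid_from_dump)
#   then, with both values in hand:
#     compare_gfids "$real" "$link"
```

With the two values quoted in this report, compare_gfids would report a mismatch, which is the condition this bug is about.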

Now for the real fun part. If I disable stat-prefetch the problem *does not happen*. I went back and forth three times to be sure. I don't fully understand the nature of the race here, but it seems pretty easy to hit with this method. Let me know if there's anything else I can do to help debug.

Comment 1 shishir gowda 2011-05-24 09:01:27 UTC

Hi Jeff,

This seems to be a dup of bug 764254 related to dht-rename.

We are also aware of the bug in the io-stats xlator, and disabling stat-prefetch is a workaround for the time being.

The mismatching gfid is the bug, which causes invalid args.
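
For reference, the stat-prefetch workaround mentioned above might look like the following. This is a sketch under assumptions: it presumes the volume is named test20 (taken from the mount command in the description), that it is managed through the gluster CLI, and that the performance.stat-prefetch option is exposed by the build in use. With hand-written volfiles the equivalent would be removing the performance/stat-prefetch translator from the client volfile.

```shell
# Assumption: volume "test20" is managed by glusterd and this option exists in the build.
# Disable stat-prefetch for the volume:
gluster volume set test20 performance.stat-prefetch off

# Re-enable it once a fixed build is installed:
gluster volume set test20 performance.stat-prefetch on
```

Clients may need to remount for the change to take effect, depending on the version.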

*** This bug has been marked as a duplicate of bug 2522 ***

Comment 2 Jeff Darcy 2011-05-24 11:51:25 UTC

Created attachment 498
The Kickstart configuration file being used (made by mkkickstart)

Comment 3 Jeff Darcy 2011-05-24 11:51:56 UTC

Created attachment 499

Comment 4 Jeff Darcy 2011-05-24 11:52:53 UTC

It's highly unfortunate that I (and other interested parties) can't view bug 764254, so it's impossible to know what progress is being made toward a fix. During my own investigation, I discovered two things which led to the two attached patches.

(1) The client translator returns a spurious ESTALE when it detects that the GFID has changed. As explained in the comment to the attached gfid.patch, the value we get back is authoritative and if the value we have cached in the inode differs then it should simply be updated. Running the test procedure I've described with this patch applied generates only the transient errors I'd expect while renames are in progress, avoiding the persistent ESTALE/ENOENT errors that are the subject of this bug. Note that the patch doesn't address the issue of the inode number also changing in this scenario, which could be the source of other problems.

(2) The DHT rename code is, to put it delicately, strange. There seems to be little justification for the creation of extra linkfiles and hardlinks as part of a rename. In several places the code that decides which subvolume should receive a particular request seems quite wrong, and even inconsistent with other places which should be making related decisions the same way. The attached rename.patch provides an alternate rename path which seems much more likely to yield correct results in all of the weird src/dst/hashed/cached cases, in addition to being noticeably more efficient and (by virtue of not creating any new objects at all) verifiably incapable of creating bogus linkfiles and hardlinks that might lead to errors and/or need to be cleaned up. I made it so that the old and new paths can coexist, chosen by translator option, so users can choose which behavior they'll get if they consider either broken.
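
The policy argued for in (1) can be illustrated with a toy model. This is emphatically not the code from gfid.patch (which changes the C client translator); reconcile_gfid is a hypothetical name, and the sketch only shows the decision being described: when the server's answer differs from the cached gfid, adopt the server's value rather than failing with ESTALE.

```shell
#!/bin/sh
# Toy model only; not the code from gfid.patch.
# $1: gfid currently cached for the inode (may be empty)
# $2: gfid returned by the server, treated as authoritative
# Prints the gfid the client should hold afterwards. A replacement is
# noted on stderr instead of being turned into an error.
reconcile_gfid() {
    cached=$1
    authoritative=$2
    if [ -n "$cached" ] && [ "$cached" != "$authoritative" ]; then
        echo "cached gfid $cached replaced by $authoritative" >&2
    fi
    printf '%s\n' "$authoritative"
}

# Example usage (gfid strings are arbitrary sample values):
#   current=$(reconcile_gfid "$current" "$gfid_from_server")
```

The design point is that the function never fails on a mismatch: the stale cached value is simply overwritten, so a rename that changes the gfid produces at most a transient inconsistency.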

Comment 5 Anand Avati 2011-05-31 09:11:01 UTC

PATCH: http://patches.gluster.com/patch/7241 in master (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached gfid for a path.)

Comment 6 Anand Avati 2011-05-31 13:11:33 UTC

PATCH: http://patches.gluster.com/patch/7321 in release-3.1 (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached)

Comment 7 Anand Avati 2011-05-31 13:12:54 UTC

PATCH: http://patches.gluster.com/patch/7262 in release-3.2 (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached gfid for a path.)

Comment 8 mohitanchlia 2011-05-31 14:03:17 UTC

(In reply to comment #7)
> PATCH: http://patches.gluster.com/patch/7262 in release-3.2
> (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as
> cached gfid for a path.)

Thanks! Is this going to be part of the upcoming 3.2.1 release?

Comment 9 Jeff Darcy 2011-05-31 15:24:16 UTC

Verified that with git-current (including Raghavendra G's commit 411aa2902d304495a4a374a09b767e588b330e88) the problem no longer occurs on my systems.

Comment 10 Lakshmipathi G 2011-06-03 06:15:45 UTC

Tested with 3.2.1.qa2 using the qa-rename scripts.