Bug 764659 (GLUSTER-2927) - Mismatched link/target gfid and ESTALE/ENOENT
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2927
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-05-23 19:47 UTC by Jeff Darcy
Modified: 2013-12-09 01:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
gfid.patch to xlators/protocol/client (835 bytes, patch)
2011-05-24 11:51 UTC, Jeff Darcy
rename.patch to xlators/cluster/dht (8.92 KB, patch)
2011-05-24 11:51 UTC, Jeff Darcy

Description Jeff Darcy 2011-05-23 19:47:11 UTC
I was trying to reproduce #2921 and I'm not sure if I succeeded, but I certainly did find something that seems to be in the same general vicinity.  I'll log it separately just in case it is separate, but if you decide they're related then we can treat one as a dup.

What I did, using my 3.2git build from last Friday, was set up a simple three-way distribute and create a directory with 100 files in it.  From one node (not a server) I did:

   for x in $(seq 0 99); do
      for f in file??; do
         sed -i 's/foo/bar/' $f
      done
      echo $x
   done

Simultaneously, from another node (this one was one of the servers) I did:

   for i in $(seq 0 99); do
      mount -t glusterfs localhost:test20 /mnt/test
      ls -alR /mnt/test/fu > /dev/null
      umount /mnt/test
      echo $i
   done

Within a few seconds, I'd start seeing ESTALE errors on the second node.  Once we get into this state, it's persistent.  After a mount, I get ESTALE the first time and then ENOENT thereafter, for a consistent set of files.  This remains true even through client remounts and server restarts.  Looking at the files, I see that there's one real file and one linkfile in each case (as I would expect based on "sed -i" using rename).  On the real file, I see the following xattrs:

   trusted.gfid=0x067a402005c14507a0fbef3ac4b997b3

On the linkfile:

   trusted.gfid=0x7868c6df737e48469db5d7ec77e59f30
   trusted.glusterfs.dht.linkto="test20-client-2"
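
For anyone wanting to check their own bricks for the same condition, the comparison above can be sketched as a small script. This is only a sketch and not part of the original report: the function names are made up for illustration, the paths passed in are whatever your brick layout uses, and it assumes getfattr (from the attr package) is installed and run as root on the brick servers.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and any brick paths passed
# in are placeholders, not paths taken from this report.

# Print the trusted.gfid xattr of a brick-side path as hex, or nothing if
# the xattr is missing. Reading trusted.* xattrs requires root.
read_gfid() {
    getfattr --absolute-names -n trusted.gfid -e hex "$1" 2>/dev/null |
        sed -n 's/^trusted\.gfid=//p'
}

# Succeed only when both gfid strings are non-empty and identical.
gfids_match() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}

# Report whether a linkfile ($1) and its target ($2) carry the same gfid.
check_pair() {
    link_gfid=$(read_gfid "$1")
    target_gfid=$(read_gfid "$2")
    if gfids_match "$link_gfid" "$target_gfid"; then
        echo "OK $link_gfid"
    else
        echo "MISMATCH link=$link_gfid target=$target_gfid"
    fi
}
```

Running check_pair with the linkfile's brick path and the real file's brick path would report a MISMATCH for the pair shown above, since the two trusted.gfid values differ.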

Now for the real fun part.  If I disable stat-prefetch the problem *does not happen*.  I went back and forth three times to be sure.  I don't fully understand the nature of the race here, but it seems pretty easy to hit with this method.  Let me know if there's anything else I can do to help debug.
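
The report does not say how stat-prefetch was switched off for the back-and-forth test. One way to do it is sketched below, assuming the volume is named test20 (taken from the mount command above) and that the option is exposed through `gluster volume set` as performance.stat-prefetch. The function name and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: set_stat_prefetch and DRY_RUN are illustrative names.

# Toggle stat-prefetch on a volume. With DRY_RUN=1 the gluster command is
# printed instead of executed, which is handy for checking before running.
set_stat_prefetch() {
    volume=$1
    state=$2
    case $state in
        on|off) ;;
        *) echo "usage: set_stat_prefetch VOLUME on|off" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "gluster volume set $volume performance.stat-prefetch $state"
    else
        gluster volume set "$volume" performance.stat-prefetch "$state"
    fi
}
```

Calling `set_stat_prefetch test20 off` before the two loops, and `set_stat_prefetch test20 on` afterwards, would correspond to one round of the comparison described above.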

Comment 1 shishir gowda 2011-05-24 09:01:27 UTC
Hi Jeff,

This seems to be a dup of bug 764254 related to dht-rename.

We are also aware of the bug in the io-stats xlator, and disabling stat-prefetch is a workaround for the time being.

The mismatching gfid is the bug, which causes invalid args.

*** This bug has been marked as a duplicate of bug 2522 ***

Comment 2 Jeff Darcy 2011-05-24 11:51:25 UTC
Created attachment 498
The Kickstart configuration file being used (made by mkkickstart)

Comment 3 Jeff Darcy 2011-05-24 11:51:56 UTC
Created attachment 499

Comment 4 Jeff Darcy 2011-05-24 11:52:53 UTC
It's highly unfortunate that I (and other interested parties) can't view bug 764254, so it's impossible to know what progress is being made toward a fix.  During my own investigation, I discovered two things which led to the two attached patches.

(1) The client translator returns a spurious ESTALE when it detects that the GFID has changed.  As explained in the comment to the attached gfid.patch, the value we get back is authoritative and if the value we have cached in the inode differs then it should simply be updated.  Running the test procedure I've described with this patch applied generates only the transient errors I'd expect while renames are in progress, avoiding the persistent ESTALE/ENOENT errors that are the subject of this bug.  Note that the patch doesn't address the issue of the inode number also changing in this scenario, which could be the source of other problems.

(2) The DHT rename code is, to put it delicately, strange.  There seems to be little justification for the creation of extra linkfiles and hardlinks as part of a rename.  In several places the code that decides which subvolume should receive a particular request seems quite wrong, and even inconsistent with other places which should be making related decisions the same way.  The attached rename.patch provides an alternate rename path which seems much more likely to yield correct results in all of the weird src/dst/hashed/cached cases, in addition to being noticeably more efficient and (by virtue of not creating any new objects at all) verifiably incapable of creating bogus linkfiles and hardlinks that might lead to errors and/or need to be cleaned up.  I made it so that the old and new paths can coexist, chosen by translator option, so users can choose which behavior they'll get if they consider either broken.

Comment 5 Anand Avati 2011-05-31 09:11:01 UTC
PATCH: http://patches.gluster.com/patch/7241 in master (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached gfid for a path.)

Comment 6 Anand Avati 2011-05-31 13:11:33 UTC
PATCH: http://patches.gluster.com/patch/7321 in release-3.1 (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached)

Comment 7 Anand Avati 2011-05-31 13:12:54 UTC
PATCH: http://patches.gluster.com/patch/7262 in release-3.2 (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as cached gfid for a path.)

Comment 8 mohitanchlia 2011-05-31 14:03:17 UTC
(In reply to comment #7)
> PATCH: http://patches.gluster.com/patch/7262 in release-3.2
> (performance/stat-prefetch: return ESTALE if inode's gfid is not the same as
> cached gfid for a path.)

Thanks! Is this going to be part of the next release, 3.2.1?

Comment 9 Jeff Darcy 2011-05-31 15:24:16 UTC
Verified that with git-current (including Raghavendra G's commit 411aa2902d304495a4a374a09b767e588b330e88) the problem no longer occurs on my systems.

Comment 10 Lakshmipathi G 2011-06-03 06:15:45 UTC
Tested with 3.2.1.qa2 using the qa-rename scripts.

