Bug 765242 - (GLUSTER-3510) FH hash conflicts for dirents in the same directory result in ESTALEs
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Shehjar Tikoo
Depends On:
Blocks:
Reported: 2011-09-05 05:37 EDT by Shehjar Tikoo
Modified: 2015-12-01 11:45 EST
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: RTP
Mount Type: nfs
Documentation: DNR
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments

None
Description Shehjar Tikoo 2011-09-05 02:42:05 EDT
(In reply to comment #0)
> Intro Note
> ==========
> The current code in hard fh resolution takes a first-match approach, i.e.
> whichever dirent first matches either the hash or the gfid is chosen as the
> result for the next step of fh resolution. In the latter case, the next step
> is to conclude the fh resolution by returning the entry whose gfid matched.
> In the former, we choose the hash-matching dirent as the next directory to
> descend into when searching for the file to be operated upon.
> 
> 
> Problem
> =======
> When performing hard fh resolution, there can be a situation where:
> 
> o the hash of the primary entry, i.e. the entry we're looking for, and the
> hash of another sibling directory match. Note the use of "sibling": both the
> primary entry and the hash-matching directory are in the same directory, i.e.
> their filehandle.hashcount will be the same.
> 
> o the sibling directory is encountered first during the dir search.
> 
> Because of the first-match behaviour described in the Intro Note, we end up
> descending into the sibling directory, even though the correct behaviour is to
> ignore it and wait until we encounter the primary entry in the same parent
> directory.
>

Once we descend into this sibling directory, the directory-depth validation check fails, because it notices that the resolution is attempting to open a directory that is deeper in the fs tree than the file we're looking for. When this check fails, we return ESTALE. In short, a false-positive hash match results in an ESTALE being returned to SPECsfs.
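The depth check can be pictured roughly as below. This is a sketch inferred from the TRACE in Comment 8 ("Hash index is beyond: idx 3, fh idx: 2" followed by "Dir depth validation failed"); the function name validate_dirdepth_sketch and the exact comparison are assumptions, not the real nfs3_fh_resolve_validate_dirdepth code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical depth check. next_hashidx is the hash index the resolver
 * would use after descending one more level; fh_hashcount is the number of
 * hashes stored in the filehandle, i.e. the depth of the target entry. */
static int
validate_dirdepth_sketch (unsigned int next_hashidx, unsigned int fh_hashcount)
{
        assert (fh_hashcount > 0);
        /* Going past the last stored hash means we are already deeper than
         * the target can be, so the FH cannot be resolved on this path. */
        if (next_hashidx > fh_hashcount)
                return -ESTALE;
        return 0;
}
```

With the values from the trace (idx 3, fh idx 2), this returns -ESTALE, which the server then reports to the client as an invalid file handle.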

> This is not a theoretical situation. Avati and I saw this in a SPECsfs test,
> where sfs created a terabyte-sized file system for its tests. There were so
> many files in a single directory that the hashes of two entries ended up
> colliding.
Comment 1 Shehjar Tikoo 2011-09-05 05:37:50 EDT
Intro Note
==========
The current code in hard fh resolution takes a first-match approach, i.e. whichever dirent first matches either the hash or the gfid is chosen as the result for the next step of fh resolution. In the latter case, the next step is to conclude the fh resolution by returning the entry whose gfid matched. In the former, we choose the hash-matching dirent as the next directory to descend into when searching for the file to be operated upon.
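A minimal sketch of the first-match loop described above, for illustration only. The types and names (struct dirent_sketch, first_match_search, MATCH_*) are hypothetical simplifications, not the actual nfs3-helpers.c code; the sketch assumes each dirent's gfid hash is already computed and that only directories are candidates for descent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real GlusterFS types. */
typedef uint16_t hash_entry_t;

struct dirent_sketch {
        const char    *name;
        unsigned char  gfid[16];
        hash_entry_t   hash;      /* precomputed hash of gfid */
        int            is_dir;
};

enum match_kind { MATCH_NONE, MATCH_GFID, MATCH_HASH };

/* First-match search: the first dirent that matches either the target gfid
 * or the expected hash for this depth is chosen, and the scan stops there. */
static enum match_kind
first_match_search (const struct dirent_sketch *ents, size_t n,
                    const unsigned char target_gfid[16],
                    hash_entry_t expected_hash, size_t *chosen)
{
        assert (chosen != NULL);
        for (size_t i = 0; i < n; i++) {
                if (memcmp (ents[i].gfid, target_gfid, 16) == 0) {
                        *chosen = i;
                        return MATCH_GFID;   /* resolution is complete */
                }
                if (ents[i].is_dir && ents[i].hash == expected_hash) {
                        *chosen = i;
                        return MATCH_HASH;   /* descend into this dir next */
                }
        }
        return MATCH_NONE;
}
```

If a sibling directory is listed before the target and both hash to the same value, this returns MATCH_HASH for the sibling and never reaches the target, which is the false positive described under Problem.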


Problem
=======
When performing hard fh resolution, there can be a situation where:

o the hash of the primary entry, i.e. the entry we're looking for, and the hash of another sibling directory match. Note the use of "sibling": both the primary entry and the hash-matching directory are in the same directory, i.e. their filehandle.hashcount will be the same.

o the sibling directory is encountered first during the dir search.

Because of the first-match behaviour described in the Intro Note, we end up descending into the sibling directory, even though the correct behaviour is to ignore it and wait until we encounter the primary entry in the same parent directory.

This is not a theoretical situation. Avati and I saw this in a SPECsfs test, where sfs created a terabyte-sized file system for its tests. There were so many files in a single directory that the hashes of two entries ended up colliding.
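To get a feel for why a very large directory makes this likely, here is a back-of-the-envelope calculation. It assumes the per-entry hash is 16 bits wide (the forced-clash diff in Comment 7 reads exactly two bytes into an nfs3_hash_entry_t) and that hash values are uniformly distributed; both are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 16-bit per-entry hash, i.e. 65536 distinct values. */
#define HASH_SPACE 65536.0

/* Probability that at least two of n entries in one directory share a hash
 * (the birthday bound), assuming uniformly distributed hashes. */
static double
p_any_pair_collides (size_t n)
{
        double p_unique = 1.0;
        for (size_t i = 0; i < n; i++)
                p_unique *= 1.0 - (double) i / HASH_SPACE;
        return 1.0 - p_unique;
}

/* Probability that one particular entry shares its hash with at least one
 * of the other n - 1 entries in the same directory. */
static double
p_entry_collides (size_t n)
{
        double p_clear = 1.0;
        assert (n > 0);   /* n counts the entry itself */
        for (size_t i = 1; i < n; i++)
                p_clear *= 1.0 - 1.0 / HASH_SPACE;
        return 1.0 - p_clear;
}
```

Under these assumptions, some collision in a directory becomes more likely than not at roughly 300 entries, and a specific entry colliding with some sibling becomes more likely than not at roughly 45,000 entries, well within what a SPECsfs run creates.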
Comment 2 Anand Avati 2011-09-07 23:53:47 EDT
CHANGE: http://review.gluster.com/357 (Intro Note) merged in master by Anand Avati (avati@gluster.com)
Comment 3 Anand Avati 2011-09-07 23:53:57 EDT
CHANGE: http://review.gluster.com/358 (Intro Note) merged in release-3.2 by Anand Avati (avati@gluster.com)
Comment 4 Anand Avati 2011-09-07 23:54:05 EDT
CHANGE: http://review.gluster.com/359 (Intro Note) merged in release-3.1 by Anand Avati (avati@gluster.com)
Comment 5 Amar Tumballi 2011-09-14 07:55:31 EDT
Patches committed. If the issue persists, please re-open.
Comment 6 Vijaykumar 2011-09-22 07:27:45 EDT
To reproduce, I tried opening about 100,000 files at a time, but couldn't hit the bug. So I tried the following steps:

- On the mount point, I created a directory.
- Then, on the backend, I made about ten copies of that directory with
   "cp -a", so that all the directories would have the same gfid.
- Then I unmounted the mount point and killed the NFS server.
- I started the server again and remounted the same mount point.
- Then, in one of the directories I had created with the same gfid, I copied /etc/passwd.
- Then I ran "tail -f passwd" so that the file would stay open all the time.
- Later, I killed the server process and started it again.

- But I couldn't reproduce the bug in 3.2.4qa2, or even in 3.2.0.

- Krishna is now working on verifying this bug.
Comment 7 Krishna Srinivas 2011-09-22 08:17:09 EDT
Hey Guys,

I forced a hash clash with the following change, and we still get ESTALE on the client.

diff --git a/xlators/nfs/server/src/nfs-fops.c b/xlators/nfs/server/src/nfs-fops.c
index 95a657a..2457fcd 100644
--- a/xlators/nfs/server/src/nfs-fops.c
+++ b/xlators/nfs/server/src/nfs-fops.c
@@ -235,10 +235,13 @@ nfs_gfid_dict (inode_t *inode)
         dict_t  *dictgfid = NULL;
         int     ret = -1;
         uuid_t  rootgfid = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1};
+        char *tmp = NULL;
 
         dyngfid = GF_CALLOC (1, sizeof (uuid_t), gf_common_mt_char);
         uuid_generate (newgfid);
-
+        tmp = (char *) newgfid;
+        tmp[14]=1;
+        tmp[15]=2;
         if (uuid_compare (inode->gfid, rootgfid) == 0)
                 memcpy (dyngfid, rootgfid, sizeof (uuid_t));
         else
diff --git a/xlators/nfs/server/src/nfs3-fh.c b/xlators/nfs/server/src/nfs3-fh.c
index 9aea881..68bf46e 100644
--- a/xlators/nfs/server/src/nfs3-fh.c
+++ b/xlators/nfs/server/src/nfs3-fh.c
@@ -128,7 +128,8 @@ nfs3_fh_hash_entry (uuid_t gfid)
         nfs3_hash_entry_t       genls23b = 0;
 
         memcpy (&ino, &gfid[8], 8);
-        hash = ino;
+        hash = *(nfs3_hash_entry_t*)&gfid[14];
+        return hash;
         while (shiftsize != 0) {
                 hash ^= (ino >> shiftsize);
                 shiftsize -= 16;
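For reference, the hash being forced above can be sketched as follows. fold_hash mirrors the unpatched loop visible in the diff context (copy gfid[8..15] into a 64-bit value, then XOR its 16-bit pieces together); the initial shiftsize of 48 and the 16-bit hash type are assumptions, since the diff does not show those lines. forced_hash spells out what the patched line computes on a little-endian host such as this bug's x86_64 box: with gfid[14] = 1 and gfid[15] = 2 it yields 0x0201 = 513, the value every test.N entry reports in the TRACE of Comment 8, consistent with all of those gfids ending in 0102. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t hash_entry_t;   /* assumption: 16-bit, as the diff implies */

/* Sketch of the unpatched fold: XOR the four 16-bit chunks of gfid[8..15]. */
static hash_entry_t
fold_hash (const unsigned char gfid[16])
{
        uint64_t      ino = 0;
        hash_entry_t  hash = 0;
        int           shiftsize = 48;   /* assumption: not shown in the diff */

        assert (gfid != NULL);
        memcpy (&ino, &gfid[8], 8);
        hash = (hash_entry_t) ino;
        while (shiftsize != 0) {
                hash ^= (hash_entry_t) (ino >> shiftsize);
                shiftsize -= 16;
        }
        return hash;
}

/* What the forced-clash line computes on a little-endian host: the last
 * two gfid bytes read as a 16-bit integer. */
static hash_entry_t
forced_hash (const unsigned char gfid[16])
{
        assert (gfid != NULL);
        return (hash_entry_t) (gfid[14] | (gfid[15] << 8));
}
```

Because XOR is commutative, two gfids whose 16-bit chunks are merely rearranged already fold to the same hash, so collisions do not need a forced patch; they just become likely in large directories.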
Comment 8 Krishna Srinivas 2011-09-22 08:20:31 EDT
Here is the TRACE log. Note that it is trying to do hard resolution for the /test.10/passwd file, but it fails after checking in /test.0.


[2011-09-22 20:38:52.548033] T [nfs3-helpers.c:3089:nfs3_fh_resolve_root_lookup_cbk] 0-nfs-nfsv3: Root looked up: /
[2011-09-22 20:38:52.548075] T [nfs3-helpers.c:3017:nfs3_fh_resolve_inode] 0-nfs-nfsv3: FH needs inode resolution
[2011-09-22 20:38:52.548129] T [nfs3-helpers.c:2946:nfs3_fh_resolve_inode_hard] 0-nfs-nfsv3: FH hard resolution for: gfid 0xdbde1577-a93b-47ff-aeb6-6f7eb8c60102, hashcount: 2, current hashidx 1
[2011-09-22 20:38:52.548175] T [nfs3-helpers.c:2951:nfs3_fh_resolve_inode_hard] 0-nfs-nfsv3: Dir will be opened: /
[2011-09-22 20:38:52.548236] T [nfs-fops.c:519:nfs_fop_opendir] 0-nfs: Opendir: /
[2011-09-22 20:38:52.548342] T [nfs3-helpers.c:2652:nfs3_fh_resolve_opendir_cbk] 0-nfs-nfsv3: Reading directory: /
[2011-09-22 20:38:52.548383] T [nfs3-helpers.c:2664:nfs3_fh_resolve_opendir_cbk] 0-nfs-nfsv3: resolve new fd refed: 0x7f378df2802c, ref: 1
[2011-09-22 20:38:52.548423] T [nfs-fops.c:608:nfs_fop_readdirp] 0-nfs: readdir
[2011-09-22 20:38:52.548854] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.1, gfid: 62321139-de74-4fe9-8062-2165dbbd0102
[2011-09-22 20:38:52.548900] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.1: 513, hashidx: 1
[2011-09-22 20:38:52.548945] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.12, gfid: fda5e19e-0bb8-4a28-b588-6bf689dc0102
[2011-09-22 20:38:52.548983] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.12: 513, hashidx: 1
[2011-09-22 20:38:52.549026] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.7, gfid: cff37b2c-e9ae-40ef-9d63-bc4ec7e60102
[2011-09-22 20:38:52.549063] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.7: 513, hashidx: 1
[2011-09-22 20:38:52.549106] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.5, gfid: e685e437-0e5b-4920-a5af-486108640102
[2011-09-22 20:38:52.549152] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.5: 513, hashidx: 1
[2011-09-22 20:38:52.549196] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.16, gfid: ee52ba35-5532-4583-8176-41ad14e00102
[2011-09-22 20:38:52.549259] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.16: 513, hashidx: 1
[2011-09-22 20:38:52.549301] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.8, gfid: 10f20c28-e7d7-462a-bb48-62704cd80102
[2011-09-22 20:38:52.549337] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.8: 513, hashidx: 1
[2011-09-22 20:38:52.549378] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.19, gfid: 1fc2f5fd-ba74-4bea-9faa-71c0582a0102
[2011-09-22 20:38:52.549413] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.19: 513, hashidx: 1
[2011-09-22 20:38:52.549454] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: .., gfid: 00000000-0000-0000-0000-000000000001
[2011-09-22 20:38:52.549494] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.9, gfid: e62ea23a-17fe-4207-9720-19ef212e0102
[2011-09-22 20:38:52.549530] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.9: 513, hashidx: 1
[2011-09-22 20:38:52.549571] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.20, gfid: 8c0cf7a7-7a6e-4a17-b1c0-d8129dbc0102
[2011-09-22 20:38:52.549606] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.20: 513, hashidx: 1
[2011-09-22 20:38:52.549647] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.15, gfid: 98cc87bb-4e75-45d4-9114-d493a3e00102
[2011-09-22 20:38:52.549682] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.15: 513, hashidx: 1
[2011-09-22 20:38:52.549723] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.11, gfid: b5cb2954-e17d-4000-9f51-fb8ff1cb0102
[2011-09-22 20:38:52.549758] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.11: 513, hashidx: 1
[2011-09-22 20:38:52.549799] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.14, gfid: 44ea089a-c49a-4779-a782-08a315930102
[2011-09-22 20:38:52.549834] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.14: 513, hashidx: 1
[2011-09-22 20:38:52.549875] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.6, gfid: ab78d498-6893-4316-92cf-a83b21e00102
[2011-09-22 20:38:52.549910] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.6: 513, hashidx: 1
[2011-09-22 20:38:52.549951] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.2, gfid: e049e1f5-09b6-4e3d-92b0-56af74590102
[2011-09-22 20:38:52.549987] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.2: 513, hashidx: 1
[2011-09-22 20:38:52.550028] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.13, gfid: d936ccec-966f-4484-b4ea-0c2462730102
[2011-09-22 20:38:52.550063] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.13: 513, hashidx: 1
[2011-09-22 20:38:52.550104] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.3, gfid: 07ca6084-e2b1-466d-98b1-635c63590102
[2011-09-22 20:38:52.550140] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.3: 513, hashidx: 1
[2011-09-22 20:38:52.550181] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.18, gfid: 2ef5041c-ae19-4c13-ac63-7592df0b0102
[2011-09-22 20:38:52.550216] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.18: 513, hashidx: 1
[2011-09-22 20:38:52.550257] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.4, gfid: 0256856a-2b0b-4a8d-87b8-9db7011c0102
[2011-09-22 20:38:52.550301] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.4: 513, hashidx: 1
[2011-09-22 20:38:52.550356] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: ., gfid: 00000000-0000-0000-0000-000000000001
[2011-09-22 20:38:52.550397] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.10, gfid: fe330bfe-35ea-46c4-a967-4af40d860102
[2011-09-22 20:38:52.550433] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.10: 513, hashidx: 1
[2011-09-22 20:38:52.550473] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.17, gfid: 47706155-78a2-48d7-be3e-546a82ac0102
[2011-09-22 20:38:52.550509] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.17: 513, hashidx: 1
[2011-09-22 20:38:52.550550] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: test.0, gfid: 5ab981a0-3c81-46ae-a19d-e6c2d0480102
[2011-09-22 20:38:52.550585] T [nfs3-helpers.c:2470:nfs3_fh_resolve_check_entry] 0-nfs-nfsv3: Found hash match: test.0: 513, hashidx: 1
[2011-09-22 20:38:52.550622] T [nfs-fops.c:608:nfs_fop_readdirp] 0-nfs: readdir
[2011-09-22 20:38:52.550684] T [nfs3-helpers.c:2906:nfs3_fh_resolve_readdir_cbk] 0-nfs-nfsv3: Directory read done: /: Success
[2011-09-22 20:38:52.550722] T [nfs3-helpers.c:2841:nfs3_fh_resolve_check_response] 0-nfs-nfsv3: resolve fd unrefing: 0x7f378df2802c, ref: 1
[2011-09-22 20:38:52.550785] T [nfs3-helpers.c:2769:nfs3_fh_resolve_dir_hard] 0-nfs-nfsv3: FH hard dir resolution: gfid: 00000000-0000-0000-0000-000000000001, entry: test.0, next hashcount: 2
[2011-09-22 20:38:52.550812] T [posix-helpers.c:738:posix_janitor_thread_proc] 0-delposix: janitor: closing dir fd=0x1cf3a90
[2011-09-22 20:38:52.550840] T [nfs3-helpers.c:2780:nfs3_fh_resolve_dir_hard] 0-nfs-nfsv3: Dir needs lookup: /test.0
[2011-09-22 20:38:52.550935] T [nfs-fops.c:336:nfs_fop_lookup] 0-nfs: Lookup: /test.0
[2011-09-22 20:38:52.551085] T [nfs3-helpers.c:2693:nfs3_fh_resolve_dir_lookup_cbk] 0-nfs-nfsv3: Dir will be opened: /test.0
[2011-09-22 20:38:52.551133] T [nfs-fops.c:519:nfs_fop_opendir] 0-nfs: Opendir: /test.0
[2011-09-22 20:38:52.551190] T [nfs3-helpers.c:2652:nfs3_fh_resolve_opendir_cbk] 0-nfs-nfsv3: Reading directory: /test.0
[2011-09-22 20:38:52.551227] T [nfs3-helpers.c:2664:nfs3_fh_resolve_opendir_cbk] 0-nfs-nfsv3: resolve new fd refed: 0x7f378df2802c, ref: 1
[2011-09-22 20:38:52.551263] T [nfs-fops.c:608:nfs_fop_readdirp] 0-nfs: readdir
[2011-09-22 20:38:52.551375] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: .., gfid: 00000000-0000-0000-0000-000000000001
[2011-09-22 20:38:52.551425] T [nfs3-helpers.c:2878:nfs3_fh_resolve_search_dir] 0-nfs-nfsv3: Candidate: ., gfid: 5ab981a0-3c81-46ae-a19d-e6c2d0480102
[2011-09-22 20:38:52.551462] T [nfs-fops.c:608:nfs_fop_readdirp] 0-nfs: readdir
[2011-09-22 20:38:52.551522] T [nfs3-helpers.c:2906:nfs3_fh_resolve_readdir_cbk] 0-nfs-nfsv3: Directory read done: /test.0: Success
[2011-09-22 20:38:52.551562] T [nfs3-helpers.c:2841:nfs3_fh_resolve_check_response] 0-nfs-nfsv3: resolve fd unrefing: 0x7f378df2802c, ref: 1
[2011-09-22 20:38:52.551612] T [nfs3-helpers.c:2730:nfs3_fh_resolve_validate_dirdepth] 0-nfs-nfsv3: Hash index is beyond: idx 3, fh idx: 2
[2011-09-22 20:38:52.551620] T [posix-helpers.c:738:posix_janitor_thread_proc] 0-delposix: janitor: closing dir fd=0x1cf3bb0
[2011-09-22 20:38:52.551661] T [nfs3-helpers.c:2760:nfs3_fh_resolve_dir_hard] 0-nfs-nfsv3: Dir depth validation failed
[2011-09-22 20:38:52.551772] E [nfs3.c:735:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: del : 00000000-0000-0000-0000-000000000000
[2011-09-22 20:38:52.551825] D [nfs3-helpers.c:2302:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7874a15e, GETATTR: NFS: 70(Invalid file handle), POSIX: 14(Bad address)
[2011-09-22 20:38:52.551920] T [rpcsvc.c:1030:rpcsvc_submit_generic] 0-rpc-service: Tx message: 4
[2011-09-22 20:38:52.551980] T [rpcsvc.c:667:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 28, payload: 4, rpc hdr: 24
[2011-09-22 20:38:52.552164] T [rpcsvc.c:1067:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x2020909406x, Program: NFS3, ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2011-09-22 20:43:53.388855] D [socket.c:193:__socket_rwv] 0-socket.nfs-server: EOF from peer 127.0.0.1:939
[2011-09-22 20:43:53.388984] D [socket.c:1796:socket_event_handler] 0-transport: disconnecting now
[2011-09-22 20:43:53.389133] T [socket.c:2703:fini] 0-socket.nfs-server: transport 0x1cf3680 destroyed
Comment 9 Krishna Srinivas 2011-09-22 08:21:53 EDT
Shehjar, also check the comment on the patch at http://review.gluster.com/#change,357
Comment 10 Vijay Bellur 2011-09-23 06:57:55 EDT
Re-opening this as more patches are needed.
Comment 11 Krishna Srinivas 2011-09-23 07:31:25 EDT
Another way to explain the bug: when a directory contains a lot of subdirectories and files, hard resolution of an FH may recurse into one of the subdirectories just because its hash matches. The fix makes sure that all of the directory's contents are read, confirming that no entry matches the gfid, before recursing into the subdirectory whose hash matched.
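A minimal sketch of that approach, using hypothetical, simplified types (struct dirent_sketch, full_scan_search and MATCH_* are illustrative names, not the actual patch): every entry in the directory is examined, a gfid match always wins, and only if the whole directory has been read without one does the resolver fall back to a remembered hash match. Remembering only the last hash match follows the description in Comment 14.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t hash_entry_t;   /* hypothetical 16-bit hash type */

struct dirent_sketch {
        const char    *name;
        unsigned char  gfid[16];
        hash_entry_t   hash;      /* precomputed hash of gfid */
        int            is_dir;
};

enum match_kind { MATCH_NONE, MATCH_GFID, MATCH_HASH };

/* Deferred decision: a hash match is only remembered; we descend into it
 * only after the full directory has been read without a gfid match. */
static enum match_kind
full_scan_search (const struct dirent_sketch *ents, size_t n,
                  const unsigned char target_gfid[16],
                  hash_entry_t expected_hash, size_t *chosen)
{
        int     have_hash_match = 0;
        size_t  hash_match = 0;

        assert (chosen != NULL);
        for (size_t i = 0; i < n; i++) {
                if (memcmp (ents[i].gfid, target_gfid, 16) == 0) {
                        *chosen = i;
                        return MATCH_GFID;   /* the real entry always wins */
                }
                if (ents[i].is_dir && ents[i].hash == expected_hash) {
                        hash_match = i;      /* remember, keep reading */
                        have_hash_match = 1;
                }
        }
        if (have_hash_match) {
                *chosen = hash_match;        /* last hash match only */
                return MATCH_HASH;
        }
        return MATCH_NONE;
}
```

With a hash-matching sibling listed before the target, this now returns the target; the fallback only kicks in when the target is genuinely deeper in the tree.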
Comment 12 Shehjar Tikoo 2011-10-07 00:07:13 EDT
(In reply to comment #6)
> To reproduce, I tried opening about 100,000 files at a time, but couldn't hit
> the bug. So I tried the following steps:
> 
> - On the mount point, I created a directory.
> - Then, on the backend, I made about ten copies of that directory with
>    "cp -a", so that all the directories would have the same gfid.
> - Then I unmounted the mount point and killed the NFS server.
> - I started the server again and remounted the same mount point.


Re-mounting makes this test pointless. After a remount all files and directories will be looked up fresh.

Leave this test for now. I'll try to come up with a test case. The problem is that reproducing this bug requires the entries of a directory to be stored in a very particular order, and it is impossible to force the kernel or the filesystem to store entries on disk in a user-defined order.

> - Then, in one of the directories I had created with the same gfid, I copied
> /etc/passwd.
> - Then I ran "tail -f passwd" so that the file would stay open all the time.
> - Later, I killed the server process and started it again.
> 
> - But I couldn't reproduce the bug in 3.2.4qa2, or even in 3.2.0.
> 
> - Krishna is now working on verifying this bug.
Comment 13 Krishna Srinivas 2011-10-07 00:11:38 EDT
> 
> Leave this test for now. I'll try to come up with a test case. The problem is
> that reproducing this bug requires the entries of a directory to be stored in
> a very particular order, and it is impossible to force the kernel or the
> filesystem to store entries on disk in a user-defined order.

Shehjar, check the diff pasted earlier in this bug (on 2011-09-22 08:17:09 IST); it was able to force a situation where the gfid hashes clash.
Comment 14 Shehjar Tikoo 2011-10-07 00:43:17 EDT
The patch that fixes this problem assumes that there will be only a single
hash conflict, and hence stores only the last hash-matching entry. With the
patch posted by Kris, the problem is that multiple entries end up matching the
hash. Since only the last hash-matching entry is stored, we end up descending
into that entry, but because it is a false positive, we cannot do anything but
return ESTALE once we realize that it was a false positive.

The reason we only account for a single hash match is that we can never know
which of the many hash-matching entries to choose. The patch I sent accounts
only for the situation where the directory contains both the entry we're
looking for and a directory with a matching hash, and we run into the
matching-hash directory first.

This bug can be closed with the understanding that the hash-matching scheme
reaches its limits when there are multiple hash matches.

This will remain a problem in the 3.1 and 3.2 branches, while master will
soon have a fix based on hard-link maintenance in posix.
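The limitation can be replayed against the readdir order recorded in the TRACE of Comment 8. In this sketch (hypothetical names), every test.N directory is treated as a hash match, since the forced-clash diff made them all hash to 513, while "." and ".." are not. Keeping only the last hash match selects test.0, exactly the directory the trace descends into, while the file being resolved lives under test.10; the other 20 candidates are silently discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Readdir order of "/" copied from the TRACE in Comment 8. */
static const char *const readdir_order[] = {
        "test.1", "test.12", "test.7", "test.5", "test.16", "test.8",
        "test.19", "..", "test.9", "test.20", "test.15", "test.11",
        "test.14", "test.6", "test.2", "test.13", "test.3", "test.18",
        "test.4", ".", "test.10", "test.17", "test.0",
};

/* Simplification matching the trace: every name except "." and ".." is a
 * hash match. */
static int
is_hash_match (const char *name)
{
        return strcmp (name, ".") != 0 && strcmp (name, "..") != 0;
}

/* Single-slot strategy: remember only the last hash match and count how
 * many hash-matching candidates were seen in total. */
static const char *
last_hash_match (const char *const *names, size_t n, size_t *nmatches)
{
        const char *last = NULL;

        assert (nmatches != NULL);
        *nmatches = 0;
        for (size_t i = 0; i < n; i++) {
                if (is_hash_match (names[i])) {
                        last = names[i];
                        (*nmatches)++;
                }
        }
        return last;
}
```

Choosing any other single candidate (first, last, or otherwise) has the same problem, which is why the scheme cannot do better than ESTALE once more than one entry matches.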

(In reply to comment #8)
> Here is the TRACE, note that it is trying to do hard resolution for
> /test.10/passwd file, but it fails after checking in /test.0
