Bug 765444 (GLUSTER-3712)

Summary: Spurious "Stale NFS file handle" when executing find -noleaf -print0 | xargs --null stat >/dev/null
Product: [Community] GlusterFS
Reporter: Peter Linder <peter>
Component: nfs
Assignee: Krishna Srinivas <krishna>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.2.4
CC: aavati, gluster-bugs, rme, saurabh
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix

Description Peter Linder 2011-10-11 10:29:50 EDT
Replica volume (count 2) mounted using NFS, with hundreds of thousands of files on it in various directories. The backend is ext4. The volume was created fresh using version 3.2.4, with no options changed except NFS enabled.

After running the command in the summary for a few seconds (sometimes up to a minute), anywhere from one to a few hundred lines appear with "Stale NFS file handle" error messages. Stat-ing a file "manually" immediately afterwards sometimes reproduces the problem and sometimes works fine.

The find command moves on, and after a little while a new chunk of "Stale NFS file handle" errors shows up, and so on. It does not happen to the same file every time, but some files (or all files in a directory) seem to be "favoured".

The FUSE client does not experience any problems. I tried setting the log level to "TRACE", but that produces around 1GB worth of logs before the first error messages show up (even though I abort the find command within seconds). Please advise on what I can do to help provide adequate information.

/Peter
Comment 1 Krishna Srinivas 2011-10-14 03:31:37 EDT
Peter, can you apply the following diff and see whether it produces the log messages it adds:


diff --git a/xlators/nfs/server/src/nfs3-helpers.c b/xlators/nfs/server/src/nfs3-helpers.c
index 5cdcbec..5a2ef83 100644
--- a/xlators/nfs/server/src/nfs3-helpers.c
+++ b/xlators/nfs/server/src/nfs3-helpers.c
@@ -2833,6 +2833,7 @@ nfs3_fh_resolve_check_response (nfs3_call_state_t *cs)
 
         case GF_NFS3_FHRESOLVE_DIRFOUND:
                 nfs3_fh_resolve_close_cwd (cs);
+                cs->dirfound = 0;
                 nfs3_fh_resolve_dir_hard (cs, cs->resolvedloc.inode->gfid,
                                           cs->hashmatch->d_name);
                 break;
@@ -2875,6 +2876,14 @@ nfs3_fh_resolve_search_dir (nfs3_call_state_t *cs, gf_dirent_t *entries)
                 if (ret == GF_NFS3_FHRESOLVE_FOUND)
                         cs->entrymatch = gf_dirent_for_name (candidate->d_name);
                 else if (ret == GF_NFS3_FHRESOLVE_DIRFOUND) {
+                        if (cs->dirfound) {
+                                gf_log (GF_NFS, GF_LOG_WARNING,
+                                        "dir hash clash in %s: %s and %s",
+                                        cs->resolvedloc.path,
+                                        cs->hashmatch->d_name,
+                                        candidate->d_name);
+                        }
+                        cs->dirfound = 1;
                         if (cs->hashmatch)
                                 gf_dirent_free (cs->hashmatch);
                         cs->hashmatch = gf_dirent_for_name (candidate->d_name);
diff --git a/xlators/nfs/server/src/nfs3.h b/xlators/nfs/server/src/nfs3.h
index d6f20eb..2cdf425 100644
--- a/xlators/nfs/server/src/nfs3.h
+++ b/xlators/nfs/server/src/nfs3.h
@@ -206,6 +206,7 @@ struct nfs3_local {
         gf_dirent_t             *hashmatch;
         gf_dirent_t             *entrymatch;
         off_t                   lastentryoffset;
+        int                     dirfound;
 };
 
 #define nfs3_is_revalidate_lookup(cst) ((cst)->lookuptype == GF_NFS3_REVALIDATE)
Comment 2 Peter Linder 2011-10-14 04:17:12 EDT
Sure, I will try. I installed the Debian package before, so I must first compile from source and reproduce the problem, to make sure I have the same environment.
Comment 3 Peter Linder 2011-10-15 02:11:51 EDT
No joy, I'm afraid. I recompiled with the above patch and was able to reproduce the problem, but the new log line did not show up. I did some filtering on nfs.log with log level TRACE and saw the following (some paths anonymized); I do not know if it's related. The files that experience this problem are usually many directories down in the structure, often 15 levels deep or so.

[2011-10-15 10:55:48.411114] D [rpcsvc.c:1373:nfs_rpcsvc_program_actor] 0-nfsrpc: Actor found: NFS3 - ACCESS
[2011-10-15 10:55:48.411127] D [nfs3-helpers.c:2166:nfs3_log_common_call] 0-nfs-nfsv3: XID: 2bff4d0e, ACCESS: args: FH: hashcount 8, exportid 6d2d7e6e-35fc-43aa-9c20-ba8d5900ac5e, gfid 2da2f139-8753-4f9f-aa8f-c092d3e4aa0f
[2011-10-15 10:55:48.411325] D [nfs3-helpers.c:2298:nfs3_log_common_res] 0-nfs-nfsv3: XID: 2bff4d0e, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2011-10-15 10:55:48.411534] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 2cff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 1
[2011-10-15 10:55:48.424214] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 2dff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 1
[2011-10-15 10:55:48.436225] D [afr-dir-read.c:129:afr_examine_dir_readdir_cbk] 0-home-replicate-0: [.......]/joomla/plugins/editors/tinymce/jscripts/tiny_mce/langs: no entries found in home-client-0
[2011-10-15 10:55:48.436301] D [afr-dir-read.c:129:afr_examine_dir_readdir_cbk] 0-home-replicate-0: [.......]joomla/plugins/editors/tinymce/jscripts/tiny_mce/langs: no entries found in home-client-1
[2011-10-15 10:55:48.465717] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 2eff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 4
[2011-10-15 10:55:48.465741] D [rpcsvc.c:1373:nfs_rpcsvc_program_actor] 0-nfsrpc: Actor found: NFS3 - ACCESS
[2011-10-15 10:55:48.465753] D [nfs3-helpers.c:2166:nfs3_log_common_call] 0-nfs-nfsv3: XID: 2eff4d0e, ACCESS: args: FH: hashcount 9, exportid 6d2d7e6e-35fc-43aa-9c20-ba8d5900ac5e, gfid 66ce9d80-b024-467d-bce3-dd551c60cd4f
[2011-10-15 10:55:48.465963] D [nfs3-helpers.c:2298:nfs3_log_common_res] 0-nfs-nfsv3: XID: 2eff4d0e, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2011-10-15 10:55:48.466159] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 2fff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 1
[2011-10-15 10:55:48.506857] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 30ff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 4
[2011-10-15 10:55:48.506881] D [rpcsvc.c:1373:nfs_rpcsvc_program_actor] 0-nfsrpc: Actor found: NFS3 - ACCESS
[2011-10-15 10:55:48.506893] D [nfs3-helpers.c:2166:nfs3_log_common_call] 0-nfs-nfsv3: XID: 30ff4d0e, ACCESS: args: FH: hashcount 10, exportid 6d2d7e6e-35fc-43aa-9c20-ba8d5900ac5e, gfid 3501a091-9c0a-4aa7-9900-d13fad5b9783
[2011-10-15 10:55:48.507108] D [nfs3-helpers.c:2298:nfs3_log_common_res] 0-nfs-nfsv3: XID: 30ff4d0e, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2011-10-15 10:55:48.507301] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 31ff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 4
[2011-10-15 10:55:48.507325] D [rpcsvc.c:1373:nfs_rpcsvc_program_actor] 0-nfsrpc: Actor found: NFS3 - ACCESS
[2011-10-15 10:55:48.507337] D [nfs3-helpers.c:2166:nfs3_log_common_call] 0-nfs-nfsv3: XID: 31ff4d0e, ACCESS: args: FH: hashcount 11, exportid 6d2d7e6e-35fc-43aa-9c20-ba8d5900ac5e, gfid 5fd3349b-cbe3-4981-8151-093f170ed41b
[2011-10-15 10:55:48.507543] D [nfs3-helpers.c:2298:nfs3_log_common_res] 0-nfs-nfsv3: XID: 31ff4d0e, ACCESS: NFS: 0(Call completed successfully.), POSIX: 7(Argument list too long)
[2011-10-15 10:55:48.507816] D [rpcsvc.c:1956:nfs_rpcsvc_request_create] 0-nfsrpc: RPC XID: 32ff4d0e, Ver: 2, Program: 100003, ProgVers: 3, Proc: 1
[2011-10-15 10:55:48.519827] D [afr-dir-read.c:129:afr_examine_dir_readdir_cbk] 0-home-replicate-0: [.......]/joomla/plugins/editors/tinymce/jscripts/tiny_mce/utils: no entries found in home-client-0
[2011-10-15 10:55:48.519904] D [afr-dir-read.c:129:afr_examine_dir_readdir_cbk] 0-home-replicate-0: [.......]/joomla/plugins/editors/tinymce/jscripts/tiny_mce/utils: no entries found in home-client-1


Like before, if I go a little further down into the directory tree and re-run the find command, the problem sometimes goes away.

Typical output from the find command looks like this:

stat: cannot stat `./plugins/directionality/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/directionality/index.html': Stale NFS file handle
stat: cannot stat `./plugins/directionality/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/save/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/save/index.html': Stale NFS file handle
stat: cannot stat `./plugins/save/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/table/row.htm': Stale NFS file handle
stat: cannot stat `./plugins/table/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/table/index.html': Stale NFS file handle
stat: cannot stat `./plugins/table/cell.htm': Stale NFS file handle
stat: cannot stat `./plugins/table/merge_cells.htm': Stale NFS file handle
stat: cannot stat `./plugins/table/table.htm': Stale NFS file handle
stat: cannot stat `./plugins/table/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/noneditable/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/noneditable/index.html': Stale NFS file handle
stat: cannot stat `./plugins/noneditable/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/insertdatetime/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/insertdatetime/index.html': Stale NFS file handle
stat: cannot stat `./plugins/insertdatetime/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/inlinepopups/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/inlinepopups/index.html': Stale NFS file handle
stat: cannot stat `./plugins/inlinepopups/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/searchreplace/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/searchreplace/index.html': Stale NFS file handle
stat: cannot stat `./plugins/searchreplace/replace.htm': Stale NFS file handle
stat: cannot stat `./plugins/searchreplace/search.htm': Stale NFS file handle
stat: cannot stat `./plugins/searchreplace/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/searchreplace/searchreplace.htm': Stale NFS file handle
stat: cannot stat `./plugins/layer/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/layer/index.html': Stale NFS file handle
stat: cannot stat `./plugins/layer/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/preview/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/preview/index.html': Stale NFS file handle
stat: cannot stat `./plugins/preview/example.html': Stale NFS file handle
stat: cannot stat `./plugins/preview/editor_plugin_src.js': Stale NFS file handle
stat: cannot stat `./plugins/flash/editor_plugin.js': Stale NFS file handle
stat: cannot stat `./plugins/flash/index.html': Stale NFS file handle
stat: cannot stat `./plugins/flash/flash.htm': Stale NFS file handle
stat: cannot stat `./plugins/flash/editor_plugin_src.js': Stale NFS file handle
Comment 4 Anand Avati 2011-10-15 10:32:01 EDT
Can you check whether the directory under which you see this behaviour has mismatching gfids on the two servers? This is the way to get the gfid of a directory from the backend -

server1# getfattr -e hex -n trusted.gfid /export/dir/plugins
...
server2# getfattr -e hex -n trusted.gfid /export/dir/plugins

and compare the two.

Thanks,
Avati
Comment 5 Peter Linder 2011-10-15 11:07:01 EDT
They are all the same. First I found a file that gave the error message and ran getfattr on it on both bricks. I then removed the file name from the command line (so checking the directory it's in), and repeated that for every directory in the path.

If I re-run the find command so that it passes a directory that is prone to this, but start much "closer" by cd'ing about half-way into it, then I can't trigger the problem. As far as I can tell it seems to require a certain path length, or perhaps stat-ing enough files.
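
A minimal sketch of automating the walk described above, assuming a Linux backend where getxattr(2) can read the trusted.gfid attribute (as in comment 4); the helper itself is illustrative and not part of GlusterFS. Run it against the same backend path on both bricks and diff the output:

/* walk_gfids.c -- print trusted.gfid for a path and each ancestor dir.
 * Illustrative sketch only; assumes Linux getxattr(2) and a GlusterFS
 * brick exposing the trusted.gfid xattr.
 * Build: cc walk_gfids.c -o walk_gfids */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

static void print_gfid(const char *path)
{
        unsigned char gfid[16];
        ssize_t n = getxattr(path, "trusted.gfid", gfid, sizeof (gfid));

        if (n != 16) {
                printf("%s: no trusted.gfid\n", path);
                return;
        }
        printf("%s: ", path);
        for (int i = 0; i < 16; i++)
                printf("%02x", gfid[i]);
        printf("\n");
}

int main(int argc, char **argv)
{
        char  path[4096];
        char *slash;

        if (argc != 2) {
                fprintf(stderr, "usage: %s /export/path/to/file\n", argv[0]);
                return 1;
        }
        snprintf(path, sizeof (path), "%s", argv[1]);
        print_gfid(path);
        /* strip one trailing component at a time, checking each parent */
        while ((slash = strrchr(path, '/')) != NULL && slash != path) {
                *slash = '\0';
                print_gfid(path);
        }
        return 0;
}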
Comment 6 Peter Linder 2011-10-15 11:26:20 EDT
If I re-run the command several times, the problem seems to go away for a while; the NFS client has probably cached the results or something. This may explain why I don't see the problem when I move in to inspect a directory more closely.
Comment 7 Peter Linder 2011-10-17 12:50:03 EDT
Silly me, looking for the log file in the wrong place :(

the "hash clash" messages does show up now. But they do not mention the files with stale nfs file handle messages (or perhaps they do, but those would be a minority of files). 

The NFS log file from a "find" lasting a few hours contains 8 such lines, against a few thousand stale NFS file handle messages (which mention other files). So it would seem it's something else?

It may be worth mentioning that my servers are somewhat larger than what is perhaps common, each having 144 GB of RAM. Could I be hitting some corner case because of that?
Comment 8 Anand Avati 2011-10-17 19:35:51 EDT
(In reply to comment #7)
> Silly me, looking for the log file in the wrong place :(
> 
> The "hash clash" messages do show up now, but they do not mention the
> files from the stale NFS file handle messages (or perhaps they do, but
> those would be a minority of the files).
> 
> The NFS log file from a "find" lasting a few hours contains 8 such
> lines, against a few thousand stale NFS file handle messages (which
> mention other files). So it would seem it's something else?

Not necessarily. If the hash collision happened on a directory, you would get ESTALE errors for all the entries below it.

Avati
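
The per-level hash width inside the file handle is not stated in this thread, so the widths below are assumptions; the sketch only illustrates why, with a narrow per-directory hash, clashes among many sibling entries are plausible, and (per the comment above) a single directory-level clash would surface as ESTALE for everything beneath it:

/* birthday.c -- P(at least one clash) among n sibling entries for an
 * assumed k-bit per-entry hash, via the birthday approximation
 * P ~ 1 - exp(-n(n-1) / (2 * 2^k)).
 * Build: cc birthday.c -o birthday -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
        for (int bits = 8; bits <= 32; bits *= 2) {
                double space = pow(2.0, bits);
                for (int n = 16; n <= 1024; n *= 4) {
                        double p = 1.0 - exp(-(double)n * (n - 1) /
                                             (2.0 * space));
                        printf("%2d-bit hash, %4d siblings: "
                               "P(clash) ~ %.3f\n", bits, n, p);
                }
        }
        return 0;
}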
Comment 9 Peter Linder 2011-10-17 23:17:13 EDT
Most of the time when I get an ESTALE, there is no corresponding hash clash line in the log file anywhere near that directory or time.

If you wish, I can set things up so that you may log in and test things for yourself; this would probably save a lot of time :). If so, please send me your SSH public key and I'll set it up.
Comment 10 Krishna Srinivas 2011-10-17 23:28:01 EDT
(In reply to comment #9)
> Most of the time when I get an ESTALE, there is no corresponding hash clash
> line in the log file anywhere near that directory or time.
> 
> If you wish, I can set things up so that you may log in and test things for
> yourself; this would probably save a lot of time :). If so, please send me
> your SSH public key and I'll set it up.

Thanks!

Here is the public key:

------
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAl9GoKAT9SJ76LlAHQhgia69grx/UOs+eOeqSGEMYsUUAkBofGGa0HWFNgabpvgTxkIQporXyLI3g8AxlS+gMYtCvSEibivxnJXuwlupUmdh7CiIFG6ty8MJlmwR1FJ52B+cBVGOf97pO55z3XMl4qyBVewpg5an/nZl80Knny9CIpT3iVNQcKMy7oWv5CmjA0GxogIuGzDU0TikBppV/7ImZMlL+OaoZSmVDPHvhkoU50flqjLY7UC+xufIyVLq0rZoReNncVvrTTsw31U98+Pv5IRbgmUNsKztrIpov22sVeNwysvt7CeXwLhCMidQiRjXewauPCeBnx54wPURmbQ== krishna@krishna-laptop
------
Comment 11 Peter Linder 2011-10-18 00:40:50 EDT
Good, you should now be able to ssh as root to 94.254.1.164 (please only log in from one location, and please provide a rudimentary report on the modifications you make, for security reasons; I hope you understand :))

You will find glusterfs started, and a volume mounted using nfs on /home_tmp. Log files on the server you end up on are in /usr/local/var/log/glusterfs or something like that. 

inside /home_tmp, execute "find -noleaf -print0 | xargs --null stat >/dev/null". You should not have to wait long :)

This is not a production system, so feel free to umount things and change options as you need.

Perhaps making this bug private would be a good idea; this message can then be removed when the bug is made public again?
Comment 12 Peter Linder 2011-10-19 07:10:25 EDT
I will have to turn the servers off for a while tonight (CEST) and will turn them on again tomorrow morning. I will issue /etc/init.d/glusterfs start on all the servers involved, so that you can resume testing then.
Comment 13 Peter Linder 2011-10-20 13:23:47 EDT
Do you still need access? I had too much work this morning and unfortunately forgot to turn the servers on. I will turn them back on right away if you need to check something :)
Comment 14 Krishna Srinivas 2011-10-21 04:07:47 EDT
(In reply to comment #13)
> Do you still need access? I had too much work this morning and unfortunately
> forgot to turn the servers on. I will turn them back on right away if you
> need to check something :)

Can you turn it on again please?
Comment 15 Peter Linder 2011-10-21 04:40:06 EDT
They have been started. GlusterFS is running on the second brick (which you may ssh to with the public key from the "ragdoll" brick, if you need something from there), but not on the "ragdoll" brick itself. I was thinking you would probably want to start it yourself.

Nothing is mounted either; mount away as you require :)
Comment 16 Peter Linder 2011-10-24 12:59:02 EDT
I must unfortunately turn the servers off at night, but I turn them on again in the mornings. 

They will hopefully be moved to their final destination at the end of the week. They will then receive new IP addresses (I will email them to you) and be running all the time for you to test :)
Comment 17 Krishna Srinivas 2011-10-30 23:15:14 EDT
Peter, there is a limitation on directory depth when NFS is used.
This is because we have to encode the directory path in the file handle that
the NFS client uses to communicate with the NFS server. The file handle can
be at most 64 bytes (a limit imposed by the NFSv3 RFC). We are bringing in
some changes to overcome this limitation, and the directory depth issue will
be fixed in a future release.

When you run the NFS server in TRACE mode, you can grep for "Dir depth
validation failed", which is what results in the ESTALE ("Stale NFS file
handle") errors on the NFS client.
Comment 18 Peter Linder 2011-10-31 00:03:18 EDT
Ah, thank you, that is good news, in a way :)

Do you have any idea when it might be included in a release? Could I perhaps sponsor you/Gluster, if it's not too much at least, to help it along?
Comment 19 Peter Linder 2011-11-02 05:24:38 EDT
"Dir depth validation failed" does show up in fairly large numbers (and mem-factor 40 has no effect, as you predicted :)).  Do you need to do more testing? If not, I will reinstall these machines and start deploying them (using the native fuse client) for now, as everything else is working fine.
Comment 20 Krishna Srinivas 2011-11-02 05:58:30 EDT
Thanks for the access, Peter; I won't need it any more. You can reinstall and use the native FUSE client.
Comment 21 Saurabh 2011-11-03 01:13:01 EDT
I have reproduced this in-house:

stat: cannot stat `./d1/dd1/dd1-d1/dd1-d2/dd1-d3/dd1-d4/dd1-d5/dd1-d6/dd1-d7/dd1-d8/dd1-d9/dd1-d10/dd1-d11/dd1-d12/dd1-d13/dd1-d14/dd1-d15/dd1-d16/dd1-d17/dd1-d18/3880': Stale NFS file handle
stat: cannot stat `./d1/dd1/dd1-d1/dd1-d2/dd1-d3/dd1-d4/dd1-d5/dd1-d6/dd1-d7/dd1-d8/dd1-d9/dd1-d10/dd1-d11/dd1-d12/dd1-d13/dd1-d14/dd1-d15/dd1-d16/dd1-d17/dd1-d18/941': Stale NFS file handle
stat: cannot stat `./d1/dd1/dd1-d1/dd1-d2/dd1-d3/dd1-d4/dd1-d5/dd1-d6/dd1-d7/dd1-d8/dd1-d9/dd1-d10/dd1-d11/dd1-d12/dd1-d13/dd1-d14/dd1-d15/dd1-d16/dd1-d17/dd1-d18/768': Stale NFS file handle
stat: cannot stat `./d1/dd1/dd1-d1/dd1-d2/dd1-d3/dd1-d4/dd1-d5/dd1-d6/dd1-d7/dd1-d8/dd1-d9/dd1-d10/dd1-d11/dd1-d12/dd1-d13/dd1-d14/dd1-d15/dd1-d16/dd1-d17/dd1-d18/3992': Stale NFS file handle
stat: cannot stat `./d1/dd1/dd1-d1/dd1-d2/dd1-d3/dd1-d4/dd1-d5/dd1-d6/dd1-d7/dd1-d8/dd1-d9/dd1-d10/dd1-d11/dd1-d12/dd1-d13/dd1-d14/dd1-d15/dd1-d16/dd1-d17/dd1-d18/1950': Stale NFS file handle
^C
Comment 22 Peter Linder 2011-11-03 03:34:40 EDT
I want to thank all of you for your help in tracking this down. Please tell me if there is anything else I can do to help. Looking forward to a future version then, hopefully 3.3, which I'm eager to try for VM storage as well :)

Thanks also for making and supporting your excellent product.
Comment 23 Peter Linder 2011-11-16 11:34:41 EST
Is there a patch I can try/use? It turns out I could really use the small-file performance, and using NFS would also make it much easier for me to replace our current storage system :)