Description of problem: At some point since upgrading to Red Hat Enterprise Linux 4.5 (from 4.4), we started experiencing a strange cache issue on two of our NFS clients -- namely, our web server and our undergraduate time-sharing server. What ends up happening is that a user makes a change to some files on an NFS client, and then at times one of these two NFS clients ends up seeing the OLD copy of the file/directory. The file server copy is completely valid. The copy of the file/directory on our other NFS clients is completely valid. If you rename the file or directory on the file server, the bad NFS client now sees the "new" copy, but if you rename it back, the client sees the old copy. The new copy on these clients is in essence hidden because the cache on the NFS client refuses to update. In at least one instance, when this problem occurred on a directory, recopying the entire directory ABOVE the problem directory fixed the problem. Rebooting the client obviously fixes the problem as well.

All of our other machines are also running an identical copy of RHEL 4.5, and we don't seem to be experiencing this problem elsewhere. Whether the problem is load related, we do not know. RHEL 4.5 came with kernel 2.6.9-55.EL; since then, we have upgraded to 2.6.9-55.0.2.EL. Unfortunately, we haven't paid enough attention to the details of the system at each instance of the problem, or it might have been easier to isolate which kernel update caused the issue. However, we do believe the problem started occurring some time after upgrading from CentOS 4.4 to 4.5 or after the kernel update. Until we made this change, we believe that the problem never occurred.

When the problem occurs, our users end up getting flustered and contacting our technical team. In total, the problem has occurred 5 times, 2 of which occurred yesterday. The NFS server is still running Linux kernel 2.4.32, but we don't believe this is an issue since the problem was not occurring before 4.5.
The NFS clients are running:

nfs-utils-1.0.6-80.EL4
nfs-utils-lib-1.0.6-8

(It's not clear whether this is a kernel issue or an nfs-utils issue.)

How reproducible: We cannot determine how to reproduce the problem, though it occurred twice yesterday. We are very anxious to solve the problem. Please ask for any other necessary details.
A few items that I left out. The last two times the problem occurred:

1) A faculty member uploaded a NEW assignment file. (As a result, nobody would have even opened this file yet.)
2) A faculty member changed the contents of a directory, removing 2 tar files and creating a couple of new directories. On the bad NFS client, you still see the tar files.

It looks like NFS cache consistency is not always guaranteed, as per this article: http://nfs.sourceforge.net/#faq_a8 ... but in these two particular cases, only one client was making changes. (In both cases, it is possible that students were accessing other files in the directory at the same time.)

Our NFS mounts are as follows:

server:/obj/dept /cs/dept nfs hard,rw,intr,bg,nfsvers=3,tcp 0 0

On the NFS server, this is exported like this:

/obj/dept client.cs.yorku.ca(rw,sync) ....
There are several caching issues referred to on this page... http://www.linux-nfs.org/Linux-2.6.x/2.6.21/ ... One in particular ... http://www.linux-nfs.org/Linux-2.6.x/2.6.21/linux-2.6.21-021-fix_readdir_stale_cache.dif ... sounds like a definite possibility. The test case does produce the anticipated results. Might this be the problem?
> Might this be the problem? Possibly. There were a number of NFS cache-consistency patches that went into 4.5, and it's possible that one of them has introduced a regression. Without some idea of what's happening on these boxes when this occurs, it's going to be very hard to track this down. My suggestion is to work on finding a way to reproduce this problem. > If you rename the file or directory on the file server, the bad NFS client now > sees the "new" copy, but if you rename it back, the client sees the old copy. That's just bizarre. NFS operates on filehandles, so I have to wonder if the filehandle is changing when the rename occurs (or something has gone very strange with the lookup cache).
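To make the "that's just bizarre" point concrete: a rename never changes a file's inode, which is why a filehandle-based protocol shouldn't hand a client a "different" file after a server-side rename. A minimal local sketch (ordinary local filesystem, not NFS; the filenames are made up):

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo data > demo.txt
before=$(stat -c %i demo.txt)    # inode before the rename
mv demo.txt demo.renamed
after=$(stat -c %i demo.renamed) # inode after the rename: same object
[ "$before" = "$after" ] && echo "inode unchanged by rename"
```

So if a rename on the server makes a bad client see different data, the client's lookup cache (name-to-filehandle mapping), not the filehandle itself, is the suspect.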
We haven't been aware of the problem again since October 11. This isn't to say that it hasn't happened, but usually, if one of our users experiences the problem, we're the first to know about it. Interestingly enough, perhaps provoked by my fear after your message, the problem happened again, and this time for me! I was working on a temporary server that I have allocated to phpBB3 testing. An Apache2 web server was running on the box. I decided to completely re-install phpBB3. On MY host, I changed into the directory containing phpBB (a directory that is NFS-shared out to all clients) and just did: mv forum forum.orig. On all hosts, forum.orig appeared fine. I then re-uncompressed/untarred phpBB3 on MY host, and renamed "phpBB3" to "forum". Now, on all hosts EXCEPT for the test host, forum was inode 20432204 and forum.orig was 16860794. On the test host, both the inode for forum and forum.orig were 16860794!! Just for the heck of it, I DELETED the directory "forum" on "phpbb". Of course deleting this copy shouldn't really be a problem. I got an error that the directory cannot be deleted because it is not empty. At this point, doing an ls -i revealed that "forum" was now pointing to inode 20432204! I have tried to repeat this again and again, but it's not happening. This must have something to do with something being cached/open in Apache, and then moving it underneath. There is no other explanation, since I did all the work on MY host, yet the problem only occurred on the host that was running the web server.
Which host was seeing the problem? The test host? If inode numbers are changing when you rename the file then I tend to think that something is very odd on the server. You may want to do some network captures of file renames and have a look at them with wireshark. If the inode numbers are changing when you rename then that may help show it. You mentioned that the server is running kernel 2.4.32. What architecture is it (x86, x86_64, etc), and what is the underlying filesystem that's being exported here (ext3, xfs, reiser?).
The host seeing the problem is the test host, not the host where I was actually doing the rename. Just to be clear: on my host, I renamed forum to forum.orig. forum.orig maintained the original inode number on all hosts INCLUDING the test host. The problem, which sounds like an NFS client cache issue, is that when I uncompressed/untarred the file again (creating the phpBB3 directory) and renamed phpBB3 to "forum" on my BOX, all the other boxes saw the "real" "new" forum with the NEW inode, whereas the "test box" saw the "old" copy of forum (the inode of what is now forum.orig). The test box could see any other directory that I created, but it was "stuck" on the forum directory itself. In all cases (including previous cases), the copy of the directory on the server is fine. The x86 2.4.32 server "could" be the problem, but I highly doubt it since none of our Solaris boxes have ever experienced this problem, and none of our machines experienced this before the upgrade to 4.5. The filesystem is ext3.
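The move-aside-and-recreate sequence described above is easy to model on any local filesystem (no NFS involved; the directory names below just mirror the ones in this report):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir forum                 # the original directory
i_old=$(stat -c %i forum)
mv forum forum.orig         # move aside: same inode, new name
mkdir forum                 # recreate under the old name
i_new=$(stat -c %i forum)
# forum.orig and the new forum coexist, so their inodes are guaranteed
# to differ; a client that still maps "forum" to the old inode is
# serving a stale name-to-inode binding from its cache.
echo "forum.orig=$(stat -c %i forum.orig) forum=$i_new"
```

On a healthy client, looking up "forum" after the recreate must return the new inode; the report's test host returning the forum.orig inode for "forum" is exactly the stale-lookup symptom.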
We have seen the exact same issue using AS4.5 and NFS. Our NFS server is a NetApp Filer, and the issue only seems to be coming from 4.5 clients. Here is some associated version info from an affected client:

rpm -qa nfs*; uname -r
nfs-utils-1.0.6-80.EL4
nfs-utils-lib-1.0.6-8
2.6.9-55.ELsmp

We change a symlink to point from one file to another, and on the affected clients, sometimes it shows the symlink pointing to files it hasn't pointed to for several weeks. This is a serious regression.
Given that this seems to occur only sporadically, it sounds like a race of some sort. We're going to have to learn something about the conditions under which it occurs. Since we don't have a way to reproduce this (unless one of you does), what might be best for the time being is to collect some nfs_debug info when this problem occurs. When you find a host that is behaving this way:

# echo 32767 > /proc/sys/sunrpc/nfs_debug

...then try to access the inode that's having problems:

# stat /problem/file/or/directory

...after that, turn off debugging:

# echo 0 > /proc/sys/sunrpc/nfs_debug

...and capture dmesg output and attach it here. Also, get the output from the stat command, and also stat the directory above it. The idea is that we want to capture what the client is doing at that time and perhaps why it's not revalidating the inode correctly.

It would also be helpful to try touching the parent directory of the file you're having trouble with and see if that causes it to revalidate the entry. If there's a simple way to force a revalidation of the file then that might also tell us something about why it's not being revalidated to begin with.

Finally, if you have a support contract, please open a case with our support organization so that this can be prioritized as a customer-impacting issue. They may also be able to help you track down a reproducer (or gather more info about when and why this occurs).
echo 32767 > /proc/sys/sunrpc/nfs_debug; stat /ptx/SYS/conf/software.conf > stat.txt; echo 0 > /proc/sys/sunrpc/nfs_debug; dmesg -c > dmesg.txt
...
cat stat.txt; echo ------------------------------; cat dmesg.txt

  File: `/ptx/SYS/conf/software.conf' -> `/software/scripts/conf/Persephone/software.conf-rhea-2.0'
  Size: 56   Blocks: 0   IO Block: 32768   symbolic link
Device: 1fh/31d   Inode: 3115833   Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 500/ ptx)   Gid: ( 500/ ptx)
Access: 2007-10-30 07:46:26.302902000 -0700
Modify: 2007-08-01 23:20:45.743523000 -0700
Change: 2007-10-16 22:48:11.198072000 -0700
------------------------------
NFS call  access
NFS: nfs_update_inode(0:1f/104 ct=1 info=0x6)
NFS reply access, status = 0
NFS: revalidating (0:1f/2670994)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:1f/2670994 ct=1 info=0x6)
NFS: (0:1f/2670994) revalidation complete
NFS call  access
NFS reply access, status = 0
NFS: revalidating (0:1f/3115833)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:1f/3115833 ct=2 info=0x6)
NFS: (0:1f/3115833) revalidation complete
NFS: dentry_delete(conf/software.conf, 8)
NFS: dentry_delete(conf/software.conf, 8)

The symlink should be pointing to: /software/scripts/conf/Persephone/software.conf-shoulao-1.0.

mount -o remount /nfs/share is a quick hack to refresh the NFS cache and clear out the problem.
Very good info. I forgot to ask for one other bit, though... Could you run a stat against this symlink from a "working" host (or from the server itself) and post the output here? It would be good to know what (if any) attributes have changed.

From what you've posted already, the client tries to revalidate the directory:

NFS: revalidating (0:1f/3115833)

It does a getattr call against the inode:

NFS call  getattr
NFS reply getattr

It then calls nfs_update_inode to interpret the results:

NFS: nfs_update_inode(0:1f/3115833 ct=2 info=0x6)
NFS: (0:1f/3115833) revalidation complete

...and it looks like it probably decides that the cached copy is ok (hard to tell; the printks aren't that comprehensive here). Though if this inode had changed (and the filehandle didn't change), then the mtime would almost certainly have changed, and that should have given us another printk.

Also, what's the uptime on this client?
So on the broken client, the uptime is 34 days. This is when we deployed 4.5 and rebooted to use the new kernel. Since the server is a NetApp Filer (BSD derivative), stat doesn't work there. From a client with no issues, here is the stat:

stat /ptx/SYS/conf/software.conf
  File: `/ptx/SYS/conf/software.conf' -> `/software/scripts/conf/Persephone/software.conf-shoulao-1.0'
  Size: 59   Blocks: 0   IO Block: 32768   symbolic link
Device: 1fh/31d   Inode: 2973033   Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 500/ ptx)   Gid: ( 500/ ptx)
Access: 2007-10-30 12:12:15.598210000 -0700
Modify: 2007-10-15 19:18:13.930958000 -0700
Change: 2007-10-16 22:48:11.241073000 -0700
Ok, thanks. The main thing here is that the inode number has changed: 2973033 on the good client vs. 3115833 on the bad one ...and evidently, the old inode is still valid on the server (or you'd be getting an ESTALE back from it when the getattr call is done). So somehow the symlink was deleted and recreated, but the client is not noticing that it needs to redo the lookup. I'll look through the code, see if I can determine how this might occur, and (most importantly) see if I can come up with a way to reproduce this.

One other thing that might be helpful -- if you've not already remounted the dir or rebooted the bad client, can you do a "stat" on the parent directory of the symlink from both the good and bad hosts? I'd like to see if their mtimes are identical, in particular.
Well, we have this problem intermittently throughout a cluster, so there are still a few affected hosts. We try to minimize downtime, so just remounting a hot system works. Here is the basic workflow: system A and system B read the symlink that is changed on system C. Everything is mounting a /vol/sharename on filer 1. From system C, it is moved aside to ${name}.old and recreated pointing to a new configuration file. Sometimes, system A or system B doesn't recognize anything has changed. This might happen on 4 out of 50 systems.

From a bad client:

  File: `/ptx/SYS/conf'
  Size: 4096   Blocks: 8   IO Block: 32768   directory
Device: 1fh/31d   Inode: 2670994   Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 500/ ptx)   Gid: ( 500/ ptx)
Access: 2007-10-30 07:46:26.295902000 -0700
Modify: 2007-10-16 22:48:11.241073000 -0700
Change: 2007-10-16 22:48:11.241073000 -0700

And a good client:

  File: `/ptx/SYS/conf'
  Size: 4096   Blocks: 8   IO Block: 32768   directory
Device: 1fh/31d   Inode: 2670994   Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 500/ ptx)   Gid: ( 500/ ptx)
Access: 2007-10-30 07:46:26.295902000 -0700
Modify: 2007-10-16 22:48:11.241073000 -0700
Change: 2007-10-16 22:48:11.241073000 -0700

That looks pretty worthless to me.
> That looks pretty worthless to me.

Not to me. It tells me that both clients have the same directory attributes cached. So the question becomes: why did one client decide to redo the lookup (and get a new filehandle) for the software.conf symlink and the other did not? I'll have a look over the code and see if I can determine a situation where that might occur. When you open the support case for this, please make note that you've also reported this via the BZ here.
From a previous comment that points to this link...

http://www.linux-nfs.org/Linux-2.6.x/2.6.21/linux-2.6.21-021-fix_readdir_stale_cache.dif

The tar command in this test script does indeed fail on affected clients with this error:

tar: newdir/innerdir/innerfile: file changed as we read it

If you remount the affected shares or try this script on an AS4.4 server, it works as expected. So the script at that link is a way to reproduce the bug reliably. Does the patch in that link make any sense as a fix for this issue? Would it be worth my time to unroll an SRPM for the 4.5 kernel, have the spec file apply that patch, and reroll it? Surely I could just run that test case script and see if it fails or not.

Signed-off-by: Neil Brown <neilb>
Signed-off-by: Trond Myklebust <Trond.Myklebust>
---
 fs/nfs/dir.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d971547..e59fd31 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -865,6 +865,10 @@ static int nfs_dentry_delete(struct dentry *dentry)
 static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode)
 {
 	nfs_inode_return_delegation(inode);
+	if (S_ISDIR(inode->i_mode))
+		/* drop any readdir cache as it could easily be old */
+		NFS_I(inode)->cache_validity |= NFS_INO_INVALID_DATA;
+
 	if (dentry->d_flags & DCACHE_NFSFS_RENAMED) {
 		lock_kernel();
 		drop_nlink(inode);
Interesting that it doesn't work on 4.4... I had e-mailed Neil Brown about my particular problem, and he didn't think that his patch as above would solve it, but very interesting that the results under 4.4 and 4.5 are different ....
This was not an issue until we updated from 4.4 to 4.5.
4.5 included a rather large patch to fix a different issue with cache consistency (particularly when dealing with files that were mmap'ed). It's certainly possible that that patch introduced the bug that's mentioned in comment #2. For now I'd recommend opening a separate BZ for that, since we don't know that it's related to this issue.

In looking over Neil's description of that patch, I also don't think it's the same issue. In his case, it was a matter of keeping inode info from a READDIRPLUS cached for too long. In this one, I think the problem is related to this dentry not being flushed out properly when the parent directory changes. Moving the symlink should cause the mtime on the directory to change, and that should force a new lookup for the dentry. That seems not to be happening here for some reason.

That said, I could be wrong. If the filehandle for the symlink is staying the same while the fileid is changing, then it's possible that Neil's patch may be what's needed. In that situation, though, this printk should be firing:

printk(KERN_ERR "%s: inode number mismatch\n"
       "expected (%s/0x%Lx), got (%s/0x%Lx)\n",
       __FUNCTION__,
       inode->i_sb->s_id, (long long)nfsi->fileid,
       inode->i_sb->s_id, (long long)fattr->fileid);

...so if you're not seeing any of those in the logs then I'm more skeptical of this being the problem. The patch in comment #2 looks sane enough, though. Testing it might not be a bad idea in the absence of a better understanding of this problem. At the very least it should rule it out as the issue you're seeing.
It might also be worthwhile testing some of the kernels on my people page: http://people.redhat.com/jlayton ...particularly if you have a way to reproduce this.
Created attachment 244831 [details]
attempted reproducer

Spent a bit of time trying to reproduce this today (with no luck so far). I was hoping that running this script, which does something like what Jeff described, would reproduce it. I was able to reproduce something that looked like it when mounting a server with an ext3 filesystem underlying it, but not when the filesystem was xfs.

The difference there is that xfs has sub-second timestamps but ext3 does not. When the directory is on ext3 and is revalidated and then changed within the same second, the client doesn't have a way to know that its cache is invalid and will keep using the cached dir contents. xfs (and NetApp's OnTap) have timestamp granularity of 1ms or less. That's small enough to generally prevent races like this.

Timestamp granularity might explain the problem seen by the original reporter (since he's using a Linux server with ext3), but I don't think it can explain the situation where the server is a NetApp.

One thing that might be helpful is to know whether you can "fix" this problem by touching the parent directory of the dentries showing the problem. If so, then a race of this sort might help explain things. So in Jeff's case, when this problem occurs, if you do this on a "working" client:

$ touch /ptx/SYS/conf

...does the problem on the "bad" client clear itself up?
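The granularity race described above can be illustrated on any local filesystem by comparing whole-second mtimes, which is all an ext3 server can hand back to the client (a local sketch, not an NFS test):

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Make two changes to the same file within one wall-clock second,
# retrying in the rare case that we straddle a second boundary.
while :; do
    echo one > f; t1=$(stat -c %Y f)   # mtime in whole seconds
    echo two > f; t2=$(stat -c %Y f)
    [ "$t1" = "$t2" ] && break
done
# Both versions carry the same one-second mtime, so a client that
# revalidated between the two writes has no way to tell that its
# cached copy is now stale.
echo "both changes share mtime $t1"
```

With sub-second timestamps (xfs, OnTap) the two versions would almost always carry distinct mtimes, which is why the race is much harder to hit there.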
I touched the parent directory from a "working client" and statted the same directory on a "bad client" to make sure it picked up that the directory was touched. It did. But on the "bad client" the symlink still has not changed and is still pointing to the wrong place. Do you want me to open a new bug about this?
On the bug with the known reproducer: http://www.linux-nfs.org/Linux-2.6.x/2.6.21/linux-2.6.21-021-fix_readdir_stale_cache.dif ...yes, please do. Let's keep this bug to deal with these dentries that aren't properly getting new lookups.
I have to wonder if this patch may be relevant to this issue:

http://linux.bkbits.net:8080/linux-2.6/?PAGE=gnupatch&REV=1.1966.3.7

...though given that we don't understand this problem well, I can't offer any evidence that it is. If you have an environment where you can reproduce this readily, it might be worthwhile to test:

1) one of the test kernels on my people page: http://people.redhat.com/jlayton
2) the same kernel with a backported version of the patch in comment #2
3) the same kernel with a backport of the patch in the link above

...though the most effective way to get this resolved is going to be to find a way to reproduce the problem reliably. With that, I should be able to track down the cause and fix it.
I wonder if the problem I am having right now is the same as the original problem. It actually seems a little worse! I have been working on setting up phpBB and storing my instructions in a text file called "phpbb" in a directory called "php" in my home directory. On the file server, and on my local host, the file looks like this in the directory listing:

23331609 -rw------- 1 jas tech 4335 Oct 29 16:03 phpbb

...but on one host, that one file in the directory listing appears like this:

? ?--------- ? ? ? ? ? phpbb

All the other files in the directory are displaying fine. Actually, here are the other files with inodes if it helps:

23330845 drwxr-xr-x   3 jas  tech  4096 Nov  1 09:29 .
 1278095 drwx--x--x 128 jas  tech 12288 Nov  1 09:24 ..
23331617 -rw-r--r--   1 jas  tech   801 Oct 31 13:22 README
23331610 -rw-r--r--   1 root tech  5361 Oct 30 09:52 auth_cse.php
23331740 -rw-r--r--   1 jas  tech   427 Oct 30 14:49 config.php
23331572 -rw-r--r--   1 jas  tech  4491 Oct 29 16:06 cse.php
       ? ?---------   ?    ?    ?     ?              phpbb
 1278932 -rw-r--r--   1 root tech 36100 Oct 29 16:05 phpbb.gz
23331141 -rw-r--r--   1 jas  tech 10023 Oct 31 13:31 phpbb.php
23331140 -rwxr-xr-x   1 jas  tech  3102 Oct 31 13:31 phpbbsync
23331615 -rwxr-xr-x   1 jas  tech   880 Oct 31 16:13 test
23331614 -rw-r--r--   1 jas  tech   862 Oct 31 16:07 test.php
23331587 -rw-r--r--   1 jas  tech   801 Oct 31 16:04 test.php~
23331613 -rwxr-xr-x   1 jas  tech   870 Oct 31 16:05 test~
 6292183 drwxr-xr-x   2 root tech  4096 Oct 29 16:08 tmp

On the host that is having trouble:

# echo 32767 > /proc/sys/sunrpc/nfs_debug
# stat /cs/home/jas/php/phpbb
stat: cannot stat `/cs/home/home/php/phpbb': No such file or directory
# echo 0 > /proc/sys/sunrpc/nfs_debug

dmesg output:

NFS call  access
NFS: nfs_update_inode(0:13/2801958 ct=1 info=0x6)
NFS reply access, status = 0
NFS call  access
NFS: nfs_update_inode(0:1b/23330845 ct=1 info=0x6)
NFS reply access, status = 0
NFS: dentry_delete(php/phpbb, 8)
NFS: revalidating (0:13/12714036)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:13/12714036 ct=1 info=0x6)
NFS: (0:13/12714036) revalidation complete
NFS: revalidating (0:13/12714036)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:13/12714036 ct=1 info=0x6)
NFS: (0:13/12714036) revalidation complete
NFS: revalidating (0:13/4980787)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:13/4980787 ct=1 info=0x6)
NFS: (0:13/4980787) revalidation complete
NFS: revalidating (0:13/4980787)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:13/4980787 ct=1 info=0x6)
NFS: (0:13/4980787) revalidation complete

Now I run a stat on the parent directory (from the bad host):

# stat /cs/home/jas/php
  File: `/cs/home/jas/php'
  Size: 4096   Blocks: 8   IO Block: 8192   directory
Device: 1bh/27d   Inode: 23330845   Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 1004/ jas)   Gid: ( 1000/ tech)
Access: 2007-11-01 09:12:24.000000000 -0400
Modify: 2007-11-01 09:10:03.000000000 -0400
Change: 2007-11-01 09:10:03.000000000 -0400

I touch the parent directory (from the bad host):

% touch /cs/home/jas/php
steel 352 % stat /cs/home/jas/php/phpbb
stat: cannot stat `/cs/home/jas/php/phpbb': No such file or directory

Now I stat the file from a working host:

hop 301 % stat /cs/home/jas/php/phpbb
  File: `/cs/home/jas/php/phpbb'
  Size: 4335   Blocks: 16   IO Block: 8192   regular file
Device: 1bh/27d   Inode: 23331609   Links: 1
Access: (0600/-rw-------)  Uid: ( 1004/ jas)   Gid: ( 1000/ tech)
Access: 2007-11-01 09:11:01.000000000 -0400
Modify: 2007-10-29 16:03:28.000000000 -0400
Change: 2007-10-29 16:03:28.000000000 -0400

I stat the parent directory from a working host:

hop 302 % stat /cs/home/jas/php
  File: `/cs/home/jas/php'
  Size: 4096   Blocks: 8   IO Block: 8192   directory
Device: 1bh/27d   Inode: 23330845   Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 1004/ jas)   Gid: ( 1000/ tech)
Access: 2007-11-01 09:29:21.000000000 -0400
Modify: 2007-11-01 09:29:08.000000000 -0400
Change: 2007-11-01 09:29:08.000000000 -0400

I touch the file on the good client:

hop 312 % touch phpbb

I try to stat the file on the bad client:

steel 356 % stat /cs/home/jas/php/phpbb
stat: cannot stat `/cs/home/jas/php/phpbb': No such file or directory

Uptime on the bad client is: 09:31:37 up 99 days, 23:54

I will keep the system in its current state so that you can ask for other details.
The output with the ???'s for that dirent indicates that we have a stale dentry in this directory that's not being flushed out correctly. This bug may be related:

https://bugzilla.redhat.com/show_bug.cgi?id=228801

The patch to fix that should be in the 4.6 kernels. The kernels on my people page also have it if you wish to test those. Also, with a long uptime like this, I suspect that the problem could be an issue with jiffies wraparound (bug 247905, which is unfortunately marked private). Since touching the dir doesn't clear it up, that may very well be the case. Bug 247905 is on the 4.7 proposed list. If I get some time later today, I may pull some of these patches into my test kernels.
Do you have netdump or diskdump configured for this client? If you can collect a vmcore from it then I might be able to get a bit more info about why this dentry isn't getting flushed out.
Unfortunately, we do not have netdump/diskdump configured, and I couldn't configure it on any one system without rebooting since I would need to add another disk which would make the problem go away. I'm also not sure re: company policy on submission of a core (which could contain other confidential info).. :( Is there anything else I can do? If the last bug that I found is different, I will wait until I find the first bug again, and I will provide more details...
Oh, you'll love this one... I "created" /cs/home/jas/php/phpbb on the bad host and inserted "hello world<enter>" at the top. The original file showing on the other hosts started like this:

phpbb setup on webapp<enter>

Now, the file everywhere (including the bad host) shows:

hello world<enter>
on webapp<enter>

...and the rest of the file!
WRT company policy on sending cores: we have a confidential FTP server for receiving them. Our support folks can fill you in on the details if you open a case with them. Aside from that, you can try running crash or another kernel debugger on the live kernel and see if you can get at the dentry showing problems and its parent. From that, you can check it against the logic in nfs_check_verifier() and see why the directory data isn't getting invalidated properly (which is what I suspect is happening).
The behavior when creating the file makes some sense:

1) The client thinks the file doesn't exist, but it does on the server.
2) The client opens the file with O_CREAT, and the server just sends back the filehandle of the existing file (probably no O_EXCL flag was used).
3) The client does a write, and the server dutifully overwrites the bytes at the beginning of the file.

...not unexpected given the situation, I don't think. The crucial bit is that the negative dentry is sticking around when it shouldn't be, and that seems to be occurring because the client isn't noticing that the directory has changed.
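The three steps above can be sketched locally with dd, whose conv=notrunc mimics the client's write landing at offset 0 of the pre-existing file without truncating it (the file contents are taken from the earlier comments in this bug):

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'phpbb setup on webapp\n' > phpbb   # the file as the server has it
# An O_CREAT open without O_EXCL just returns the existing file; the
# client's write then overwrites bytes at offset 0, leaving the tail.
printf 'hello world\n' | dd of=phpbb conv=notrunc 2>/dev/null
cat phpbb
# cat now shows:
#   hello world
#   on webapp
```

The 12-byte write "hello world\n" replaces "phpbb setup " exactly, leaving "on webapp\n" behind, which matches the mixed contents the reporter saw after "creating" the file on the bad client.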
Jeff, I created a new bug and hope you don't mind that it is assigned to you. Bug #362241
I can't run crash on the kernel now since, after re-writing the file, the problem is gone ;)
Ok, that makes sense. The create would have instantiated the dentry and so you wouldn't have the negative dentry stuck there anymore. You could have probably gotten similar results by doing a 'touch phpbb' on the bad client.
I've added some new test kernels to my people page:

http://people.redhat.com/jlayton

...these are essentially 4.6 kernels with some extra patches. Of interest here are these two patches:

247905 0019-BZ-247905-NFS-Attribute-timeout-handling-and-wrapp.patch
327591 0020-BZ-327591-NFS-discard-pagecache-data-for-dirs-on-d.patch

...the patch for bug 247905 is a patch that's already proposed for inclusion in 4.7, and the other one is the fix for the known reproducer in the patch description. Testing these might be helpful, but this is less than ideal since we have no reproducer. If the problem goes away we can't be sure what fixed it (or even if it is fixed).

At this point, the thing that might be most helpful when this occurs is to force a crash on an affected client and collect a coredump. From that I may be able to drill down into it and determine why the directory isn't being revalidated properly. If you do this, please also log into a "working" client and collect:

1) output from stat on the bad entry itself (the one not showing up correctly on the client)
2) output from stat on the parent directory of the "bad" dentry
3) presuming that the old file was moved when this occurs, a stat against the original file

...that info will be needed to compare with the info that the malfunctioning client has about these dentries/inodes.
*** Bug 362241 has been marked as a duplicate of this bug. ***
We are seeing very similar issues. We have a cluster of RHEL3 and RHEL4 machines. We made a small but critical filesystem change over the weekend, and all the RHEL4.5 machines saw the old layout, not the new one. Note that this filesystem is read-only everywhere on the cluster, except for one data processing machine.

First the RHEL3 client:

[root dat]# pwd
/data03/datfix/dat
[root dat]# stat .
  File: `.'
  Size: 4096   Blocks: 8   IO Block: 8192   directory
Device: 13h/19d   Inode: 115835091   Links: 14
Access: (2750/drwxr-s---)  Uid: ( 1009/ UNKNOWN)   Gid: (  820/raw)
Access: 2007-11-09 18:29:31.000000000 +1100
Modify: 2007-11-04 12:44:40.000000000 +1100
Change: 2007-11-04 13:20:45.000000000 +1100

Then the RHEL4.5 client:

[root dat]# pwd
/data03/datfix/dat
[root dat]# stat .
  File: `.'
  Size: 4096   Blocks: 8   IO Block: 8192   directory
Device: ah/10d   Inode: 105873409   Links: 14
Access: (2750/drwxr-s---)  Uid: ( 1009/ UNKNOWN)   Gid: (  820/raw)
Access: 2007-11-09 18:31:36.000000000 +1100
Modify: 2007-11-04 12:59:37.000000000 +1100
Change: 2007-11-04 13:20:55.000000000 +1100
Setting to needinfo for a vmcore or results from the test kernel.
So this is an ultra-simplified version of our setup:

NetApp Filer (Volume A) ---snapmirror---> NetApp Filer (Readonly Volume A)
         |                                    |          |          |
      Server A                             Server B   Server C   Server D

So we will say that on Server A we do something like mkdir /nfs/dir1. A week later on Server A, we do mkdir /nfs/dir2. Now, after dir2 was created and the snapmirror has updated, servers B and C see dir2 on their readonly mounts. Server D can NOT see this directory. mount -o remount /nfspath generally works, but yesterday we had to actually umount and then mount to get the cache to update. This seems to be quite similar to Jon's issue and setup. Perhaps it has something to do with the readonly mount. Note that we mount the r/o mount as r/w from the Linux clients.

For good reason, management is not OK with running test kernels on transactional production hosts. Since this is completely non-deterministic and happening in production, our solution is to roll back to AS4.4 and re-kickstart hundreds of boxes (thank god for Sun's ILOMs). We do still have the 2 original hosts you were helping me troubleshoot marked for maintenance and not being used.

Since a vmcore can contain sensitive memory and perhaps things like ssh keys, I'd rather not upload it publicly. Where can I upload a vmcore from the affected host you helped us with earlier? Does Red Hat have an sftp server for this or something like that?
The only means I know of is via non-encrypted FTP. I believe our support organization has some other methods of transferring cores in a more secure way.
bug 401551 may also be related to this issue, but it's again hard to tell without some conclusive info.
I recently ran into an NFS directory caching problem on 5.0 as well as 5.1. It appears that I have encountered bug 364351. In researching my problem I found bug 327591 and, by extension, bug 364351. I used the test script documented in bug 327591 and found at the following URL:

http://www.linux-nfs.org/Linux-2.6.x/2.6.21/linux-2.6.21-021-fix_readdir_stale_cache.dif

I was able to reproduce the problem on my 5.0 and 5.1 boxes, but not on my 3.8 boxes. I went ahead and tested your RHEL5 test kernels found at the following URL:

http://people.redhat.com/jlayton

I can confirm that the following kernels work in my dev environment and that they fix bug 364351:

kernel-PAE-2.6.18-58.el5.jtltest.17
kernel-2.6.18-58.el5.jtltest.17
kernel-2.6.18-58.el5.jtltest.17.x86_64

I am cross-posting this to both bug 364351 and bug 327591, as they are possibly related and the kernels tested above contain potential fixes for both bugs, though no test case exists for bug 327591.
So is this an issue in both RHEL 4 (as originally reported) and RHEL 5, or just RHEL 5?
Whoops, forget that - I now see the RHEL 4.7 bug 247905, and the 5.2 bug 364351.
> In researching my problem I found bug 327591 and by extension bug 364351. I
> used the test script documented in bug 327591 and found at the following
> URL.
>
> http://www.linux-nfs.org/Linux-2.6.x/2.6.21/linux-2.6.21-021-fix_readdir_stale_cache.dif

Aye, that's the rub. We can easily reproduce *a* problem on both RHEL4 and RHEL5, and that patch fixes it. I'm proposing that patch for the next set of updates (see bug 364351 and bug 364361). The problem is that we have no way to know whether the problem originally reported in this BZ has anything to do with either bug 247905 or bug 364351.

I'll also note that there's another patch that might be related in bug 401551, but I'm planning to devel_nack that patch at this point since it doesn't seem to fix the problem reported in that BZ.

I've asked the original reporters to produce a core dump if possible, but no one has come through with one as of yet. In fact, I'm going to set this back to NEEDINFO since we still don't have a clear understanding of the problem reported in this bug, and I need something to go on at this point. A way to reproduce this at will would be ideal, but a crash dump may also be helpful.
Unfortunately (I guess), it hasn't happened again. It's very, very strange.
Ok, setting this to NEEDINFO until it does...
Ok. Finally, it has happened again. I haven't had time to look into setting up kernel dumps, since this would have required setting it up on every machine in our department "just in case", and my schedule hasn't allowed me the time. Here are the present details; I can send the output of any commands that you like...

I modified /cs/local/lib/udb/get-classlist on an NFS client (hop) today:

% ls -ali /cs/local/lib/udb/get-classlist
19858066 -rwxr-x--- 1 root tech 433 Jan 15 11:49 /cs/local/lib/udb/get-classlist

From all of our other NFS clients you see the old copy:

% ls -ali /cs/local/lib/udb/get-classlist
19858379 -rwxr-x--- 1 root tech 590 Jan  2  2007 /cs/local/lib/udb/get-classlist
What kernel is on this client? How long has it been up?

> I haven't had time to look into setting up kernel dumps since this would have
> required setting this up on every machine in our department "just in case",
> and my schedule hasn't allowed me this time.

I don't think we can tell much here without a core. The other alternative is to run crash on the live kernel, but that's really too much to do by "remote control", not to mention that the data structures in the live kernel might change while you're looking at the problem.

Though if you want to take a crack at it, get the debuginfo packages for your kernel (talk to your support person, I'm not sure if they're available through RHN), install the "crash" kernel debugger and then run crash on the live kernel. Dave Anderson's people page has some good documentation for starters:

http://people.redhat.com/anderson

The idea will be to track down the inode that's not showing the changes, and its parent inode. From that, you'll want to walk through the validation logic in nfs_revalidate_inode and nfs_update_inode, track down the corresponding data structures in kernel memory, and see if you can tell why things aren't being revalidated like they should.

Let me know if you have questions or get a core...
Regarding Comment #48, yes, please, do post the commands required to get to this state. I am wondering whether some changes for bz157266, https://bugzilla.redhat.com/show_bug.cgi?id=157266 might resolve this. That bugzilla references metadata cache handling and better handles stale file handles and situations where a file has changed state in some unexpected fashion.
Hi Peter. I think my message wasn't clear. I was saying that I could send the output of any commands that you would like me to try to provide details necessary to solve the problem. I only edited the file to get to this particular state, and can't really provide other feedback. I am looking into whether I can provide a kernel dump if diskdump/netdump were not previously installed.
Oops, sorry, I misread your comment. Perhaps too much hopeful thinking on my part... :-)
To clarify, when this problem occurs, here's what I need:

1) Run the "stat" command against the "problem" file, and its parent directory, and its parent directory, etc., all the way back to the mountpoint on the client. This needs to be done on the machine showing the problem and from another machine that isn't (the server is fine for the machine that isn't). The idea is that I need to understand what the client thinks about the file's metadata and what the real metadata should be. Collect the output from both machines.

2) After that, collect a coredump. This means setting up either netdump or diskdump, and forcing a crash. The easiest way to crash the box is to do:

# echo c > /proc/sysrq-trigger

...it's highly recommended that you test your ability to collect crash dumps before relying on them. Getting them working correctly can be tricky. I recommend getting our support people involved as well; they've helped a lot of people set them up.
My hope there is that I'll be able to track down the inode that's showing the issue, along with the NFS-specific inode info, and determine why the file isn't being revalidated as it should be.

...actually, I have a 1.5 here as well:

1.5) Just before crashing the box, do this:

# echo 32767 > /proc/sys/sunrpc/nfs_debug

...and then do a final:

# stat /path/to/problem/file

...this should dump a bit of NFS debug info into the ring buffer. The core should then have that info, and that should help me track down the inode inside the coredump. It may also give me a hint as to why the inode isn't being revalidated.
As it happens, I can now provide everything that you have requested, including the core.

On the server experiencing the problem:

# cd /cs/local/lib/udb
# stat get-classlist
  File: `get-classlist'
  Size: 590             Blocks: 8          IO Block: 8192   regular file
Device: 14h/20d Inode: 19858379    Links: 1
Access: (0750/-rwxr-x---)  Uid: (    0/    root)   Gid: ( 1000/    tech)
Access: 2008-01-16 03:24:53.000000000 -0500
Modify: 2007-01-02 13:12:09.000000000 -0500
Change: 2008-01-15 11:49:16.000000000 -0500
# cd ..
# stat udb
  File: `udb'
  Size: 4096            Blocks: 8          IO Block: 8192   directory
Device: 14h/20d Inode: 19858363    Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: ( 1000/    tech)
Access: 2008-01-16 03:14:13.000000000 -0500
Modify: 2008-01-15 13:12:24.000000000 -0500
Change: 2008-01-15 13:12:24.000000000 -0500
# cd ..
# stat lib
  File: `lib'
  Size: 36864           Blocks: 72         IO Block: 8192   directory
Device: 14h/20d Inode: 6505094     Links: 125
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: ( 1000/    tech)
Access: 2008-01-16 03:08:29.000000000 -0500
Modify: 2007-12-11 11:24:47.000000000 -0500
Change: 2007-12-11 11:24:47.000000000 -0500
# cd ..
# stat local
  File: `local'
  Size: 4096            Blocks: 8          IO Block: 8192   directory
Device: 14h/20d Inode: 14385919    Links: 19
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    1/     bin)
Access: 2008-01-16 03:08:22.000000000 -0500
Modify: 2007-05-17 09:08:36.000000000 -0400
Change: 2007-05-17 09:08:36.000000000 -0400
# cd ..
# stat cs
  File: `cs'
  Size: 4096            Blocks: 8          IO Block: 8192   directory
Device: 13h/19d Inode: 2801958     Links: 9
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2008-01-16 09:17:05.000000000 -0500
Modify: 2008-01-02 11:05:48.000000000 -0500
Change: 2008-01-02 11:05:48.000000000 -0500

On another server not experiencing the problem:

# cd /cs/local/lib/udb
# stat get-classlist
  File: `get-classlist'
  Size: 433             Blocks: 8          IO Block: 8192   regular file
Device: 14h/20d Inode: 19858066    Links: 1
Access: (0750/-rwxr-x---)  Uid: (    0/    root)   Gid: ( 1000/    tech)
Access: 2008-01-16 03:24:53.000000000 -0500
Modify: 2008-01-15 11:49:16.000000000 -0500
Change: 2008-01-15 11:49:16.000000000 -0500
# cd ..
# stat udb
  File: `udb'
  Size: 4096            Blocks: 8          IO Block: 8192   directory
Device: 14h/20d Inode: 19858363    Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: ( 1000/    tech)
Access: 2008-01-16 09:42:34.000000000 -0500
Modify: 2008-01-15 13:12:24.000000000 -0500
Change: 2008-01-15 13:12:24.000000000 -0500
# cd ..
# stat lib
  File: `lib'
  Size: 36864           Blocks: 72         IO Block: 8192   directory
Device: 14h/20d Inode: 6505094     Links: 125
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: ( 1000/    tech)
Access: 2008-01-16 03:08:29.000000000 -0500
Modify: 2007-12-11 11:24:47.000000000 -0500
Change: 2007-12-11 11:24:47.000000000 -0500
# cd ..
# stat local
  File: `local'
  Size: 4096            Blocks: 8          IO Block: 8192   directory
Device: 14h/20d Inode: 14385919    Links: 19
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    1/     bin)
Access: 2008-01-16 03:08:22.000000000 -0500
Modify: 2007-05-17 09:08:36.000000000 -0400
Change: 2007-05-17 09:08:36.000000000 -0400
# cd ..
# stat cs
  File: `cs'
  Size: 4096            Blocks: 8          IO Block: 8192   directory
Device: 13h/19d Inode: 2801958     Links: 9
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2008-01-16 09:17:05.000000000 -0500
Modify: 2008-01-02 11:05:48.000000000 -0500
Change: 2008-01-02 11:05:48.000000000 -0500

On both systems, in /etc/fstab:

forest-mrpriv:/obj/cs /cs nfs hard,rw,intr,nfsvers=3,tcp 0 0
forest-mrpriv:/obj/local-intel-linux /cs/local nfs hard,rw,intr,nfsvers=3,tcp 0 0

On the bad system:

# echo 32767 >! /proc/sys/sunrpc/nfs_debug
# stat /cs/local/lib/udb/get-classlist
# sysctl -w kernel.sysrq=1
# echo c >! /proc/sysrq-trigger

Where do I securely upload the 4.2 GB core and associated log file?
Please bzip2 the core file and log, and follow these instructions to upload the file(s):

> lftp dropbox.redhat.com
> cd incoming
> put unique-filename

Or:

> mput unique-filename1 unique-filename2 ... unique-filenameX

...after it's uploaded, email me the exact filenames that they were given (it's a blind directory and I can't easily list the files within it).
As you are aware, the core file and log have been uploaded, and you have been e-mailed as instructed. Let the debugging begin :) :)
I've unzipped the core -- can you elaborate on where I can get the sources, kernel package, and debuginfo for this kernel?

2.6.9-55.0.2.EL.CSE.smp

...in particular, the i686 version. Note that this kernel is unsupported by Red Hat, but presuming that it doesn't have any NFS-related patches, I'd still like to have a look and see if I can determine the cause of this problem.

Even without that, there's some interesting info in the logs. All of the containing directory info seems to be identical on both machines. The main difference seems to be the file itself. The inode number has changed, which suggests that the file was silly-deleted or renamed and then replaced with a new copy. There are several cases where the "udb" directory is revalidated, but the cache is not purged. This one is interesting:

NFS: nfs_do_filldir() filling ended @ cookie 4824; returning = -22
NFS: nfs_update_inode(0:14/19858363 ct=1 info=0x6)
NFS: dentry_delete(udb/get-classlist, 8)
NFS: readdir_search_pagecache() searching for cookie 4824
NFS: find_dirent_page() searching directory page 0
NFS: found cookie 12
...
NFS: revalidating (0:14/19858363)
NFS call  getattr
NFS reply getattr
NFS: nfs_update_inode(0:14/19858363 ct=1 info=0x6)
NFS: (0:14/19858363) revalidation complete
NFS: readdir_search_pagecache() searching for offset 0
NFS: find_dirent_page() searching directory page 0
NFS: found cookie 12 at index 0

...so here we're doing a GETATTR against the directory to validate the info we have in cache. We're not invalidating the cache, however, which suggests that the directory contents in pagecache are tagged with the current timestamp. Obviously, however, the contents are old.

There are a couple of possibilities here:

1) Given that this is a 32-bit kernel, this bug is a possibility:

https://bugzilla.redhat.com/show_bug.cgi?id=228801

...especially if the host has been up for a long time and this directory is only infrequently accessed.
2) Another possibility is that we're looking at a race of some sort, where a readdirplus was done against this directory just before the old inode was removed and the new inode added to it. If all of that happened within the same second, then the client would have no way to know that its cached info was out of date. This is a limitation of ext3, so a different server filesystem that supports nanosecond timestamps might be worth considering.

3) Finally, we can't rule out bug 364361 here. The log never shows any GETATTRs being done against either inode of get-classlist, and I also never see any readdirplus calls being done. This suggests that the inode is being repopulated out of the cached readdirplus information. It's not clear to me that that info would be getting flushed out on a kernel with the patch in bug 364361, but it's possible it would help.
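As a minimal illustration of possibility 2 (a standalone Python sketch, not taken from the kernel code -- the function name and numbers are hypothetical), a cache validator that compares mtimes at one-second granularity cannot see a change that lands in the same second as the cached readdirplus:

```python
# Hypothetical sketch: a client that validates its directory cache by
# comparing mtimes truncated to whole seconds (the only precision ext3
# can report) cannot detect a change within the same second.

def cache_is_stale(cached_mtime_sec: int, server_mtime_ns: int) -> bool:
    """Return True if the server's mtime, truncated to seconds,
    differs from the mtime cached by the client."""
    return server_mtime_ns // 1_000_000_000 != cached_mtime_sec

# readdirplus happens at t = 100.2 s; client caches mtime = 100
cached = 100

# directory modified at t = 100.9 s -- same second, change is invisible
assert cache_is_stale(cached, 100_900_000_000) is False

# a modification in the next second *would* be detected
assert cache_is_stale(cached, 101_100_000_000) is True
```

This is why a server filesystem with sub-second timestamps narrows the race window dramatically.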
Although we do recompile the kernel, it is only because we enable a couple of built-in modules that are not in the default config, like ieee1394. We don't change the source at all. We used to do this for reiserfs as well, but we don't use reiser any more. For your purposes, you can treat it as identical to:

kernel-smp-2.6.9-55.0.2.EL.i686.rpm

...which was, I think, what came with 4.5. I can send you the debuginfo.rpm from the compile if you like. As for uptime, the host had been up around 27 days, I believe.
By the way, the file get-classlist was updated by loading it into emacs/vi (I can't remember which I used), making a few changes, and then saving.
Changing tagline to something more descriptive...
Yes, please do upload the debuginfo RPM. There are differences in memory locations with this kernel and the stock debuginfo won't work. You can just FTP as you did the core and post the filename here.
jas327591.tar.bz2 uploaded (including kernel, devel, debuginfo)
Thanks. Got the debuginfo and I'm now able to look in the core. Some notes...

Here's the mount in question:

crash> mount /cs/local
VFSMOUNT SUPERBLK TYPE DEVNAME                              DIRNAME
f6c63e00 c31ad200 nfs  forest-mrpriv:/obj/local-intel-linux /cs/local

...this tells us the top-level dentry in the superblock:

crash> struct super_block.s_root c31ad200
  s_root = 0xf6c2a214,

..."lib" should be on the d_subdirs list:

crash> struct dentry.d_subdirs 0xf6c2a214
  d_subdirs = {
    next = 0xdbcb979c,
    prev = 0xf6703c5c
  },

Walking through the list, the dentry addresses are:

lib           = 0xf67e217c
udb           = 0xf14af63c
get-classlist = 0xf1d9f63c
...the inode for get-classlist is here:

crash> struct dentry.d_inode 0xf1d9f63c
  d_inode = 0xf1748e8c,

...abusing the "list" functionality in crash's struct command gives us this:

crash> struct nfs_inode -l nfs_inode.vfs_inode 0xf1748e8c

...some interesting fields in the nfs_inode:

cache_validity = 0,
readdir_timestamp = 0,
read_cache_jiffies = 2338890832,
attrtimeo = 60000,
attrtimeo_timestamp = 2338890832,
change_attr = 0,
last_updated = 2338890832,
cache_change_attribute = 2262627382,
data_updates = {
  counter = 0
},
...
i_ino = 19858379,

...at this point I need to look over the revalidation logic and make sense of what we're seeing here. I'll also probably need to look over similar info for the "udb" directory.
For the udb directory:

cache_validity = 0,
readdir_timestamp = 2273924542,
read_cache_jiffies = 2338941777,
attrtimeo = 60000,
attrtimeo_timestamp = 2338890515,
change_attr = 17388445753563687752,
last_updated = 2338941777,
cache_change_attribute = 2273924542,
data_updates = {
  counter = 0
},

...many of the values here are in units of jiffies -- at the time of the core:

crash> p jiffies
jiffies = $1 = 2339201735

...from the info in the core:

DATE:   Wed Jan 16 10:10:54 2008
UPTIME: 27 days, 01:51:41
So what I want to do is determine, from the timestamps in the inode fields, at exactly what wallclock times we did the last readdir on this directory and revalidated these inodes. For that I need to establish the exact time of day at boot, so we can convert the jiffies values to wallclock times.

Current time at the time of the crash:

crash> p xtime
xtime = $1 = {
  tv_sec = 1200496254,
  tv_nsec = 661257126
}

Add wall_to_monotonic to determine uptime:

crash> p wall_to_monotonic
wall_to_monotonic = $2 = {
  tv_sec = -1198157670,
  tv_nsec = 332117792
}

So that gives boot time as roughly this many seconds ago:

1200496254 - 1198157670 + (661257126 - 332117792)/10^9 = 2338584.329139334

...which (at HZ=1000) should be ~this many jiffies:

2338584329.139334

...but we're off by quite a bit here from the jiffies value in the kernel. This implies that we have 617405 more timer interrupts than we should (a little over 10 minutes):

2339201735 - 2338584329.139334 = 617405.860666

...so perhaps wall_to_monotonic was skewed? I see that ntpd is running, so it's possible that it's causing things not to add up exactly. I'll need to ponder how best to make sense of the jiffies values above. It may be best not to worry about wall_to_monotonic and to simply try to work backward from the time of the crash.
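The consistency check above can be redone outside the crash session. This Python sketch mirrors the arithmetic exactly (including the nsec handling used above), with the constants copied from the core; HZ=1000 is an assumption based on this kernel's timer frequency:

```python
# Sketch: redo the boot-time arithmetic from the values pulled out of the
# core, and measure how far the kernel's jiffies counter is from where the
# wall clock says it should be. Assumes HZ=1000 (one jiffy per millisecond).

HZ = 1000

xtime_sec, xtime_nsec = 1200496254, 661257126      # crash> p xtime
w2m_sec, w2m_nsec = -1198157670, 332117792         # crash> p wall_to_monotonic
jiffies = 2339201735                               # crash> p jiffies

# uptime in seconds, following the computation in the comment above
uptime = (xtime_sec + w2m_sec) + (xtime_nsec - w2m_nsec) / 1e9

expected_jiffies = uptime * HZ
skew = jiffies - expected_jiffies                  # ~617405.86 jiffies

print(f"uptime           ~ {uptime:.6f} s")
print(f"expected jiffies ~ {expected_jiffies:.3f}")
print(f"skew             ~ {skew:.3f} jiffies (~{skew / HZ / 60:.1f} minutes)")
```

The ~10-minute skew is what suggests wall_to_monotonic (or ntpd stepping) is throwing the conversion off.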
I'm guessing you're in the Eastern timezone too, since the mtime that you posted on the file and in the core works out exactly to my local timezone...

Working backward from jiffies at the time of the crash: readdir_timestamp is set to jiffies just before a readdir/readdirplus call goes out onto the wire.

(2339201735 - 2273924542)/1000 = 65277.193
1200496254 - 65277 = 1200430977

$ perl -e 'print localtime(1200430977) . "\n";'
Tue Jan 15 16:02:57 2008

...so that's when the last readdir was actually done on the wire. Well after the mtime on the directory, which is:

[jlayton@tleilax ~]$ perl -e 'print localtime(1200420744) . "\n";'
Tue Jan 15 13:12:24 2008
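The same working-backward conversion can be sketched in a few lines of Python. The -0500 offset is an assumption (both parties appear to be in US Eastern time, per the timestamps above), used here instead of the perl localtime calls so the result doesn't depend on the machine's timezone:

```python
from datetime import datetime, timedelta, timezone

# Sketch: convert the udb directory's readdir_timestamp (in jiffies) to a
# wallclock time by working backward from the time of the crash.
# Assumes HZ=1000 and a fixed -0500 (EST) offset.

HZ = 1000
EST = timezone(timedelta(hours=-5))

jiffies_at_crash = 2339201735    # crash> p jiffies
xtime_at_crash = 1200496254      # crash> p xtime (tv_sec)
readdir_timestamp = 2273924542   # from the udb nfs_inode

seconds_before_crash = (jiffies_at_crash - readdir_timestamp) // HZ
readdir_epoch = xtime_at_crash - seconds_before_crash

fmt = "%a %b %d %H:%M:%S %Y"
print(datetime.fromtimestamp(readdir_epoch, tz=EST).strftime(fmt))
# Tue Jan 15 16:02:57 2008  -- last readdir on the wire

mtime_epoch = 1200420744         # directory mtime reported by the server
print(datetime.fromtimestamp(mtime_epoch, tz=EST).strftime(fmt))
# Tue Jan 15 13:12:24 2008  -- directory mtime
```

So the last on-the-wire readdir happened nearly three hours after the directory's mtime, yet the stale contents survived.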
Just a note that the patch that was originally proposed for bug 401551 might be of use here. I need to check some other metadata first and see if I can verify that...
So this is interesting... The dentry's d_time field is set by nfs_renew_times(). If I read it correctly, it gets set to jiffies whenever the dentry is revalidated or created.

crash> struct dentry.d_time 0xf1d9f63c
  d_time = 1518062219,

...which is:

Sun Jan 6 22:05:14 2008

...so the dentry has existed since at least then and hasn't had its time renewed since. What I'd like to do now is have a look at the readdirplus data in pagecache and see if we can confirm that the OTW readdirplus call has the new fileid for the file. This is tricky, though, since the pages in question are indexed by a radix tree, and I'm not sure how to walk a radix tree in crash...
Ok, this is odd...

crash> struct inode.i_nlink,i_dentry 0xf1748e8c
  i_nlink = 1,
  i_dentry = {
    next = 0xcfb03384,
    prev = 0xf1d9f67c
  },
crash> list 0xcfb03384
cfb03384
f1d9f67c
f1748e9c
crash> struct dentry.d_name -l dentry.d_alias f1d9f67c
  d_name = {
    hash = 1734851129,
    len = 13,
    name = 0xf1d9f6b0 "get-classlist"
  },
crash> struct dentry.d_name -l dentry.d_alias cfb03384
  d_name = {
    hash = 1903515488,
    len = 14,
    name = 0xcfb033b8 "get-classlist~"
  },

...so i_nlink=1, which means that the inode shouldn't be hardlinked, yet there are 2 dentries attached to it. This seems significant, but I'm not quite sure how this would occur...
Presuming that the contents of this directory have not changed, can you run a stat against the get-classlist~ file on the server and post the output here?
As a note, I'm starting to think that this might be another manifestation of bug 228801. Comment #24, in particular seems to show the exact same symptoms, but you also seem to be occasionally getting stale positive dentries as well.
Yes, I think the patch for 228801 would probably fix this. We're obviously occasionally getting a valid dcache lookup hit, so we're probably passing this dentry into nfs_lookup_revalidate() at least occasionally. That function does this:

        /* Force a full look up iff the parent directory has changed */
        if (nfs_check_verifier(dir, dentry)) {
                if (nfs_lookup_verify_inode(inode, nd))
                        goto out_zap_parent;
                goto out_valid;
        }

The patch for bug 228801 changes nfs_check_verifier. In particular, it adds this condition:

+           || verf != NFS_I(dir)->cache_change_attribute)

...from what I can tell, if this dentry passed through this function at the time of the core, nfs_check_verifier would have returned 0 and we wouldn't be blindly accepting the dentry. It would instead be calling nfs_cached_lookup() on it and we'd be getting a new dentry (if I'm interpreting this correctly).

I'm going to close this as a dupe of bug 228801. I recommend updating to the 4.6 kernel and seeing if you're still able to reproduce this. If so, then please reopen this bug and I'll have a second look.

*** This bug has been marked as a duplicate of 228801 ***