This is a follow-up on my comments from https://bugzilla.redhat.com/show_bug.cgi?id=327591

Here is the basic workflow: system A and system B read a symlink that is changed on system C. All three systems mount /vol/sharename from filer 1. On system C, the symlink is moved aside to ${name}.old and recreated pointing to a new configuration file. Sometimes system A or system B doesn't notice that anything has changed and keeps returning the old target. This might happen on 4 out of 50 systems.

Relevant version info:

# rpm -qa nfs*; uname -r
nfs-utils-1.0.6-80.EL4
nfs-utils-lib-1.0.6-8
2.6.9-55.ELsmp

# echo 32767 > /proc/sys/sunrpc/nfs_debug; stat /ptx/SYS/conf/software.conf > stat.txt; echo 0 > /proc/sys/sunrpc/nfs_debug; dmesg -c > dmesg.txt
...
# cat stat.txt; echo ------------------------------; cat dmesg.txt
  File: `/ptx/SYS/conf/software.conf' -> `/software/scripts/conf/Persephone/software.conf-rhea-2.0'
  Size: 56          Blocks: 0          IO Block: 32768  symbolic link
Device: 1fh/31d     Inode: 3115833     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (  500/     ptx)   Gid: (  500/     ptx)
Access: 2007-10-30 07:46:26.302902000 -0700
Modify: 2007-08-01 23:20:45.743523000 -0700
Change: 2007-10-16 22:48:11.198072000 -0700
------------------------------
NFS call access
NFS: nfs_update_inode(0:1f/104 ct=1 info=0x6)
NFS reply access, status = 0
NFS: revalidating (0:1f/2670994)
NFS call getattr
NFS reply getattr
NFS: nfs_update_inode(0:1f/2670994 ct=1 info=0x6)
NFS: (0:1f/2670994) revalidation complete
NFS call access
NFS reply access, status = 0
NFS: revalidating (0:1f/3115833)
NFS call getattr
NFS reply getattr
NFS: nfs_update_inode(0:1f/3115833 ct=2 info=0x6)
NFS: (0:1f/3115833) revalidation complete
NFS: dentry_delete(conf/software.conf, 8)
NFS: dentry_delete(conf/software.conf, 8)

The symlink should be pointing to: /software/scripts/conf/Persephone/software.conf-shoulao-1.0

mount -o,remount /nfs/share is a quick hack to refresh the NFS cache and clear out the problem.

We haven't found a way to reliably reproduce this bug.
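
For reference, the workflow described above can be sketched roughly as the two shell helpers below. This is only an illustration: the function names swap_symlink and check_symlink are hypothetical and are not the scripts actually used on these systems. The first mirrors what happens on system C (move the link aside to ${name}.old, recreate it), and the second is the kind of check that can be run on system A or B to see whether the client still returns the old target.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not the site's real scripts.

# Run on the updating host (system C in the report):
# move the existing symlink aside to ${name}.old, then recreate it.
swap_symlink() {
    link=$1
    new_target=$2
    if [ -L "$link" ]; then
        # Remove any previous .old link first so mv cannot follow it
        # into a directory; mv renames the symlink itself.
        rm -f "$link.old"
        mv "$link" "$link.old" || return 1
    fi
    ln -s "$new_target" "$link"
}

# Run on a reading host (system A or B):
# compare the target the client reports with the expected one.
check_symlink() {
    link=$1
    expected=$2
    actual=$(readlink "$link") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $link -> $actual"
    else
        echo "STALE: $link -> $actual (expected $expected)"
        return 1
    fi
}
```

With the paths from this report, the check on an affected client would be check_symlink /ptx/SYS/conf/software.conf /software/scripts/conf/Persephone/software.conf-shoulao-1.0, which would be expected to print a STALE line while the client is still returning the software.conf-rhea-2.0 target.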
Jeff, I actually don't think we need a separate bug for this issue. I think this is actually a duplicate of bug 327591. My request to open a new bug was for the problem described in this patch description: http://www.linux-nfs.org/Linux-2.6.x/2.6.21/linux-2.6.21-021-fix_readdir_stale_cache.dif That's a reproducible problem (apparently) and I'm not certain that it's related to bug 327591. I'll open a new bug for that and CC you on it.
I've opened bug 364361 to track the stale readdir cache testcase and patch from comment #1 here. I'm going to close this as a dupe of bug 327591 for now. If it turns out that this problem is different from the one originally reported there, we can reopen this bug.

*** This bug has been marked as a duplicate of 327591 ***