Description of problem:
After creating and deleting directories several times from two different clients, eventually "Stale NFS file handle" happens.

Version-Release number of selected component (if applicable):
RHEL 4 Update 3

How reproducible:
Always

Steps to Reproduce:
1. Export a GFS filesystem via NFS.
2. From 2 different clients, mount that filesystem.
3. Follow these steps in a loop:
   1/ CLIENT 1: # Create dir /mnt/dir/data2
   2/ CLIENT 1: # ls -l /mnt/dir/data1
   3/ CLIENT 2: # ls -l /mnt/dir/data2
   4/ CLIENT 2: # Remove dir /mnt/dir/data1
   5/ CLIENT 1: # Remove dir /mnt/dir/data2
   6/ CLIENT 2: # ls -l /mnt/dir/data1
   7/ CLIENT 2: # Create dir /mnt/dir/data1
   8/ CLIENT 2: # ls -l /mnt/dir/data2

Actual results:
This error will happen after about 3 loops.

Expected results:
GFS/NFS correctly handles creation and deletion without having stale file handles.

Additional info:
What client OS and NFS versions are you using? My RHEL3/4 NFS clients seem to be able to redo the lookup upon ESTALE in this case. The following script, I believe, follows what you described in the problem description ("wendyc" is the RHEL3 client and "engcluster.gsslab" is the RHEL4 NFS client). I've let it loop for 5 minutes without seeing any ESTALE error messages. Did you try this on an ext3 filesystem?

while [[ $at_prom == "yes" ]]
do
    ssh wendyc.rdu.redhat.com mkdir /mnt/nfs/data2
    ssh wendyc.rdu.redhat.com ls -l /mnt/nfs/data1
    ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data2
    ssh engcluster2.gsslab.rdu.redhat.com rmdir /mnt/nfs/data1
    ssh wendyc.rdu.redhat.com rmdir /mnt/nfs/data2
    ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data1
    ssh engcluster2.gsslab.rdu.redhat.com mkdir /mnt/nfs/data1
    ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data2
done

Sample output:

ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
total 0
total 0
ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
total 0
total 0
ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
Maybe I misunderstand what you mean by "stale file handle". Are you referring to the "No such file or directory" messages as "stale file handle"?
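
To make the distinction in this question concrete, here is a minimal sketch that labels a captured error line as ENOENT or ESTALE. The function name classify_nfs_error is hypothetical, and the matching relies on the usual error strings; it assumes the C library prints either "Stale NFS file handle" or, as some newer ones do, "Stale file handle".

```shell
#!/bin/bash
# Hypothetical helper (not part of any NFS tooling): label an error
# message by the errno text it contains.
classify_nfs_error() {
    case $1 in
        # Covers "Stale NFS file handle" and "Stale file handle".
        *"Stale"*"file handle"*)       echo ESTALE ;;
        *"No such file or directory"*) echo ENOENT ;;
        *)                             echo OTHER ;;
    esac
}
```

For example, feeding it a line from the sample output, such as "ls: /mnt/nfs/data1: No such file or directory", reports ENOENT, which is the normal result of listing a directory that another client has removed, not a stale handle.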
Please be aware that an NFS client normally implements its own cache, and the protocol itself is not designed to provide cache coherency across different (NFS) clients. Unless you specifically mount the NFS share with the "sync" option, the changes made by one client may not even be seen by the server itself for a certain time interval. So a client can hold on to a filehandle it obtained some time ago that is no longer valid. When the client eventually uses that outdated filehandle to send requests, it is legitimate for the server (and GFS) to return ESTALE. Different clients have different ways of handling this error. Newer versions of the Linux client seem to redo the lookup to update the filehandle.

Let us know your NFS client setup and pass along more details (such as both server and client OS versions, the test script itself and/or its output). Based on the current problem description alone, I would say the behavior is expected.
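
For clients that do not redo the lookup themselves, a test script can approximate that behavior at the application level. The sketch below is only an illustration, not how the kernel client works: retry_on_estale and RETRY_DELAY are hypothetical names, and it assumes that re-running a command by path gives the client a chance to look the path up again and obtain a fresh filehandle.

```shell
#!/bin/bash
# Hypothetical helper: run a command and, if its output reports ESTALE,
# run it again, up to a maximum number of attempts.
# Usage: retry_on_estale MAX_ATTEMPTS command [args...]
retry_on_estale() {
    local attempts=$1
    shift
    local out rc i
    for ((i = 1; i <= attempts; i++)); do
        # Capture stdout and stderr together, keeping the exit status.
        out=$("$@" 2>&1) && rc=0 || rc=$?
        case $out in
            *"Stale"*"file handle"*)
                # ESTALE: wait briefly, then try again.
                sleep "${RETRY_DELAY:-1}"
                ;;
            *)
                # Success or an unrelated error: pass it through as-is.
                [ -z "$out" ] || printf '%s\n' "$out"
                return "$rc"
                ;;
        esac
    done
    # Still stale after the last attempt: report the final result.
    printf '%s\n' "$out"
    return "$rc"
}
```

In the loop from the earlier comment, a call such as "retry_on_estale 3 ls -l /mnt/nfs/data1" could replace the plain ls. If the error still appears after several retries, that would suggest something other than ordinary client-side caching.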
Tentatively closing this bugzilla. Will reopen if required.