Bug 208986 - Stale NFS file handle
Summary: Stale NFS file handle
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Wendy Cheng
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-10-02 20:52 UTC by Lenny Maiorani
Modified: 2010-01-12 03:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-10-17 20:13:35 UTC
Embargoed:



Description Lenny Maiorani 2006-10-02 20:52:25 UTC
Description of problem:
After creating and deleting directories several times from two different
clients, the clients eventually report "Stale NFS file handle".

Version-Release number of selected component (if applicable):
RHEL 4 Update 3

How reproducible:
Always

Steps to Reproduce:
1. Export a GFS filesystem via NFS
2. From 2 different clients, mount that filesystem. 
3. Follow these steps in a loop:
   1/ CLIENT 1: # Create dir /mnt/dir/data2 
   2/ CLIENT 1: # ls -l /mnt/dir/data1

   3/ CLIENT 2: # ls -l /mnt/dir/data2
   4/ CLIENT 2: # Remove dir /mnt/dir/data1

   5/ CLIENT 1: # Remove dir /mnt/dir/data2

   6/ CLIENT 2: # ls -l /mnt/dir/data1
   7/ CLIENT 2: # Create dir /mnt/dir/data1
   8/ CLIENT 2: # ls -l /mnt/dir/data2
  
Actual results:
This error will happen after about 3 loops.

Expected results:
GFS/NFS handles directory creation and deletion correctly, without producing stale file handles.

Additional info:

Comment 4 Wendy Cheng 2006-10-02 22:43:13 UTC
Which client OS and NFS versions are you using? My RHEL3/4 NFS clients
seem to be able to redo the lookup upon ESTALE in this case. The following
is a script that I believe follows what you described in the problem
description ("wendyc" is a RHEL3 client and "engcluster.gsslab" is a RHEL4
NFS client). I let it loop for 5 minutes without seeing any ESTALE
error messages. Did you try this on an ext3 filesystem?

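# at_prom is not set in the original snippet; it is assumed to be "yes"
# so that the loop runs until interrupted.
at_prom="yes"
while [[ $at_prom == "yes" ]]
do
        # Client 1 (RHEL3): create data2, then list data1
        ssh wendyc.rdu.redhat.com mkdir /mnt/nfs/data2
        ssh wendyc.rdu.redhat.com ls -l /mnt/nfs/data1

        # Client 2 (RHEL4): list data2, then remove data1
        ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data2
        ssh engcluster2.gsslab.rdu.redhat.com rmdir /mnt/nfs/data1

        # Client 1: remove data2
        ssh wendyc.rdu.redhat.com rmdir /mnt/nfs/data2

        # Client 2: list data1, recreate it, then list data2
        ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data1
        ssh engcluster2.gsslab.rdu.redhat.com mkdir /mnt/nfs/data1
        ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data2
done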

Sample output:

ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
total 0
total 0
ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
total 0
total 0
ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory


Comment 5 Wendy Cheng 2006-10-02 22:45:31 UTC
Maybe I misunderstand what you mean by "stale file handle". Are you
referring to the "No such file or directory" messages as "stale file handle"?
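
The two messages correspond to different errors (ENOENT versus ESTALE), so a
test loop can tell them apart by looking at what the command prints on stderr.
The following is only a minimal sketch: classify_nfs_error is a hypothetical
helper, not part of any NFS tooling, and it assumes the C locale message
texts. Newer C libraries may print the shorter "Stale file handle", so both
spellings are matched.

```shell
# Classify one error line from ls/mkdir/rmdir by its message text.
# classify_nfs_error is a hypothetical helper introduced for illustration.
classify_nfs_error() {
    case "$1" in
        *"Stale NFS file handle"*|*"Stale file handle"*) echo "ESTALE" ;;
        *"No such file or directory"*)                   echo "ENOENT" ;;
        "")                                              echo "OK" ;;
        *)                                               echo "OTHER" ;;
    esac
}

# Example: capture only stderr of a command, then classify it.
# The path below is an assumption chosen so that the lookup fails.
err=$(LC_ALL=C ls -ld /nonexistent/data1 2>&1 >/dev/null) || true
classify_nfs_error "$err"
```

With this, the "No such file or directory" lines in the sample output above
would be counted as ENOENT, not as stale file handles.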

Comment 6 Wendy Cheng 2006-10-03 00:12:51 UTC
Please be aware that an NFS client normally implements its own cache,
and the protocol itself is not designed to provide cache coherency across
different (NFS) clients. Unless you specifically mount the NFS share
with the "sync" option, the changes made by one client may not even be seen
by the server itself for a certain time interval. So a client can hold on to a
filehandle it obtained some time ago that is no longer valid. When the client
eventually uses that outdated filehandle to send requests, it is legitimate
for a server (and GFS) to return ESTALE. Different clients handle this error
in different ways. Newer versions of the Linux client seem to redo the
lookup to update the filehandle. Please let us know your NFS client setup and
provide more details (such as both server and client OS versions, the test
script itself, and/or its output). Based on the current problem description
alone, I would say this behavior is expected.
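
For clients that do not redo the lookup themselves, a test script can
approximate that behavior by rerunning a command when it reports a stale
handle. The sketch below is an illustration under stated assumptions, not
how any NFS client is implemented: retry_on_estale and ESTALE_RETRY_DELAY are
hypothetical names, the error is detected by matching the message text, and
whether a rerun actually refreshes the filehandle depends on the client.

```shell
# Rerun a command while its stderr mentions a stale file handle.
# Usage: retry_on_estale MAX_TRIES command [args...]
# retry_on_estale and ESTALE_RETRY_DELAY are hypothetical names.
retry_on_estale() {
    local max_tries="$1" try=1 err rc
    shift
    while [ "$try" -le "$max_tries" ]; do
        # Capture stderr in $err; stdout still reaches the caller via fd 3.
        { err=$("$@" 2>&1 1>&3) && rc=0 || rc=$?; } 3>&1
        case "$err" in
            *"Stale NFS file handle"*|*"Stale file handle"*)
                # Stale handle: wait briefly, then try again.
                try=$((try + 1))
                sleep "${ESTALE_RETRY_DELAY:-1}"
                ;;
            *)
                # Success or an unrelated error: report it and stop.
                if [ -n "$err" ]; then printf '%s\n' "$err" >&2; fi
                return "$rc"
                ;;
        esac
    done
    # Still stale after max_tries attempts: pass the last error through.
    printf '%s\n' "$err" >&2
    return "$rc"
}

# Example call (commented out; the path is taken from the script above):
# retry_on_estale 3 ls -l /mnt/nfs/data1
```

Errors other than ESTALE, including "No such file or directory", are returned
immediately, since they are expected in this test sequence.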

Comment 7 Wendy Cheng 2006-10-17 20:13:35 UTC
Tentatively closing this bugzilla. It will be reopened if required.

