Bug 208986

Summary: Stale NFS file handle
Product: [Retired] Red Hat Cluster Suite
Reporter: Lenny Maiorani <lenny>
Component: gfs
Assignee: Wendy Cheng <nobody+wcheng>
Status: CLOSED CANTFIX
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium
Version: 4
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-17 20:13:35 UTC

Description Lenny Maiorani 2006-10-02 20:52:25 UTC
Description of problem:
After directories are created and deleted several times from two different
clients, a "Stale NFS file handle" error eventually occurs.

Version-Release number of selected component (if applicable):
RHEL 4 Update 3

How reproducible:
Always

Steps to Reproduce:
1. Export a GFS filesystem via NFS
2. Mount that filesystem on two different clients.
3. Run these steps in a loop:
   1/ CLIENT 1: # mkdir /mnt/dir/data2
   2/ CLIENT 1: # ls -l /mnt/dir/data1

   3/ CLIENT 2: # ls -l /mnt/dir/data2
   4/ CLIENT 2: # rmdir /mnt/dir/data1

   5/ CLIENT 1: # rmdir /mnt/dir/data2

   6/ CLIENT 2: # ls -l /mnt/dir/data1
   7/ CLIENT 2: # mkdir /mnt/dir/data1
   8/ CLIENT 2: # ls -l /mnt/dir/data2
  
Actual results:
The error occurs after about 3 iterations of the loop.

Expected results:
GFS/NFS handles directory creation and deletion correctly, without stale file handles.

Additional info:
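The loop in "Steps to Reproduce" can be scripted so that it stops at the first stale-handle error and reports the iteration. This is a minimal bash sketch, not part of the original report. It assumes each client is reached through a command prefix such as "ssh client1" held in the hypothetical variables CLIENT1 and CLIENT2 (empty means run on the local machine), and that the client's ls/mkdir/rmdir report ESTALE with text containing "Stale" (older C libraries print "Stale NFS file handle", newer ones "Stale file handle"). The names step, run_loop, DIR and MAX_LOOPS are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical harness for the 8-step loop in "Steps to Reproduce".
#   CLIENT1 / CLIENT2 : command prefixes such as "ssh client1" (empty = local)
#   DIR               : NFS mount point of the exported GFS filesystem
#   MAX_LOOPS         : iterations to try before giving up

# Run one command on a client and print its output; fail if that output
# mentions a stale file handle (ESTALE).
step() {
    local client=$1 out
    shift
    out=$($client "$@" 2>&1)      # $client unquoted so "ssh host" word-splits
    if [[ -n $out ]]; then printf '%s\n' "$out"; fi
    [[ $out != *Stale* ]]
}

# Repeat the loop until ESTALE shows up or MAX_LOOPS iterations have run.
run_loop() {
    local c1=${CLIENT1-} c2=${CLIENT2-}
    local dir=${DIR:-/mnt/dir} max=${MAX_LOOPS:-10} i
    for (( i = 1; i <= max; i++ )); do
        step "$c1" mkdir "$dir/data2" || break   # 1/ CLIENT 1
        step "$c1" ls -l "$dir/data1" || break   # 2/ CLIENT 1
        step "$c2" ls -l "$dir/data2" || break   # 3/ CLIENT 2
        step "$c2" rmdir "$dir/data1" || break   # 4/ CLIENT 2
        step "$c1" rmdir "$dir/data2" || break   # 5/ CLIENT 1
        step "$c2" ls -l "$dir/data1" || break   # 6/ CLIENT 2
        step "$c2" mkdir "$dir/data1" || break   # 7/ CLIENT 2
        step "$c2" ls -l "$dir/data2" || break   # 8/ CLIENT 2
    done
    if (( i <= max )); then
        echo "stale file handle seen in loop $i"
        return 1
    fi
    echo "no stale file handle after $max loops"
}
```

For example, after sourcing the file: CLIENT1='ssh client1' CLIENT2='ssh client2' DIR=/mnt/dir run_loop (host names are placeholders). Checking the captured text rather than exit codes keeps expected "No such file or directory" failures from stopping the loop.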

Comment 4 Wendy Cheng 2006-10-02 22:43:13 UTC
Which client OS and NFS version are you using? My RHEL3/4 NFS clients
seem to be able to redo the lookup upon ESTALE in this case. The following
script is what I believe you described in the problem description
("wendyc" is a RHEL3 client and "engcluster.gsslab" is a RHEL4
NFS client). I've let it loop for 5 minutes without seeing any ESTALE
error messages. Did you try this on an ext3 filesystem?

# wendyc.rdu.redhat.com = CLIENT 1, engcluster2.gsslab.rdu.redhat.com = CLIENT 2
at_prom="yes"    # loop flag; the loop runs until interrupted (Ctrl-C)
while [[ $at_prom == "yes" ]]
do
        ssh wendyc.rdu.redhat.com mkdir /mnt/nfs/data2              # step 1
        ssh wendyc.rdu.redhat.com ls -l /mnt/nfs/data1              # step 2

        ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data2  # step 3
        ssh engcluster2.gsslab.rdu.redhat.com rmdir /mnt/nfs/data1  # step 4

        ssh wendyc.rdu.redhat.com rmdir /mnt/nfs/data2              # step 5

        ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data1  # step 6
        ssh engcluster2.gsslab.rdu.redhat.com mkdir /mnt/nfs/data1  # step 7
        ssh engcluster2.gsslab.rdu.redhat.com ls -l /mnt/nfs/data2  # step 8
done

Sample output:

ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
total 0
total 0
ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory
total 0
total 0
ls: /mnt/nfs/data1: No such file or directory
ls: /mnt/nfs/data2: No such file or directory


Comment 5 Wendy Cheng 2006-10-02 22:45:31 UTC
Maybe I misunderstand what you mean by "stale file handle". Are you
referring to the "No such file or directory" messages as "stale file handle"?
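The distinction asked about here is between ENOENT ("No such file or directory") and ESTALE ("Stale NFS file handle"). In the loop, ENOENT is expected: step 6 lists data1 right after step 4 removed it, and step 8 lists data2 right after step 5 removed it, while steps 2 and 3 list directories that exist ("total 0"), which matches the repeating pattern in the sample output. A small filter over a saved log makes the count explicit. This is a sketch; the function name classify_errors is hypothetical, and it assumes the log holds plain ls/mkdir/rmdir messages like the sample output.

```bash
#!/usr/bin/env bash
# Count ENOENT vs ESTALE messages in a saved log of the test loop (stdin).
# classify_errors is a hypothetical helper name.
classify_errors() {
    local line enoent=0 estale=0
    while IFS= read -r line; do
        case $line in
            *"No such file or directory"*) enoent=$((enoent + 1)) ;;
            *"Stale"*"file handle"*)       estale=$((estale + 1)) ;;
        esac
    done
    echo "ENOENT: $enoent  ESTALE: $estale"
}
```

Usage: classify_errors < loop.log. A non-zero ESTALE count is the symptom this report describes; ENOENT alone is the behavior shown in the sample output.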

Comment 6 Wendy Cheng 2006-10-03 00:12:51 UTC
Please be aware that an NFS client normally implements its own cache, and
the protocol itself is not designed to provide cache coherency across
different NFS clients. Unless you mount the NFS share with the "sync"
option, changes made by one client may not be seen even by the server
for some time. So a client can hold on to a file handle it obtained some
time ago that is no longer valid. When the client eventually uses that
outdated file handle to send requests, it is legitimate for the server
(and GFS) to return ESTALE. Different clients handle this error in
different ways; newer versions of the Linux client seem to redo the
lookup to refresh the file handle. Please tell us your NFS client setup
and provide more details (such as the server and client OS versions, the
test script itself, and/or its output). Based only on the current problem
description, I would say this behavior is expected.
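The recovery described above (redoing the lookup after ESTALE) can be approximated at the application level by re-running a path-based command, on the assumption that a fresh call gives the client another chance to look the path up and obtain a current file handle. This is a minimal bash sketch, not how any particular NFS client implements it; the function retry_on_estale and the variables ESTALE_RETRIES and ESTALE_DELAY are hypothetical, and it assumes a stale handle is reported with text containing "Stale".

```bash
#!/usr/bin/env bash
# Re-run a command while its output reports a stale file handle (ESTALE).
# retry_on_estale, ESTALE_RETRIES and ESTALE_DELAY are illustrative names.
# Note: the command's stdout and stderr are merged into one stream.
retry_on_estale() {
    local attempts=${ESTALE_RETRIES:-3} delay=${ESTALE_DELAY:-1}
    local n out rc
    for (( n = 1; n <= attempts; n++ )); do
        rc=0
        out=$("$@" 2>&1) || rc=$?
        if [[ $out != *Stale* ]]; then
            # No stale handle reported: pass output and exit status through.
            if [[ -n $out ]]; then printf '%s\n' "$out"; fi
            return "$rc"
        fi
        if (( n < attempts )); then
            echo "attempt $n: stale file handle, retrying" >&2
            sleep "$delay"
        fi
    done
    # Still stale after all attempts: report the last result.
    printf '%s\n' "$out"
    return "$rc"
}
```

For example: retry_on_estale ls -l /mnt/nfs/data1. A fixed, small retry count keeps a persistently stale path from looping forever.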

Comment 7 Wendy Cheng 2006-10-17 20:13:35 UTC
Tentatively closing this bugzilla. It will be re-opened if required.