Bug 190542

Summary: NFS file handles point at wrong fs after server reboot
Product: Red Hat Enterprise Linux 4
Reporter: Bevis King <brwk>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: urgent
Priority: medium
Version: 4.0
CC: jbaron
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-31 01:09:38 UTC

Description Bevis King 2006-05-03 13:43:45 UTC
Description of problem:
A RHEL4 (or FC5) file server exports a significant number of filesystems (at
least 12).  The NFS server reboots while an NFS client has several of those
filesystems mounted.  When the server comes back up, the client still believes
it has all the filesystems mounted, but the contents of each filesystem have
been swapped, apparently at random, with the contents of a different one of the
exported filesystems.

Consider this:
server1 exports fs01, fs02, fs03, fs04, fs05, fs06, fs07, fs08, fs09, fs10,
fs11, fs12.
client1 mounts each one on say /import/fs01, ...

Reboot the server.  When it comes back, client1 sees the files that live in fs01
appearing in fs07, the files that live in fs07 appearing in fs04, etc.
The file systems are all still present, but the mapping is now completely
arbitrary and each existing mount displays the content of a different file
system.  It is as if the client caches the file handles, and the server's
mapping of those file handles to file systems changes arbitrarily when it
reboots.  This bug has been seen with RHEL4, RHEL3, RH73, FC4 and FC5 clients,
and with both FC5 and RHEL4 NFS servers.

Version-Release number of selected component (if applicable):
RHEL4 - Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
Kernel - kernel-smp-2.6.9-22.0.2.EL

How reproducible:
Every time so far in our tests.

Steps to Reproduce:
1. NFS export at least 12 filesystems from an RHEL4 or FC5 server.
   Place content within each file system to make them identifiable.
2. Mount them via /etc/fstab or am-utils/autofs on a client, and
   make sure they are "in use" and active.
3. Reboot the fileserver.
4. Check the contents of the filesystems from the client once the
   server has come back.
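
One way to make steps 1 and 4 mechanical is sketched below.  This is only an
illustration and not part of the original test setup: the marker file name
(.fs-id) and both helper functions are hypothetical.  The idea is to write each
file system's own name into a marker file on the server, then have the client
compare what each mount point reports against what it should contain.

```python
import os

MARKER = ".fs-id"  # hypothetical marker file name, not from the report


def write_marker(fs_root, name):
    """Server side: record the file system's name in a marker file at its root."""
    with open(os.path.join(fs_root, MARKER), "w") as f:
        f.write(name + "\n")


def find_swapped(mounts):
    """Client side: given {mount_point: expected_name}, return the mount points
    whose marker reports a different name (None if the marker is unreadable)."""
    swapped = {}
    for mount_point, expected in mounts.items():
        try:
            with open(os.path.join(mount_point, MARKER)) as f:
                actual = f.read().strip()
        except OSError:
            actual = None
        if actual != expected:
            swapped[mount_point] = actual
    return swapped
```

On the client, something like find_swapped({"/import/fs01": "fs01",
"/import/fs07": "fs07"}) would return an empty dict before the server reboot
and, if the behaviour described here occurs, a non-empty one afterwards.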
  
Actual results:
The content visible in each mount no longer reflects the file system purportedly
mounted there.  It does, however, reflect one of the other exported filesystems,
and its structure and permissions are correct for the file system actually
being served.

Expected results:
The file systems return to their previous exported locations so the client sees
uninterrupted service.

Additional info:
Happens with both FC5 and RHEL4 NFS servers.  We continue to investigate this.

Comment 1 Bevis King 2006-05-05 10:46:39 UTC
It looks as if this is caused by the device manager renumbering the partitions
the server exports, and that explicitly setting the fsid in /etc/exports may
resolve the operational issues we're seeing.  We'll test this further and
report back.
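
A rough way to check the renumbering theory on the server is sketched below.
It is an illustration only and the helper names are hypothetical: record the
(major, minor) device number behind each exported path before the reboot,
record it again afterwards, and compare.  It relies only on os.stat(),
os.major() and os.minor() from the Python standard library.

```python
import os


def device_numbers(paths):
    """Return {path: (major, minor)} for the device backing each path."""
    result = {}
    for path in paths:
        dev = os.stat(path).st_dev
        result[path] = (os.major(dev), os.minor(dev))
    return result


def changed_devices(before, after):
    """Return the sorted paths whose device number differs between snapshots."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

If changed_devices() reports any exported paths after a reboot, that would be
consistent with the file handles held by clients now resolving to a different
file system.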

Comment 2 Bevis King 2006-10-30 15:57:23 UTC
To avoid this issue, you need to export each file system with a static fsid=
option in the /etc/exports file.  With that done, this problem doesn't occur.
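
For illustration, the sketch below renders /etc/exports lines with an explicit
fsid= for each file system.  The export paths, the client specification
(*.example.com) and the other options (rw,sync) are assumptions and not taken
from this report; only the fsid= option itself is.  The fsid values are passed
in explicitly rather than generated, so they stay the same no matter how the
list of exports is later reordered or extended.  Numbering starts at 1 here
because fsid=0 has a special meaning for NFSv4.

```python
def exports_lines(fsids, clients="*.example.com", options="rw,sync"):
    """Render one /etc/exports line per {path: fsid} entry.

    The clients and options defaults are placeholders for this sketch.
    """
    if len(set(fsids.values())) != len(fsids):
        raise ValueError("each export needs its own unique fsid")
    return [
        f"{path} {clients}({options},fsid={fsid})"
        for path, fsid in sorted(fsids.items())
    ]


if __name__ == "__main__":
    # Hypothetical export paths with hand-assigned, fixed fsid values.
    for line in exports_lines({"/export/fs01": 1, "/export/fs02": 2}):
        print(line)
    # prints:
    # /export/fs01 *.example.com(rw,sync,fsid=1)
    # /export/fs02 *.example.com(rw,sync,fsid=2)
```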

I was going to mark this one as WORKSFORME because the workaround listed above
does solve the operational issue.  But Bugzilla won't let me....

Comment 3 Steve Dickson 2006-10-31 01:09:38 UTC
Sorry about that... I'll close it for you... Thank you for using RHEL!