Created attachment 438741 [details]
Ethereal capture of the failed mount attempt. The command was: mount -t nfs -o nfsvers=3 sfss1:/sfs1 /mnt
Description of problem:
I've been running into a stale NFS file handle issue during client mounts since the beginning of RHEL6 testing, but now the failure seems to be happening more often and is affecting certain testing.
When I do my NFS server testing, between tests the nfs service is stopped, the filesystems are unmounted, recreated (mkfs), and remounted, the networks supporting NFS are restarted, and finally nfs is started. This is the way I have been doing it for years.
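The cycle above, as a rough shell sketch. The device name, filesystem type, and mount point here are illustrative assumptions, not the actual BIGI configuration; DRYRUN=1 (the default) only prints the commands:

```shell
#!/bin/sh
# Sketch of the per-test server cycle described above.
# /dev/sdb1, ext4, and /sfs1 are assumptions for illustration only.
DEV=/dev/sdb1
MNT=/sfs1
DRYRUN=${DRYRUN:-1}    # set DRYRUN=0 to actually run (needs root)

run() {
    if [ "$DRYRUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

run service nfs stop           # stop the NFS service
run umount "$MNT"              # unmount the exported filesystem
run mkfs -t ext4 "$DEV"        # recreate it (ext3/ext4/xfs all show the bug)
run mount "$DEV" "$MNT"        # remount it
run service network restart    # restart the networks supporting NFS
run service nfs start          # finally, start nfs again
```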
The problem is that for ext3, ext4, and xfs I get a stale NFS file handle on the first mount attempt from a client; ext2 and gfs2 do not fail. If my test harness doesn't try this initially, the benchmark internals will fail.
My RHEL6 server has been updated to SNAP 10 and is running the -59 kernel.
This issue never happened with a RHEL5 server. The clients are all running an old version of RHEL4; in fact they were at 2.6.9-27.ELsmp. I brought them up to 2.6.9-89.ELsmp, yet the problem persists.
With steved's help, I captured an Ethereal log of the mount attempt. It is attached.
The reason I'm so concerned now is that I have been unable to test one of those specific filesystems successfully because of "Stale NFS" errors shortly after the benchmark tries to start.
Version-Release number of selected component (if applicable):
RHEL6 - SNAP 10 -59 kernel
Steps to Reproduce:
1. Running SPECsfs on the BIGI testbed
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.
** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.
If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.
While not formally bz'ed, this issue may be related to the problems we're running into when executing the SPECsfs benchmark. Xfs-presented filesystems on the NFS server return stale NFS handles to the clients within minutes (sometimes seconds) after starting. This is the only presented filesystem type that does this ...
The xfs issue you ran into on SPECsfs, and the fix for it, were entirely xfs-specific; if you're seeing this problem across multiple filesystems I doubt that it's related to Dave's patch for bug #624860.
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support representative.
Bruce, can you see if this still is an issue? If so, can we fix it for 6.1 or is this a 6.2 issue?
The attached trace shows:
client sends MNT for /sfs1
server replies with filehandle
client sends FSINFO with that filehandle
server replies with NFS3ERR_STALE
So, clearly a server bug.
I tried running
mkfs.xfs -f /dev/vdb
mount /dev/vdb /exports
service nfs start
exportfs -orw '*:/exports'
mount -o nfsvers=3 localhost:/exports /mnt/
a few times in a loop on a RHEL6 guest and didn't see any failures.
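For reference, the attempt above written as a loop. This is a sketch using the same device and paths as the quoted commands; DRYRUN=1 (the default) only prints each command:

```shell
#!/bin/sh
# Repro loop: rebuild, export, and mount the filesystem repeatedly.
# Device and paths are those from the commands quoted above.
DRYRUN=${DRYRUN:-1}    # DRYRUN=0 to actually run (needs root and an NFS setup)
run() { if [ "$DRYRUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

for i in 1 2 3 4 5; do
    run mkfs.xfs -f /dev/vdb
    run mount /dev/vdb /exports
    run service nfs start
    run exportfs -orw '*:/exports'
    run mount -o nfsvers=3 localhost:/exports /mnt/
    # tear down so the next pass starts clean
    run umount /mnt/
    run exportfs -u '*:/exports'
    run service nfs stop
    run umount /exports
done
```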
So I'm stuck for now.
Barry, are you still seeing this?
I'm still seeing this ... at least with the -71 kernel.
I noticed that I had not re-run exportfs after building the filesystems, as you did after bringing them online over NFS. Doing so had no effect, nor did a showmount -e from the client just before the mount attempt.
Is this still an issue with the latest 6.1 code?
While I should consider upgrading the client side kernel, I've locked it down for years for way back testing ...
As of now, a 2.6.9-89.ELsmp client trying to mount from a 2.6.32-105.el6.x86_64 server still fails.
Could I get a look at the exact scripts that are doing the mkfs, nfsd start, etc.? I just want to make sure it's not doing anything unusual.
This request was erroneously denied for the current release of
Red Hat Enterprise Linux. The error has been fixed and this
request has been re-proposed for the current release.
Looking at /proc/net/rpc/nfsd.fh/content after a failed mount, it looks like mountd is failing to resolve the uuid; I wonder if this is the same problem as http://www.spinics.net/lists/linux-nfs/msg00876.html (or something similar).
I found a similar problem on an RHEL6 test machine: if I shut down nfs, unmount /dev/vdb (which holds my exported filesystem), re-mkfs /dev/vdb, remount it, restart nfs, and try to mount it, the mount succeeds--but, interestingly, comparing 'blkid /dev/vdb' with the export cache (/proc/net/rpc/nfsd.export/content) shows that mountd is still using the uuid of the *old* filesystem.
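A diagnostic sketch of the comparison described above, assuming /dev/vdb is the exported device as in this comment; DRYRUN=1 (the default) only prints the commands:

```shell
#!/bin/sh
# Compare the filesystem's current UUID against the uuid mountd cached.
# After a re-mkfs these should match; in the failing case they do not.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

run blkid /dev/vdb                          # UUID of the new filesystem
run cat /proc/net/rpc/nfsd.export/content   # UUID mountd is still using
```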
However, if instead of doing "service nfs stop" and "service nfs start" to stop and start nfs, I *just* stop and start rpc.mountd by hand, then mountd gets updated information.
Stripping out code from /etc/init.d/nfs, I eventually replaced the "start" and "stop" cases by exactly the commands I was using to start and stop rpc.mountd by hand, and still saw the difference in behavior.
My only remaining idea was that it could be some selinux rule; and indeed: looking at straces of rpc.mountd in both cases, I see that in one an open of /dev/vdb fails, and in the other it succeeds; and after "setenforce 0", everything works. So in my case selinux appears to be preventing libblkid from getting a current uuid. Perhaps it is in your case too.
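One way to sketch that strace comparison; the flags and paths here are plausible defaults, not taken from the original session, and DRYRUN=1 (the default) only prints the commands:

```shell
#!/bin/sh
# Trace rpc.mountd's open() calls while a client mount is attempted,
# then look for a failed open of the exported block device; a denial
# there points at selinux blocking libblkid.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

run strace -f -e trace=open -o /tmp/mountd.trace rpc.mountd -F
# ...trigger the client mount attempt, then:
run grep /dev/vdb /tmp/mountd.trace    # a failed open here implicates selinux
run setenforce 0                       # go permissive and re-test the mount
```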
Could you try turning off selinux and seeing if the problem is still reliably reproducible?
selinux is disabled in the clients via /etc/selinux/config
the server has selinux=0 on the boot line
Looks like this is too late for 6.1...
Sorry, I was never able to duplicate this or work out what's going on here; are you still seeing the problem?
If not, let's close this BZ until we see it again....