Bug 604044
| Summary: | NFS4 breaks when server returns NFS4ERR_FILE_OPEN | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Kai Mosebach <redhat-bugzilla> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED ERRATA | QA Contact: | yanfu,wang <yanwang> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.5 | CC: | cward, eguan, jlayton, rwheeler, sprabhu, steved, tao, yanwang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-01-13 21:37:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Kai Mosebach
2010-06-15 09:22:48 UTC
Since we heavily depend on the Sun NFS4 service, I looked for patches in later kernels myself and found the patch for this in kernel 2.6.19. See here: https://kerneltrap.org/mailarchive/git-commits-head/2009/12/14/19033

Since that patch does not apply to the current 2.6.18-194.3.1.el5 kernel as-is, I added the needed lines and rebuilt the kernel, and the error is gone now. Please add this to future RHEL5 kernels!

Created attachment 428928 [details]
backport of the NFS4ERR_FILE_OPEN handling in Linux/NFS patch of Kernel 2.6.19

Thanks for the patch. I'm not sure that the part in nfs4xdr.c is really necessary to fix this, but it seems like a better mapping than -EIO. I'll plan to add this patch to my test kernels in the near future.

Actually, now that I look more closely...this patch is broken:

     case -NFS4ERR_STALE_CLIENTID:
     case -NFS4ERR_STALE_STATEID:
    + case -NFS4ERR_FILE_OPEN:
    +     if (exception->timeout > HZ) {
    +         /* We have retried a decent amount, time to fail */
    +         ret = -EBUSY;
    +         break;
    +     }

Because you've put this in after NFS4ERR_STALE_CLIENTID and NFS4ERR_STALE_STATEID, you're making the kernel handle those errors the same way. I don't think that's what we want here.

Created attachment 436296 [details]
patch -- backport of NFS4ERR_FILE_OPEN handling patch (try #2)
The original patch also makes it so that when you get this error, the kernel goes into state recovery. That's also not ideal. I think this new patch is closer to what's needed.
Kai, could you test this and let me know if it also fixes the problem?
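
To illustrate the objection to the first backport, here is a minimal user-space sketch in C. It is not the kernel code and not either attached patch: the `action` enum, the function names `handle_broken` and `handle_fixed`, and the `HZ` value are all hypothetical, and the "retry after a delay" outcome for NFS4ERR_FILE_OPEN is an assumption about the intended behaviour. It only shows how stacking the new case label under the two STALE labels makes all three errors share the timeout check, and how giving NFS4ERR_FILE_OPEN its own case avoids that.

```c
/* NFSv4 status codes used for illustration (assumed numbering). */
enum {
    NFS4ERR_STALE_CLIENTID = 10022,
    NFS4ERR_STALE_STATEID  = 10023,
    NFS4ERR_FILE_OPEN      = 10046
};

/* Hypothetical tick rate; in the kernel HZ is a build-time constant. */
#define HZ 1000

/* Hypothetical outcomes, standing in for what the real handler would do. */
enum action {
    ACT_RECOVER_STATE,  /* schedule state recovery */
    ACT_RETRY_DELAY,    /* back off and retry the operation */
    ACT_FAIL_EBUSY,     /* give up and return -EBUSY */
    ACT_PASS_THROUGH    /* not handled here */
};

/* Shape of the first backport: FILE_OPEN is stacked under the STALE
 * labels, so all three errors run the same body. */
enum action handle_broken(int err, long timeout)
{
    switch (err) {
    case -NFS4ERR_STALE_CLIENTID:
    case -NFS4ERR_STALE_STATEID:
    case -NFS4ERR_FILE_OPEN:
        if (timeout > HZ)
            return ACT_FAIL_EBUSY;   /* now also hit by STALE_* errors */
        return ACT_RECOVER_STATE;    /* now also hit by FILE_OPEN */
    default:
        return ACT_PASS_THROUGH;
    }
}

/* Separated handling: STALE_* errors keep their recovery path, and
 * FILE_OPEN gets its own timeout check and retry path. */
enum action handle_fixed(int err, long timeout)
{
    switch (err) {
    case -NFS4ERR_STALE_CLIENTID:
    case -NFS4ERR_STALE_STATEID:
        return ACT_RECOVER_STATE;
    case -NFS4ERR_FILE_OPEN:
        if (timeout > HZ)
            return ACT_FAIL_EBUSY;   /* retried long enough, fail */
        return ACT_RETRY_DELAY;
    default:
        return ACT_PASS_THROUGH;
    }
}
```

With a timeout above `HZ`, `handle_broken(-NFS4ERR_STALE_STATEID, ...)` gives up with the EBUSY outcome, while `handle_fixed` still sends that error to state recovery and applies the EBUSY cut-off only to NFS4ERR_FILE_OPEN.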
I've also added this patch to the test kernels on my people.redhat.com page: http://people.redhat.com/jlayton/

Kai, if you're not able to test this then it may not make 5.6. I'll need to set up a test environment for it and may not have time to do that before the patch submission deadline.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-214.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Tried with OpenSolaris 5.10, cannot reproduce this issue.

# OpenSolaris NFS Server
bash-3.00# uname -a
SunOS unknown 5.10 Generic_141445-09 i86pc i386 i86pc
bash-3.00# zfs get all | grep nbmand
tank nbmand on local
tank/fs nbmand on inherited from tank
bash-3.00# share
- /export/home rw ""
bash-3.00#

# NFS client
[root@nec-em9 fs]# uname -a
Linux nec-em9.rhts.eng.bos.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@nec-em9 fs]# mount | grep nfs4
10.66.65.194:/ on /media type nfs4 (rw,addr=10.66.65.194)
[root@nec-em9 fs]# pwd
/media/export/home/fs

Copied a kernel source tree to the NFS mount, grepped for a string in the tree, then removed the whole tree. Also tried an svn checkout of a large project.

Ran fsstress on the NFS mount, no issue found on the -233 kernel. Confirmed that patch linux-2.6-fs-nfs-fix-nfs4err_file_open-handling-in-linux-nfs.patch is applied correctly in kernel 2.6.18-233.el5.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html