Bug 431253
| Summary: | NFS CREATE fails on hp-ux clients with kernel 2.6.18-53.1.6.el5 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Dan Goetzman <dan_goetzman> |
| Component: | kernel | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED DUPLICATE | QA Contact: | GFS Bugs <gfs-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.0 | CC: | arwin.tugade, jburke, jlayton, rpeterso, rwheeler, steved, tao |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-10-21 13:36:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Dan Goetzman
2008-02-01 19:55:35 UTC
Created attachment 293760
Patch to nfsd_setattr to ignore -EACCES error caused by the GFS filesystem
I don't have access to an hp-ux box, so I have no good way to debug this. However, I do know that the extended attribute code in GFS was recently changed to remove the permission() calls for a different reason. This is documented as bugzilla bug #323111, cross-written from GFS2 to GFS1. Since this is an internal bugzilla record, I don't know if you will have permission to view it, and I don't have authority to change that. However, the fix went into RHEL in kmod-gfs-0.1.21-1, and CentOS will probably be the same. Can you upgrade to that version or higher and see if this is still a problem? Thanks.

How would I get this kmod-gfs-0.1.21-1? I just did a yum update and I have the latest, kmod-gfs-0.1.16-5. I will test against this gfs module if I can get it...

Can you post the location of your yum repository? In other words, the contents of /etc/yum.repos.d/RHEL5.repo or the CentOS equivalent. If there isn't a newer version, you can either wait for a new update (I don't know how CentOS works or how often they do updates) or, since you've done some experimentation, compile the latest gfs from source. First, fetch the source code from CVS with something like this:

    cvs -d :pserver:sources.redhat.com:/cvs/cluster checkout -r RHEL5 cluster

Then compile with something like:

    cd cluster/gfs-kernel/src/gfs
    ./configure --kernel_src=/usr/src/kernels/whatever-your-kernel-is
    make

I don't recommend doing a make install, but you can manually load the gfs module by running insmod gfs.ko from that directory. Then mount the file system and try to recreate the problem. Of course, you'll first have to boot the kernel that doesn't have the circumvention patch you attached, to see whether it's broken or fixed.

I should have mentioned this in my update: I have built a Red Hat AS 5 system just to test and work on this bug.
So I am using RHAS5. The release file lists:

    Red Hat Enterprise Linux Server release 5.1 (Tikanga)

I installed 5.0 and then ran "yum update" to get current, which brought me up to kernel-2.6.18-53.1.6.el5. I have not changed any of the yum config files. The only file I see under /etc/yum.repos.d is rhel-debuginfo.repo, so I assume that I am using the default repos?

Tested a current build from the CVS repo; the problem still exists. I used the cvs command given previously to pull the current cluster source, compiled it, and reloaded the gfs module. dmesg showed:

    GFS <CVS> (built Feb 6 2008 13:19:52) installed

So it would appear the new module actually loaded OK. I mounted the GFS test filesystem, started NFSD, and tested using the HP-UX NFS client. Same results: the latest updates to the GFS module do not appear to address this problem.

Thanks for checking that out. I'm trying to find or borrow an hp-ux client machine within Red Hat so I can debug this. I've tried several channels already and come up empty-handed, but I'm not out of options yet. In the meantime, it would help me if you could collect an Ethereal / Wireshark trace of this problem so I can analyze the requests that hp-ux is actually sending to the NFS server. The smaller the trace, the easier it will be for me to read.

I already have one; I should have attached it from the beginning. Attaching the tcpdump file now... Also, I added some simple printk debugging to the NFSD code in the kernel and found it following this path (much detail not shown):

    nfsd_proc_create
    nfsd_create_v3
    vfs_create           (Note: the create returns OK)
    nfsd_setattr
    notify_change
    generic_permission   (Note: returns a -EACCES error!)

In case that might help...

Created attachment 294161
HP-UX NFS Client tcpdump trace
Here's the tcpdump I had from my testing against CentOS 5. It should be the same as on my RHAS5 system, since that appears to have the same defect.
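The printk trace above ends in generic_permission() returning -EACCES. Below is a simplified Python model of the mode-bit test that function performs in the 2.6-era fs/namei.c. It is a rough illustration and rests on a few stated assumptions:

- POSIX ACLs are ignored.
- Only regular files are handled.
- A caller with fsuid 0 stands in for a process holding CAP_DAC_OVERRIDE, such as root on an export with no_root_squash.

The function name and parameters are hypothetical. The model shows why the same request can pass for root and be refused for an ordinary user. It does not identify which attribute the HP-UX client's request actually carried.

```python
import errno
import stat

# Mask bits as defined in the 2.6-era include/linux/fs.h.
MAY_EXEC = 1
MAY_WRITE = 2
MAY_READ = 4


def generic_permission_model(i_mode, i_uid, i_gid, fsuid, groups, mask):
    """Hypothetical, simplified model of generic_permission() for a regular file.

    Returns 0 if access is allowed, -errno.EACCES otherwise.
    Assumptions: no POSIX ACLs; fsuid 0 models a caller with CAP_DAC_OVERRIDE.
    """
    mode = i_mode
    if fsuid == i_uid:
        mode >>= 6          # caller owns the file: test the owner bits
    elif i_gid in groups:
        mode >>= 3          # caller is in the file's group: test the group bits
    # otherwise the "other" bits are already in the low three bits
    if (mode & mask & (MAY_READ | MAY_WRITE | MAY_EXEC)) == mask:
        return 0
    # CAP_DAC_OVERRIDE ignores the mode bits, except that exec still
    # requires at least one execute bit to be set on a regular file.
    if fsuid == 0 and (not (mask & MAY_EXEC) or (i_mode & 0o111)):
        return 0
    return -errno.EACCES


REG_RO = stat.S_IFREG | 0o444   # a read-only regular file

# Owner uid 500 asking to write its own read-only file is refused...
print(generic_permission_model(REG_RO, 500, 500, 500, [500], MAY_WRITE))  # -13 on Linux (-EACCES)
# ...while root (not squashed) passes via the capability override.
print(generic_permission_model(REG_RO, 500, 500, 0, [0], MAY_WRITE))      # 0
```

The model applies the capability check after the mode-bit test, as the kernel does. That ordering is why root only notices a permission problem in the exec case.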
I finally got access to a real hp-ux machine and tried to recreate the problem. It did not fail. The client was 11.23 on ia64:

    # uname -a
    HP-UX xxxxx B.11.23 U ia64 1756071376 unlimited-user license

For my server I used a RHEL5.2 prototype / pre-release system, i686. I copied three files of various sizes over NFS: 1K, 150K and 1.5MB. I verified on the server that the files were copied successfully with no error messages and that their contents were correct. I was running as root on both client and server, with no firewall between them and no root squash. My /etc/exports looked like this:

    [root@kool gfs]# cat /etc/exports
    /mnt/gfs *(rw,insecure,no_root_squash)

I haven't tried it on a true 5.1 server machine. Dan, are you sure this wasn't a problem with selinux on the server or with firewalls (iptables, etc.) interfering with the copy?

I am pretty sure... I normally configure my Linux NFS servers with iptables off and selinux disabled. And I was able to trace the NFS call into the nfsd layer in the kernel and down to the generic_permission() kernel function, so I know the client's NFS CREATE RPC is not being blocked. If I recall correctly, a simple cp of a file that does not yet exist on the NFS server, using a gfs filesystem, is all that is required to get the failure. I think a simple "mkdir" also fails.

Looking at your test setup more carefully, I see I missed a detail on the first read... It will NOT fail if you are running as root on the client. I knew that, but I did not read your posting carefully enough. Try it again using a regular account on the HP-UX client.

By using a non-root user, I've recreated the problem on the hp-ux machine. I've also compared what's happening against a Fedora 8 nfs client, with suggestions from Steve Dickson. The difference between the two calls is this: the Fedora 8 nfs client specifies a create mode of 2 (exclusive), whereas the hp-ux client specifies a create mode of 0 (unchecked).
Now I'll trace back through how create mode 0 is handled by nfs and gfs as opposed to ext3.

I tested the same set of commands using (1) gfs, (2) gfs2, (3) ext3, and (4) xfs. All of them behaved the same way with the same client/server pair. If the client was hp-ux and the user was not root, they gave permission denied. If the user was root, they worked properly. If the client was f8, they worked properly. So this looks to me like an nfs problem, not a gfs problem.

I ran my tests again on my RHEL 5 server using both gfs and ext3. On my server, the hp-ux client fails as described when the backing store is gfs, and there is no failure when it is ext3. Maybe your test system is using an ext3 update that also started calling generic_permission? Just a guess on my part... Anyhow, it may really be an nfsd problem (that's where my simple patch was made) that surfaced with GFS first, and only later with other filesystems as they were updated?

I tried out an nfsd patch that had been posted for bug #432690, but it didn't solve the problem. I also verified that it still fails on ext3 with my NFS server. If what Dan says is true, that ext3 works for him, then we may be dealing with two problems: one that's keeping nfsd from working on my server, and one that's making gfs fail on Dan's.

Something you may want to consider... In my limited research on this problem, I discovered that generic_permission() was introduced in kernel 2.6.10 and that all filesystems were encouraged to start using it instead of older methods (in the filesystem code, I believe). I read that XFS had already been changed, but when I tested with RHEL5 it appeared that change had not yet been made for XFS or ext3. Only GFS was calling generic_permission(), and when it returns -EACCES, that causes the failure to be returned from the NFSD code. So, if you are running more recent kernels than I am, you may want to verify that generic_permission() has not been introduced into ext3, XFS, etc...
Just a thought: if it's true, then RHEL5 will be broken for all HP-UX NFS clients on more filesystems than just GFS, as in my current case at 2.6.18-53.1.6.el5.

Any update on this one? I know Bob looked at it from a GFS viewpoint, and when the problem was found in other filesystems too, it was tagged as an NFS kernel problem. Since then there has been no activity, and I see Bob is still the "Assigned To" person. I was expecting it to be reassigned to another kernel person... Any updates?

I think we're dealing with two problems here, but my progress is being held up by the problem that keeps nfsd from working in this case on ext3. Therefore, I'm reassigning to Steve D.

Still trying to locate an HP-UX client...

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Changing component from gfs2-kmod to kernel so that it stops showing up on Kevin Anderson's list.

Updating PM score.

Has there been any update to this issue? I'm seeing this exact issue:

    Client: HP-UX wren B.11.11 U 9000/889
    NFS Server: Red Hat Enterprise Linux Server release 5.3 (Tikanga) - 2.6.18-128.1.6.el5 x86_64

GFS is the underlying filesystem for the NFS server.

Arwin

*** This bug has been marked as a duplicate of bug 605720 ***