Description of problem:
HP-UX NFS clients fail to create a new file on a CentOS 5 NFS server updated to kernel 2.6.18-53.1.6.el5 when GFS is used as the backing filesystem.

Version-Release number of selected component (if applicable):
kmod-gfs-0.1.16-5.2.6.18_8.el5
kernel-2.6.18-53.1.6.el5

How reproducible:
Using any HP-UX client. I tested using both 11.00 and 11.23.

Steps to Reproduce:
1. Create a GFS filesystem on the server
2. NFS export the filesystem
3. Mount on any hp-ux client
4. hp-ux$ cp anyfile /to/nfs/fs/on/rhas5

Actual results:
hp-ux$ cp: cannot create anyfile: Permission denied

Expected results:
No error

Additional info:
HP-UX clients first create the new file using the NFS procedure CREATE (using UNCHECKED and mode=0) and the server returns NFS3ERR_ACCES. I traced the -EACCES error to the generic_permission kernel function. Apparently the ext3 and xfs filesystems do not use generic_permission, but gfs does, and it returns -EACCES to nfsd.

I have created a simple patch that checks for the -EACCES error and allows access if the FSUID equals the inode UID. This resolves the problem but is probably not the best way to fix the bug. I will attach the patch for reference. Hopefully those more knowledgeable than I am can determine the correct fix for the root cause of this bug.
Created attachment 293760 [details] Patch to nfsd_setattr to ignore -EACCES error caused by the GFS filesystem
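For readers without access to the attachment, the workaround described above can be modelled roughly as follows. This is a user-space Python sketch of the stated logic only (allow the operation when the error is -EACCES and the FSUID matches the inode UID); it is not the attached kernel patch, and the function name `allow_owner_on_eacces` is hypothetical.

```python
import errno


def allow_owner_on_eacces(err, fsuid, inode_uid):
    """Hypothetical model of the workaround: swallow -EACCES for the file owner.

    err       -- result of the earlier permission check (0 or a negative errno)
    fsuid     -- filesystem UID of the caller making the request
    inode_uid -- UID that owns the inode being modified
    """
    if err == -errno.EACCES and fsuid == inode_uid:
        # The caller owns the file, so treat the denial as success.
        return 0
    # Any other result (success or a different error) is passed through.
    return err
```

As the reporter notes, this hides the symptom rather than addressing the root cause, so it is best read as a diagnostic aid.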
I don't have access to an hp-ux box, so I have no good way to debug this. However, I do know that the extended attribute code in GFS was recently changed to remove the permission() calls for a different reason. This is documented as bugzilla bug #323111, crosswritten from GFS2 to GFS1. Since this is an internal bugzilla record, I don't know if you will have permission to view it, and I don't have authority to change that. However, the fix went into RHEL in kmod-gfs-0.1.21-1, and Centos will probably be the same. Can you upgrade to that version or higher and see if this is still a problem? Thanks.
How would I get this kmod-gfs-0.1.21-1? I just did a yum update and I have the latest, kmod-gfs-0.1.16-5. I will test against the newer gfs module if I can get it...
Can you post the location of your yum repository? In other words, the contents of /etc/yum.repos.d/RHEL5.repo or the CentOS equivalent. If there isn't a newer version, you can either wait for a new update (I don't know how CentOS works or how often they do updates) or, since you've done some experimentation, compile the latest gfs from source code.

First, fetch the source code from CVS. Something like this:

cvs -d :pserver:sources.redhat.com:/cvs/cluster checkout -r RHEL5 cluster

Then compile by doing something like:

cd cluster/gfs-kernel/src/gfs
./configure --kernel_src=/usr/src/kernels/whatever-your-kernel-is
make

I don't recommend doing a make install, but you can manually load the gfs module by running insmod gfs.ko from that directory. Then mount the file system and try to recreate the problem. Of course, you'll first have to boot into the kernel that doesn't have the circumvention patch you attached, to see whether it's broken or fixed.
I should have mentioned this in my update: I have built a Red Hat AS 5 system just to test and work on this bug. So I am using RHAS5. The release file lists:

Red Hat Enterprise Linux Server release 5.1 (Tikanga)

I installed 5.0 and then did a "yum update" to get current. Running the update brought me up to kernel-2.6.18-53.1.6.el5. I have not changed any of the yum config files. The only thing I see on my system under /etc/yum.repos.d is the single file rhel-debuginfo.repo, so I assume that I am using the default repos?
Tested a current build from the CVS repo; the problem still exists. I used the cvs command given previously to pull the current cluster source, compiled it and reloaded the gfs module. dmesg showed:

GFS <CVS> (built Feb 6 2008 13:19:52) installed

So it would appear the new module actually loaded OK. I mounted the GFS test filesystem, started NFSD and tested using the HP-UX NFS client. Same results: the latest updates to the GFS module do not appear to address this problem.
Thanks for checking that out. I'm trying to find/borrow an hp-ux client machine within Red Hat so I can debug this. I've tried several channels already and come up empty-handed, but I'm not out of options yet. In the meantime, it might help me if you collect an Ethereal / Wireshark trace of this problem so I can analyze the requests that hp-ux is actually sending to the NFS server. The smaller the trace, the easier it will be for me to read.
I already have one; I should have attached it from the beginning. Attaching the tcpdump file now...

Also, I added some simple printk debugging to the NFSD code in the kernel and found it to be following this path (much detail not shown):

nfsd_proc_create
nfsd_create_v3
vfs_create - Note: the create returns OK
nfsd_setattr
notify_change
generic_permission - Note: returns a -EACCES error!

In case that might help...
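One possible reading of the trace above is that the file is first created with mode 0 and the follow-up setattr then fails an ordinary owner/group/other check, because a mode of 0 grants the owner no write bit. The sketch below is a simplified user-space model of such a check, not the kernel's generic_permission; the function name `simple_permission`, its parameters, and the `dac_override` flag (standing in for a root-style capability override) are assumptions made for illustration.

```python
import errno

MAY_WRITE = 0o2  # write bit within a single rwx triplet


def simple_permission(mode, inode_uid, inode_gid, fsuid, fsgids, mask,
                      dac_override=False):
    """Simplified owner/group/other check; ignores ACLs and other details."""
    if fsuid == inode_uid:
        bits = (mode >> 6) & 0o7      # owner triplet
    elif inode_gid in fsgids:
        bits = (mode >> 3) & 0o7      # group triplet
    else:
        bits = mode & 0o7             # other triplet
    if (bits & mask) == mask:
        return 0
    if dac_override:
        # Assumed stand-in for a privileged caller bypassing the mode bits.
        return 0
    return -errno.EACCES
```

Under these assumptions, `simple_permission(0o000, 1000, 100, 1000, {100}, MAY_WRITE)` is denied for the owner, while the same call with `dac_override=True` succeeds, which would be consistent with the failure depending on whether the client user is root.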
Created attachment 294161 [details]
HP-UX NFS Client tcpdump trace

Here's the tcpdump I had from my testing against CentOS5. It should be the same as on my RHAS5 system, as it appears to have the same defect.
I finally got access to a real hp-ux machine and tried to recreate the problem. It did not fail. The client was 11.23 on ia64:

# uname -a
HP-UX xxxxx B.11.23 U ia64 1756071376 unlimited-user license

For my server I used a RHEL5.2 prototype / pre-release system, i686. I copied three different files over NFS of various sizes: 1K, 150K and 1.5MB. I verified on the server that the files were copied successfully with no error messages and the contents were correct. I was running as root on both client and server, with no firewall between them and no root squash. My /etc/exports looked like this:

[root@kool gfs]# cat /etc/exports
/mnt/gfs *(rw,insecure,no_root_squash)

I haven't tried it on a true 5.1 server machine. Dan, are you sure this wasn't a problem with selinux on the server or various firewalls (iptables, etc.) interfering with the copy?
I am pretty much sure... I normally configure my Linux NFS servers with iptables off and selinux disabled. And, I was able to trace the NFS call into the nfsd layer in the kernel and down to the generic_permission() kernel function. So I know the NFS CREATE client RPC is not being blocked. If I recall, a simple cp of a file that does not yet exist on the NFS server using gfs filesystems is all that is required to get the failure. I think a simple "mkdir" also fails.
I carefully looked at your test setup and I missed a detail on the first read... It will NOT fail if you are running as root on the client. I knew that but I did not read your posting carefully enough. Try it again using a regular account on the HP-UX client.
By using a non-root user, I've recreated the problem using the hp-ux machine. I've also compared what's happening against a Fedora 8 nfs client with suggestions from Steve Dickson. The difference between the two calls is this: The Fedora 8 nfs client specifies a create mode of 2 (exclusive) whereas the hp-ux client specifies a create mode of 0 (unchecked). Now I'll backtrack how create mode 0 is handled by nfs and gfs as opposed to ext3.
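For reference, the create-mode values compared above come from the NFSv3 `createmode3` enumeration in RFC 1813. The following is a small sketch of those values; the class name `CreateMode3` and the helper `describe` are illustrative names, not part of any NFS library.

```python
from enum import IntEnum


class CreateMode3(IntEnum):
    """NFSv3 createmode3 values as defined in RFC 1813."""
    UNCHECKED = 0   # create, or reuse an existing file, without error
    GUARDED = 1     # fail if the file already exists
    EXCLUSIVE = 2   # exclusive create, identified by a client verifier


def describe(mode):
    """Return the lower-case name for a numeric create mode."""
    return CreateMode3(mode).name.lower()
```

In RFC 1813 the UNCHECKED and GUARDED cases carry the initial attributes with the request, whereas EXCLUSIVE carries a verifier instead, so the two clients may well exercise different server paths.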
I tested the same set of commands using: (1) gfs, (2) gfs2, (3) ext3, (4) xfs. All of them behaved the same way with the same client/server pair. If the client was hp-ux and the user was not root, they gave permission denied. If the user was root, they worked properly. If the client was f8, they worked properly. So this looks to me like an nfs problem, not a gfs problem.
I ran my tests again on my RHEL 5 server using both gfs and ext3. On my server, the hp-ux client results by backing store are:

gfs - fails as described
ext3 - no failure

Maybe your test system is using an ext3 update where they also started calling generic_permission? Just a guess on my part... Anyhow, it may really be an nfsd problem (that's where my simple patch was made) that surfaced with GFS first, and then later as other filesystems were updated?
I tried out an nfsd patch that had been posted for bug #432690, but it didn't solve the problem. I also verified that it still fails on ext3 with my NFS server. If what Dan says is true, that ext3 works for him, then we may be dealing with two problems: one that's keeping nfsd from working on my server, and one that's making gfs fail on Dan's.
Something you may want to consider... In my limited research on this problem I discovered that generic_permission() was introduced in kernel 2.6.10 and that it was recommended that all filesystems start using this function instead of older methods (in the filesystem code, I believe). I read that XFS was already changed, but when I tested with RHEL5 it appeared that the change had not yet been made for XFS or ext3. Only GFS was calling generic_permission, and when it returns -EACCES, that causes the NFSD code to return the failure. So, if you are running more recent kernels than I am, you may want to check whether generic_permission() has been introduced into ext3, XFS, etc. Just a thought: if it has, then RHEL5 will be broken for all HP-UX NFS clients on more filesystems than just GFS, which is my current case at 2.6.18-53.1.6.el5.
Any update on this one? I know Bob looked at it from a GFS viewpoint, and when it was found to also be in other filesystems the problem was tagged as an NFS kernel problem. Since then there has been no activity, and I see Bob is still the "Assigned To" person. I was expecting it to be reassigned to another kernel person... Any updates?
I think we're dealing with two problems here, but my progress is being held up by the problem that keeps nfsd from working in this case on ext3. Therefore, I'm reassigning to Steve D.
Still trying to locate an HP-UX client...
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Changing component from gfs2-kmod to kernel so that it stops showing up on Kevin Anderson's list.
Updating PM score.
Has there been any update to this issue? I'm seeing this exact issue:

Client: HP-UX wren B.11.11 U 9000/889
NFS Server: Red Hat Enterprise Linux Server release 5.3 (Tikanga) - 2.6.18-128.1.6.el5 x86_64

GFS is the underlying filesystem for the NFS server.

Arwin
*** This bug has been marked as a duplicate of bug 605720 ***