Bug 431253

Summary: NFS CREATE fails on hp-ux clients with kernel 2.6.18-53.1.6.el5
Product: Red Hat Enterprise Linux 5
Reporter: Dan Goetzman <dan_goetzman>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED DUPLICATE
QA Contact: GFS Bugs <gfs-bugs>
Severity: low
Priority: low
Version: 5.0
CC: arwin.tugade, jburke, jlayton, rpeterso, rwheeler, steved, tao
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-10-21 13:36:26 UTC
Attachments:
 Patch to nfsd_setattr to ignore -EACCES error caused by the GFS filesystem (flags: none)
 HP-UX NFS Client tcpdump trace (flags: none)

Description Dan Goetzman 2008-02-01 19:55:35 UTC
Description of problem:

 HP-UX NFS clients fail to create a new file on a CentOS 5 NFS server updated
to kernel 2.6.18-53.1.6.el5 when GFS is used as the backing filesystem.

Version-Release number of selected component (if applicable):

 kmod-gfs-0.1.16-5.2.6.18_8.el5
 kernel-2.6.18-53.1.6.el5


How reproducible:

 Reproducible from any HP-UX client.
 I tested using both 11.00 and 11.23.

Steps to Reproduce:
1. Create a GFS filesystem on the server
2. NFS export the filesystem
3. Mount on any hp-ux client
4. hp-ux$ cp anyfile /to/nfs/fs/on/rhas5
  
Actual results:

 hp-ux$ cp: cannot create anyfile: Permission denied

Expected results:

 No error

Additional info:

 HP-UX clients first create the new file using the NFS procedure CREATE (with
create mode UNCHECKED and mode=0), and the server returns NFS3ERR_ACCES. I traced
the -EACCES error to the generic_permission kernel function. Apparently the ext3
and xfs filesystems do not use generic_permission, but gfs does, and it returns
-EACCES to nfsd.
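
 As a rough illustration of why a file created with mode=0 can trip this check,
here is a simplified user-space model in Python. It is not the kernel source:
the function name, its parameters, and the dac_override flag (standing in for
the CAP_DAC_OVERRIDE capability that root normally holds) are assumptions made
for the sketch, and ACLs and directories are left out.

```python
import errno

# Permission mask bits, matching the kernel's MAY_* values.
MAY_EXEC, MAY_WRITE, MAY_READ = 1, 2, 4


def generic_permission_model(mode, mask, fsuid, inode_uid,
                             in_group=False, dac_override=False):
    """Simplified model of a generic_permission-style mode-bit check.

    Assumptions: regular file, no ACLs; `dac_override` is a stand-in for
    the CAP_DAC_OVERRIDE capability.
    """
    if fsuid == inode_uid:
        bits = (mode >> 6) & 7      # owner permission bits
    elif in_group:
        bits = (mode >> 3) & 7      # group permission bits
    else:
        bits = mode & 7             # "other" permission bits

    if bits & mask == mask:
        return 0                    # the mode bits grant the access
    # Read/write denials can be overridden; exec needs at least one x bit set.
    if dac_override and (not (mask & MAY_EXEC) or (mode & 0o111)):
        return 0
    return -errno.EACCES
```

 In this model, a write check by the owner of a mode-0 file returns -EACCES,
while the same check with dac_override=True succeeds.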

 I have created a simple patch that checks for the -EACCES error and allows
access if the FSUID equals the inode's UID. This resolves the problem but is
probably not the best way to fix the bug. I will attach the patch for reference.
Hopefully those more knowledgeable than I can determine the correct fix needed to
resolve the root cause of this bug.
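
 The logic of the workaround, as described above, can be sketched in Python as
follows. This is only a model of the description, not the attached patch; the
function name and its arguments are hypothetical.

```python
import errno


def setattr_with_owner_override(err, fsuid, inode_uid):
    """Model of the described workaround: ignore -EACCES for the inode's owner.

    `err` is the negative-errno result of the permission check. All names
    here are hypothetical and do not come from the attached patch.
    """
    if err == -errno.EACCES and fsuid == inode_uid:
        return 0        # caller owns the inode: let the setattr proceed
    return err          # any other result, or a non-owner, passes through
```

 Only the -EACCES case for the owner is overridden; every other result is
returned unchanged, so a non-owner is still denied.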

Comment 1 Dan Goetzman 2008-02-01 19:55:35 UTC
Created attachment 293760 [details]
Patch to nfsd_setattr to ignore -EACCES error caused by the GFS filesystem

Comment 2 Robert Peterson 2008-02-05 22:23:28 UTC
I don't have access to an hp-ux box, so I have no good way to debug
this.  However, I do know that the extended attribute code in GFS was
recently changed to remove the permission() calls for a different
reason.  This is documented as bugzilla bug #323111, crosswritten from
GFS2 to GFS1.  Since this is an internal bugzilla record, I don't know
if you will have permission to view it, and I don't have authority to
change that.  However, the fix went into RHEL in kmod-gfs-0.1.21-1, and
CentOS will probably be the same.  Can you upgrade to that version or
higher and see if this is still a problem?  Thanks.


Comment 3 Dan Goetzman 2008-02-06 14:37:21 UTC
How would I get this kmod-gfs-0.1.21-1?
I just did a yum update and I have the latest, kmod-gfs-0.1.16-5
I will test against this gfs module if I can get it...

Comment 4 Robert Peterson 2008-02-06 16:09:22 UTC
Can you post the location of your yum repository?  In other words, the
contents of /etc/yum.repos.d/RHEL5.repo or the CentOS equivalent.
If there isn't a newer version, you can either wait for a new update
(I don't know how CentOS works or how often they do updates)
or, alternatively, since you've done some experimentation, you could
compile the latest gfs from source code by doing something like
this:

First, fetch the source code from CVS.  Something like this:

cvs -d :pserver:sources.redhat.com:/cvs/cluster checkout -r RHEL5 cluster

Then compile by doing something like:

cd cluster/gfs-kernel/src/gfs
./configure --kernel_src=/usr/src/kernels/whatever-your-kernel-is
make

I don't recommend doing a make install, but you can manually load the
gfs module by doing insmod gfs.ko from that directory.
Then mount the file system and try to recreate the problem.
Of course, you'll first have to boot to the kernel that doesn't have
the circumvention patch you attached to see if it's broken or fixed.


Comment 5 Dan Goetzman 2008-02-06 16:34:51 UTC
I should have mentioned this in my update: I have built a Red Hat AS 5 system
just to test and work on this bug. So I am using RHAS5; the release file lists:

Red Hat Enterprise Linux Server release 5.1 (Tikanga)

I installed 5.0 and then did a "yum update" to get current. Running the
update brought me up to kernel-2.6.18-53.1.6.el5. I have not changed any of
the yum config files. The only thing I see on my system under /etc/yum.repos.d
is the single file rhel-debuginfo.repo, so I assume that I am using the default
repos?

Comment 6 Dan Goetzman 2008-02-06 19:57:57 UTC
Tested a current build from the CVS repo, problem still exists.

I used the cvs command given previously to pull the current cluster source.
Compiled it and reloaded the gfs module; dmesg showed:

GFS <CVS> (built Feb 6 2008 13:19:52) installed

So it would appear the new module actually loaded OK.
I mounted the GFS test filesystem, started NFSD and tested using the HP-UX NFS
client.

Same results, the latest updates to the GFS module do not appear to address this
problem.

Comment 7 Robert Peterson 2008-02-06 21:29:41 UTC
Thanks for checking that out.  I'm trying to find/borrow an hp-ux client
machine within Red Hat so I can debug this.  I've tried several channels
already and come up empty handed, but I'm not out of options yet.

In the meantime, it might help me if you collect an Ethereal / Wireshark
trace of this problem so I can analyze the requests that hp-ux is actually
sending to the NFS server.  The smaller the trace, the easier it will be
for me to read.


Comment 8 Dan Goetzman 2008-02-06 21:50:47 UTC
I already have one, I should have attached it from the beginning.
Attaching the tcpdump file now...

Also, I added some simple printk debugging to the NFSD code in the kernel and
found it follows this path (many details omitted):

nfsd_proc_create
 nfsd_create_v3
  vfs_create - Note: the create returns OK
 nfsd_setattr
  notify_change
   generic_permission - Note: returns a -EACCES error!

In case that might help...

Comment 9 Dan Goetzman 2008-02-06 21:52:36 UTC
Created attachment 294161 [details]
HP-UX NFS Client tcpdump trace

Here's the tcpdump I had from my testing against CentOS5. It should be the same
as my RHAS5 as it appears to have the same defect.

Comment 12 Robert Peterson 2008-02-13 00:30:45 UTC
I finally got access to a real hp-ux machine and tried to recreate the
problem.  It did not fail.  The client was 11.23 on ia64:

# uname -a
HP-UX xxxxx B.11.23 U ia64 1756071376 unlimited-user license

For my server I used a RHEL5.2 prototype / pre-release system, i686.
I copied three different files over NFS of various sizes: 1K,
150K and 1.5MB.  I verified on the server that the files were
copied successfully with no error messages and the contents were
correct.

I was running as root on both client and server, with no firewall
between them and no root squash.  My /etc/exports looked like this:

[root@kool gfs]# cat /etc/exports
/mnt/gfs          *(rw,insecure,no_root_squash)

I haven't tried it on a true 5.1 server machine.

Dan, are you sure this wasn't a problem with selinux on the server
or various firewalls (iptables, etc.) interfering with the copy?


Comment 13 Dan Goetzman 2008-02-13 03:11:32 UTC
I am pretty sure.
I normally configure my Linux NFS servers with iptables off and selinux disabled.
I was also able to trace the NFS call into the nfsd layer in the kernel and down
to the generic_permission() kernel function, so I know the NFS CREATE RPC from
the client is not being blocked.
If I recall correctly, a simple cp of a file that does not yet exist on the NFS
server, with gfs as the filesystem, is all that is required to trigger the
failure. I think a simple "mkdir" also fails.

Comment 14 Dan Goetzman 2008-02-13 14:35:38 UTC
I carefully looked at your test setup and I missed a detail on the first read...
It will NOT fail if you are running as root on the client. I knew that but I did
not read your posting carefully enough. Try it again using a regular account on
the HP-UX client.

Comment 15 Robert Peterson 2008-02-13 17:10:10 UTC
By using a non-root user, I've recreated the problem using the hp-ux
machine.  I've also compared what's happening against a Fedora 8 nfs
client with suggestions from Steve Dickson.  The difference between
the two calls is this:

The Fedora 8 nfs client specifies a create mode of 2 (exclusive)
whereas the hp-ux client specifies a create mode of 0 (unchecked).
Now I'll backtrack how create mode 0 is handled by nfs and gfs as
opposed to ext3.
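
For reference, the create modes discussed here are the createmode3 values
defined by the NFSv3 protocol (RFC 1813). The Python sketch below simply names
them; the class and helper names are illustrative, not taken from any NFS
implementation.

```python
from enum import IntEnum


class CreateMode3(IntEnum):
    """createmode3 values from the NFSv3 protocol (RFC 1813)."""
    UNCHECKED = 0   # create without checking whether the file already exists
    GUARDED = 1     # fail with NFS3ERR_EXIST if the file already exists
    EXCLUSIVE = 2   # exclusive create, identified by a client-supplied verifier


def carries_attributes(mode):
    """UNCHECKED and GUARDED send initial attributes; EXCLUSIVE sends a verifier."""
    return mode in (CreateMode3.UNCHECKED, CreateMode3.GUARDED)
```

The helper reflects one protocol-level difference between the two clients: an
UNCHECKED create carries the initial attributes (here, mode=0) in the CREATE
call itself, whereas an EXCLUSIVE create carries only a verifier.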


Comment 16 Robert Peterson 2008-02-13 18:32:15 UTC
I tested the same set of commands using:
(1) gfs, (2) gfs2, (3) ext3, (4) xfs

All of them behaved the same way with the same client/server pair.
If the client is hp-ux and the user was not root, they gave permission
denied.  If the user was root, they worked properly.  If the client was
f8, they worked properly.  So this looks to me like an nfs problem, not
a gfs problem.


Comment 17 Dan Goetzman 2008-02-13 19:32:16 UTC
I ran my tests again on my RHEL 5 server using both gfs and ext3. On my server
the hp-ux client fails when the backing store is;

 gfs - fails as described
 ext3 - no failure

 Maybe your test system is using an ext3 update where they also started calling
generic_permission? Just a guess on my part...

Anyhow, it may really be an nfsd problem (that's where my simple patch was made)
that just surfaced with GFS first, and then later with other filesystems as they
were updated?

Comment 18 Robert Peterson 2008-02-14 16:00:12 UTC
I tried out an nfsd patch that had been posted for bug #432690,
but it didn't solve the problem.  I also verified that it still fails
on ext3 with my NFS server.  If what Dan says is true, that ext3 works
for him, then we may be dealing with two problems: one that's keeping
nfsd from working on my server, and one that's making gfs fail on Dan's.


Comment 19 Dan Goetzman 2008-02-14 16:14:25 UTC
Something you may want to consider...
In my limited research on this problem I found that generic_permission() was
introduced starting in kernel 2.6.10 and that it was recommended that all
filesystems start using it instead of older methods (in the filesystem code, I
believe). I read that XFS had already been changed, but when I tested with RHEL5
it appeared that neither XFS nor ext3 had made the change yet. Only GFS was
calling generic_permission(), and when it returns -EACCES, the NFSD code returns
the failure to the client.

So, if you are running more recent kernels than I am, you may want to check
whether generic_permission() has since been introduced into ext3, XFS, etc...

Just a thought: if it has, then RHEL5 will be broken for all HP-UX NFS clients
on more filesystems than just GFS, which is my current case at 2.6.18-53.1.6.el5

Comment 20 Dan Goetzman 2008-03-05 20:51:25 UTC
Any update on this one?

I know Bob looked at it from a GFS viewpoint, and when it was found to also
occur with other filesystems the problem was tagged as an NFS kernel problem.
There has been no activity since then, and I see Bob is still the "Assigned To"
person. I was expecting it to be reassigned to another kernel person...

Any updates?

Comment 21 Robert Peterson 2008-03-05 23:00:19 UTC
I think we're dealing with two problems here, but my progress is being
held up by the problem that keeps nfsd from working in this case on
ext3.  Therefore, I'm reassigning to Steve D.


Comment 22 Steve Dickson 2008-03-06 13:44:28 UTC
Still trying to locate an HP-UX client...

Comment 23 RHEL Program Management 2008-06-04 22:46:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 24 Robert Peterson 2008-07-09 18:37:54 UTC
Changing component from gfs2-kmod to kernel so that it stops showing
up on Kevin Anderson's list.


Comment 28 RHEL Program Management 2009-02-16 15:44:37 UTC
Updating PM score.

Comment 29 Arwin Tugade 2009-05-11 20:41:23 UTC
Has there been any update to this issue?  I'm seeing this exact issue:

Client: HP-UX wren B.11.11 U 9000/889
NFS Server: Red Hat Enterprise Linux Server release 5.3 (Tikanga) - 2.6.18-128.1.6.el5 x86_64

GFS is the underlying filesystem for the NFS server.

Arwin

Comment 34 Steve Dickson 2010-10-21 13:36:26 UTC

*** This bug has been marked as a duplicate of bug 605720 ***