Bug 429109

Summary: NFSD acl support broken due to typo
Product: Red Hat Enterprise Linux 5
Reporter: Steve Dickson <steved>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: high
Version: 5.2
CC: aca21, coughlan, jlayton, j.s.peatfield, j, k.georgiou, orion, pasteur, person, sputhenp, staubach, tim.w.connors
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-05-21 15:06:53 UTC
Type: ---
Attachments:
  Description            Flags
  proposed rhel5 patch   none
  Updated RHEL patch     none

Description Steve Dickson 2008-01-17 12:26:22 UTC
Description of problem:
Due to a typo in the nfsd code, NFS ACL support is disabled.

Version-Release number of selected component (if applicable):
kernel-2.6.18-69

How reproducible:
All the time.


Steps to Reproduce:
Using an F9 NFS client, mount a RHEL 5.1 server and note
the "svc: unknown version (3)" log message.
  
Actual results:
The mount succeeds, but without ACL support.

Expected results:
The mount succeeds with ACL support.

Additional info:

Comment 1 Steve Dickson 2008-01-17 12:26:22 UTC
Created attachment 291988
proposed rhel5 patch

Comment 3 RHEL Program Management 2008-01-17 12:37:43 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Jeff Layton 2008-01-19 12:24:39 UTC
*** Bug 426138 has been marked as a duplicate of this bug. ***

Comment 6 Jonathan Peatfield 2008-01-21 10:22:08 UTC
Note that the proposed patch here is pretty much what Neil Brown suggested for
plain 2.6.19 (as I included last month) but this one doesn't correct the
.pg_name field of the structure for the nfs acl stuff from "nfsd" to "nfsacl".

That means that any errors logged by the kernel will still show the "nfsd"
string and make it slightly harder to diagnose future problems etc.

Is there a reason why that part of
https://bugzilla.redhat.com/attachment.cgi?id=289919 isn't appropriate?

BTW was there a reason for opening a new bugzilla report and marking mine a
duplicate of it rather than just putting the new info in that report?


Comment 7 Jonathan Peatfield 2008-01-21 10:29:10 UTC
Oh wait I see that your patch has an extra fix in it to correct the same typo at
case NFSD_AVAIL as well - assuming that is actually correct...



Comment 8 Steve Dickson 2008-01-21 14:48:26 UTC
Created attachment 292377
Updated RHEL patch

Updated the patch to remove the change to the
NFSD_AVAIL case and to add the '+ pg_name = "nfsacl",'
change.

Comment 9 Don Zickus 2008-01-22 18:52:25 UTC
The patch is included in 2.6.18-72.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 11 Jonathan Peatfield 2008-01-29 04:11:42 UTC
In case it wasn't obvious from reading #426138, this is a _regression_ which
happened at some point between kernel-2.6.18-8.1.15 and kernel-2.6.18-53.1.4.

I'd guess that it was actually introduced as part of update 1 (ie with the
update to kernel-2.6.18-53) but I may be wrong - I didn't look very closely at
the changes in all versions.

If the plan is not to roll this fix into a release until update 2, then those of
us who use NFS will need to either run experimental (well, not fully QA'd) kernels
or just apply the (fairly small) patch to each new security update kernel, e.g.
kernel-2.6.18-53.1.6 etc.

I suppose one could argue that the lack of *working* ACLs over NFS is actually a
security problem so ought to be considered more serious than just a random bugfix.


Comment 12 Tim Connors 2008-02-20 04:50:33 UTC
Re comment #8: you mention "removing the change to the
NFSD_AVAIL case", but the patch linked in that comment still has that case in.
Of course, this is correct, as per
http://www.ussg.iu.edu/hypermail/linux/kernel/0701.1/1480.html
so please make sure that the final fix for update 2 does have the full version
of this patch, and not the patch with the NFSD_AVAIL case removed.

Is the severity set to the correct level?  NFS servers that have to interact
with anything other than other RHEL5 boxes need to run a kernel with this fix
to be able to deal with ACLs, required for security, and also require a new
kernel for the latest security update.  Both conditions can't be satisfied
currently (plus there are horrible throughput issues on the working
kernel-2.6.18-8.1.15 kernel).

Comment 15 Jonathan Peatfield 2008-02-20 20:49:12 UTC
Between me starting typing this and submitting it, the severity changed from
'low' to 'high'.  I don't know who did that...

The message at http://www.ussg.iu.edu/hypermail/linux/kernel/0701.1/1480.html
seems to be the same one that I pointed at in
https://bugzilla.redhat.com/show_bug.cgi?id=426138
i.e. http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-01/msg03478.html

Neither seems to have a change suggested for the NFSD_AVAIL case, though I'm
personally unclear if it is actually needed or not.  In case that isn't clear I
mean that the bit of https://bugzilla.redhat.com/attachment.cgi?id=291988 which
does:

...
 	case NFSD_AVAIL:
-		return nfsd_version[vers] != NULL;
+		return nfsd_versions[vers] != NULL;
 	}
...

isn't obviously in the patch that Neil Brown was suggesting at that point.  It
may well be that later he (or someone else) spotted that the same change needed
making in that branch too.  I'm afraid that I don't have a clue where the patch
291988 actually came from - since it was missing the fix to the pg_name it
obviously wasn't the same place that I'd already found and reported...

Re the severity the issues as I see them are that without this fix:

  talking to older linux or solaris (etc) clients causes huge numbers of errors

  talking to newer linux clients silently disables ACL support which indeed
could have security implications - suddenly people may gain access to stuff that
they shouldn't be allowed to see etc!

As to whether this makes it important enough to just add this patch into an
interim/security update isn't clear.  Currently we are forced to build our own
kernels based on -53.1.13 with this patch added and the nfs-silly-rename-fix
stuff (4 patches) disabled to have reasonable NFS client performance again.
(i.e. the ones that were suggested to be commented out in
https://bugzilla.redhat.com/show_bug.cgi?id=431092 comments #12, #43 etc.)

I tried one of the test kernels from http://people.redhat.com/dzickus/el5/ and
it seemed to work under light load but others reported that it wasn't suitable
for production use (crashing) so we can't safely deploy that yet...

If the changes to fix these problems do make it into update 2 then good but that
is probably still some way off, and we might get a little bored of rebuilding
kernels (and having to test them ourselves) until then.


Comment 16 Tim Connors 2008-02-24 15:31:46 UTC
Jonathan, you are quite right - I managed to get myself confused (easily done
these days) with this maze of twisty patches I had open on my display,
all alike.

I just verified against a known-good 2.6.24 kernel that the NFSD_AVAIL case
uses the unpluralised name:
        case NFSD_AVAIL:
                return nfsd_version[vers] != NULL;
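
The difference between the two similarly named arrays can be sketched as
follows. This is a simplified model written for this report, not the kernel
source: only the names nfsd_version, nfsd_versions and NFSD_AVAIL come from
the discussion above, while the struct, the SKETCH_* constants, the table size
and nfsd_vers_sketch() are hypothetical stand-ins. It assumes the singular
array lists every version built into the server and the plural array lists
the versions currently enabled, which is consistent with NFSD_AVAIL reading
the singular one.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-version descriptor. */
struct svc_version_sketch {
	int vs_vers;
};

#define SKETCH_NRVERS 5	/* illustrative table size: versions 0..4 */

/* Illustrative operations, loosely modelled on the NFSD_AVAIL case above. */
enum vers_op_sketch { SKETCH_SET, SKETCH_CLEAR, SKETCH_TEST, SKETCH_AVAIL };

static struct svc_version_sketch v2 = { 2 }, v3 = { 3 };

/* Singular name: every version compiled into the server (fixed). */
static struct svc_version_sketch *nfsd_version[SKETCH_NRVERS] = {
	[2] = &v2,
	[3] = &v3,
};

/* Plural name: the versions currently enabled (changes at run time). */
static struct svc_version_sketch *nfsd_versions[SKETCH_NRVERS];

static int nfsd_vers_sketch(int vers, enum vers_op_sketch op)
{
	if (vers < 0 || vers >= SKETCH_NRVERS)
		return -1;
	switch (op) {
	case SKETCH_SET:
		/* Enable: copy from the compiled-in table to the enabled table. */
		nfsd_versions[vers] = nfsd_version[vers];
		break;
	case SKETCH_CLEAR:
		nfsd_versions[vers] = NULL;
		break;
	case SKETCH_TEST:
		/* "Is it enabled right now?" reads the plural table. */
		return nfsd_versions[vers] != NULL;
	case SKETCH_AVAIL:
		/* "Could it be enabled at all?" reads the singular table. */
		return nfsd_version[vers] != NULL;
	}
	return 0;
}
```

In this model, nfsd_vers_sketch(3, SKETCH_AVAIL) is true before anything has
been enabled, while nfsd_vers_sketch(3, SKETCH_TEST) only becomes true after
SKETCH_SET. Mixing up the two names in any one branch silently changes which
question is being answered, which is why a one-letter typo is easy to miss.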


Comment 17 Don Howard 2008-03-11 01:40:17 UTC
*** Bug 431877 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2008-05-21 15:06:53 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html