Bug 114904

Summary: Incorrect NFS server response from the RHEL kernel makes the Solaris client abort NFS version 2 mounts
Product: Red Hat Enterprise Linux 3
Reporter: George B. Magklaras <georgios>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: anders.odberg, georgios, pere, petrides, riel, tao, t.h.amundsen, vcaruso
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-06-14 15:43:43 UTC

Description George B. Magklaras 2004-02-04 10:18:22 UTC
Description of problem:
When mounting NFS-exported volumes from Solaris 5.x clients using
version 2 of the NFS protocol, the mount is aborted on the Solaris
client side. NFS clients on other operating systems do not experience
the problem when mounting partitions from the RHEL server over NFS v2.
Mounting with NFS v3 avoids the problem, but we have to use NFS
version 2 in our environment.

Version-Release number of selected component (if applicable):
2.4.21-9EL kernel

How reproducible:


Steps to Reproduce:
1. Export a directory via NFS from the RHEL server. For example, on my
RHEL server the /etc/exports file contains the entry:

/mn/odin/u1 solarisbox(rw,sync)

2. From solarisbox, attempt to NFS-mount the exported partition using
version 2 of the NFS protocol:

mount -o vers=2 linuxrhel:/mn/odin/u1 /mnt
NFS getattr failed for server linuxrhel: error 9 (RPC: Program/version mismatch)
nfs mount: mount: /mn/odin/u1: I/O error


The same command completes with no problems on an IRIX 6.5 client.

By contrast, NFS v3 mounts work from both Solaris and IRIX clients:
mount -o vers=3 linuxrhel:/mn/odin/u1 /mnt


  
Actual results:


Expected results:


Additional info:

Comment 1 George B. Magklaras 2004-02-04 10:48:30 UTC
Some more information on why I think the NFS response is
"incorrect".

The problem is probably related to the fact that the Solaris client
asks for ACL information in the mount request. The response from RHEL
is then along the lines of what you can see below in the Solaris logs:

solarisbox -> linuxrhel   NFS_ACL C GETATTR2 FH=033A
linuxrhel  -> solarisbox  RPC R (#23) XID=4242363657 Program number mismatch (low=2, high=3)
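The logged reply is an ONC RPC accepted reply with accept_stat PROG_MISMATCH, which under the RPC specification (RFC 5531) carries the lowest and highest versions the server supports for the requested program. A minimal Python sketch of decoding such a reply follows, assuming a raw reply body without the TCP record-marking header; the function name decode_accepted_reply is hypothetical, and the sample bytes are constructed to mirror the logged line, not captured from the wire.

```python
import struct

# Constants from the ONC RPC specification (RFC 5531).
REPLY = 1
MSG_ACCEPTED = 0
PROG_MISMATCH = 2
ACCEPT_STAT_NAMES = {
    0: "SUCCESS",
    1: "PROG_UNAVAIL",
    2: "PROG_MISMATCH",
    3: "PROC_UNAVAIL",
    4: "GARBAGE_ARGS",
    5: "SYSTEM_ERR",
}


def decode_accepted_reply(buf: bytes) -> dict:
    """Decode the header of an accepted RPC reply (no record marking)."""
    xid, msg_type, reply_stat = struct.unpack_from(">III", buf, 0)
    if msg_type != REPLY or reply_stat != MSG_ACCEPTED:
        raise ValueError("not an accepted RPC reply")
    # Verifier (opaque_auth): flavor, length, then a body padded to 4 bytes.
    _flavor, verf_len = struct.unpack_from(">II", buf, 12)
    offset = 20 + ((verf_len + 3) & ~3)
    (stat,) = struct.unpack_from(">I", buf, offset)
    result = {"xid": xid, "accept_stat": ACCEPT_STAT_NAMES.get(stat, str(stat))}
    if stat == PROG_MISMATCH:
        # mismatch_info: lowest and highest supported versions.
        low, high = struct.unpack_from(">II", buf, offset + 4)
        result["low"], result["high"] = low, high
    return result


# Reply shaped like the logged one: AUTH_NONE verifier, PROG_MISMATCH 2..3.
sample = struct.pack(">8I", 4242363657, REPLY, MSG_ACCEPTED, 0, 0,
                     PROG_MISMATCH, 2, 3)
print(decode_accepted_reply(sample))
# {'xid': 4242363657, 'accept_stat': 'PROG_MISMATCH', 'low': 2, 'high': 3}
```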

Is this the correct response? If I check from the Solaris box with
rpcinfo, the RHEL server offers both versions of the NFS protocol:

solarisbox$ rpcinfo -t linuxrhel 100003
  program 100003 version 2 ready and waiting
  program 100003 version 3 ready and waiting

One of the Solaris admins here tells me that the nfs_acl RPC program
number is 100227:

  solarisbox$ rpcinfo -t linuxrhel 100227 
  rpcinfo: RPC: Program not registered
  program 100227 is not available
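"Program not registered" is what an RPC client reports when the server's port lookup for a program/version comes back empty. In portmapper version 2 (RFC 1833) that lookup is PMAPPROC_GETPORT, which returns port 0 for an unregistered triple. The Python sketch below encodes such a call and interprets the returned port; Solaris rpcinfo may use newer rpcbind versions instead, so treat this as an illustration of the lookup, not a trace of what rpcinfo actually sent. The function names are hypothetical.

```python
import struct

# Portmapper v2 constants (RFC 1833) and RPC call constants (RFC 5531).
PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
CALL, RPC_VERS, AUTH_NONE = 0, 2, 0
IPPROTO_TCP = 6
NFS_ACL_PROG = 100227


def encode_getport_call(xid: int, prog: int, vers: int,
                        prot: int = IPPROTO_TCP) -> bytes:
    """Encode a PMAPPROC_GETPORT call (without TCP record marking)."""
    header = struct.pack(">6I", xid, CALL, RPC_VERS,
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    # Credential and verifier: both AUTH_NONE with an empty body.
    auth = struct.pack(">4I", AUTH_NONE, 0, AUTH_NONE, 0)
    # Arguments: struct mapping { prog, vers, prot, port }; port is unused here.
    mapping = struct.pack(">4I", prog, vers, prot, 0)
    return header + auth + mapping


def describe_getport_reply(port: int) -> str:
    """Interpret the single unsigned int returned by PMAPPROC_GETPORT."""
    # Port 0 means the program/version/protocol triple is not registered.
    if port == 0:
        return "Program not registered"
    return f"registered on port {port}"


call = encode_getport_call(xid=1, prog=NFS_ACL_PROG, vers=2)
print(len(call))                  # 56
print(describe_getport_reply(0))  # Program not registered
```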
  

Obviously, I don't have ACL support with version 2, and I also know
that this is not standard in NFS. However, the Solaris team here
believes that the linuxrhel box should respond along the lines of:
program 100227 is not available

as opposed to:
rpcinfo: RPC: Program not registered
program 100227 is not available

(the first line confuses Solaris into thinking that NFS version 2 is
not available, and it aborts the mount).

They also seem to think that this behavior is peculiar to the RHEL
kernel. 

So, two questions:

1) Knowing that ACL with version2 is not standard, do you think that
RHEL responds correctly in that case?

2) Would Linux vanilla kernels answer the same thing on the server
side of things? (diff indicates that there are various differences
amongst the files
net/sunrpc/svc.c
net/sunrpc/svcsock.c
include/linux/sunrpc/svcsock.h  

between 2.4.21-9EL and, for example, the vanilla 2.4.21 kernel.)

I hope this justifies the "incorrect behavior" wording a bit better.
GM


Comment 3 Vince Caruso 2004-03-29 18:33:37 UTC
I have reproduced exactly the same output as above. I was previously
running Red Hat Linux 8, serving NFS mounts to our Sun system, and life
was good. But after a complete rebuild of my system to RHEL WS 3, the
NFS problem described above appeared. I am running 2.4.21-9.0.1EL.
This is a serious problem, because the previous suggestion to change
over to pure 2.4.21 means moving to an environment other than RHEL,
since pure 2.4.21 is not offered by Red Hat. Is there a patch
being considered by RHEL to correct this?

I have logs of the Sun trying to mount from my Linux box, but they
seem to just indicate that I am maxing out my number of NFS mounts,
which is currently set to 8 by default in the NFS script file. For
example:

nfs mount: mount: /fullup1: I/O error
mount: /tmp is already mounted, swap is busy,
        or the allowable number of mount points has been exceeded
NFS getattr failed for server linuxbox: error 9 (RPC: Program/version mismatch)


Comment 4 Steve Dickson 2004-04-12 11:24:22 UTC
This is a Solaris bug. It was identified at this year's Connectathon.

The v2 Solaris client probes the server with an NFS_ACL GETATTR to see
if version 2 of the NFS ACL protocol is supported. The RHEL 3 server
only supports version 3 of the NFS ACL protocol, so a PROG_MISMATCH
error is returned, which is the correct thing to do.
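The server-side decision described here can be modelled as a version check before dispatch. This is a simplified Python sketch, not the RHEL 3 kernel code: the table of supported versions is an assumption drawn from this report (NFS versions 2 and 3, NFS_ACL version 3 only), and the function name check_call is hypothetical.

```python
from typing import Optional, Tuple

# Assumed program table: program number -> (lowest, highest) version.
SUPPORTED = {
    100003: (2, 3),  # NFS
    100227: (3, 3),  # NFS_ACL, version 3 only (per this report)
}


def check_call(prog: int, vers: int) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Return the accept_stat a server would send, plus mismatch_info."""
    if prog not in SUPPORTED:
        return "PROG_UNAVAIL", None
    low, high = SUPPORTED[prog]
    if not low <= vers <= high:
        # PROG_MISMATCH carries the range of versions the server does support.
        return "PROG_MISMATCH", (low, high)
    return "SUCCESS", None


print(check_call(100227, 2))  # ('PROG_MISMATCH', (3, 3))
print(check_call(100003, 2))  # ('SUCCESS', None)
```

With this table, an NFS_ACL version 2 call yields PROG_MISMATCH with the range (3, 3). The reply logged in Comment 1 reported low=2, high=3, so the real server's version bookkeeping differs from this simplified table; treat the table as illustrative only.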



Comment 5 Steve Dickson 2004-04-12 11:31:56 UTC
> 1) Knowing that ACL with version2 is not standard, do you think that
>  RHEL responds correctly in that case?
Yes... 

> 2) Would Linux vanilla kernels answer the same thing on the server
> side of things? (diff indicates that there are various differences
> amongst the files
No... the vanilla Linux server would reply with ENOTSUPPORTED, which
the Solaris client interprets correctly.

> Is there a patch being considered by RHEL to correct this?
Not at the moment, since it is a Solaris bug and the problem
does not occur when using NFS v3.
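The replies above imply that a v2 client should not abort the mount just because its NFS_ACL probe is refused; it can mount without ACL support instead. The Python sketch below models that tolerant behavior under stated assumptions: probes are represented as a callable returning an accept_stat name, the function names are hypothetical, and this is not Solaris client code.

```python
from typing import Callable

NFS_PROG, NFS_ACL_PROG = 100003, 100227

# A probe takes (program, version) and returns the RPC accept_stat name.
Probe = Callable[[int, int], str]


def mount_nfs_v2(probe: Probe) -> str:
    """Hypothetical tolerant NFS v2 mount logic."""
    # Only a refusal of the NFS program itself should abort the mount.
    if probe(NFS_PROG, 2) != "SUCCESS":
        raise OSError("NFS version 2 is not available on the server")
    # The NFS_ACL probe is advisory: any refusal just means "no ACLs".
    acls = probe(NFS_ACL_PROG, 2) == "SUCCESS"
    return "mounted, ACLs " + ("enabled" if acls else "disabled")


def rhel3_probe(prog: int, vers: int) -> str:
    """Model of the RHEL 3 server as described in this report."""
    if prog == NFS_ACL_PROG and vers != 3:
        return "PROG_MISMATCH"
    return "SUCCESS"


print(mount_nfs_v2(rhel3_probe))  # mounted, ACLs disabled
```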


Comment 6 Simon PC Frost 2004-07-16 18:43:48 UTC
There's an alternative workaround for this. It comes down to how the
Sun boxes connect to the NFS share and how they rely on ACLs. It seems
that in Red Hat Linux 9 ACLs were enabled by default; in Enterprise
Linux 3 they are not. Thus, a generic mount does not provide the kind
of information that a Sun box expects to see. On your Linux box, when
mounting the filesystem to be shared from the command line, add the
-o acl option to turn on ACLs for that mount:
    # mount -t ext3 -o acl /dev/sdx1 /mnt/point
or, in /etc/fstab, use rw,acl instead of defaults to mount the
filesystem read/write with ACLs enabled:
    /dev/sdx1   /mnt/point   ext3   rw,acl   1 2
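To find ext3 entries in /etc/fstab that still lack the acl option, a small Python sketch can parse fstab-format text. The function name ext3_without_acl and the second sample entry are hypothetical; the sketch only inspects the options field, so it says nothing about how the running kernel actually mounted a filesystem (on Linux, /proc/mounts shows that).

```python
from typing import List


def ext3_without_acl(fstab_text: str) -> List[str]:
    """Return mount points of ext3 entries whose options lack 'acl'."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry; ignored in this sketch
        _device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; "noacl" does not count as "acl".
        if fstype == "ext3" and "acl" not in options.split(","):
            missing.append(mount_point)
    return missing


sample = """
/dev/sdx1   /mnt/point   ext3   rw,acl     1 2
/dev/sdy1   /export      ext3   defaults   1 2
"""
print(ext3_without_acl(sample))  # ['/export']
```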

Now set up the NFS shares in the normal way, and Sun boxes will love
you forever.

-sf