Bug 441628

Summary: NFS getacl failed for server 192.168.0.22: error 9 (RPC: Program/version mismatch)
Product: Red Hat Enterprise Linux 5
Reporter: vinesh <vineshn>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE
QA Contact: yanfu,wang <yanwang>
Severity: urgent
Priority: low
Version: 5.1
CC: joshua.bakerlepain, lloucks, mwalls, rwheeler
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-01-22 19:02:57 UTC
Attachments:
Description Flags
tshark output none

Description vinesh 2008-04-09 06:24:24 UTC
Description of problem:
*    NFS server - RHEL 5.1(Linux blrxhomes4 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19
07:18:46 EST 2008 x86_64 x86_64 x86_64 GNU/Linux)
*    NFS client - Solaris 8
   NFS Server name: blrxhomes4 (RHEL 5.1)
   NFS client name: blrbld04 (OS - Solaris 5.8)
  
  I am sharing a directory from the NFS server (/build) to all machines (/build
blrbld04(rw,sync,no_root_squash)).

 The NFS share mounts on the NFS client without any issue.

 When I try to touch or write a file, I get the following error:

bash-2.03# uname -a
SunOS blrbld04 5.8 Generic_108528-29 sun4u sparc SUNW,Sun-Fire-V210
bash-2.03# pwd
/build/hwdev
bash-2.03# touch test
NFS getacl failed for server 192.168.0.22: error 9 (RPC: Program/version mismatch)
touch: test cannot create


I can reproduce the same error with the same configuration on a different machine (same OS).



Version-Release number of selected component (if applicable):
Linux blrxhomes4 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 x86_64
x86_64 x86_64 GNU/Linux

How reproducible:
 Yes, it is reproducible.

Steps to Reproduce:
1. Mount the same NFS share with the same configuration on a different machine.
  
Actual results:

bash-2.03# touch test
NFS getacl failed for server 192.168.0.22: error 9 (RPC: Program/version mismatch)
touch: test cannot create

Expected results:
Able to touch file on NFS client.

Additional info:

 The issue started after I recently applied all available patches to this NFS
server machine.
 This is happening only for this particular NFS client (Solaris 5.8); all the
other machines are fine.

Previous version of the NFS server (with which I was able to write files from the NFS client):
Linux blrxhomes4 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64
x86_64 GNU/Linux

Comment 1 Steve Dickson 2008-04-09 11:50:10 UTC
On the server could you please post a network trace of this problem,
something similar to:
    tshark -w /tmp/bz441628.pcap host <client> 
    bzip2 /tmp/bz441628.pcap



Comment 2 vinesh 2008-04-09 12:10:10 UTC
Steve, 
  Are you asking me to trace the network path from the NFS server (blrxhomes4)
to the NFS client (blrbld04)?
If so, I am able to do that. The output follows.

[root@blrxhomes4 ~]# traceroute blrbld04
traceroute to blrbld04 (192.168.1.16), 30 hops max, 40 byte packets
 1  192.168.0.252 (192.168.0.252)  0.862 ms  1.558 ms  2.237 ms
 2  192.168.1.16 (192.168.1.16)  0.332 ms  0.323 ms  0.309 ms
[root@blrxhomes4 ~]#






Comment 3 vinesh 2008-04-11 05:03:48 UTC
Is there any workaround for this issue?

Please let me know.



-Vinesh 

Comment 4 Steve Dickson 2008-04-11 10:49:09 UTC
Vinesh,

No, I am asking for a network packet trace, which the 'tshark' command can capture.

Run the following command on the server:
    tshark -w /tmp/bz441628.pcap host blrbld04

While you do the mount and touch commands, tshark will capture
all the network traffic between the server and client and
store it in the /tmp/bz441628.pcap file.

These types of files can at times become fairly large, so in
general I ask people to compress them with the bzip2 command
before posting them:
    bzip2 /tmp/bz441628.pcap
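
If the capture has to be repeated, the two commands above could be wrapped in a small script. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, tshark is assumed to be on the PATH and run with sufficient privileges, and, unlike the bzip2 command, the uncompressed file is left in place.

```python
import bz2
import shutil
import subprocess


def tshark_argv(client_host, pcap_path):
    """Build the tshark command line quoted in the comment above."""
    # -w writes the raw packets to a file; the trailing words are the capture filter.
    return ["tshark", "-w", pcap_path, "host", client_host]


def compress_pcap(pcap_path):
    """Write pcap_path + '.bz2' and return the new path (the original is kept)."""
    out_path = pcap_path + ".bz2"
    with open(pcap_path, "rb") as src, bz2.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def capture(client_host, pcap_path):
    """Run tshark until it is interrupted with Ctrl-C, then compress the capture."""
    try:
        subprocess.run(tshark_argv(client_host, pcap_path), check=False)
    except KeyboardInterrupt:
        pass  # Ctrl-C is the expected way to end the capture
    return compress_pcap(pcap_path)
```

Calling capture("blrbld04", "/tmp/bz441628.pcap") on the server, reproducing the mount and touch on the client, and then pressing Ctrl-C would leave /tmp/bz441628.pcap.bz2 ready to attach.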



Comment 5 vinesh 2008-04-11 13:25:28 UTC
Created attachment 302116 [details]
tshark output

As you requested earlier, I am attaching the tshark output.

Comment 6 vinesh 2008-04-14 02:04:30 UTC
Hi,
  Please fix the issue ASAP. It is highly critical for us.

-Vinesh

Comment 7 vinesh 2008-04-15 06:59:37 UTC
Hi, could anyone please fix this? We can't wait any longer.
This issue has been pending for more than a week.

IT IS HIGHLY CRITICAL......

-Vinesh

Comment 8 Monty Walls 2008-04-15 13:57:16 UTC
Same problem here, with some additional information: the error seems to occur
only on Solaris systems prior to Sol 8-0204@117350-44 (this worked), so my
Solaris 7 (patched to current) voyager has this error.  CentOS 5.1 is also
showing the same problem.

Comment 9 vinesh 2008-04-15 14:34:47 UTC
My Solaris version is 5.8 Generic_108528-29.

I can't upgrade or downgrade my OS level.

-Vinesh

Comment 10 Steve Dickson 2008-04-15 19:17:14 UTC
This is a known problem with the Solaris 8 client, something that
we found about three years ago.

In the NFSACL protocol that Sun invented way back when, there
are two versions: NFSACLv2 and NFSACLv3. Solaris supports
both and the Linux NFS server only supports NFSACLv3, which
is legal with regard to the spec.

So when the Solaris client sends a GETACL request it uses
NFSACLv2. The RHEL server fails this request with a
'remote can't support version 2' error and tells the
client the minimum version is 3. What should happen is that the
Solaris client retries the GETACL request using NFSACLv3.
Unfortunately the Solaris client errors out the call instead
of retrying. (Using wireshark, see packets 125 and 126.)

Again, this bug was identified a while back so there is a good
possibility Sun has a fix for it.
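
The retry behaviour described here can be illustrated with a minimal Python sketch. It is not Solaris or Linux code: the function name is hypothetical, and it only models the decision a client could make after the server rejects a version and reports the lowest and highest versions it supports. Note that Comment 13 later finds this scenario applies to NFS version 2 mounts rather than to the trace in this bug.

```python
def pick_retry_version(client_versions, low, high):
    """Return the highest client-supported version within [low, high], or None.

    client_versions: versions of the RPC program the client can speak.
    low, high: the supported range reported back by the server.
    """
    usable = [v for v in client_versions if low <= v <= high]
    return max(usable) if usable else None


# A client that speaks NFSACLv2 and NFSACLv3, talking to a server that
# reports it supports only version 3, could retry with version 3.
retry_with = pick_retry_version([2, 3], low=3, high=3)
```

A client that returns an error as soon as the first version is rejected, without making this selection, would show the failure described above.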


Comment 11 Monty Walls 2008-04-15 19:33:35 UTC
OK, that sounds reasonable, but it works with kernel 2.6.23-80.fc7 and not with
RHEL 5 (2.6.18-53.1.14.el5) on Solaris 7 (or early Sol 8).
  -Monty.

Comment 12 vinesh 2008-04-16 02:11:44 UTC
In my case, the Solaris 8 client was working fine with the RHEL 5.0 kernel
2.6.18-8.el5; the issue started after updating to RHEL 5.1 (2.6.18-53.1.14.el5).

-Vinesh

Comment 13 Steve Dickson 2008-04-16 14:23:01 UTC
Ok... after further review, it appears the Solaris 8 client is using
NFSACLv3. Today I realized the problem in Comment #10 only happens with
NFS version 2, not NFS version 3. Sorry for the confusion.

Question: On the RHEL system, can ACLs be set on the local filesystem?
Just try it to see if ACL support is turned on in the local filesystem.
(I'm assuming ext3 is the local filesystem.)

Also, are there any error messages in /var/log/messages?


Comment 14 vinesh 2008-04-16 15:23:09 UTC
I am able to set permission for the particular share.

*****************NFS share partition directory*************
[root@blrxhomes4 test]# pwd
/build/hwdev/test
[root@blrxhomes4 test]# ls
date  dir
[root@blrxhomes4 test]# setfacl -m user:vineshn:rwx date
setfacl: date: Operation not supported



*****************/tmp directory******************
If I try it in the /tmp directory, it works.

[root@blrxhomes4 test]# pwd
/tmp/test
[root@blrxhomes4 test]# ls
date
[root@blrxhomes4 test]# setfacl -m user:vineshn:rwx date
[root@blrxhomes4 test]# getfacl date
# file: date
# owner: root
# group: root
user::rw-
user:vineshn:rwx
group::rw-
mask::rwx
other::rw-

**************Partition is ext3**************


[root@blrxhomes4 test]# mount
.
.
.
/dev/mapper/vg01-lvol1 on /qasrv type ext3 (rw)
/dev/mapper/vg01-lvol2 on /build type ext3 (rw)
.
.


*******************************


********/var/log/messages*************** (the NFS service was restarted yesterday)


[root@blrxhomes4 ~]# cat /var/log/messages
Apr 13 04:02:02 blrxhomes4 syslogd 1.4.1: restart.
Apr 14 02:00:04 blrxhomes4 ntpdate[2048]: step time server 192.168.0.5 offset 2.953391 sec
Apr 15 02:00:04 blrxhomes4 ntpdate[5043]: step time server 192.168.0.5 offset 2.979186 sec
Apr 15 19:33:20 blrxhomes4 rpc.statd[2916]: Caught signal 15, un-registering and exiting.
Apr 15 19:33:20 blrxhomes4 portmap[7501]: connect from 127.0.0.1 to unset(status): request from unprivileged port
Apr 15 19:33:20 blrxhomes4 rpc.statd[7523]: Version 1.0.9 Starting
Apr 15 19:33:21 blrxhomes4 mountd[3317]: Caught signal 15, un-registering and exiting.
Apr 15 19:33:21 blrxhomes4 kernel: nfsd: last server has exited
Apr 15 19:33:21 blrxhomes4 kernel: nfsd: unexporting all filesystems
Apr 15 19:33:21 blrxhomes4 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Apr 15 19:33:21 blrxhomes4 kernel: NFSD: starting 90-second grace period
Apr 15 19:33:21 blrxhomes4 rpc.statd[7523]: Caught signal 15, un-registering and exiting.
Apr 15 19:33:21 blrxhomes4 portmap[7692]: connect from 127.0.0.1 to unset(status): request from unprivileged port
Apr 15 19:33:21 blrxhomes4 rpc.statd[7697]: Version 1.0.9 Starting
Apr 15 19:33:30 blrxhomes4 mountd[7654]: Caught signal 15, un-registering and exiting.
Apr 15 19:33:30 blrxhomes4 kernel: nfsd: last server has exited
Apr 15 19:33:30 blrxhomes4 kernel: nfsd: unexporting all filesystems
Apr 15 19:33:30 blrxhomes4 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Apr 15 19:33:30 blrxhomes4 kernel: NFSD: starting 90-second grace period
Apr 15 19:33:49 blrxhomes4 kernel: svc: unknown version (3)
Apr 16 02:00:04 blrxhomes4 ntpdate[8460]: step time server 192.168.0.5 offset 2.967682 sec


-Vinesh


Comment 15 vinesh 2008-04-16 15:24:48 UTC
Correction to comment #14: I am UNABLE to set an ACL on that particular NFS share.





Comment 16 Peter Staubach 2008-04-16 19:38:01 UTC
It appears as if ACLs are not supported on the /build file system.  Is
this correct?  The attempt to set an ACL directly on to the /build
file system, on the server, would seem to indicate this.

If so, then the problem would appear to be in how the error is handled
and propagated back?
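
One way to cross-check this inference is to look at the options shown for /build in the mount output from Comment 14. The following is a minimal Python sketch with hypothetical function names; it assumes the "device on mountpoint type fstype (options)" line format shown in that comment. A missing acl option is only a hint, not proof: ext3 can also have ACLs enabled through default mount options stored in the superblock, which do not appear in this output.

```python
import re

# Matches lines such as: /dev/mapper/vg01-lvol2 on /build type ext3 (rw)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def mount_options(mount_output, mount_point):
    """Return the option list for mount_point, or None if it is not listed."""
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group(2) == mount_point:
            return match.group(4).split(",")
    return None


def acl_option_listed(mount_output, mount_point):
    """True only if the mount point is listed with an explicit 'acl' option."""
    options = mount_options(mount_output, mount_point)
    return options is not None and "acl" in options


sample = """/dev/mapper/vg01-lvol1 on /qasrv type ext3 (rw)
/dev/mapper/vg01-lvol2 on /build type ext3 (rw)"""
build_has_acl_option = acl_option_listed(sample, "/build")
```

For the two lines quoted in Comment 14, /build is mounted with only rw, which is consistent with setfacl reporting "Operation not supported" there.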

Comment 17 vinesh 2008-04-17 02:18:31 UTC
(In reply to comment #16)
> It appears as if ACLs are not supported on the /build file system.  Is
> this correct?  
     Yes

>The attempt to set an ACL directly on to the /build
> file system, on the server, would seem to indicate this.
> 
> If so, then the problem would appear to be in how the error is handled
> and propagated back?
  I do not understand this point. Can you explain briefly?


Some more information.
******Since I am able to set ACLs in the /tmp directory,***********
***** I am now sharing a directory from /tmp************

[root@blrxhomes4 test]# pwd
/tmp/test
[root@blrxhomes4 test]# ls -l
total 4
-rw-rwxrw-+ 1 root root 0 Apr 16 20:45 date


[root@blrxhomes4 test]# cat /etc/exports
/tmp/test *(rw,sync,no_root_squash)
/build  blrbld04(rw,sync,no_root_squash)


********NFS client blrbld04 (SOLARIS 8)*********
bash-2.03# uname -a
SunOS blrbld04 5.8 Generic_108528-29 sun4u sparc SUNW,Sun-Fire-V210

bash-2.03# pwd
/tmp/test

bash-2.03# mount blrxhomes4:/tmp/test/ /tmp/test/test_mount/

bash-2.03# cd test_mount/

bash-2.03# ls -l
NFS getacl failed for server blrxhomes4: error 9 (RPC: Program/version mismatch)
NFS getacl failed for server blrxhomes4: error 9 (RPC: Program/version mismatch)
total 8
-rw-rwxrw-   1 root     root           0 Apr 16 20:45 date


bash-2.03# touch datest
NFS getacl failed for server blrxhomes4: error 9 (RPC: Program/version mismatch)
touch: datest cannot create


-Vinesh




Comment 18 vinesh 2008-04-21 04:59:18 UTC
Hi,

 Any update on this?


-Vinesh



Comment 19 Steve Dickson 2008-04-21 12:04:19 UTC
Would you happen to know if this problem exists in a straight 
5.1 (2.6.18-53) kernel? 2.6.18-53.1.14 is a z-stream kernel so
I'm trying to figure out when this breakage occurred.



Comment 20 vinesh 2008-04-21 12:13:46 UTC
With the old kernel (2.6.18-8.el5) it was working fine; the issue started after
updating to the new kernel (2.6.18-53.1.14.el5).

This is what I have been mentioning from the beginning.

  I updated from the old kernel (2.6.18-8.el5) to the new kernel
(2.6.18-53.1.14.el5) directly.

-Vinesh

Comment 21 lloucks 2008-04-28 18:15:10 UTC
Any update on this?

Comment 22 Randy Zagar 2008-05-28 22:43:06 UTC
According to this:
    http://bugs.centos.org/view.php?id=2727
the problem may be as simple as having the min/max RPC program versions backwards:
    Program Version Minimum: 3
    Program Version Maximum: 0
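
The two values quoted above are the low and high fields that an ONC RPC server returns in an accepted reply with status PROG_MISMATCH (RFC 5531). The following is a minimal Python sketch for pulling them out of a raw reply and checking that the range is sane. The function names are hypothetical, and the input is assumed to be the bare RPC message, with any TCP record-marking header already removed.

```python
import struct

REPLY = 1          # msg_type value for a reply
MSG_ACCEPTED = 0   # reply_stat value for an accepted call
PROG_MISMATCH = 2  # accept_stat value: program version not supported


def parse_prog_mismatch(reply):
    """Return (low, high) from an accepted PROG_MISMATCH reply, else None."""
    xid, msg_type, reply_stat = struct.unpack_from(">III", reply, 0)
    if msg_type != REPLY or reply_stat != MSG_ACCEPTED:
        return None
    # Verifier: flavor, opaque length, then the body padded to 4 bytes.
    flavor, verf_len = struct.unpack_from(">II", reply, 12)
    offset = 20 + ((verf_len + 3) & ~3)
    (accept_stat,) = struct.unpack_from(">I", reply, offset)
    if accept_stat != PROG_MISMATCH:
        return None
    low, high = struct.unpack_from(">II", reply, offset + 4)
    return low, high


def range_is_sane(low, high):
    """A usable version range must have low <= high."""
    return low <= high


# Hand-built reply carrying the values quoted above (minimum 3, maximum 0);
# the xid 0x1234 and the empty AUTH_NONE verifier are made up for illustration.
reply = struct.pack(">IIIIIIII", 0x1234, REPLY, MSG_ACCEPTED, 0, 0, PROG_MISMATCH, 3, 0)
low, high = parse_prog_mismatch(reply)
```

With a minimum of 3 and a maximum of 0, no version satisfies the range, so a client that honours the reply has nothing to retry with, even if it supports version 3.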

Comment 23 Peter Staubach 2008-05-29 12:15:49 UTC
This appears to work correctly in the current RHEL-5 kernels.  I tried it
on 2.6.18-93.el5 and the proper values were returned.  Perhaps RHEL-5.2
could be tried?

Comment 24 Steve Dickson 2010-09-20 13:39:42 UTC
Vinesh, 

Are you still seeing this problem with a more recent kernel?