Bug 441628
Summary: | NFS getacl failed for server 192.168.0.22: error 9 (RPC: Program/version mismatch) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | vinesh <vineshn>
Component: | nfs-utils | Assignee: | Steve Dickson <steved>
Status: | CLOSED CURRENTRELEASE | QA Contact: | yanfu,wang <yanwang>
Severity: | urgent | Docs Contact: |
Priority: | low | |
Version: | 5.1 | CC: | joshua.bakerlepain, lloucks, mwalls, rwheeler
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-01-22 19:02:57 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
vinesh
2008-04-09 06:24:24 UTC
On the server, could you please post a network trace of this problem, something similar to:

    tshark -w /tmp/bz441628.pcap host <client>
    bzip2 /tmp/bz441628.pcap

Steve,
Are you asking me to run a network trace from the NFS server (blrxhomes4) to the NFS client (blrbld04)? If so, I am able to do that. The output follows.

    [root@blrxhomes4 ~]# traceroute blrbld04
    traceroute to blrbld04 (192.168.1.16), 30 hops max, 40 byte packets
     1  192.168.0.252 (192.168.0.252)  0.862 ms  1.558 ms  2.237 ms
     2  192.168.1.16 (192.168.1.16)  0.332 ms  0.323 ms  0.309 ms
    [root@blrxhomes4 ~]#

(In reply to comment #1)
> On the server could you please post a network trace of this problem,
> something similar to:
> tshark -w /tmp/bz441628.pcap host <client>
> bzip2 /tmp/bz441628.pcap

Is there any workaround for this issue? Please let me know.
-Vinesh

Vinesh,
No, I am asking for a network packet trace, which the 'tshark' command can produce. Run the following command on the server:

    tshark -w /tmp/bz441628.pcap host blrbld04

While you do the mount and touch commands, tshark will capture all the network traffic between the server and client and store it in the /tmp/bz441628.pcap file. These types of files can at times become fairly large, so in general I ask people to compress them with the bzip2 command before posting them:

    bzip2 /tmp/bz441628.pcap

Created attachment 302116
tshark output
As mentioned in your earlier request, I am attaching the output of tshark.
Hi,
Please fix the issue ASAP. It is highly critical for us.
-Vinesh

Hi, anyone, please fix it. We can't wait anymore. The issue has been pending for more than a week. IT IS HIGHLY CRITICAL.
-Vinesh

Same problem, some additional information: the error seems to occur only on Solaris systems prior to Sol 8-0204@117350-44 (this worked), so my Solaris 7 (patched to current) voyager has this error.

CentOS 5.1 is also reporting the same problem. My Solaris version is 5.8 Generic_108528-29. I can't upgrade or downgrade my OS level.
-Vinesh

This is a known problem with the Solaris 8 client, something that we found about three years ago. In the NFSACL protocol that Sun invented way back when, there are two versions: NFSACLv2 and NFSACLv3. Solaris supports both and the Linux NFS server only supports NFSACLv3, which is legal with regard to the spec.

So when the Solaris client sends a GETACL request, it uses NFSACLv2. The RHEL server fails this request with a 'remote can't support version 2' error and tells the client the minimum version is 3. What should happen is that the Solaris client retries the GETACL request using NFSACLv3. Unfortunately, the Solaris client errors out the call instead of retrying. (Using wireshark, see packets 125 and 126.)

Again, this bug was identified a while back, so there is a good possibility Sun has a fix for it.

OK, sounds reasonable, but it works with kernel 2.6.23-80.fc7 and not with RHEL 5 (2.6.18-53.1.14.el5) on Solaris 7 (or early Sol8).
-Monty

In my case, RHEL 5.0 with kernel 2.6.18-8.el5 was working fine (Solaris 8). After updating to RHEL 5.1 with 2.6.18-53.1.14.el5, this issue started.
-Vinesh

OK... after further review, it appears the Solaris 8 client is using NFSACLv3. Today I realized the problem in Comment #10 only happens with NFS version 2, not NFS version 3. Sorry for the confusion.

Question: On the RHEL system, can ACLs be set on the local filesystem? Just try to see if ACL support is turned on in the local filesystem. (I'm assuming ext3 is the local filesystem.)

Also, are there any error messages in /var/log/messages?

I am able to set permission for the particular share.

*****************NFS share partition directory*************

    [root@blrxhomes4 test]# pwd
    /build/hwdev/test
    [root@blrxhomes4 test]# ls
    date  dir
    [root@blrxhomes4 test]# setfacl -m user:vineshn:rwx date
    setfacl: date: Operation not supported

*****************/tmp directory******************
If I try it in the /tmp directory, it works.

    [root@blrxhomes4 test]# pwd
    /tmp/test
    [root@blrxhomes4 test]# ls
    date
    [root@blrxhomes4 test]# setfacl -m user:vineshn:rwx date
    [root@blrxhomes4 test]# getfacl date
    # file: date
    # owner: root
    # group: root
    user::rw-
    user:vineshn:rwx
    group::rw-
    mask::rwx
    other::rw-

**************Partition is ext3**************

    [root@blrxhomes4 test]# mount
    .
    .
    .
    /dev/mapper/vg01-lvol1 on /qasrv type ext3 (rw)
    /dev/mapper/vg01-lvol2 on /build type ext3 (rw)
    .
    .

********/var/log/messages***************
The nfs service was restarted yesterday.

    [root@blrxhomes4 ~]# cat /var/log/messages
    Apr 13 04:02:02 blrxhomes4 syslogd 1.4.1: restart.
    Apr 14 02:00:04 blrxhomes4 ntpdate[2048]: step time server 192.168.0.5 offset 2.953391 sec
    Apr 15 02:00:04 blrxhomes4 ntpdate[5043]: step time server 192.168.0.5 offset 2.979186 sec
    Apr 15 19:33:20 blrxhomes4 rpc.statd[2916]: Caught signal 15, un-registering and exiting.
    Apr 15 19:33:20 blrxhomes4 portmap[7501]: connect from 127.0.0.1 to unset(status): request from unprivileged port
    Apr 15 19:33:20 blrxhomes4 rpc.statd[7523]: Version 1.0.9 Starting
    Apr 15 19:33:21 blrxhomes4 mountd[3317]: Caught signal 15, un-registering and exiting.
    Apr 15 19:33:21 blrxhomes4 kernel: nfsd: last server has exited
    Apr 15 19:33:21 blrxhomes4 kernel: nfsd: unexporting all filesystems
    Apr 15 19:33:21 blrxhomes4 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    Apr 15 19:33:21 blrxhomes4 kernel: NFSD: starting 90-second grace period
    Apr 15 19:33:21 blrxhomes4 rpc.statd[7523]: Caught signal 15, un-registering and exiting.
    Apr 15 19:33:21 blrxhomes4 portmap[7692]: connect from 127.0.0.1 to unset(status): request from unprivileged port
    Apr 15 19:33:21 blrxhomes4 rpc.statd[7697]: Version 1.0.9 Starting
    Apr 15 19:33:30 blrxhomes4 mountd[7654]: Caught signal 15, un-registering and exiting.
    Apr 15 19:33:30 blrxhomes4 kernel: nfsd: last server has exited
    Apr 15 19:33:30 blrxhomes4 kernel: nfsd: unexporting all filesystems
    Apr 15 19:33:30 blrxhomes4 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    Apr 15 19:33:30 blrxhomes4 kernel: NFSD: starting 90-second grace period
    Apr 15 19:33:49 blrxhomes4 kernel: svc: unknown version (3)
    Apr 16 02:00:04 blrxhomes4 ntpdate[8460]: step time server 192.168.0.5 offset 2.967682 sec

-Vinesh

I am UNABLE to set permission for the particular NFS share.

(In reply to comment #14)
> I am able to set permission for the particular share.

It appears as if ACLs are not supported on the /build file system. Is this correct? The attempt to set an ACL directly on the /build file system, on the server, would seem to indicate this.

If so, then the problem would appear to be in how the error is handled and propagated back?

(In reply to comment #16)
> It appears as if ACLs are not supported on the /build file system. Is
> this correct?

Yes

> The attempt to set an ACL directly on to the /build
> file system, on the server, would seem to indicate this.
>
> If so, then the problem would appear to be in how the error is handled
> and propagated back?

I am not getting the point. Can you explain briefly?

Some more information.
****** As I am able to set ACLs on the /tmp directory, I am sharing a file from the /tmp directory ******

    [root@blrxhomes4 test]# pwd
    /tmp/test
    [root@blrxhomes4 test]# ls -l
    total 4
    -rw-rwxrw-+ 1 root root 0 Apr 16 20:45 date
    [root@blrxhomes4 test]# cat /etc/exports
    /tmp/test *(rw,sync,no_root_squash)
    /build blrbld04(rw,sync,no_root_squash)

********NFS client blrbld04 (SOLARIS 8)*********

    bash-2.03# uname -a
    SunOS blrbld04 5.8 Generic_108528-29 sun4u sparc SUNW,Sun-Fire-V210
    bash-2.03# pwd
    /tmp/test
    bash-2.03# mount blrxhomes4:/tmp/test/ /tmp/test/test_mount/
    bash-2.03# cd test_mount/
    bash-2.03# ls -l
    NFS getacl failed for server blrxhomes4: error 9 (RPC: Program/version mismatch)
    NFS getacl failed for server blrxhomes4: error 9 (RPC: Program/version mismatch)
    total 8
    -rw-rwxrw- 1 root root 0 Apr 16 20:45 date
    bash-2.03# touch datest
    NFS getacl failed for server blrxhomes4: error 9 (RPC: Program/version mismatch)
    touch: datest cannot create

-Vinesh

Hi,
Any update on this?
-Vinesh

Would you happen to know if this problem exists in a straight 5.1 (2.6.18-53) kernel? 2.6.18-53.1.14 is a z-stream kernel, so I'm trying to figure out when this breakage occurred.

With the old kernel (2.6.18-8.el5) it was working fine; after updating to the new kernel (2.6.18-53.1.14.el5) the issue started. This is what I have been mentioning from the beginning. I updated directly from the old kernel (2.6.18-8.el5) to the new kernel (2.6.18-53.1.14.el5).
-Vinesh

Any update on this?

According to this:
http://bugs.centos.org/view.php?id=2727
the problem may be as simple as having the min/max RPC program versions backwards:

    Program Version Minimum: 3
    Program Version Maximum: 0

This appears to work correctly in the current RHEL-5 kernels. I tried it on 2.6.18-93.el5 and the proper values were returned. Perhaps RHEL-5.2 could be tried?

Vinesh,
Are you still seeing this problem with a more recent kernel?
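
To illustrate why a backwards min/max pair would matter: an RPC "program/version mismatch" reply carries the lowest and highest program versions the server supports, and a client is expected to retry with a version inside that range. The sketch below is only a model of that selection step, not the Solaris or Linux code. The names `ProgMismatch` and `pick_retry_version` are hypothetical, and the client version list [2, 3] is an assumption taken from the NFSACLv2/NFSACLv3 discussion earlier in this bug.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProgMismatch:
    """Version range reported in a program/version mismatch reply (hypothetical model)."""
    low: int   # "Program Version Minimum" in the trace
    high: int  # "Program Version Maximum" in the trace


def pick_retry_version(reply: ProgMismatch,
                       client_versions: Sequence[int]) -> Optional[int]:
    """Return the highest client-supported version inside [low, high], or None."""
    if reply.low > reply.high:
        # Inverted range (for example minimum 3, maximum 0): no version fits,
        # so a client that trusts the reply has nothing to retry with.
        return None
    usable = [v for v in client_versions if reply.low <= v <= reply.high]
    return max(usable) if usable else None


if __name__ == "__main__":
    client_versions = [2, 3]  # assumed: client speaks NFSACLv2 and NFSACLv3
    # Well-formed reply: server supports only version 3.
    print(pick_retry_version(ProgMismatch(low=3, high=3), client_versions))
    # Inverted reply as quoted from the CentOS report.
    print(pick_retry_version(ProgMismatch(low=3, high=0), client_versions))
```

With a well-formed range of 3 to 3 the function selects version 3, while the inverted range of 3 to 0 yields no usable version. Whether the Solaris client behaves this way is not confirmed in this bug; the sketch only shows that an inverted range leaves a strict client with nothing to retry.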