Bug 137221

Summary: nfs4 mount hang: decode_getfattr: xdr error 10008!
Product: [Fedora] Fedora
Reporter: Robinson Tiemuqinke <hahaha_30k>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED NEXTRELEASE
QA Contact: Ben Levenson <benl>
Severity: high
Priority: medium
Version: 4
CC: k.georgiou, mattdm, orion
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-07-12 16:48:00 EDT
Attachments:
Several kernel oops up to a total freeze (flags: none)

Description Robinson Tiemuqinke 2004-10-26 15:30:16 EDT
Description of problem:

After a typical FC3Test3 installation, I followed the instructions at 
http://www.vanemery.com/Linux/NFSv4/NFSv4-no-rpcsec.html to set up 
NFS4 shares and test mounting them.

But the mount operation hangs forever on the FC3Test3 NFS4 client 
side, while on the FC3Test3 NFS4 server side this error message is 
logged in /var/log/messages:

Oct 26 12:00:18 FC3Test3testServer kernel: decode_getfattr: xdr error 10008!

Only the FC3Test3 NFS4 server has the problem, while the following 
other client|server combinations work just fine:

 FC2 NFS4 Server <-> FC2 NFS4 client
 FC2 NFS4 Server <-> FC3Test3 NFS4 client

 while these two fail:
 FC3Test3 NFS4 Server <-> FC3Test3 client
 FC3Test3 NFS4 Server <-> FC2 client

 On the FC2 NFS4 client side (when the server is FC3Test3 NFS4), more 
error messages are logged in /var/log/messages:

Oct 26 12:15:24 FC2testClient kernel: decode_getfattr: xdr error 10008!
Oct 26 12:15:24 FC2testClient kernel: nfs4_map_errors could not handle NFSv4 error 10008
Oct 26 12:15:24 FC2testClient kernel: nfs_get_root: getattr error = 5
Oct 26 12:15:24 FC2testClient kernel: nfs_read_super: get root inode failed
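
For anyone reading these client messages, here is a minimal sketch of how the numeric codes could be decoded. It assumes that 10008 is the NFSv4 status NFS4ERR_DELAY (as the protocol defines it) and that the "getattr error = 5" value is an ordinary errno; the function name annotate and the lookup table are hypothetical and cover only the codes seen in this report.

```python
import errno
import re

# Assumption: only the NFSv4 status seen in this report is listed.
# 10008 is NFS4ERR_DELAY in the NFSv4 protocol.
NFS4_ERROR_NAMES = {10008: "NFS4ERR_DELAY"}

# Matches both "xdr error 10008" and "NFSv4 error 10008".
NFS4_STATUS_RE = re.compile(r"(?:xdr error|NFSv4 error) (\d+)")
# Matches "getattr error = 5"; treating the number as an errno is an assumption.
GETATTR_ERROR_RE = re.compile(r"getattr error = (\d+)")


def annotate(line):
    """Return a symbolic name for the code in a kernel log line, or None."""
    match = NFS4_STATUS_RE.search(line)
    if match:
        code = int(match.group(1))
        return NFS4_ERROR_NAMES.get(code, "unknown NFSv4 status %d" % code)
    match = GETATTR_ERROR_RE.search(line)
    if match:
        code = int(match.group(1))
        return errno.errorcode.get(code, "unknown errno %d" % code)
    return None
```

Read this way, the client received a "try again later" status it could not map, and the failure surfaced as a generic I/O error while fetching the root inode.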

 The portmap, rpcidmapd, nfslock, and nfs services are all started, and 
we can see that the NFS4 shares are exported on the FC3Test3 server.

[root@FC3Test3testServer NFS4]# exportfs -v
/0/NFS4         localhost.localdomain(rw,async,wdelay,insecure,root_squash,no_subtree_check,fsid=0)
/0/NFS4         10.0.0.0/255.255.240.0(rw,async,wdelay,insecure,root_squash,no_subtree_check,fsid=0)


Version-Release number of selected component (if applicable):

    nfs-utils-1.0.6-34  util-linux-2.12a-10 kernel-2.6.8-1.541
 
How reproducible:
   Every Time.

Steps to Reproduce:

1. Install FC3Test3 and select the NFS-related components.
2. Set up an NFS4 share according to the instructions at 
http://www.vanemery.com/Linux/NFSv4/NFSv4-no-rpcsec.html .
3. Test NFS4 mounting with "mount -t nfs4 FC3Test3Server:/ /mnt"; 
it hangs.
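
Since step 3 never returns, a reproduction script needs some way to notice the hang. The following is only a sketch using Python's subprocess timeout; the helper name run_with_timeout is hypothetical. One caveat: a mount process blocked inside the kernel may not respond to the kill that subprocess.run sends on timeout, in which case this helper could itself block.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit status, or None if it did not finish in time."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        return None
    return completed.returncode
```

With the command from step 3, a call such as run_with_timeout(["mount", "-t", "nfs4", "FC3Test3Server:/", "/mnt"], 60) would be expected to return None on an affected setup; the 60-second limit is an arbitrary choice.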
  
Actual results:
 NFS4 mount operation hangs.

Expected results:
The mount completes without any problems and returns, just like NFSv2|v3.

Additional info:
Comment 1 Steve Dickson 2004-10-28 15:58:12 EDT
Previously, v4 clients didn't know how to handle DELAY errors 
(i.e. 10008). This was fixed in the 2.6.9 kernels 
(i.e. update your kernel).
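
To illustrate what handling a DELAY error means for a client, here is a rough sketch of a retry loop with a capped, doubling wait. It is not the kernel's code: the function call_with_delay_retry, its parameters, and the operation callable are all hypothetical, and the wait values are arbitrary.

```python
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008  # the status reported as "xdr error 10008" in this bug


def call_with_delay_retry(operation, max_attempts=5, first_wait=0.1,
                          max_wait=15.0, sleep=time.sleep):
    """Call operation() until it returns something other than NFS4ERR_DELAY.

    operation is a hypothetical callable returning an NFSv4 status code.
    The wait doubles after each DELAY reply, capped at max_wait. If every
    attempt is answered with DELAY, NFS4ERR_DELAY is returned to the caller.
    """
    wait = first_wait
    status = operation()
    for _ in range(max_attempts - 1):
        if status != NFS4ERR_DELAY:
            break
        sleep(wait)
        wait = min(wait * 2, max_wait)
        status = operation()
    return status
```

A client without such a loop has nothing sensible to do with 10008, which matches the "nfs4_map_errors could not handle NFSv4 error 10008" message in the description.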
Comment 2 Orion Poplawski 2005-10-10 11:19:59 EDT
I'm seeing this with 2.6.12-1.1456_FC4smp

Oct  7 03:45:17 alexandria kernel: decode_getfattr: xdr error 10008!
Oct  7 03:45:36 alexandria last message repeated 4 times
Oct  7 03:45:38 alexandria kernel: nfs4_cb: server 192.168.0.13 not responding, timed out
Oct  7 03:45:38 alexandria kernel: nfs4_cb: server 192.168.0.13 not responding, timed out
Oct  7 03:45:52 alexandria kernel: decode_getfattr: xdr error 10008!
Oct  7 03:46:08 alexandria kernel: decode_getfattr: xdr error 10008!
Oct  7 03:46:40 alexandria last message repeated 2 times
Oct  7 03:46:56 alexandria kernel: decode_getfattr: xdr error 10008!
Oct  7 03:47:12 alexandria kernel: decode_getfattr: xdr error 10008!
Oct  7 03:47:44 alexandria last message repeated 2 times
Oct  7 03:48:00 alexandria kernel: decode_getfattr: xdr error 10008!
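
To put a number on how often the error recurs in a log like the one above, the syslog "last message repeated N times" lines have to be expanded. A small sketch, assuming one message per line (no wrapped lines); the function name count_message is hypothetical.

```python
import re

REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_message(lines, needle):
    """Count messages containing needle, expanding syslog repeat lines."""
    total = 0
    last_matched = False  # did the most recent real message contain needle?
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            # A repeat line refers to the message logged just before it.
            if last_matched:
                total += int(repeat.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            total += 1
    return total
```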


The decode_getfattr messages are repeating steadily.  The server and client are
the same machine, and /data4/sw1 is the mount point.

root     30655     1  0 Oct08 ?        00:00:00 find /data4/sw1/wine -name wine-*fc4*.i686.rpm ! -name *-debuginfo-* -exec cp -fu {} /data4/sw1/fedora/cora/4/i386/os/Fedora/RPMS ;
root     30656  2516  0 Oct08 ?        00:00:00 /usr/sbin/automount --timeout=60 --ghost /data4 yp auto.data4g fstype=nfs4,intr,rsize=32768,wsize=32768
root     30657 30656  0 Oct08 ?        00:00:00 /bin/mount -t nfs4 -s -o intr,rsize=32768,wsize=32768 alexandriag:/data1 /data4/sw1
root     30658     1  0 Oct08 ?        00:00:00 [nfsv4-svc]
Comment 3 Bernd Petrovitsch 2005-11-02 11:58:51 EST
Created attachment 120648
Several kernel oops up to a total freeze

The same error is present on FC4 with kernel-2.6.13-1.1532_FC4 on an Athlon XP
2500+.

Furthermore, I get kernel oopses and, after a few of them, total freezes of the
machine. The attached file has been copied directly out of /var/log/messages
without removing lines in between.
Comment 4 Matthew Miller 2006-07-10 16:23:06 EDT
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!
Comment 5 Matthew Miller 2006-07-11 14:14:02 EDT
Someone moved this to FC4 without leaving a comment. Clearly correct given
comment #2 and comment #3 above. However, I'm going to leave this in needinfo
state, as the FC4 kernel is now up to 2.6.17-1.2141_FC4 and this may resolve the
issue...
Comment 6 Orion Poplawski 2006-07-12 16:48:00 EDT
I cannot reproduce this anymore.  Closing.