Bug 473794 - NFS mount from Fedora 10 client to Fedora 8 server gets message mount.nfs: Unknown error 521
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 10
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-11-30 16:56 UTC by Bevis King
Modified: 2014-06-18 07:38 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-13 13:53:54 UTC
Type: ---
Embargoed:


Attachments
Network capture of an attempt that demonstrates error 521 (1.45 KB, application/octet-stream)
2009-01-16 16:54 UTC, Ivan Mikhailov

Description Bevis King 2008-11-30 16:56:37 UTC
Description of problem:
When trying to mount an NFS partition from a Fedora 8 server on a Fedora 10 client, I get the error message:  mount.nfs: Unknown error 521.

[root@willow src]# mount gabrielle:/export/vol/intra /mnt/nfs/
mount.nfs: Unknown error 521
[root@willow src]# 

The same command from a Fedora 8 client system to the Fedora 8 server works fine, and the Fedora 10 machine works fine as a server for the Fedora 8 clients. There is a previously reported issue with the Fedora 8 server that delays its response to a mount request.

Version-Release number of selected component (if applicable):
kernel-2.6.27.5-117.fc10.x86_64

How reproducible:
Unsure at present; it has happened only sometimes, and other file systems mount fine.

Steps to Reproduce:
1. f10server# mount f8server:/export/partition /mnt/nfs/

Actual results:
Error message: mount.nfs: Unknown error 521

Expected results:
A meaningful error message: Permission Denied, Time-Out, etc. or success.

Additional info:

Comment 1 Jeff Layton 2008-12-01 19:02:27 UTC
Strange, up until this past weekend when I updated my server to F10, I had the same configuration and NFS was working fine...

Does anything show up in dmesg when you do one of these failed mount attempts?

Comment 2 Bevis King 2008-12-02 11:06:52 UTC
Jeff

OK, I've been investigating further. It turns out that corruption had crept into my /etc/exports file, so the filesystem wasn't being exported to the f10 client. (The f10 machine is now my new home directory server and started out life on a temporary IP address, which was cached on the f8 server.)

This would now appear to be a simple case of a misleading error message - if it had just said "Permission Denied" or quite possibly "Request Timed Out", that would have made life much easier.

I've lowered the severity to low, but perhaps you could consider fixing it so that "Unknown error 521" is replaced by "Permission denied", "Request timed out", or something similar.

The situation may be somewhat unusual, as the f8server takes a good thirty seconds to respond to a mount request the first time it's asked, due to the symptoms detailed in bugzilla bug 452430. Because of the large number of LVM partitions (more than 40), the device scan effectively loops, timing out before it has finished stat'ing that many mounted filesystems. As a result, rpc.mountd disappears into a 99% CPU thrash for about thirty seconds, until it too times out somewhere in the code, decides that the requested path is indeed exported, and returns that answer to the requesting host, just very belatedly.

I suspect that timeout value may be the same as that of the NFS client (in this case the f10 box) making the request, hence the "Unknown error 521" message...

Hope that clarifies things.  Thanks for looking at the bug.

Regards, Bevis.

Comment 3 Jeff Layton 2008-12-02 11:35:26 UTC
With a corrupt exports file, we may not be able to fix this. Do you have a way to reproduce this? Could you attach a copy of your corrupted exports file to this case?

On the other issue (30 seconds to respond to a mount request): that doesn't actually appear to be a bug in mountd, but rather one in libblkid1 that was fixed in Debian's e2fsprogs in 1.39+1.40-WIP-2007.04.07+dfsg-1. I vaguely recall a similar bug in Fedora/RHEL, but I thought it was fixed quite some time ago. F8 now ships e2fsprogs-1.40.4-3.fc8. Is this still a problem with that package?

If so, you may want to reassign bug 452430 to e2fsprogs, though I'm not sure whether it'll be fixed this close to F8's EOL.

You may also (though I'm not certain) be able to work around that by assigning explicit fsid values to your exports. See the fsid option in exports(5).

Comment 4 Bevis King 2008-12-23 21:47:34 UTC
Jeff - I'm sorry I no longer have an appropriate exports file - I didn't keep an archive of older versions.  I think you'd actually get the same effect if the partition was simply not exported to that host.

Regards, Bevis.

Comment 5 Ivan Mikhailov 2009-01-16 12:06:05 UTC
I see this bug when trying to mount from an FC8 box on either of two FC10 boxes. The two FC10 boxes mount each other fine, and the FC8 box mounts from both of them as well. The problem did not exist while all of them were running FC7 and FC8.

Updating the FC8 box with the following packages did not change anything:
 Package                 Arch       Version          Repository        Size
=============================================================================
Updating:
 e2fsprogs               i386       1.40.4-3.fc8     updates-newkey    610 k
Installing for dependencies:
 device-mapper-devel     i386       1.02.22-1.fc8    fedora            137 k
Updating for dependencies:
 e2fsprogs-devel         i386       1.40.4-3.fc8     updates-newkey    644 k
 e2fsprogs-libs          i386       1.40.4-3.fc8     updates-newkey    138 k

What is really interesting is that NFS clients sometimes do manage to connect to the FC8 box, but this seems possible only within some interval after the FC8 box reboots, not right after the reboot.

/etc/exports file on FC8 is trivial:

/ *.iv.dev.null(rw,sync,no_root_squash)
/huge1 *.iv.dev.null(rw,sync,no_root_squash)
/huge2 *.iv.dev.null(rw,sync,no_root_squash)
/oldroot *.iv.dev.null(rw,sync,no_root_squash)

All four exports are whole ext3 partitions, mounted with no special mount options. The corresponding lines of /etc/fstab on the FC10 clients are:

10.1.1.1:/              /master                 nfs     rsize=8192,wsize=8192,timeo=14,intr 0 0
10.1.1.1:/huge1/        /master/huge1           nfs     rsize=8192,wsize=8192,timeo=14,intr 0 0
10.1.1.1:/huge2/        /master/huge2           nfs     rsize=8192,wsize=8192,timeo=14,intr 0 0
10.1.1.1:/oldroot/      /master/oldroot         nfs     rsize=8192,wsize=8192,timeo=14,intr 0 0

The error 521 message is returned instantly, so it is surely not a timeout.

So I have a live specimen; feel free to ask questions or to ask me to run some tests.

Comment 6 Jeff Layton 2009-01-16 12:25:31 UTC
This error is:

#define EBADHANDLE      521     /* Illegal NFS file handle */

...which sounds like either the client or the server is sending along bad filehandles. What might be best is a binary network capture of a mount attempt between these two hosts. Something like this from the client:

# tcpdump -i [ifname] -s0 -w /tmp/mount-attempt.pcap host [server]

...then attempt the mount. After it fails, ^c the capture and attach the file to the case so I can have a look at what's happening on the wire.

Comment 7 Ivan Mikhailov 2009-01-16 16:54:39 UTC
Created attachment 329229
Network capture of an attempt that demonstrates error 521

The FC8 NFS server is on the box named master.iv.dev.null with IP address 10.1.1.1.
The FC10 NFS client is on the box named octo.iv.dev.null with IP address 10.1.1.16.
The capture was made with:
tcpdump -i eth0 -s0 -w /tmp/mount-attempt.pcap host 10.1.1.1 and port 2049
14 packets captured, 14 received by filter, 0 dropped by kernel.
The tail of /var/log/messages on the server box contains only one relevant line:
Jan 16 22:43:02 master mountd[7404]: authenticated mount request from octo.iv.dev.null:817 for / (/)

Comment 8 Jeff Layton 2009-01-17 01:24:36 UTC
The "port 2049" clause here is excluding all of the mountd traffic. We'll need to see that traffic to tell whether the server is sending a bogus filehandle or the client is mangling it somehow.

I do see this in the capture:

 10   0.000978    10.1.1.16 -> 10.1.1.1     NFS V3 FSINFO Call, FH:0x00007f89
 11   0.001195     10.1.1.1 -> 10.1.1.16    NFS V3 FSINFO Reply (Call In 10) Error:NFS3ERR_BADHANDLE

...so that's where the error is coming from. The client is sending a filehandle and the server is rejecting it. The problem is that without the mountd communications we can't tell whether this is a client or server problem.

Please redo the capture without filtering on the port...

Comment 9 Ivan Mikhailov 2009-01-17 04:19:36 UTC
The traffic between these two boxes is too heavy to be captured entirely (they are part of the tooling server farm used for RDBMS development). I've scheduled them for maintenance on Tuesday, and it will then be possible to retry with all noisy services switched off and only NFS in use.

Comment 10 Jeff Layton 2009-01-17 11:29:57 UTC
The capture shouldn't take long and I can filter out what we don't need to see. Another option is to determine what ports mountd is listening on the server and add those to the filter. Do:

$ rpcinfo -p [server]

...and look for the mountd service. Usually there will be 2 ports, one for TCP and one for UDP. Be sure to get both.

The filter will then look like:

tcpdump -i eth0 -s0 -w /tmp/mount-attempt.pcap 'host 10.1.1.1 and (port 2049 or port mountd_udp_port or port mountd_tcp_port)'

...that may give enough info to go on here.

Comment 11 Jeff Layton 2009-01-26 20:53:37 UTC
Someone at RH pinged me internally on this. They had a RHEL5 server and a F10 client. The interesting parts from the mount attempt:

 18   0.004937 10.11.243.176 -> 10.11.243.135 MOUNT V3 MNT Call /exports
 19   0.005181 10.11.243.135 -> 10.11.243.176 MOUNT V3 MNT Reply (Call In 18) Error:ERR_ACCESS

...and...

 42   0.012930 10.11.243.176 -> 10.11.243.135 NFS V3 FSINFO Call, FH:0x00007f34
 43   0.012974 10.11.243.135 -> 10.11.243.176 NFS V3 FSINFO Reply (Call In 42) Error:NFS3ERR_BADHANDLE

...so it looks like we got an error from the mount request but the client still tried to do the mount anyway. In his case, he had exported /export (no "s" on the end) and had just fat-fingered the mount command.

This may just be bad error handling by the mount helper program.

Comment 12 Andreas Berendsen 2009-02-09 21:41:52 UTC
I had the same problem with my FC10. In my case, I have two boxes on the same subnet (192.168.1.0/24) and a third box, which is a VirtualBox guest running on one of those two servers.

I was able to NFS mount from my client to the NFS server.
I was NOT able to mount the NFS shares on the server itself.
I was NOT able to mount the NFS shares from the VirtualBox guest.

Checking /var/log/messages, I found

Feb 10 10:19:35 storage mountd[19664]: refused mount request from 192.168.1.50 for /var/cache/yum/updates/packages (/var/cache/yum/updates/packages): illegal port 58490
Feb 10 10:19:41 storage mountd[19664]: refused mount request from 192.168.1.50 for /var/cache/yum/fedora/packages (/var/cache/yum/fedora/packages): illegal port 58497
Feb 10 10:19:41 storage mountd[19664]: refused mount request from 192.168.1.50 for /var/cache/yum/updates/packages (/var/cache/yum/updates/packages): illegal port 58501
Feb 10 10:19:41 storage mountd[19664]: refused mount request from 192.168.1.50 for /store (/store): illegal port 58505

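After a little research, I found that the problem was caused by the VirtualBox NAT: as the log shows, the requests arrived from unprivileged source ports (above 1023), which mountd refuses for exports using the default "secure" option. Adding "insecure" to my /etc/exports entries fixed the problem.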

I hope this can help with your problem.

Comment 13 Jeff Layton 2009-02-13 13:53:54 UTC
This turns out to be a kernel problem: the kernel's mount client isn't recognizing the error returned by the server properly. F11 doesn't have this problem. When I try to mount an export that doesn't exist there, I get a proper error:

# mount -t nfs salusa:/exportfoo /mnt/test
mount.nfs: access denied by server while mounting salusa:/exportfoo

...looks like this was fixed in 2.6.28. I'm going to close this with a resolution of UPSTREAM. It should be fixed when F10 moves to 2.6.28 or later kernels.

