Bug 740024
Summary: | nfs: Mount occasionally fails with EIO
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 15
Hardware: | i686
OS: | Linux
Status: | CLOSED WONTFIX
Severity: | high
Priority: | unspecified
Reporter: | Anders Blomdell <anders.blomdell>
Assignee: | Jeff Layton <jlayton>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | gansalmon, itamar, jlayton, jonathan, kernel-maint, madhu.chinakonda, pasteur, steved
Doc Type: | Bug Fix
Last Closed: | 2011-09-21 12:58:51 UTC
Description
Anders Blomdell
2011-09-20 18:11:37 UTC
Created attachment 524079
Script to trigger mount bug
(In reply to comment #0)
> Expected results:
>
> infinite number of mounts

Sounds like reserved port exhaustion. Does this problem go away if you do the mounts with "-o resvport"?

"-o resvport" gives the same problem; with "-o noresvport" the number of sockets in TIME_WAIT seems to stabilize around 700. I will see if I can increase the mount rate to trigger the bug.

Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in the vain hope that it would eventually retry?

Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?), should it then be made an option?

Or should I just accept that the problem exists and use noresvport and cross my fingers that it won't bite me again?

(In reply to comment #4)

Sorry, I meant -o noresvport before...

> Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in
> the vain hope that it would eventually retry?

I don't think that will work.

> Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?),
> should it then be made an option?

I suspect that would get a chilly reception upstream, but you're welcome to propose it there. You'll need some way to deal with "stray" packets that come in after the socket has been closed and reused.

> Or should I just accept that the problem exists and use noresvport and cross my
> fingers that it won't bite me again?

That's probably what I'd recommend. Reserved ports are a limited resource, so you'll never get an infinite number of connections with them. Note that the reproducer you have represents the pessimal case. It mounts only to immediately unmount again.

If you have multiple mounts to the same server, then sockets will be shared between them. So, for instance, if you were to loop and do a ton of mounts to one server and then another loop to do a ton of unmounts, then that will just make one socket (and probably will share superblocks too).

What exactly were you doing when you got "bitten" by this? Do you really have that many individual NFS servers? Or were you mounting and unmounting in quick succession like this for some other reason?

(In reply to comment #5)
> (In reply to comment #4)
>
> Sorry, I meant -o noresvport before...

I forgot a smiley :-)

> > Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in
> > the vain hope that it would eventually retry?
>
> I don't think that will work.

OK, I won't try that at home then :-)

> > Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?),
> > should it then be made an option?
>
> I suspect that would get a chilly reception upstream, but you're welcome to
> propose it there. You'll need some way to deal with "stray" packets that come
> in after the socket has been closed and reused.

I'm not surprised.

> > Or should I just accept that the problem exists and use noresvport and cross my
> > fingers that it won't bite me again?
>
> That's probably what I'd recommend. Reserved ports are a limited resource so
> you'll never get an infinite number of connections with them. Note that the
> reproducer you have represents the pessimal case. It mounts only to immediately
> unmount again.

I know, this was the reproducible case :-) It would have been easier to track down if EADDRINUSE did not get transmogrified into EIO.

> If you have multiple mounts to the same server, then sockets will be shared
> between them. So, for instance if you were to loop and do a ton of mounts to
> one server and then another loop to do a ton of unmounts then that will just
> make one socket (and probably will share superblocks too).
>
> What exactly were you doing when you got "bitten" by this? Do you really have
> that many individual NFS servers? Or were you mounting and unmounting in quick
> succession like this for some other reason?

It usually only bites during busy nights (backups, etc.) when ypbind and ypserv seem to eat reserved ports (judging from the TIME_WAIT status during light load).

  echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
  echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

is an OK workaround here: the server is local and there are only switches in between, so stray packets should be very rare. The long-term goal is to get rid of yp/nis, so the problem will diminish by itself. Now the bug will show up in Google and hopefully save somebody else some time. You can close this bug as you see fit!

OK, I think it's probably safe enough to use noresvport, and that's probably safer than messing around with the tcp_tw parms. Reserved ports don't give much in the way of security these days anyway. We're stuck, though, with resvport as the default for NFS, since that's what the RFC specifies. I'll go ahead and close this WONTFIX.
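For anyone who lands here with the same symptom: the thread judges port pressure by watching sockets in TIME_WAIT. Below is a minimal sketch of one way to count them, not anything taken from the attached reproducer. It assumes the IPv4 table in /proc/net/tcp (IPv6 sockets live in /proc/net/tcp6 and are not covered), treats every local port below 1024 as "reserved", and uses a function name, count_resv_time_wait, that is made up for illustration.

```shell
# Hypothetical helper: count TCP sockets in TIME_WAIT whose local port is
# below 1024. Reads a /proc/net/tcp-style table; the optional first
# argument lets you point it at a saved copy instead of the live file.
count_resv_time_wait() (
    file=${1:-/proc/net/tcp}
    count=0
    while read -r sl local_addr rem_addr st rest; do
        [ "$sl" = "sl" ] && continue        # skip the header row
        [ "$st" = "06" ] || continue        # 06 is the TIME_WAIT state code
        # local_addr looks like 0100007F:0316; the part after ':' is the
        # port in hexadecimal.
        port=$((0x${local_addr##*:}))
        if [ "$port" -lt 1024 ]; then
            count=$((count + 1))
        fi
    done < "$file"
    echo "$count"
)
```

Running `count_resv_time_wait` with no argument prints the current count for the local machine. As a rough budget, and assuming the stock sunrpc range of 665 to 1023 (see /proc/sys/sunrpc/min_resvport and max_resvport) together with the 60-second TIME_WAIT period on Linux, that is about 359 ports, or roughly six new reserved-port connections per second before binds start failing with EADDRINUSE. Other services that bind reserved ports, such as ypbind in this report, reduce that further.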