Description of problem:
When doing a lot of mounts/unmounts, mount fails with EIO after some time.

Version-Release number of selected component (if applicable):
2.6.40-4.fc15.i686.PAE

How reproducible:
Always

Steps to Reproduce:
1. mkdir /x
2. echo '/x localhost(ro,async)' >> /etc/exports
3. exportfs -a
4. Run attached script

Actual results:
Output:
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = 0
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = 0
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = 0
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = -1 EIO (Input/output error)
mount.nfs: mount system call failed
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = -1 EIO (Input/output error)
mount.nfs: mount system call failed

Dmesg:
[459671.539903] RPC: created transport e3986000 with 16 slots
[459671.539929] RPC: 53234 reserved req e3a2d000 xid e2f61e3e
[459671.539975] RPC: 53234 xprt_connect xprt e3986000 is not connected
[459671.540170] RPC: 53234 xprt_connect_status: error 98 connecting to server localhost
[459671.540177] RPC: 53234 release request e3a2d000
[459671.540195] RPC: destroying transport e3986000
[459671.540205] RPC: disconnected transport e3986000

Expected results:
An infinite number of mounts.

Additional info:
The problem manifests itself on 2.6.35.13-91.fc14.i686.PAE as well. The only workaround I have found is:
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
Created attachment 524079 [details] Script to trigger mount bug
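The attachment itself is not reproduced here, so the loop below is only a sketch of what such a trigger script plausibly looks like; the export path (/x) and mount point (/tmp/mnt) come from the steps and strace output above, but the loop structure and the `mount_loop` name are assumptions.

```shell
#!/bin/bash
# Sketch of a reproducer (assumption: the real attachment may differ).
# Repeatedly mounts and unmounts the local export; each iteration opens
# and then closes a fresh reserved-port TCP connection, which is what
# eventually exhausts the reserved port range.
mount_loop() {
    local count=${1:-10000}
    local i=0
    mkdir -p /tmp/mnt
    while [ "$i" -lt "$count" ]; do
        mount -t nfs -o ro localhost:/x /tmp/mnt || {
            echo "mount failed after $i successful iterations" >&2
            return 1
        }
        umount /tmp/mnt
        i=$((i + 1))
    done
}

# Needs root and the export from the steps above, e.g.: mount_loop 5000
```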
(In reply to comment #0)
> Expected results:
>
> infinite number of mounts

Sounds like reserved port exhaustion. Does this problem go away if you do the mounts with "-o resvport"?
"-o resvport" gives the same problem; with "-o noresvport" the number of sockets in TIME_WAIT seems to stabilize around 700. I will see if I can increase the mount rate enough to trigger the bug.
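One way to watch that TIME_WAIT count is to read /proc/net/tcp directly (state code 06 in the fourth column is TCP_TIME_WAIT); `ss -tan state time-wait` shows the same thing where iproute2 is available. This one-liner is my own suggestion, not from the original report:

```shell
# Count IPv4 TCP sockets in TIME_WAIT; state "06" is TCP_TIME_WAIT.
# Needs no extra tools, just the proc filesystem.
awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }' /proc/net/tcp
```

Running this in a loop while the reproducer runs shows the count climbing toward the reserved-port limit.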
Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in the vain hope that it would eventually retry? Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?), and should that then be made an option? Or should I just accept that the problem exists, use noresvport, and cross my fingers that it won't bite me again?
(In reply to comment #4)

Sorry, I meant -o noresvport before...

> Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status,
> in the vain hope that it would eventually retry?

I don't think that will work.

> Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?),
> should it then be made an option?

I suspect that would get a chilly reception upstream, but you're welcome to propose it there. You'll need some way to deal with "stray" packets that come in after the socket has been closed and reused.

> Or should I just accept that the problem exists and use noresvport and cross
> my fingers that it won't bite me again?

That's probably what I'd recommend. Reserved ports are a limited resource, so you'll never get an infinite number of connections with them. Note that your reproducer represents the pessimal case: it mounts only to immediately unmount again.

If you have multiple mounts to the same server, then sockets will be shared between them. So, for instance, if you were to loop and do a ton of mounts to one server and then run another loop to do a ton of unmounts, that would use just one socket (and probably share superblocks too).

What exactly were you doing when you got "bitten" by this? Do you really have that many individual NFS servers? Or were you mounting and unmounting in quick succession like this for some other reason?
(In reply to comment #5)
> (In reply to comment #4)
>
> Sorry, I meant -o noresvport before...

I forgot a smiley :-)

> > Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status,
> > in the vain hope that it would eventually retry?
>
> I don't think that will work.

OK, I won't try that at home then :-)

> > Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?),
> > should it then be made an option?
>
> I suspect that would get a chilly reception upstream, but you're welcome to
> propose it there. You'll need some way to deal with "stray" packets that
> come in after the socket has been closed and reused.

I'm not surprised.

> > Or should I just accept that the problem exists and use noresvport and
> > cross my fingers that it won't bite me again?
>
> That's probably what I'd recommend. Reserved ports are a limited resource,
> so you'll never get an infinite number of connections with them. Note that
> your reproducer represents the pessimal case: it mounts only to immediately
> unmount again.

I know, this was just the reproducible case :-) It would have been easier to track down if EADDRINUSE did not get transmogrified into EIO.

> What exactly were you doing when you got "bitten" by this? Do you really
> have that many individual NFS servers? Or were you mounting and unmounting
> in quick succession like this for some other reason?

It usually only bites during busy nights (backups, etc.) when ypbind and ypserv seem to eat reserved ports (judging from the TIME_WAIT status during light load).
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

is an OK workaround here; the server is local with only switches in between, so stray packets should be very rare. The long-term goal is to get rid of yp/nis, so the problem will diminish on its own. Now the bug will show up in Google and hopefully save somebody else some time. You can close this bug as you see fit!
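For anyone adopting this workaround, the echo commands only last until reboot; the standard way to persist them is a sysctl configuration fragment (the file name below is my own choice, not from the report). One caveat worth knowing: tcp_tw_recycle is known to misbehave for clients behind NAT and was removed from the kernel entirely in 4.12, so tcp_tw_reuse alone is the safer half on anything recent.

```
# /etc/sysctl.d/90-tcp-tw.conf  (hypothetical file name)
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
```

Apply without rebooting via `sysctl -p /etc/sysctl.d/90-tcp-tw.conf` (as root).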
OK, I think it's probably safe enough to use noresvport, and that's probably safer than messing around with the tcp_tw parameters. Reserved ports don't give much in the way of security these days anyway. We're stuck with resvport as the default for NFS, though, since that's what the RFC specifies. I'll go ahead and close this WONTFIX.