Bug 740024

Summary:

nfs: Mount occasionally fails with EIO

Product:

[Fedora] Fedora

Reporter:

Anders Blomdell <anders.blomdell>

Component:

kernel

Assignee:

Jeff Layton <jlayton>

Status:

CLOSED WONTFIX

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Priority:

unspecified

CC:

gansalmon, itamar, jlayton, jonathan, kernel-maint, madhu.chinakonda, pasteur, steved

Hardware:

i686

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2011-09-21 12:58:51 UTC

Attachments:

Description	Flags
Script to trigger mount bug	none

Description Anders Blomdell 2011-09-20 18:11:37 UTC

Description of problem:

When doing a lot of mount/unmounts, mount fails with EIO after some time.

Version-Release number of selected component (if applicable):

2.6.40-4.fc15.i686.PAE 


How reproducible:

Always

Steps to Reproduce:
1. mkdir /x
2. echo '/x localhost(ro,async)' >> /etc/exports
3. exportfs -a
4. Run attached script
  
Actual results:

Output:

mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = 0
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = 0
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = 0
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = -1 EIO (Input/output error)
mount.nfs: mount system call failed
mount("localhost:/x", "/tmp/mnt", "nfs", 0, "vers=4,addr=127.0.0.1,clientaddr"...) = -1 EIO (Input/output error)
mount.nfs: mount system call failed

Dmesg:

[459671.539903] RPC:       created transport e3986000 with 16 slots
[459671.539929] RPC: 53234 reserved req e3a2d000 xid e2f61e3e
[459671.539975] RPC: 53234 xprt_connect xprt e3986000 is not connected
[459671.540170] RPC: 53234 xprt_connect_status: error 98 connecting to server localhost
[459671.540177] RPC: 53234 release request e3a2d000
[459671.540195] RPC:       destroying transport e3986000
[459671.540205] RPC:       disconnected transport e3986000

Expected results:

infinite number of mounts

Additional info:

The problem manifests itself on 2.6.35.13-91.fc14.i686.PAE as well; the only workaround I have found is:

    echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
    echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
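
For readers decoding the dmesg output above: the "error 98" printed by xprt_connect_status is an errno value, which on Linux corresponds to EADDRINUSE. The following is a minimal Python sketch for looking up such codes. The helper name describe_errno is hypothetical, and the numeric mapping depends on the platform the script runs on.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    # errno.errorcode maps numeric values to names such as "EADDRINUSE".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 98 is the value reported by xprt_connect_status in the dmesg output.
    print(describe_errno(98))
```

Run on the affected machine, this shows which symbolic error the kernel hit before it was reported to userspace as EIO.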

Comment 1 Anders Blomdell 2011-09-20 18:13:05 UTC

Created attachment 524079 [details]
Script to trigger mount bug

Comment 2 Jeff Layton 2011-09-20 19:36:48 UTC

(In reply to comment #0)

> Expected results:
> 
> infinite number of mounts
> 

Sounds like reserved port exhaustion. Does this problem go away if you do the mounts with "-o resvport"?

Comment 3 Anders Blomdell 2011-09-20 20:45:46 UTC

"-o resvport" gives the same problem; with "-o noresvport" the number of sockets in TIME_WAIT seems to stabilize around 700. I will see if I can increase the mount rate to trigger the bug.
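
One way to watch the TIME_WAIT count described here is to parse /proc/net/tcp, where the state column holds 06 for TIME_WAIT and ports are written in hexadecimal. This is a rough sketch, not the method used in this report: the function name count_time_wait and the max_port threshold of 1023 are assumptions made for illustration, and only IPv4 sockets are covered.

```python
TIME_WAIT = "06"  # TCP state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text, max_port=1023):
    """Count TIME_WAIT sockets.

    Returns (total, number whose local port is <= max_port).
    """
    total = reserved = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        total += 1
        # local_address looks like "0100007F:03FF" (hex address, hex port)
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port <= max_port:
            reserved += 1
    return total, reserved


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        print(count_time_wait(f.read()))
```

Comparing the second number across runs with and without noresvport should show whether reserved ports specifically are piling up in TIME_WAIT.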

Comment 4 Anders Blomdell 2011-09-21 06:41:31 UTC

Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in the vain hope that it would eventually retry? Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?), and should it then be made an option?
Or should I just accept that the problem exists and use noresvport and cross my fingers that it won't bite me again?

Comment 5 Jeff Layton 2011-09-21 11:18:18 UTC

(In reply to comment #4)

Sorry, I meant -o noresvport before...

> Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in
> the vain hope that it would eventually retry?

I don't think that will work.

> Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?),
> should it then be made an option?

I suspect that would get a chilly reception upstream, but you're welcome to propose it there. You'll need some way to deal with "stray" packets that come in after the socket has been closed and reused.

> Or should I just accept that the problem exists and use noresvport and cross my
> fingers that it won't bite me again?

That's probably what I'd recommend. Reserved ports are a limited resource so you'll never get an infinite number of connections with them. Note that the reproducer you have represents the pessimal case. It mounts only to immediately unmount again.
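
As a back-of-the-envelope illustration of why reserved ports run out, the sketch below estimates the highest sustainable mount/unmount rate. Every number in it is an assumption rather than something stated in this report: a sunrpc reserved-port range of 665 to 1023 (check /proc/sys/sunrpc/min_resvport and max_resvport on the actual system), a 60 second TIME_WAIT period, and one port consumed per mount cycle. The function name max_mount_rate is hypothetical.

```python
def max_mount_rate(min_port=665, max_port=1023, time_wait_s=60.0,
                   ports_per_cycle=1):
    """Upper bound on mount/unmount cycles per second before ports run out.

    All default values are assumptions; adjust them to the system at hand.
    """
    pool = max_port - min_port + 1  # number of usable reserved ports
    # Each cycle ties up ports_per_cycle ports for time_wait_s seconds.
    return pool / (time_wait_s * ports_per_cycle)


if __name__ == "__main__":
    print(f"{max_mount_rate():.2f} cycles/s")
```

Under these assumptions the pool holds 359 ports, so anything much faster than about six mount/unmount cycles per second would eventually exhaust it, and other services using reserved ports lower that bound further.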

If you have multiple mounts to the same server, then sockets will be shared between them. So, for instance if you were to loop and do a ton of mounts to one server and then another loop to do a ton of unmounts then that will just make one socket (and probably will share superblocks too).

What exactly were you doing when you got "bitten" by this? Do you really have that many individual NFS servers? Or were you mounting and unmounting in quick succession like this for some other reason?

Comment 6 Anders Blomdell 2011-09-21 12:50:33 UTC

(In reply to comment #5)
> (In reply to comment #4)
> 
> Sorry, I meant -o noresvport before...
I forgot a smiley :-)

> > Would it make sense to convert EADDRINUSE to EAGAIN in xprt_connect_status, in
> > the vain hope that it would eventually retry?
> 
> I don't think that will work.
OK, I won't try that at home then :-)

> > Or would it make more sense to allow socket reuse (sock->sk->sk_reuse?),
> > should it then be made an option?
> 
> I suspect that would get a chilly reception upstream, but you're welcome to
> propose it there. You'll need some way to deal with "stray" packets that come
> in after the socket has been closed and reused.
I'm not surprised.

> > Or should I just accept that the problem exists and use noresvport and cross my
> > fingers that it won't bite me again?
> 
> That's probably what I'd recommend. Reserved ports are a limited resource so
> you'll never get an infinite number of connections with them. Note that the
> reproducer you have represents the pessimal case. It mounts only to immediately
> unmount again.
I know, this was the reproducible case :-)

Would have been easier to track down if EADDRINUSE did not get transmogrified into EIO.

> If you have multiple mounts to the same server, then sockets will be shared
> between them. So, for instance if you were to loop and do a ton of mounts to
> one server and then another loop to do a ton of unmounts then that will just
> make one socket (and probably will share superblocks too).
> 
> What exactly were you doing when you got "bitten" by this? Do you really have
> that many individual NFS servers? Or were you mounting and unmounting in quick
> succession like this for some other reason?

It usually only bites during busy nights (backups, etc.) when ypbind and ypserv seem to eat reserved ports (judging from TIME_WAIT status during light load).

    echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
    echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

This is an OK workaround here: the server is local, with only switches in between, so stray packets should be very rare. The long-term goal is to get rid of yp/nis, so the problem will diminish by itself.

Now the bug will show up in google and hopefully save somebody else some time. You can close this bug as you see fit!
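
For anyone finding this report later: before relying on the two sysctl switches above, it may be worth checking which of them the running kernel still exposes. To my knowledge tcp_tw_recycle was removed in Linux 4.12, so on newer kernels only tcp_tw_reuse will be present. The sketch below is a hypothetical helper (read_tw_sysctls is not an existing tool) that reports the current values and marks missing entries as None.

```python
from pathlib import Path


def read_tw_sysctls(base="/proc/sys/net/ipv4"):
    """Return {name: value or None}; None means the entry is not readable."""
    result = {}
    for name in ("tcp_tw_reuse", "tcp_tw_recycle"):
        path = Path(base) / name
        try:
            result[name] = path.read_text().strip()
        except OSError:
            # Missing file (option removed from this kernel) or no permission.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_tw_sysctls().items():
        print(name, "=", value if value is not None else "(not available)")
```

The base parameter exists only so the function can be pointed at a test directory; on a real system the default path applies.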

Comment 7 Jeff Layton 2011-09-21 12:58:51 UTC

OK, I think it's probably safe enough to use noresvport, and that's probably safer than messing around with the tcp_tw parameters. Reserved ports don't give much in the way of security these days anyway. We're stuck, though, with reserved ports being the default for NFS, since that's what the RFC specifies.

I'll go ahead and close this WONTFIX.