Bug 1009410

Summary: nfsd: peername failed (err 107)!
Product: [Fedora] Fedora
Reporter: customercare
Component: kernel
Assignee: nfs-maint
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 18
CC: bfields, bill-bugzilla.redhat.com, customercare, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, marcelo.barbosa, skottler
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-05 22:24:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description customercare 2013-09-18 11:42:41 UTC
Description of problem:

nfsd with kernel 3.10.11-100 interrupts NFS connections if too many connections are running in parallel.

Version-Release number of selected component (if applicable):

Happens with "kernel-PAE-3.10.11-100" but not with "kernel-PAE-3.10.9-100.fc18.i686".

How reproducible:

While transmitting several large files (> 400 MB) over different connections from different servers, it happens on 16 out of 20 servers, all at the same time.


Actual results:

... 20 or more lines of this: 
Sep 16 02:04:25 backup kernel: [283056.536898] nfs: server 83.246.80.137 not responding, timed out
Sep 16 02:04:25 backup kernel: [283056.536960] nfs: server 83.246.80.137 not responding, timed out 

followed by:

Sep 16 02:04:25 backup kernel: [283056.548993] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.549908] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.550741] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.551202] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.551279] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.551708] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.552203] nfsd: peername failed (err 107)!
Sep 16 02:04:25 backup kernel: [283056.552622] nfsd: peername failed (err 107)!  


Kernel 3.9.11 and all previous kernels worked flawlessly.

Expected results:

Uninterrupted transmission.

Additional info:

This was run for 2 days in a row, and it happened on exactly those 2 days, until I reinstalled the old kernel. Since then, no new error messages.

All servers that mounted the NFS mount points after the timeouts of the "early birds" succeeded with their file transfers.

The systems involved were not changed. The only difference was the kernel used on the NFS server system.

Comment 1 Justin M. Forbes 2013-10-18 21:20:30 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Comment 2 customercare 2013-10-18 22:10:05 UTC
While installing the newest kernel this came up:

WARNING: /lib/modules/3.11.5-100.fc18.i686.PAE/kernel/drivers/char/crash.ko needs unknown symbol page_is_ram :


Running transaction
  Updating     : kernel-headers-3.11.5-100.fc18.i686        1/4
  Installing   : kernel-PAE-3.11.5-100.fc18.i686            2/4
  Cleanup      : kernel-PAE-3.10.13-101.fc18.i686           3/4
  Cleanup      : kernel-headers-3.10.14-100.fc18.i686       4/4
depmod: WARNING: /lib/modules/3.11.5-100.fc18.i686.PAE/kernel/drivers/char/crash.ko needs unknown symbol page_is_ram
  Verifying    : kernel-PAE-3.11.5-100.fc18.i686            1/4
  Verifying    : kernel-headers-3.11.5-100.fc18.i686        2/4
  Verifying    : kernel-headers-3.10.14-100.fc18.i686       3/4
  Verifying    : kernel-PAE-3.10.13-101.fc18.i686           4/4

Comment 3 customercare 2013-11-10 08:20:34 UTC
Happens with 3.11.7-100.fc18 too.

Comment 4 Bill McGonigle 2013-12-20 21:14:16 UTC
I just saw this with f19 with 3.11.10-200.fc19.x86_64

I tried exportfs -f which often clears up stale nfs handles but that didn't resolve this issue.  Also restarted nfs service to no avail, but a reboot cleared it up (I had rebooted a client first without benefit).  

I didn't see anything interesting in syslog.  Here's the dmesg:

[3739013.789191] nfsd: last server has exited, flushing export cache
[3739014.354693] NFSD: starting 90-second grace period (net ffffffff81cbdfc0)
[4323948.836824] nfsd: peername failed (err 107)!
[4323948.949103] nfsd: peername failed (err 107)!
[4323949.012233] nfsd: peername failed (err 107)!
[4323949.056911] nfsd: peername failed (err 107)!
[4323949.100422] nfsd: peername failed (err 107)!
[4323949.160057] nfsd: peername failed (err 107)!
[4323949.201984] nfsd: peername failed (err 107)!
[4323949.281673] nfsd: peername failed (err 107)!
[4323949.350530] nfsd: peername failed (err 107)!
[4323949.408005] nfsd: peername failed (err 107)!
[4560939.569254] nfsd: last server has exited, flushing export cache
[4560939.656733] NFSD: starting 90-second grace period (net ffffffff81cbdfc0)
[4560948.862565] nfsd: last server has exited, flushing export cache
[4560948.943642] NFSD: starting 90-second grace period (net ffffffff81cbdfc0)

Is 'peername' related to reverse DNS or does it have a meaning in the NFS context?  This host doesn't appear to have a valid reverse entry, but it worked for months before this.  I didn't have time to try DNS changes before rebooting.

Comment 5 J. Bruce Fields 2013-12-20 22:26:35 UTC
Looking at the code: this means kernel_getpeername failed on a new incoming tcp connection.  I believe this is implemented by an net/ipv4/af_inet.c:inet_getname() call with peer=1 in our case.  That can fail with -ENOTCONN when

  !inet->inet_dport || ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))

I suspect it's the TCPF_CLOSE case that we're hitting here?  Could it be some race where we're trying to accept a new socket that we've already closed?

In any case, nothing to do with DNS.

What actual problem did you see (other than the logged warnings)?
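
For readers who want to see the same error code from userspace, here is a minimal Python sketch. It is not the kernel code path: peer_lookup_fails() is only a toy model of the condition quoted above, with hypothetical function names, and the TCP state numbers are assumed to match the kernel's include/net/tcp_states.h. The second helper calls getpeername() on a TCP socket that was never connected, which on Linux is expected to fail with ENOTCONN, the errno the "err 107" in the log corresponds to.

```python
import errno
import socket

# Kernel TCP state numbers (assumed from include/net/tcp_states.h);
# only the two states named in the quoted condition are needed here.
TCP_SYN_SENT = 2
TCP_CLOSE = 7
TCPF_SYN_SENT = 1 << TCP_SYN_SENT
TCPF_CLOSE = 1 << TCP_CLOSE


def peer_lookup_fails(inet_dport: int, sk_state: int) -> bool:
    """Toy model of the quoted check: True means -ENOTCONN would be returned."""
    return (not inet_dport) or bool((1 << sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))


def getpeername_errno_unconnected() -> int:
    """Return the errno raised by getpeername() on a never-connected TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.getpeername()
        except OSError as exc:
            return exc.errno
    return 0  # no error was raised (not expected for an unconnected socket)


def errno_name(code: int) -> str:
    """Map a numeric errno to its symbolic name on the current platform."""
    return errno.errorcode.get(code, "unknown")


if __name__ == "__main__":
    code = getpeername_errno_unconnected()
    print(code, errno_name(code))
```

This only illustrates that a socket with no remote port, or one in the CLOSE or SYN_SENT state, has no peer to report. It does not reproduce the suspected race in nfsd.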

Comment 6 Fedora End Of Life 2013-12-21 14:36:09 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Fedora End Of Life 2014-02-05 22:24:17 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.