Bug 725547
Summary: | NFS server hangs with kernel: nfsd: peername failed (err 107)
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Eyal <shimony>
Component: | kernel
Assignee: | Red Hat Kernel Manager <kernel-mgr>
Status: | CLOSED WONTFIX
QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | high
Priority: | unspecified
Version: | 5.3
CC: | Bert.Deknuydt, bfields, dhowells, diana.chinces, hocks, jlayton, jmcaninl, rwheeler, sprabhu, steved
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2013-10-15 19:25:58 UTC
Bug Blocks: | 908876
Description
Eyal
2011-07-25 20:03:37 UTC
We are also encountering this issue when two clients with the same MAC address try to mount an NFS directory. Are you planning to fix this?

If you have two clients with the same MAC address, I'm amazed that anything works at all; that's a separate problem. My first thought was that it was the same problem as the one fixed by c51e88efa9bf31e0f0bdf872c61a0e921a9faffb "sunrpc: fix peername failed on closed listener", but that fixes a regression that never existed in rhel5. It might be interesting to know where the nfsd threads are hanging, if they in fact are. Perhaps a sysrq-t trace would help? (echo "t" >/proc/sysrq-trigger, then attach the results which are dumped to the log).

Hi Bruce, Thanks for helping. One question: can I run it now, while it works, or should I run it at the exact time it hangs? I ran this on another server to test the command and noticed it hung my server for a minute or so. Is it supposed to do that? Thanks, Eyal.

Run that after nfsd stops hanging. No, I wouldn't expect that to hang the server for a minute. How do you know the server was hung during that time?

Looks like the same problem I am seeing with kernel 2.6.32-279.22.1.el6.x86_64:
nfsd: peername failed (err 107)!
There are 2 blocked task messages before the err 107 message:
INFO: task nfsd:25721 blocked for more than 120 seconds.
INFO: task nfsd:25745 blocked for more than 120 seconds.
and the trace starts with:
Call Trace:
[<ffffffff81090dee>] ? prepare_to_wait_exclusive+0x4e/0x80
[<ffffffffa01ee6e0>] cv_wait_common+0xa0/0x1a0 [spl]
[<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa0284190>] ? avl_find+0x60/0xb0 [zavl]
[<ffffffffa01ee813>] __cv_wait+0x13/0x20 [spl]
.......
The clients have no access during this error, but after 10 minutes nfsd recovered on its own. Any fix for this issue? Thanks, Eva

How many clients do you have, and how many nfsd threads are you running?

I am also seeing this problem with kernel 2.6.32-279.el6.x86_64.
The NFS server displays:
nfsd: peername failed (err 107)!
...
The client will then display these messages repeatedly:
server SERVERNAME not responding, still trying
server SERVERNAME OK
The NFS mount points then become unresponsive on all clients (we're in a blade configuration, with multiple clients pointing to the same NFS server). The scenario sounds similar. We are transferring numerous files from the NFS server to a client via "cp". The files vary in size, from several gigabytes to a few KB. However, the "cp" commands are executed sequentially, so I wouldn't think there is excessive parallelism occurring.

I do have one oddity which may be unrelated to this thread. The copies always fail on the same file, which is an ~10 MB gzip-compressed TAR file. Is anyone else noticing their failures occurring on the same file? My take is, "a file is a file," so I don't know why a compressed TAR would be handled any differently. Permissions are fine. Like I said, this fact might be unrelated and coincidental. I just figured I should add it, since I see no resolution currently.

No additional minor releases are planned for Production Phase 2 in Red Hat Enterprise Linux 5, and therefore Red Hat is closing this bugzilla as it does not meet the inclusion criteria as stated in: https://access.redhat.com/site/support/policy/updates/errata/#Production_2_Phase

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
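
For anyone collecting evidence for a report like this one, the kernel messages quoted in the comments above can be pulled out of a saved log with a short script. What follows is only a minimal sketch, not something taken from the bug: the function name summarize_log and the two regular expressions are illustrative assumptions, written to match the exact message wording quoted in this thread (other kernel versions may format these lines differently). On Linux, errno 107 is ENOTCONN, "Transport endpoint is not connected".

```python
import re

# Both patterns are assumptions modelled on the messages quoted in this report.
BLOCKED_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
PEERNAME_RE = re.compile(r"nfsd: peername failed \(err (\d+)\)")


def summarize_log(text):
    """Return (blocked_tasks, peername_errors) found in kernel log text.

    blocked_tasks is a list of (task_name, pid, seconds) tuples;
    peername_errors is a list of the integer errno values reported.
    """
    blocked = [
        (name, int(pid), int(secs))
        for name, pid, secs in BLOCKED_RE.findall(text)
    ]
    errors = [int(err) for err in PEERNAME_RE.findall(text)]
    return blocked, errors


# Sample input: the three message lines quoted in the comments above.
sample = """\
INFO: task nfsd:25721 blocked for more than 120 seconds.
INFO: task nfsd:25745 blocked for more than 120 seconds.
nfsd: peername failed (err 107)!
"""

blocked, errors = summarize_log(sample)
print(blocked)
print(errors)
```

In practice the text would come from the saved output of dmesg or from the system log file rather than from an inline sample. Counting blocked nfsd tasks alongside the peername errors helps show whether the hang precedes the error message, as one commenter observed.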