Bug 818329
| Summary: | NFS mount hanging | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mark Nipper <nipsy> |
| Component: | kernel | Assignee: | nfs-maint |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Filesystem QE <fs-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.2 | CC: | bfields, ikent, jlayton, kzhang, rwheeler, steved |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-05-22 12:28:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Attachments:
Created attachment 581700 [details]
sysrq-trigger echo t output from NFS hang
Created attachment 581701 [details]
rpc_debug output from NFS hang
Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Looks like the clients are just waiting on the server to respond. Have you sniffed traffic between the two? You might want to do so to see whether the server is ignoring calls from the client, or whether calls are not going out at all for some reason. If you need help tracking down the cause, then I'd suggest opening an RH support bug so that our support folks can help you with debugging.

Well, we're on an academic license, so we don't actually get any support (that I'm aware of anyway). Having said that, this had been working okay previously. One of the more recent kernel updates (within the last three or four released) came out around the time we started having issues with this.

We have two identical machines, both acting as load-balanced web servers, with an older NetApp filer and a Linux server backing everything via NFS. When this happens, the other web machine is working fine and the NFS mounts to the NetApp filer are still working without any problems. There is still a perfectly operable network connection between the affected client and the Linux server on which the NFS mounts hang. We had been using NFSv3, but we just switched to NFSv4 yesterday to see if the problem goes away when exercising a different code path in the kernel.

I agree that it looks like the client is simply sending and waiting for a response. But nothing has really changed in this setup except for newer kernel packages, which is the only thing that would account for why it was working previously and now suddenly is not. Both web front ends experience the problem, just at different times. Usually within a few days of the last reboot, one of the two will have gotten into this state.
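
Since the comment above mentions switching the mounts from NFSv3 to NFSv4, it can help to confirm which filesystem type and options each mount actually ended up with. The following is a minimal sketch, not something from the original report: the function name `list_nfs_mounts` is hypothetical, and it assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, options), where NFSv4 mounts typically appear with type `nfs4`.

```shell
#!/bin/sh
# Sketch: list NFS mounts from a mounts table (defaults to /proc/mounts).
# list_nfs_mounts is a hypothetical helper name, not part of nfs-utils.
list_nfs_mounts() {
    mounts_file=${1:-/proc/mounts}
    # Fields: 1=device 2=mount point 3=filesystem type 4=mount options.
    # Print mount point, type and options for every nfs/nfs4 entry.
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $3, $4 }' "$mounts_file"
}

# Example (on the affected client):
#   list_nfs_mounts
```

The optional file argument exists only so the helper can be tried against a saved copy of the mounts table from another machine.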

If it's still happening with NFSv4, I'll try to grab everything happening between the client and server via tcpdump or wireshark.

In addition to the network traffic, it might be worth trying the sysrq-t dump on the Linux server, just to see if the server threads are stuck.

It's worth mentioning that since we moved both clients to NFSv4, we haven't had the problem again. Something definitely seems wrong in the NFSv3 client. But we're not especially keen on going back to debug it at this point.

(In reply to comment #9)
> It's worth mentioning that since we moved both clients to NFSv4, we haven't
> had the problem again. Something definitely seems wrong in the NFSv3
> client. But we're not especially keen on going back to debug it at this
> point.

Fair enough... Since we can't reproduce this and moving forward fixes the issue, let's close this bz. If the problem reappears, please feel free to reopen this bz.
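
The traffic capture discussed in the thread could look roughly like the sketch below. It is an illustration under stated assumptions: the helper name `nfs_capture_cmd`, the interface `eth0`, the host name `nfs-server.example.com` and the output path are all placeholders that do not come from the report. The helper only prints the tcpdump command so it can be reviewed before being run as root on the affected client.

```shell
#!/bin/sh
# Sketch: build a tcpdump invocation for traffic between this client and
# one NFS server. All argument values used below are placeholders.
nfs_capture_cmd() {
    iface=$1
    server=$2
    outfile=$3
    # -s 0 keeps whole packets; -w writes raw packets to a file that
    # wireshark can open later.
    # Port 2049 is the standard NFS port. NFSv3 side protocols
    # (portmapper, mountd, lockd) use other ports and are NOT matched
    # by this filter.
    printf 'tcpdump -i %s -s 0 -w %s host %s and port 2049\n' \
        "$iface" "$outfile" "$server"
}

# Print the command for review.
nfs_capture_cmd eth0 nfs-server.example.com /tmp/nfs-hang.pcap
```

Because the hang takes days to appear, a long-running capture can grow large; whether to rotate or limit the capture file is left open here.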

Created attachment 581699 [details]
rpc_debug output from NFS client hang

Description of problem:
We have two RHEL 6.2 clients whose NFS mounts randomly end up hanging (blocking, freezing). The server is a RHEL 5.8 server, and the mounts are all NFSv3.

Version-Release number of selected component (if applicable):
The kernel on the clients is 2.6.32-220.13.1.el6.x86_64 and nfs-utils is nfs-utils-1.2.3-15.el6.x86_64.

How reproducible:
It takes anywhere from a day to a few weeks. It seems to be rather random.

Steps to Reproduce:
1. occurs randomly

Actual results:
NFS mount stops working.

Expected results:
NFS mount shouldn't stop working.

Additional info:
I'm attaching the output from:
---
echo 0 > /proc/sys/sunrpc/rpc_debug
echo t > /proc/sysrq-trigger
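
The two commands in the report can be wrapped so that the resulting kernel log is saved in one step. This is a minimal sketch, assuming root access on the hung client; the function name `trigger_nfs_dumps` and the output path are hypothetical, and the optional path arguments are an addition made only so the function can be exercised against ordinary files.

```shell
#!/bin/sh
# Sketch: trigger the two kernel dumps named in the report.
# trigger_nfs_dumps is a hypothetical helper; the arguments default to the
# real /proc files and exist only so the function can be tried on plain files.
trigger_nfs_dumps() {
    rpc_debug_file=${1:-/proc/sys/sunrpc/rpc_debug}
    sysrq_file=${2:-/proc/sysrq-trigger}

    # Writing to rpc_debug makes the kernel log its pending RPC tasks.
    echo 0 > "$rpc_debug_file" || return 1
    # 't' asks sysrq to log the state of every task.
    echo t > "$sysrq_file" || return 1
}

# Example (as root, while the mount is hung):
#   trigger_nfs_dumps && dmesg > /tmp/nfs-hang-dmesg.txt
```

Both dumps land in the kernel log, so saving `dmesg` output (or the relevant part of the system log) right afterwards is what produces a file suitable for attaching.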