Bug 818329 - NFS mount hanging
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: nfs-maint
QA Contact: Filesystem QE
Depends On:
Blocks:
Reported: 2012-05-02 15:17 EDT by Mark Nipper
Modified: 2012-05-22 08:28 EDT

Doc Type: Bug Fix
Last Closed: 2012-05-22 08:28:03 EDT
Type: Bug

Attachments
rpc_debug output from NFS client hang (803 bytes, text/plain)
2012-05-02 15:17 EDT, Mark Nipper
sysrq-trigger echo t output from NFS hang (719.57 KB, text/plain)
2012-05-02 15:22 EDT, Mark Nipper
rpc_debug output from NFS hang (803 bytes, text/plain)
2012-05-02 15:26 EDT, Mark Nipper

Description Mark Nipper 2012-05-02 15:17:36 EDT
Created attachment 581699
rpc_debug output from NFS client hang

Description of problem:
We have two RHEL 6.2 clients whose NFS mounts randomly hang (block, freeze).  The server runs RHEL 5.8, and the mounts are all NFSv3.

Version-Release number of selected component (if applicable):
The kernel on the clients is 2.6.32-220.13.1.el6.x86_64 and nfs-utils is nfs-utils-1.2.3-15.el6.x86_64.

How reproducible:
It takes anywhere from a day to a few weeks.  It seems to be rather random.

Steps to Reproduce:
1. occurs randomly
  
Actual results:
NFS mount stops working.

Expected results:
NFS mount shouldn't stop working.

Additional info:
I'm attaching the output from:
---
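# Writing to rpc_debug makes the kernel log its list of pending RPC tasks.
echo 0 > /proc/sys/sunrpc/rpc_debug
# "t" makes sysrq log the state and stack trace of every task.
echo t > /proc/sysrq-trigger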
Comment 1 Mark Nipper 2012-05-02 15:22:39 EDT
Created attachment 581700
sysrq-trigger echo t output from NFS hang
Comment 2 Mark Nipper 2012-05-02 15:26:13 EDT
Created attachment 581701
rpc_debug output from NFS hang
Comment 4 RHEL Product and Program Management 2012-05-06 00:06:16 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 6 Jeff Layton 2012-05-08 06:55:08 EDT
Looks like the clients are just waiting on the server to respond. Have you sniffed traffic between the two? You might want to do so to see whether the server is ignoring calls from the client, or whether calls are not going out at all for some reason.

If you need help tracking down the cause, then I'd suggest opening an RH support
bug so that our support folks can help you with debugging.
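
A minimal sketch of the kind of capture suggested above, assuming tcpdump is installed on the client. The interface name eth0, the host name nfs-server.example.com, and the output path are placeholders, not values from this report. The filter matches all traffic to and from the server rather than only port 2049, because NFSv3 also relies on the portmapper, mountd, and lockd services on other ports.

```shell
#!/bin/sh
# Build (but do not run) a tcpdump command that records every packet
# exchanged with the NFS server. All three arguments are placeholders.
nfs_capture_cmd() {
    iface=$1    # network interface facing the server (assumed: eth0)
    server=$2   # NFS server host name or address (hypothetical)
    out=$3      # capture file to open later in Wireshark
    # -i selects the interface, -s 0 keeps whole packets so the RPC
    # payload can be decoded, -w writes raw packets to a file.
    echo "tcpdump -i $iface -s 0 -w $out host $server"
}

# Print the command for review; run the printed line as root.
nfs_capture_cmd eth0 nfs-server.example.com /tmp/nfs-hang.pcap
```

Because the hang can take days to appear, a long-running capture will probably need tcpdump's -C and -W options to rotate through a bounded set of files.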
Comment 7 Mark Nipper 2012-05-08 11:39:13 EDT
Well, we're on an academic license, so we don't actually get any support (that I'm aware of, anyway).

Having said that, this setup had been working okay previously.  We started having issues around the time of one of the more recent kernel updates (one of the last three or four released).  We have two identical machines, both acting as load-balanced web servers, with an older NetApp filer and a Linux server backing everything via NFS.  When the hang happens, the other web machine keeps working fine and the NFS mounts to the NetApp filer still work without any problems.  There is still a perfectly operable network connection between the affected client and the Linux server on which the NFS mounts hang.

We had been using NFSv3, but we switched to NFSv4 yesterday to see whether exercising a different code path in the kernel makes the problem go away.  I agree that it looks like the client is simply sending and waiting for a response.  But nothing in this setup has really changed except for the newer kernel packages, so they are the only thing that would account for it working previously and now suddenly failing.  Both web front ends experience the problem, just at different times.  Usually, within a few days of the last reboot, one of the two will have gotten into this state.

If it's still happening with NFSv4, I'll try to grab everything happening between the client and server via tcpdump or wireshark.
Comment 8 J. Bruce Fields 2012-05-08 11:54:46 EDT
In addition to the network traffic, it might be worth trying the sysrq-t dump on the Linux server, just to see if the server threads are stuck.
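
One way to read such a dump, sketched under assumptions: the sysrq-t output has been saved to a file, and each task header line carries the task name followed by a one-letter state, roughly "nfsd          D ...". That layout is an assumption to verify against the real log. The function name stuck_nfsd is hypothetical. A thread in state D (uninterruptible sleep) is only a candidate for being stuck; the stack trace printed below its header line shows what it is actually waiting on.

```shell
#!/bin/sh
# Print nfsd threads that a saved sysrq-t dump shows in state D
# (uninterruptible sleep). Assumes task header lines of the form
#   "<log prefix> nfsd          D <address> <stack> <pid> <ppid> <flags>"
# which should be checked against the actual log before relying on it.
stuck_nfsd() {
    dump=$1   # file holding the sysrq-t output, e.g. a copy of the kernel log
    # Match the task name "nfsd" as a whole word, followed by the state letter D.
    grep -E '(^|[^[:alnum:]_])nfsd +D( |$)' "$dump"
}
```

For example, `stuck_nfsd /var/log/messages` on the server (the log path is an assumption) prints one line per matching thread, and prints nothing if no nfsd thread is in state D.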
Comment 9 Mark Nipper 2012-05-21 11:29:12 EDT
It's worth mentioning that since we moved both clients to NFSv4, we haven't had the problem again.  Something definitely seems wrong in the NFSv3 client.  But we're not especially keen on going back to debug it at this point.
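
To confirm which protocol version each mount is actually using after a switch like this, the mount table can be inspected. The sketch below assumes the kernel reports a vers= option for NFS mounts in /proc/mounts; the function name nfs_mount_versions is hypothetical.

```shell
#!/bin/sh
# List each NFS mount with the protocol version taken from its "vers="
# mount option. Reads /proc/mounts unless another file is given.
nfs_mount_versions() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        vers = "unknown"                  # reported when no vers= option is present
        n = split($4, opts, ",")          # field 4 holds the comma-separated options
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $2, $3, "vers=" vers        # mount point, filesystem type, version
    }' "${1:-/proc/mounts}"
}
```

Running `nfs_mount_versions` with no argument on a client prints one line per NFS mount, so a mount that is still on NFSv3 stands out.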
Comment 10 Steve Dickson 2012-05-22 08:28:03 EDT
(In reply to comment #9)
> It's worth mentioning that since we moved both clients to NFSv4, we haven't
> had the problem again.  Something definitely seems wrong in the NFSv3
> client.  But we're not especially keen on going back to debug it at this
> point.
Fair enough... Since we can't reproduce this and moving forward fixes the issue, let's close this bz. If the problem reappears, please feel free to reopen this bz...