Bug 818329

Summary: NFS mount hanging
Product: Red Hat Enterprise Linux 6
Reporter: Mark Nipper <nipsy>
Component: kernel
Assignee: nfs-maint
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Filesystem QE <fs-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.2
CC: bfields, ikent, jlayton, kzhang, rwheeler, steved
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-22 12:28:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
rpc_debug output from NFS client hang (flags: none)
sysrq-trigger echo t output from NFS hang (flags: none)
rpc_debug output from NFS hang (flags: none)

Description Mark Nipper 2012-05-02 19:17:36 UTC
Created attachment 581699 [details]
rpc_debug output from NFS client hang

Description of problem:
Randomly, we have two RHEL 6.2 clients with NFS mounts that end up hanging / blocking / freezing.  The server is a RHEL 5.8 server, and the mounts are all NFSv3.

Version-Release number of selected component (if applicable):
The kernel on the clients is 2.6.32-220.13.1.el6.x86_64 and nfs-utils is nfs-utils-1.2.3-15.el6.x86_64.

How reproducible:
It takes anywhere from a day to a few weeks.  It seems to be rather random.

Steps to Reproduce:
1. occurs randomly
  
Actual results:
NFS mount stops working.

Expected results:
NFS mount shouldn't stop working.

Additional info:
I'm attaching the output from:
---
echo 0 > /proc/sys/sunrpc/rpc_debug
echo t > /proc/sysrq-trigger
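
Both commands above send their output to the kernel log rather than to the terminal, so the results have to be read back afterwards, for example with dmesg. A minimal sketch of collecting both dumps into files is shown here. The function name, the output file names, and the RPC_DEBUG / SYSRQ / DMESG override variables are illustrative assumptions and not part of the original report. It needs to run as root on the affected client.

```shell
#!/bin/sh
# Sketch: collect the pending RPC task list and a task-state dump into files.
# The paths are overridable (assumption, for dry runs against scratch files).
RPC_DEBUG=${RPC_DEBUG:-/proc/sys/sunrpc/rpc_debug}
SYSRQ=${SYSRQ:-/proc/sysrq-trigger}
DMESG=${DMESG:-dmesg}

capture_nfs_hang() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Writing to rpc_debug makes the kernel log its pending RPC tasks.
    echo 0 > "$RPC_DEBUG" || return 1
    $DMESG > "$outdir/rpc_debug.txt" || return 1

    # 't' makes the kernel log the state and stack of every task.
    echo t > "$SYSRQ" || return 1
    $DMESG > "$outdir/sysrq-t.txt" || return 1
}
```

A call such as `capture_nfs_hang /tmp/nfs-hang` would leave two text files in that directory. The kernel ring buffer can be too small to hold a full task dump, in which case the copy in the system log (typically /var/log/messages) may be more complete.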

Comment 1 Mark Nipper 2012-05-02 19:22:39 UTC
Created attachment 581700 [details]
sysrq-trigger echo t output from NFS hang

Comment 2 Mark Nipper 2012-05-02 19:26:13 UTC
Created attachment 581701 [details]
rpc_debug output from NFS hang

Comment 4 RHEL Program Management 2012-05-06 04:06:16 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Jeff Layton 2012-05-08 10:55:08 UTC
Looks like the clients are just waiting on the server to respond. Have you sniffed traffic between the two? You might want to do so to see whether the server is ignoring calls from the client, or whether calls are not going out at all for some reason.

If you need help tracking down the cause, then I'd suggest opening an RH support bug so that our support folks can help you with debugging.

Comment 7 Mark Nipper 2012-05-08 15:39:13 UTC
Well, we're an academic license, so we don't actually get any support (that I'm aware of anyway).

Having said that, this had been working okay previously.  It seems like one of the more recent kernel updates (within the last three or four released) was around the time we started having issues with this.  We have two identical machines, both acting as load balanced web servers with an older NetApp filer and a Linux server backing everything via NFS.  When this happens, the other web machine is working fine and the NFS mounts to the NetApp filer are still working without any problems.  There is still a perfectly operable network connection between the affected client and the Linux server on which the NFS mounts hang.

We had been using NFSv3, but we just switched to NFSv4 yesterday to see if the problem goes away when exercising a different code path in the kernel.  I agree that it looks like the client is simply sending and waiting for a response.  But nothing has really changed in this setup, other than newer kernel packages, that would account for why it was working previously and now, suddenly, is not.  Both web front ends experience the problem, just at different times.  But usually within a few days of the last reboot, one of the two will have gotten into this state.

If it's still happening with NFSv4, I'll try to grab everything happening between the client and server via tcpdump or wireshark.
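
A capture of that kind could be set up roughly as follows. This is only a sketch: the function name, the interface eth0, the host name nfs-server.example.com, and the output path are placeholders, not values from this setup. The filter matches all traffic to and from the server instead of only port 2049, since NFSv3 also depends on the portmapper, mountd, and lockd services on other ports.

```shell
# Sketch: build a tcpdump command line that records all traffic between
# this client and one NFS server. All arguments are placeholders.
nfs_capture_cmd() {
    iface=$1    # network interface, e.g. eth0
    server=$2   # NFS server host name or address
    outfile=$3  # where to write the capture file

    # -s 0 keeps full packets so the RPC/NFS payloads can be decoded later;
    # -w writes raw packets to a file that wireshark can open.
    printf 'tcpdump -i %s -s 0 -w %s host %s\n' "$iface" "$outfile" "$server"
}

# Print the command for review; run it as root once it looks right.
nfs_capture_cmd eth0 nfs-server.example.com /tmp/nfs-hang.pcap
```

Printing the command instead of running it keeps the sketch harmless to try. Because the hang takes days to appear, the capture would need to be left running, so disk space for the output file is worth watching.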

Comment 8 J. Bruce Fields 2012-05-08 15:54:46 UTC
In addition to the network traffic, it might be worth trying the sysrq-t dump on the Linux server, just to see if the server threads are stuck.
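
As a lighter-weight complement to the full sysrq-t dump, a quick look at the nfsd threads with ps can hint at whether they are blocked. The sketch below is an assumption about how one might do that, not something suggested in this thread: the function name is made up, and it treats nfsd threads in uninterruptible sleep (state D) as candidates for being stuck, which is a heuristic only.

```shell
# Sketch: list nfsd threads that are in uninterruptible sleep (state D).
# Reads rows of "PID STAT WCHAN COMMAND" on stdin.
stuck_nfsd() {
    awk '$4 == "nfsd" && $2 ~ /^D/'
}

# On the server: show PID, state, wait channel, and command name.
ps -eo pid,stat,wchan:30,comm | stuck_nfsd
```

An empty result does not prove that the server is healthy; the sysrq-t dump still gives the full stack of each thread.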

Comment 9 Mark Nipper 2012-05-21 15:29:12 UTC
It's worth mentioning that since we moved both clients to NFSv4, we haven't had the problem again.  Something definitely seems wrong in the NFSv3 client.  But we're not especially keen on going back to debug it at this point.

Comment 10 Steve Dickson 2012-05-22 12:28:03 UTC
(In reply to comment #9)
> It's worth mentioning that since we moved both clients to NFSv4, we haven't
> had the problem again.  Something definitely seems wrong in the NFSv3
> client.  But we're not especially keen on going back to debug it at this
> point.
Fair enough... Since we can't reproduce this and moving forward fixes the issue, let's close this bz. If the problem reappears, please feel free to reopen this bz...