Bug 820004 - nfs client hangs older nfs servers
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Hardware: x86_64 Linux
Priority: unspecified, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Layton
QA Contact: Filesystem QE
Reported: 2012-05-08 16:46 EDT by Vilius Šumskas
Modified: 2012-12-11 06:09 EST
Doc Type: Bug Fix
Last Closed: 2012-12-11 06:09:51 EST
Type: Bug
Description Vilius Šumskas 2012-05-08 16:46:30 EDT
Description of problem:
We are running RHEL 6.2 with the RHEL 6.3 beta kernel as an NFS client. This machine also runs vsftpd, which is configured to access the mounted NFS volume. The beta kernel was installed because of the issue in https://bugzilla.redhat.com/show_bug.cgi?id=770592 . Now we have another problem. It seems that whenever at least 5 users connect over FTP and start uploading data to the NFS volume, the actual NFS server (which is Mac OS X Server 10.5.8) hangs. We tried mounting the volumes from two different Mac OS X Servers and it always behaves like this: the NFS server just freezes. The amount of data coming in through FTP is ~50-70 Mbps. Reverting to the RHEL 6.1 kernel (2.6.32-131.17.1.el6.x86_64) fixes the problem.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install RHEL6.3 beta kernel.
2. Mount NFS volume from Mac OS X 10.5.8 server. Fstab entry:  /mnt/ftp        nfs     nosuid,exec,nodev,proto=tcp  0 0

3. Install and configure vsftpd on RHEL to allow uploads into /mnt/ftp
4. Connect as many clients as you can and start uploading from all of them.
Actual results:
The NFS server hangs.

Expected results:
It should not hang.

Additional info:
I suspect that RHEL still randomly loses the connection or packets, as in https://bugzilla.redhat.com/show_bug.cgi?id=770592 , but this no longer shows up in the logs. Since the NFS protocol is not very tolerant of packet loss, this could freeze the server.
Comment 2 RHEL Product and Program Management 2012-05-13 00:04:41 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 3 Vilius Šumskas 2012-06-25 10:59:35 EDT
The issue still persists with kernel-2.6.32-279.el6.x86_64 in the RHEL 6.3 release.

Can someone take a look into it?
Comment 4 Tore H. Larsen 2012-07-17 15:18:34 EDT
Comment 5 Steve Dickson 2012-07-24 14:55:24 EDT
Could you please get a network trace of what's going over the wire? Something similar to:

tshark -w /tmp/data.pcap host <server>
bzip2 /tmp/data.pcap
Comment 6 Vilius Šumskas 2012-07-31 07:56:06 EDT
Here you go http://www.tekila.lt/public/data.pcap.bz2

The capture was made right after rebooting into the RHEL 6.3 kernel: I started uploading data through FTP to the NFS share, and then the NFS server hung.
Comment 7 Jeff Layton 2012-07-31 08:15:57 EDT
Hmmm...sounds more like a problem with the server here. Even if the client is doing something it doesn't like, hanging is not really a good way to handle it.

Perhaps you should consider getting Apple's support organization involved?
Comment 8 Vilius Šumskas 2012-07-31 08:28:03 EDT
I completely support your statement that hanging is not a really good way to handle it, but if something can be done on the client side to make it more compatible with Apple (and other) products, it would be great.

I can try to report this to Apple, but their support is beyond terrible when it comes to server products. The response times are YEARS (literally) and their standard response is "we don't support versions other than the current version", even for companies with support contracts.
Comment 9 Jeff Layton 2012-07-31 08:34:31 EDT
To be clear, we're happy to help, but without some idea of why the server is hanging, it's going to end up being a game of "try this and see if it works", if we can even come up with things to try...

What you might also want to do is to get a capture of some of the network traffic between the client and server for the "working" case as well so you can compare and contrast between the two.
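
One way to do that comparison is sketched below. It is only a rough sketch under several assumptions: that both captures are available locally as data.pcap and data_working.pcap, that the mount uses NFSv3 (so tshark's `nfs.procedure_v3` field applies), and that the installed tshark accepts `-Y` for display filters (older releases use `-R` instead). The helper name `summarize_procs` is made up for this example.

```shell
#!/bin/sh
# Count how often each NFS procedure number occurs.
# Reads procedure numbers on stdin, one per line; a line may hold several
# comma-separated values when one frame carries more than one RPC.
# Prints "<procedure> <count>" sorted by procedure number.
summarize_procs() {
    tr ',' '\n' | grep -v '^$' | sort -n | uniq -c | awk '{print $2, $1}'
}

# Hypothetical usage (file names and NFSv3 are assumptions, see above):
#   tshark -r data.pcap -Y nfs -T fields -e nfs.procedure_v3 \
#       | summarize_procs > failing.txt
#   tshark -r data_working.pcap -Y nfs -T fields -e nfs.procedure_v3 \
#       | summarize_procs > working.txt
#   diff working.txt failing.txt
```

A large difference in the mix of procedures between the two kernels (for example, many more calls of one type in the failing case) would at least suggest where to look next; identical mixes would point toward timing or sizes instead.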
Comment 10 Vilius Šumskas 2012-07-31 09:33:15 EDT
A working case capture with RHEL 6.1 kernel: http://www.tekila.lt/public/data_working.pcap.bz2
Comment 11 Steve Dickson 2012-10-08 09:30:12 EDT
(In reply to comment #10)
> A working case capture with RHEL 6.1 kernel:
> http://www.tekila.lt/public/data_working.pcap.bz2

Unfortunately I don't see the hang in that trace... But I do see a number of [TCP previous segment lost] packets, which might point to a network issue...
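
For anyone wanting to put a number on those, a minimal sketch follows. It assumes the tshark one-line summaries are piped in on stdin and that the flagged packets carry the text "TCP Previous segment" in their info column (the exact wording differs between Wireshark versions); the function name `count_lost_segments` is hypothetical.

```shell
#!/bin/sh
# Count packets that the TCP analysis flagged as following a missing segment.
# Reads tshark one-line packet summaries on stdin and prints the number of
# matching lines (0 if there are none).
count_lost_segments() {
    grep -c -i 'TCP previous segment'
}

# Hypothetical usage (file names are assumptions):
#   tshark -r data_working.pcap | count_lost_segments
#   tshark -r data.pcap | count_lost_segments
# Alternatively, let tshark do the filtering itself:
#   tshark -r data.pcap -Y tcp.analysis.lost_segment | wc -l
```

Comparing the counts for the two captures would show whether the loss is specific to the newer kernel or present in both cases.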
Comment 12 Vilius Šumskas 2012-10-08 11:08:03 EDT
I have already ruled that out by changing the router.

Even if it is a network problem, that doesn't explain why the server works with the 6.1 kernel and doesn't work with 6.3.
Comment 13 Jeff Layton 2012-12-11 06:09:51 EST
Whether Apple's support is terrible or not, a hung server indicates a bug in the server. A network server of any sort ought to be able to handle anything the client throws at it without hanging. I've looked over the captures and I don't see anything wrong with what's being sent to the server here.

Without more to go on, I don't see anything that we can do. At this point, I'm going to call this a bug in Apple's product and close this as NOTABUG. If you can get their support organization involved, and they point out something wrong with what we're sending to them, then please reopen this and we'll be happy to take another look.

Also, with complex multi-vendor cases like this, it's generally a good idea to open a support case with RH support.
