Bug 771131 - Enabling jumbo packets breaks NFS
Summary: Enabling jumbo packets breaks NFS
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-01-01 19:08 UTC by Wolfgang Denk
Modified: 2012-09-04 15:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-04 15:09:03 UTC
Type: ---



Description Wolfgang Denk 2012-01-01 19:08:22 UTC
Description of problem:

We're running NFS between a bunch of vanilla Fedora 16 installations
on x86_64 bare metal machines (no virtualization involved here).

The clients mount their $HOME from the server through an entry in
/etc/fstab like this:

server:/home  /home  nfs4  noatime,rsize=8192,wsize=8192,timeo=14,soft  0 0

Everything works fine and without any problems.  To improve network
performance / throughput we decided to enable support for jumbo
packets by increasing the MTU on all systems from the default 1500 to
9000.  This works fine, too, for most applications.  Network tests show
that we actually use jumbo packets then.
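
A minimal sketch of one such network test, assuming iputils ping and IPv4
(not necessarily the test used here): send a ping with the Don't Fragment
bit set and a payload that fills one full frame. The payload size is the
MTU minus 28 bytes (20 bytes of IPv4 header without options plus 8 bytes
of ICMP header). The host name "server" is a placeholder, and the script
only prints the commands instead of running them.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# "server" is a placeholder host name.
# -M do sets the Don't Fragment bit (iputils ping), so an oversized
# packet is rejected or dropped instead of being fragmented.
for mtu in 1500 9000; do
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$mtu") server"
done
```

If the 8972-byte ping gets no replies while the 1472-byte one does, some
device on the path is not passing jumbo frames.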

However, NFS stops working.  We see no indication of errors on the
server machine, but the clients report:

[108567.574198] nfs: server ... not responding, timed out
[108571.778790] nfs: server ... not responding, timed out
[108575.983417] nfs: server ... not responding, timed out
[108585.565459] nfs: server ... not responding, timed out
[108589.770074] nfs: server ... not responding, timed out
[108593.974690] nfs: server ... not responding, timed out
[108598.179302] nfs: server ... not responding, timed out
[108602.383908] nfs: server ... not responding, timed out
[108606.588558] nfs: server ... not responding, timed out
[108610.793141] nfs: server ... not responding, timed out
[108614.997763] nfs: server ... not responding, timed out
[108619.202374] nfs: server ... not responding, timed out
[108623.406982] nfs: server ... not responding, timed out

Attempts to "cd $HOME" or "ls" will hang and finally time out with
"I/O errors".

Switching the MTU on the NFS server back to 1500 will make the
problem go away.

Version-Release number of selected component (if applicable):

We experienced the same issue under Fedora 15, but decided to wait
for a software and hardware update and then try again.  Now, under
Fedora 16, we still see the same issue.

Server and all clients are running  kernel-3.1.6-1.fc16.x86_64


How reproducible:

Absolutely reliable.

Steps to Reproduce:
1. Set up an NFS4 server / client installation.
2. Verify that everything is working fine.
3. Enable support for jumbo packets by raising the MTU to 9000.
  
Actual results:

NFS I/O errors on the clients, kernel messages "nfs: server ... not
responding, timed out"

Expected results:

No errors, improved performance.

Additional info:

In an older setup, some network cards would not allow an MTU of 9000;
they maxed out at 7200. Running the same test with an MTU of 7200
instead of 9000 still shows the same problem.

Comment 1 Dave Jones 2012-01-03 15:47:50 UTC
What network cards are you using ?

Comment 2 Wolfgang Denk 2012-01-03 22:48:21 UTC
(In reply to comment #1)
> What network cards are you using ?

Tests were done mostly on a Supermicro H8DME-2 mainboard which has two
nVidia MCP55 Ethernet controllers.

I also tried a SysKonnect SK-9871 V2.0 Gigabit Ethernet 1000Base-ZX Adapter,
a SysKonnect SK-9E21D 10/100/1000Base-T Adapter, and an Intel 82572EI
Gigabit Ethernet Controller.  Less testing was done with these, but
the results appear to be the same.

Comment 3 Miguel CV 2012-01-16 14:54:25 UTC
I can confirm this behaviour:

Linux negro.micasa 3.1.8-2.fc16.x86_64 #1 SMP Sat Jan 7 13:35:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)


If I do:

ifconfig eth1 mtu 1500

clients mount fine and can work.

If I do:
ifconfig eth1 mtu 7200

clients hang and there are no messages on the server. Even if I change the MTU after mounting, ls on the client hangs too.

I have to say that my clients are old machines (a Siemens M740 DVB and a Samsung TV).
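
To confirm that an ifconfig change like the ones above actually took
effect, the MTU can be read back from sysfs. This is a minimal sketch,
assuming a Linux host; the function name iface_mtu and its optional
second argument are made up for illustration.

```shell
#!/bin/sh
# Print the MTU the kernel currently reports for an interface.
# The optional second argument (the sysfs class directory) exists only
# to make the function testable; it defaults to the real /sys/class/net.
iface_mtu() {
    cat "${2:-/sys/class/net}/$1/mtu"
}

# Example (interface name taken from the commands above):
#   iface_mtu eth1
```

Running the same check on each client shows whether both ends of the
link agree on the frame size.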

Comment 4 Wolfgang Denk 2012-01-17 07:11:49 UTC
(In reply to comment #1)
> What network cards are you using ?

To rule out that this problem is caused by the network driver
(the nVidia MCP55 Ethernet controller is still a bit suspicious to me)
I installed Intel EtherExpress Pro/1000ET Quad cards (E1G44ET2BLK) on
both the server and the client used for this test.

The problem still happens with these.  I see no indication of
errors on the network, nor on the server.  Only the client hangs with
"nfs: server ... not responding, timed out" messages.

Comment 5 Steve Dickson 2012-03-15 15:43:19 UTC
I'm still thinking this is a network problem not an NFS problem but...
Could you please post a binary network trace with either
   tcpdump -s0 -w /tmp/data.pcap host <server> 
or
   tshark -w /tmp/data.pcap host <server>

then bzip2 the trace file
   bzip2 /tmp/data.pcap
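
The tcpdump variant of the two steps above can be wrapped in a small
helper. This is only a sketch: the function name trace_commands and the
host name nfs-server.example are placeholders, and the helper prints the
commands rather than running them, since tcpdump normally needs root.

```shell
#!/bin/sh
# Dry-run helper: print the capture and compression commands for a
# given server name and output file.
trace_commands() {
    server=$1
    out=${2:-/tmp/data.pcap}
    # -s0 captures full packets instead of truncating them.
    echo "tcpdump -s0 -w $out host $server"
    echo "bzip2 $out"
}

# nfs-server.example is a placeholder host name.
trace_commands nfs-server.example
```

Start the capture on the client before raising the MTU, reproduce the
hang, then stop tcpdump with Ctrl-C and compress the file.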

Comment 6 Miguel CV 2012-03-19 14:28:55 UTC
(In reply to comment #5)
> I'm still thinking this is a network problem not an NFS problem but...
> Could you please post a binary network trace with either
>    tcpdump -s0 -w /tmp/data.pcap host <server> 
> or
>    tshark -w /tmp/data.pcap host <server>
> 
> then bzip2 the trace file
>    bzip2 /tmp/data.pcap

Ok I'll try to do it this week....

Comment 7 Dave Jones 2012-03-22 16:45:35 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 8 Dave Jones 2012-03-22 16:50:00 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 9 Dave Jones 2012-03-22 16:59:54 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

