Description of problem:
We're running NFS between a number of vanilla Fedora 16 installations on x86_64 bare-metal machines (no virtualization involved). The clients mount their $HOME from the server through an entry in /etc/fstab like this:

    server:/home /home nfs4 noatime,rsize=8192,wsize=8192,timeo=14,soft 0 0

Everything works fine and without any problems. To improve network throughput we decided to enable support for jumbo packets by increasing the MTU on all systems from the default 1500 to 9000. This works fine, too, for most applications. Network tests confirm that jumbo packets are actually in use. However, NFS stops working. We see no indication of errors on the server machine, but the clients report:

    [108567.574198] nfs: server ... not responding, timed out
    [108571.778790] nfs: server ... not responding, timed out
    [108575.983417] nfs: server ... not responding, timed out
    [108585.565459] nfs: server ... not responding, timed out
    [108589.770074] nfs: server ... not responding, timed out
    [108593.974690] nfs: server ... not responding, timed out
    [108598.179302] nfs: server ... not responding, timed out
    [108602.383908] nfs: server ... not responding, timed out
    [108606.588558] nfs: server ... not responding, timed out
    [108610.793141] nfs: server ... not responding, timed out
    [108614.997763] nfs: server ... not responding, timed out
    [108619.202374] nfs: server ... not responding, timed out
    [108623.406982] nfs: server ... not responding, timed out

Attempts to "cd $HOME" or "ls" hang and finally time out with "I/O errors". Switching the MTU on the NFS server back to 1500 makes the problem go away.

Version-Release number of selected component (if applicable):
We experienced the same issue under Fedora 15, but decided to wait for a software and hardware update and then try again. Now, under Fedora 16, we still see the same issue. Server and all clients are running kernel-3.1.6-1.fc16.x86_64.

How reproducible:
Absolutely reliable.
Steps to Reproduce:
1. Set up an NFS4 server / client installation.
2. Verify that everything is working fine.
3. Enable support for jumbo packets by raising the MTU to 9000.

Actual results:
NFS I/O errors on the clients, kernel messages "nfs: server ... not responding, timed out"

Expected results:
No errors, improved performance.

Additional info:
In an older setup, some network cards would not allow an MTU of 9000; they maxed out at 7200. Running the same test with an MTU of 7200 instead of 9000 still shows the same problem.
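One way to cross-check that full-size frames really pass end to end between client and server is a ping with fragmentation prohibited. The sketch below is only an illustration and assumes IPv4 without IP options and the iputils ping shipped with Fedora; the helper names and the host name "server" are placeholders, not something taken from the setup described above.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits into a single IPv4 packet of the
# given MTU: MTU minus 20 bytes IPv4 header (assumes no IP options)
# minus 8 bytes ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not run) a ping command that sets the Don't Fragment bit
# (-M do) and sends a payload sized to fill the MTU exactly.
# $1 = host (placeholder name), $2 = MTU
df_ping_cmd() {
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$2") $1"
}

df_ping_cmd server 9000   # prints: ping -M do -c 3 -s 8972 server
```

If the printed command gets replies in both directions, frames of that size are crossing the path; if it fails while a 1472-byte payload works, something between the two hosts is not passing the larger frames.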
What network cards are you using?
(In reply to comment #1)
> What network cards are you using?

Tests were done mostly on a Supermicro H8DME-2 mainboard, which has two nVidia MCP55 Ethernet controllers. I also tried a SysKonnect SK-9871 V2.0 Gigabit Ethernet 1000Base-ZX Adapter, a SysKonnect SK-9E21D 10/100/1000Base-T Adapter, and an Intel 82572EI Gigabit Ethernet Controller. Less testing was done with these, but the results appear to be the same.
I can confirm this behaviour:

    Linux negro.micasa 3.1.8-2.fc16.x86_64 #1 SMP Sat Jan 7 13:35:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)

If I do:

    ifconfig eth1 mtu 1500

clients mount fine and can work. If I do:

    ifconfig eth1 mtu 7200

clients hang and there are no messages on the server. Even if I change the MTU after mounting, ls on the client hangs too. I should add that my clients are old machines (a Siemens M740 DVB and a Samsung TV).
(In reply to comment #1)
> What network cards are you using?

To rule out the network driver as the cause of this problem (the nVidia MCP55 Ethernet controller is still a bit suspicious to me), I installed Intel EtherExpress Pro/1000ET Quad cards (E1G44ET2BLK) on both the server and the client used for this test. The problem still happens with these. I see no indication of errors on the network, nor on the server. Only the client hangs with "nfs: server ... not responding, timed out" messages.
I still think this is a network problem, not an NFS problem, but... Could you please post a binary network trace taken with either

    tcpdump -s0 -w /tmp/data.pcap host <server>

or

    tshark -w /tmp/data.pcap host <server>

and then bzip2 the trace file:

    bzip2 /tmp/data.pcap
(In reply to comment #5)
> I'm still thinking this is a network problem not an NFS problem but...
> Could please post a binary network trace with either
> tcpdump -s0 -w /tmp/data.pcap host <server>
> or
> tshark -w /tmp/data.pcap host <server>
>
> than bzip2 the trace file
> bzip2 /tmp/data.pcap

OK, I'll try to do it this week.
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.