Red Hat Bugzilla – Full Text Bug Listing
Summary:          NFS throughput low, high NFS util locks screen updates for a few minutes
Product:          [Fedora] Fedora
Component:        nfs-utils
Reporter:         Saikat Guha <sg266>
Assignee:         Steve Dickson <steved>
QA Contact:       Ben Levenson <benl>
Status:           CLOSED CURRENTRELEASE
Fixed In Version: rawhide
Doc Type:         Bug Fix
Last Closed:      2008-03-31 05:39:03 EDT
Description Saikat Guha 2006-12-01 20:06:14 EST
On a Rawhide (kernel-2.6.18-1.2849.fc6 and .2798.fc6) NFS client, I am unable to achieve more than 2 MB/s write throughput to an NFS server. Older FC hosts (FC5) can achieve 10 MB/s (network saturated).

Furthermore, when NFS utilization is high, Xorg locks up for 3 minutes at a stretch, then is responsive again for a couple of minutes before locking up again until the NFS utilization subsides. The host, however, responds to network logins while Xorg is locked -- the CPU is idle and there is little or no I/O wait (even though NFS utilization is high). However, running ls, cp, mv, tab-expansion, etc. on the NFS volume blocks for a long time.

NFS is running over TCP (to the server on an FC5 host). Dmesg is clean. NFS options in fstab are "tcp,defaults,soft".

[root@sioux ~]# nfsstat -rc; sleep 10; nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
1080783    0          0

Client rpc stats:
calls      retrans    authrefrsh
1080824    0          0

Last few lines of a tethereal capture (roughly 11 seconds, only ~1000 packets per second on a 100 Mbps network):

11.078425 xxx.yy.zzz.152 -> xxx.yy.aaa.36 TCP nfs > netviewdm3 [ACK] Seq=160200 Ack=13390152 Win=501 Len=0 TSV=384699903 TSER=4297056
11.078643 xxx.yy.zzz.152 -> xxx.yy.aaa.36 TCP nfs > netviewdm3 [ACK] Seq=160200 Ack=13392720 Win=501 Len=0 TSV=384699903 TSER=4297056
11.091453 xxx.yy.zzz.152 -> xxx.yy.aaa.36 NFS V2 WRITE Reply (Call In 10908)
11.091474 xxx.yy.aaa.36 -> xxx.yy.zzz.152 NFS V2 WRITE Call, FH:0x6c027d0e BeginOffset:2277376 Offset:2277376 TotalCount:8192 [Unreassembled Packet [incorrect TCP checksum]]
11.091480 xxx.yy.aaa.36 -> xxx.yy.zzz.152 RPC Continuation
11.091485 xxx.yy.aaa.36 -> xxx.yy.zzz.152 RPC Continuation
11.091914 xxx.yy.zzz.152 -> xxx.yy.aaa.36 TCP nfs > netviewdm3 [ACK] Seq=160300 Ack=13395616 Win=501 Len=0 TSV=384699906 TSER=4297071
11.092159 xxx.yy.zzz.152 -> xxx.yy.aaa.36 TCP nfs > netviewdm3 [ACK] Seq=160300 Ack=13398512 Win=501 Len=0 TSV=384699906 TSER=4297071
11.092378 xxx.yy.zzz.152 -> xxx.yy.aaa.36 TCP nfs > netviewdm3 [ACK] Seq=160300 Ack=13401080 Win=501 Len=0 TSV=384699906 TSER=4297071

11 packets dropped
11102 packets captured

Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.2%id, 0.0%wa, 0.2%hi, 0.5%si, 0.0%st

The Xorg freeze behavior is always reproducible under high NFS utilization on my setup. The low throughput to the NFS server is always reproducible.
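The two nfsstat samples above can be reduced to a rough RPC call rate; a minimal sketch in shell, using the counter values quoted in this report (the variable names are illustrative, not part of nfsstat's output):

```shell
# Difference the 'calls' counter from two "nfsstat -rc" samples taken
# 10 seconds apart to estimate the NFS client's RPC call rate.
calls_before=1080783   # first sample
calls_after=1080824    # second sample, 10 seconds later
interval=10

rate=$(( (calls_after - calls_before) / interval ))
echo "${rate} RPC calls/sec"
```

As a sanity check: with NFSv2's 8 KB maximum write size, on the order of 1,500 write calls per second would be needed to saturate a 100 Mbps link, so a call rate far below that points at the client issuing requests slowly rather than at the server or the wire.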
Comment 1 Steve Dickson 2006-12-07 09:49:13 EST
hmm... [incorrect TCP checksum] is a bit worrisome... I would guess that's the cause of the slowdown... So this is between an FC6 (or rawhide) client and an FC5 server?
Comment 2 Saikat Guha 2006-12-07 17:03:31 EST
Correct. Between a rawhide client and an FC5 server. Switching NFS from TCP to UDP results in the same symptoms -- low throughput, intermittent display lockups, lots of idle CPU, etc.
Comment 3 Steve Dickson 2006-12-11 08:50:30 EST
yeah on a congested network, UDP would be worse...
Comment 4 Saikat Guha 2006-12-11 09:21:29 EST
True, however, I can achieve the full network bandwidth using SCP or TTCP/IPERF etc. NFS seems to be achieving a factor of 5 less. In addition, intense NFS activity completely locks up Xorg for tens of seconds, sometimes several minutes -- no mouse cursor updates, no panel clock updates, no system monitor graph updates, etc.

On this system, /home is mounted from a _different_ NFS server than the server to which the "high"-bandwidth transfer is taking place:

/         -- local
/home     -- FC1 NFS server A  <---- home directory
/mnt/nfs2 -- FC5 NFS server B  <---- destination of large file copy

A large background file copy to B causes Xorg/GNOME etc. to freeze for minutes. The freeze is not observed when the copy to B is performed using SCP. Also, as mentioned, SCP bandwidth is much higher. If there are any diagnostics you'd like me to run, please let me know. Thanks.
Comment 5 Steve Dickson 2006-12-15 06:27:45 EST
> True, however, I can achieve the full network bandwidth using SCP or TTCP/IPERF
> etc. NFS seems to be achieving a factor of 5 less.

Well, there will always be much less protocol overhead with streams like that. Plus, the NFS client can go wire speed... I've seen it...

I just noticed "NFS V2 WRITE Call". Why are you using v2? How does V3 using TCP work?

> In addition, intense NFS activity completely locks up Xorg for tens of
> seconds, sometimes several minutes -- no mouse cursor update, no panel clock
> updates, no system monitor graph updates etc.

Although NFS may contribute to it... it's very rare that NFS (or any other filesystem) causes mice and displays to lock up... Try opening up another console terminal (Ctrl-Alt-F2) and run top to see who is grabbing your CPU...
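For reference, forcing NFSv3 over TCP can be done with explicit mount options; a hedged sketch (the server name and paths here are placeholders, not taken from this report):

```shell
# Force NFS version 3 over TCP when mounting; "nfsvers=3" is the
# client-side option name for selecting the protocol version.
# The matching fstab line would be:
#   serverB:/export  /mnt/nfs2  nfs  nfsvers=3,tcp,soft  0 0
mount -t nfs -o nfsvers=3,tcp,soft serverB:/export /mnt/nfs2
```

After remounting, the negotiated version can be confirmed with "nfsstat -m" or by checking for "NFS V3" calls in a fresh capture.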
Comment 6 Saikat Guha 2006-12-15 07:15:31 EST
(In reply to comment #5)

> Well there will always be much less protocol overhead with streams like that
> Plus the NFS client can go wire speed... I've seen it...

I've seen NFS at wire speed as well (from this very same rawhide box to an FC1 NFS server, for example).

> I just noticed "NFS V2 WRITE Call" Why are you using v2? How does V3 using
> TCP work?

Hmmm. I don't recall setting it to V2; it should be using the default. Will try to force it to V3.

> Although NFS may contribute to it.... it very rare that NFS (or any
> other filesystem) causes mouses and displays to lock up... Try opening
> up another console terminal (Alt-Ctrl-F2) and run top to see who is
> graping your CPU...

I can't switch with Ctrl-Alt-F2 (the console is completely stuck), but I can log into the stuck host from another host, as mentioned in the original post; top shows 0% CPU and 0% I/O wait.
Comment 7 Saikat Guha 2008-03-31 05:39:03 EDT
Seems to be working well these last few months. Closing this bug for now.