Red Hat Bugzilla – Bug 834799
NFS write performance with default options absolutely terrible on F16<->F16
Last modified: 2012-10-23 14:17:27 EDT
Description of problem:
Mounting a remote ext4 filesystem (on LVM + RAID1) via NFS on Fedora 16 gives very bad write speeds through a Gbit network on big files (e.g. ISO images). It behaves in a very strange way: first it transfers at very high speed (close to Gbit line rate) the amount that fits into server memory, then performance drops, load rockets on the server, and the client starts freezing. iotop shows only 600-800kB/s of writes spread across many NFS processes.
On F16 server (client is F16 too):
867 be/3 root 0.00 B/s 0.00 B/s 0.00 % 94.73 % [jbd2/md125-8]
1953 be/4 root 0.00 B/s 22.29 K/s 0.00 % 0.45 % [nfsd]
1960 be/4 root 0.00 B/s 29.72 K/s 0.00 % 0.31 % [nfsd]
1955 be/4 root 0.00 B/s 37.15 K/s 0.00 % 0.29 % [nfsd]
1958 be/4 root 0.00 B/s 37.15 K/s 0.00 % 0.26 % [nfsd]
1957 be/4 root 0.00 B/s 37.15 K/s 0.00 % 0.25 % [nfsd]
1954 be/4 root 0.00 B/s 29.72 K/s 0.00 % 0.25 % [nfsd]
1959 be/4 root 0.00 B/s 37.15 K/s 0.00 % 0.24 % [nfsd]
1956 be/4 root 0.00 B/s 22.29 K/s 0.00 % 0.16 % [nfsd]
top - 19:26:36 up 6:48, 2 users, load average: 9.15, 5.41, 2.81
The write performance of the bare RAID on the server is about 54MB/s.
I had these problems also when the server was on F15, but there the transfer with default options was around 6MB/s; with F16 it is 600kB/s.
With the async option I was able to reach around 30-40MB/s real transfer speed with an F16 client (x86_64) and an F15 server (i686), which is reasonable; the situation is similar with async and the server on F16 (i686). iotop shows writes of up to 100MB/s for one second, then only a few kB/s for the following ones, but on average it is about 30-50MB/s.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. just export some ext4 directory via nfs
2. transfer a large file (e.g. DVD iso)
Actual results:
600kB/s write speed
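For reference, the steps above boil down to something like the following sketch (export path, network range, and filenames are examples, not the reporter's actual setup):

```shell
# On the server: export an ext4 directory with default options (i.e. sync)
echo '/srv/export 192.168.1.0/24(rw)' >> /etc/exports
exportfs -ra

# On the client: mount with default options and copy a large file
mount -t nfs server:/srv/export /mnt
cp Fedora-16-x86_64-DVD.iso /mnt/

# Meanwhile, watch the per-process write rate on the server:
iotop -o
```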
There is a similar bug #790232.
Would it be possible to get a look at what the server threads are doing at the time? Are they stuck waiting somewhere (in which case sysrq-t output might be interesting), or are they burning CPU (in which case maybe something like perf could get us an idea where they're spending their time?)
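In case it helps, both kinds of data can be gathered roughly like this (a sketch, assuming a standard Fedora kernel with sysrq enabled and the perf tool installed):

```shell
# Allow sysrq if it is disabled, then dump all task stacks to the kernel log
sysctl kernel.sysrq=1
echo t > /proc/sysrq-trigger
dmesg > sysrq-t.txt

# Sample CPU usage system-wide with call graphs for ~30 seconds
# while the slow transfer is running
perf record -a -g sleep 30
perf report --sort comm,dso,symbol > perf-report.txt
```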
Can you reproduce the same problem on F17?
I am not sure I can provide what you ask for as I probably do not know how to do that. Here is a perf top on server side during the transfer:
26.97% [sunrpc] [k] svc_recv
10.09% [sunrpc] [k] svc_tcp_has_wspace
7.57% [sunrpc] [k] svc_xprt_enqueue
6.57% [kernel] [k] local_bh_enable_ip
4.57% [sunrpc] [k] svc_xprt_release
3.98% [kernel] [k] _raw_spin_unlock_bh
2.48% [kernel] [k] _raw_spin_lock_bh
2.23% [kernel] [k] __list_del_entry
2.02% [kernel] [k] kfree
1.90% [sunrpc] [k] svc_release_skb
1.81% [unknown] [.] 0xb776e55f
1.66% [sunrpc] [k] svc_xprt_received
1.65% [kernel] [k] __list_add
1.59% [kernel] [k] page_address
1.51% [sunrpc] [k] svc_xprt_put
1.48% [kernel] [k] _cond_resched
1.18% [kernel] [k] local_bh_disable
1.18% [sunrpc] [k] svc_pool_for_cpu
1.01% [sunrpc] [k] svc_reserve
0.71% [kernel] [k] kthread_should_stop
0.70% [kernel] [k] acpi_idle_do_entry
0.68% libperl.so [.] 0x000462c3
0.58% [nfsd] [k] nfsd4_encode_components
top usually shows:
1954 root 20 0 0 0 0 D 28.7 0.0 3:50.38 nfsd
1955 root 20 0 0 0 0 R 25.1 0.0 3:45.21 nfsd
1957 root 20 0 0 0 0 D 21.1 0.0 3:50.41 nfsd
1953 root 20 0 0 0 0 D 10.9 0.0 3:55.68 nfsd
1959 root 20 0 0 0 0 D 10.9 0.0 3:50.18 nfsd
1960 root 20 0 0 0 0 D 1.7 0.0 3:51.02 nfsd
374 root 20 0 0 0 0 S 0.7 0.0 36:08.36 md125_raid1
867 root 20 0 0 0 0 D 0.7 0.0 0:07.37 jbd2/md125-8
It looks like it is burning CPU - at least the top shows the %CPU used.
Thanks! Yes, that gives me a better idea where to look.... I may not get the chance soon, though.
You say you didn't see this problem (or saw only a milder version of it, I'm not clear?) on F15. What was the last kernel version that worked for you on F15? Also, are you positive that it was a server upgrade that caused the problem (did the client version stay the same throughout)?
Another approach might be to start bisecting to figure out exactly which version of the kernel started failing.
If you watch the client-server communication with wireshark: what does the network traffic look like after the problem starts? (You could upload an example: I'd need the full binary data.)
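A capture could be taken on either end along these lines (interface name and client hostname are placeholders):

```shell
# Record all NFS traffic (full packets, not truncated) until the
# slowdown is clearly visible, then stop with Ctrl-C and upload the pcap
tcpdump -i eth0 -s 0 -w nfs-slow.pcap host client.example.com and port 2049
```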
Created attachment 594188 [details]
output of sysrq t
Attached is the output of sysrq-t during transfer.
Well, the performance on F15 was better, but definitely not good. On a Gbit network I'd expect the write performance to be at the array speed. I've updated only the server; the client remained the same. The client kernel did change to 3.4.2 (x86_64) in the meantime, but I do not think that is of big importance, as the performance is bad overall.
Are the top and sysrq-t both taken *after* the performance got bad?
Also: all the above seems to be with exports that have the "async" export option set in /etc/exports, is that correct? Might be worth seeing what the performance is like without that.
The sysrq-t trace shows 8 threads, mostly waiting for writes.
You probably want more than 8 threads, though I doubt increasing it will fix the problem.
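For the record, the thread count can be changed at runtime or made persistent (the count of 16 is just an example):

```shell
# Runtime: raise the number of nfsd threads from the default 8 to 16
rpc.nfsd 16

# Persistent on Fedora: set RPCNFSDCOUNT=16 in /etc/sysconfig/nfs,
# then restart the NFS server:
systemctl restart nfs-server.service
```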
Yes, this is after the performance got bad. This is with the default export options - meaning sync. When I add "async" to the export then the performance is much better - almost what is expected.
The strange thing with the threads is that while on F16 iotop shows roughly 60kB/s of write per thread, on F15 it was roughly 600kB/s per thread (still too low, however).
I know I can increase the NFS write performance with various options like "insecure, async" etc., but the default behaviour is quite frustrating. I understand there could be some penalty as a trade-off for secure and sync, but this is somewhat too much.
Also the load on the server is really abnormal - close to 10, and increasing the threads makes it even worse. The client often gets stuck while this slow write happens on the server. When I did this test with a 4GB file, I had to hard-reset the client, as it got completely unresponsive - I know waiting long enough for the write to finish would also make it wake up again, but I was not willing to wait for 4GB to be transferred at 60kB/s.
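For anyone comparing the two cases, the only difference is the export option in /etc/exports (export path and network are placeholders):

```shell
# Default (sync): the server commits each write to stable storage
# before replying to the client
/srv/export 192.168.1.0/24(rw,sync)

# async: the server replies before data reaches disk - much faster,
# but data can be lost if the server crashes mid-transfer
/srv/export 192.168.1.0/24(rw,async)

# After editing /etc/exports, re-export:
exportfs -ra
```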
# Mass update to all open bugs.
Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.
Please retest with this kernel, and let us know if your problem has been fixed.
In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with. Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.
If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different but similar bug.
(Please don't clone this bug; a fresh bug referencing this bug in a comment is sufficient.)
Given the prominence of svc_recv, svc_tcp_has_wspace, and svc_xprt_enqueue in the perf top output, this is almost certainly the bug fixed by upstream commit d10f27a750 "svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping", which is included in 3.6.
I'm therefore closing this bug under the assumption that it is fixed; however, any confirmation, or information to the contrary, is still welcome.