Bug 834799

Summary: NFS write performance with default options absolutely terrible on F16<->F16
Product: Fedora
Reporter: Adam Pribyl <covex>
Component: kernel
Assignee: nfs-maint
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 16
CC: bfields, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-10-23 18:17:27 UTC
Attachments:
  output of sysrq t

Description Adam Pribyl 2012-06-23 18:02:34 UTC
Description of problem:
Mounting a remote ext4 filesystem on LVM + RAID1 via NFS on Fedora 16 gives very bad write speeds over a Gbit network for big files (e.g. ISO images). It behaves in a very strange way: at first it transfers at very high speed (close to Gbit line rate) the amount of data that fits into the server's memory, then the performance drops, the load on the server rockets, and the client starts freezing. iotop shows only 600-800 kB/s of write and many NFS processes.

On F16 server (client is F16 too):
  867 be/3 root        0.00 B/s    0.00 B/s  0.00 % 94.73 % [jbd2/md125-8]
 1953 be/4 root        0.00 B/s   22.29 K/s  0.00 %  0.45 % [nfsd]
 1960 be/4 root        0.00 B/s   29.72 K/s  0.00 %  0.31 % [nfsd]
 1955 be/4 root        0.00 B/s   37.15 K/s  0.00 %  0.29 % [nfsd]
 1958 be/4 root        0.00 B/s   37.15 K/s  0.00 %  0.26 % [nfsd]
 1957 be/4 root        0.00 B/s   37.15 K/s  0.00 %  0.25 % [nfsd]
 1954 be/4 root        0.00 B/s   29.72 K/s  0.00 %  0.25 % [nfsd]
 1959 be/4 root        0.00 B/s   37.15 K/s  0.00 %  0.24 % [nfsd]
 1956 be/4 root        0.00 B/s   22.29 K/s  0.00 %  0.16 % [nfsd]

top - 19:26:36 up  6:48,  2 users,  load average: 9.15, 5.41, 2.81

The write performance of the bare RAID on the server is about 54MB/s.

I also had these problems when the server was on F15, but there the transfer with default options was around 6MB/s; with F16 it is 600kB/s.

With the async option I was able to reach around 30-40MB/s real transfer speed when the client was F16 (x86_64) and the server F15 (i686), which is reasonable; the situation is similar with async and the server on F16 (i686). The write column in iotop shows up to 100MB/s for one second and only a few kB/s in the following ones, but on average it is about 30-50MB/s.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.5-5.fc16.i686
kernel-PAE-3.4.2-1.fc16.i686


Steps to Reproduce:
1. just export some ext4 directory via NFS
2. transfer a large file (e.g. a DVD ISO) to it; see the sketch below
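
A minimal reproduction sketch (the export path, client hostname, and file name are hypothetical, not taken from this report):

  # on the server (default export options, i.e. sync):
  echo '/srv/export  *(rw)' >> /etc/exports
  exportfs -ra
  # on the client:
  mount -t nfs server:/srv/export /mnt
  dd if=/dev/zero of=/mnt/bigfile bs=1M count=4096 conv=fsync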
  
Actual results:
600kB/s write speed


Additional info:
There is a similar bug, #790232.

Comment 1 J. Bruce Fields 2012-06-25 11:54:06 UTC
Would it be possible to get a look at what the server threads are doing at the time?  Are they stuck waiting somewhere (in which case sysrq-t output might be interesting), or are they burning CPU (in which case maybe something like perf could get us an idea where they're spending their time?)

Can you reproduce the same problem on F17?
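
A rough sketch of how such data is usually gathered on the server, run as root while the slow transfer is in progress (not part of the original exchange; output file names are arbitrary):

  # dump the stack of every task to the kernel log, then save it
  echo t > /proc/sysrq-trigger
  dmesg > sysrq-t.txt
  # sample where CPU time is going, live
  perf top
  # or record ~30 seconds system-wide for later inspection
  perf record -a -g sleep 30
  perf report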

Comment 2 Adam Pribyl 2012-06-25 13:14:44 UTC
I am not sure I can provide what you ask for as I probably do not know how to do that. Here is a perf top on server side during the transfer:

 26.97%  [sunrpc]                                           [k] svc_recv
 10.09%  [sunrpc]                                           [k] svc_tcp_has_wspace
  7.57%  [sunrpc]                                           [k] svc_xprt_enqueue
  6.57%  [kernel]                                           [k] local_bh_enable_ip
  4.57%  [sunrpc]                                           [k] svc_xprt_release
  3.98%  [kernel]                                           [k] _raw_spin_unlock_bh
  2.48%  [kernel]                                           [k] _raw_spin_lock_bh
  2.23%  [kernel]                                           [k] __list_del_entry
  2.02%  [kernel]                                           [k] kfree
  1.90%  [sunrpc]                                           [k] svc_release_skb
  1.81%  [unknown]                                          [.] 0xb776e55f
  1.66%  [sunrpc]                                           [k] svc_xprt_received
  1.65%  [kernel]                                           [k] __list_add
  1.59%  [kernel]                                           [k] page_address
  1.51%  [sunrpc]                                           [k] svc_xprt_put
  1.48%  [kernel]                                           [k] _cond_resched
  1.18%  [kernel]                                           [k] local_bh_disable
  1.18%  [sunrpc]                                           [k] svc_pool_for_cpu
  1.01%  [sunrpc]                                           [k] svc_reserve
  0.71%  [kernel]                                           [k] kthread_should_stop
  0.70%  [kernel]                                           [k] acpi_idle_do_entry
  0.68%  libperl.so                                         [.] 0x000462c3
  0.58%  [nfsd]                                             [k] nfsd4_encode_components

Usually top shows:
 1954 root      20   0     0    0    0 D 28.7  0.0   3:50.38 nfsd                                   
 1955 root      20   0     0    0    0 R 25.1  0.0   3:45.21 nfsd                                   
 1957 root      20   0     0    0    0 D 21.1  0.0   3:50.41 nfsd                                   
 1953 root      20   0     0    0    0 D 10.9  0.0   3:55.68 nfsd                                   
 1959 root      20   0     0    0    0 D 10.9  0.0   3:50.18 nfsd                                   
 1960 root      20   0     0    0    0 D  1.7  0.0   3:51.02 nfsd                                   
  374 root      20   0     0    0    0 S  0.7  0.0  36:08.36 md125_raid1                            
  867 root      20   0     0    0    0 D  0.7  0.0   0:07.37 jbd2/md125-8       

It looks like it is burning CPU - at least the top shows the %CPU used.

Comment 3 J. Bruce Fields 2012-06-25 13:44:58 UTC
Thanks!  Yes, that gives me a better idea where to look....  I may not get the chance soon, though.

You say you didn't see this problem (or saw only a milder version of it, I'm not clear?) on F15.  What was the last kernel version that worked for you on F15?  Also, are you positive that it was a server upgrade that caused the problem (did the client version stay the same throughout)?

Another approach might be to start bisecting to figure out exactly which version of the kernel started failing.
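
Such a bisection would look roughly like this (the tags below are examples only, assuming the last good kernel was in the 3.3 series and the first bad one in 3.4):

  git bisect start
  git bisect bad v3.4
  git bisect good v3.3
  # build and boot the kernel git checks out, test the NFS write, then mark it:
  git bisect good    # or: git bisect bad
  # repeat until git reports the first bad commit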

If you watch the client-server communication with wireshark: what does the network traffic look like after the problem starts?  (You could upload an example: I'd need the full binary data.)
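
Capturing that traffic on either end could look like this (the interface name is an assumption; 2049 is the standard NFS port):

  tcpdump -i eth0 -s 0 -w nfs-slow.pcap port 2049
  # then open nfs-slow.pcap in wireshark, or summarize throughput per second with:
  tshark -r nfs-slow.pcap -q -z io,stat,1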

Comment 4 Adam Pribyl 2012-06-25 13:45:30 UTC
Created attachment 594188 [details]
output of sysrq t

Attached is the output of sysrq-t during transfer.

Comment 5 Adam Pribyl 2012-06-25 13:59:10 UTC
Well, the performance on F15 was better, but definitely not good. On a Gbit network I'd expect the write performance to be at the array speed. I've updated only the server; the client remains the same - although the client kernel may also have changed to 3.4.2 (x86_64) in the meantime. I do not think this is of big importance, as the performance is bad overall.

Comment 6 J. Bruce Fields 2012-06-25 15:24:58 UTC
Are the top and sysrq-t both taken *after* the performance got bad?

Also: all the above seems to be with exports that have the "async" export option set in /etc/exports, is that correct?  Might be worth seeing what the performance is like without that.

The sysrq-t trace shows 8 threads, mostly waiting for writes.

You probably want more than 8 threads, though I doubt increasing it will fix the problem.
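
For reference, the thread count can be raised at runtime or persistently (a sketch; RPCNFSDCOUNT in /etc/sysconfig/nfs is, to my knowledge, the setting Fedora's nfs-utils used at the time):

  # runtime: bump the running server to 16 threads
  rpc.nfsd 16
  # or equivalently:
  echo 16 > /proc/fs/nfsd/threads
  # persistent: set RPCNFSDCOUNT=16 in /etc/sysconfig/nfs and restart the NFS service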

Comment 7 Adam Pribyl 2012-06-25 18:44:12 UTC
Yes, this is when the performance got bad. This is with the default export options - meaning sync. When I add "async" to the export, the performance is much better - almost what is expected.

The strange thing with the threads is that while on F16 iotop shows roughly 60kB/s of write per thread, on F15 it was roughly 600kB/s per thread (still too low, however).

I know I can increase the NFS write performance with various options like "insecure", "async", etc., but the default behaviour is quite frustrating. I understand there could be some penalty as a trade-off for secure and sync, but this is somewhat too much.
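
For comparison, the two export variants discussed here differ in a single option in /etc/exports (path and network are placeholders):

  # sync (the default): the server commits data to stable storage before replying to each write
  /srv/export  192.168.1.0/24(rw,sync)
  # async: the server may reply before data reaches disk - faster,
  # but data can be lost if the server crashes
  /srv/export  192.168.1.0/24(rw,async)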

Also the load on the server is really abnormal - close to 10 - and increasing the number of threads makes it even worse. The client often gets stuck while this slow write is happening on the server. When I did this test with a 4GB file, I had to hard-reset the client, as it got completely unresponsive - I know that waiting long enough for the write to finish would also make it wake up again, but I was not willing to wait for 4GB to be transferred at 60kB/s.

Comment 8 Dave Jones 2012-10-23 15:34:15 UTC
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug; a fresh bug referencing this bug in a comment is sufficient.)

Comment 9 J. Bruce Fields 2012-10-23 18:17:27 UTC
Given the prominence of svc_recv, svc_tcp_has_wspace, and svc_xprt_enqueue in the perf top output, this is almost certainly the bug fixed by upstream commit d10f27a750 "svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping", which is included in 3.6.

I'm therefore closing this bug under the assumption that it is fixed; however, any confirmation, or information to the contrary, is still welcome.
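
One way to confirm from the kernel sources that a given release carries the fix, assuming a clone of the mainline tree:

  # prints the first tag that contains the commit
  git describe --contains d10f27a750
  # or check directly whether a particular release includes it:
  git merge-base --is-ancestor d10f27a750 v3.6.2 && echo included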