Bug 869260
Summary: | NFS is very slow | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Edouard Bourguignon <madko> |
Component: | kernel | Assignee: | nfs-maint |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 17 | CC: | bfields, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-08-01 05:37:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Edouard Bourguignon
2012-10-23 12:27:09 UTC
File or directory creation is a synchronous operation under NFS: the client can't create a new object without first hearing back from the server, and the server can't respond to the client's create request without first committing the new object to disk. The Linux kernel source has 40,000-some files and directories. If each create requires a disk seek, and the average disk seek takes 10 milliseconds, then creating all of them would take at least 400 seconds. In practice it takes more, but an hour is surprising. Write latency is what matters most here: what kind of device is your exported filesystem on? You can work around the problem with async exports, but that is likely to result in data loss if, for example, there is a power failure at the wrong moment.

The NFS server is an HP ProLiant N40L with 8 GiB RAM; the exported filesystem is ext4 on two HDDs in software RAID1. The NFS client is an Intel i5 650 with 8 GiB RAM. It takes only 1m22s on the same share exported via Samba, from the same client.

1m22s likely isn't possible with that hardware. I'm not sure how SMB works, but it may just not be designed to survive reboots in the same way NFS is, in which case it can be less careful about committing changes to disk before replying. I don't know whether it's safer than an "async" NFS export. For best performance you'll need something with very low write latency, like an array with a battery-backed write cache.

For comparison, on ext4 over software RAID0 across two SATA drives on an F16 box, `time tar -xjf linux-3.7-rc2.tar.bz2` gives me `real 5m53.545s ...`. I'd expect RAID1 to be a little slower, but if it's taking an hour then there's a bug. In my case, if I run `sync; time for f in $(seq 1 100); do touch $f; sync; done` on the server, in the exported filesystem, it takes about a second, so about 10ms per create. A 40,000-file tarball would therefore be expected to take about 400 seconds, and that's in about the same ballpark as I saw above.
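The lower bound in the first paragraph is just the number of creates multiplied by the per-create commit latency. A minimal Python sketch of that arithmetic follows; the helper name `min_untar_seconds` is hypothetical, and the model assumes creates are fully serialized and that nothing else (file data, CPU, network round trips) costs any time, so the result is a floor rather than a prediction.

```python
def min_untar_seconds(num_files: int, create_latency_ms: float) -> float:
    """Floor on untar time when every create must be committed before the next.

    Hypothetical helper: assumes creates are strictly serialized and each
    costs `create_latency_ms` of synchronous latency; file data, CPU time
    and network round trips are ignored.
    """
    return num_files * create_latency_ms / 1000.0


if __name__ == "__main__":
    # About 40,000 kernel-source entries at one 10 ms seek each.
    print(min_untar_seconds(40_000, 10))  # 400.0
```

Because the bound is linear, halving the per-create latency halves the floor, which is why write latency matters more than raw disk throughput here.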
The touch-and-sync loop is kind of a dumb benchmark, and there may be something better, but you might try something similar to make sure there isn't a problem with the performance of synchronous write operations on the exported filesystem.

On the server, `time (tar zxf linux-2.6.32.9-apc.tar.gz && sync); du -sh linux-2.6.32.9-apc` gives real 0m18.648s, user 0m5.242s, sys 0m3.627s, for a 426M tree.

On the client, with "untar" = `sync; time (tar zxf linux-2.6.32.9-apc.tar.gz && sync)` and "loop" = `sync; time for f in $(seq 1 100); do touch $f; sync; done`:

Client test | real | user | sys
---|---|---|---
untar, CIFS/NFS async | 1m47.177s | 0m6.476s | 0m14.490s
loop, CIFS/NFS async | 0m2.362s | 0m0.028s | 0m1.978s
untar, NFS sync | 53m15.908s | 0m6.510s | 0m11.541s
loop, NFS sync | 0m4.287s | 0m0.046s | 0m2.282s

There are about 33000 files in this tar. bz2 uses more CPU; maybe this is why we see some differences? With NCQ the seek time could be around 1ms instead of 10ms (if tracks are adjacent)? Less than 2min could then be realistic?

> `sync; time for f in $(seq 1 100); do touch $f; sync; done`
> real 0m4.287s, user 0m0.046s, sys 0m2.282s

This was done on the client? Actually I was curious what that would look like done on the server side. Anyway, if that's right, it's taking over 40ms to create a file, about 4 times slower than in my case. But your untar is taking about 10 times longer. I doubt CPU time is an issue, but you could try watching in top. And if it's mostly synchronous operations, then I doubt NCQ would make a big difference, but I don't know. What does `ping server` on the client report as the round-trip time to the server? Output of `mountstats --nfs` and `mountstats --rpc` after the untar might also be interesting.

Oops, sorry, I forgot the one on the server side: `sync; time for f in $(seq 1 100); do touch $f; sync; done` gives real 0m16.336s, user 0m0.061s, sys 0m0.243s. That's about 160ms per file, so ~88min for 33000 files... not so good.
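Because `sync` flushes every filesystem, any unrelated I/O lands in the measurement. The following Python sketch is an assumed alternative, not the benchmark used in this report: it times `count` creates inside a scratch directory on the filesystem under test, calling `os.fsync` on each new file and then on the directory itself, so only that filesystem is asked to reach stable storage. The names `time_sync_creates` and `projected_seconds`, the default of 100 creates, and the use of the current directory as the default target are illustrative choices; `projected_seconds` assumes total time grows linearly with the number of files.

```python
import os
import shutil
import sys
import tempfile
import time


def time_sync_creates(directory: str, count: int = 100) -> float:
    """Return the average seconds per create+fsync inside `directory`.

    Hypothetical helper. A scratch subdirectory is created (and removed
    afterwards) so repeated runs never collide with existing files. Each
    iteration creates an empty file, fsyncs it, then fsyncs the scratch
    directory so the new directory entry is flushed too. Unlike `sync`,
    this only asks the filesystem holding `directory` to flush.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    workdir = tempfile.mkdtemp(prefix="createbench-", dir=directory)
    dir_fd = os.open(workdir, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"f{i}")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
            try:
                os.fsync(fd)  # flush the new file's data and inode
            finally:
                os.close(fd)
            os.fsync(dir_fd)  # flush the directory entry for the new file
        elapsed = time.perf_counter() - start
    finally:
        os.close(dir_fd)
        shutil.rmtree(workdir)
    return elapsed / count


def projected_seconds(per_create: float, num_files: int) -> float:
    """Project total time, assuming cost scales linearly with file count."""
    return per_create * num_files


if __name__ == "__main__":
    # Usage: python createbench.py [DIRECTORY]; run it inside the export.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    per_create = time_sync_creates(target)
    print(f"{per_create * 1000:.1f} ms per create")
    minutes = projected_seconds(per_create, 33_000) / 60
    print(f"about {minutes:.0f} min for 33000 files")
```

Running it once on the server inside the exported filesystem and once on the client over the NFS mount separates local disk latency from network and protocol overhead; repeating each run a few times helps rule out a one-off result.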
The 160ms per create on the server could explain why it's so slow on the client. I will try to tune this part first, and I will post the mountstats as soon as possible. The ping RTT should be good since there is only a single 1 Gbps switch between the server and the client. Thanks for your direction and time.

Note that one of the reasons I say that's kind of a lousy benchmark is that `sync` syncs everything on every filesystem, so if there just happened to be some other activity at the same time, that would throw off the measurement. You might want to try it a few times to make sure that one 16s measurement wasn't a fluke. But yeah, if that's right, it would definitely be a problem.

This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates.
As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.