Bug 759304
| Summary: | btrfs: poor performance | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jan Kratochvil <jan.kratochvil> |
| Component: | kernel | Assignee: | Zach Brown <zab> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 16 | CC: | blair, gansalmon, itamar, jan.kratochvil, jforbes, jonathan, kernel-maint, kxra, madhu.chinakonda, sweil |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-11-14 14:58:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Kratochvil
2011-12-01 23:23:45 UTC
I found that even in single-user mode, cp -ax --sparse=always / /new/ & runs at only ~5 MB/s (the disk does ~100 MB/s), so copying my 600 GB btrfs to ext4 has taken 2 days(!).

    top - 14:45:33 up 14:43, 2 users, load average: 2.75, 2.93, 2.97
    Mem: 5982924k total, 5718436k used, 264488k free, 381000k buffers
    Swap: 0k total, 0k used, 0k free, 4793812k cached
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1529 root 20 0 0 0 0 R 27.3 0.0 204:52.57 btrfs-delayed-m
    618 root 20 0 0 0 0 R 10.3 0.0 41:29.13 btrfs-transacti
    4106 root 20 0 0 0 0 S 7.7 0.0 0:46.03 kworker/6:3
    1483 root 20 0 127m 11m 500 S 2.6 0.2 21:07.71 cp
    610 root 20 0 0 0 0 S 2.3 0.0 11:39.47 btrfs-endio-0

[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.

kernel-3.3.6-3.fc16.x86_64: still poor performance, this time with mock --install of many packages making the machine unresponsive for several minutes on an X220 with an Intel SSD.

    top - 21:17:05 up 12 days, 5:19, 24 users, load average: 3.52, 1.88, 1.24
    Tasks: 359 total, 2 running, 355 sleeping, 2 stopped, 0 zombie
    Cpu0 : 0.0%us, 4.9%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
    Cpu1 : 1.0%us, 4.9%sy, 0.0%ni, 93.1%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
    Cpu2 : 0.0%us, 84.2%sy, 0.0%ni, 0.0%id, 14.9%wa, 1.0%hi, 0.0%si, 0.0%st
    Cpu3 : 0.0%us, 5.9%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 7928160k total, 7644644k used, 283516k free, 104k buffers
    Swap: 8910844k total, 1056624k used, 7854220k free, 5729560k cached
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3112 root 20 0 0 0 0 R 65.2 0.0 3:10.81 flush-btrfs-4
    30864 root 20 0 0 0 0 S 18.8 0.0 0:01.83 kworker/2:0
    397 root 20 0 15504 1552 1016 R 2.0 0.0 0:00.07 top
    31031 root 20 0 0 0 0 S 1.0 0.0 0:00.81 kworker/1:1

Can you try this git tree and tell me if it works better?
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

It would be easier with a kernel-next.rpm build. Tried:

    commit af1e297c1beea9d424c09ec0b120226c6b21680d
    Author: Josef Bacik <josef>
    Date: Fri Jun 8 15:26:47 2012 -0400

and it works pretty well now, like kernel-3.3.6-3.fc16.x86_64; I do not see a difference. Sometimes there are lock-ups, but only up to ~3 seconds. But kernel-3.3.6-3.fc16.x86_64 has also worked pretty well now compared to Comment 5: Comment 5 was on a machine that had been running for 12 days, using swap etc.; on a freshly booted box I cannot reproduce it. kernel-3.3.6-3.fc16.x86_64 seems to be better than the Comment 0 kernel, kernel-3.1.2-1.fc16.x86_64. Also, I do not have a comparable ext4 box.
Two cases where the system was not responsive during mock --install on the upstream Git snapshot kernel:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    19881 root 20 0 0 0 0 D 4.0 0.0 0:04.40 kworker/1:0
    13981 root 20 0 0 0 0 S 2.6 0.0 0:02.22 kworker/2:1
    1656 root 20 0 0 0 0 S 1.4 0.0 0:12.56 flush-btrfs-4
    12012 root 20 0 0 0 0 D 1.2 0.0 0:03.61 btrfs-endio-wri
    16584 root 20 0 118m 1284 1052 D 1.2 0.0 0:00.94 tar
    18564 root 20 0 0 0 0 S 1.2 0.0 0:07.15 kworker/0:2
    18732 root 20 0 0 0 0 D 1.2 0.0 0:01.62 btrfs-endio-wri
    16348 root 20 0 0 0 0 S 1.1 0.0 0:03.27 kworker/3:2
    12510 root 20 0 123m 19m 7668 S 1.0 0.2 0:39.05 Xorg
    18733 root 20 0 0 0 0 D 1.0 0.0 0:01.40 btrfs-endio-wri
    16585 root 20 0 35724 568 460 S 0.8 0.0 0:02.31 pigz
    18734 root 20 0 0 0 0 D 0.8 0.0 0:01.38 btrfs-endio-wri
    12007 root 20 0 0 0 0 S 0.7 0.0 0:02.52 btrfs-worker-2
    14005 root 20 0 0 0 0 D 0.7 0.0 0:01.90 btrfs-endio-wri
    16355 root 20 0 0 0 0 D 0.4 0.0 0:01.80 btrfs-endio-wri
    554 root 20 0 0 0 0 S 0.3 0.0 0:03.26 btrfs-submit-1

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    15007 root 20 0 0 0 0 S 6.5 0.0 0:01.13 kworker/2:0
    12012 root 20 0 0 0 0 S 2.0 0.0 0:04.02 btrfs-endio-wri

Tried to write out a 2 GB file on the Intel SSD with mencoder, and the system was locked up for ~15 minutes, maybe 20 minutes:

    kernel-3.4.7-1.fc16.x86_64
    top - 17:36:53 up 8:40, 21 users, load average: 14.25, 7.59, 3.28
    Tasks: 352 total, 4 running, 348 sleeping, 0 stopped, 0 zombie
    Cpu0 : 4.0%us, 19.8%sy, 0.0%ni, 40.6%id, 32.7%wa, 0.0%hi, 3.0%si, 0.0%st
    Cpu1 : 0.0%us, 15.8%sy, 0.0%ni, 56.4%id, 26.7%wa, 0.0%hi, 1.0%si, 0.0%st
    Cpu2 : 0.0%us, 50.0%sy, 0.0%ni, 37.0%id, 13.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu3 : 0.0%us, 16.0%sy, 0.0%ni, 75.0%id, 9.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 7928068k total, 7726608k used, 201460k free, 48k buffers
    Swap: 8910844k total, 112608k used, 8798236k free, 5940344k cached
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1169 root 20 0 0 0 0 R 22.8 0.0 0:58.49 flush-btrfs-4
    5436 root 20 0 0 0 0 R 9.9 0.0 0:06.05 kworker/2:0
    5446 lace 20 0 326m 35m 4752 D 4.0 0.5 0:20.03 mencoder
    5408 root 20 0 0 0 0 S 3.0 0.0 0:05.45 kworker/3:1

Moreover, even after the file was closed/written at 17:41:20.200789259, the system was still unusable until 17:47:34, i.e. for ~6 more minutes. All applications were locked up most of the time.

GDB builds are also too slow:

    $ cat /proc/`pidof ranlib`/stack
    [<ffffffffa0172715>] wait_current_trans+0xa5/0x110 [btrfs]
    [<ffffffffa0173cc8>] start_transaction+0x128/0x330 [btrfs]
    [<ffffffffa01741b3>] btrfs_start_transaction+0x13/0x20 [btrfs]
    [<ffffffffa017f70d>] btrfs_rename+0x15d/0x680 [btrfs]
    [<ffffffff81190aa6>] vfs_rename+0x2f6/0x4b0
    [<ffffffff81193d33>] sys_renameat+0x1f3/0x220
    [<ffffffff81193d7b>] sys_rename+0x1b/0x20
    [<ffffffff816047e9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

kernel-3.4.7-1.openvpn.fc16.x86_64 (rebuilt with the Bug 834808 Comment 4 patch for nfs-openvpn lock-ups)

Created attachment 622824
xz of libbfd.a
Reproducer:
kernel-3.5.4-2.fc17.x86_64
$ sync; time cp -p /tmp/libbfd.a /tmp/libbfd2.a; time sync
btrfs SSD i7-2620M: real 0m0.076s + real 1m7.290s
ext4 HDD i7-920 : real 0m0.343s + real 0m3.190s
$ sync; time ranlib /tmp/libbfd.a; time sync
btrfs SSD i7-2620M: real 1m6.679s + real 0m1.291s
ext4 HDD i7-920 : real 0m0.364s + real 0m2.335s
ranlib hangs and stops the whole system on:

    rename("/tmp/stbkRwSX", "/tmp/libbfd.a"

# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug; a fresh bug referencing this bug in the comment is sufficient.)

With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.

In this mailing list thread you claim that this bug was closed "without any human reply", but the only reason it was closed is that you did not post to say whether or not it is still a bug.
https://lists.fedoraproject.org/pipermail/devel/2013-January/176334.html
If it is still a bug, it definitely needs to be fixed. Can you please retest?

I retested it in Comment 5 and Comment 8 and there was no change. Automatically asking for mass retesting of all open bugs (see, for example, Comment 3) is "not friendly" on the part of the kernel maintainers. Retesting makes sense if the maintainer is aware of a specific fix (which s/he should mention) that could fix the bug. I even provided an easy, self-contained reproducer in Comment 12 so that the maintainer can use it to retest on his/her own. So there should be no retest request to the reporter until the maintainer believes s/he has fixed it. S/he needs a reproducer to verify the fix anyway. Here there was not even a confirmation that Comment 12 makes the bug reproducible for the maintainer. And I cannot even (easily) test it anymore, as I had to switch both my boxes back to ext4: I do not have time to wait 15+ minutes for a simple compilation.
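The ranlib half of the Comment 12 reproducer can be approximated without binutils. The blocked call rename("/tmp/stbkRwSX", "/tmp/libbfd.a" and the stack showing btrfs_rename() waiting in wait_current_trans() indicate that ranlib writes a temporary file and then renames it over the archive. The following is a minimal, hypothetical Python sketch (not from this report; it assumes Python 3.3+ on Linux for os.sync()) that mimics that write-then-rename pattern and times each phase separately, so a slow rename() can be told apart from a slow sync. The helper name is made up, and unlike ranlib it only rewrites the bytes it is given rather than rebuilding the archive index.

```python
import os
import tempfile
import time


def time_replace_via_rename(target: str, data: bytes) -> dict:
    """Hypothetical helper (not part of binutils): mimic ranlib's
    write-temp-file-then-rename pattern and time each phase in seconds."""
    directory = os.path.dirname(os.path.abspath(target))
    timings = {}

    start = time.monotonic()
    # Create the temp file next to the target so rename() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp() creates the file as 0600; keeping the old file's permission
        # bits is an assumption of this sketch, not something ranlib was shown doing.
        if os.path.exists(target):
            os.chmod(tmp_path, os.stat(target).st_mode & 0o7777)
        timings["write"] = time.monotonic() - start

        start = time.monotonic()
        # In the report, ranlib was blocked in this step (btrfs_rename -> wait_current_trans).
        os.rename(tmp_path, target)
        timings["rename"] = time.monotonic() - start
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    start = time.monotonic()
    # System-wide flush, the same role as the trailing "time sync" in the reproducer.
    os.sync()
    timings["sync"] = time.monotonic() - start
    return timings
```

Calling it with the same input file on a btrfs /tmp and on an ext4 /tmp, e.g. time_replace_via_rename("/tmp/libbfd.a", open("/tmp/libbfd.a", "rb").read()), gives separate write, rename and sync figures for comparison with the ranlib timings above.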