Description of problem:
I installed my test server on top of btrfs, but it is not usable because the testsuites are timing out.
Version-Release number of selected component (if applicable):
kernel-3.1.2-1.fc16.x86_64
How reproducible:
Steadily.
Steps to Reproduce:
    nice -n20 ionice -c3 du &>/dev/null & time ls -l /bin
Actual results:
    real 0m12.114s
Waiting 10+ seconds for a command is common. The btrfs processes also seem to use a lot of CPU:
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     1441 root      20   0     0    0    0 R 56.9  0.0  28:49.98 btrfs-endio-wri
      629 root      20   0     0    0    0 R 54.9  0.0  23:18.88 btrfs-delayed-m
    29126 root      20   0     0    0    0 R 53.0  0.0   9:30.76 flush-btrfs-4
FYI, this is on top of LUKS on an HDD. My other machine with an SSD runs fine, but it is not the test server.
Expected results:
less than 1 sec
Additional info:
I was using ext3 on F14 before, and with nice/ionice the system was perfectly usable even for interactive work. The machine was more usable even without nice/ionice. A GDB testsuite run (3 parallel runs) now gets 13 (!) timeout results. I will have to reinstall it onto ext4, as the results are not usable this way. Thanks for all the work, of course; this is FYI in case you have some patches.
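A single `time ls -l /bin` gives only one sample, so a small wrapper that repeats the measurement may make the numbers easier to compare between kernels. The following is only a sketch: `measure_latency` is a hypothetical helper name, not something from the report, and it assumes GNU `date` (for the `%N` nanosecond format).

```shell
# Hypothetical helper (not part of the original report): run a command N times
# and print the wall-clock time of each run in milliseconds.
# Assumption: GNU date, which supports %N (nanoseconds).
measure_latency() (
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1          # output is discarded, only the timing matters
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
)

# Possible usage, with the background load from the steps to reproduce:
#   nice -n20 ionice -c3 du >/dev/null 2>&1 &
#   measure_latency 5 ls -l /bin
```

The function body runs in a subshell, so its variables do not leak into the calling shell.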
I found that even in single user mode
    cp -ax --sparse=always / /new/ &
runs at ~5MB/s (the disk does ~100MB/s), copying my 600GB btrfs->ext4 for 2 days(!).
    top - 14:45:33 up 14:43, 2 users, load average: 2.75, 2.93, 2.97
    Mem: 5982924k total, 5718436k used, 264488k free, 381000k buffers
    Swap: 0k total, 0k used, 0k free, 4793812k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     1529 root      20   0     0    0    0 R 27.3  0.0 204:52.57 btrfs-delayed-m
      618 root      20   0     0    0    0 R 10.3  0.0  41:29.13 btrfs-transacti
     4106 root      20   0     0    0    0 S  7.7  0.0   0:46.03 kworker/6:3
     1483 root      20   0  127m  11m  500 S  2.6  0.2  21:07.71 cp
      610 root      20   0     0    0    0 S  2.3  0.0  11:39.47 btrfs-endio-0
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
kernel-3.3.6-3.fc16.x86_64
Still poor performance; this time mock --install of many packages made the machine unresponsive for several minutes on an X220 with an Intel SSD.
    top - 21:17:05 up 12 days, 5:19, 24 users, load average: 3.52, 1.88, 1.24
    Tasks: 359 total, 2 running, 355 sleeping, 2 stopped, 0 zombie
    Cpu0 : 0.0%us, 4.9%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
    Cpu1 : 1.0%us, 4.9%sy, 0.0%ni, 93.1%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
    Cpu2 : 0.0%us, 84.2%sy, 0.0%ni, 0.0%id, 14.9%wa, 1.0%hi, 0.0%si, 0.0%st
    Cpu3 : 0.0%us, 5.9%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 7928160k total, 7644644k used, 283516k free, 104k buffers
    Swap: 8910844k total, 1056624k used, 7854220k free, 5729560k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     3112 root      20   0     0    0    0 R 65.2  0.0   3:10.81 flush-btrfs-4
    30864 root      20   0     0    0    0 S 18.8  0.0   0:01.83 kworker/2:0
      397 root      20   0 15504 1552 1016 R  2.0  0.0   0:00.07 top
    31031 root      20   0     0    0    0 S  1.0  0.0   0:00.81 kworker/1:1
Can you try this git tree and tell me if it works better?
    git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
It would be easier with a kernel-next.rpm build.
Tried:
    commit af1e297c1beea9d424c09ec0b120226c6b21680d
    Author: Josef Bacik <josef>
    Date: Fri Jun 8 15:26:47 2012 -0400
It works pretty well now, the same as kernel-3.3.6-3.fc16.x86_64; I do not see a difference. Sometimes there are lock-ups, but only up to ~3 seconds. However, kernel-3.3.6-3.fc16.x86_64 itself now works pretty well compared to Comment 5. Comment 5 was on a machine that had been running for 12 days, using swap etc.; on a freshly booted box I cannot reproduce it. kernel-3.3.6-3.fc16.x86_64 seems to be better than the Comment 0 kernel-3.1.2-1.fc16.x86_64. Also, I do not have a comparable ext4 box.
Two cases where the system was not very responsive during mock --install on the upstream GIT snapshot kernel:
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    19881 root      20   0     0    0    0 D  4.0  0.0   0:04.40 kworker/1:0
    13981 root      20   0     0    0    0 S  2.6  0.0   0:02.22 kworker/2:1
     1656 root      20   0     0    0    0 S  1.4  0.0   0:12.56 flush-btrfs-4
    12012 root      20   0     0    0    0 D  1.2  0.0   0:03.61 btrfs-endio-wri
    16584 root      20   0  118m 1284 1052 D  1.2  0.0   0:00.94 tar
    18564 root      20   0     0    0    0 S  1.2  0.0   0:07.15 kworker/0:2
    18732 root      20   0     0    0    0 D  1.2  0.0   0:01.62 btrfs-endio-wri
    16348 root      20   0     0    0    0 S  1.1  0.0   0:03.27 kworker/3:2
    12510 root      20   0  123m  19m 7668 S  1.0  0.2   0:39.05 Xorg
    18733 root      20   0     0    0    0 D  1.0  0.0   0:01.40 btrfs-endio-wri
    16585 root      20   0 35724  568  460 S  0.8  0.0   0:02.31 pigz
    18734 root      20   0     0    0    0 D  0.8  0.0   0:01.38 btrfs-endio-wri
    12007 root      20   0     0    0    0 S  0.7  0.0   0:02.52 btrfs-worker-2
    14005 root      20   0     0    0    0 D  0.7  0.0   0:01.90 btrfs-endio-wri
    16355 root      20   0     0    0    0 D  0.4  0.0   0:01.80 btrfs-endio-wri
      554 root      20   0     0    0    0 S  0.3  0.0   0:03.26 btrfs-submit-1
and:
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    15007 root      20   0     0    0    0 S  6.5  0.0   0:01.13 kworker/2:0
    12012 root      20   0     0    0    0 S  2.0  0.0   0:04.02 btrfs-endio-wri
Tried to write out a 2GB file on an Intel SSD with mencoder, and the system was locked up for ~15 minutes, maybe 20 minutes:
kernel-3.4.7-1.fc16.x86_64
    top - 17:36:53 up 8:40, 21 users, load average: 14.25, 7.59, 3.28
    Tasks: 352 total, 4 running, 348 sleeping, 0 stopped, 0 zombie
    Cpu0 : 4.0%us, 19.8%sy, 0.0%ni, 40.6%id, 32.7%wa, 0.0%hi, 3.0%si, 0.0%st
    Cpu1 : 0.0%us, 15.8%sy, 0.0%ni, 56.4%id, 26.7%wa, 0.0%hi, 1.0%si, 0.0%st
    Cpu2 : 0.0%us, 50.0%sy, 0.0%ni, 37.0%id, 13.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu3 : 0.0%us, 16.0%sy, 0.0%ni, 75.0%id, 9.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 7928068k total, 7726608k used, 201460k free, 48k buffers
    Swap: 8910844k total, 112608k used, 8798236k free, 5940344k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     1169 root      20   0     0    0    0 R 22.8  0.0   0:58.49 flush-btrfs-4
     5436 root      20   0     0    0    0 R  9.9  0.0   0:06.05 kworker/2:0
     5446 lace      20   0  326m  35m 4752 D  4.0  0.5   0:20.03 mencoder
     5408 root      20   0     0    0    0 S  3.0  0.0   0:05.45 kworker/3:1
Moreover, even after the file was closed/written at 17:41:20.200789259, the system was still unusable until 17:47:34, i.e. for ~6 minutes. All applications were locked up most of the time.
GDB builds are also too slow:
    $ cat /proc/`pidof ranlib`/stack
    [<ffffffffa0172715>] wait_current_trans+0xa5/0x110 [btrfs]
    [<ffffffffa0173cc8>] start_transaction+0x128/0x330 [btrfs]
    [<ffffffffa01741b3>] btrfs_start_transaction+0x13/0x20 [btrfs]
    [<ffffffffa017f70d>] btrfs_rename+0x15d/0x680 [btrfs]
    [<ffffffff81190aa6>] vfs_rename+0x2f6/0x4b0
    [<ffffffff81193d33>] sys_renameat+0x1f3/0x220
    [<ffffffff81193d7b>] sys_rename+0x1b/0x20
    [<ffffffff816047e9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
kernel-3.4.7-1.openvpn.fc16.x86_64 (rebuilt with the Bug 834808 Comment 4 patch for nfs-openvpn lock-ups)
Created attachment 622824
xz of libbfd.a
Reproducer:
kernel-3.5.4-2.fc17.x86_64
    $ sync; time cp -p /tmp/libbfd.a /tmp/libbfd2.a; time sync
    btrfs SSD i7-2620M: real 0m0.076s + real 1m7.290s
    ext4 HDD i7-920   : real 0m0.343s + real 0m3.190s
    $ sync; time ranlib /tmp/libbfd.a; time sync
    btrfs SSD i7-2620M: real 1m6.679s + real 0m1.291s
    ext4 HDD i7-920   : real 0m0.364s + real 0m2.335s
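For repeated runs, the "sync, timed command, timed sync" pattern of the reproducer could be wrapped as below. This is only a sketch: `timed_with_sync` is a hypothetical name, and GNU `date` with `%N` support is assumed.

```shell
# Hypothetical wrapper (not part of the original reproducer): flush pending
# writes, time the given command, then time the sync that follows it.
# Assumption: GNU date, which supports %N (nanoseconds).
timed_with_sync() (
    sync                              # start from a clean state, as in the reproducer
    t0=$(date +%s%N)
    "$@"
    t1=$(date +%s%N)
    sync                              # writeback of whatever the command dirtied
    t2=$(date +%s%N)
    echo "command: $(( (t1 - t0) / 1000000 )) ms"
    echo "sync: $(( (t2 - t1) / 1000000 )) ms"
)

# Possible usage, mirroring the two cases above:
#   timed_with_sync cp -p /tmp/libbfd.a /tmp/libbfd2.a
#   timed_with_sync ranlib /tmp/libbfd.a
```

Reporting the two times separately keeps the distinction visible in the numbers above: with cp the delay shows up in the following sync, with ranlib in the command itself.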
ranlib hangs and stops the whole system on: rename("/tmp/stbkRwSX", "/tmp/libbfd.a"
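To check whether the rename alone is enough to trigger the stall, without ranlib or the attachment, the pattern visible in the call above (a freshly written temporary file renamed over the target) could be imitated roughly as follows. This is an untested sketch: `time_rename_over` is a hypothetical helper, GNU `date` and `mktemp` are assumed, and whether this reproduces the hang is not confirmed anywhere in this report.

```shell
# Hypothetical helper: copy the target to a fresh temp file in the same
# directory, then time renaming the temp file over the target.
# Assumptions: GNU date (%N) and a mktemp that accepts a path template.
time_rename_over() (
    target=$1
    tmp=$(mktemp "$(dirname "$target")/stXXXXXX") || exit 1
    cp -p "$target" "$tmp" || exit 1  # freshly written, not yet flushed data
    t0=$(date +%s%N)
    mv -f "$tmp" "$target" || exit 1  # same directory, so mv uses rename(2)
    t1=$(date +%s%N)
    echo "rename: $(( (t1 - t0) / 1000000 )) ms"
)

# Possible usage:
#   time_rename_over /tmp/libbfd.a
```

The target keeps its content, so the helper can be run repeatedly on the same file.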
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
In this mailing list thread you claim that this bug was closed "without any human reply", but the only reason it was closed is that you did not post to say whether or not it is still a bug.
https://lists.fedoraproject.org/pipermail/devel/2013-January/176334.html
If it is still a bug, it definitely needs to be fixed. Can you please retest?
I retested it in Comment 5 and Comment 8 and there was no change. Automatically asking for mass retesting of all the open Bugs (see for example Comment 3) is "not friendly" of the kernel maintainers. Retesting makes sense if the maintainer is aware of a specific fix (which s/he should mention) that could fix the bug. I even provided an easy self-contained reproducer in Comment 12 so that the maintainer can retest it on his/her own. So there should be no retest request to the reporter until the maintainer believes s/he has fixed it. S/he needs a reproducer for the fix anyway. There was not even a confirmation that Comment 12 makes the bug reproducible for the maintainer. And I cannot even (easily) test it anymore, as I had to switch both my boxes back to ext4 because I do not have time to wait 15+ minutes for a simple compilation.