Hide Forgot
Description of problem: Thie is really weird. If I have, say, a CD ISO image on the disk, I can read it with disk speed, about 30MB/s on my hardware, the disk light in on during the read process. Unfortunately, if both of the following is true: 1. the file is mounted with lookback option 2. ppp interface had been configured since boot time then reading the files becaomes terribly slow, less than 2MB/s the disk light is barely blinking; vmstat and ioststa shows a lot of time spent in iowait Version-Release number of selected component (if applicable): Linux version 2.6.22.9-61.fc6 ( kernel-2.6.22.9-61.fc6.i686 ) ppp-2.4.4-1.fc6.i386 How reproducible: Always in my setup; this is P4 Compaq D51S machine with 256MB of RAM, running Fedora Core 6 with KDE. Steps to Reproduce: 1. create test ISO image of 100MiB size as follows: mkdir tmp_test_cd cd tmp_test_cd mkdir CD dd if=/dev/zero bs=1M count=10 of=CD/NULLs_1Mi.f00 cd CD/ for x in $(perl -e 'for ("01" .. "10") { print "$_\n"}'); do ln NULLs_1Mi.f00 NULLs_1Mi.f$x; done cd .. mkisofs -r -q -no-cache-inodes CD > CD_100Mi.iso 2. mount it as loopback (as root) : mkdir /mnt/loop /mnt/loop/CD mount -o ro,loop tmp_test_cd/CD_100MI.iso /mnt/loop/CD 3. check transfer rate : time md5sum -b tmp_test_cd/CD_100MI.iso 4. have another Linux PC connected via null modem cable (connecting two serial ports) 5. run on this machine pppd /dev/ttyS1 115200 crtscts nodefaultroute lock 10.0.1.11:10.0.1.21 deflate 12 and on the other machine pppd /dev/ttyS0 115200 crtscts nodefaultroute lock 10.0.0.21:10.0.0.11 deflate 12 passive (NOTE: ttyS0 or ttyS1 should reflect to which serial port the cable is connected) 6. check again reading the file time md5sum -b tmp_test_cd/CD_100MI.iso as well as reading files on mountes filesystem md5sum -b /mnt/loop/CD/* which is even slower, below 1MB/s During read process vmstat shows like this procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 3 126136 3236 2920 63604 1 1 204 39 96 188 2 1 96 2 0 0 2 126136 4508 2920 62164 0 0 1154 0 108 935 8 2 0 90 0 1 3 126136 3984 2924 62716 0 0 2054 5 114 838 3 3 0 95 0 0 3 126132 3576 2928 62988 0 0 2259 0 121 825 12 2 0 86 0 0 2 126132 4164 2836 62392 0 0 896 0 110 1115 14 3 0 84 0 0 3 126132 2960 2828 63704 0 0 3331 4 110 1175 4 3 0 94 0 0 2 126132 4140 2796 62492 0 0 896 0 98 599 3 1 0 96 0 0 3 126132 3060 2956 62904 122 0 1986 10 173 994 4 3 0 94 0 0 3 126132 3832 2924 62204 0 0 2178 0 105 892 3 1 0 95 0 0 2 126132 4196 2808 61932 0 0 2560 15 105 918 3 2 0 95 0 0 3 126132 4072 2808 62092 0 0 2178 0 103 842 4 2 0 95 0 0 3 126132 3060 2804 62984 0 0 1664 0 101 797 4 1 0 96 0 please note that we have 94% - 96% in iowait state. Actual results: Intolerably slow access to files in mounted loopback system and to the mounted image file. If I kill pppd and remove via rmmod all modules I could find relating to pppd, reading is still very slow. The only way I could find get work it with normal speed is to reboot the system (and do not configure ppp) Expected results: Should be able to read files from fikleysstem mounted via loopback at normal speed even if ppp is configured. On by setup loopback mount is almost unusable when I use pppd with null modem connection to another Linux machine. Additional info: nothing relevant in /var/log/messages or dmesg output
(This is a mass-update to all current FC6 kernel bugs in NEW state) Hello, I'm reviewing this bug list as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug, however this version of Fedora is no longer maintained. Please attempt to reproduce this bug with a current version of Fedora (presently Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a few days if there is no further information lodged. Thanks for using Fedora!
It seems that the real factor deciding whether the problem occurs is whether KDE environment is started or not. I have missed this initially since I have KDE running most of the time. I have reproduced the problem on Fedora 8 with kernel 2.6.23.9-85.fc8. The interesting thing is that while the problem is also evident on FedoraCore 6 with kernel kernel-2.6.22.14-72.fc6.i686, the original FC6 kernel kernel-2.6.18-1.2798.fc6.i686 is perfectly OK. I will follow with more details later today.
Created attachment 293492 [details] iso 9660 image for testing and command to recreate it File CD_700M.iso_gen.tgz contains: mk_tmp_test_largecd.bash_hist - bash commnand to recreate file CD_700Mi.iso CD_700Mi_files_LRGCD.tar.gz.gz.gz - tar of LRGCD directory, compressed 3 times CD_700Mi.iso.gz.gz.gz - 700MiB CD ISO image, compressed 3 times CD_700Mi.iso.md5 - MD5 of decompressed ISO image
Created attachment 293497 [details] bash command to do tests and results on Fedora8 File CD_700M_Fedora8_tests.tgz contain the following files: 080128_CD700loop_F8.bash_hist - command to run tests; path needs to be corrected to one the image is actually installed 080128_CD700loop_F8_mounted1.ddlog - screen log of running commnands 080128_CD700loop_F8_NOTmounted2_KDE.iostat5 - results of iostat -x 5 5 080128_CD700loop_F8_NOTmounted2_KDE.ddlog - screen log of running commands 080128_CD700loop_F8_mounted3_KDE.iostat5 - results of iostat -x 5 5 080128_CD700loop_F8_mounted3_KDE.ddlog - screen log of running commands *mounted1* is with only GDM running (no KDE) *NOTmouned2* is with KDE running but without mounting image *mounted3* is with KDE running and image loop mounted
I tend to believe that the problem I experience is related to new 'tickless' (nohz) kernel feature. This is because - 2.6.18 Fedora6 kernel is OK, and it does not have nohz - 2.6.22, 2.6.23 Fedora kernels on P4 machines I tried, where nohz actually is turned on (verified by grep timer /proc/interrups returning small almost constant number) - there is problem here On my Fedora8 P4 machine (256MB RAM, i8xx based mobo) it is enough to start KDE with 2 konsole-s, 4 bash session each and access to image mounted via loopback is most of the time about 1.3 - 1.5 MB/s down from 25-30MB/s expected. On my Fedora6 P4 machine (256MB RAM, same mobo) and 2.6.22 Fedora kernel sometimes it is enough to start KDE (I have about 10konsoles with about 60 bash sessions total, plus konqueror, tsclient, some CIFS mounts) sometimes I also need to start pppd over serial cable - 2.6.22 on my home machine with Fedora Core 6 does have nohz (tickless) feature, but somehow it does not get enabled, which is verified by grep timer /proc/interrupts returning number increasing at 1000/second rate (that machine is old PIII system, Intel BX440 based motherboard) Notes: - I can see that even without KDE running Fedora8 is about 2 times slower than Fedora6, when accessing the image file. - If the tests are being performed on machine with more than 700 MB of RAM, I would suggest that larger image be created or that pagecache be freed between tests via echo 1 > /proc/sys/vm/drop_caches - Also, If CPU is slower than P4, you might try replacing md5sum with wc -c. - I attempted to disable nohz on P4 machine (Fedora8) by passing nohz=off in kernel parameters from grub, but that did not disable nohz feature.
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug and will try and assist you in resolving it if I can. (In reply to comment #5) > - I attempted to disable nohz on P4 machine (Fedora8) by passing > nohz=off > in kernel parameters from grub, but that did not disable nohz feature. Have you tried editing the kernel parameters directly in grub.conf?
Ok, to be usre I tried today with most recent Fedora kernel; The grub.conf entry I was using was title Fedora nohz=off (2.6.23.15-137.fc8) root (hd0,2) kernel /boot/vmlinuz-2.6.23.15-137.fc8 ro root=LABEL=/ rhgb quiet nohz=off initrd /boot/initrd-2.6.23.15-137.fc8.img Output of dmesg command contains the line Kernel command line: ro root=LABEL=/ rhgb quiet nohz=off grep timer /proc/interrupts show constant number during the test, while I expected that nohz=off would result in a number of timer interrupts increasing at a 1000/second rate It seems like nohz=off gets ignored. Anyway, the problem is still present in this kernel. starting KDE results in read data transfer rate for a mounted loopback file to drop from 32MiB/s to 1.4MiB/s (more than 20 times slower). Should I attach more details? Wojtek
(In reply to comment #7) > grep timer /proc/interrupts > show constant number during the test, > while I expected that nohz=off would result > in a number of timer interrupts increasing at a 1000/second rate > It seems like nohz=off gets ignored. > Timers run from the local apic sometimes. Is the "LOC" number increasing?
(In reply to comment #8) > (In reply to comment #7) > > grep timer /proc/interrupts > > show constant number during the test, > > while I expected that nohz=off would result > > in a number of timer interrupts increasing at a 1000/second rate > > It seems like nohz=off gets ignored. > > > > Timers run from the local apic sometimes. Is the "LOC" number increasing? > Yes, LOC number is increasing. I did not pay attention to it (and I cannot access that machine today), so I cannot tell at what rate (but the order of magnitude is same as or close to that of 1000/s) I will attach dmesg output and result of cat /proc/timer_list that I saved. Perhaps the problems with data transfer rate when reading from loop-back mounted images are unrelated to dynticks/nohz?
Created attachment 295551 [details] a few snapshots of /proc/interrupts Actually I have saved enough information to compute LOC rate - it is exactly 1000/s (+-2%). But it is the same on some other machine (2CPU PIII server running FC6 and 2.6.22.14-72.fc6 kernel) which do not has nohz=off kernel parameter specified. So I am not sure is LOC number rate can be used to determine whether nohz=off was effective?
Created attachment 295552 [details] dmesg output Output from running dmesg commnad on the machine the tests were running on.
Created attachment 295553 [details] Contents of /proc/timer_list In case it could be useful, I am enclosing results of cat /proc/timer_list. The contents was taken before doing transfer tests (same for other files attached)
please post output of 'lspci -vnn' from the machine that is having the problem.
Created attachment 298786 [details] lspci output of two machines I am attaching output of lspci with different options. lspci has been run on two almost identical machines (Compaq Deskpro EN SFF, with P4 and 256 MB RAMl; One had motherboard exchanged after failure and has possibly slighttly different SCSI adapter (not in active use, anyway) I am not sure if you can trust lspci output completely - lspci -vvn shows different IRQ values from lspci -vvn -F xxx.lspci-x run on result of lcpci -x > xxx.lspci-x Anyway I assumed that you meant lspci -vvn rather than lspci -vnn, since lspci -vnn and lspci -vn have produced identical output. (On ther other hand lspci -vvvn and lspci -vvn also have produced identical output) Wojtek
I have moved to a newer and faster machine and installed Fedora 8 on that; The system behaves normally. no such problem as described. I have KDE running, serial link active. Move that to X files?
Closing: it got fixed but we don't know why or how.