Description of problem:
IDE performance down by 40% in FC5 vs FC3. Logged against hal only because killing hald alleviates the regression - but it's a userspace issue and I have no clue as to which component this should be assigned to.
Version-Release number of selected component (if applicable): n/a
How reproducible: 100%
Steps to Reproduce:
1. boot Fedora or vanilla kernel >2.6.15 in FC5
2. run hdparm -t on an IDE drive
3. reboot into FC3 and do the same (preferably with the same vanilla kernel as in step 1 so it clearly shows up as a userspace issue)
Actual results:
FC3 hdparm: 33-35MB/s
FC5 hdparm: 17-19MB/s
Expected results:
FC5 hdparm: 33-35MB/s
Additional info:
Booting into /sbin/init -b on FC5 and manually creating the device node for /dev/hda yields 33-35MB/s as expected.
Discussion on lkml: http://www.ussg.iu.edu/hypermail/linux/kernel/0604.1/0046.html
Hi, I'm experiencing the same problem here, on an Asus W1N laptop with a 2.5" Toshiba IDE HD.
# With hald running
hdparm -t /dev/hdb
Timing buffered disk reads: 74 MB in 3.01 seconds = 24.62 MB/sec
# Without hald running
/dev/hdb:
Timing buffered disk reads: 104 MB in 3.01 seconds = 34.50 MB/sec
With the FC5 kernel (tested from the FC5 release to the current 2.6.16-1.2111_FC5) and haldaemon running, the system load average is always too high (from 0.30 to 0.80). I tested in runlevel 1 too, with only haldaemon running; without hal running the load goes to 0.00. HAL definitely drags down system performance...
I ran HAL with
# strace hald --daemon=no
but I can't see any problem there. I also tried the rawhide kernel (2.6.16-1.2204_FC6 = 2.6.17rc4-git3), which is also slow. I haven't tried the vanilla kernel yet, but that is the next step.
A brief description of my system follows (let me know if you need more detail):
kernel-2.6.16-1.2111_FC5
glibc-headers-2.4-8
glibc-kernheaders-3.0-5.2
glibc-common-2.4-8
glibc-devel-2.4-8
glibc-2.4-8
hal-devel-0.5.7-3.fc5.1
hal-gnome-0.5.7-3.fc5.1
hal-cups-utils-0.5.5-1.2.fc5.2
hal-0.5.7-3.fc5.1
dbus-0.61-3.fc5.1
dbus-glib-0.61-3.fc5.1
dbus-python-0.61-3.fc5.1
dbus-x11-0.61-3.fc5.1
dbus-sharp-0.61-3.fc5.1
dbus-devel-0.61-3.fc5.1
# /etc/modprobe.conf
alias eth0 skge
alias eth1 ipw2200
alias snd-card-0 snd-intel8x0
options snd-card-0 index=0
options snd-intel8x0 index=0
remove snd-intel8x0 { /usr/sbin/alsactl store 0 >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-intel8x0
# lsmod
vmnet 28964 3
parport_pc 25445 0
vmmon 167692 0
fglrx 458880 7
intermodule 4293 1 fglrx
ipv6 225697 18
rfcomm 34517 0
l2cap 23617 5 rfcomm
bluetooth 44069 4 rfcomm,l2cap
tun 11073 0
dm_mirror 19985 0
dm_mod 50905 1 dm_mirror
video 14917 0
button 6609 0
battery 9285 0
asus_acpi 11221 0
ac 4933 0
lp 12297 0
parport 34313 2 parport_pc,lp
nvram 8393 0
tuner 46457 0
ehci_hcd 29005 0
uhci_hcd 28881 0
ohci1394 31749 0
joydev 9473 0
ieee1394 288665 1 ohci1394
snd_intel8x0m 16077 0
snd_intel8x0 30301 14
snd_seq_dummy 3781 0
snd_ac97_codec 83937 2 snd_intel8x0m,snd_intel8x0
snd_ac97_bus 2497 1 snd_ac97_codec
snd_seq_oss 28993 0
snd_seq_midi_event 7105 1 snd_seq_oss
snd_seq 47153 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device 8909 3 snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss 45009 0
saa7134 106401 0
video_buf 21317 1 saa7134
compat_ioctl32 1473 1 saa7134
v4l2_common 7745 2 tuner,saa7134
v4l1_compat 11973 1 saa7134
ir_kbd_i2c 8269 1 saa7134
ir_common 9413 2 saa7134,ir_kbd_i2c
videodev 9409 1 saa7134
ipw2200 95633 0
ieee80211 28681 1 ipw2200
ieee80211_crypt 6081 1 ieee80211
snd_mixer_oss 16449 13 snd_pcm_oss
snd_pcm 76869 4 snd_intel8x0m,snd_intel8x0,snd_ac97_codec,snd_pcm_oss
skge 34897 0
snd_timer 22597 2 snd_seq,snd_pcm
snd 50501 14 snd_intel8x0m,snd_intel8x0,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
i2c_i801 8525 0
i2c_core 20673 4 tuner,saa7134,ir_kbd_i2c,i2c_i801
soundcore 9377 13 snd
snd_page_alloc 10441 3 snd_intel8x0m,snd_intel8x0,snd_pcm
ext3 116169 2
jbd 52693 1 ext3
My system hardware details follow:
CPU: Intel Centrino 1.8 GHz
RAM: 1 GB
Video: ATI Radeon 9700 running with fglrx ati driver
HD: 80 GB Toshiba 5400RPM
Vendor Url: http://www.asus.com/products4.aspx?l1=5&l2=22&l3=123&model=16&modelmenu=1
Best Regards
PS. Ciao Alessandro ;-)
I discovered the following: if I insert media in the DVD reader, the load average caused by hald drops to about 0.03.
The IDE layout of my laptop is:
/dev/hda = DVD R/RW
/dev/hdb = Toshiba Hard Disk
I think the cause is hald looping while polling the DVD reader when no media is inserted. So, for now, the workarounds are:
a) disable the haldaemon service, or
b) insert media in the DVD or CD-ROM reader.
Also, hdparm -t /dev/hdb with hald running and media inserted gives the same result as when hald is stopped.
Examples tried:
1) hald running + no media inserted:
load average: 0.31, 0.29, 0.27
hdparm output: 70 MB in 3.05 seconds = 21.92 MB/sec
2) hald stopped + no media inserted:
load average: 0.04, 0.07, 0.09
hdparm output: 104 MB in 3.02 seconds = 34.47 MB/sec
3) hald running + dvd media inserted:
load average: 0.06, 0.10, 0.09
hdparm output: 104 MB in 3.01 seconds = 34.56 MB/sec
I'll try the CVS version of haldaemon to see if this bug is solved there.
Best Regards
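To put the three cases above side by side, here is a minimal Python sketch that pulls the MB/sec figure out of an hdparm -t result line and computes the relative slowdown. The helper names parse_mb_per_sec and slowdown_percent are made up for this illustration; the sample lines are the ones reported in the comment above.

```python
import re

def parse_mb_per_sec(hdparm_line):
    """Return the MB/sec figure from an hdparm -t result line (hypothetical helper)."""
    match = re.search(r"=\s*([0-9.]+)\s*MB/sec", hdparm_line)
    if match is None:
        raise ValueError("no MB/sec figure in: %r" % hdparm_line)
    return float(match.group(1))

def slowdown_percent(baseline, measured):
    """Percentage by which `measured` falls short of `baseline`."""
    return (baseline - measured) / baseline * 100.0

# Figures reported in the comment above (cases 2 and 1).
hald_stopped = parse_mb_per_sec("104 MB in 3.02 seconds = 34.47 MB/sec")
hald_running = parse_mb_per_sec("70 MB in 3.05 seconds = 21.92 MB/sec")
print("slowdown with hald polling an empty drive: %.1f%%"
      % slowdown_percent(hald_stopped, hald_running))
```

With those two readings the drop comes out at roughly 36%, in the same range as the 40% quoted in the original report.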
Looks like a kernel issue with HAL's polling for media inserts. We poll every two seconds. Actually we used to have a blacklist because of this and took out the blacklist in one of the upstream updates:
2005-10-21 Danny Kukawka <danny.kukawka>
* fdi/preprobe/10osvendor/10-ide-drives.fdi: removed no longer needed blacklist entry for 'HL-DT-STCD-RW/DVD-ROM GCC-4240N'. This work fine at least with kernel 2.6.13 (tested with SUSE).
We also had an old bug open about this (Bug #138148) but it was closed because it was stuck in NEEDINFO. Apparently it has to do with the CD drive being on the same channel as the HD. Anyway, kernel issue. Reassigning there. Please follow up on any NEEDINFO requests. Thanks.
(In reply to comment #3)
> Apparently it has to do with the CD drive being on the same channel as the HD.
> Anyway, kernel issue. Reassigning there.
Yes, I confirm this. I have many FC5 systems, and this problem happens only on systems with CD/DVD readers on the same channel as the HD.
All of the above is interesting, but Ugo's problem is different from mine. My system is a desktop, and has
hda: SAMSUNG SP1604N, ATA DISK drive
hdb: Maxtor 6Y160P0, ATA DISK drive
hdc: TSSTcorpCD/DVDW TS-H552B, ATAPI CD/DVD-ROM drive
Load average is the following with Bram's bittorrent running:
[root@donkey ~]# w
21:00:22 up 7 days, 18:07, 4 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
asuardi pts/0 :2.0 14May06 9:40m 0.45s 0.45s bash
asuardi pts/1 :2.0 14May06 7days 0.05s 0.05s bash
asuardi pts/2 :2.0 14May06 7days 0.10s 12.22s gnome-terminal
root pts/3 sandman 20:51 0.00s 0.12s 0.01s w
Stopping haldaemon is simply alleviating the IDE disk performance problem, but DOES NOT fix it by any means; the results below do NOT change when I insert media in the DVD drive.
[root@donkey ~]# hdparm -t /dev/hda
/dev/hda:
Timing buffered disk reads: 58 MB in 3.07 seconds = 18.91 MB/sec
[root@donkey ~]# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 56 MB in 3.03 seconds = 18.45 MB/sec
[root@donkey ~]# /etc/init.d/haldaemon stop
Stopping HAL daemon: [ OK ]
[root@donkey ~]# hdparm -t /dev/hda
/dev/hda:
Timing buffered disk reads: 62 MB in 3.09 seconds = 20.08 MB/sec
[root@donkey ~]# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 62 MB in 3.07 seconds = 20.19 MB/sec
Those disks are reporting >33MB/s when system boots as /sbin/init -b, or by default in FC3. So please, this is a userspace problem - the SAME kernel under FC3 and FC5 yields DIFFERENT results.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Hi Dave, I'm currently running Fedora Core 6 updated to the rawhide branch (last update: 20061017)
kernel-2.6.18-1.2784.fc6
hal-0.5.8.1-4.fc6
Hardware info: ASUS W1N Laptop
hda = IDE DVDROM/Writer
hdb = TOSHIBA IDE Disk 80 GB
Tests:
1) haldaemon running + cd media inserted
[root@hidlt ~]# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 92 MB in 3.06 seconds = 30.06 MB/sec
2) haldaemon running + no cd media inserted
[root@hidlt ~]# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 68 MB in 3.06 seconds = 22.24 MB/sec
3) haldaemon stopped + cd media inserted
[root@hidlt ~]# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 98 MB in 3.07 seconds = 31.97 MB/sec
4) haldaemon stopped + no cd media inserted
[root@hidlt ~]# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 98 MB in 3.02 seconds = 32.43 MB/sec
Results: the problem is not solved :-( I don't know if this is a hal or kernel bug.
Best Regards
From comment #5: "Those disks are reporting >33MB/s when system boots as /sbin/init -b, or by default in FC3. So please, this is a userspace problem - the SAME kernel under FC3 and FC5 yields DIFFERENT results." This has to be a hal bug. If it were kernel, we'd see the lousy performance in all runlevels.
Again, all we do in hal is to open the device every two seconds; it looks like the IDE kernel drivers and/or the hardware have issues scheduling the commands to attain the desired performance. I'm not sure why it makes sense to blame hal for this, so I'm reassigning back to the kernel; it's a well-known IDE problem. It would be interesting to know whether Windows exhibits the same problems. (Some things are planned, like 1) less frequent polling (say, every 20 secs) when no user is logged in; 2) less frequent polling when running on battery (say, every 5 or 10 seconds); and 3) letting apps disable polling (think DVD watching app; we'll readahead 500MB and can leave the drive powered down for 30 minutes, saving battery). But that's another story.)
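The planned policy in the parenthesis above can be sketched as a small decision function. This is only an illustration in Python, not HAL code: the function name, the parameter names, and the order in which the rules are checked are assumptions. The 2, 20, and 10 second values are the ones floated in this thread, with 10 picked arbitrarily from the "5 or 10" battery suggestion.

```python
DEFAULT_INTERVAL = 2    # seconds; the polling interval HAL uses per this thread
NO_USER_INTERVAL = 20   # "say, every 20 secs" when no user is logged in
BATTERY_INTERVAL = 10   # "say, every 5 or 10 seconds" on battery; 10 chosen here

def choose_poll_interval(user_logged_in, on_battery, polling_disabled_by_app):
    """Return seconds between media polls, or None for no polling (hypothetical)."""
    if polling_disabled_by_app:
        # An application (e.g. a DVD player) asked for polling to stop.
        return None
    if not user_logged_in:
        return NO_USER_INTERVAL
    if on_battery:
        return BATTERY_INTERVAL
    return DEFAULT_INTERVAL

print(choose_poll_interval(user_logged_in=True, on_battery=True,
                           polling_disabled_by_app=False))
```

Checking the application override first is a design guess: it keeps an explicit request from being overridden by the power or login state.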
If hal isn't to blame how do you explain that the performance is normal in runlevels where hal isn't running ?
In response to comment 10: What we do is essentially this
    while (TRUE) {
        fd = open (device_file, O_RDONLY | O_NONBLOCK | O_EXCL);
        if (fd < 0)
            goto skip;
        [... code for investigating the media ...]
    skip:
        sleep (2);
    }
See http://gitweb.freedesktop.org/?p=hal.git;a=blob;h=22247cfc4ec90f05ba3707b2518faa059f5e6d7e;hb=9c99fc03fbac6380032a6678c641a76ef02ad834;f=hald/linux/addons/addon-storage.c for the actual code.
The crux of the problem is simply that the hard disk and the optical drive share the same IDE channel (stupid decision by Toshiba; the same is seen on some Dell laptops too, but understandable when one wants to support "media bays" before SATA etc. became prevalent). The effect is that HAL opening the optical drive (/dev/hda on Ugo's laptop) every two seconds hurts the throughput of the hard disk (/dev/hdb), since the IDE channel is hogged. I also believe the problems go away if you mount a data disc in the optical drive (but am not entirely sure).
So either this is a hardware limitation or a kernel driver problem. You could also say that user space is stupid for wanting to detect media changes, but that is getting old fast.
Anyway, as noted in comment 9 we're moving towards more configurability and control from user space, mostly to save battery etc., but perhaps we can have a list of drive IDs etc. that we know we shouldn't poll every two seconds, maybe only every five or so.
I have no idea how Windows deals with this, hence why I asked for numbers. If Windows is doing fine and checks for media every two seconds, chances are it's a Linux kernel driver problem.
Alan can probably explain this a lot better.
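For readers who want to experiment with the open/sleep skeleton outside of HAL, a rough Python equivalent of the loop described above might look like this. It is a sketch, not the addon-storage.c code: poll_once, poll_loop, the iterations limit, and the injectable sleep function are assumptions added so the loop can be run and stopped. The open flags are the ones quoted in the comment, and the media investigation step is left out, just as it is elided there.

```python
import os
import time

def poll_once(device_file):
    """Try to open the device as the pseudocode does; True if the open succeeded."""
    try:
        fd = os.open(device_file, os.O_RDONLY | os.O_NONBLOCK | os.O_EXCL)
    except OSError:
        return False  # corresponds to the "goto skip" branch
    # The real addon would investigate the media here; this sketch only closes.
    os.close(fd)
    return True

def poll_loop(device_file, interval=2.0, iterations=3, sleep=time.sleep):
    """Poll `iterations` times, sleeping `interval` seconds after each attempt."""
    results = []
    for _ in range(iterations):
        results.append(poll_once(device_file))
        sleep(interval)
    return results

# A path that does not exist simply fails to open on every attempt.
print(poll_loop("/nonexistent/device", interval=0, iterations=3))
```

Pointing it at a real optical drive node usually needs root, and since the media checks are omitted it should not be expected to reproduce the slowdown by itself.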
(In reply to comment #11)
I will give the most recent FC5 kernel a spin on my K7-800 desktop, the one I logged the bug for, which has two IDE disks on different channels, both of which have experienced the slowdown since FC3.
I also have a DVD burner, which obviously is on just one IDE channel, and for which having a DVD disc in the drive or not makes absolutely no difference. Just this week I used an old 2.6.16-something Torvalds kernel in FC3 to burn DVDs at 7.8x with my box, which can't even come close to such burning speed in FC5, due to the IDE disks being that much slower. Interestingly enough, it *seems* (still need more detailed testing) that an identical 2.6.19-rc2 (compiled on another FC5 box for both FC3 and FC5) is slow (~20MB/s) on both FC3 _and_ FC5, which will probably require a detailed matrix of tests :( Or is it wrong to build a kernel for FC3 on a box with an FC5 userspace? I'm also afraid that after the recent installation of a skge Gigabit card in the box, my older kernels can't do fast USB anymore (the USB 2.0 PCI card is in the slot close to the PCI Gigabit ethernet) and they choke at 1MB/s compared to the earlier 33MB/s, while 2.6.19-rc2 gets to 20MB/s - as in the IDE case. I'll try to work out more details by this weekend. And since I'm apparently the only one reporting this problem (to this extent) on FC5, please let me know whether you're still interested in pursuing this on FC5, or whether I should just wipe FC5 out and install FC6 on the same box - I'll obviously be keeping FC3 around for its far better DVD burning performance...
Reassigning back to HAL; this is not a kernel problem. When you ask some drives to do stuff they go through a full power on/off cycle to save power. This takes time and you jam the bus for it. Vendors appear to ship a differently tuned Windows for such systems, or perhaps are using non-polling approaches on drives that support them.
OK, I FINALLY found the actual bug - and it's a kernel bug. The reason why userspace seemed to be the culprit was that my kernels followed my hardware, and as long as I did NOT have an external USB disk, the problem did not appear. More or less at the same time I bought the external USB disk _and_ configured FC5 on the new partition - with EHCI support, while the older FC3 kernels didn't have such support; after all they were meant to be replaced by FC5. I compiled a 2.6.19-rc3-git4 kernel for FC3 with *all* options as in the FC5 one, and that as well began showing the same behavior - init -b booting at 40MB/s then dropping to 20MB/s at a certain point. That point, found after lengthy trials editing /etc/rc.d/rc.sysinit, is the loading of the ehci_hcd module. The FC6 kernel also shows hdparm -t at 20MB/s, but the figure zips to 39MB/s right after unloading ehci_hcd. Pity my USB disk at that point crawls at < 1MB/s from its original > 20MB/s speed :( I guess at this point the bug abstract should be changed to something like "loading ehci_hcd slows down IDE disk performance". Opinions?
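Since the slowdown tracks whether ehci_hcd is loaded, it can help to record the module state next to each hdparm reading. A minimal Python sketch, assuming lsmod-style text (module name in the first column) as input; module_loaded is a hypothetical helper name, and the sample rows are taken from the lsmod listing earlier in this report.

```python
def module_loaded(lsmod_text, name):
    """True if `name` appears as the first column of any lsmod-style line."""
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False

# Sample rows from the lsmod output posted earlier in this bug.
sample = """ehci_hcd 29005 0
uhci_hcd 28881 0
ohci1394 31749 0
"""
print(module_loaded(sample, "ehci_hcd"))
```

On a live system the same function can be fed the contents of /proc/modules, which also starts each line with the module name.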
over to our usb guru... Pete, any ideas what on earth could be happening here?
Discussion on LKML has shown that the problem is an exceedingly aggressive setting on a specific VIA EHCI chipset that hammers the PCI bus every 1us instead of 10us - and turned up the patch here: http://lkml.org/lkml/2008/3/17/340 ...which works for me - FC6 booting a kernel.org 2.6.25-rc6-git2 with the above patch on top brings my hdparm -t back to 35+ and 37+ MB/s for hda and hdb respectively.
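To get a feel for the difference between the two settings, the arithmetic is simple. The sketch below assumes one bus access per interval, which is a simplification made here and not something stated in the LKML thread; the function name is likewise made up.

```python
def accesses_per_second(interval_us):
    """Accesses per second if one access happens every `interval_us` microseconds."""
    return 1_000_000 / interval_us

aggressive = accesses_per_second(1)   # the 1us setting described above
relaxed = accesses_per_second(10)     # the 10us setting
print(aggressive, relaxed, aggressive / relaxed)
```

Under that assumption the 1us setting generates ten times as many accesses as the 10us one.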
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
I don't know why this was placed into NEEDINFO, so I changed this to MODIFIED. I provided a pointer to a _working_ patch which should be soon (if not already) in the kernel.org mainline kernel. NEEDINFO is the wrongest possible status :)
That patch looks ok to me. I'll get this into rawhide for F9. We should probably also backport it to F7 & F8.
changing version to 'rawhide' and adding to tracker for f9
Current rawhide kernel definitely contains the patch mentioned. Can you retest this and confirm the fix, or should we just close the bug?
I'm for closing the bug - patch is very self-contained and already tested. Furthermore my box is still running FC6, current uptime at 29 days and that's only because I installed the patch on top of the back-then kernel-du-jour... I'm actually waiting for F9 to be released, as I upgrade that box every three Fedora cycles ;) Thanks, --alessandro
Fair enough. Closing as per request.