Description of problem:
Every few nanoseconds, hald-addon-storage wakes up, opens my CDROM and causes in excess of 20% CPU usage.

top - 00:44:24 up 8:36, 1 user, load average: 0.41, 0.30, 0.27
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2430 root      16   0  1924  660  580 S   20  0.0  42:55.74 hald-addon-stor

Strace says it is repeatedly opening /dev/hdc - CDROM. Why in god's name does it have to poll a CDROM so fast and why would it need 20% of CPU for that?

Version-Release number of selected component (if applicable):
hal-0.5.8.1-6.fc6

How reproducible:
Always

Steps to Reproduce:
1. Just run a FC6 system with a CDROM drive and watch the top output

Actual results:
hald-addon-storage sucks CPU like crazy

Expected results:
Should not wake up so often and should not cause this much CPU utilization.

Additional info:
It sucks when you find out your laptop's fans are blowing to keep hald-addon-storage running and polling the nonexistent CD in the CDROM drive and causing increased power consumption.
Strace output

[root@localhost ~]# strace -p 2430
Process 2430 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
open("/dev/hdc", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4
ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 1
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, {2, 0})               = 0
open("/dev/hdc", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4
ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 1
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, <unfinished ...>
Process 2430 detached
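For readers who want to reproduce the polling pattern seen in the strace output outside of HAL, here is a minimal Python sketch of the same open / ioctl / close / sleep cycle. It is not HAL's code (HAL is written in C); the function names `describe_status`, `poll_once` and `poll_forever` are made up for illustration, and the numeric constants are assumed to match `<linux/cdrom.h>` on the running kernel.

```python
import fcntl
import os
import time

# Values as defined in <linux/cdrom.h> (assumed unchanged on your kernel).
CDROM_DRIVE_STATUS = 0x5326
CDSL_CURRENT = 0x7FFFFFFF  # the 0x7fffffff argument visible in the strace

STATUS_NAMES = {
    0: "CDS_NO_INFO",
    1: "CDS_NO_DISC",
    2: "CDS_TRAY_OPEN",
    3: "CDS_DRIVE_NOT_READY",
    4: "CDS_DISC_OK",
}


def describe_status(code):
    """Translate the ioctl return value into its symbolic name."""
    return STATUS_NAMES.get(code, "unknown (%d)" % code)


def poll_once(device):
    """Hypothetical helper: one open/ioctl/close cycle, as in the strace."""
    fd = os.open(device, os.O_RDONLY | os.O_NONBLOCK | os.O_EXCL)
    try:
        # With an integer argument, fcntl.ioctl returns the ioctl's result.
        return fcntl.ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT)
    finally:
        os.close(fd)


def poll_forever(device="/dev/hdc", interval=2.0):
    """Hypothetical helper: repeat the cycle every `interval` seconds."""
    while True:
        print(device, describe_status(poll_once(device)))
        time.sleep(interval)
```

The `= 1` returned by the ioctl in the strace corresponds to `CDS_NO_DISC`, i.e. the drive reports an empty tray. Running `poll_forever("/dev/hdc")` as root while watching top should show whether the CPU cost comes from the polling itself or from something specific to hald-addon-storage.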
*** Bug 221690 has been marked as a duplicate of this bug. ***
Could you attach an strace using the -tt options? Also, please attach the output of lshal - it smells like broken hardware / broken kernel driver to me. Thanks.
Created attachment 144983 strace -tt output
Created attachment 144984 lshal output
Thanks for the attachments. If you kill the addon process does the problem go away? Thanks. (btw, please mark attachments having MIME type text/plain the next time; it makes it easier to open in Firefox)
Well, obviously the problem goes away when I kill hald-addon-storage - that's what I am currently doing to avoid the CPU consumption. In the {Open}Suse HAL RPMs I saw a changelog entry stating *"disable polling on SATA CDROM" - I don't see that in the FC6 hal changelog. Maybe it's related?

* http://rpmfind.net/linux/RPM/suse/updates/10.0/i386/rpm/ppc/hal-64bit-0.5.4-6.2.ppc.html
A few things

- the kernel drivers need to be able to cope with user space opening a device file every two seconds. Period.
- sometimes (most of the time, not always) the hardware is just broken and one cannot poll at all. If that's the case (but that requires proving the kernel driver is not buggy) we can disable polling from HAL; we used to do that for a few drives but don't anymore, as a newer revision of the kernel driver magically fixed that
- Your drive is /dev/hdc so probably not SATA
- There is some work going on to make polling unneeded; that's tracked in bug 204969

As such I'm reassigning this bug to the kernel. Feel free to reassign back if it's not a kernel problem. I'm also adding myself as Cc in case there are any questions / concerns.

Thanks!
I am sure my hardware isn't broken - I have used other distros and never had a problem. That /dev/hdc is due to the Fedora drivers being PATA. To verify whether the PATA driver causes this problem or not, I tried booting a kernel with all SATA drivers, and unfortunately haldaemon, the Avahi daemon and a bunch of other things fail to start on this kernel - separate issue. I am not sure I agree about the 2 second polling period. I guess it should be set somewhere in config instead of being hard coded. Will submit a patch if I manage to fix it.
With libata.atapi_enabled=1 and combined_mode=libata, it no longer hogs CPU. So seems to me like the PATA driver has some kind of bug.
I can confirm this bug. I see the same symptoms on my Via EPIA 15000 too, running updated FC6 (IDE CD-ROM). It almost completely kills the machine as the processor ain't that fast. When this happens the kernel prints the following messages every 5 seconds:

kernel: hdc: status timeout: status=0xd0 { Busy }
kernel: ide: failed opcode was: unknown
kernel: hdc: drive not ready for command
I am seeing something different, and this appears to be every 10 seconds. Before kernel 2.6.20-1.2925.fc6 hard disks were /dev/hda and /dev/hdc and hald was spamming my logs with:

kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
kernel: ide: failed opcode was: unknown
kernel: hde: drive not ready for command

Doh! This is a CD/DVD drive and usually there is no media in it. With 2.6.20-1.2925.fc6 hard disks were taken over by SATA (and I had to use 'acpi=off irqpoll' before I was able to boot at all) and I am getting the same with "hde" replaced by "hda". Stopping hald immediately terminates this madness.
(In reply to comment #12)
> Stopping hald immediately terminates this madness.

The problem is either buggy hardware and/or buggy drivers. See comment 8 for a more detailed explanation.
> The problem is either buggy hardware and/or buggy drivers.

Possibly. Unfortunately hardware does not come with stickers which say "Buggy!" and I do not have much choice here. Buggy hardware is all over the place and needs to be taken into account.
Thinking about this, I would even initially go for the following conditional (the kernel, after all, knows what kind of device it is dealing with): "if this is something with removable media then stop logging useless errors; or log that once, or at most once per hour". In the current situation, among other detrimental side effects, this makes logs big and noisy, and that has security implications.
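The "log that once, or at most once per hour" idea can be illustrated with a small rate limiter. This is only a user-space Python sketch of the policy, not kernel code; the class name `RateLimitedLogger`, its parameters and the per-device key are all hypothetical.

```python
import time


class RateLimitedLogger:
    """Emit a given message key at most once per `min_interval` seconds."""

    def __init__(self, min_interval=3600.0, clock=time.monotonic, sink=print):
        self._min_interval = min_interval
        self._clock = clock          # injectable so the policy can be tested
        self._sink = sink            # where accepted messages are written
        self._last_emit = {}         # key -> time of last emitted message
        self._suppressed = {}        # key -> messages dropped since then

    def log(self, key, message):
        """Return True if the message was emitted, False if suppressed."""
        now = self._clock()
        last = self._last_emit.get(key)
        if last is not None and now - last < self._min_interval:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        dropped = self._suppressed.pop(key, 0)
        if dropped:
            message = "%s (%d similar messages suppressed)" % (message, dropped)
        self._sink(message)
        self._last_emit[key] = now
        return True
```

Keyed per device (for example `log("hdc", "drive not ready for command")`), a drive that fails every few seconds would produce one line per hour plus a count of what was dropped, instead of a continuous stream.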
Is there any reasonable way to tell hal not to look at the CD drive at all, short of unplugging it? In a very short time on an afflicted machine I accumulated, courtesy of hal, around 60 Megs of "ide: failed opcode was: unknown" garbage in /var/log/messages. This is unsustainable. I can handle CDs much better with autofs - thank you very much!

Options mentioned in comment #10 unfortunately are not doing anything useful for me. Following examples, which pass for hal documentation, I dropped into /usr/share/hal/fdi/policy/20thirdparty a file called 99-storage-no-cdrom.fdi with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<!-- This .fdi files takes out cd from hal -->
<deviceinfo version="0.2">
  <device>
    <match key="storage.hotpluggable" bool="false">
      <match key="storage.drive_type" string="cdrom">
        <merge key="storage.policy.should_mount" type="bool">false</merge>
      </match>
    </match>
  </device>
</deviceinfo>

This appeared to help for a few minutes. After that the flood of garbage returned, and 'tail -f /var/log/messages' shows that this bombardment is practically continuous. I really would like to inform hal "this device is off limits".
It looks like I can shut off the constant scream from the CD/DVD drive, at least with an extra policy which results in storage.media_check_enabled = false (bool) for that particular device, but only with the 2.6.19-1.2911.6.5.fc6 kernel. With 2.6.20-1.2925.fc6 I have to shut off hald or I am flooded.

BTW - trying to check with strace what is happening, I collected in a very short time during a hald startup 127 'stat()' calls and the same number of 'open()' calls on my extra policy file. Small wonder that this thing takes ages to start.
(In reply to comment #17)
> BTW - trying to check with strace what is happening I collected in
> a very short time during a hald startup a 127 'stat()' and the same
> number of 'open()' on this my extra policy file. Small wonder that
> this thing starts for ages.

BTW, this should be fixed in F7's HAL package where the braindead "parse every XML file for every event" was addressed.
(This is a mass-update to all current FC6 kernel bugs in NEW state) Hello, I'm reviewing this bug list as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug, however this version of Fedora is no longer maintained. Please attempt to reproduce this bug with a current version of Fedora (presently Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a few days if there is no further information lodged. Thanks for using Fedora!
I'm upgrading that machine to Fedora 8 soon (a week or two). I will update this bug when I know whether the bug still exists or not.
Fedora 8 with the same hardware (VIA EPIA 15000) doesn't suffer from this bug anymore. So unless the original reporter wants to keep this bug open, I think we can close this one as fixed.

Summary:
FC6 with latest updates, no outside kernel modules: after a few hours, 100% CPU usage until you restart hald or kill hald-addon-storage.
F8 with latest updates, no outside kernel modules: no issues, machine has been up continuously for more than a week.
This affects RHEL5 in a Xen virtual machine as well. It polls the virtual CDROM every two seconds, taking about 5% CPU, amounting to 77 hours of wasted CPU over 40 days. The workaround seems to be to look up the UDI of the cdrom drive and turn off media checking for it, by adding the following to /etc/rc.local:

    for cdrom in `hal-find-by-capability --capability storage.cdrom`; do
        hal-set-property --udi "$cdrom" --key storage.media_check_enabled --bool false
    done
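The CPU figures quoted for the Xen guest can be cross-checked with a little arithmetic. The sketch below uses two hypothetical helper names, `average_cpu_share` and `cpu_hours_at_share`, and only the numbers given in the comment; it assumes the 77 hours is accumulated CPU time and the 40 days is wall-clock uptime.

```python
def average_cpu_share(cpu_hours, wall_days):
    """Average fraction of one CPU used, given accumulated CPU hours."""
    return cpu_hours / (wall_days * 24.0)


def cpu_hours_at_share(share, wall_days):
    """CPU hours accumulated at a constant CPU share over `wall_days`."""
    return share * wall_days * 24.0


# 77 CPU hours over 40 days of uptime is an average of about 8% of one CPU.
print("average share: %.1f%%" % (100 * average_cpu_share(77, 40)))

# A steady 5% over the same 40 days would add up to 48 CPU hours.
print("hours at 5%%: %.0f" % cpu_hours_at_share(0.05, 40))
```

So the accumulated time suggests a long-run average somewhat above the "about 5%" seen at a glance, which is plausible if the instantaneous reading in top fluctuates. Either way the polling costs a few CPU days per month on an otherwise idle guest.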