221691 – storage drivers need works

Summary: storage drivers need works

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	6
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	221690
Depends On:
Blocks:	427887

Reported:	2007-01-06 05:55 UTC by Parag Warudkar
Modified:	2010-04-16 08:31 UTC
CC List:	6 users
Fixed In Version:	f8
Clone Of:
Environment:
Last Closed:	2008-01-22 17:37:36 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
strace -tt output (1.79 KB, application/octet-stream) 2007-01-06 21:11 UTC, Parag Warudkar
lshal output (114.95 KB, application/octet-stream) 2007-01-06 21:12 UTC, Parag Warudkar

Description Parag Warudkar 2007-01-06 05:55:07 UTC

Description of problem:
Every few nanoseconds, hald-addon-storage wakes up, opens my CDROM drive, and
uses in excess of 20% CPU.

top - 00:44:24 up  8:36,  1 user,  load average: 0.41, 0.30, 0.27

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 2430 root      16   0  1924  660  580 S   20  0.0  42:55.74 hald-addon-stor

strace shows it repeatedly opening /dev/hdc, the CDROM drive. Why in god's name
does it have to poll a CDROM so fast, and why would it need 20% of CPU for that?
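
As a rough cross-check on the figure from top, the TIME+ column can be compared with the uptime on the first line. The sketch below assumes hald-addon-storage has been running since boot, which the report does not state, so the result is only a long-run average and not the instantaneous 20% that top displays. The function names are hypothetical.

```python
def parse_cpu_time(text: str) -> float:
    """Convert top's TIME+ field (MM:SS.hh) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_uptime(text: str) -> int:
    """Convert an H:MM uptime string, as printed by top, to seconds."""
    hours, minutes = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def average_cpu_percent(cpu_time: str, uptime: str) -> float:
    """Average CPU share, assuming the process has run for the whole uptime."""
    return 100.0 * parse_cpu_time(cpu_time) / parse_uptime(uptime)


if __name__ == "__main__":
    # Values copied from the top output above: TIME+ 42:55.74, up 8:36.
    print(f"{average_cpu_percent('42:55.74', '8:36'):.1f}% average since boot")
```

Under that assumption, roughly 43 minutes of CPU time over 8 hours 36 minutes of uptime works out to a bit over 8% on average.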


Version-Release number of selected component (if applicable):
hal-0.5.8.1-6.fc6

How reproducible:
Always

Steps to Reproduce:
1. Just run an FC6 system with a CDROM drive and watch the top output
  
Actual results:
hald-addon-storage sucks CPU like crazy

Expected results:
Should not wake up so often and should not cause this much CPU utilization.


Additional info:
It sucks when you find out your laptop's fans are blowing just to keep
hald-addon-storage running and polling the nonexistent CD in the CDROM drive,
which also increases power consumption.

Comment 1 Parag Warudkar 2007-01-06 05:56:30 UTC

Strace output

[root@localhost ~]# strace -p 2430
Process 2430 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
open("/dev/hdc", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4
ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 1
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, {2, 0})               = 0
open("/dev/hdc", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4
ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 1
close(4)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0},  <unfinished ...>
Process 2430 detached
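
For anyone who wants to reproduce a single poll cycle outside of hald, the following is a minimal sketch of the open / ioctl / close sequence visible in the trace above. It is not the hald-addon-storage source. The constants are taken from <linux/cdrom.h>, the function names are hypothetical, and O_LARGEFILE is left out on the assumption that the interpreter handles large files itself. Timing one call may help show whether the time goes into the driver.

```python
import fcntl
import os
import time

CDROM_DRIVE_STATUS = 0x5326  # ioctl request number from <linux/cdrom.h>
CDSL_CURRENT = 0x7FFFFFFF    # "current slot"; the 0x7fffffff argument in the trace

# Return values of CDROM_DRIVE_STATUS as defined in <linux/cdrom.h>.
STATUS_NAMES = {
    0: "CDS_NO_INFO",
    1: "CDS_NO_DISC",
    2: "CDS_TRAY_OPEN",
    3: "CDS_DRIVE_NOT_READY",
    4: "CDS_DISC_OK",
}


def describe_status(code: int) -> str:
    """Map a CDROM_DRIVE_STATUS return value to its symbolic name."""
    return STATUS_NAMES.get(code, f"unknown ({code})")


def poll_once(device: str = "/dev/hdc") -> int:
    """Open the drive, ask for its status, close it: one cycle from the trace."""
    fd = os.open(device, os.O_RDONLY | os.O_NONBLOCK | os.O_EXCL)
    try:
        return fcntl.ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT)
    finally:
        os.close(fd)


if __name__ == "__main__":
    started = time.monotonic()
    try:
        print(describe_status(poll_once()))
    except OSError as exc:
        print(f"poll failed: {exc}")
    print(f"one poll took {time.monotonic() - started:.3f} s")
```

The ioctl result of 1 in the trace corresponds to CDS_NO_DISC, which matches an empty drive.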

Comment 2 Parag Warudkar 2007-01-06 06:01:07 UTC

*** Bug 221690 has been marked as a duplicate of this bug. ***

Comment 3 David Zeuthen 2007-01-06 20:49:07 UTC

Could you attach an strace using the -tt option? Also, please attach the output
of lshal - it smells like broken hardware / broken kernel driver to me. Thanks.

Comment 4 Parag Warudkar 2007-01-06 21:11:11 UTC

Created attachment 144983
strace -tt output

Comment 5 Parag Warudkar 2007-01-06 21:12:02 UTC

Created attachment 144984
lshal output

Comment 6 David Zeuthen 2007-01-08 16:56:36 UTC

Thanks for the attachments. If you kill the addon process does the problem go
away? Thanks.

(btw, please mark attachments as MIME type text/plain next time; it
makes them easier to open in Firefox)

Comment 7 Parag Warudkar 2007-01-08 18:02:57 UTC

Well, obviously the problem goes away when I kill hald-addon-storage - that's
what I am currently doing to avoid the CPU consumption.

In the {Open}Suse HAL RPMs I saw a changelog entry stating *"disable polling on SATA
CDROM" - I don't see that in the FC6 hal changelog. Maybe it's related?

*
http://rpmfind.net/linux/RPM/suse/updates/10.0/i386/rpm/ppc/hal-64bit-0.5.4-6.2.ppc.html

Comment 8 David Zeuthen 2007-01-08 18:28:07 UTC

A few things

 - the kernel drivers need to be able to cope with user space opening a device
   file every two seconds. Period.

 - sometimes (most of the time, not always) the hardware is just broken and one
   cannot poll at all. If that's the case (but that requires proving the kernel
   driver is not buggy) we can disable polling from HAL; we used to do that
   for a few drives but don't anymore as a newer revision of the kernel
   driver magically fixed that

 - Your drive is /dev/hdc so probably not SATA

 - There is some work going on to make polling unneeded, that's tracked in bug
204969

As such I'm reassigning this bug to the kernel. Feel free to reassign back if
it's not a kernel problem. I'm also adding myself as Cc if there are any
questions / concerns. Thanks!

Comment 9 Parag Warudkar 2007-01-13 00:39:51 UTC

I am sure my hardware isn't broken - I have used other distros and never had a
problem. That /dev/hdc is due to Fedora drivers being PATA. 

To verify whether the PATA driver causes this problem, I tried booting a
kernel with all SATA drivers, and unfortunately haldaemon, the Avahi daemon and a
bunch of other things fail to start on this kernel - separate issue.

I am not sure I agree about the 2 second polling period. I guess it should be
set somewhere in config instead of hard coding it. Will submit a patch if I
manage to fix it.

Comment 10 Parag Warudkar 2007-01-21 19:56:49 UTC

With libata.atapi_enabled=1 and combined_mode=libata, it no longer hogs CPU.  So
it seems to me that the PATA driver has some kind of bug.

Comment 11 Mauri Korkeala 2007-01-24 20:07:25 UTC

I can confirm this bug. I see the same symptoms on my Via EPIA 15000 too, running
updated FC6 (IDE cd-rom). It almost completely kills the machine as the
processor isn't that fast.
When this happens the kernel prints the following messages every 5 seconds:
kernel: hdc: status timeout: status=0xd0 { Busy }
kernel: ide: failed opcode was: unknown
kernel: hdc: drive not ready for command

Comment 12 Michal Jaegermann 2007-03-16 21:09:56 UTC

I am seeing something different, and it appears to happen every 10 seconds.
Before kernel 2.6.20-1.2925.fc6 the hard disks were /dev/hda and /dev/hdc
and hald was spamming my logs with:

kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
kernel: ide: failed opcode was: unknown
kernel: hde: drive not ready for command

Doh! This is a CD/DVD drive and usually there is no media in it.

With 2.6.20-1.2925.fc6 hard disks were taken over by SATA (and
I had to use 'acpi=off irqpoll' before I was able to boot at all)
and I am getting the same with "hde" replaced by "hda".

Stopping hald immediately terminates this madness.

Comment 13 David Zeuthen 2007-03-16 22:41:37 UTC

(In reply to comment #12)
> Stopping hald immediately terminates this madness.

The problem is either buggy hardware and/or buggy drivers. See comment 8 for a
more detailed explanation.

Comment 14 Michal Jaegermann 2007-03-16 23:06:12 UTC

> The problem is either buggy hardware and/or buggy drivers.

Possibly.  Unfortunately hardware does not come with stickers which
say "Buggy!" and I do not have much choice here.  Buggy hardware is
all over the place and needs to be taken into account.

Comment 15 Michal Jaegermann 2007-03-16 23:25:42 UTC

Thinking about this, I would even initially go for the following
conditional (the kernel, after all, knows what it is dealing with):
"if this is something with removable media then stop logging useless
errors; or log that once, or at most once per hour".  In the current
situation, among other detrimental side-effects, this makes the logs
big and noisy, and that has security implications.
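
The conditional proposed above can be sketched as a small rate limiter. This is a user-space illustration in Python only, not kernel code; the class name, its parameters and the injected clock are all hypothetical. Messages for removable-media devices are emitted once and then at most once per interval, while everything else is always logged.

```python
import time


class RateLimitedLog:
    """Emit each repeated message at most once per min_interval seconds."""

    def __init__(self, min_interval: float = 3600.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock       # injectable so the behaviour can be tested
        self._last = {}          # message -> time it was last emitted
        self._suppressed = {}    # message -> total number of suppressed repeats

    def should_log(self, message: str, removable: bool = True) -> bool:
        """Return True if the message should be written to the log now."""
        if not removable:
            return True          # only removable-media devices are throttled
        now = self.clock()
        last = self._last.get(message)
        if last is None or now - last >= self.min_interval:
            self._last[message] = now
            return True
        self._suppressed[message] = self._suppressed.get(message, 0) + 1
        return False

    def suppressed(self, message: str) -> int:
        """Total number of times this message has been suppressed so far."""
        return self._suppressed.get(message, 0)


if __name__ == "__main__":
    log = RateLimitedLog(min_interval=3600.0)
    for _ in range(3):
        if log.should_log("hdc: drive not ready for command"):
            print("kernel: hdc: drive not ready for command")
    print("suppressed:", log.suppressed("hdc: drive not ready for command"))
```

Keeping a count of suppressed repeats means the information that the error keeps recurring is not lost, only its volume.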

Comment 16 Michal Jaegermann 2007-03-17 21:07:16 UTC

Is there any reasonable way to tell hal not to look at the CD drive at all,
short of unplugging it?  In a very short time on an afflicted machine
I accumulated, courtesy of hal, around 60 MB of
"ide: failed opcode was: unknown" garbage in /var/log/messages.  This is
unsustainable.  I can handle CDs much better with autofs - thank you very much!
The options mentioned in comment #10 unfortunately do nothing
useful for me.

Following the examples which pass for hal documentation, I dropped into
/usr/share/hal/fdi/policy/20thirdparty a file called
99-storage-no-cdrom.fdi with the following content:

<?xml version="1.0" encoding="UTF-8"?>

<!-- This .fdi files takes out cd from hal -->
<deviceinfo version="0.2">
  <device>
    <match key="storage.hotpluggable" bool="false">
        <match key="storage.drive_type" string="cdrom">
          <merge key="storage.policy.should_mount" type="bool">false</merge>
        </match>
    </match>
  </device>
</deviceinfo>

This appeared to help for a few minutes.  After that the flood of
garbage returned, and 'tail -f /var/log/messages' shows that this
bombardment is practically continuous.  I really would like to
inform hal that "this device is off limits".

Comment 17 Michal Jaegermann 2007-03-18 12:58:16 UTC

It looks like I can shut off the constant scream from the
CD/DVD drive with an extra policy which results in

  storage.media_check_enabled = false  (bool)

for that particular device, but only with the 2.6.19-1.2911.6.5.fc6
kernel. With 2.6.20-1.2925.fc6 I have to shut off hald or I am flooded.

BTW - while trying to check with strace what is happening, I collected in
a very short time during hald startup 127 'stat()' calls and the same
number of 'open()' calls on this extra policy file of mine.  Small wonder
that this thing takes ages to start.
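
Counts like these can be pulled out of a saved strace log with a few lines of Python. The sketch below is an assumption about how one might do it, not what was actually run; the regular expression only handles plain strace lines, optionally prefixed with a PID or a -tt timestamp, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: open("/dev/hdc", O_RDONLY|O_NONBLOCK) = 4
# optionally prefixed by a PID and/or an HH:MM:SS.usec timestamp from -tt.
SYSCALL_WITH_PATH = re.compile(
    r'^(?:\d+\s+)?(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?(\w+)\("([^"]*)"'
)


def count_path_syscalls(lines):
    """Count (syscall, path) pairs for calls whose first argument is a string."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_WITH_PATH.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Lines copied from the strace output in comment 1.
    sample = [
        'open("/dev/hdc", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4',
        "ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 1",
        "close(4)                                = 0",
        "nanosleep({2, 0}, {2, 0})               = 0",
        'open("/dev/hdc", O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4',
    ]
    for (syscall, path), n in count_path_syscalls(sample).most_common():
        print(f"{n:5d}  {syscall}  {path}")
```

Sorting by count makes a file that is opened over and over, such as a policy file parsed on every event, stand out at the top.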

Comment 18 David Zeuthen 2007-03-19 16:09:52 UTC

(In reply to comment #17)
> BTW - while trying to check with strace what is happening, I collected in
> a very short time during hald startup 127 'stat()' calls and the same
> number of 'open()' calls on this extra policy file of mine.  Small wonder
> that this thing takes ages to start.

BTW, this should be fixed in F7's HAL package, where the braindead "parse every
XML file for every event" behavior was addressed.

Comment 19 Jon Stanley 2008-01-08 01:50:24 UTC

(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!

Comment 20 Mauri Korkeala 2008-01-08 11:51:00 UTC

I'm upgrading that machine to Fedora 8 soon (a week or two). I will update this
bug when I know whether the bug still exists or not.

Comment 21 Mauri Korkeala 2008-01-22 13:40:38 UTC

Fedora 8 with the same hardware (VIA EPIA 15000) doesn't suffer from this bug
anymore. So unless the original reporter wants to keep this bug open, I think we
can close this one as fixed.
Summary:
FC6 with latest updates, no outside kernel modules: after a few hours, 100% CPU
usage until you restart hald or kill hald-addon-storage

F8 with latest updates, no outside kernel modules: no issues, the machine has
been up continuously for more than a week.

Comment 22 Chris Wilson 2010-04-16 08:31:31 UTC

This affects RHEL5 in a Xen virtual machine as well. It polls the virtual CDROM every two seconds, taking about 5% CPU, amounting to 77 hours of wasted CPU over 40 days.

Workaround seems to be to get the UDI of the cdrom drive with hal-find-by-capability and disable media checks on it with hal-set-property; add the following to /etc/rc.local:

for cdrom in $(hal-find-by-capability --capability storage.cdrom); do
  hal-set-property --udi "$cdrom" --key storage.media_check_enabled --bool false
done
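
The same workaround can be sketched in Python for setups that prefer a script with error handling over an rc.local snippet. It only uses the two commands and the options shown in the loop above. The function names are hypothetical, and it assumes hal-find-by-capability prints one UDI per line.

```python
import subprocess


def find_cdrom_udis(run=subprocess.run):
    """Return the UDIs of all devices with the storage.cdrom capability."""
    result = run(
        ["hal-find-by-capability", "--capability", "storage.cdrom"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: one UDI per output line; blank lines are ignored.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def disable_command(udi: str):
    """Build the hal-set-property call that turns media polling off."""
    return [
        "hal-set-property", "--udi", udi,
        "--key", "storage.media_check_enabled", "--bool", "false",
    ]


def disable_media_checks(run=subprocess.run):
    """Disable media polling on every CD-ROM drive that HAL knows about."""
    for udi in find_cdrom_udis(run):
        run(disable_command(udi), check=True)


if __name__ == "__main__":
    try:
        disable_media_checks()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not disable media checks: {exc}")
```

With check=True a failing HAL command raises an exception instead of being silently ignored, which is the main difference from the shell loop.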
