Red Hat Bugzilla – Bug 204294
problems with usb storage on VIA and recent kernels, IRQ screwup
Last modified: 2008-02-05 08:35:03 EST
Description of problem:
There are severe USB problems on a friend's VIA-based motherboard. We've spent
many hours trying to improve things with kernel arguments and one patch,
without success. Worth noting is that everything worked fine in FC2 (that's why
he hasn't upgraded until now) and that some kernel parameters make one half of
the problem disappear.
Version-Release number of selected component (if applicable):
Any relatively recent kernel, including kernel-2.6.17-1.2174_FC5. It was working
in FC2 (2.6.5). Not sure exactly when it broke.
How reproducible:
Always; tested on many kernels and distributions (several Fedora versions, one
Fedora derivative distro and some other distro, can't remember which).
The system is a Gigabyte motherboard with a VIA chipset. I'll attach lspci -v if
I don't forget. It has a USB 2.0 controller, of course. Now, my friend has two
USB devices:
- a USB 2.0 hard drive (I'll call it HDD), this one is more important,
- a USB 1.1 [or, according to the manufacturer, 2.0-compatible ;)] music player
with a card reader onboard (I'll call it MP3).
I'm going to describe a few scenarios:
1. When Fedora comes up with the HDD and MP3 connected, both devices work (!),
but the whole point of USB is hot-plugging; you can't expect people to reboot
the computer to make a USB device work :) When you disconnect and reconnect the
HDD, it stops working (and you can see kernel bugs in dmesg). Reconnecting the
MP3 works.
2. When Fedora comes up without the HDD and it's connected later, it doesn't
work even the first time (again, kernel bugs in dmesg).
3. When you pass "acpi=off" to the kernel, the HDD can be connected at any time
and it works (never mind that no normal person would come up with the
"acpi=off" parameter), but the MP3 doesn't. It turns out "acpi=off" makes EHCI
use a different IRQ than UHCI, making EHCI magically work, but UHCI spits out
another ton of errors, including a kernel stack trace and a suggestion to use
"irqpoll".
4. With "acpi=off irqpoll" the behaviour is the same as without any parameters
(the HDD doesn't work).
Passing "noapic" makes the IRQ numbers look more like they did in FC2 (where
everything works), but doesn't make a difference.
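
To see at a glance whether EHCI and UHCI ended up on the same IRQ line under a given set of boot parameters, something like the following sketch could be run after each boot. It assumes the usual /proc/interrupts layout, where handler names such as ehci_hcd:usb1 appear at the end of each line; the function names and the SAMPLE text are made up for illustration and are not taken from the attached dmesg files.

```python
import re


def handlers_by_irq(text):
    """Map each numeric IRQ in /proc/interrupts-style text to its USB host drivers."""
    result = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.*)", line)
        if not match:
            continue  # header row, or non-numeric rows such as NMI/ERR
        irq, rest = int(match.group(1)), match.group(2)
        # Keep only the USB host controller drivers registered on this line.
        names = re.findall(r"\b((?:ehci|uhci|ohci)_hcd)(?::\w+)?", rest)
        if names:
            result[irq] = names
    return result


def shared_ehci_uhci(text):
    """Return the IRQ numbers on which both ehci_hcd and uhci_hcd are registered."""
    return sorted(
        irq
        for irq, names in handlers_by_irq(text).items()
        if "ehci_hcd" in names and "uhci_hcd" in names
    )


# Hypothetical sample, only used when /proc/interrupts cannot be read.
SAMPLE = """\
           CPU0
  0:     123456    IO-APIC-edge  timer
177:       4242   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3
185:         17   IO-APIC-level  uhci_hcd:usb4
NMI:          0
"""

if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    print("IRQs shared by EHCI and UHCI:", shared_ehci_uhci(text))
```

Comparing the output between a default boot, "acpi=off" and "noapic" would show which combinations put the two controllers on one line.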
And let me repeat, under FC2 both devices still work fine. So even if it's the
hardware's fault, it's a design flaw (one that can be worked around, and 2.6.5
knew how), not a malfunction. That's why I ask you to fix it again in software.
I guess we don't see many such bugs because people use only one USB controller
at a time [or don't buy VIA ;)] and one at a time can be made to work.
I've found http://lkml.org/lkml/2006/7/27/402 and applied the patch, but
although the messages about changing IRQs disappeared, the problems persisted.
The messages seem bogus: dev->irq (177) doesn't equal the irq from
pci_read_config_byte (10), and after the presumed change we see that it still
generates IRQ 177, not 1. So it looks like the interrupt handler really thinks
it should be 1 and 177 confuses the kernel, but the patch doesn't really help
[maybe that's why I don't see it committed to 2.6.18 :)]
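
The comparison above (the kernel's dev->irq against the byte read from PCI config space) can be repeated from user space. The sketch below assumes the standard sysfs files /sys/bus/pci/devices/<address>/config and /sys/bus/pci/devices/<address>/irq and the Interrupt Line register at config offset 0x3C; the helper names are hypothetical, and the demo values 10 and 177 are simply the numbers quoted in this report. Note that with the IO-APIC in use the two numbers may legitimately differ, so a mismatch is a hint rather than proof of a fault.

```python
PCI_INTERRUPT_LINE = 0x3C  # offset of the Interrupt Line register in PCI config space


def interrupt_line(config: bytes) -> int:
    """Return the Interrupt Line byte from a raw PCI config space dump."""
    if len(config) <= PCI_INTERRUPT_LINE:
        raise ValueError("config space dump too short")
    return config[PCI_INTERRUPT_LINE]


def irq_mismatch(config: bytes, kernel_irq: int) -> bool:
    """True when the config-space value differs from the IRQ the kernel reports."""
    return interrupt_line(config) != kernel_irq


def read_device(sysfs_dir: str):
    """Read (config bytes, kernel irq) for one device directory under sysfs."""
    with open(f"{sysfs_dir}/config", "rb") as f:
        config = f.read()
    with open(f"{sysfs_dir}/irq") as f:
        kernel_irq = int(f.read().strip())
    return config, kernel_irq


if __name__ == "__main__":
    # Synthetic 64-byte header carrying the values quoted in the report.
    demo = bytearray(64)
    demo[PCI_INTERRUPT_LINE] = 10
    print("Interrupt Line register:", interrupt_line(bytes(demo)))
    print("differs from kernel IRQ 177:", irq_mismatch(bytes(demo), 177))
```

On the affected machine, read_device() would be pointed at the directories of the EHCI and UHCI controllers listed by lspci.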
I'll attach a set of files with dmesg and lspci -v output collected by the guy
who has the problems.
Created attachment 135043 [details]
dmesg output of the first scenario, default Fedora
The first dmesg is without additional kernel parameters.
- HDD and MP3 are connected before booting up the system and they work (reading
files from both HDD and MP3, which you don't see in dmesg because it works).
- Then you can see them being disconnected.
- Then MP3 is connected. It has two flash "disks" in it and they work (ignore
"sdb: Current: sense key: Medium Error" errors from sdb, that's a real medium
error (heavily used flash memory), we needed some card to test the embedded
card reader and that's the only one available - and it works, the system was
reading files from both memories).
- Then HDD is connected. You can see it gets discovered many times
simultaneously, with data corruption occurring. Then there's a kernel stack
trace dump and all the horror.
If the HDD isn't connected when system starts, it breaks identically when
connected, only without the initial stage of working after boot (the stage
which is seen in this dmesg).
Created attachment 135044 [details]
lspci -v, first scenario
Created attachment 135046 [details]
dmesg output of the third scenario, acpi=off
This is dmesg from a boot with "acpi=off". Notice that this time EHCI is on
another IRQ than UHCI.
- Both devices are disconnected initially.
- Then MP3 is connected. The kernel yells about a poor IRQ that nobody cares
about, then disables the IRQ, then tries to use the device, but thinks that it
got disconnected and spits out errors.
- Then HDD is connected. This time it works, no corruption, no BUGS, no
Created attachment 135047 [details]
lspci -v, acpi=off
Created attachment 135048 [details]
dmesg output of the fourth scenario, acpi=off irqpoll
The "irqpoll" parameter added to "acpi=off" makes a difference indeed:
- When MP3 is connected, it works this time (again, ignore sdb medium errors).
- Instead, the HDD doesn't work, again being detected many times at the same
time. I assure you all of this is the effect of only one plug-in, no fiddling with
Reassigning to correct owner, kernel-maint.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.
This bug has been mass-closed along with all other bugs that
have been in NEEDINFO state for several months.
Due to the large volume of inactive bugs in bugzilla, this
is the only method we have of cleaning out stale bug reports
where the reporter has disappeared.
If you can reproduce this bug after installing all the
current updates, please reopen this bug.
If you are not the reporter, you can add a comment requesting
it be reopened, and someone will get to it asap.
Closing since there was an error in previous mass-close and they remained in