Description of problem:
I have the following hardware:

  Intel S3000AHLX Motherboard
  Intel Xeon X3220 Kentsfield 2.4GHz Processor
  Adaptec 29160 2253200-R PCI SCSI adapter
  HITACHI Deskstar T7K250 SATA 3.0Gb disk drives

If I boot with kernel 2.6.21-1.3194.fc7 (x86_64) I get this error message on the console every few seconds, even if I have no USB devices attached:

  kernel: usb 1-2: device descriptor read/64, error -71

When I boot with kernel 2.6.22.1-41.fc7 (x86_64) I get the "kernel: usb 1-2: device descriptor read/64, error -71" error only during boot, soon after "USB hub found". Adding USB devices (mouse, keyboard, etc.) does not seem to cause more errors.

Version-Release number of selected component (if applicable):

How reproducible:
Every time I boot, unless I disable USB support in the BIOS.

Steps to Reproduce:
1. Boot the box

Actual results:
"usb 1-2: device descriptor read/64, error -71" errors

Expected results:
No errors related to USB

Additional info:
With 2.6.21-1.3194.fc7 and both a USB keyboard and a USB mouse attached, the box crashed after hundreds of the above errors. With 2.6.22.1-41.fc7 and no USB devices attached, the box does not crash.
The one-time -71 at boot should be addressed by the new mkinitrd, in Rawhide at least. Maybe you can pull it over and install it on top of FC-7 to verify. Although... it seems to be delayed. The latest is mkinitrd-6.0.9-9, which does not have the fix yet.
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest mkinitrd?

If the problem no longer exists then please close this bug, or I'll do so in a few days if no additional information is lodged.

Cheers
Chris
Hi Chris,

Thanks for looking into this. In response to your request I just:

1) Updated the Fedora 7 box in question to the latest kernel

  {0}lithium:/root > uname -r
  2.6.22.5-76.fc7

2) Rebooted and enabled the on-board USB hubs in the BIOS
3) Rebooted again.

Things have changed: booting with no USB devices attached yields no "kernel: usb 1-2: device descriptor read/64, error -71" errors. In fact, I am not getting any USB detection at all when I attach a USB device after booting.

Well, perhaps there is something wrong with my on-board USB hub or ports. So I tried booting from a bootable flash drive. That works just fine.

Now I leave the bootable flash drive attached, but boot F7 (on the hard disk) instead. And I find in /var/log/messages:

  kernel: hub 5-0:1.0: connect-debounce failed, port 1 disabled
  last message repeated 9 times
  kernel: usb 1-1: new full speed USB device using uhci_hcd and address 2
  kernel: usb 1-1: device descriptor read/64, error -71
  kernel: usb 1-1: device descriptor read/64, error -71
  kernel: usb 1-1: new full speed USB device using uhci_hcd and address 3
  kernel: usb 1-1: device descriptor read/64, error -71
  kernel: usb 1-1: device descriptor read/64, error -71
  [SNIP]
  kernel: Uhhuh. NMI received for unknown reason a9.
  kernel: You have some hardware problem, likely on the PCI bus.
  kernel: Dazed and confused, but trying to continue

But still no usable USB hub.

I know how the kernel feels; I get dazed and confused sometimes too.

Thanks again for looking at this issue,
Don
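For anyone comparing kernels or BIOS settings on a box like this, it can help to count the descriptor read failures per port instead of eyeballing /var/log/messages. The following is a minimal Python sketch, assuming the message format is exactly as quoted in the comment above; the names DESCRIPTOR_ERROR and count_descriptor_errors are made up for this illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: usb 1-1: device descriptor read/64, error -71
# The pattern is an assumption based only on the lines quoted in this bug.
DESCRIPTOR_ERROR = re.compile(
    r"usb (?P<port>[\d.-]+): device descriptor read/\d+, error (?P<code>-\d+)"
)


def count_descriptor_errors(lines):
    """Count descriptor read failures, keyed by (port, error code)."""
    counts = Counter()
    for line in lines:
        match = DESCRIPTOR_ERROR.search(line)
        if match:
            counts[(match.group("port"), int(match.group("code")))] += 1
    return counts


# Sample input: the excerpt from /var/log/messages quoted in this bug.
SAMPLE_LOG = [
    "kernel: hub 5-0:1.0: connect-debounce failed, port 1 disabled",
    "kernel: usb 1-1: new full speed USB device using uhci_hcd and address 2",
    "kernel: usb 1-1: device descriptor read/64, error -71",
    "kernel: usb 1-1: device descriptor read/64, error -71",
    "kernel: usb 1-1: new full speed USB device using uhci_hcd and address 3",
    "kernel: usb 1-1: device descriptor read/64, error -71",
    "kernel: usb 1-1: device descriptor read/64, error -71",
    "kernel: Uhhuh. NMI received for unknown reason a9.",
]

if __name__ == "__main__":
    for (port, code), total in sorted(count_descriptor_errors(SAMPLE_LOG).items()):
        print(f"usb {port}: error {code} seen {total} time(s)")
```

To run it against a real log, pass an open file object (for example open("/var/log/messages")) in place of SAMPLE_LOG. Note that syslog's "last message repeated N times" lines are not expanded, so the counts are a lower bound.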
Your issue has similar symptoms to:

https://bugzilla.redhat.com/show_bug.cgi?id=247499

and you'll feel right at home here:

https://bugs.launchpad.net/linux/+bug/88530

As Pete Zaitcev indicates in the RH bug, it is most likely a bad BIOS. Could you attach a full dmesg output as text/plain, as well as the output from dmidecode (you may have to install it), as this will help a bit.

First and foremost, I would also recommend checking whether there is an updated BIOS for your board.
No, -71 is not usually BIOS. Typically it's some kind of signal integrity issue: CRC errors, lost tokens, etc. There's also a special case with EHCI trying to take over on boot which triggers them. The solid -110 on all accesses is IRQ not coming through, which is usually BIOS these days (it used to be that we parsed ACPI tables wrong, but nowadays it's pretty rare). But this bug is not like that, so please don't confuse Don with that.
What's the solution then Pete? If you know of an explanation table for these error numbers that would be appreciated as well.
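On the question of an explanation table: the numbers in these messages are negated Linux errno values, so -71 is EPROTO and -110 is ETIMEDOUT. Below is a minimal Python sketch of a lookup table, assuming the generic Linux errno numbering. The "likely cause" text only restates Pete's comment above and is not taken from kernel documentation; USB_ERRNO_NOTES and explain_usb_error are hypothetical names.

```python
import os

# Negated errno values seen in this bug, using generic Linux numbering.
# The "likely cause" column paraphrases the earlier comment in this thread.
USB_ERRNO_NOTES = {
    71: ("EPROTO", "Protocol error",
         "signal integrity issue (CRC errors, lost tokens), or EHCI taking over at boot"),
    110: ("ETIMEDOUT", "Connection timed out",
          "IRQ not coming through, which is usually a BIOS problem these days"),
}


def explain_usb_error(code):
    """Return a one-line explanation for a kernel error such as -71."""
    number = abs(code)
    if number in USB_ERRNO_NOTES:
        name, text, cause = USB_ERRNO_NOTES[number]
        return f"-{number} {name} ({text}); likely cause: {cause}"
    # Fallback for codes not in the table; the message depends on the local OS.
    return f"-{number} {os.strerror(number)}"


if __name__ == "__main__":
    for code in (-71, -110):
        print(explain_usb_error(code))
```

The table is deliberately limited to the two codes discussed here; anything else falls through to os.strerror(), whose numbering is only meaningful when the script runs on Linux.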
The original issue looked like the commonplace boot-time -71, so I suggested the new mkinitrd. Peter Jones added a fix which addressed those errors. Then, comment #3 says that there's more to it. Getting an NMI means something is seriously cooked in the motherboard, way beyond just USB. So I have no more suggestions.
Created attachment 206041 [details] dmesg output with latest BIOS and USB flash drive attached
Created attachment 206081 [details] dmidecode output with latest BIOS and USB flash drive attached
Hi,

Well, Intel has released a BIOS update since I first submitted this bug back in early August. I've updated to the latest BIOS, but the USB behavior is exactly the same as in my post on 2007-09-25.

I have submitted the dmesg and dmidecode output as requested.

Don
Well, that dmesg is riddled with errors. Can you try booting with the following parameter: acpi=off then attach that dmesg (please could you select text/plain as the attachment type).
Created attachment 208261 [details] dmesg booting with acpi=off

Here is the dmesg output after booting with acpi=off. Thanks again for looking into this.

Don
Thanks Don. If you boot with noapic as well as acpi=off, does this still generate the masses of SCSI aborts in dmesg?
Created attachment 208961 [details] dmesg after booting with noapic acpi=off Hi Chris, noapic yields no SCSI ABORTs Don
Dmesg looks much better. Could you now try: pci=noacpi (instead of acpi=off) and nolapic (instead of noapic) and attach another dmesg with type as text/plain. Cheers Chris
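When several dmesg captures are attached with different boot options, it is easy to lose track of which options a given boot actually used. The following is a minimal Python sketch that reports which of the options tried in this bug (acpi=off, noapic, pci=noacpi, nolapic) appear on a kernel command line. The function names are made up for this illustration, and the parser is deliberately simple: it splits on whitespace and does not handle quoted values or comma-separated lists such as a pci= option with several settings.

```python
import os

# The boot options suggested in this bug; True marks a bare flag with no value.
TRIAGE_FLAGS = [
    ("acpi", "off"),
    ("noapic", True),
    ("pci", "noacpi"),
    ("nolapic", True),
]


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value-or-True} dict."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def active_triage_flags(cmdline):
    """Return the triage options from this bug that are present in cmdline."""
    params = parse_cmdline(cmdline)
    found = []
    for key, wanted in TRIAGE_FLAGS:
        if params.get(key) == wanted:
            found.append(key if wanted is True else f"{key}={wanted}")
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as handle:
            print(active_triage_flags(handle.read()))
```

Pasting the result next to each attached dmesg makes it clear which combination produced which log.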
Created attachment 209061 [details] dmesg from booting with pci=noacpi nolapic dmesg from booting with pci=noacpi nolapic. To simplify things I removed the Adaptec 29160 Ultra160 SCSI adapter. It is only used for tape backups. Don
Don,

Any update on this? I'm guessing you're still having the issue, but the:

  Uhhuh. NMI received for unknown reason a9.
  You have some hardware problem, likely on the PCI bus.
  Dazed and confused, but trying to continue

suggests that, as Pete said, it's bad hardware. In which case I'm tempted to close this NOTABUG, but will wait to see what you come back with...
Closing as per previous comment. Please re-open if this is still an issue for you on new hardware.