Bug 250779

Summary: usb 1-2: device descriptor read/64 errors
Product: [Fedora] Fedora
Reporter: Don Harden <harden>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 7
CC: chris.brown
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-02-16 02:32:15 UTC
Attachments:
Description | Flags
dmesg output with latest BIOS and USB flash drive attached | none
dmidecode output with latest BIOS and USB flash drive attached | none
dmesg booting with acpi=off | none
dmesg after booting with noapic acpi=off | none
dmesg from booting with pci=noacpi nolapic | none

Description Don Harden 2007-08-03 15:49:51 UTC
Description of problem:
I have the following hardware:
 Intel S3000AHLX  Motherboard
 Intel Xeon X3220 Kentsfield 2.4GHz Processor
 Adaptec 29160 2253200-R PCI SCSI adapter
 HITACHI Deskstar T7K250 SATA 3.0Gb disk drives
 
If I boot with kernel 2.6.21-1.3194.fc7 (x86_64), I get this error message on the
console every few seconds, even if I have no USB devices attached.

kernel: usb 1-2: device descriptor read/64, error -71


When I boot with kernel 2.6.22.1-41.fc7 (x86_64), I get the
"kernel: usb 1-2: device descriptor read/64, error -71" error only during boot,
soon after "USB hub found". Adding USB devices (mouse, keyboard, etc.) does not
seem to cause more errors.


Version-Release number of selected component (if applicable):


How reproducible:  Every time I boot unless I disable USB support in the BIOS.


Steps to Reproduce:
1.  Boot the box
  
Actual results:
"usb 1-2: device descriptor read/64, error -71" errors

Expected results:
No errors related to USB

Additional info: With 2.6.21-1.3194.fc7 and both a USB keyboard and a USB mouse
attached, the box crashed after hundreds of the above errors. With
2.6.22.1-41.fc7 and no USB devices attached, the box does not crash.

Comment 1 Pete Zaitcev 2007-08-20 22:37:28 UTC
The one-time -71 at boot should be addressed by the new mkinitrd, in Rawhide
at least. Maybe you can pull it over and install it on top of FC-7 to verify.
Although... it seems to be delayed. The latest is mkinitrd-6.0.9-9,
which is not fixed yet.


Comment 2 Christopher Brown 2007-09-23 21:08:17 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest mkinitrd?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 3 Don Harden 2007-09-24 15:47:40 UTC
Hi Chris,

Thanks for looking into this. In response to your request, I just:

1) updated the Fedora 7 box in question to the latest kernel:
     {0}lithium:/root > uname -r
      2.6.22.5-76.fc7

2) Rebooted and enabled the on-board USB hubs in the BIOS

3) Rebooted again.

Things have changed:

Booting with no USB devices attached yields no "kernel: usb 1-2: device
descriptor read/64, error -71" errors. In fact, I am not getting any USB
detection when I attach a USB device after booting.

Well, perhaps there is something wrong with my on-board USB hub or ports. So I
try booting from a bootable flash drive. That works just fine.

Now I leave the bootable flash drive attached, but boot F7 from the hard disk
instead. And I find in /var/log/messages:

kernel: hub 5-0:1.0: connect-debounce failed, port 1 disabled
last message repeated 9 times
kernel: usb 1-1: new full speed USB device using uhci_hcd and address 2
kernel: usb 1-1: device descriptor read/64, error -71
kernel: usb 1-1: device descriptor read/64, error -71
kernel: usb 1-1: new full speed USB device using uhci_hcd and address 3
kernel: usb 1-1: device descriptor read/64, error -71
kernel: usb 1-1: device descriptor read/64, error -71

[SNIP]

kernel: Uhhuh. NMI received for unknown reason a9.
kernel: You have some hardware problem, likely on the PCI bus.
kernel: Dazed and confused, but trying to continue

But still no usable USB hub.  I know how the kernel feels; I get dazed and
confused sometimes too.

Thanks again for looking at this issue,
Don
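When logs like the excerpt above run to hundreds of lines, a per-port tally makes it easier to tell a steady stream of failures from a one-time burst at boot. A minimal sketch, assuming the usbcore message format quoted in this bug; the function name is hypothetical and not part of any Fedora tool.

```python
import re
from collections import Counter

# Matches usbcore error lines such as
# "kernel: usb 1-1: device descriptor read/64, error -71".
# Lines without ", error -N" (e.g. "new full speed USB device ...") are skipped.
ERR_RE = re.compile(r"usb (\S+): .*?, error (-\d+)")


def summarize_usb_errors(lines):
    """Count (device, error code) pairs in syslog/dmesg-style lines.

    Syslog collapses duplicates into "last message repeated N times",
    which this sketch does not expand, so counts taken from
    /var/log/messages are lower bounds.
    """
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Excerpt from the /var/log/messages output quoted in comment 3.
    excerpt = [
        "kernel: hub 5-0:1.0: connect-debounce failed, port 1 disabled",
        "kernel: usb 1-1: new full speed USB device using uhci_hcd and address 2",
        "kernel: usb 1-1: device descriptor read/64, error -71",
        "kernel: usb 1-1: device descriptor read/64, error -71",
        "kernel: usb 1-1: new full speed USB device using uhci_hcd and address 3",
        "kernel: usb 1-1: device descriptor read/64, error -71",
        "kernel: usb 1-1: device descriptor read/64, error -71",
    ]
    for (dev, code), n in summarize_usb_errors(excerpt).most_common():
        print(f"usb {dev}: error {code} seen {n} times")
```

To scan a real log, pass an open file, e.g. `summarize_usb_errors(open("/var/log/messages", errors="replace"))`.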
 

Comment 4 Christopher Brown 2007-09-24 23:39:59 UTC
Your issue had similar symptoms to:

https://bugzilla.redhat.com/show_bug.cgi?id=247499

and you'll feel right at home here:

https://bugs.launchpad.net/linux/+bug/88530

As Pete Zaitcev indicates in the RH bug, it is most likely bad BIOS. Could you
attach a full dmesg output as text/plain as well as output from dmidecode (you
may have to install) as this will help a bit.

I would also recommend, first and foremost, checking whether there is an
updated BIOS for your board.

Comment 5 Pete Zaitcev 2007-09-25 06:51:39 UTC
No, -71 is not usually BIOS. Typically it's some kind of signal integrity
issue: CRC errors, lost tokens, etc. There's also a special case with
EHCI trying to take over on boot which triggers them.

The solid -110 on all accesses is IRQ not coming through, which is usually
BIOS these days (it used to be that we parsed ACPI tables wrong, but nowadays
it's pretty rare). But this bug is not like that, so please don't confuse
Don with that.
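The numbers in these messages are negative errno values, so a rough explanation table can come straight from the platform's errno definitions. A minimal sketch, assuming the message format quoted in this bug and a Linux host: Python's `errno.errorcode` and `os.strerror` reflect the platform the script runs on, and only Linux uses the same numbering as the kernel. The helper name is hypothetical.

```python
import errno
import os
import re

# Pattern for the usbcore message quoted in this bug, e.g.
# "kernel: usb 1-2: device descriptor read/64, error -71"
USB_ERR_RE = re.compile(r"usb (\S+): (.+?), error (-\d+)")


def decode_usb_error(line):
    """Return (device, operation, errno name, strerror text), or None.

    Assumes Linux errno numbering; on other platforms the same
    number may map to a different name.
    """
    m = USB_ERR_RE.search(line)
    if m is None:
        return None
    device, operation = m.group(1), m.group(2)
    num = -int(m.group(3))  # the kernel reports errors as negative errno values
    name = errno.errorcode.get(num, "unknown")
    return device, operation, name, os.strerror(num)


if __name__ == "__main__":
    for sample in (
        "kernel: usb 1-2: device descriptor read/64, error -71",
        "kernel: usb 1-2: device descriptor read/64, error -110",
    ):
        print(decode_usb_error(sample))
```

On Linux, -71 decodes to EPROTO ("Protocol error") and -110 to ETIMEDOUT ("Connection timed out"). The name only identifies which code usbcore returned; as Pete notes, the underlying cause (signal integrity versus an IRQ not coming through) still has to be inferred from the pattern of failures.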

Comment 6 Christopher Brown 2007-09-25 09:07:26 UTC
What's the solution then Pete? If you know of an explanation table for these
error numbers that would be appreciated as well.

Comment 7 Pete Zaitcev 2007-09-25 16:19:35 UTC
The original issue looked like the commonplace boot-time -71, so I suggested
new mkinitrd. Peter Jones added a fix which addressed those errors.

Then, comment #3 says that there's more to it. Getting an NMI means something
is seriously cooked in the motherboard, way beyond just USB. So I have no
more suggestions.
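The "reason a9" in the NMI message from comment 3 is a byte printed in hex. A minimal sketch for decoding it, assuming (as on PC-compatible chipsets) that the reason byte is the value of the legacy system control port B at I/O port 0x61, where bit 7 flags a SERR#/parity error and bit 6 flags an I/O channel check (IOCHK#). That bit table is the conventional PC layout, not something stated in this bug, and the names below are hypothetical.

```python
# Assumed bit layout of the legacy PC "system control port B" (I/O port 0x61);
# only the two NMI status bits are decoded here.
NMI_REASON_BITS = {
    7: "SERR#/memory parity error",
    6: "I/O channel check (IOCHK#)",
}


def decode_nmi_reason(reason):
    """Return descriptions of the NMI status bits set in `reason`, highest bit first."""
    return [
        desc
        for bit, desc in sorted(NMI_REASON_BITS.items(), reverse=True)
        if reason & (1 << bit)
    ]


if __name__ == "__main__":
    # "Uhhuh. NMI received for unknown reason a9." -> the byte is printed in hex.
    reason = int("a9", 16)
    print(f"{reason:#04x} = {reason:08b}")
    print(decode_nmi_reason(reason))
```

For 0xa9 (binary 10101001), bit 7 is set and bit 6 is clear, which under this assumed layout points at SERR#, a system error signalled by a PCI device or the chipset. That is consistent with the kernel's "likely on the PCI bus" hint, but it does not identify which device raised it.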

Comment 8 Don Harden 2007-09-25 20:19:25 UTC
Created attachment 206041 [details]
dmesg output with latest BIOS and USB flash drive attached

Comment 9 Don Harden 2007-09-25 20:23:31 UTC
Created attachment 206081 [details]
dmidecode output with latest BIOS and USB flash drive attached

Comment 10 Don Harden 2007-09-25 20:28:06 UTC
Hi,

Well, Intel has released a BIOS update since I first submitted this bug back in
early August. I've updated to the latest BIOS, but the USB behavior is exactly
the same as in my post on 2007-09-24. I've attached the dmesg and dmidecode
output as requested.

Don

Comment 11 Christopher Brown 2007-09-26 20:00:37 UTC
Well, that dmesg is riddled with errors. Can you try booting with the following
parameter:

acpi=off

then attach that dmesg (please could you select text/plain as the attachment type).

Comment 12 Don Harden 2007-09-27 11:25:02 UTC
Created attachment 208261 [details]
dmesg booting with acpi=off

Here is the dmesg output after booting with acpi=off.

Thanks again for looking into this.
Don

Comment 13 Christopher Brown 2007-09-27 19:08:17 UTC
Thanks Don. If you boot with noapic as well as acpi=off, does this still
generate the masses of SCSI aborts in dmesg?

Comment 14 Don Harden 2007-09-27 20:00:03 UTC
Created attachment 208961 [details]
dmesg  after booting with noapic acpi=off 

Hi Chris,
noapic yields no SCSI ABORTs
Don

Comment 15 Christopher Brown 2007-09-27 20:29:53 UTC
Dmesg looks much better. Could you now try:

pci=noacpi (instead of acpi=off)

and

nolapic (instead of noapic)

and attach another dmesg with type as text/plain.
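Since each dmesg in this bug was captured under a different combination of boot options, it helps to record which ones were actually in effect when the log was taken. A minimal sketch, assuming a Linux system where the running kernel's command line is readable from /proc/cmdline; the option list is simply the set requested in this bug, and the helper names are hypothetical.

```python
# Boot options requested during this bug's triage; extend as needed.
DIAG_OPTIONS = ("acpi=off", "noapic", "pci=noacpi", "nolapic")


def _expand(token):
    """Split "key=a,b" into ["key=a", "key=b"]; bare flags pass through unchanged."""
    key, sep, value = token.partition("=")
    if not sep:
        return [token]
    return [f"{key}={v}" for v in value.split(",")]


def active_diag_options(cmdline):
    """Return the DIAG_OPTIONS present in a kernel command line string."""
    present = set()
    for token in cmdline.split():
        present.update(_expand(token))
    return [opt for opt in DIAG_OPTIONS if opt in present]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""  # not Linux, or /proc unavailable
    print("triage options in effect:", active_diag_options(cmdline) or "none")
```

Comma-separated values are split so that, for example, `pci=nommconf,noacpi` still counts as `pci=noacpi`.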

Cheers
Chris

Comment 16 Don Harden 2007-09-27 21:52:13 UTC
Created attachment 209061 [details]
dmesg from booting with pci=noacpi nolapic


dmesg from booting with pci=noacpi nolapic.

To simplify things I removed the Adaptec 29160 Ultra160 SCSI adapter.  It is
only used for tape backups.

Don

Comment 17 Christopher Brown 2008-01-10 19:17:09 UTC
Don,

Any update on this? I'm guessing you're still having the issue, but the:

Uhhuh. NMI received for unknown reason a9.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue

suggests that, as Pete said, it's bad hardware. In which case I'm tempted to
close this as NOTABUG, but will wait to see what you come back with...

Comment 18 Christopher Brown 2008-02-16 02:32:15 UTC
Closing as per previous comment. Please re-open if this is still an issue for
you on new hardware.