Bug 455843 - Kernel panic at hcd_pci_release+16
Summary: Kernel panic at hcd_pci_release+16
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6.z
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Pete Zaitcev
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 456065
Depends On: 456630
Blocks: 461304
 
Reported: 2008-07-18 09:14 UTC by Qian Cai
Modified: 2018-10-20 02:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 19:26:52 UTC
Target Upstream Version:
Embargoed:


Attachments
reproducer (176 bytes, application/octet-stream)
2008-07-24 09:00 UTC, Vitaly Mayatskikh
proposed patch (5.60 KB, patch)
2008-07-24 11:43 UTC, Vitaly Mayatskikh
new proposed patch (7.29 KB, patch)
2008-10-27 02:15 UTC, Pete Zaitcev
Full Log of Oops on SGI Altix (374.84 KB, text/plain)
2008-11-12 10:53 UTC, Qian Cai
proposed patch w/ 471560 (9.20 KB, application/octet-stream)
2008-12-06 00:38 UTC, Pete Zaitcev


Links
System: Red Hat Product Errata
ID: RHSA-2009:1024
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update
Last Updated: 2009-05-18 14:57:26 UTC

Description Qian Cai 2008-07-18 09:14:22 UTC
Description of problem:
While running the reproducer for bz450865 (loading and unloading the ohci-hcd
module in a loop), the kernel oopsed:

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3663173

Unable to handle kernel paging request at ffffffffa0039dd0 RIP: 
<ffffffff80290a84>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD 3fc68c067 PTE 0
Oops: 0000 [1] SMP 
CPU 6 
Modules linked in: netconsole netdump md5 ipv6 parport_pc lp parport autofs4
sunrpc ds yenta_socket pcmcia_core loop button battery ac k8_edac edac_mc e1000
sr_mod dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi
mptscsi mptbase usb_storage uhci_hcd ehci_hcd sd_mod scsi_mod
Pid: 3723, comm: hald Not tainted 2.6.9-67.0.22.ELsmp
RIP: 0010:[<ffffffff80290a84>] <ffffffff80290a84>{hcd_pci_release+16}
RSP: 0018:00000102fb217e10  EFLAGS: 00010206
RAX: ffffffffa0039d80 RBX: 00000100dfe6ed00 RCX: 0000000000000030
RDX: 00000100dfe6ed00 RSI: ffffffff801ec6e2 RDI: 00000100dfe6ec78
RBP: ffffffff8040c040 R08: 00000105fc7ba878 R09: ffffffff801ec6e2
R10: ffffffff801ec6e2 R11: ffffffff80290a74 R12: ffffffff8040bf80
R13: ffffffff80416208 R14: 00000103001f7140 R15: 0000000000000000
FS:  0000002a963b1d20(0000) GS:ffffffff804f3980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0039dd0 CR3: 00000002fc7b2000 CR4: 00000000000006e0
Process hald (pid: 3723, threadinfo 00000102fb216000, task 00000103fc6477f0)
Stack: ffffffff801ec6b5 ffffffff801ec6e2 00000101fc7d2c00 ffffffff8040bd00 
       ffffffff8040bc40 00000102fc65f4f8 ffffffff802888f6 00000101fc7d2d30 
       ffffffff801ec6b5 ffffffff801ec6e2 
Call Trace:<ffffffff801ec6b5>{kobject_cleanup+84}
<ffffffff801ec6e2>{kobject_release+0} 
       <ffffffff802888f6>{usb_release_dev+60}
<ffffffff801ec6b5>{kobject_cleanup+84} 
       <ffffffff801ec6e2>{kobject_release+0}
<ffffffffa002707c>{:sd_mod:scsi_disk_put+81} 
       <ffffffffa002770d>{:sd_mod:sd_release+112}
<ffffffff801824dd>{blkdev_put+161} 
       <ffffffff8017be4b>{__fput+99} <ffffffff8017aa31>{filp_close+103} 
       <ffffffff8017aaba>{sys_close+130} <ffffffff80110276>{system_call+126} 
       

Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00 
RIP <ffffffff80290a84>{hcd_pci_release+16} RSP <00000102fb217e10>
CR2: ffffffffa0039dd0

The log shows many sda failures. The device appears to be a virtual
floppy:

Jul 18 01:51:08 sun-x4600-01 kernel: usb 2-5: new full speed USB device using
address 4
Jul 18 01:51:08 sun-x4600-01 kernel: scsi1 : SCSI emulation for USB Mass Storage
devices
Jul 18 01:51:08 sun-x4600-01 kernel:   Vendor: AMI       Model: Virtual Floppy  
Rev: 1.00
Jul 18 01:51:08 sun-x4600-01 kernel:   Type:   Direct-Access                    
ANSI SCSI revision: 02
Jul 18 01:51:08 sun-x4600-01 kernel: Attached scsi removable disk sda at scsi1,
channel 0, id 0, lun 0

The machine in question is sun-x4600-01.rhts.bos.redhat.com. I had set up
netdump before the oops, but I don't know why it failed to capture it.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-67.0.22.EL

How reproducible:
not always
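
A note on reading the x86_64 oopses in this report: in each one, CR2 is exactly RAX + 0x50, and the Code: bytes at the faulting RIP (4c 8b 58 50 41 ff e3) decode as mov 0x50(%rax),%r11 followed by jmp *%r11, i.e. an indirect jump through a pointer stored 0x50 bytes into whatever RAX points at. RAX lies in the same ffffffffa0... range as the :sd_mod: addresses in the first trace, which suggests (an inference, not something stated in the report) that the table being read lived in the memory of the unloaded host controller module. The C sketch below only rechecks that arithmetic. The helper names are invented for illustration, and the decoder handles just this one instruction form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RAX and CR2 copied from the three x86_64 oopses in this report. */
struct oops_sample {
    const char *kernel;
    uint64_t rax;
    uint64_t cr2;
};

static const struct oops_sample oops_samples[] = {
    { "2.6.9-67.0.22.ELsmp", 0xffffffffa0039d80ULL, 0xffffffffa0039dd0ULL },
    { "2.6.9-67.0.22.EL",    0xffffffffa024f8c0ULL, 0xffffffffa024f910ULL },
    { "2.6.9-78.ELsmp",      0xffffffffa01c7d80ULL, 0xffffffffa01c7dd0ULL },
};

/* First four bytes of the "Code:" line, identical in all three oopses. */
static const uint8_t oops_code_bytes[4] = { 0x4c, 0x8b, 0x58, 0x50 };

/* Distance between the faulting address (CR2) and the base register (RAX). */
static uint64_t fault_offset(const struct oops_sample *s)
{
    assert(s != NULL);
    return s->cr2 - s->rax;
}

/*
 * Decode the single instruction form seen at the faulting RIP:
 *   REX.W MOV r64, [base + disp8]   (opcode 0x8b, ModRM mod == 01, no SIB)
 * On success returns 0 and stores the destination register number, the base
 * register number and the displacement. Returns -1 for any other encoding.
 */
static int decode_mov_disp8(const uint8_t code[4], int *dst, int *base, int *disp)
{
    uint8_t rex, modrm;

    assert(code != NULL && dst != NULL && base != NULL && disp != NULL);
    rex = code[0];
    modrm = code[2];

    if ((rex & 0xf8) != 0x48 || code[1] != 0x8b)    /* need REX.W + MOV */
        return -1;
    if ((modrm >> 6) != 1 || (modrm & 7) == 4)      /* disp8 form, no SIB byte */
        return -1;

    *dst  = ((modrm >> 3) & 7) | ((rex & 0x04) ? 8 : 0);  /* REX.R extends reg */
    *base = (modrm & 7) | ((rex & 0x01) ? 8 : 0);         /* REX.B extends r/m */
    *disp = (int8_t)code[3];
    return 0;
}
```

Calling fault_offset() on each sample and decode_mov_disp8() on oops_code_bytes shows whether the displacement encoded in the instruction matches the gap between RAX and CR2. Register number 0 is RAX and 11 is R11 in the usual x86_64 numbering.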

Comment 1 Vitaly Mayatskikh 2008-07-21 07:48:08 UTC
How to reproduce: 

Plug any device into an OHCI USB port, then run two loops in parallel:

$ while true; do rmmod ohci-hcd; modprobe ohci-hcd; done

$ while true; do lsusb; done > /dev/null

It works better with 2-3 lsusb loops running simultaneously. This looks to me
like a race condition w.r.t. procfs.

Comment 2 Vitaly Mayatskikh 2008-07-21 10:10:42 UTC
Hmm, the reproducer from comment #1 just hangs the kernel (verified on x86_64
and ppc64). So this is another bug in ohci.

Comment 3 Qian Cai 2008-07-22 00:41:39 UTC
The same panic happened on another machine, ibm-morrison2.rhts.bos.redhat.com
(x86_64). Vmcore can be found at,
porkchop.devel.redhat.com:/mnt/redhat/qa/qa/qcai/vmcores/vmcore-455843

Hardware information about this machine can be found at,
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3683993

Unable to handle kernel paging request at ffffffffa024f910 RIP: 
<ffffffff802d36f8>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD 106cad067 PTE 0
Oops: 0000 [1] 
CPU 0 
Modules linked in: nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp
parport autofs4 sunrpc ds yenta_socket pcmcia_core loop button battery ac
hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd raid1 raid0
dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 13506, comm: cat Not tainted 2.6.9-67.0.22.EL
RIP: 0010:[<ffffffff802d36f8>] <ffffffff802d36f8>{hcd_pci_release+16}
RSP: 0018:00000100ed8abe90  EFLAGS: 00010202
RAX: ffffffffa024f8c0 RBX: 000001010f9d9d50 RCX: 0000000000000030
RDX: 000001010f9d9d50 RSI: ffffffff8021c010 RDI: 000001010f9d9cc8
RBP: ffffffff804642e0 R08: 000001010e39c840 R09: 00000100ebdc9180
R10: ffffffff8021c010 R11: ffffffff802d36e8 R12: ffffffff80464200
R13: ffffffff8046f268 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95561b00(0000) GS:ffffffff80555000(0000) knlGS:00000000f7eb06c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa024f910 CR3: 0000000000101000 CR4: 00000000000006e0
Process cat (pid: 13506, threadinfo 00000100ed8aa000, task 00000100eb370130)
Stack: ffffffff8021bfe3 ffffffff8021c010 0000010103f07c00 ffffffff80463f40 
       ffffffff80463e60 0000010103f2eef8 ffffffff802c94ba 0000010103f07d58 
       ffffffff8021bfe3 ffffffff8021c010 
Call Trace:<ffffffff8021bfe3>{kobject_cleanup+84}
<ffffffff8021c010>{kobject_release+0} 
       <ffffffff802c94ba>{usb_release_dev+60}
<ffffffff8021bfe3>{kobject_cleanup+84} 
       <ffffffff8021c010>{kobject_release+0} <ffffffff801df3c4>{sysfs_release+54} 
       <ffffffff801906f4>{__fput+99} <ffffffff8018ed24>{filp_close+103} 
       <ffffffff8018ee6d>{sys_close+322} <ffffffff80110a9e>{system_call+126} 
       

Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00 
RIP <ffffffff802d36f8>{hcd_pci_release+16} RSP <00000100ed8abe90>
CR2: ffffffffa024f910

Modules linked in: nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp
parport autofs4 sunrpc ds yenta_socket pcmcia_core loop button battery ac
hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd raid1 raid0
dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 13506, comm: cat Not tainted 2.6.9-67.0.22.EL
RIP: 0010:[<ffffffff802d36f8>] <ffffffff802d36f8>{hcd_pci_release+16}
RSP: 0018:00000100ed8abe90  EFLAGS: 00010202
RAX: ffffffffa024f8c0 RBX: 000001010f9d9d50 RCX: 0000000000000030
RDX: 000001010f9d9d50 RSI: ffffffff8021c010 RDI: 000001010f9d9cc8
RBP: ffffffff804642e0 R08: 000001010e39c840 R09: 00000100ebdc9180
R10: ffffffff8021c010 R11: ffffffff802d36e8 R12: ffffffff80464200
R13: ffffffff8046f268 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95561b00(0000) GS:ffffffff80555000(0000) knlGS:00000000f7eb06c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa024f910 CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace:<ffffffff8021bfe3>{kobject_cleanup+84}
<ffffffff8021c010>{kobject_release+0} 
       <ffffffff802c94ba>{usb_release_dev+60}
<ffffffff8021bfe3>{kobject_cleanup+84} 
       <ffffffff8021c010>{kobject_release+0} <ffffffff801df3c4>{sysfs_release+54} 
       <ffffffff801906f4>{__fput+99} <ffffffff8018ed24>{filp_close+103} 
       <ffffffff8018ee6d>{sys_close+322} <ffffffff80110a9e>{system_call+126} 


Comment 4 Vitaly Mayatskikh 2008-07-24 06:46:18 UTC
*** Bug 456065 has been marked as a duplicate of this bug. ***

Comment 5 Vitaly Mayatskikh 2008-07-24 09:00:09 UTC
Created attachment 312538 [details]
reproducer

This bug is common to all USB host controller drivers (ohci, ehci, uhci). It
causes the kernel to oops or hang.

Comment 6 Vitaly Mayatskikh 2008-07-24 11:43:00 UTC
Created attachment 312543 [details]
proposed patch

Comment 11 RHEL Program Management 2008-09-03 13:15:11 UTC
Updating PM score.

Comment 12 Pete Zaitcev 2008-10-27 02:15:27 UTC
Created attachment 321570 [details]
new proposed patch

This patch has two parts:
 1. Allow kfree() if hcd_free is NULL, and relocate usb_hcd so it's legal
 2. Add the "dead" HCD stub so we don't use hc_driver freed with the module
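
The two parts described above can be pictured with a small user-space model. This is not the attached patch. It is a sketch under assumptions: every name below (model_hcd, model_hc_driver, model_dead_driver and so on) is hypothetical, malloc()/free() stand in for kernel allocation, and a heap-allocated method table stands in for module memory that disappears on rmmod. The idea shown is that unload repoints a still-referenced HCD at a table that lives in the core, so a late final release (for example from close() in hald, cat or lsusb) no longer reads a pointer out of freed module memory.

```c
#include <assert.h>
#include <stdlib.h>

struct model_hcd;

/* Stand-in for the driver's method table, which lives in module memory. */
struct model_hc_driver {
    const char *description;
    void (*hcd_free)(struct model_hcd *hcd);    /* may be NULL */
};

struct model_hcd {
    const struct model_hc_driver *driver;
};

enum release_path { RELEASED_BY_DRIVER, RELEASED_BY_CORE };

/* "Dead" table: static in the core, so it outlives any module. */
static const struct model_hc_driver model_dead_driver = { "dead", NULL };

/* The module's own free hook; only valid while the module is loaded. */
static void model_module_hcd_free(struct model_hcd *hcd)
{
    free(hcd);
}

/* Module load: the table is heap-allocated so that unload really frees it. */
static struct model_hc_driver *model_module_load(void)
{
    struct model_hc_driver *drv = malloc(sizeof(*drv));

    if (drv) {
        drv->description = "model-ohci";
        drv->hcd_free = model_module_hcd_free;
    }
    return drv;
}

static struct model_hcd *model_hcd_create(const struct model_hc_driver *drv)
{
    struct model_hcd *hcd = malloc(sizeof(*hcd));

    if (hcd)
        hcd->driver = drv;
    return hcd;
}

/* Module unload: repoint the still-referenced hcd first, then free the table. */
static void model_module_unload(struct model_hc_driver *drv, struct model_hcd *hcd)
{
    if (hcd && hcd->driver == drv)
        hcd->driver = &model_dead_driver;
    free(drv);
}

/* Final release, run whenever the last reference is dropped. */
static enum release_path model_hcd_release(struct model_hcd *hcd)
{
    const struct model_hc_driver *drv;

    assert(hcd != NULL && hcd->driver != NULL);
    drv = hcd->driver;

    if (drv->hcd_free) {
        drv->hcd_free(hcd);
        return RELEASED_BY_DRIVER;
    }
    free(hcd);      /* no hook: fall back to a plain free */
    return RELEASED_BY_CORE;
}
```

In this model, releasing an HCD after model_module_unload() takes the RELEASED_BY_CORE path and never touches the freed table, while releasing it with the module still loaded goes through the driver's hook as before. The plain-free fallback only works here because the model_hcd pointer is the start of its allocation, which may be what "relocate usb_hcd so it's legal" refers to in the real patch.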

Comment 18 Qian Cai 2008-11-12 10:51:39 UTC
While testing a RHEL 4.7 Z-stream kernel, I saw the following oops on an SGI Altix machine. Do you think it is the same issue as this one?

11/11/08 14:36:59  JobID:35787 Test:/kernel/errata/4.6.z/450865 Response:1
11/11/08 14:36:59  testID:1061889 start:
ACPI: PCI interrupt 0002:01:02.0[A]: no GSI
ACPI: PCI interrupt 0002:01:02.1[B]: no GSI
ACPI: PCI interrupt 0012:01:02.0[A]: no GSI
...
ohci_hcd 0012:01:02.0: init err
ohci_hcd 0012:01:02.0: can't start
ohci_hcd 0012:01:02.0: init error -16
ohci_hcd: probe of 0012:01:02.0 failed with error -16
...
ACPI: PCI interrupt 0012:01:02.1[B]: no GSI
ACPI: PCI interrupt 0002:01:02.0[A]: no GSI
ACPI: PCI interrupt 0002:01:02.1[B]: no GSI
...
Unable to handle kernel paging request at virtual address a00000020021a080
cat[6124]: Oops 8813272891392 [1]
Modules linked in: nfsd exportfs nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat loop button ehci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod

Pid: 6124, CPU 2, comm:                  cat
psr : 0000101008126010 ifs : 8000000000000205 ip  : [<a000000100424c50>]    Not tainted
ip is at hcd_pci_release+0x50/0xc0
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000069559a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001004187c0 b6  : a000000100424c00 b7  : a000000100012970
f6  : 1003e0000000000000000 f7  : 1003e0000000000004000
f8  : 1003e0000000000000000 f9  : 1000d8000000000000000
f10 : 1003e0000000000000001 f11 : 0fffffefdfffff0102000
r1  : a0000001009e0fd0 r2  : a00000010079b108 r3  : a00000020021a080
r8  : a00000020021a030 r9  : a000000100366080 r10 : 0000000000000001
r11 : a0000001002551a0 r12 : e00000301144fe20 r13 : e000003011448000
r14 : e000003015e23c78 r15 : e000003015e23e58 r16 : e000003015e23d38
r17 : 000000000000002e r18 : e00000b0f67c8190 r19 : a0007fff65270000
r20 : 0000000006009d98 r21 : 0000000000c013b3 r22 : e000003011448dd4
r23 : a0000001007f4738 r24 : a0000001007f4738 r25 : 0000000000000000
r26 : 0000000000000001 r27 : 0000001008126010 r28 : 4000000000002300
r29 : 00001213081a6010 r30 : 0000000000004000 r31 : 0000000000004000

Call Trace:

 [<a000000100016e40>] show_stack+0x80/0xa0
                                sp=e00000301144f9b0 bsp=e000003011449378
 [<a000000100017750>] show_regs+0x890/0x8c0
                                sp=e00000301144fb80 bsp=e000003011449330
 [<a00000010003e9b0>] die+0x150/0x240
                                sp=e00000301144fba0 bsp=e0000030114492f0
 [<a000000100064920>] ia64_do_page_fault+0x8e0/0xbe0
                                sp=e00000301144fba0 bsp=e000003011449288
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00000301144fc50 bsp=e000003011449288
 [<a000000100424c50>] hcd_pci_release+0x50/0xc0
                                sp=e00000301144fe20 bsp=e000003011449260
 [<a0000001004187c0>] usb_host_release+0x60/0x80
                                sp=e00000301144fe20 bsp=e000003011449240
 [<a000000100366100>] class_dev_release+0x80/0x120
                                sp=e00000301144fe20 bsp=e000003011449220
 [<a000000100255130>] kobject_cleanup+0x170/0x1e0
                                sp=e00000301144fe20 bsp=e0000030114491e0
 [<a0000001002551c0>] kobject_release+0x20/0x40
                                sp=e00000301144fe20 bsp=e0000030114491c0
 [<a000000100256350>] kref_put+0xf0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011449198
 [<a000000100254f90>] kobject_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011449178
 [<a000000100366440>] class_device_put+0x20/0x40
                                sp=e00000301144fe20 bsp=e000003011449158
 [<a000000100418730>] usb_bus_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011449138
 [<a00000010040ee30>] usb_release_dev+0x190/0x220
                                sp=e00000301144fe20 bsp=e000003011449118
 [<a000000100360370>] device_release+0x70/0x120
                                sp=e00000301144fe20 bsp=e0000030114490f8
 [<a000000100255130>] kobject_cleanup+0x170/0x1e0
                                sp=e00000301144fe20 bsp=e0000030114490c0
 [<a0000001002551c0>] kobject_release+0x20/0x40
                                sp=e00000301144fe20 bsp=e0000030114490a0
 [<a000000100256350>] kref_put+0xf0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011449078
 [<a000000100254f90>] kobject_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011449058
 [<a000000100255170>] kobject_cleanup+0x1b0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011449020
 [<a0000001002551c0>] kobject_release+0x20/0x40
                                sp=e00000301144fe20 bsp=e000003011449000
 [<a000000100256350>] kref_put+0xf0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011448fd0
 [<a000000100254f90>] kobject_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011448fb0
 [<a0000001001c47e0>] sysfs_release+0xa0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011448f80
 [<a00000010012b780>] __fput+0x380/0x3e0
                                sp=e00000301144fe20 bsp=e000003011448f30
 [<a00000010012b820>] fput+0x40/0x60
                                sp=e00000301144fe30 bsp=e000003011448f10
 [<a0000001001280e0>] filp_close+0xc0/0x1a0
                                sp=e00000301144fe30 bsp=e000003011448ee0
 [<a000000100128310>] sys_close+0x150/0x1c0
                                sp=e00000301144fe30 bsp=e000003011448e68
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000301144fe30 bsp=e000003011448e68
 [<a000000000010640>] 0xa000000000010640
                                sp=e000003011450000 bsp=e000003011448e68

Comment 19 Qian Cai 2008-11-12 10:53:10 UTC
Created attachment 323322 [details]
Full Log of Oops on SGI Altix

Comment 20 Vitaly Mayatskikh 2008-11-12 11:30:15 UTC
The trace path is the same as in the original report. I think this is the same issue.

Comment 22 Pete Zaitcev 2008-11-20 04:59:22 UTC
Test kernel 2.6.9-78.18.EL.bz455843.1 is available here (with ia64):
  http://people.redhat.com/zaitcev/ftp/455843/

Feel free to let me know if more packages are needed, e.g. kernel-devel
for any specific arch.

Comment 23 Qian Cai 2008-11-21 10:30:14 UTC
Tested on altix4.rhts.bos.redhat.com with the new kernel, using the reproducer from comment #5, and it no longer panics. Only the following messages appeared on the serial console.

bus 3: replacing with dummies
bus 4: replacing with dummies
bus 5: replacing with dummies
bus 6: replacing with dummies
bus 1: replacing with dummies
bus 2: replacing with dummies

With the old kernel, the reproducer caused the panic almost immediately. I have also run the following test with the new kernel for about an hour without seeing any issue.

while :; do rmmod ohci-hcd; modprobe ohci-hcd; done

Comment 24 Pete Zaitcev 2008-12-06 00:27:59 UTC
New test kernel 2.6.9-78.18.EL.bz455843.4 is available at the same location.
Cai, Ulrich, and anyone interested in this bug, please test.

The .4 build incorporates fixes for bug 471560 and a fix for a failure case
(it actually happened on the box that Cai provided for me).
Otherwise it's the same as .1.

Comment 25 Pete Zaitcev 2008-12-06 00:38:31 UTC
Created attachment 325952 [details]
proposed patch w/ 471560

This is built into .bz455843.4.

Comment 28 Linda Wang 2008-12-17 20:17:28 UTC
Patch posted on Wed, 10 Dec 2008 18:42:46 -0700. Moving to POST, with dev ack.

Comment 29 RHEL Program Management 2008-12-17 20:20:19 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 30 Qian Cai 2008-12-31 10:38:27 UTC
I have tested the new kernel 2.6.9-78.18.EL.bz455843.4 on several machines, and have not seen any problem,

https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=40744

Thanks Pete!

Comment 31 Qian Cai 2009-01-04 01:34:30 UTC
Also, running the test for 3 hours on various machines over the weekend did not show any issue.

Comment 32 Pete Zaitcev 2009-01-04 03:20:05 UTC
That's great to know. Unfortunately, Prarit was sceptical, so I'm having
trouble drumming up reviews for it. Thread:
 http://post-office.corp.redhat.com/archives/rhkernel-list/2008-December/msg00467.html

Comment 34 Vivek Goyal 2009-01-15 14:03:54 UTC
Committed in 78.29.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 37 Han Pingtian 2009-04-20 05:00:41 UTC
I have reproduced this bug on ibm-morrison2.rhts.bos.redhat.com with RHEL4-U7,
kernel version 2.6.9-78.ELsmp:

cannot read device descriptor No such device (19)
Unable to handle kernel paging request at ffffffffa01c7dd0 RIP:
<ffffffff80299004>{hcd_pci_release+16}

PML4 103027 PGD 105027 PMD edd0b067 PTE 770cb163
Oops: 0000 [1] SMP
CPU 2
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core cpufreq_powersave loop button battery ac hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 29401, comm: lsusb Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffff80299004>] <ffffffff80299004>{hcd_pci_release+16}
RSP: 0018:000001007a17be70  EFLAGS: 00010206
RAX: ffffffffa01c7d80 RBX: 000001010b347d00 RCX: 0000000000000030
RDX: 000001010b347d00 RSI: ffffffff801edc9a RDI: 000001010b347c78
RBP: ffffffff80418740 R08: 0000000000000001 R09: ffffffff801edc9a
R10: ffffffff801edc9a R11: ffffffff80298ff4 R12: ffffffff80418680
R13: ffffffff80427488 R14: 000001010aee6178 R15: 00000000ffffffff
FS:  0000002a958a5b00(0000) GS:ffffffff8050d380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa01c7dd0 CR3: 00000000edfa2000 CR4: 00000000000006e0
Process lsusb (pid: 29401, threadinfo 000001007a17a000, task 000001007bdb77f0)
Stack: ffffffff801edc6d ffffffff801edc9a 0000010037dd6c00 ffffffff80418400
       ffffffff80418340 00000100edd055b8 ffffffff80290e42 0000010037dd6d30
       ffffffff801edc6d 0000007fbffff501
Call Trace:<ffffffff801edc6d>{kobject_cleanup+84} <ffffffff801edc9a>{kobject_release+0}
       <ffffffff80290e42>{usb_release_dev+60} <ffffffff801edc6d>{kobject_cleanup+84}
       <ffffffff8029a195>{usbdev_release+173} <ffffffff8017c920>{__fput+100}
       <ffffffff8017b501>{filp_close+103} <ffffffff8017b58b>{sys_close+131}
       <ffffffff801102f6>{system_call+126}

Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00
RIP <ffffffff80299004>{hcd_pci_release+16} RSP <000001007a17be70>
CR2: ffffffffa01c7dd0
 <0>Kernel panic - not syncing: Oops

Comment 38 Pete Zaitcev 2009-04-20 05:17:56 UTC
No surprise here, the -78.EL does not have the fix. The fix was
committed in -78.29.EL, see Vivek's comment #33.
What was the need to test the -78?

Comment 39 Han Pingtian 2009-04-21 09:09:42 UTC
(In reply to comment #38)
> No surprise here, the -78.EL does not have the fix. The fix was
> committed in -78.29.EL, see Vivek's comment #33.
> What was the need to test the -78?  
Sorry for the confusing comment.
I am just trying to verify the fix. First, I have to make sure the bug
can be reproduced on the test machine.
In the end, I reproduced it on altix4.rhts.bos.redhat.com (loading/unloading
ehci_hcd while running "lsusb" in parallel) under 2.6.9-78.EL within 1 minute:


Unable to handle kernel paging request at virtual address a0000002001b7d38
lsusb[6601]: Oops 8813272891392 [1]
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat loop button ohci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod

Pid: 6601, CPU 0, comm:                lsusb
psr : 0000101008126010 ifs : 8000000000000205 ip  : [<a000000100424790>]    Not tainted
ip is at hcd_pci_release+0x50/0xc0
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000005559a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100418300 b6  : a000000100424740 b7  : a000000100012970
f6  : 000000000000000000000 f7  : 000000000000000000000
f8  : 000000000000000000000 f9  : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1  : a0000001009e0ea0 r2  : a00000010079b040 r3  : a0000002001b7d38
r8  : a0000002001b7ce8 r9  : a000000100365bc0 r10 : 0000000000000001
r11 : a000000100254ce0 r12 : e00000b0084cfe20 r13 : e00000b0084c8000
r14 : e0000030f7acd140 r15 : e0000030f7acd320 r16 : e0000030f7acd200
r17 : 0000000000000011 r18 : e0000030f7fa0110 r19 : a0007fff65270000
r20 : 00000000161ec7f8 r21 : 0000000002c3d8ff r22 : e00000b0084c8dd4
r23 : a0000001007f45f8 r24 : a0000001007f45f8 r25 : 0000000000000000
r26 : 0000000000000001 r27 : 0000001008126010 r28 : 400000000000b020
r29 : 00001213081a6018 r30 : 0000000000004000 r31 : 0000000000004000

Call Trace:
 [<a000000100016e40>] show_stack+0x80/0xa0
                                sp=e00000b0084cf9b0 bsp=e00000b0084c9338
 [<a000000100017750>] show_regs+0x890/0x8c0
                                sp=e00000b0084cfb80 bsp=e00000b0084c92f0
 [<a00000010003e9b0>] die+0x150/0x240
                                sp=e00000b0084cfba0 bsp=e00000b0084c92b0
 [<a000000100064920>] ia64_do_page_fault+0x8e0/0xbe0
                                sp=e00000b0084cfba0 bsp=e00000b0084c9248
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00000b0084cfc50 bsp=e00000b0084c9248
 [<a000000100424790>] hcd_pci_release+0x50/0xc0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9220
 [<a000000100418300>] usb_host_release+0x60/0x80
                                sp=e00000b0084cfe20 bsp=e00000b0084c9200
 [<a000000100365c40>] class_dev_release+0x80/0x120
                                sp=e00000b0084cfe20 bsp=e00000b0084c91d8
 [<a000000100254c70>] kobject_cleanup+0x170/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c91a0
 [<a000000100254d00>] kobject_release+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c9180
 [<a000000100255e90>] kref_put+0xf0/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9158
 [<a000000100254ad0>] kobject_put+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c9138
 [<a000000100365f80>] class_device_put+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c9118
 [<a000000100418270>] usb_bus_put+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c90f8
 [<a00000010040e970>] usb_release_dev+0x190/0x220
                                sp=e00000b0084cfe20 bsp=e00000b0084c90d8
 [<a00000010035feb0>] device_release+0x70/0x120
                                sp=e00000b0084cfe20 bsp=e00000b0084c90b8
 [<a000000100254c70>] kobject_cleanup+0x170/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9080
 [<a000000100254d00>] kobject_release+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c9060
 [<a000000100255e90>] kref_put+0xf0/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9038
 [<a000000100254ad0>] kobject_put+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c9018
 [<a0000001003601a0>] put_device+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c8ff0
 [<a00000010040f010>] usb_put_dev+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c8fd0
 [<a000000100427770>] usbdev_release+0x1f0/0x220
                                sp=e00000b0084cfe20 bsp=e00000b0084c8f80
 [<a00000010012b620>] __fput+0x380/0x3e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c8f30
 [<a00000010012b6c0>] fput+0x40/0x60
                                sp=e00000b0084cfe30 bsp=e00000b0084c8f10
 [<a000000100127f80>] filp_close+0xc0/0x1a0
                                sp=e00000b0084cfe30 bsp=e00000b0084c8ee0
 [<a0000001001281b0>] sys_close+0x150/0x1c0
                                sp=e00000b0084cfe30 bsp=e00000b0084c8e68
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000b0084cfe30 bsp=e00000b0084c8e68
 [<a000000000010640>] 0xa000000000010640
                                sp=e00000b0084d0000 bsp=e00000b0084c8e68
Kernel panic - not syncing: Fatal exception

Then I installed the latest RHEL4-U8 kernel, 2.6.9-88.EL. The test has been
running for about 3 hours and the bug has not been reproduced, so I think the
fix works. I will change the status to VERIFIED. Thanks!

Comment 41 errata-xmlrpc 2009-05-18 19:26:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

