Description of problem:
When running the reproducer of bz450865 (load/unload ohci-hcd module in a loop), there was a kernel Oops:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3663173
Unable to handle kernel paging request at ffffffffa0039dd0 RIP:
<ffffffff80290a84>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD 3fc68c067 PTE 0
Oops: 0000 [1] SMP
CPU 6
Modules linked in: netconsole netdump md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core loop button battery ac k8_edac edac_mc e1000 sr_mod dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase usb_storage uhci_hcd ehci_hcd sd_mod scsi_mod
Pid: 3723, comm: hald Not tainted 2.6.9-67.0.22.ELsmp
RIP: 0010:[<ffffffff80290a84>] <ffffffff80290a84>{hcd_pci_release+16}
RSP: 0018:00000102fb217e10 EFLAGS: 00010206
RAX: ffffffffa0039d80 RBX: 00000100dfe6ed00 RCX: 0000000000000030
RDX: 00000100dfe6ed00 RSI: ffffffff801ec6e2 RDI: 00000100dfe6ec78
RBP: ffffffff8040c040 R08: 00000105fc7ba878 R09: ffffffff801ec6e2
R10: ffffffff801ec6e2 R11: ffffffff80290a74 R12: ffffffff8040bf80
R13: ffffffff80416208 R14: 00000103001f7140 R15: 0000000000000000
FS: 0000002a963b1d20(0000) GS:ffffffff804f3980(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0039dd0 CR3: 00000002fc7b2000 CR4: 00000000000006e0
Process hald (pid: 3723, threadinfo 00000102fb216000, task 00000103fc6477f0)
Stack: ffffffff801ec6b5 ffffffff801ec6e2 00000101fc7d2c00 ffffffff8040bd00
       ffffffff8040bc40 00000102fc65f4f8 ffffffff802888f6 00000101fc7d2d30
       ffffffff801ec6b5 ffffffff801ec6e2
Call Trace:<ffffffff801ec6b5>{kobject_cleanup+84} <ffffffff801ec6e2>{kobject_release+0}
       <ffffffff802888f6>{usb_release_dev+60} <ffffffff801ec6b5>{kobject_cleanup+84}
       <ffffffff801ec6e2>{kobject_release+0} <ffffffffa002707c>{:sd_mod:scsi_disk_put+81}
       <ffffffffa002770d>{:sd_mod:sd_release+112} <ffffffff801824dd>{blkdev_put+161}
       <ffffffff8017be4b>{__fput+99} <ffffffff8017aa31>{filp_close+103}
       <ffffffff8017aaba>{sys_close+130} <ffffffff80110276>{system_call+126}
Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00
RIP <ffffffff80290a84>{hcd_pci_release+16} RSP <00000102fb217e10>
CR2: ffffffffa0039dd0
From the log, there were lots of sda failures. It looks like it was a virtual floppy:
Jul 18 01:51:08 sun-x4600-01 kernel: usb 2-5: new full speed USB device using address 4
Jul 18 01:51:08 sun-x4600-01 kernel: scsi1 : SCSI emulation for USB Mass Storage devices
Jul 18 01:51:08 sun-x4600-01 kernel: Vendor: AMI Model: Virtual Floppy Rev: 1.00
Jul 18 01:51:08 sun-x4600-01 kernel: Type: Direct-Access ANSI SCSI revision: 02
Jul 18 01:51:08 sun-x4600-01 kernel: Attached scsi removable disk sda at scsi1, channel 0, id 0, lun 0
The machine in question is sun-x4600-01.rhts.bos.redhat.com. I had set up netdump before the Oops, but I don't know why it failed to capture it.
Version-Release number of selected component (if applicable): kernel-smp-2.6.9-67.0.22.EL
How reproducible: not always
How to reproduce: Plug any device into an OHCI USB port, then run two loops in parallel:
$ while true; do rmmod ohci; modprobe ohci; done
$ while true; do lsusb; done > /dev/null
It works better with 2-3 lsusb loops running simultaneously. This looks to me like a race condition w.r.t. procfs.
Hmm, reproducer from #c1 just hangs the kernel (verified on x86_64 and ppc64). So, this is another bug in ohci.
The same panic happened on another machine, ibm-morrison2.rhts.bos.redhat.com (x86_64).
The vmcore can be found at porkchop.devel.redhat.com:/mnt/redhat/qa/qa/qcai/vmcores/vmcore-455843
Hardware information about this machine can be found at http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3683993
Unable to handle kernel paging request at ffffffffa024f910 RIP:
<ffffffff802d36f8>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD 106cad067 PTE 0
Oops: 0000 [1]
CPU 0
Modules linked in: nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core loop button battery ac hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd raid1 raid0 dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 13506, comm: cat Not tainted 2.6.9-67.0.22.EL
RIP: 0010:[<ffffffff802d36f8>] <ffffffff802d36f8>{hcd_pci_release+16}
RSP: 0018:00000100ed8abe90 EFLAGS: 00010202
RAX: ffffffffa024f8c0 RBX: 000001010f9d9d50 RCX: 0000000000000030
RDX: 000001010f9d9d50 RSI: ffffffff8021c010 RDI: 000001010f9d9cc8
RBP: ffffffff804642e0 R08: 000001010e39c840 R09: 00000100ebdc9180
R10: ffffffff8021c010 R11: ffffffff802d36e8 R12: ffffffff80464200
R13: ffffffff8046f268 R14: 0000000000000000 R15: 0000000000000000
FS: 0000002a95561b00(0000) GS:ffffffff80555000(0000) knlGS:00000000f7eb06c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa024f910 CR3: 0000000000101000 CR4: 00000000000006e0
Process cat (pid: 13506, threadinfo 00000100ed8aa000, task 00000100eb370130)
Stack: ffffffff8021bfe3 ffffffff8021c010 0000010103f07c00 ffffffff80463f40
       ffffffff80463e60 0000010103f2eef8 ffffffff802c94ba 0000010103f07d58
       ffffffff8021bfe3 ffffffff8021c010
Call Trace:<ffffffff8021bfe3>{kobject_cleanup+84} <ffffffff8021c010>{kobject_release+0}
       <ffffffff802c94ba>{usb_release_dev+60} <ffffffff8021bfe3>{kobject_cleanup+84}
       <ffffffff8021c010>{kobject_release+0} <ffffffff801df3c4>{sysfs_release+54}
       <ffffffff801906f4>{__fput+99}
<ffffffff8018ed24>{filp_close+103} <ffffffff8018ee6d>{sys_close+322} <ffffffff80110a9e>{system_call+126}
Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00
RIP <ffffffff802d36f8>{hcd_pci_release+16} RSP <00000100ed8abe90>
CR2: ffffffffa024f910
*** Bug 456065 has been marked as a duplicate of this bug. ***
Created attachment 312538 [details] reproducer
This bug is common to all USB host controller drivers (ohci, ehci, uhci); it causes the kernel to oops or hang.
Created attachment 312543 [details] proposed patch
Updating PM score.
Created attachment 321570 [details] new proposed patch
This patch has two parts:
1. Allow kfree() if hcd_free is NULL, and relocate usb_hcd so that this is legal.
2. Add the "dead" HCD stub so we don't use hc_driver once it has been freed with the module.
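For readers without access to the attachment, the two parts can be illustrated with a toy user-space model. This is only a sketch of the idea as described in the comment above, not the patch itself: the Python classes, the `loaded` flag, `DEAD_DRIVER`, and all function names are hypothetical stand-ins for the kernel structures, and the `ModuleGone` exception stands in for the paging-request oops.

```python
class ModuleGone(Exception):
    """Stands in for the oops: a read from memory that vanished with the module."""


class HcDriver:
    """Toy model of a driver ops table (hypothetical, not the kernel struct)."""

    def __init__(self, description, hcd_free=None):
        self.description = description
        self._hcd_free = hcd_free   # optional callback, may be None
        self.loaded = True          # becomes False once the module is unloaded

    @property
    def hcd_free(self):
        # Reading the table after unload is the failure being modelled.
        if not self.loaded:
            raise ModuleGone(self.description)
        return self._hcd_free


# Part 2: a stub table owned by the core, so it can never be unloaded.
DEAD_DRIVER = HcDriver("dead")


class UsbHcd:
    def __init__(self, driver):
        self.driver = driver
        self.freed_by = None


def release(hcd):
    """Part 1: tolerate a missing hcd_free and free the hcd generically."""
    free = hcd.driver.hcd_free
    if free is not None:
        free(hcd)
    else:
        hcd.freed_by = "core"


def unload_module(driver, hcds, use_stub):
    """Simulate rmmod while some hcds are still referenced elsewhere."""
    if use_stub:
        for hcd in hcds:
            hcd.driver = DEAD_DRIVER
    driver.loaded = False


def demo(use_stub):
    def ohci_free(hcd):
        hcd.freed_by = "ohci_hcd"

    driver = HcDriver("ohci_hcd", hcd_free=ohci_free)
    hcd = UsbHcd(driver)
    unload_module(driver, [hcd], use_stub)
    try:
        release(hcd)    # late release, e.g. the final close() by lsusb
    except ModuleGone:
        return "oops"
    return hcd.freed_by


if __name__ == "__main__":
    print(demo(use_stub=False))
    print(demo(use_stub=True))
```

Without the stub, the late release still dereferences the unloaded driver's table; with the stub, the release finds no `hcd_free` and falls back to the generic free path.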
While testing a RHEL 4.7 Z-stream kernel, I saw the following Oops on an SGI Altix machine. Do you think it is the same issue as this one?
11/11/08 14:36:59 JobID:35787 Test:/kernel/errata/4.6.z/450865 Response:1
11/11/08 14:36:59 testID:1061889 start:
ACPI: PCI interrupt 0002:01:02.0[A]: no GSI
ACPI: PCI interrupt 0002:01:02.1[B]: no GSI
ACPI: PCI interrupt 0012:01:02.0[A]: no GSI
...
ohci_hcd 0012:01:02.0: init err
ohci_hcd 0012:01:02.0: can't start
ohci_hcd 0012:01:02.0: init error -16
ohci_hcd: probe of 0012:01:02.0 failed with error -16
...
ACPI: PCI interrupt 0012:01:02.1[B]: no GSI
ACPI: PCI interrupt 0002:01:02.0[A]: no GSI
ACPI: PCI interrupt 0002:01:02.1[B]: no GSI
...
Unable to handle kernel paging request at virtual address a00000020021a080
cat[6124]: Oops 8813272891392 [1]
Modules linked in: nfsd exportfs nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat loop button ehci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 6124, CPU 2, comm: cat
psr : 0000101008126010 ifs : 8000000000000205 ip : [<a000000100424c50>] Not tainted
ip is at hcd_pci_release+0x50/0xc0
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000069559a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001004187c0 b6 : a000000100424c00 b7 : a000000100012970
f6 : 1003e0000000000000000 f7 : 1003e0000000000004000
f8 : 1003e0000000000000000 f9 : 1000d8000000000000000
f10 : 1003e0000000000000001 f11 : 0fffffefdfffff0102000
r1 : a0000001009e0fd0 r2 : a00000010079b108 r3 : a00000020021a080
r8 : a00000020021a030 r9 : a000000100366080 r10 : 0000000000000001
r11 : a0000001002551a0 r12 : e00000301144fe20 r13 : e000003011448000
r14 : e000003015e23c78 r15 : e000003015e23e58 r16 : e000003015e23d38
r17 : 000000000000002e r18 : e00000b0f67c8190 r19 : a0007fff65270000
r20 : 0000000006009d98 r21 : 0000000000c013b3 r22 : e000003011448dd4
r23 : a0000001007f4738 r24 : a0000001007f4738 r25 : 0000000000000000
r26 : 0000000000000001 r27 : 0000001008126010 r28 : 4000000000002300
r29 : 00001213081a6010 r30 : 0000000000004000 r31 : 0000000000004000
Call Trace:
[<a000000100016e40>] show_stack+0x80/0xa0 sp=e00000301144f9b0 bsp=e000003011449378
[<a000000100017750>] show_regs+0x890/0x8c0 sp=e00000301144fb80 bsp=e000003011449330
[<a00000010003e9b0>] die+0x150/0x240 sp=e00000301144fba0 bsp=e0000030114492f0
[<a000000100064920>] ia64_do_page_fault+0x8e0/0xbe0 sp=e00000301144fba0 bsp=e000003011449288
[<a00000010000f600>] ia64_leave_kernel+0x0/0x260 sp=e00000301144fc50 bsp=e000003011449288
[<a000000100424c50>] hcd_pci_release+0x50/0xc0 sp=e00000301144fe20 bsp=e000003011449260
[<a0000001004187c0>] usb_host_release+0x60/0x80 sp=e00000301144fe20 bsp=e000003011449240
[<a000000100366100>] class_dev_release+0x80/0x120 sp=e00000301144fe20 bsp=e000003011449220
[<a000000100255130>] kobject_cleanup+0x170/0x1e0 sp=e00000301144fe20 bsp=e0000030114491e0
[<a0000001002551c0>] kobject_release+0x20/0x40 sp=e00000301144fe20 bsp=e0000030114491c0
[<a000000100256350>] kref_put+0xf0/0x1e0 sp=e00000301144fe20 bsp=e000003011449198
[<a000000100254f90>] kobject_put+0x30/0x60 sp=e00000301144fe20 bsp=e000003011449178
[<a000000100366440>] class_device_put+0x20/0x40 sp=e00000301144fe20 bsp=e000003011449158
[<a000000100418730>] usb_bus_put+0x30/0x60 sp=e00000301144fe20 bsp=e000003011449138
[<a00000010040ee30>] usb_release_dev+0x190/0x220 sp=e00000301144fe20 bsp=e000003011449118
[<a000000100360370>] device_release+0x70/0x120 sp=e00000301144fe20 bsp=e0000030114490f8
[<a000000100255130>] kobject_cleanup+0x170/0x1e0 sp=e00000301144fe20 bsp=e0000030114490c0
[<a0000001002551c0>] kobject_release+0x20/0x40 sp=e00000301144fe20 bsp=e0000030114490a0
[<a000000100256350>] kref_put+0xf0/0x1e0 sp=e00000301144fe20 bsp=e000003011449078
[<a000000100254f90>] kobject_put+0x30/0x60 sp=e00000301144fe20 bsp=e000003011449058
[<a000000100255170>] kobject_cleanup+0x1b0/0x1e0 sp=e00000301144fe20 bsp=e000003011449020
[<a0000001002551c0>] kobject_release+0x20/0x40 sp=e00000301144fe20 bsp=e000003011449000
[<a000000100256350>] kref_put+0xf0/0x1e0 sp=e00000301144fe20 bsp=e000003011448fd0
[<a000000100254f90>] kobject_put+0x30/0x60 sp=e00000301144fe20 bsp=e000003011448fb0
[<a0000001001c47e0>] sysfs_release+0xa0/0x1e0 sp=e00000301144fe20 bsp=e000003011448f80
[<a00000010012b780>] __fput+0x380/0x3e0 sp=e00000301144fe20 bsp=e000003011448f30
[<a00000010012b820>] fput+0x40/0x60 sp=e00000301144fe30 bsp=e000003011448f10
[<a0000001001280e0>] filp_close+0xc0/0x1a0 sp=e00000301144fe30 bsp=e000003011448ee0
[<a000000100128310>] sys_close+0x150/0x1c0 sp=e00000301144fe30 bsp=e000003011448e68
[<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20 sp=e00000301144fe30 bsp=e000003011448e68
[<a000000000010640>] 0xa000000000010640 sp=e000003011450000 bsp=e000003011448e68
Created attachment 323322 [details] Full Log of Oops on SGI Altix
The trace path is the same as in the original report. I think this is the same issue.
Test kernel 2.6.9-78.18.EL.bz455843.1 is available here (with ia64): http://people.redhat.com/zaitcev/ftp/455843/ Feel free to let me know if more packages are needed, e.g. kernel-devel for any specific arch.
Tested on altix4.rhts.bos.redhat.com with the new kernel, using the reproducer in comment #5, and it no longer panics. Only the following messages were output on the serial console:
bus 3: replacing with dummies
bus 4: replacing with dummies
bus 5: replacing with dummies
bus 6: replacing with dummies
bus 1: replacing with dummies
bus 2: replacing with dummies
With the old kernel, the reproducer caused the panic almost immediately. I also ran the following test with the new kernel for about an hour without seeing any issue:
while :; do rmmod ohci-hcd; modprobe ohci-hcd; done
New test kernel 2.6.9-78.18.EL.bz455843.4 is available at the same location. Cai, Ulrich, and anyone interested in this bug, please test. The .4 incorporates fixes for bug 471560 and a fix for a failure case (it has actually happened on the box that Cai provided for me). Otherwise it's the same as .1.
Created attachment 325952 [details] proposed patch w/ 471560 This is built into .bz455843.4.
patch posted on Wed, 10 Dec 2008 18:42:46 -0700. move to POST, and dev ack.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I have tested the new kernel 2.6.9-78.18.EL.bz455843.4 on several machines, and have not seen any problem, https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=40744 Thanks Pete!
Also, running the test for 3 hours on various machines over the weekend did not show any issue.
That's great to know. Unfortunately, Prarit was sceptical, so I'm having trouble drumming up reviews for it. Thread: http://post-office.corp.redhat.com/archives/rhkernel-list/2008-December/msg00467.html
Committed in 78.29.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
I have reproduced this bug on ibm-morrison2.rhts.bos.redhat.com with RHEL4-U7, kernel version 2.6.9-78.ELsmp:
cannot read device descriptor No such device (19)
Unable to handle kernel paging request at ffffffffa01c7dd0 RIP:
<ffffffff80299004>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD edd0b067 PTE 770cb163
Oops: 0000 [1] SMP
CPU 2
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core cpufreq_powersave loop button battery ac hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 29401, comm: lsusb Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffff80299004>] <ffffffff80299004>{hcd_pci_release+16}
RSP: 0018:000001007a17be70 EFLAGS: 00010206
RAX: ffffffffa01c7d80 RBX: 000001010b347d00 RCX: 0000000000000030
RDX: 000001010b347d00 RSI: ffffffff801edc9a RDI: 000001010b347c78
RBP: ffffffff80418740 R08: 0000000000000001 R09: ffffffff801edc9a
R10: ffffffff801edc9a R11: ffffffff80298ff4 R12: ffffffff80418680
R13: ffffffff80427488 R14: 000001010aee6178 R15: 00000000ffffffff
FS: 0000002a958a5b00(0000) GS:ffffffff8050d380(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa01c7dd0 CR3: 00000000edfa2000 CR4: 00000000000006e0
Process lsusb (pid: 29401, threadinfo 000001007a17a000, task 000001007bdb77f0)
Stack: ffffffff801edc6d ffffffff801edc9a 0000010037dd6c00 ffffffff80418400
       ffffffff80418340 00000100edd055b8 ffffffff80290e42 0000010037dd6d30
       ffffffff801edc6d 0000007fbffff501
Call Trace:<ffffffff801edc6d>{kobject_cleanup+84} <ffffffff801edc9a>{kobject_release+0}
       <ffffffff80290e42>{usb_release_dev+60} <ffffffff801edc6d>{kobject_cleanup+84}
       <ffffffff8029a195>{usbdev_release+173} <ffffffff8017c920>{__fput+100}
       <ffffffff8017b501>{filp_close+103} <ffffffff8017b58b>{sys_close+131}
       <ffffffff801102f6>{system_call+126}
Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00
RIP <ffffffff80299004>{hcd_pci_release+16} RSP <000001007a17be70>
CR2: ffffffffa01c7dd0
<0>Kernel panic - not syncing: Oops
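A quick cross-check of the three x86_64 oopses in this report: in each one CR2 (the faulting address) is RAX + 0x50, and RAX lies in the same ffffffffa0... range where the sd_mod module addresses appear in the first call trace. The leading Code bytes `4c 8b 58 50 41 ff e3`, if I decode them correctly, are `mov 0x50(%rax),%r11; jmp *%r11`, which would fit hcd_pci_release loading a function pointer from a structure that went away with the unloaded module. The sketch below only redoes that arithmetic. The register values are copied from the oopses in this report; `MODULES_VADDR` is my assumption about where the x86_64 module area starts on this kernel, and the helper names are made up for illustration.

```python
# Assumed start of the x86_64 module mapping area (not stated in this report).
MODULES_VADDR = 0xFFFFFFFFA0000000

# (RAX, CR2) pairs copied from the three x86_64 oopses, keyed by kernel/comm.
OOPSES = {
    "2.6.9-67.0.22.ELsmp/hald": (0xFFFFFFFFA0039D80, 0xFFFFFFFFA0039DD0),
    "2.6.9-67.0.22.EL/cat": (0xFFFFFFFFA024F8C0, 0xFFFFFFFFA024F910),
    "2.6.9-78.ELsmp/lsusb": (0xFFFFFFFFA01C7D80, 0xFFFFFFFFA01C7DD0),
}


def faulting_offset(rax, cr2):
    """Distance between the base pointer in RAX and the faulting address."""
    return cr2 - rax


def in_module_area(addr):
    """True if addr is at or above the assumed start of the module area."""
    return addr >= MODULES_VADDR


if __name__ == "__main__":
    for name, (rax, cr2) in OOPSES.items():
        print(name, hex(faulting_offset(rax, cr2)), in_module_area(rax))
```

The same 0x50 distance also shows up in the ia64 traces between r8 and r3 (the faulting address), though the script does not check those.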
No surprise here, the -78.EL does not have the fix. The fix was committed in -78.29.EL, see Vivek's comment #33. What was the need to test the -78?
(In reply to comment #38)
> No surprise here, the -78.EL does not have the fix. The fix was
> committed in -78.29.EL, see Vivek's comment #33.
> What was the need to test the -78?
Sorry for the confusing comment. I am just trying to verify the fix. First, I had to make sure the bug could be reproduced on the test machine. In the end, I reproduced it on altix4.rhts.bos.redhat.com (loading/unloading ehci_hcd while running "lsusb" in parallel) under 2.6.9-78.EL within 1 minute:
Unable to handle kernel paging request at virtual address a0000002001b7d38
lsusb[6601]: Oops 8813272891392 [1]
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat loop button ohci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 6601, CPU 0, comm: lsusb
psr : 0000101008126010 ifs : 8000000000000205 ip : [<a000000100424790>] Not tainted
ip is at hcd_pci_release+0x50/0xc0
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000005559a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100418300 b6 : a000000100424740 b7 : a000000100012970
f6 : 000000000000000000000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1 : a0000001009e0ea0 r2 : a00000010079b040 r3 : a0000002001b7d38
r8 : a0000002001b7ce8 r9 : a000000100365bc0 r10 : 0000000000000001
r11 : a000000100254ce0 r12 : e00000b0084cfe20 r13 : e00000b0084c8000
r14 : e0000030f7acd140 r15 : e0000030f7acd320 r16 : e0000030f7acd200
r17 : 0000000000000011 r18 : e0000030f7fa0110 r19 : a0007fff65270000
r20 : 00000000161ec7f8 r21 : 0000000002c3d8ff r22 : e00000b0084c8dd4
r23 : a0000001007f45f8 r24 : a0000001007f45f8 r25 : 0000000000000000
r26 : 0000000000000001 r27 : 0000001008126010 r28 : 400000000000b020
r29 : 00001213081a6018 r30 : 0000000000004000 r31 : 0000000000004000
Call Trace:
[<a000000100016e40>] show_stack+0x80/0xa0 sp=e00000b0084cf9b0 bsp=e00000b0084c9338
[<a000000100017750>] show_regs+0x890/0x8c0 sp=e00000b0084cfb80 bsp=e00000b0084c92f0
[<a00000010003e9b0>] die+0x150/0x240 sp=e00000b0084cfba0 bsp=e00000b0084c92b0
[<a000000100064920>] ia64_do_page_fault+0x8e0/0xbe0 sp=e00000b0084cfba0 bsp=e00000b0084c9248
[<a00000010000f600>] ia64_leave_kernel+0x0/0x260 sp=e00000b0084cfc50 bsp=e00000b0084c9248
[<a000000100424790>] hcd_pci_release+0x50/0xc0 sp=e00000b0084cfe20 bsp=e00000b0084c9220
[<a000000100418300>] usb_host_release+0x60/0x80 sp=e00000b0084cfe20 bsp=e00000b0084c9200
[<a000000100365c40>] class_dev_release+0x80/0x120 sp=e00000b0084cfe20 bsp=e00000b0084c91d8
[<a000000100254c70>] kobject_cleanup+0x170/0x1e0 sp=e00000b0084cfe20 bsp=e00000b0084c91a0
[<a000000100254d00>] kobject_release+0x20/0x40 sp=e00000b0084cfe20 bsp=e00000b0084c9180
[<a000000100255e90>] kref_put+0xf0/0x1e0 sp=e00000b0084cfe20 bsp=e00000b0084c9158
[<a000000100254ad0>] kobject_put+0x30/0x60 sp=e00000b0084cfe20 bsp=e00000b0084c9138
[<a000000100365f80>] class_device_put+0x20/0x40 sp=e00000b0084cfe20 bsp=e00000b0084c9118
[<a000000100418270>] usb_bus_put+0x30/0x60 sp=e00000b0084cfe20 bsp=e00000b0084c90f8
[<a00000010040e970>] usb_release_dev+0x190/0x220 sp=e00000b0084cfe20 bsp=e00000b0084c90d8
[<a00000010035feb0>] device_release+0x70/0x120 sp=e00000b0084cfe20 bsp=e00000b0084c90b8
[<a000000100254c70>] kobject_cleanup+0x170/0x1e0 sp=e00000b0084cfe20 bsp=e00000b0084c9080
[<a000000100254d00>] kobject_release+0x20/0x40 sp=e00000b0084cfe20 bsp=e00000b0084c9060
[<a000000100255e90>] kref_put+0xf0/0x1e0 sp=e00000b0084cfe20 bsp=e00000b0084c9038
[<a000000100254ad0>] kobject_put+0x30/0x60 sp=e00000b0084cfe20 bsp=e00000b0084c9018
[<a0000001003601a0>] put_device+0x20/0x40 sp=e00000b0084cfe20 bsp=e00000b0084c8ff0
[<a00000010040f010>] usb_put_dev+0x30/0x60 sp=e00000b0084cfe20 bsp=e00000b0084c8fd0
[<a000000100427770>] usbdev_release+0x1f0/0x220 sp=e00000b0084cfe20 bsp=e00000b0084c8f80
[<a00000010012b620>] __fput+0x380/0x3e0 sp=e00000b0084cfe20 bsp=e00000b0084c8f30
[<a00000010012b6c0>] fput+0x40/0x60 sp=e00000b0084cfe30 bsp=e00000b0084c8f10
[<a000000100127f80>] filp_close+0xc0/0x1a0 sp=e00000b0084cfe30 bsp=e00000b0084c8ee0
[<a0000001001281b0>] sys_close+0x150/0x1c0 sp=e00000b0084cfe30 bsp=e00000b0084c8e68
[<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20 sp=e00000b0084cfe30 bsp=e00000b0084c8e68
[<a000000000010640>] 0xa000000000010640 sp=e00000b0084d0000 bsp=e00000b0084c8e68
Kernel panic - not syncing: Fatal exception
Then I installed the latest RHEL4-U8 kernel, 2.6.9-88.EL. The test has been running for about 3 hours and the bug has not been reproduced, so I think the fix works. I will change the status to VERIFIED. Thanks!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html