Bug 129265
Summary: kernel panic when repeatedly accessing /proc/bus/usb/devices and hot-swapping usb device
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Dan Mechanic <danmechanic>
Assignee: Pete Zaitcev <zaitcev>
QA Contact: Brian Brock <bbrock>
CC: mbrandsma, petrides, redhat-bugzilla, riel
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 15:37:08 UTC
Bug Blocks: 168424

Description
Dan Mechanic
2004-08-05 18:08:00 UTC
Please provide the console "oops" output. Thanks. -ernie

Okay, I'm able to reproduce the same using the latest kernel. I started the cat loop, inserted and removed my memory stick a few times, and there it is: Oops. I was able to reproduce the same with my very, very bad USB camera, but I'm simply too slow copying the oops message before the screen of my laptop automatically goes to standby :-( Maybe the following helps you anyway:

--- snipp ---
Unable to handle kernel paging request at virtual address 6f732e80
printing eip:
d0b4007e
*pde = 00000000
Oops: 0000
sd_mod usb-storage e100 ide-cd cdrom sg scsi_mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd
CPU: 0
EIP: 0060:[<d0b4007e>] Not tainted
EFLAGS: 00010246
EIP is at usb_dump_config [usbcore] 0x6e (2.4.21-15.0.4.EL/i686)
eax: cf120c80 ebx: 6f732e78 ecx: 00000002 edx: 0000002f
esi: 00000000 edi: ce52a0cd ebp: ce52bf00 esp: ce319e68
ds: 0068 es: 0068 ss: 0068
Process cat (pid: 3286, stackpage=ce319000)
Stack: ce52a0a5 ce52bf00 cf120c80 00000001 00000002 00000000 00000000 cf27be00
       ce52bf00 00000018 00000001 d0b403d6 00000002 ce52a0a5 ce52bf00 cf120c80
       00000001 ce52bf00 ce52a042 cf27be00 ce52a000 d0b40517 ce52a042 ce52bf00
Call Trace:
[<d0b403d6>] usb_dump_desc [usbcore] 0xb6 (0xce319e94)
[<d9b40517>] usb_device_dump [usbcore] 0x127 (0xce319ebc)
[<d0b44165>} .rodata.str1.1 [usbcore] 0x613 (0xce319ee0)
[ HERE MY LAPTOP SCREEN TURNED OFF ITSELF / STANDBY *gnarf* :-( ]
--- snapp ---

Oooh, I understand now. I forgot that reading from /proc/bus/usb/devices fetches all descriptors from the device live. Not surprising it causes oopses on disconnect... This may be a little difficult to fix, unfortunately. 2.4 is somewhat weak in the refcounting department.

Created attachment 104809 [details]
Candidate #1
This seems too simple to be the solution, but it appears to work.
All it does is add refcounting, which prevents the oopses;
it does not interlock with disconnects.
2.6 takes a different path: they are trying to fracture
big-scope locks and then re-take them as needed, but keeping
semaphores around.
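
To illustrate the idea behind Candidate #1, here is a minimal userspace sketch of how a reference count keeps an object's memory valid for a reader even after a disconnect. It is not the attached patch and not the usbcore code: `struct fake_dev`, `dev_get`, `dev_put`, `dev_disconnect` and `demo_read_after_disconnect` are hypothetical names invented for this example, and the counter is a plain int because the sketch is single-threaded (kernel code would presumably need an atomic counter plus locking).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a USB device object; NOT the kernel's struct. */
struct fake_dev {
    int refcnt;            /* number of holders; freed when it drops to 0 */
    int connected;         /* cleared on disconnect, memory stays valid   */
    char descriptor[32];   /* data a reader wants to dump                 */
};

static int live_objects;   /* allocations not yet freed (demo bookkeeping) */

struct fake_dev *dev_alloc(const char *desc)
{
    struct fake_dev *d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    d->refcnt = 1;         /* this reference belongs to the "bus" */
    d->connected = 1;
    strncpy(d->descriptor, desc, sizeof(d->descriptor) - 1);
    live_objects++;
    return d;
}

/* Take an extra reference before using the object. */
struct fake_dev *dev_get(struct fake_dev *d)
{
    d->refcnt++;
    return d;
}

/* Drop a reference; the last holder frees the memory. */
void dev_put(struct fake_dev *d)
{
    assert(d->refcnt > 0);
    if (--d->refcnt == 0) {
        live_objects--;
        free(d);
    }
}

/* Disconnect path: mark the device gone and drop the bus reference. */
void dev_disconnect(struct fake_dev *d)
{
    d->connected = 0;
    dev_put(d);
}

/* Simulates a hot-unplug in the middle of a descriptor dump.
 * Returns 1 if the reader saw valid memory and nothing leaked. */
int demo_read_after_disconnect(void)
{
    struct fake_dev *d = dev_alloc("memory stick");
    if (!d)
        return -1;

    struct fake_dev *reader = dev_get(d);  /* reader pins the object   */
    dev_disconnect(d);                     /* unplug while still in use */

    /* The object is still allocated because the reader holds a reference. */
    int ok = live_objects == 1 &&
             reader->connected == 0 &&
             strcmp(reader->descriptor, "memory stick") == 0;
    printf("after disconnect: connected=%d descriptor=%s\n",
           reader->connected, reader->descriptor);

    dev_put(reader);                       /* last reference: freed here */
    return ok && live_objects == 0;
}
```

As in the comment above, the reference only prevents a use-after-free: the reader can still observe a device that has already gone away (`connected == 0`), which is the missing interlock with disconnects.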
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.3.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html

In the last few days I have had this occur with the 2.4.21-40.ELsmp kernel on HS20 blades. It was only noticed because a hardware fault caused three USB devices to be automatically assigned/deassigned every few seconds, typically for a period of 10 minutes before the problem occurs. So while the bug may have been largely fixed, there might still be a less likely cause floating around in the kernel source. I'm not expecting a quick fix for this, just documenting that the problem has been seen to still occur, albeit requiring a lot more effort to cause it. Is this a new bug, or should this be reopened?

Further info for our situation: These are IBM model 8843 blades, and we are using the usb-handoff "workaround" to prevent lockup during boot. Today we had one node experience two separate spurts of 143 disconnects and 262 disconnects in minutes. The second sequence caused a panic. On another node we had a sequence of 271 USB connects/disconnects cause a panic. In all cases where a panic occurs, the last thing logged is the group of connects. I can only assume this means that the disconnects are the final straw. Unfortunately, as we are having the hardware issue fixed, this is going to be difficult to replicate without risking the sanity of our production environment...

I managed to capture the last bit of a kernel panic this morning...
here 'tis:

Process modprobe (pid: 29354, stackpage=e1677000)
Stack: e429d200 f7502200 00000001 f5d6f000 f898db08 f0d62800 00000000 f89a0708
       00000002 f898d150 f5d6f000 f8c24180 f898d0e6 f8999ce4 f8c20ec2 c0387f84
       f8c162a9 f8c24180 c012afc6 c04824a4 00000001 f8c12000 0000004d f8c2170c
Call Trace:
[<f898db08>] usb_check_support [usbcore] 0x68 (0xe1677ed4)
[<f89a0708>] usb_bus_list [usbcore] 0x0 (0xe1677ee0)
[<f898d150>] usb_scan_devices_Rsmp_ca4f6301 [usbcore] 0x40 (0xe1677ee8)
[<f8c24180>] usb_storage_driver [usb-storage] 0x0 (0xe1677ef0)
[<f898d0e6>] usb_register_Rsmp_6654b6ff [usbcore] 0x86 (0xe1677ef4)
[<f8999ce4>] .rodata.str1.4 [usbcore] 0x0 (0xe1677ef8)
[<f8c20ec2>] .rodata.str1.1 [usb-storage] 0x723 (0xe1677efc)
[<f8c162a9>] usb_stor_init [usb-storage] 0x59 (0xe1677f04)
[<f8c24180>] usb_storage_driver [usb-storage] 0x0 (0xe1677f08)
[<c012afc6>] sys_init_module [kernel] 0x5b6 (0xe1677f0c)
[<f8c2170c>] .kmodtab [usb-storage] 0x0 (0xe1677f20)
[<f8c12060>] host_info [usb-storage] 0x0 (0xe1677f2c)
[<f8c21724>] __ksymtab [usb-storage] 0x0 (0xe1677f30)
[<f8c12060>] host_info [usb-storage] 0x0 (0xe1677f58)
[<c02af06f>] no_timing [kernel] 0x7 (0xe1677fc0)
Code: 80 78 04 00 75 07 83 c4 08 5b 5e c3 90 89 5c 24 04 43 89 34
Kernel panic: Fatal exception

And this was in the messages just before the panic occurred...

May 16 22:41:56 hn /etc/hotplug/usb.agent: Setup usbcore for USB product 4b4/5204/1
May 16 22:41:56 hn /etc/hotplug/usb.agent: Setup usbcore for USB product 4b4/5204/1
May 16 22:41:56 hn devlabel: devlabel's temporary ignore list /etc/sysconfig/devlabel.d/ignore_list has been emptied due to a change in device configuration.
May 16 22:41:56 hn devlabel: devlabel service started/restarted
May 16 22:41:56 hn /etc/hotplug/usb.agent: Setup hid usb-storage for USB product 4b3/4004/1
May 16 22:41:56 hn last message repeated 2 times
May 16 22:41:56 hn kernel: Initializing USB Mass Storage driver...
May 16 22:41:56 hn kernel: usb.c: registered new driver usb-storage
May 16 22:41:56 hn kernel: scsi3 : SCSI emulation for USB Mass Storage devices
May 16 22:41:57 hn /etc/hotplug/usb.agent: Setup hid for USB product 4b3/4004/1
May 16 22:41:57 hn /etc/hotplug/usb.agent: Setup hid for USB product 4b3/4004/1
May 16 22:41:57 hn /etc/hotplug/usb.agent: Setup keybdev mousedev for USB product 4b3/4004/1
May 16 22:41:57 hn /etc/hotplug/usb.agent: Setup keybdev mousedev for USB product 4b3/4004/1
May 16 22:41:57 hn devlabel: devlabel's temporary ignore list /etc/sysconfig/devlabel.d/ignore_list has been emptied due to a change in device configuration.
May 16 22:41:57 hn devlabel: devlabel's temporary ignore list /etc/sysconfig/devlabel.d/ignore_list has been emptied due to a change in device configuration.
May 16 22:41:57 hn devlabel: devlabel service started/restarted
May 16 22:41:57 hn devlabel: devlabel service started/restarted
May 16 22:42:12 hn kernel: usb-storage: Refusing to reset a multi-interface device
May 16 22:42:16 hn kernel: usb.c: USB disconnect on device 00:1d.0-2 address 8
May 16 22:42:16 hn kernel: usb.c: USB disconnect on device 00:1d.0-2.1 address 9
May 16 22:42:16 hn kernel: usb.c: USB disconnect on device 00:1d.0-2.3 address 10
May 16 22:42:16 hn kernel: inserting floppy driver for 2.4.21-40.ELsmp
May 16 22:42:16 hn kernel: Floppy drive(s): fd0 is 1.44M