| Summary: | [RHEL6.1] PPC64 - Oops: Kernel access of bad area, sig: 11 [#1] | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> | ||||
| Component: | kernel | Assignee: | Don Zickus <dzickus> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.1 | CC: | arozansk, dzickus, jstancek, pbunyan, sbest | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | ppc64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-03-24 13:50:01 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 676037 | ||||||
| Attachments: |
Created attachment 477542
console log for failed system
Moving the target to RC.

I had trouble reproducing this issue, and the console log looked strange because a USB device seemed to have been added during scrashme. What Jeff and I discovered is that each IBM blade has a button that tells the console controller to switch the USB CDROM to that particular blade. It seems that while the machine was running a test, someone accidentally pressed the button, temporarily giving the CDROM to this blade. Minutes later someone realized their error and pushed the button on the correct blade, thus disconnecting the CDROM from the blade in question.

Jeff and I tried multiple times to reproduce the panic using this scenario. While we were able to duplicate the error messages exactly as seen in the console log, we could not get the panic to happen. We tried various intervals between button presses, from 5 seconds to 30 seconds. Nothing.

Therefore, I don't find this to be a beta blocker at all, as it seems to be a strange race condition. I'll continue investigating to find where the race can happen and whether upstream has already fixed it.

Cheers,
Don

Discussed this briefly with upstream; we couldn't figure out the race condition under which this could happen.

I haven't been able to reproduce this, so it is very difficult to debug. Closing for now unless someone sees it again.
Description of problem:
While running the scrashme test the system Oops'd

Version-Release number of selected component (if applicable):
2.6.32-114.el6.ppc64.debug

How reproducible:
Unknown

Actual results:
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc00000000043ad74
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: sr_mod cdrom ums_cypress usb_storage aes_generic ts_kmp nls_koi8_u nls_cp932 sunrpc ipv6 dm_mirror dm_region_hash dm_log ses enclosure sg ehea ext4 jbd2 mbcache sd_mod crc_t10dif ipr radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core power_supply dm_mod [last unloaded: rmd128]
NIP: c00000000043ad74 LR: c00000000043ad64 CTR: c00000000045cf20
REGS: c0000000e31eb4d0 TRAP: 0300 Not tainted (2.6.32-114.el6.ppc64.debug)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24004024 XER: 00000008
DAR: 0000000000000040, DSISR: 0000000040000000
TASK = c0000000e31da3d0[44] 'khubd' THREAD: c0000000e31e8000 CPU: 2
GPR00: c00000000043ad64 c0000000e31eb750 c0000000014b8c30 0000000000000001
GPR04: c0000000de44fb00 fffffffffffffffe 0000000000000000 0000000000000002
GPR08: 0000000000000001 0000000000000000 c00000000043ad38 0000000000000000
GPR12: 0000000024004022 c000000001592a00 c00000000414f398 c0000000013e5a48
GPR16: c00000000405dbc0 c0000000e50fd080 c00000000414f3a0 0000000000000002
GPR20: 0000000000000001 c00000000414f3a8 c00000000414f3a0 0000000000000000
GPR24: c0000000e50fd080 c0000000de442fa0 c0000000de44fb14 0000000000000001
GPR28: fffffffffffffffe c0000000de44fb00 c00000000145f658 c0000000013e5ca8
NIP [c00000000043ad74] .usb_hcd_unlink_urb+0x74/0x170
LR [c00000000043ad64] .usb_hcd_unlink_urb+0x64/0x170
Call Trace:
[c0000000e31eb750] [c00000000043ad64] .usb_hcd_unlink_urb+0x64/0x170 (unreliable)
[c0000000e31eb7f0] [c00000000043cdec] .usb_kill_urb+0x8c/0x140
[c0000000e31eb8c0] [c00000000043abe0] .usb_hcd_flush_endpoint+0x120/0x240
[c0000000e31eb970] [c00000000043dfc8] .usb_disable_endpoint+0x68/0xc0
[c0000000e31eba00] [c00000000043e0a0] .usb_disable_device+0x80/0x290
[c0000000e31ebab0] [c000000000435660] .usb_disconnect+0x110/0x250
[c0000000e31ebb60] [c000000000435630] .usb_disconnect+0xe0/0x250
[c0000000e31ebc10] [c000000000435630] .usb_disconnect+0xe0/0x250
[c0000000e31ebcc0] [c000000000437524] .hub_thread+0x6b4/0x1850
[c0000000e31ebea0] [c0000000000bcfcc] .kthread+0xbc/0xd0
[c0000000e31ebf90] [c000000000033174] .kernel_thread+0x54/0x70
Instruction dump:
2f800000 409d0078 e87d0048 4bff52d1 60000000 7fe3fb78 7f64db78 48189521
60000000 e93d0048 7f85e378 7fa4eb78 <e8690040> 4bfffbb9 7c7f1b78 e87d0048
Kernel panic - not syncing: Fatal exception
Call Trace:
[c0000000e31eb1f0] [c000000000013844] .show_stack+0x74/0x1c0 (unreliable)
[c0000000e31eb2a0] [c0000000005ca4ac] .panic+0x80/0x1c0
[c0000000e31eb330] [c000000000030c1c] .die+0x21c/0x2a0
[c0000000e31eb3e0] [c000000000043328] .bad_page_fault+0x98/0xe0
[c0000000e31eb460] [c00000000000525c] handle_page_fault+0x3c/0x74
--- Exception: 300 at .usb_hcd_unlink_urb+0x74/0x170
    LR = .usb_hcd_unlink_urb+0x64/0x170
[c0000000e31eb7f0] [c00000000043cdec] .usb_kill_urb+0x8c/0x140
[c0000000e31eb8c0] [c00000000043abe0] .usb_hcd_flush_endpoint+0x120/0x240
[c0000000e31eb970] [c00000000043dfc8] .usb_disable_endpoint+0x68/0xc0
[c0000000e31eba00] [c00000000043e0a0] .usb_disable_device+0x80/0x290
[c0000000e31ebab0] [c000000000435660] .usb_disconnect+0x110/0x250
[c0000000e31ebb60] [c000000000435630] .usb_disconnect+0xe0/0x250
[c0000000e31ebc10] [c000000000435630] .usb_disconnect+0xe0/0x250
[c0000000e31ebcc0] [c000000000437524] .hub_thread+0x6b4/0x1850
[c0000000e31ebea0] [c0000000000bcfcc] .kthread+0xbc/0xd0
[c0000000e31ebf90] [c000000000033174] .kernel_thread+0x54/0x70
panic occurred, switching back to text console
Rebooting in 180 seconds..
[-- MARK -- Mon Feb 7 13:30:00 2011]

Expected results:

Additional info:
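As a cross-check on the trace above, the bracketed word in the instruction dump (e8690040) is the instruction at the faulting NIP. The following is a minimal sketch, assuming the standard Power ISA DS-form encoding for primary opcode 58, that decodes that word and recomputes the address it tried to read from the GPR09 value in the register dump. The helper names decode_ds_load and effective_address are hypothetical and exist only for this illustration.

```python
# Sketch: decode the faulting word from the oops "Instruction dump" and
# recompute the data address. Helper names are hypothetical, not kernel APIs.

MASK64 = (1 << 64) - 1
DS_FORM_LOADS = {0: "ld", 1: "ldu", 2: "lwa"}  # primary opcode 58, chosen by XO


def decode_ds_load(word):
    """Decode a 32-bit PowerPC DS-form load (primary opcode 58)."""
    opcode = (word >> 26) & 0x3F
    xo = word & 0x3
    if opcode != 58 or xo not in DS_FORM_LOADS:
        raise ValueError(f"not a DS-form load: {word:#010x}")
    rt = (word >> 21) & 0x1F      # target register
    ra = (word >> 16) & 0x1F      # base register
    ds = (word >> 2) & 0x3FFF     # 14-bit signed field, in units of 4 bytes
    if ds & 0x2000:               # sign-extend
        ds -= 0x4000
    return {"mnemonic": DS_FORM_LOADS[xo], "rt": rt, "ra": ra, "disp": ds << 2}


def effective_address(insn, gprs):
    """EA = (RA|0) + displacement, truncated to 64 bits."""
    base = 0 if insn["ra"] == 0 else gprs[insn["ra"]]
    return (base + insn["disp"]) & MASK64


# Faulting word from the dump; GPR09 is 0000000000000000 in the register dump.
faulting = decode_ds_load(0xE8690040)
ea = effective_address(faulting, {9: 0x0})

text = "{mnemonic} r{rt}, {disp}(r{ra})".format(**faulting)
print(text)      # ld r3, 64(r9)
print(hex(ea))   # 0x40
```

The decoded load reads 0x40 bytes past the pointer held in r9, and r9 is zero in the register dump, so the computed address matches the DAR of 0x40 reported in the oops. That is consistent with a NULL pointer being dereferenced inside usb_hcd_unlink_urb during the disconnect; which structure field sits at that offset is not established here.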