From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
On very rare occasions, an x86_64 kernel Oopses during boot. It appears that the blank_screen_t timer function calls into a work queue function that references an unallocated spinlock, and the spinlock code then triggers the Oops. This has been seen on kernels 2.6.9-22, 2.6.9-28, and 2.6.9-29. Oops output is included in this report.

Version-Release number of selected component (if applicable):
kernel-2.6.9-22 and others

How reproducible:
Sometimes

Steps to Reproduce:
1. Boot the machine
2. Keep trying.

Actual Results:
Roughly once in several hundred boots, a kernel Oops occurs during boot.

Expected Results:
No oops.

Additional info:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.9-29.ELmm_nopent
RIP: 0010:[<ffffffff803062a7>] <ffffffff803062a7>{_spin_lock_irqsave+40}
RSP: 0000:ffffffff8044ee58 EFLAGS: 00010096
RAX: 0000000000000016 RBX: 0000000000000000 RCX: 0000000000020000
RDX: 0000000000001ad9 RSI: 0000000000000046 RDI: ffffffff803dbc20
RBP: 0000010009569ae0 R08: 00000000fffffffe R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff803f67c0
R13: ffffffff8044eea8 R14: 0000000000000206 R15: 0000000000000008
FS: 0000000000000000(0000) GS:ffffffff804d9900(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo 000001000ee70000, task 000001009ff267f0)
Stack: ffffffff803f67c0 0000000000000246 0000000000000000 ffffffff80146e28
       0000000000000000 ffffffff80233931 ffffffff8044eea8 ffffffff80146ec6
       ffffffff80496360 ffffffff8013fa41
Call Trace:<IRQ> <ffffffff80146e28>{__queue_work+16} <ffffffff80233931>{blank_screen_t+0}
       <ffffffff80146ec6>{queue_work+80} <ffffffff8013fa41>{run_timer_softirq+356}
       <ffffffff8013c0fc>{__do_softirq+88} <ffffffff8013c1a5>{do_softirq+49}
       <ffffffff80110ac5>{apic_timer_interrupt+133} <EOI> <ffffffff80241327>{serial_in+83}
       <ffffffff80243495>{serial8250_console_write+236} <ffffffff80137944>{__call_console_drivers+68}
       <ffffffff80137bb1>{release_console_sem+276} <ffffffff80137e3c>{vprintk+498}
       <ffffffff80137ee6>{printk+141} <ffffffff801f6e67>{vgacon_cursor+0}
       <ffffffff801f6e67>{vgacon_cursor+0} <ffffffff801f6e67>{vgacon_cursor+0}
       <ffffffff804e8fc5>{setup_boot_APIC_clock+73} <ffffffff804e88fc>{smp_prepare_cpus+3039}
       <ffffffff8018a2b1>{__pollwait+0} <ffffffff8018aa96>{sys_select+820}
       <ffffffff8010c3a2>{init+70} <ffffffff80110e17>{child_rip+8}
       <ffffffff8010c35c>{init+0} <ffffffff80110e0f>{child_rip+0}

Code: 0f 0b fd fe 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1d 03 00
RIP <ffffffff803062a7>{_spin_lock_irqsave+40} RSP <ffffffff8044ee58>
 <0>Kernel panic - not syncing: Oops
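For readers following the call trace, here is a minimal user-space sketch of the ordering problem the description suspects: a timer callback queues work before the work queue's lock has been set up. It is not kernel code. Every name in it (model_lock, model_workqueue, model_queue_work, model_blank_timer_fires, MODEL_LOCK_MAGIC) is hypothetical, and the idea that the lock check compares a magic value is an assumption made for illustration, not something this report confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker for "this lock has been initialised". */
#define MODEL_LOCK_MAGIC 0x5A5A1234u

struct model_lock {
    unsigned int magic; /* equals MODEL_LOCK_MAGIC only after init */
    bool held;
};

struct model_workqueue {
    struct model_lock lock;
    int pending; /* number of queued work items */
};

/* Stands in for work queue setup, which must run before any queueing. */
static void model_workqueue_init(struct model_workqueue *wq)
{
    assert(wq != NULL);
    wq->lock.magic = MODEL_LOCK_MAGIC;
    wq->lock.held = false;
    wq->pending = 0;
}

/* Returns false where a checking lock implementation would refuse to proceed. */
static bool model_lock_acquire(struct model_lock *lock)
{
    if (lock->magic != MODEL_LOCK_MAGIC)
        return false; /* lock was never initialised */
    lock->held = true;
    return true;
}

static void model_lock_release(struct model_lock *lock)
{
    lock->held = false;
}

/* Stands in for queue_work(): takes the queue lock, then adds one item. */
static bool model_queue_work(struct model_workqueue *wq)
{
    if (!model_lock_acquire(&wq->lock))
        return false;
    wq->pending++;
    model_lock_release(&wq->lock);
    return true;
}

/* Stands in for the blanking timer callback, which only queues work. */
static bool model_blank_timer_fires(struct model_workqueue *wq)
{
    return model_queue_work(wq);
}
```

On a zero-filled model_workqueue, model_blank_timer_fires() returns false because the lock was never initialised. After model_workqueue_init() the same call succeeds and leaves one pending item. If the real failure follows this pattern, the rarity of the Oops would come down to whether the timer happens to expire inside that early-boot window.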
Do you have a trace from a Red Hat kernel?
Sorry I missed your question somehow. Here is a trace from an older RH kernel, but we are still seeing this even as recently as 2.6.9-34. And now that more Stratus developers are using the 64-bit platform, we are seeing this bug much more frequently. Some are saying it happens almost every other boot. Anyway, here's the trace:

01-27 09:44:46 ----------- [cut here ] --------- [please bite here ] ---------
01-27 09:44:46 Kernel BUG at spinlock:118
01-27 09:44:46 invalid operand: 0000 [1] SMP
01-27 09:44:46 CPU 0
01-27 09:44:46 Modules linked in:
01-27 09:44:46 Pid: 1, comm: swapper Not tainted 2.6.9-22.ELsmp
01-27 09:44:46 RIP: 0010:[<ffffffff80303e18>] <ffffffff80303e18>{_spin_lock_irqsave+40}
01-27 09:44:46 RSP: 0000:ffffffff8044a758 EFLAGS: 00010096
01-27 09:44:46 RAX: 0000000000000016 RBX: 0000000000000000 RCX: 0000000000020000
01-27 09:44:46 RDX: 00000000000018fd RSI: 0000000000000046 RDI: ffffffff803d7960
01-27 09:44:46 RBP: 0000010005d499e0 R08: 00000000fffffffe R09: 0000000000000000
01-27 09:44:46 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff803f2500
01-27 09:44:46 R13: ffffffff8044a7a8 R14: 0000000000000206 R15: 0000000000000008
01-27 09:44:46 FS: 0000000000000000(0000) GS:ffffffff804d3100(0000) knlGS:0000000000000000
01-27 09:44:46 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
01-27 09:44:46 CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
01-27 09:44:46 Process swapper (pid: 1, threadinfo 000001000aa50000, task 000001009ff067f0)
01-27 09:44:46 Stack: ffffffff803f2500 0000000000000246 0000000000000000 ffffffff80146204
01-27 09:44:46        0000000000000000 ffffffff80231ca9 ffffffff8044a7a8 ffffffff801462a2
01-27 09:44:46        ffffffff8048fb60 ffffffff8013ee45
01-27 09:44:46 Call Trace:<IRQ> <ffffffff80146204>{__queue_work+16} <ffffffff80231ca9>{blank_screen_t+0}
01-27 09:44:46        <ffffffff801462a2>{queue_work+80} <ffffffff8013ee45>{run_timer_softirq+356}
01-27 09:44:46        <ffffffff8013b6b8>{__do_softirq+88} <ffffffff8013b761>{do_softirq+49}
01-27 09:44:46        <ffffffff80110951>{apic_timer_interrupt+133} <EOI> <ffffffff8023f69f>{serial_in+83}
01-27 09:44:46        <ffffffff80241792>{serial8250_console_write+113} <ffffffff80136f24>{__call_console_drivers+68}
01-27 09:44:46        <ffffffff80137191>{release_console_sem+276} <ffffffff8013741c>{vprintk+498}
01-27 09:44:46        <ffffffff801374c6>{printk+141} <ffffffff801f51d3>{vgacon_cursor+0}
01-27 09:44:46        <ffffffff801f51d3>{vgacon_cursor+0} <ffffffff801f51d3>{vgacon_cursor+0}
01-27 09:44:46        <ffffffff804e2e83>{setup_boot_APIC_clock+73} <ffffffff804e27ba>{smp_prepare_cpus+3039}
01-27 09:44:46        <ffffffff80188bc5>{__pollwait+0} <ffffffff801893aa>{sys_select+820}
01-27 09:44:46        <ffffffff8010c257>{init+70} <ffffffff80110ca3>{child_rip+8}
01-27 09:44:46        <ffffffff8010c211>{init+0} <ffffffff80110c9b>{child_rip+0}
01-27 09:44:46
01-27 09:44:46
01-27 09:44:46 Code: 0f 0b 2e d6 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 17 03 00
01-27 09:44:46 RIP <ffffffff80303e18>{_spin_lock_irqsave+40} RSP <ffffffff8044a758>
01-27 09:44:46  <0>Kernel panic - not syncing: Oops
Just a quick note: This bug appears to be a slightly different manifestation of BZ 165498, which was never fully resolved.
this looks like what we need here: http://marc.theaimsgroup.com/?l=git-commits-head&m=111918275525797&w=2
It sure looks like the right thing. I sent that upstream patch to Stratus to get some run time on the -34 kernel.
great. thanks.
*** This bug has been marked as a duplicate of 184523 ***
I'm not authorized to see bugzilla 184523. Can you give us a hint about what it says? Our customers are still seeing this with the -34 kernel.
errata tool clean up, add to U4 CANFIX list for tracking purposes.