Bug 179334 - kernel boot can Oops in work queue code when console blanks
Summary: kernel boot can Oops in work queue code when console blanks
Keywords:
Status: CLOSED DUPLICATE of bug 184523
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 181409
 
Reported: 2006-01-30 15:17 UTC by Kimball Murray
Modified: 2013-08-06 01:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-14 14:56:14 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2006:0575
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4
Last Updated: 2006-08-10 04:00:00 UTC

Description Kimball Murray 2006-01-30 15:17:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
On very rare occasions, an x86_64 kernel Oopses during boot.  The blank_screen_t timer function appears to call into a work queue function that references an unallocated spinlock, and the spinlock code then triggers the Oops.  This has been seen on kernels 2.6.9-22, 2.6.9-28, and 2.6.9-29.  Oops output is included in this report.
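
To make the suspected ordering problem concrete, here is a minimal user-space sketch in C. It is not kernel code and it is not the fix discussed later in this report: every name in it (demo_wq, demo_queue_work, and so on) is hypothetical, and the magic value is arbitrary. It only models a timer callback that queues work before the work queue's lock has been set up, next to a guarded variant that pushes the timer back instead.

```c
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel APIs. */
#define DEMO_LOCK_MAGIC 0xC0FFEEu   /* arbitrary "lock is initialized" marker */

struct demo_lock {
    unsigned int magic;
};

struct demo_wq {
    struct demo_lock lock;
    int pending;        /* number of queued work items */
    int timer_rearms;   /* times the timer was pushed back */
    bool ready;         /* set once the work queue is usable */
};

/* Models work queue setup that happens at some point during boot. */
static void demo_wq_init(struct demo_wq *wq)
{
    wq->lock.magic = DEMO_LOCK_MAGIC;
    wq->ready = true;
}

/* Models queueing work; returns false where a lock sanity check would BUG(). */
static bool demo_queue_work(struct demo_wq *wq)
{
    if (wq->lock.magic != DEMO_LOCK_MAGIC)
        return false;
    wq->pending++;
    return true;
}

/* Timer callback that queues work unconditionally (the failing pattern). */
static bool demo_blank_timer_unguarded(struct demo_wq *wq)
{
    return demo_queue_work(wq);
}

/* Timer callback that defers itself until the work queue is ready. */
static bool demo_blank_timer_guarded(struct demo_wq *wq)
{
    if (!wq->ready) {
        wq->timer_rearms++;   /* try again on a later tick */
        return true;
    }
    return demo_queue_work(wq);
}
```

Calling demo_blank_timer_unguarded() on a zero-initialized struct demo_wq fails the lock check, while demo_blank_timer_guarded() counts a rearm and succeeds once demo_wq_init() has run.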

Version-Release number of selected component (if applicable):
kernel-2.6.9-22 and others

How reproducible:
Sometimes

Steps to Reproduce:
1. Boot the machine.
2. Repeat until the Oops occurs.

Actual Results:  A kernel Oops occurs during boot roughly once in several hundred attempts.

Expected Results:  No oops.

Additional info:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.9-29.ELmm_nopent
RIP: 0010:[<ffffffff803062a7>] <ffffffff803062a7>{_spin_lock_irqsave+40}
RSP: 0000:ffffffff8044ee58  EFLAGS: 00010096
RAX: 0000000000000016 RBX: 0000000000000000 RCX: 0000000000020000
RDX: 0000000000001ad9 RSI: 0000000000000046 RDI: ffffffff803dbc20
RBP: 0000010009569ae0 R08: 00000000fffffffe R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff803f67c0
R13: ffffffff8044eea8 R14: 0000000000000206 R15: 0000000000000008
FS:  0000000000000000(0000) GS:ffffffff804d9900(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo 000001000ee70000, task 000001009ff267f0)
Stack: ffffffff803f67c0 0000000000000246 0000000000000000 ffffffff80146e28
       0000000000000000 ffffffff80233931 ffffffff8044eea8 ffffffff80146ec6
       ffffffff80496360 ffffffff8013fa41
Call Trace:<IRQ> <ffffffff80146e28>{__queue_work+16} <ffffffff80233931>{blank_screen_t+0}
       <ffffffff80146ec6>{queue_work+80} <ffffffff8013fa41>{run_timer_softirq+356}
       <ffffffff8013c0fc>{__do_softirq+88} <ffffffff8013c1a5>{do_softirq+49}
       <ffffffff80110ac5>{apic_timer_interrupt+133}  <EOI> <ffffffff80241327>{serial_in+83}
       <ffffffff80243495>{serial8250_console_write+236} <ffffffff80137944>{__call_console_drivers+68}
       <ffffffff80137bb1>{release_console_sem+276} <ffffffff80137e3c>{vprintk+498}
       <ffffffff80137ee6>{printk+141} <ffffffff801f6e67>{vgacon_cursor+0}
       <ffffffff801f6e67>{vgacon_cursor+0} <ffffffff801f6e67>{vgacon_cursor+0}
       <ffffffff804e8fc5>{setup_boot_APIC_clock+73} <ffffffff804e88fc>{smp_prepare_cpus+3039}
       <ffffffff8018a2b1>{__pollwait+0} <ffffffff8018aa96>{sys_select+820}
       <ffffffff8010c3a2>{init+70} <ffffffff80110e17>{child_rip+8}
       <ffffffff8010c35c>{init+0} <ffffffff80110e0f>{child_rip+0}


Code: 0f 0b fd fe 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1d 03 00
RIP <ffffffff803062a7>{_spin_lock_irqsave+40} RSP <ffffffff8044ee58>
 <0>Kernel panic - not syncing: Oops
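
As a cross-check on the "Code:" line above, the following C sketch decodes its first twelve bytes. It assumes the packed BUG() frame layout used by x86_64 kernels of this era: a two-byte ud2 opcode (0f 0b), then an 8-byte little-endian pointer to the file name string, then a 2-byte little-endian line number. The struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded fields. */
struct bug_info {
    int is_ud2;          /* 1 if the bytes start with the ud2 opcode */
    uint64_t file_ptr;   /* address of the file name string */
    unsigned int line;   /* source line recorded by BUG() */
};

/*
 * Decode the bytes that follow the faulting RIP, under the assumed layout:
 * 0f 0b | 8-byte little-endian pointer | 2-byte little-endian line number.
 */
static struct bug_info decode_bug_frame(const uint8_t *code, size_t len)
{
    struct bug_info info = {0, 0, 0};

    if (len < 12 || code[0] != 0x0f || code[1] != 0x0b)
        return info;
    info.is_ud2 = 1;

    /* Assemble the pointer from most significant byte down to least. */
    for (int i = 7; i >= 0; i--)
        info.file_ptr = (info.file_ptr << 8) | code[2 + i];

    info.line = (unsigned int)code[10] | ((unsigned int)code[11] << 8);
    return info;
}
```

Fed the bytes 0f 0b fd fe 31 80 ff ff ff ff 76 00 from this trace, the decoder reports line 0x76, which is 118 and agrees with "Kernel BUG at spinlock:118".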

Comment 1 Jason Baron 2006-01-30 15:24:25 UTC
Do you have a trace from a Red Hat kernel?

Comment 2 Kimball Murray 2006-03-07 18:59:46 UTC
Sorry I missed your question somehow.  Here is a trace from an older RH kernel,
but we are still seeing this even as recently as 2.6.9-34.  And now that more
Stratus developers are using the 64-bit platform, we are seeing this bug much
more frequently.  Some are saying it happens almost every other boot.  Anyway,
here's the trace:

01-27 09:44:46 ----------- [cut here ] --------- [please bite here ] ---------
01-27 09:44:46 Kernel BUG at spinlock:118
01-27 09:44:46 invalid operand: 0000 [1] SMP
01-27 09:44:46 CPU 0
01-27 09:44:46 Modules linked in:
01-27 09:44:46 Pid: 1, comm: swapper Not tainted 2.6.9-22.ELsmp
01-27 09:44:46 RIP: 0010:[<ffffffff80303e18>] <ffffffff80303e18>{_spin_lock_irqsave+40}
01-27 09:44:46 RSP: 0000:ffffffff8044a758  EFLAGS: 00010096
01-27 09:44:46 RAX: 0000000000000016 RBX: 0000000000000000 RCX: 0000000000020000
01-27 09:44:46 RDX: 00000000000018fd RSI: 0000000000000046 RDI: ffffffff803d7960
01-27 09:44:46 RBP: 0000010005d499e0 R08: 00000000fffffffe R09: 0000000000000000
01-27 09:44:46 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff803f2500
01-27 09:44:46 R13: ffffffff8044a7a8 R14: 0000000000000206 R15: 0000000000000008
01-27 09:44:46 FS:  0000000000000000(0000) GS:ffffffff804d3100(0000) knlGS:0000000000000000
01-27 09:44:46 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
01-27 09:44:46 CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
01-27 09:44:46 Process swapper (pid: 1, threadinfo 000001000aa50000, task 000001009ff067f0)
01-27 09:44:46 Stack: ffffffff803f2500 0000000000000246 0000000000000000 ffffffff80146204
01-27 09:44:46        0000000000000000 ffffffff80231ca9 ffffffff8044a7a8 ffffffff801462a2
01-27 09:44:46        ffffffff8048fb60 ffffffff8013ee45
01-27 09:44:46 Call Trace:<IRQ> <ffffffff80146204>{__queue_work+16} <ffffffff80231ca9>{blank_screen_t+0}
01-27 09:44:46        <ffffffff801462a2>{queue_work+80} <ffffffff8013ee45>{run_timer_softirq+356}
01-27 09:44:46        <ffffffff8013b6b8>{__do_softirq+88} <ffffffff8013b761>{do_softirq+49}
01-27 09:44:46        <ffffffff80110951>{apic_timer_interrupt+133}  <EOI> <ffffffff8023f69f>{serial_in+83}
01-27 09:44:46        <ffffffff80241792>{serial8250_console_write+113} <ffffffff80136f24>{__call_console_drivers+68}
01-27 09:44:46        <ffffffff80137191>{release_console_sem+276} <ffffffff8013741c>{vprintk+498}
01-27 09:44:46        <ffffffff801374c6>{printk+141} <ffffffff801f51d3>{vgacon_cursor+0}
01-27 09:44:46        <ffffffff801f51d3>{vgacon_cursor+0} <ffffffff801f51d3>{vgacon_cursor+0}
01-27 09:44:46        <ffffffff804e2e83>{setup_boot_APIC_clock+73} <ffffffff804e27ba>{smp_prepare_cpus+3039}
01-27 09:44:46        <ffffffff80188bc5>{__pollwait+0} <ffffffff801893aa>{sys_select+820}
01-27 09:44:46        <ffffffff8010c257>{init+70} <ffffffff80110ca3>{child_rip+8}
01-27 09:44:46        <ffffffff8010c211>{init+0} <ffffffff80110c9b>{child_rip+0}
01-27 09:44:46
01-27 09:44:46
01-27 09:44:46 Code: 0f 0b 2e d6 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 17 03 00
01-27 09:44:46 RIP <ffffffff80303e18>{_spin_lock_irqsave+40} RSP <ffffffff8044a758>
01-27 09:44:46  <0>Kernel panic - not syncing: Oops


Comment 3 Kimball Murray 2006-03-07 19:01:40 UTC
Just a quick note:  This bug appears to be a slightly different manifestation of
BZ 165498, which was never fully resolved.

Comment 5 Jason Baron 2006-03-09 20:44:44 UTC
this looks like what we need here:

http://marc.theaimsgroup.com/?l=git-commits-head&m=111918275525797&w=2

Comment 6 Kimball Murray 2006-03-09 21:48:51 UTC
It sure looks like the right thing.  I sent that upstream patch to Stratus to
get some run time on the -34 kernel.

Comment 7 Jason Baron 2006-03-09 22:02:43 UTC
great. thanks.

Comment 10 Jason Baron 2006-03-14 14:56:14 UTC

*** This bug has been marked as a duplicate of 184523 ***

Comment 11 Dan Carpenter 2006-03-28 00:56:11 UTC
I'm not authorized to see bugzilla 184523.  Can you give us a hint about what it
says?

Our customers are still seeing this with the -34 kernel.



Comment 12 Linda Wang 2006-05-09 21:50:08 UTC
errata tool clean up, add to U4 CANFIX list for tracking purposes.

