Bug 453677
| Summary: | nmi_watchdog=2 crashes the RT kernel on boot up | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Steven Rostedt <srostedt> | ||||||
| Component: | realtime-kernel | Assignee: | Clark Williams <williams> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | |||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 1.0 | CC: | anderson, bhu, davids, jcm, pzijlstr | ||||||
| Target Milestone: | 1.0.1 | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2008-08-26 19:57:18 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Steven Rostedt
2008-07-01 20:37:03 UTC
we dropped that patch back in the 2.6.24-31 kernel. I'm thinking its something in the IOAPIC munging we do. Looking at that now and adding jcm to the CC... Created attachment 310832 [details]
boot log of dual Opteron running GA (2.6.24.7-65.el5rt) kernel
Note that this system does not claim that the box is UP (like rostedt's), so
it's not that the kernel is mis-understanding the number of cores or type of
system
Ok, I'm wrong; I missed the "SMP alternatives: switching to UP code" in the log bisected this down to two patches: hotplug-page_alloc.patch hotplug-smp-bootstrap.patch The panic originally reported is actually caused by hotplug-smp-bootstrap.patch; the other patch by itself causes a dual Opteron system to lock-up booting the second processor (which is later than the panic site above). Still working on what nmi_watchdog=2 does to cause problems with the code modified by hotplug-smp-bootstrap.patch. works fine on 32-bit kernel (2.6.24.7-70) Very odd. Both stack traces (Steven's original) and mine, point to the function
__delay() in arch/x86/lib/delay_64.c at the following line of code:
if (unlikely(cpu != smp_processor_id())) {
If I decoded the macro expansion correctly for a 64-bit SMP kernel,
smp_processor_id() expands to raw_smp_processor_id() which turns into
read_pda(cpunumber). If it is in fact this asm function that is failing, my
guess is that something has hosed the per-cpu data. Think I'll go poke at that a
bit...
FYI, here's what I saw on a 2.6.24.7-72.el5rt kernel booting with
nmi_watchdog=2, where it died in the same location as yours in
__delay+0x2f, although the actual RIP is slightly different in my
kernel than yours:
SMP alternatives: switching to UP code
ACPI: Core revision 20070126
Unable to handle kernel paging request at fffffffffffffff8 RIP:
[<ffffffff8113c349>] __delay+0x2f/0xb6
PGD 203067 PUD 204067 PMD 0
Oops: 0002 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24.7-72.el5rt #1
RIP: 0010:[<ffffffff8113c349>] [<ffffffff8113c349>] __delay+0x2f/0xb6
RSP: 0000:ffff81003fa13d90 EFLAGS: 00010246
RAX: 00000000714d6290 RBX: 00000000002aa61d RCX: 0000000000000000
RDX: 0000000000000020 RSI: 0000000000000046 RDI: 00000000002aa61d
RBP: ffff81003fa13db0 R08: 0000000000000000 R09: 000000000000000f
R10: 000000000000000f R11: 0000000000000000 R12: 0000000000176dda
R13: 0000000000000000 R14: 000000007135f41c R15: 00000000ffffffff
FS: 0000000000000000(0000) GS:ffffffff813ef100(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffffffffffff8 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81003fa12000, task ffff81003fa11480)
Stack: 0000000000000082 0000000000000008 00000000fffb6e69 00000000ffffffff
ffff81003fa13dc0 ffffffff8113c410 ffff81003fa13df0 ffffffff814afd09
ffff81003fa13df0 0000000000000002 0000000000000000 000000000000d800
Call Trace:
[<ffffffff8113c410>] __const_udelay+0x40/0x42
[<ffffffff814afd09>] timer_irq_works+0x2a/0x4f
[<ffffffff814b010e>] setup_IO_APIC+0x3e0/0x8ae
[<ffffffff8101b628>] ? setup_p4_watchdog+0xa8/0xde
[<ffffffff8101b2f4>] ? lapic_watchdog_init+0xe3/0x117
[<ffffffff814aeb2a>] smp_prepare_cpus+0x454/0x48e
[<ffffffff814a4565>] kernel_init+0x69/0x34e
[<ffffffff81036730>] ? finish_task_switch+0x32/0xdc
[<ffffffff8100d048>] child_rip+0xa/0x12
[<ffffffff81168db4>] ? acpi_ds_init_one_object+0x0/0x88
[<ffffffff814a44fc>] ? kernel_init+0x0/0x34e
[<ffffffff8100d03e>] ? child_rip+0x0/0x12
But, what I don't understand, is that the disassembly of __delay()
shows that the exception occurred on a register move (which makes
no sense unless the RIP itself got corrupted?), but __delay+0x2f
is beyond the smp_processor_id() that you reference:
crash> dis __delay -l
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
30
0xffffffff8113c31a <__delay>: push %rbp
include/asm/thread_info_64.h: 63
0xffffffff8113c31b <__delay+0x1>: mov %gs:0x10,%rax
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
30
0xffffffff8113c324 <__delay+0xa>: mov %rsp,%rbp
0xffffffff8113c327 <__delay+0xd>: push %r14
0xffffffff8113c329 <__delay+0xf>: push %r13
0xffffffff8113c32b <__delay+0x11>: push %r12
0xffffffff8113c32d <__delay+0x13>: push %rbx
0xffffffff8113c32e <__delay+0x14>: mov %rdi,%rbx
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
34
0xffffffff8113c331 <__delay+0x17>: incl 0xffffffffffffe044(%rax)
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
35
0xffffffff8113c337 <__delay+0x1d>: mov %gs:0x24,%ecx
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
36
0xffffffff8113c33f <__delay+0x25>: rdtsc
0xffffffff8113c341 <__delay+0x27>: mov %eax,%r14d
0xffffffff8113c344 <__delay+0x2a>: mov %ecx,%r13d
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
38
0xffffffff8113c347 <__delay+0x2d>: rdtsc
/usr/src/debug/kernel-rt-2.6.24.7-72.el5rt/linux-2.6.24.7.x86_64/arch/x86/lib/delay_64.c:
39
0xffffffff8113c349 <__delay+0x2f>: mov %eax,%r12d
0xffffffff8113c34c <__delay+0x32>: sub %r14d,%r12d
0xffffffff8113c34f <__delay+0x35>: cmp %rbx,%r12
0xffffffff8113c352 <__delay+0x38>: jae 0xffffffff8113c3a0 <__delay+0x86>
And here's __delay(), where it appears to be dying at
the instruction immediately following the second rdtcl():
29 void __delay(unsigned long loops)
30 {
31 unsigned bclock, now;
32 int cpu;
33
34 preempt_disable();
35 cpu = smp_processor_id();
36 rdtscl(bclock);
37 for (;;) {
38 rdtscl(now);
39 if ((now - bclock) >= loops)
40 break;
Out of curiousity, how did you pinpoint the exception in
smp_processor_id()?
Dave
I screwed up and looked at the wrong 'if' statement. I went back and caused another panic, ran it through addr2line and got the same line number as you (delay_64.c:39). I probably skipped down while I was looking at the function and decided that function call was problematic, since it's hard to see where there's an opportunity to go wrong in the first if statement. Sigh... I'm not very familiar with x86_64 assembly; is there a memory reference in that mov/sub/cmp block at line 39? They just look like straight register operations to me. I'd say it's almost like an FP got zeroed (maybe something going wrong in a return from interrupt?) and a register+offset local variable reference caused a fault at fffffffffffffff8. I just don't see where that could happen from the assembly. OK, good, so we're equally stumped. From the appearance of things, it looks like the exception occurred during the "mov %eax,%r12d", which doesn't make sense. I was also thinking maybe an interrupt occurred and things weren't restored correctly, but that's just a wild guess on my part, and even then I don't know how it would lead to the page fault seen here. The two nearest memory references are the preceding preempt_enable() and smp_processor_id() calls, well before the exception address, but I still don't see how the "0-minus-8" address (fffffffffffffff8) gets formulated? The GS register does have the correct pointer at exception time: crash> sym ffffffff813ef100 ffffffff813ef100 (D) boot_cpu_pda crash> So I have no explanation, nor have I ever seen such an anomoly like this before. Has to be something to do with the IO-APIC manipulations from the patches in comment #4. I'll go back and look at the hotplug-smp-bootstrap.patch, where I think this manfests. It's a rework of setting up exception stacks during kernel boot. Created attachment 312496 [details]
patch to allocate exception stacks
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0585.html |