Bug 586967
Summary: | RHEL6: x86 32-bit, nmi_watchdog_default() is __init, but called on resume | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Prarit Bhargava <prarit>
Component: | kernel | Assignee: | Prarit Bhargava <prarit>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Jan Tluka <jtluka>
Severity: | medium | Docs Contact: |
Priority: | low | |
Version: | 6.0 | CC: | airlied, amarecek, azelinka, bnagendr, emcnabb, frank.arnold, jbroman, joshkayse, jturner, lee, lkundrak, mishu, notting, ptekwork, rvokal, shuang, ypu
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | i686 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2010-11-11 16:15:45 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Prarit Bhargava
2010-04-28 15:10:42 UTC
Bhavna alerted us to this -- good catch Bhavna!

The issue is that in -19.el6 only the x86_64 case was changed to __cpuinit. All others are left as __init which is causing trouble during CPU hotplug on 32-bit. I'll check the common code to make sure that there aren't any other __init/__cpuinit pitfalls before submitting to RHKL.

P.

Created attachment 409879
Initial RHEL6 fix from AMD

Thanks for the patch Frank.

P.

Patch looks good and CONFIG_DEBUG_SECTION_MISMATCH=y didn't show anything else related to this code path. Will post to RHKL shortly.

P.

Created attachment 409914
RHEL6 fix for this issue
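
The section-mismatch check mentioned in the comment above can be repeated against a saved build log. What follows is only a minimal sketch, assuming the kernel was built with CONFIG_DEBUG_SECTION_MISMATCH=y and the build output was captured to a file. The function name `find_section_mismatches` and the log file are hypothetical, and because the exact warning wording can differ between kernel versions, the sketch matches only the phrase "Section mismatch".

```shell
# Hypothetical helper: list section-mismatch warnings found in a saved
# kernel build log, optionally narrowed to lines naming one symbol.
# Usage: find_section_mismatches BUILD_LOG [SYMBOL]
find_section_mismatches() {
    logfile="$1"
    symbol="${2:-}"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    if [ -n "$symbol" ]; then
        # Exit status is non-zero when no warning mentions the symbol.
        grep -F 'Section mismatch' "$logfile" | grep -F -- "$symbol"
    else
        grep -F 'Section mismatch' "$logfile"
    fi
}
```

For example, `find_section_mismatches build.log nmi_watchdog_default` would print any warning that mentions the function discussed in this bug, and print nothing if the log is clean for that symbol.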
*** Bug 585003 has been marked as a duplicate of this bug. ***

*** Bug 581749 has been marked as a duplicate of this bug. ***

*** Bug 582129 has been marked as a duplicate of this bug. ***

*** Bug 585766 has been marked as a duplicate of this bug. ***

*** Bug 586164 has been marked as a duplicate of this bug. ***

*** Bug 586776 has been marked as a duplicate of this bug. ***

*** Bug 586830 has been marked as a duplicate of this bug. ***

Would this prevent the brightness controller from working after a resume?

(In reply to comment #12)
> Would this prevent the brightness controller from working after a resume?

No brightness controller involved here. With this bug 32-bit shouldn't resume at all. The issue was introduced with 2.6.32-17.el6, x86_64 was fixed with 2.6.32-18.el6, and the fix for all other cases is still pending (persists with 2.6.32-22.el6).

Easy way to trigger this issue:

$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 1 > /sys/devices/system/cpu/cpu1/online   <-- box should hang here

I triggered the kernel panic following the echo instructions and while it does kernel panic the system does not hang. I am using 2.6.32-19.el6 and my bug was marked as a duplicate of this. The brightness controller does continue to work despite the kernel panic. Should I ask for my bug to be re-opened?

Thanks,
-josh

(In reply to comment #14)
> I triggered the kernel panic following the echo instructions and while it does
> kernel panic the system does not hang. I am using 2.6.32-19.el6 and my bug was
> marked as a duplicate of this. The brightness controller does continue to work
> despite the kernel panic. Should I ask for my bug to be re-opened?
>
> Thanks,
> -josh

I forgot to mention that when I tested the echo commands I got:

# echo 1 > /sys/devices/system/cpu/cpu1/online
-bash: echo: write error: Invalid argument

(In reply to comment #14)
> Should I ask for my bug to be re-opened?

No. Your trace looks like the ones in the other duplicates.
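
The two-step trigger quoted earlier in this thread (take cpu1 offline, then bring it back online) can be wrapped in a small function. This is a sketch rather than a supported test tool: the function name `cycle_cpu` and the `SYSFS_CPU_ROOT` override are hypothetical additions, the latter only so the logic can be pointed at a scratch directory. On a real system it needs root, and on an affected 32-bit kernel the second write is expected to oops or hang the machine, so it belongs on a test box only.

```shell
# Hypothetical reproducer: take one CPU offline and bring it back online.
# SYSFS_CPU_ROOT is an assumption added for dry runs; by default the sysfs
# path from the report is used.
# Usage: cycle_cpu [CPU_NUMBER]   (defaults to 1, as in the report)
cycle_cpu() {
    cpu="${1:-1}"
    ctl="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}/cpu${cpu}/online"
    if [ ! -w "$ctl" ]; then
        echo "no writable hotplug control at $ctl" >&2
        return 2
    fi
    echo 0 > "$ctl" || return 1
    # On an affected kernel this write is where the failure shows up,
    # either as a hang, an oops, or a "write error" from the shell.
    echo 1 > "$ctl" || return 1
    # Print the final state so the caller can confirm the CPU is back.
    cat "$ctl"
}
```

Running `cycle_cpu 1` and then inspecting `dmesg` mirrors the manual steps; a printed `1` with a clean log suggests the CPU came back without tripping the bug.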
(In reply to comment #15)
> (In reply to comment #14)
> > I triggered the kernel panic following the echo instructions and while it does
> > kernel panic the system does not hang. I am using 2.6.32-19.el6 and my bug was
> > marked as a duplicate of this. The brightness controller does continue to work
> > despite the kernel panic. Should I ask for my bug to be re-opened?
> >
> > Thanks,
> > -josh
>
> I forgot to mention that when I tested the echo commands I got:
>
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> -bash: echo: write error: Invalid argument

Josh,

I see two distinct failures. The first is an actual *oops* which hangs the system, and the second is a BUG warning due to scheduling while atomic, which sometimes allows the system to continue executing. I have traced both of these failures to this BZ.

The *critical* portion of the BUG warning or the oops is these three lines:

[<c0a49479>] ? nmi_cpu_busy+0x0/0x17
[<c080203e>] ? end_local_APIC_setup+0xd3/0xea
[<c08018ca>] ? start_secondary+0x102/0x24e

end_local_APIC_setup() does NOT call nmi_cpu_busy(). That is the unwinder going a bit crazy trying to determine what function has been called. end_local_APIC_setup() has actually called nmi_watchdog_default() which is __init and is not in the function table.

P.

To add some testing data to Prarit's explanations: I tried it the suspend/resume way on one of our boxes, which still had the needed bits installed anyway.

1. With a kernel based on 2.6.32-19.el6, including the attached patch
   * Did an `echo mem > /sys/power/state`
   * Let the box resume
   * Looked at the output of dmesg: No failures.

2. With a plain 2.6.32-19.el6
   * Did an `echo mem > /sys/power/state`
   * Let the box resume again
   * Resulted in a lot of trouble, including the following trace:

Kernel panic - not syncing: Fatal exception
Pid: 0, comm: swapper Tainted: G D 2.6.32-19.el6.i686 #1
Call Trace:
[<c08055d5>] ? panic+0x42/0xed
[<c0808bfc>] ? oops_end+0xbc/0xd0
[<c080831e>] ? do_int3+0x6e/0x90
[<c0808184>] ? int3+0x30/0x38
[<c0a49479>] ? nmi_cpu_busy+0x0/0x17
[<c080203e>] ? end_local_APIC_setup+0xd3/0xea
[<c08018ca>] ? start_secondary+0x102/0x24e

That's a nice panic :) end_local_APIC_setup(), as mentioned, does not call nmi_cpu_busy() and is actually calling nmi_watchdog_default() which resolves to int3 (0xcc).

P.

Patch(es) available on kernel-2.6.32-24.el6

*** Bug 588663 has been marked as a duplicate of this bug. ***

*** Bug 587509 has been marked as a duplicate of this bug. ***

*** Bug 590408 has been marked as a duplicate of this bug. ***

*** Bug 591138 has been marked as a duplicate of this bug. ***

*** Bug 592348 has been marked as a duplicate of this bug. ***

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
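
For anyone triaging a possible duplicate, the three call-trace entries singled out in the comments above can be looked for in a saved kernel log. This is a rough sketch, assuming the log has been written to a file (for example from `dmesg`). The function name `has_nmi_init_signature` is hypothetical, and it matches symbol names only, since offsets and sizes vary between builds. A match is a hint that a trace resembles this bug, not proof.

```shell
# Hypothetical check: does a saved kernel log contain all three call-trace
# symbols that this report identifies as the signature of the bug?
# Usage: has_nmi_init_signature LOGFILE
has_nmi_init_signature() {
    logfile="$1"
    for sym in 'nmi_cpu_busy+' 'end_local_APIC_setup+' 'start_secondary+'; do
        # Fixed-string match on the symbol name; fail on the first miss.
        grep -qF -- "$sym" "$logfile" || return 1
    done
    return 0
}
```

A typical use would be `dmesg > dmesg.txt` followed by `has_nmi_init_signature dmesg.txt && echo "trace resembles bug 586967"`.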