Bug 1224764 - Kernel panic on boot - native_smp_prepare_cpus / native_apic_mem_read
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-05-25 14:57 UTC by javiermon
Modified: 2016-02-15 15:36 UTC (History)

Fixed In Version: 4.1.6-201.fc22
Doc Type: Bug Fix
Last Closed: 2016-02-15 15:36:31 UTC
Type: Bug


Attachments
Picture of kernel panic (1.02 MB, image/jpeg), 2015-05-25 14:57 UTC, javiermon
dmesg taken from working kernel 3.19.7-200 (56.98 KB, text/plain), 2015-05-25 15:00 UTC, javiermon
acpidump (182.72 KB, text/plain), 2015-06-02 12:24 UTC, javiermon
[1/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500 (903.28 KB, image/jpeg), 2015-06-06 11:58 UTC, javiermon
[2/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500 (1.07 MB, image/jpeg), 2015-06-06 11:59 UTC, javiermon
[1/2] Kernel msgs before backtrace (148.15 KB, image/jpeg), 2015-07-22 18:23 UTC, javiermon
[2/2] Kernel msgs before backtrace (139.32 KB, image/jpeg), 2015-07-22 18:24 UTC, javiermon
possible fix for apic crash (687 bytes, application/mbox), 2015-08-17 22:52 UTC, Laura Abbott
More kernel panic error msgs. (998.61 KB, image/jpeg), 2015-08-18 20:43 UTC, javiermon
dmesg from 4.1.6 patched, using param nox2apic and booting with local x2apic enabled on bios. (60.99 KB, text/plain), 2015-08-21 11:55 UTC, javiermon
Proposed fix (924 bytes, patch), 2015-08-21 21:11 UTC, Thomas Gleixner
config for patched 4.1.6 (152.53 KB, text/x-mpsub), 2015-08-22 08:34 UTC, javiermon
dmesg from 4.1.6 with patch 2 booting with local x2apic enabled on bios. (61.44 KB, text/plain), 2015-08-22 14:28 UTC, javiermon
config from 4.1.6 with patch 2. (152.48 KB, text/x-mpsub), 2015-08-22 14:29 UTC, javiermon

Description javiermon 2015-05-25 14:57:55 UTC
Created attachment 1029498 [details]
Picture of kernel panic

Description of problem:
I've upgraded from Fedora 21 to 22 (x86_64) and kernel 4.0.4 doesn't boot; it crashes with a kernel panic that includes the following:

native_apic_mem_read+0x3/0x10

Previous kernel (kernel-core-3.19.7-200.fc21.x86_64) booted fine.

Version-Release number of selected component (if applicable):
name        : kernel-core
Arch        : x86_64
Epoch       : 0
Version     : 4.0.4
Release     : 301.fc22


How reproducible:
Always.

Additional info:
I've tried booting with noefi, nox2apic and acpi_rsdp=APIC, but I get the same result.

Thanks,
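
When comparing boots with and without options such as nox2apic, it can help to confirm on a system that does boot that the option really reached the kernel. A minimal sketch, assuming a POSIX shell; has_boot_param is a made-up helper name, and /proc/cmdline is read only when no command line is passed as the second argument:

```shell
# has_boot_param PARAM [CMDLINE]
# Succeeds if PARAM is present on the kernel command line, either bare
# (nox2apic) or with a value (acpi_rsdp=...). Hypothetical helper name.
has_boot_param() {
    param=$1
    # Use the second argument if given, otherwise the running kernel's command line.
    cmdline=${2-$(cat /proc/cmdline)}
    set -f                      # no globbing while splitting into words
    for word in $cmdline; do
        case $word in
            "$param" | "$param"=*) set +f; return 0 ;;
        esac
    done
    set +f
    return 1
}
```

For example, has_boot_param nox2apic && echo "nox2apic was passed" checks the running kernel.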

Comment 1 javiermon 2015-05-25 15:00:53 UTC
Created attachment 1029500 [details]
dmesg taken from working kernel 3.19.7-200

Comment 2 javiermon 2015-06-02 12:24:42 UTC
Created attachment 1033734 [details]
acpidump

Comment 3 Laura Abbott 2015-06-02 22:29:45 UTC
Can you try following the instructions at https://fedoraproject.org/wiki/Kernel/EarlyDebugging to get more information? A bit more context would be helpful.

Comment 4 javiermon 2015-06-06 11:58:43 UTC
Created attachment 1035607 [details]
[1/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500

Booted the problematic kernel with lpj=2195008 loglevel=7 bootdelay=500 and this is all I get.

Comment 5 javiermon 2015-06-06 11:59:30 UTC
Created attachment 1035608 [details]
[2/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500

Comment 6 javiermon 2015-06-06 12:00:45 UTC
With those additional kernel parameters, the crash shows up very quickly, so I guess this happens very early in boot.

Comment 7 javiermon 2015-06-26 21:38:29 UTC
Tested kernel-4.0.5-300.fc22.x86_64 and the issue still remains.

Comment 8 javiermon 2015-07-21 22:41:27 UTC
I've tested kernel-4.2.0-0.rc3.git0.1.fc24.x86_64.rpm from rawhide in this box (Fedora 22) and got the same results (panic).

Please let me know if there's anything else I can test/provide

Thanks,

Comment 9 javiermon 2015-07-22 18:22:43 UTC
I've tried booting kernel 4.0.8 again with lpj=2195008 loglevel=7 bootdelay=100000, but the log messages before the backtrace are printed way too fast to read. I could only get an extremely blurry picture with my phone's video recorder. Is there any way to get the kernel to print these messages so they can be read properly?

Thanks,
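
One possible reason the messages still scroll past, offered as an assumption rather than a diagnosis: the parameter listed in the kernel's kernel-parameters.txt is spelled boot_delay= (with an underscore), it is given in milliseconds per printk, and that document says values above 10000 are changed to no delay. It also only takes effect on kernels built with CONFIG_BOOT_PRINTK_DELAY. The shell sketch below encodes just those documented rules; check_boot_delay is a made-up helper name, not an existing tool.

```shell
# check_boot_delay ARG
# Prints a short verdict for one kernel command-line word.
# Hypothetical helper; the 10000 ms limit is taken from kernel-parameters.txt.
check_boot_delay() {
    case $1 in
        boot_delay=*)
            value=${1#boot_delay=}
            case $value in
                '' | *[!0-9]*) echo "not a number" ;;
                *)
                    if [ "$value" -gt 10000 ]; then
                        echo "too large, treated as no delay"
                    else
                        echo "ok"
                    fi ;;
            esac ;;
        bootdelay=*) echo "unknown spelling, did you mean boot_delay=" ;;
        *) echo "not a boot delay option" ;;
    esac
}
```

Run against the values used in this report, check_boot_delay bootdelay=500 flags the spelling and check_boot_delay boot_delay=100000 flags the size.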

Comment 10 javiermon 2015-07-22 18:23:49 UTC
Created attachment 1054976 [details]
[1/2] Kernel msgs before backtrace

Comment 11 javiermon 2015-07-22 18:24:09 UTC
Created attachment 1054977 [details]
[2/2] Kernel msgs before backtrace

Comment 12 Laura Abbott 2015-07-23 16:44:47 UTC
You can try the (experimental) scripts I wrote to do a bisect between the f21 kernel you were using and 4.0.4. This will identify which commit is actually breaking things for you. Please see https://pagure.io/fedbisect
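
For readers unfamiliar with bisecting, here is a toy illustration of the underlying git bisect mechanism on a throwaway repository. It is only a sketch: it does not show how fedbisect works internally, the repository, file name and commit messages are invented, and a kernel bisect needs a build and a reboot at every step instead of a one-line test.

```shell
# bisect_demo: build a throwaway repository with ten commits, where the
# "regression" first appears in commit 7, and let git bisect find it.
# Prints the subject line of the first bad commit.
bisect_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "bisect demo"
    git config user.email "demo@example.invalid"

    i=1
    while [ "$i" -le 10 ]; do
        if [ "$i" -ge 7 ]; then echo "BAD $i" > state; else echo "ok $i" > state; fi
        git add state
        git commit -q -m "commit $i"
        i=$((i + 1))
    done

    good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
    git bisect start HEAD "$good" > /dev/null   # bad revision first, then good
    # The test command exits 0 for a good commit and 1 for a bad one.
    git bisect run sh -c '! grep -q BAD state' > /dev/null
    git log -1 --format=%s refs/bisect/bad
    cd /
    rm -rf "$repo"
)
```

On a kernel tree the same commands apply, except that each step is judged by hand with git bisect good or git bisect bad after booting the build, rather than by git bisect run.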

Comment 13 javiermon 2015-08-02 10:41:42 UTC
Hi

I tested your scripts to bisect between kernels. I had some problems with the Makefile, which the script tried to automerge a couple of times. When this happened, I ran the script again. After fixing that, the process stopped here:

[root@zotac fedora]# ./fedbisect.sh good
Makefile: needs merge
Makefile: needs merge
Makefile: unmerged (eb4eca56843a9fc205bfefed40c11927542ec368)
Makefile: unmerged (ef748e17702f5109bf2678fb57f7929ef411d938)
Makefile: unmerged (28126de3118a1337f9f83b94b0812ec2058a64fa)
fatal: git-write-tree: error building trees
Cannot save the current index state
Makefile: needs merge
error: you need to resolve your current index first
829a3ada9cc7d4c30fa61f8033403fb6c8f8092a is the first bad commit
commit 829a3ada9cc7d4c30fa61f8033403fb6c8f8092a
Author: Jesse Gross <jesse@nicira.com>
Date:   Fri Jan 2 18:26:03 2015 -0800

    geneve: Simplify locking.
    
    The existing Geneve locking scheme was pulled over directly from
    VXLAN. However, VXLAN has a number of built in mechanisms which make
    the locking more complex and are unlikely to be necessary with Geneve.
    This simplifies the locking to use a basic scheme of a mutex
    when doing updates plus RCU on receive.
    
    In addition to making the code easier to read, this also avoids the
    possibility of a race when creating or destroying sockets since
    UDP sockets and the list of Geneve sockets are protected by different
    locks. After this change, the entire operation is atomic.
    
    Signed-off-by: Jesse Gross <jesse@nicira.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 ae876f8b2255f74b093bc55339356c8e1831754c a59803cc74a79fda347cbbd5a256edcf7a898af8 M      include
:040000 040000 3ae4afd4d2b076ee7f268ce05164b93b984992e8 ddaa34292d988c43eb241e4436ad31f1a6e50b57 M      net
# first bad commit: [829a3ada9cc7d4c30fa61f8033403fb6c8f8092a] geneve: Simplify locking.
Found your commit!

Please let me know if this makes sense or if the conflicts I had have messed up the bisect process.

Comment 14 javiermon 2015-08-04 02:37:46 UTC
Hi again,

I tested your scripts again to bisect between kernels, because I think I messed up the first run with the conflicts I mentioned. This time it reached a different bad commit, which seems to make more sense:

./fedbisect.sh start 3.19.8-200.fc21 4.0.0-1.fc22

[...]

./fedbisect.sh bad
No local changes to save
659006bf3ae37a08706907ce1a36ddf57c9131d2 is the first bad commit
commit 659006bf3ae37a08706907ce1a36ddf57c9131d2
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Jan 15 21:22:26 2015 +0000

    x86/x2apic: Split enable and setup function

    enable_x2apic() is a convoluted unreadable mess because it is used for
    both enablement in early boot and for setup in cpu_init().

    Split the code into x2apic_enable() for enablement and x2apic_setup()
    for setup of (secondary cpus). Make use of the new state tracking to
    simplify the logic.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Jiang Liu <jiang.liu@linux.intel.com>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

:040000 040000 d14acf68224b6524568662dba1c3df4a5d4e8e46 979ea61c8245c1a1c47f14179f31cb96619e9357 M      arch
# first bad commit: [659006bf3ae37a08706907ce1a36ddf57c9131d2] x86/x2apic: Split enable and setup function
Found your commit!

Regarding the conflicts I mentioned, this usually happened in the kernel Makefile:

javier@zotac ~ % git diff
diff --git a/Makefile b/Makefile
index e41a335..4a7be84 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 3
 PATCHLEVEL = 19
 SUBLEVEL = 0
-EXTRAVERSION = -rc4
+EXTRAVERSION = -fedbisect-1
 NAME = Diseased Newt
 
 # *DOCUMENTATION*

Which I fixed with:
javier@zotac ~ % git checkout -- ../Makefile

I guess restarting the process by running the ./fedbisect.sh <good|bad> script again messed up the first bisect attempt.

Please let me know if you need me to test anything else.

Thanks.
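
The Makefile fix described above can be wrapped in a small check that runs before each bisect step. This is a sketch under assumptions: restore_if_modified is a made-up name, and it only handles a plain uncommitted modification like the EXTRAVERSION change in the diff above, not a path left in the unmerged ("needs merge") state.

```shell
# restore_if_modified FILE
# Discards uncommitted changes to FILE so that the next checkout made by
# git bisect does not stop on a conflict. Hypothetical helper name.
restore_if_modified() {
    # git diff --quiet exits non-zero when FILE differs from the index.
    if git diff --quiet -- "$1"; then
        echo "clean"
    else
        git checkout -- "$1"
        echo "restored"
    fi
}
```

Called as restore_if_modified Makefile from the top of the kernel tree, it reports whether anything had to be thrown away.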

Comment 15 javiermon 2015-08-04 07:40:59 UTC
Now that it's clearer why the kernel crashes, I changed the BIOS setting called "Local x2apic" to disabled and managed to boot kernel 4.0.8, but I'm not sure what I'm missing by disabling it. Still, I would like to continue debugging this problem, since this machine re-enables that setting every time it crashes and loads the BIOS default settings.

Comment 16 Laura Abbott 2015-08-07 23:48:54 UTC
Thanks for doing the bisect with the experimental scripts. I need to make the script work across rcs as well. The change you found makes sense and it's good to know that changing a BIOS setting works as well. I'll send a report upstream.

Comment 17 Laura Abbott 2015-08-07 23:57:26 UTC
Actually, before I send an e-mail out, can you try the latest kernel and verify that it is still broken?

Comment 18 javiermon 2015-08-08 07:59:43 UTC
I've upgraded to the latest available kernel in f22: kernel-core-4.1.3-201.fc22.x86_64.

With x2apic disabled in the BIOS the kernel boots; with it enabled it crashes with the same trace as before, so yes, it's still broken.

Thanks.

Comment 19 Laura Abbott 2015-08-17 22:52:23 UTC
Created attachment 1064089 [details]
possible fix for apic crash

Can you try the following patch from tglx?

Comment 20 javiermon 2015-08-18 20:39:50 UTC
Hi

The patch doesn't work.

I've compiled kernel 4.1.6 with the patch following the instructions found here:
https://fedoraproject.org/wiki/Building_a_custom_kernel

In order to apply the patch, I added it to the kernel.spec file in the standalone patches section:

# Standalone patches
Patch512: 0001-Test-patch-from-tglx.patch

and where the patches are applied:
# Misc fixes 
ApplyPatch 0001-Test-patch-from-tglx.patch

And built the kernel. I also checked that the kernel-4.1.fc22/linux-4.1.6-201.fc22.x86_64/arch/x86/kernel/apic/apic.c file included the patch.

I rebooted, enabled local x2apic in the bios and booted kernel 4.1.6 and the crash happened.

Comment 21 javiermon 2015-08-18 20:43:24 UTC
Created attachment 1064487 [details]
More kernel panic error msgs.

One thing I forgot to mention when doing the bisect is that one of the kernels that crashed included the following error:

Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (255 vs 0).

See the attached screenshot for more details.

Comment 22 Laura Abbott 2015-08-19 01:27:13 UTC
Thanks for testing. Did the patch have any effect at all, or was it still the same crash? The screenshot you took during the bisect is useful as well (at least I think so; I'll have to pass that along upstream).

Comment 23 javiermon 2015-08-19 05:42:39 UTC
The crash with the patched kernel appeared to be the same one, so I think it did not have any effect.

Comment 24 Thomas Gleixner 2015-08-20 21:03:04 UTC
Can you please boot with that patch applied and add the following on the kernel command line:

nox2apic

Thanks,

        tglx

Comment 25 Thomas Gleixner 2015-08-20 21:09:30 UTC
From the picture I can see it's a Zotac ZBOX. Some of them have a pin header for connecting a serial port. Does yours have one by chance?

Comment 26 javiermon 2015-08-20 21:20:57 UTC
Hi

I've booted with 4.1.6 with the patch applied and nox2apic and the machine booted fine without crashing.

Thanks,

Comment 27 javiermon 2015-08-20 21:53:07 UTC
You're right, this is a zotac zbox ID82. Unfortunately, I've checked the motherboard and the COM 1 doesn't include the pin header.

Comment 28 Thomas Gleixner 2015-08-21 10:42:39 UTC
> I've booted with 4.1.6 with the patch applied and nox2apic and the machine
> booted fine without crashing.

Can you please provide the dmesg of that boot?

Comment 29 javiermon 2015-08-21 11:55:27 UTC
Created attachment 1065560 [details]
dmesg from 4.1.6 patched, using param nox2apic and booting with local x2apic enabled on bios.

Comment 30 Thomas Gleixner 2015-08-21 20:39:28 UTC
Can you upload your .config as well, please?

Comment 31 Thomas Gleixner 2015-08-21 21:11:30 UTC
Created attachment 1065731 [details]
Proposed fix

Can you please replace the first patch with this one? I think I identified the reason for the wreckage. Remove nox2apic from the command line again.

Thanks,

       tglx

Comment 32 javiermon 2015-08-22 08:34:00 UTC
Created attachment 1065794 [details]
config for patched 4.1.6

Comment 33 javiermon 2015-08-22 14:28:11 UTC
Hi again,

The new patch works and the computer boots with local x2apic enabled on bios and without nox2apic kernel parameter.

Thanks!

Comment 34 javiermon 2015-08-22 14:28:59 UTC
Created attachment 1065829 [details]
dmesg from 4.1.6 with patch 2 booting with local x2apic enabled on bios.

Comment 35 javiermon 2015-08-22 14:29:36 UTC
Created attachment 1065830 [details]
config from 4.1.6 with patch 2.

Comment 36 Thomas Gleixner 2015-08-25 19:52:58 UTC
The fix hit Linus' tree and is tagged for stable:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a57e456a7b28431b55e407e5ab78ebd5b378d19e

Javier, thanks for your help!

Comment 37 Fedora Update System 2015-09-01 14:59:55 UTC
kernel-4.2.0-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782

Comment 38 Fedora Update System 2015-09-01 20:22:05 UTC
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782

Comment 39 javiermon 2015-09-01 20:32:05 UTC
Hi

Is this going to be backported to Fedora 22?

Thanks,

Comment 40 Laura Abbott 2015-09-01 21:17:23 UTC
Yes, it's in the tree. The next time a build happens it will be released.

Comment 41 Fedora Update System 2015-09-04 03:23:16 UTC
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 42 Fedora Update System 2015-09-05 01:03:13 UTC
kernel-4.1.6-201.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130

Comment 43 Fedora Update System 2015-09-06 18:52:08 UTC
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130

Comment 44 Fedora Update System 2015-09-11 17:21:42 UTC
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 45 Fedora Update System 2015-09-15 17:36:09 UTC
kernel-4.1.7-100.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933

Comment 46 Fedora Update System 2015-09-17 01:02:38 UTC
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933

Comment 47 Fedora Update System 2015-09-23 00:20:46 UTC
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.

