Created attachment 1029498 [details]
Picture of kernel panic
Description of problem:
I've upgraded from Fedora 21 to 22 (x86_64) and kernel 4.0.4 doesn't boot; it crashes with a kernel panic that includes the following:
Previous kernel (kernel-core-3.19.7-200.fc21.x86_64) booted fine.
Version-Release number of selected component (if applicable):
Name : kernel-core
Arch : x86_64
Epoch : 0
Version : 4.0.4
Release : 301.fc22
I've tried booting with noefi, nox2apic, and acpi_rsdp=APIC, but got the same results.
Created attachment 1029500 [details]
dmesg taken from working kernel 3.19.7-200
Created attachment 1033734 [details]
Can you try following the instructions at https://fedoraproject.org/wiki/Kernel/EarlyDebugging to get more information? A bit more context would be helpful.
Created attachment 1035607 [details]
[1/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500
Booted the problematic kernel with lpj=2195008 loglevel=7 bootdelay=500 and this is all I get.
Created attachment 1035608 [details]
[2/2] backtrace of crash with lpj=2195008 loglevel=7 bootdelay=500
With those additional kernel parameters, the crash is shown very quickly, so I guess this happens very early on boot.
Tested kernel-4.0.5-300.fc22.x86_64 and the issue still remains.
I've tested kernel-4.2.0-0.rc3.git0.1.fc24.x86_64.rpm from rawhide in this box (Fedora 22) and got the same results (panic).
Please let me know if there's anything else I can test or provide.
I've tried booting kernel 4.0.8 again with lpj=2195008 loglevel=7 bootdelay=100000, but the log messages before the backtrace are printed way too fast to read. I could only get an extremely blurry picture with my phone's video recorder. Is there any way to get the kernel to print these messages so they can be read properly?
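For reference, a minimal Python sketch of a sanity check on the parameters quoted above. It rests on two assumptions taken from the kernel's parameter documentation rather than from this report: that the printk delay option is spelled boot_delay (with an underscore), and that values above 10000 milliseconds are treated as no delay. The helper names parse_cmdline and boot_delay_notes are hypothetical.

```python
from typing import Dict, List, Optional

# Assumption (kernel parameter docs): delays above 10 s are changed to no delay.
MAX_BOOT_DELAY_MS = 10000


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def boot_delay_notes(params: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable hints about the printk delay setting."""
    notes: List[str] = []
    if "bootdelay" in params and "boot_delay" not in params:
        notes.append("'bootdelay' is not the documented spelling; try 'boot_delay'")
    raw = params.get("boot_delay") or params.get("bootdelay")
    if raw is not None and raw.isdigit() and int(raw) > MAX_BOOT_DELAY_MS:
        notes.append(
            "delay %s ms exceeds %d ms and would be treated as no delay"
            % (raw, MAX_BOOT_DELAY_MS)
        )
    return notes


if __name__ == "__main__":
    cmdline = "lpj=2195008 loglevel=7 bootdelay=100000"
    for note in boot_delay_notes(parse_cmdline(cmdline)):
        print(note)
```

On a running system the same check could be fed the contents of /proc/cmdline to confirm which parameters actually reached the kernel.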
Created attachment 1054976 [details]
[1/2] Kernel msgs before backtrace
Created attachment 1054977 [details]
[2/2] Kernel msgs before backtrace
You can try the (experimental) scripts I wrote to do a bisect between the f21 kernel you were using and 4.0.4. This will identify which commit is actually breaking things for you. Please see https://pagure.io/fedbisect
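As a rough illustration of what a bisect does, here is a minimal Python sketch of the underlying binary search. It assumes a simplified, linear history ordered oldest to newest, with the first entry known good and the last known bad; real git bisect works on a commit graph, and the function name first_bad and the is_bad callback (standing in for "build, boot, report good or bad") are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in a linear history.

    commits[0] must be known good and commits[-1] known bad.
    is_bad is called once per tested commit (one build and boot each).
    """
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi]


if __name__ == "__main__":
    history = ["c%d" % i for i in range(10)]
    broken_from = 6  # hypothetical: c6 introduced the regression
    print(first_bad(history, lambda c: int(c[1:]) >= broken_from))
```

Each step halves the remaining range, so a single wrong good/bad answer steers the search to an unrelated commit.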
I tested your scripts to bisect between kernels. I had some problems with the Makefile, which the script tried to automerge a couple of times. When this happened, I ran the script again. After fixing that, the process stopped here:
[root@zotac fedora]# ./fedbisect.sh good
Makefile: needs merge
Makefile: needs merge
Makefile: unmerged (eb4eca56843a9fc205bfefed40c11927542ec368)
Makefile: unmerged (ef748e17702f5109bf2678fb57f7929ef411d938)
Makefile: unmerged (28126de3118a1337f9f83b94b0812ec2058a64fa)
fatal: git-write-tree: error building trees
Cannot save the current index state
Makefile: needs merge
error: you need to resolve your current index first
829a3ada9cc7d4c30fa61f8033403fb6c8f8092a is the first bad commit
Author: Jesse Gross <email@example.com>
Date: Fri Jan 2 18:26:03 2015 -0800
geneve: Simplify locking.
The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.
In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.
Signed-off-by: Jesse Gross <firstname.lastname@example.org>
Signed-off-by: David S. Miller <email@example.com>
:040000 040000 ae876f8b2255f74b093bc55339356c8e1831754c a59803cc74a79fda347cbbd5a256edcf7a898af8 M include
:040000 040000 3ae4afd4d2b076ee7f268ce05164b93b984992e8 ddaa34292d988c43eb241e4436ad31f1a6e50b57 M net
# first bad commit: [829a3ada9cc7d4c30fa61f8033403fb6c8f8092a] geneve: Simplify locking.
Found your commit!
Please let me know if this makes sense or if the conflicts I had have messed up the bisect process.
I ran your scripts again to bisect between kernels, because I think I messed up the first run with the conflicts I mentioned. This time it reached a different bad commit, which seems to make more sense:
./fedbisect.sh start 3.19.8-200.fc21 4.0.0-1.fc22
No local changes to save
659006bf3ae37a08706907ce1a36ddf57c9131d2 is the first bad commit
Author: Thomas Gleixner <firstname.lastname@example.org>
Date: Thu Jan 15 21:22:26 2015 +0000
x86/x2apic: Split enable and setup function
enable_x2apic() is a convoluted unreadable mess because it is used for
both enablement in early boot and for setup in cpu_init().
Split the code into x2apic_enable() for enablement and x2apic_setup()
for setup of (secondary cpus). Make use of the new state tracking to
simplify the logic.
Signed-off-by: Thomas Gleixner <email@example.com>
Cc: Jiang Liu <firstname.lastname@example.org>
Cc: Joerg Roedel <email@example.com>
Cc: Tony Luck <firstname.lastname@example.org>
Cc: Borislav Petkov <email@example.com>
Signed-off-by: Thomas Gleixner <firstname.lastname@example.org>
:040000 040000 d14acf68224b6524568662dba1c3df4a5d4e8e46 979ea61c8245c1a1c47f14179f31cb96619e9357 M arch
# first bad commit: [659006bf3ae37a08706907ce1a36ddf57c9131d2] x86/x2apic: Split enable and setup function
Found your commit!
Regarding the conflicts I mentioned, this usually happened in the kernel Makefile:
javier@zotac ~ % git diff
diff --git a/Makefile b/Makefile
index e41a335..4a7be84 100644
@@ -1,7 +1,7 @@
VERSION = 3
PATCHLEVEL = 19
SUBLEVEL = 0
-EXTRAVERSION = -rc4
+EXTRAVERSION = -fedbisect-1
NAME = Diseased Newt
Which I fixed with:
javier@zotac ~ % git checkout -- ../Makefile
I guess restarting the process by re-running the ./fedbisect.sh <good|bad> script is what messed up the first bisect attempt.
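To make the Makefile conflict easier to follow: the diff above shows the EXTRAVERSION line being rewritten to -fedbisect-1, and the kernel version string is composed from the four variables at the top of the Makefile. The Python sketch below shows that composition under the assumption that all of VERSION, PATCHLEVEL, and SUBLEVEL are present; the function names are hypothetical and this is not part of fedbisect.

```python
import re
from typing import Dict

_VAR_RE = re.compile(r"^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION)\s*=\s*(.*)$")


def makefile_vars(text: str) -> Dict[str, str]:
    """Collect the version variables from the top of a kernel Makefile."""
    found: Dict[str, str] = {}
    for line in text.splitlines():
        match = _VAR_RE.match(line)
        if match:
            # Keep the first definition only; later lines are ignored.
            found.setdefault(match.group(1), match.group(2).strip())
    return found


def kernel_version(text: str) -> str:
    """Compose VERSION.PATCHLEVEL.SUBLEVEL followed by EXTRAVERSION."""
    v = makefile_vars(text)
    return "{}.{}.{}{}".format(
        v["VERSION"], v["PATCHLEVEL"], v["SUBLEVEL"], v.get("EXTRAVERSION", "")
    )


if __name__ == "__main__":
    header = "\n".join([
        "VERSION = 3",
        "PATCHLEVEL = 19",
        "SUBLEVEL = 0",
        "EXTRAVERSION = -fedbisect-1",
        "NAME = Diseased Newt",
    ])
    print(kernel_version(header))
```

Because each bisect step checks out a different tree, a locally modified EXTRAVERSION line collides with upstream changes to the same line, which is consistent with the "Makefile: needs merge" errors shown earlier.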
Please let me know if you need any other thing for me to test.
Now that it's clearer why the kernel crashes, I set the BIOS option "Local x2apic" to disabled and managed to boot kernel 4.0.8, but I'm not sure what I'm missing with it disabled. Still, I would like to continue debugging this problem, since the machine re-enables that setting every time it crashes and loads the BIOS default settings.
Thanks for doing the bisect with the experimental scripts. I need to make the script work across rcs as well. The change you found makes sense and it's good to know that changing a BIOS setting works as well. I'll send a report upstream.
Actually, before I send an e-mail out, can you try the latest kernel and verify that it is still broken?
I've upgraded to the latest available kernel in f22: kernel-core-4.1.3-201.fc22.x86_64.
With x2apic disabled in the BIOS the kernel boots, and with it enabled it crashes with the same trace as before, so yes, it's still broken.
Created attachment 1064089 [details]
possible fix for apic crash
Can you try the following patch from tglx?
The patch doesn't work.
I've compiled kernel 4.1.6 with the patch following the instructions found here:
In order to apply the patch, I added it to the kernel.spec file in the standalone patches section:
# Standalone patches
and where the patches are applied:
# Misc fixes
And built the kernel. I also checked that the kernel-4.1.fc22/linux-4.1.6-201.fc22.x86_64/arch/x86/kernel/apic/apic.c file included the patch.
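The manual check described above (confirming that the prepared source tree contains the patch) can be approximated with a short Python sketch. It is deliberately simplistic: it only verifies that every non-blank line added by a unified diff appears somewhere in the target file, ignoring context, ordering, and removed lines. The function names and the sample strings are hypothetical and do not reproduce the actual patch.

```python
from typing import List


def added_lines(patch_text: str) -> List[str]:
    """Return the non-blank lines a unified diff adds, stripped of whitespace."""
    result: List[str] = []
    for line in patch_text.splitlines():
        # "+++ b/file" is a file header, not an added line.
        if line.startswith("+") and not line.startswith("+++"):
            stripped = line[1:].strip()
            if stripped:
                result.append(stripped)
    return result


def patch_looks_applied(patch_text: str, source_text: str) -> bool:
    """True if every added line of the patch occurs in the source text."""
    source = {line.strip() for line in source_text.splitlines()}
    return all(line in source for line in added_lines(patch_text))


if __name__ == "__main__":
    # Hypothetical patch and source, for illustration only.
    patch = "--- a/example.c\n+++ b/example.c\n@@ -1,2 +1,3 @@\n int a;\n+int b;\n int c;\n"
    print(patch_looks_applied(patch, "int a;\nint b;\nint c;\n"))
    print(patch_looks_applied(patch, "int a;\nint c;\n"))
```

A stricter check would be to run the patch tool in reverse dry-run mode against the tree, but that is outside the scope of this sketch.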
I rebooted, enabled local x2apic in the BIOS, and booted kernel 4.1.6; the crash still happened.
Created attachment 1064487 [details]
More kernel panic error msgs.
One thing I forgot to mention when doing the bisect is that one of the kernels that crashed included the following error:
Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (255 vs 0).
See the attached screenshot for more details.
Thanks for testing. Did the patch have any effect at all, or was it still the same crash? The screenshot you showed during the bisect is useful as well (at least I think so; I'll have to pass that along upstream).
The crash with the patched kernel appeared to be the same one, so I think it did not have any effect.
Can you please boot with that patch applied and add the following on the kernel command line:
From the picture, it looks like a Zotac ZBOX. Some of them have a pin header for connecting a serial port. Does yours have one by chance?
I've booted with 4.1.6 with the patch applied and nox2apic and the machine booted fine without crashing.
You're right, this is a Zotac ZBOX ID82. Unfortunately, I've checked the motherboard and COM 1 doesn't have the pin header.
> I've booted with 4.1.6 with the patch applied and nox2apic and the machine
> booted fine without crashing.
Can you please provide the dmesg of that boot?
Created attachment 1065560 [details]
dmesg from 4.1.6 patched, using param nox2apic and booting with local x2apic enabled on bios.
Can you upload your .config as well, please?
Created attachment 1065731 [details]
Can you please replace the first patch with this one? I think I identified the reason for the wreckage. Remove nox2apic from the command line again.
Created attachment 1065794 [details]
config for patched 4.1.6
The new patch works: the computer boots with local x2apic enabled in the BIOS and without the nox2apic kernel parameter.
Created attachment 1065829 [details]
dmesg from 4.1.6 with patch 2 booting with local x2apic enabled on bios.
Created attachment 1065830 [details]
config from 4.1.6 with patch 2.
The fix hit Linus' tree and is tagged for stable.
Javier, thanks for your help!
kernel-4.2.0-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-14782
Is this going to be backported to Fedora 22?
Yes, it's in the tree. The next time a build happens it will be released.
kernel-4.2.0-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.1.6-201.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15130
kernel-4.1.6-201.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.1.7-100.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-15933
kernel-4.1.7-100.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.