Bug 1365917

Summary: kernel panic at boot - x2apic_cluster_probe+0x33/0x70
Product: [Fedora] Fedora Reporter: Peter Gervase <pgervase>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhide CC: awilliam, byodlows, gansalmon, itamar, jforbes, jonathan, kernel-maint, kparal, labbott, madhu.chinakonda, mchehab, pgervase, plautrba, pschindl, robatino
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: RejectedBlocker AcceptedFreezeException
Fixed In Version: kernel-4.8.0-0.rc2.git3.1.fc25 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-22 22:07:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1277284, 1277285    
Attachments:
  screen shot showing the panic (Flags: none)
  screen shot showing the new panic message two minutes after the first one (Flags: none)

Description Peter Gervase 2016-08-10 13:48:34 UTC
Created attachment 1189631
screen shot showing the panic

Description of problem:
Using either of these kernels:
kernel-4.8.0-0.rc0.git3.1.fc26.x86_64
kernel-4.8.0-0.rc0.git5.1.fc26.x86_64
I get the kernel panic shown in the attached screenshot (IMAG0874) when I boot.

At boot, I select the kernel and within a few seconds it panics. I let it sit for two minutes, and it then showed the panic in the 0875 picture. I am able to boot to
kernel-4.8.0-0.rc1.git0.1.fc26.x86_64

I tried booting to init 1, but that didn't work any better. 
No core file gets created.

Version-Release number of selected component (if applicable):
kernel-4.8.0-0.rc0.git3.1.fc26.x86_64
kernel-4.8.0-0.rc0.git5.1.fc26.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Reboot into one of those kernels.

Actual results:
panic as shown

Expected results:
no panic

Additional info:
https://bugzilla.kernel.org/show_bug.cgi?id=151311 looks close

Comment 1 Peter Gervase 2016-08-10 13:49:58 UTC
Created attachment 1189632
screen shot showing the new panic message two minutes after the first one

Comment 2 Laura Abbott 2016-08-11 16:18:45 UTC
Can you test the following scratch build? It contains a probable fix from the upstream developers: http://koji.fedoraproject.org/koji/taskinfo?taskID=15217136

Comment 3 Petr Lautrbach 2016-08-16 10:55:54 UTC
I saw the same/similar kernel panic problem on my Lenovo x240 with kernel-4.8.0-0.rc1.git3.1.fc25.x86_64

http://koji.fedoraproject.org/koji/taskinfo?taskID=15217136 build fixes it. Thanks!

Comment 4 Laura Abbott 2016-08-16 14:35:54 UTC
*** Bug 1367396 has been marked as a duplicate of this bug. ***

Comment 5 Petr Schindler 2016-08-17 10:15:11 UTC
The kernel from the koji build linked in comment 3 works for me. The system boots normally with it.

I tested it with Fedora 24 with kernel-4.8.0-0.rc0.git3.1.fc25.x86_64 installed and it didn't boot (kernel panic). Then I installed the kernel from koji and it booted normally.

Comment 6 Adam Williamson 2016-08-17 19:07:15 UTC
Can you confirm that kernel-4.8.0-0.rc1.git0.1.fc25 does not display this behaviour?

Comment 7 Adam Williamson 2016-08-17 19:48:28 UTC
For blocker / release engineering purposes: labbott states she's certain that kernel-4.8.0-0.rc1.git0.1.fc25 (the current 'stable' F25 kernel build, i.e. the one in the 'fedora' repo and the one included in composes) *would* be affected by this bug. That means that if we decide the bug is a blocker, we must find a fix for it before we can ship Alpha. However, she and jforbes also believe this is fixed in the upstream kernel by commit d52c0569bab4edc888832df44dc7ac28517134f6, which means the bug should be fixed by these Fedora builds:

f25: http://koji.fedoraproject.org/koji/buildinfo?buildID=792279 (kernel-4.8.0-0.rc2.git1.1.fc25)
Rawhide: http://koji.fedoraproject.org/koji/buildinfo?buildID=792280 (kernel-4.8.0-0.rc2.git1.1.fc26)

That build has not yet been submitted as an update for F25. It would be good if reporters could confirm the fix.

Comment 8 Adam Williamson 2016-08-17 19:54:50 UTC
labbott also states she'd vote -1 blocker / +1 FE for this bug, given the range of hardware affected. jforbes says "1365917 could theoretically impact any modern intel machine". The upstream commit, which describes the issue, can be seen at https://lkml.org/lkml/2016/8/11/516 if anyone feels up to evaluating its impact themselves. "Any modern intel machine" is quite scary to me, so I might be more inclined to go +1 blocker for this one; I'm definitely +1 FE.

Comment 9 Justin M. Forbes 2016-08-17 20:33:10 UTC
To clarify "any modern intel machine": x2apic was introduced with Nehalem, about 6 years ago. It can also be "opted out" of by firmware, and frequently is. I don't know what percentage of machines do or don't opt out; from a quick look at 3 machines here, 2 have it turned off and 1 has it turned on. You can check by looking at dmesg after boot: you will see either "x2apic enabled" or "DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit", along with instructions on how to override the opt-out. A quick Google search shows that several people have seen this bug, but its reach is still hard to determine because no one has shipped a kernel with the bug to a mass of users.
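The dmesg check described in this comment can be scripted. The sketch below is a hypothetical helper (`x2apic_status` is not a Fedora tool); it only looks for the two message strings quoted above, so other kernel versions may word them differently and report "unknown".

```shell
# Hypothetical helper: classify the x2APIC state from kernel log text on stdin.
# It matches only the two dmesg messages quoted in comment 9.
x2apic_status() {
  log="$(cat)"
  if printf '%s\n' "$log" | grep -q 'x2apic is disabled because BIOS sets x2apic opt out bit'; then
    echo "disabled (BIOS opt-out)"
  elif printf '%s\n' "$log" | grep -qi 'x2apic enabled'; then
    echo "enabled"
  else
    echo "unknown"
  fi
}
```

Typical use would be `dmesg | x2apic_status`; on systems that restrict dmesg to root, run `dmesg` with sudo.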

Comment 10 Adam Williamson 2016-08-17 20:34:00 UTC
Bit more discussion about the range of hardware likely affected by this:

<jwb> jforbes: eh... i won't disagree but that might be stretching it
<jforbes> jwb: theoretically. x2apic came in with nahalem, and it is basically a race condition with CPU state change
 realistically it is probably a smaller subset, but a quick google search says it is non trivial
<jwb> jforbes: yeah, but i thought there was a firmware component to x2apic support too
 i might be thinking of something else
<jforbes> jwb: there is, thus the theoretical part
<jwb> right.  so the stretch is that most laptop class hardware doesn't have the firmware bits for x2apic.  at least not that i've seen
 but desktop/larger servers are certainly a possibility
 now if we only could tell for certainty what most Fedora users have for machines.  IN A WORLD
<jforbes> Well, that would certainly be nice
 only 1 out of 3 machines here has it enabled
 I could power on and check others I suppose
 But even in the ones that disable by default, it can be overridden

Comment 11 Peter Gervase 2016-08-17 20:56:59 UTC
For those that have dep issues installing:
$ sudo rpm -ivh kernel-4.8.0-0.rc2.git1.1.fc26.x86_64.rpm                                  
error: Failed dependencies:
        kernel-core-uname-r = 4.8.0-0.rc2.git1.1.fc26.x86_64 is needed by kernel-4.8.0-0.rc2.git1.1.fc26.x86_64
        kernel-modules-uname-r = 4.8.0-0.rc2.git1.1.fc26.x86_64 is needed by kernel-4.8.0-0.rc2.git1.1.fc26.x86_64

I filed
https://bugzilla.redhat.com/show_bug.cgi?id=1367929
to clean up the dependency checking, since "uname -r" is not getting parsed.

I'll test booting to that rc2 kernel...

Comment 12 Adam Williamson 2016-08-17 21:03:16 UTC
er...you're reading that wrong. you have to install at least the kernel, kernel-core and kernel-modules packages when manually installing a kernel build. The package called 'kernel' is basically just a metapackage and doesn't contain anything. The actual kernel is in 'kernel-core', the modules are in 'kernel-modules'. You may also need 'kernel-modules-extra' depending on your hardware.
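A small sketch of the "install them together" rule: the hypothetical `kernel_rpms` helper below just prints the three file names for a given version-release.arch string, assuming the downloaded files follow the name-version-release.arch.rpm pattern seen in comments 11 and 13. It does not add kernel-modules-extra; append that by hand if your hardware needs it.

```shell
# Hypothetical helper: print the package file names that must be installed
# together for a manual kernel install, given a version-release.arch string
# such as 4.8.0-0.rc2.git1.1.fc26.x86_64.
kernel_rpms() {
  vra="$1"
  for pkg in kernel-core kernel-modules kernel; do
    printf '%s-%s.rpm\n' "$pkg" "$vra"
  done
}
```

From the directory holding the RPMs, `sudo rpm -ivh $(kernel_rpms 4.8.0-0.rc2.git1.1.fc26.x86_64)` would then pass all three in one transaction. The list order does not matter: rpm orders the transaction itself, as the output in comment 13 shows (kernel-core is installed first even though it was not listed first).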

Comment 13 Peter Gervase 2016-08-17 21:08:09 UTC
Right, you need all three, but the error shouldn't say "uname-r" in the failed deps. kernel-core-4.8.0-0.rc2.git1.1.fc26.x86_64 and kernel-modules-4.8.0-0.rc2.git1.1.fc26.x86_64 are what should be specified, not "kernel-core-uname-r" or "kernel-modules-uname-r".

$ sudo rpm -ivh kernel-4.8.0-0.rc2.git1.1.fc26.x86_64.rpm kernel-core-4.8.0-0.rc2.git1.1.fc26.x86_64.rpm kernel-modules-4.8.0-0.rc2.git1.1.fc26.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-core-4.8.0-0.rc2.git1.1.fc################################# [ 33%]
   2:kernel-modules-4.8.0-0.rc2.git1.1################################# [ 67%]
   3:kernel-4.8.0-0.rc2.git1.1.fc26   ################################# [100%]

Comment 14 Adam Williamson 2016-08-17 21:11:46 UTC
nah, the Provides: are explicitly named that way in the spec; it clearly doesn't expect the 'uname-r' to be interpreted as a command:

http://pkgs.fedoraproject.org/cgit/rpms/kernel.git/tree/kernel.spec#n633
http://pkgs.fedoraproject.org/cgit/rpms/kernel.git/tree/kernel.spec#n824
http://pkgs.fedoraproject.org/cgit/rpms/kernel.git/tree/kernel.spec#n847

etc. I dunno why the kernel team decided to use those names, but it's a conscious choice.
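One way to see those literal names for yourself is to list a package's Provides with `rpm -qp --provides` and filter for the `-uname-r` entries. The filter below is a hypothetical helper; the sample line in its test mirrors the `kernel-core-uname-r` name from the dependency error in comment 11.

```shell
# Hypothetical filter: keep only the "<name>-uname-r = <release>" virtual
# Provides from `rpm -q --provides`-style output read on stdin.
uname_r_provides() {
  grep -E '^[[:alnum:]_-]+-uname-r = '
}
```

For example: `rpm -qp --provides kernel-core-4.8.0-0.rc2.git1.1.fc26.x86_64.rpm | uname_r_provides`.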

Comment 15 Stephen Gallagher 2016-08-17 21:17:32 UTC
Per Paul Whalen: "adding 'nox2apic' (on Fedora-25-20160807.n.0) got the installer booting on an x220 laptop".

Given that there's a relatively straightforward workaround on the kernel boot command line, I'm inclined to say -1 blocker, +1 FE here.
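To confirm the workaround actually took effect after booting, you can look for `nox2apic` as a whole parameter on the running kernel's command line. The `has_nox2apic` function below is a hypothetical sketch; it splits on spaces so that unrelated parameters containing "x2apic" are not matched.

```shell
# Hypothetical check: succeed if the kernel command line read on stdin
# (for example the contents of /proc/cmdline) contains nox2apic as a
# standalone parameter.
has_nox2apic() {
  tr ' ' '\n' | grep -qx 'nox2apic'
}
```

Usage: `has_nox2apic < /proc/cmdline && echo "workaround active"`. On a typical GRUB 2 setup, the parameter can be added for a single boot by pressing `e` on the menu entry, appending `nox2apic` to the line starting with `linux`, and booting with Ctrl-x. To make it persistent on an installed system, `sudo grubby --update-kernel=ALL --args=nox2apic` adds it, and `--remove-args=nox2apic` takes it out again once a fixed kernel is installed.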

Comment 16 Adam Williamson 2016-08-18 18:56:53 UTC
Discussed at 2016-08-18 go/no-go meeting, functioning as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting/2016-08-18/f25-alpha-go_no_go-meeting.2016-08-18-17.00.html . Given our best estimate as to the range of hardware affected, and on the basis that there's a simple documentable workaround, we decided to reject it as an Alpha blocker, but accept it as a freeze exception issue.

Comment 17 Fedora Update System 2016-08-19 02:09:40 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 18 Petr Schindler 2016-08-19 07:32:36 UTC
I can confirm that with 'nox2apic' I can boot (installer and installed system).

Comment 19 Fedora Update System 2016-08-19 16:50:38 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 20 Fedora Update System 2016-08-19 21:53:31 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 21 Fedora Update System 2016-08-19 22:08:21 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 22 Fedora Update System 2016-08-19 22:11:14 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 23 Fedora Update System 2016-08-19 22:15:48 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 24 Fedora Update System 2016-08-19 22:19:25 UTC
kernel-4.8.0-0.rc2.git2.1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 25 Fedora Update System 2016-08-20 18:50:38 UTC
kernel-4.8.0-0.rc2.git3.1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8

Comment 26 Fedora Update System 2016-08-22 22:07:45 UTC
kernel-4.8.0-0.rc2.git3.1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
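Before reporting that the problem persists, it may help to check that the running kernel is at least the "Fixed In Version" build recorded on this bug. The sketch below is a hypothetical, conservative check: it uses GNU `sort -V`, which only approximates RPM's own version comparison, and it assumes an x86_64 release string. Earlier builds such as rc2.git1, which comment 7 says should already contain the fix, will still report as not fixed.

```shell
# Hypothetical check: succeed if a kernel release string (as printed by
# `uname -r`) sorts at or after this bug's "Fixed In Version" build.
# GNU `sort -V` only approximates RPM's version comparison; the arch
# handling assumes x86_64.
fixed_release="4.8.0-0.rc2.git3.1.fc25"

is_at_least_fixed() {
  running="${1%.x86_64}"   # drop the arch suffix that uname -r appends
  oldest="$(printf '%s\n%s\n' "$fixed_release" "$running" | sort -V | head -n 1)"
  [ "$oldest" = "$fixed_release" ]
}
```

Usage: `is_at_least_fixed "$(uname -r)" && echo "running a kernel at or after the fix"`.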

Comment 27 Petr Schindler 2016-08-23 08:36:36 UTC
kernel-4.8.0-0.rc2.git3.1.fc25 really solves the problem for me.