Bug 1592976 - 'Fedora, with Xen hypervisor' doesn't boot with kernel-4.17.5-200.fc28.x86_64
Summary: 'Fedora, with Xen hypervisor' doesn't boot with kernel-4.17.5-200.fc28.x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-19 18:44 UTC by nucleo
Modified: 2018-07-22 03:03 UTC (History)

Fixed In Version: kernel-4.17.7-100.fc27 kernel-4.17.7-200.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-22 02:44:17 UTC



Description nucleo 2018-06-19 18:44:30 UTC
Description of problem:
The 'Fedora, with Xen hypervisor' boot entry doesn't boot with kernel-4.17.2-200.fc28.x86_64.

Version-Release number of selected component (if applicable):
kernel-4.17.2-200.fc28.x86_64

Steps to Reproduce:
Boot the 'Fedora, with Xen hypervisor' entry with kernel-4.17.2-200.fc28.x86_64.

Actual results:
After /boot/xen-4.10.1.gz is loaded, the kernel does not start and the machine reboots after a couple of seconds.

Additional info:
kernel 4.16.16-300.fc28 boots fine.

Comment 1 Michael Young 2018-06-19 22:47:37 UTC
There is a similar sounding bug reported on the xen-devel list at
https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg01304.html

Comment 2 nucleo 2018-06-20 16:19:36 UTC
I added nospec_store_bypass_disable to the kernel parameters, but the system still rebooted.

Comment 3 Mitchell Berger 2018-06-29 22:49:27 UTC
Whatever this issue is, much like the one reported in bug 1544963
back in 4.15.2-4.15.4, it appears to also be preventing me from booting
a domU paravm (at least under Xen 4.4.3) that boots fine with 4.16.16.
As with nucleo, nospec_store_bypass_disable didn't solve the problem.
I also tried applying the patch at the end of the thread Michael linked
to and rebuilding the kernel packages, with no apparent effect.

I haven't found any subject lines in the current month's archive of
xen-devel that sounded overly promising, but the number of threads
there is so massive that it's difficult to tell.  I did look at some
of the patches that have been sent off to Linux for Xen in 4.18, and
based on one of those, tried booting the domU with pci=nomsi and with
pci=off, and neither of those helped, either.

Comment 4 Michael Young 2018-07-01 16:13:26 UTC
I got the backtrace below with a Dom0 boot of xen-4.10.1-5.fc28.x86_64 running kernel-4.17.2-200.fc28.x86_64 (I have also posted it to the xen-devel list).
For the RIP shown in the trace, addr2line -f -e vmlinux ffffffff81062330 gives
native_irq_disable
/usr/src/debug/kernel-4.17.fc28/linux-4.17.2-200.fc28.x86_64/./arch/x86/include/asm/irqflags.h:44
The equivalent boot with kernel-4.16.16-300.fc28.x86_64 works.

(XEN) d0v0 Unhandled general protection fault fault/trap [#13, ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08035557c x86_64/entry.S#create_bounce_frame+0x135/0x159
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.10.1  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff81062330>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: 0000000000000246   rbx: 00000000ffffffff   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 00000000ffffffff   rdi: 0000000000000000
(XEN) rbp: 0000000000000000   rsp: ffffffff82203d90   r8:  ffffffff820bb698
(XEN) r9:  ffffffff82203e38   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffffffff820bb698   r14: ffffffff82203e38
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000000006e0
(XEN) cr3: 000000001aacf000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: ffffffff82731000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff82203d90:
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff81062330
(XEN)    000000010000e030 0000000000010046 ffffffff82203dd8 000000000000e02b
(XEN)    0000000000000246 ffffffff8110e019 0000000000000000 0000000000000246
(XEN)    0000000000000000 0000000000000000 ffffffff820a6cd8 ffffffff82203e88
(XEN)    ffffffff82739000 8000000000000061 0000000000000000 0000000000000000
(XEN)    ffffffff8110ecb6 0000000000000008 ffffffff82203e98 ffffffff82203e58
(XEN)    0000000000000000 0000000000000000 8000000000000161 0000000000000100
(XEN)    fffffffffffffeff 0000000000000000 0000000000000000 ffffffff82203ef0
(XEN)    ffffffff810ac990 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 8000000000000161 0000000000000100
(XEN)    fffffffffffffeff 0000000000000000 0000000000000000 0000000002739000
(XEN)    0000000000000080 ffffffff8275db62 000000000001a739 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff81037c80
(XEN)    007fffff8275efe7 ffffffff82739000 ffffffff81037f18 ffffffff8102aaf0
(XEN)    ffffffff8275dc8c 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0f00000060c0c748 ccccccccccccc305
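A minimal sketch of how the key fields of the dump above can be pulled out for triage. The helper name parse_xen_crash is hypothetical and the regular expressions assume exactly the line formats seen in this log. The low two bits of the cs selector give the privilege level; e033 decodes to ring 3, which is where a 64-bit PV guest kernel runs, and native_irq_disable executes a cli instruction, which would be consistent with the general protection fault (#13) reported here.

```python
import re

# Three lines copied from the crash dump in this comment.
LOG = """\
(XEN) d0v0 Unhandled general protection fault fault/trap [#13, ec=0000]
(XEN) RIP:    e033:[<ffffffff81062330>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest (d0v0)
"""


def parse_xen_crash(log):
    """Extract trap vector, code selector ring and RIP (hypothetical helper)."""
    vector = int(re.search(r"\[#(\d+), ec=[0-9a-f]+\]", log).group(1))
    cs, rip = re.search(r"RIP:\s+([0-9a-f]{4}):\[<([0-9a-f]{16})>\]", log).groups()
    return {
        "vector": vector,            # 13 is #GP on x86
        "ring": int(cs, 16) & 0x3,   # requested privilege level of cs
        "rip": int(rip, 16),
    }


info = parse_xen_crash(LOG)
print(f"vector #{info['vector']}, ring {info['ring']}, rip {info['rip']:#x}")
# The address can then be resolved against the matching debuginfo vmlinux:
print("addr2line -f -e vmlinux", format(info["rip"], "x"))
```

The last line only prints the addr2line command used in this comment; running it needs the vmlinux from the kernel debuginfo package for the exact kernel build.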

Comment 5 Michael Young 2018-07-01 22:18:03 UTC
I think the commit that triggered the crash is
x86/mm: Do not auto-massage page protections
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.17.y&id=fb43d6cb91ef57d9e58d5f69b423784ff4a4c374
which adds a WARN_ONCE call that is triggered before Xen is ready to handle it; see https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg00014.html for details.

Comment 6 Mitchell Berger 2018-07-02 06:48:41 UTC
You're correct.  I've rebuilt 4.17.3-200.fc28.x86_64 with a patch that
changes only the check_pgprot() call in pfn_pte() back to massage_pgprot().
This rebuilt kernel boots as a PV domU.

I do realize, however, that this is only useful for verification of what's
causing the issue, and is not the correct way to fix the underlying
problem.  I also see that Juergen posted to xen-devel about ten minutes
ago stating the two things he believes need to be done to actually
correct the problem and that he'll have patches for those soon.
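The difference between massage_pgprot() and check_pgprot() that this test relies on can be shown with a simplified user-space model. This is a sketch under stated assumptions, not the kernel source: the function names mirror the kernel's, but the bodies are a paraphrase, the warn callback stands in for WARN_ONCE, and the supported mask is assumed to exclude the global bit for a Xen PV guest. The input value is taken from the guest stack dump in Comment 4, where 8000000000000161, 0000000000000100 and fffffffffffffeff appear together; reading those as the arguments of the warning is an inference, not something stated in the thread.

```python
# x86 page-table bit positions used in this model.
PAGE_PRESENT = 1 << 0
PAGE_GLOBAL = 1 << 8
MASK64 = (1 << 64) - 1


def massage_pgprot(protval, supported_mask):
    """Silently drop unsupported bits from a present entry."""
    if protval & PAGE_PRESENT:
        protval &= supported_mask
    return protval


def check_pgprot(protval, supported_mask, warn):
    """Return the same value as massage_pgprot, but report dropped bits.

    `warn` is a stand-in for WARN_ONCE (assumption for this sketch).
    """
    massaged = massage_pgprot(protval, supported_mask)
    if massaged != protval:
        warn(
            f"attempted to set unsupported pgprot: {protval:016x} "
            f"bits: {protval ^ massaged:016x} supported: {supported_mask:016x}"
        )
    return massaged


# Assumption: the global bit is not in the supported mask under Xen PV.
supported = MASK64 & ~PAGE_GLOBAL
requested = 0x8000000000000161  # value seen on the guest stack in Comment 4

warnings = []
result = check_pgprot(requested, supported, warnings.append)
print(f"{result:016x}")
for message in warnings:
    print(message)
```

Both functions return the same page protection value; only check_pgprot() takes the warning path. In this model, that is why swapping the call back avoids the early crash without addressing why an unsupported bit was requested in the first place.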

Comment 7 Michael Young 2018-07-02 09:35:35 UTC
There is a scratch build of 4.17.3-200.fc28.x86_64 with Juergen's two proposed patches applied at https://koji.fedoraproject.org/koji/taskinfo?taskID=27981555 if anyone wants to test it (it works for me on a DomU; I can't test Dom0 until later).

Comment 8 Mitchell Berger 2018-07-02 11:12:45 UTC
I've tested your 4.17.3-200.2.fc28.x86_64 as a dom0 and as a domU.
Success on both counts!

Comment 9 Mitchell Berger 2018-07-03 21:51:42 UTC
I see that the Xen folks sent the patches on to LKML, and I also see that
4.17.4 was committed to Fedora's f28 branch, built, and on its way to
testing.  I don't see this fix in the 4.17.4 changelog upstream, so that
kernel will still be broken for all of us using Xen.  It would be great
if Juergen's patches could be applied and a kernel containing them pushed
out instead.

Comment 10 Jeremy Cline 2018-07-12 17:22:44 UTC
Hi folks,

Just an update on this: I've picked up one of the patches (the one that avoids the WARN_ON), as it was applied to the xen tree this morning. It should be in the v4.17.7 build.

The second one needed some adjustments and v2 was submitted today. It fixes the underlying problem, but the one I picked up should avoid triggering that problem so it should address the issue here.

Comment 11 Michael Young 2018-07-15 11:44:11 UTC
It looks like both patches are going to be in upstream 4.17.7.

Comment 12 nucleo 2018-07-17 18:11:14 UTC
'Fedora, with Xen hypervisor' started with kernel-4.17.7-200.fc28.x86_64.
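A minimal sketch for checking whether a given kernel release string falls in the range this bug covers. The function names are hypothetical, the upper bound comes from the "Fixed In Version" field, and the lower bound is an assumption based on 4.16.16 booting fine and the triggering commit being in the 4.17 series.

```python
import re

FIRST_BAD = (4, 17, 0)    # assumption: the regression arrived with 4.17
FIRST_FIXED = (4, 17, 7)  # from "Fixed In Version" in this report


def kernel_version(release):
    """Parse 'kernel-4.17.2-200.fc28.x86_64' or '4.17.2-200...' into a tuple."""
    match = re.match(r"(?:kernel-)?(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def likely_affected(release):
    """True if the upstream version lies between the two bounds above."""
    return FIRST_BAD <= kernel_version(release) < FIRST_FIXED


for release in (
    "kernel-4.16.16-300.fc28.x86_64",
    "kernel-4.17.2-200.fc28.x86_64",
    "kernel-4.17.7-200.fc28.x86_64",
):
    print(release, "affected" if likely_affected(release) else "not affected")
```

The check looks only at the upstream version, so a patched scratch build such as the 4.17.3-200.2 kernel tested in Comment 8 would still be flagged even though it boots.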

Comment 13 Fedora Update System 2018-07-17 23:44:27 UTC
kernel-tools-4.17.7-200.fc28 kernel-4.17.7-200.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-898f23c2f3

Comment 14 Fedora Update System 2018-07-17 23:45:30 UTC
kernel-tools-4.17.7-100.fc27 kernel-4.17.7-100.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-946b5c7e52

Comment 15 Fedora Update System 2018-07-19 17:28:15 UTC
kernel-4.17.7-100.fc27, kernel-tools-4.17.7-100.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-946b5c7e52

Comment 16 Fedora Update System 2018-07-19 20:18:53 UTC
kernel-4.17.7-200.fc28, kernel-tools-4.17.7-200.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-898f23c2f3

Comment 17 Fedora Update System 2018-07-22 02:44:17 UTC
kernel-4.17.7-100.fc27, kernel-tools-4.17.7-100.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2018-07-22 03:03:09 UTC
kernel-4.17.7-200.fc28, kernel-tools-4.17.7-200.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

