Description of problem:
'Fedora, with Xen hypervisor' doesn't boot with kernel-4.17.2-200.fc28.x86_64.

Version-Release number of selected component (if applicable):
kernel-4.17.2-200.fc28.x86_64

Steps to Reproduce:
Boot 'Fedora, with Xen hypervisor' with kernel-4.17.2-200.fc28.x86_64.

Actual results:
After /boot/xen-4.10.1.gz is loaded, the kernel does not start and the machine reboots after a couple of seconds.

Additional info:
kernel 4.16.16-300.fc28 boots fine.
There is a similar sounding bug reported on the xen-devel list at https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg01304.html
I added nospec_store_bypass_disable to the kernel parameters, but the system still rebooted.
Whatever this issue is, much like the one reported in bug 1544963 back in 4.15.2-4.15.4, it appears to also be preventing me from booting a domU paravm (at least under Xen 4.4.3) that boots fine with 4.16.16. As with nucleo, nospec_store_bypass_disable didn't solve the problem. I also tried applying the patch at the end of the thread Michael linked to and rebuilding the kernel packages, with no apparent effect. I haven't found any subject lines in the current month's archive of xen-devel that sounded overly promising, but the number of threads there is so massive that it's difficult to tell. I did look at some of the patches that have been sent off to Linux for Xen in 4.18, and based on one of those, tried booting the domU with pci=nomsi and with pci=off, and neither of those helped, either.
I got the backtrace below with a Dom0 boot of xen-4.10.1-5.fc28.x86_64 running kernel-4.17.2-200.fc28.x86_64 (which I have also posted to the xen-devel list), where

addr2line -f -e vmlinux ffffffff81062330

gives

native_irq_disable
/usr/src/debug/kernel-4.17.fc28/linux-4.17.2-200.fc28.x86_64/./arch/x86/include/asm/irqflags.h:44

The equivalent boot with kernel-4.16.16-300.fc28.x86_64 works.

(XEN) d0v0 Unhandled general protection fault fault/trap [#13, ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08035557c x86_64/entry.S#create_bounce_frame+0x135/0x159
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.10.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81062330>]
(XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax: 0000000000000246 rbx: 00000000ffffffff rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: 00000000ffffffff rdi: 0000000000000000
(XEN) rbp: 0000000000000000 rsp: ffffffff82203d90 r8: ffffffff820bb698
(XEN) r9: ffffffff82203e38 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: ffffffff820bb698 r14: ffffffff82203e38
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000006e0
(XEN) cr3: 000000001aacf000 cr2: 0000000000000000
(XEN) fsb: 0000000000000000 gsb: ffffffff82731000 gss: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff82203d90:
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffffffff81062330
(XEN) 000000010000e030 0000000000010046 ffffffff82203dd8 000000000000e02b
(XEN) 0000000000000246 ffffffff8110e019 0000000000000000 0000000000000246
(XEN) 0000000000000000 0000000000000000 ffffffff820a6cd8 ffffffff82203e88
(XEN) ffffffff82739000 8000000000000061 0000000000000000 0000000000000000
(XEN) ffffffff8110ecb6 0000000000000008 ffffffff82203e98 ffffffff82203e58
(XEN) 0000000000000000 0000000000000000 8000000000000161 0000000000000100
(XEN) fffffffffffffeff 0000000000000000 0000000000000000 ffffffff82203ef0
(XEN) ffffffff810ac990 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 8000000000000161 0000000000000100
(XEN) fffffffffffffeff 0000000000000000 0000000000000000 0000000002739000
(XEN) 0000000000000080 ffffffff8275db62 000000000001a739 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffffffff81037c80
(XEN) 007fffff8275efe7 ffffffff82739000 ffffffff81037f18 ffffffff8102aaf0
(XEN) ffffffff8275dc8c 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0f00000060c0c748 ccccccccccccc305
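For anyone who wants to repeat the address lookup on their own crash output, here is a minimal Python sketch that pulls the RIP value out of a Xen crash log and builds the same addr2line invocation used in this comment. The helper names (extract_rip, addr2line_command) and the regular expression are illustrative assumptions, not part of any Xen or binutils tooling, and the vmlinux path must point at a debuginfo image that matches the crashing kernel.

```python
import re

# Matches a line such as "(XEN) RIP:    e033:[<ffffffff81062330>]".
# The pattern is an assumption based on the log format shown in this report.
RIP_RE = re.compile(r"RIP:\s+[0-9a-fA-F]+:\[<([0-9a-fA-F]+)>\]")


def extract_rip(log_text):
    """Return the faulting instruction address as a hex string, or None."""
    match = RIP_RE.search(log_text)
    return match.group(1) if match else None


def addr2line_command(address, vmlinux="vmlinux"):
    """Build the argv list for: addr2line -f -e <vmlinux> <address>."""
    return ["addr2line", "-f", "-e", vmlinux, address]


sample = "(XEN) RIP:    e033:[<ffffffff81062330>]"
rip = extract_rip(sample)
print(" ".join(addr2line_command(rip)))
```

The resulting list can be passed to subprocess.run() on a machine that has binutils and the matching kernel debuginfo installed.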
I think the commit that triggered the crash is "x86/mm: Do not auto-massage page protections" (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.17.y&id=fb43d6cb91ef57d9e58d5f69b423784ff4a4c374). It adds a WARN_ONCE call that is reached before Xen is ready to handle it; see https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg00014.html for details.
You're correct. I've rebuilt 4.17.3-200.fc28.x86_64 with a patch that changes only the check_pgprot() call in pfn_pte() back to massage_pgprot(). This rebuilt kernel boots as a PV domU. I do realize, however, that this is only useful for verification of what's causing the issue, and is not the correct way to fix the underlying problem. I also see that Juergen posted to xen-devel about ten minutes ago stating the two things he believes need to be done to actually correct the problem and that he'll have patches for those soon.
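To illustrate why swapping check_pgprot() back to massage_pgprot() sidesteps the crash, here is a simplified Python model of the two helpers. It is a sketch of the idea only, not the kernel's code: the assumption is that massage_pgprot() silently masks unsupported bits out of a present mapping, while check_pgprot() does the same masking but also warns when anything was removed. The input values are copied from the stack dump earlier in this report; reading them as the requested protection value and the supported mask is also an assumption.

```python
PAGE_PRESENT = 0x001  # bit 0 of an x86 page table entry marks it present


def massage_pgprot(protval, supported_mask):
    """Silently drop unsupported bits from a present mapping (model)."""
    if protval & PAGE_PRESENT:
        protval &= supported_mask
    return protval


def check_pgprot(protval, supported_mask):
    """Mask like massage_pgprot(), and report whether a warning would fire.

    In this model the second element of the result stands in for the
    WARN_ONCE call discussed in this thread.
    """
    massaged = massage_pgprot(protval, supported_mask)
    would_warn = massaged != protval
    return massaged, would_warn


# Values as they appear in the guest stack trace (assumed interpretation).
requested = 0x8000000000000161
supported = 0xFFFFFFFFFFFFFEFF

massaged, would_warn = check_pgprot(requested, supported)
print(hex(massaged), would_warn, hex(requested ^ massaged))
```

Under these assumptions the only rejected bit is 0x100, and the masked result matches another value visible in the stack dump. Both helpers return the same page protection value; the difference is only the warning, which is consistent with the test kernel booting once the warning path is no longer taken.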
There is a scratch build of 4.17.3-200.fc28.x86_64 with Juergen's two proposed patches applied at https://koji.fedoraproject.org/koji/taskinfo?taskID=27981555 if anyone wants to test it (it works for me on a DomU; I can't test Dom0 until later).
I've tested your 4.17.3-200.2.fc28.x86_64 as a dom0 and as a domU. Success on both counts!
I see that the Xen folks sent the patches on to LKML, and I also see that 4.17.4 was committed to Fedora's f28 branch, built, and on its way to testing. I don't see this fix in the 4.17.4 changelog upstream, so that kernel will still be broken for all of us using Xen. It would be great if Juergen's patches could be applied and a kernel containing them pushed out instead.
Hi folks, just an update on this: I've picked up one of the patches (the one that avoids the WARN_ON), as it was applied to the xen tree this morning. It should be in the v4.17.7 build. The second one needed some adjustments, and v2 was submitted today. It fixes the underlying problem, but the one I picked up should avoid triggering that problem, so it should address the issue here.
It looks like both patches are going to be in upstream 4.17.7.
'Fedora, with Xen hypervisor' started with kernel-4.17.7-200.fc28.x86_64.
kernel-tools-4.17.7-200.fc28, kernel-4.17.7-200.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-898f23c2f3
kernel-tools-4.17.7-100.fc27, kernel-4.17.7-100.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-946b5c7e52
kernel-4.17.7-100.fc27, kernel-tools-4.17.7-100.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-946b5c7e52
kernel-4.17.7-200.fc28, kernel-tools-4.17.7-200.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-898f23c2f3
kernel-4.17.7-100.fc27, kernel-tools-4.17.7-100.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.17.7-200.fc28, kernel-tools-4.17.7-200.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.