Red Hat Bugzilla – Bug 492523
xen pvops kernels won't boot on processors without the NX bit
Last modified: 2009-04-22 06:21:57 EDT
Description of problem:
This is going to be kind of a hand-wavy bug report, but I wanted a place to at least jot this down, along with some of the data I've gathered so far. It seems that certain people are having a problem where they cannot install Fedora 9, 10, or 11 PV x86_64 guests on a RHEL-5.3 x86_64 dom0. Now, this situation works just fine for me, but in the case I'm currently looking at, it crashes extremely early in boot, before any messages are ever printed to the virtual console. A pre-pvops kernel, say RHEL-5.3 or Fedora 8, boots just fine.
At the moment, I don't really have any good ideas. This clearly works in some circumstances, but isn't working on this particular box. I'll attach snippets of xend.log, domain-builder-ng.log, and xm dmesg output. I'm still waiting on serial console access to the box, to get a little bit of additional information that way.
Created attachment 336975 [details]
The /etc/xen configuration file
This is the /etc/xen configuration file I'm trying to boot. The "vmlinuz-f10" and "initrd.gz-f10" files are known good from booting on another machine.
Created attachment 336976 [details]
Snippet from xend.log when trying to create this domain
Created attachment 336978 [details]
Snippet from domain-builder-ng.log when trying to create this domain
Created attachment 336979 [details]
Snippet from xm dmesg when trying to create this domain
I've also used the xenctx command to get a little more information about where the domain is dying. It's extremely early on in the boot process:
[clalance@host ~]$ sudo /usr/lib64/xen/bin/xenctx -s ./System.map-f10 53
rip: ffffffff8100b8a2 set_page_prot+0x6d
rax: ffffffea rbx: 000016e5 rcx: 00000055 rdx: 00000000
rsi: 800000017c0fe061 rdi: ffffffff816e5000 rbp: ffffffff81575f68
r8: 0000000f r9: ffffffff817ef350 r10: ffffffff817ef550 r11: 00000010
r12: ffffffff816e5000 r13: 800000017c0fe061 r14: 8000000000000161 r15: 00002c00
cs: 0000e033 ds: 00000000 fs: 00000000 gs: 00000000
0000000000000055 0000000000000010 ffffffff8100b8a2 000000010000e030
0000000000010082 ffffffff81575f48 000000000000e02b ffffffff8100b89e
0000000000000200 ffffffff816e8000 0000000000000800 0000000000000016
ffffffff81575ff8 ffffffff815a5c60 0000000000002c00 0000000000000000
6f 60 1d 00 4c 89 e7 4c 89 ee 31 d2 e8 22 d9 ff ff 85 c0 74 04 <0f> 0b eb fe 5b 41 5c 41 5d 41 5e
[<ffffffff8100b8a2>] set_page_prot+0x6d <--
This thread on xen-devel also describes exactly the same problem:
I'm starting to get the feeling that this has to do with the NX bit (or, more specifically, the lack thereof). The particular machine I'm trying this out on has no NX bit; the machine that I know works does have the NX bit. Looking at the (RHEL-5) hypervisor code, here's what I'm seeing:
The failure is in do_update_va_mapping(), which, in turn, calls mod_l1_entry(), which is responsible for printing the "Bad L1 flags" message in the xm dmesg output. Now, it does this iff any of the flags set on the PTE are also set in the L1_DISALLOW_MASK. L1_DISALLOW_MASK essentially is (0xFF800180U & ~_PAGE_NX) | _PAGE_GNTTAB. What's important, though, is that _PAGE_NX is based on whether the machine has NX or not:
/* Bit 23 of a 24-bit flag mask. This corresponds to bit 63 of a pte.*/
#define _PAGE_NX_BIT (1U<<23)
#define _PAGE_NX (cpu_has_nx ? _PAGE_NX_BIT : 0U)
So, if the CPU *does* have NX, then _PAGE_NX is 0x00800000, that bit is masked out of the above, and the L1_DISALLOW_MASK is:
However, if the CPU *doesn't* have NX, then _PAGE_NX is 0, nothing is masked out of the above, and the L1_DISALLOW_MASK is:
So, given this, I took the above and plugged some numbers into it from a working machine. On a working machine, I get console output like:
addr=ffffffff816e5000 pfn=16e5 mfn=3d9d0b prot=8000000000000161 pte=80000003d9d0b061
If you take that last number, which contains the PTE flags, and run it through the math, when NX is on, then you get a 0 at the end; that is, there are no disallowed flags. However, if you take that same number, and run it through the math when NX is off, then you end up with 0x800000, which means the NX bit is on, but not allowed. My guess is that the PV-ops kernels are not properly taking the NX bit into account.
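To make that arithmetic concrete, here is a minimal C sketch of the check. It is not the hypervisor source: the helper names (pte_flags, l1_disallow_mask, disallowed_flags) are made up for illustration, the folding of the 64-bit PTE into a 24-bit flag word is assumed from the comment on _PAGE_NX_BIT (bit 63 of the PTE lands on bit 23 of the flags), and _PAGE_GNTTAB is assumed to be 0 since it doesn't touch the NX bit.

```c
#include <stdint.h>

/* Bit 23 of the 24-bit flag word; corresponds to bit 63 of a PTE. */
#define PAGE_NX_BIT      (1U << 23)

/* Base mask quoted above, before _PAGE_NX is masked out. */
#define L1_DISALLOW_BASE 0xFF800180U

/*
 * Fold a 64-bit PTE into a 24-bit flag word (assumed packing):
 * PTE bits 63..52 become flag bits 23..12, PTE bits 11..0 stay put.
 */
static uint32_t pte_flags(uint64_t pte)
{
    return ((uint32_t)(pte >> 40) & 0x00FFF000U) | ((uint32_t)pte & 0xFFFU);
}

/* (0xFF800180U & ~_PAGE_NX), with _PAGE_GNTTAB assumed to be 0. */
static uint32_t l1_disallow_mask(int cpu_has_nx)
{
    uint32_t page_nx = cpu_has_nx ? PAGE_NX_BIT : 0U;
    return L1_DISALLOW_BASE & ~page_nx;
}

/* Non-zero result means the PTE carries flags the hypervisor rejects. */
static uint32_t disallowed_flags(uint64_t pte, int cpu_has_nx)
{
    return pte_flags(pte) & l1_disallow_mask(cpu_has_nx);
}
```

With the pte value from the working machine, disallowed_flags(0x80000003d9d0b061ULL, 1) comes out as 0, while disallowed_flags(0x80000003d9d0b061ULL, 0) comes out as 0x800000, which is the NX bit being rejected.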
And indeed, looking around for mailing list postings, that does seem to be the case; see here:
So, in the end, what this means at the moment is that no pvops kernel is going to boot on a machine without NX. Now, the patch above hasn't been accepted that I can see, so that's the first order of business. Once we have it upstream, then we can get it into F-11 before it ships, and also probably backport it to F-9 and F-10. However, even if we do that, you still won't be able to do an install of F-9 or F-10, because the installer won't have the updated kernel. I guess we are in a bad position no matter what. I'll at least work on seeing what the upstream status is.
Looks like this is going in; does it solve the problem:
*** Bug 480880 has been marked as a duplicate of this bug. ***
cebbert: AFAICS it would fix the problem, yep
Looks like this applies fine against 2.6.29, btw
Okay, it's in 2.6.30-rc2 now
Building in dist-f11-updates-candidate
* Sun Apr 19 2009 Mark McLoughlin <email@example.com> - 18.104.22.168-101
- Fix xen boot on machines without NX support (#492523)
Cool, thanks a lot. What do you think of backporting this to F-10 and F-9? Worth it, or not? We are still in the crappy situation where you won't be able to install those releases, but at least if people have images that they update from say F-8 -> F-9 -> F-10, or otherwise install, they will get a booting kernel.
That build failed, new build is:
(In reply to comment #14)
> Cool, thanks a lot. What do you think of backporting this to F-10 and
> F-9?
2.6.29 is in updates-testing, and I think Chuck will pull the patch in there. That's good enough, I reckon, unless someone specifically asks for it for F9.
kernel-22.214.171.124-101.fc11 now tagged in dist-f11
(In reply to comment #16)
> (In reply to comment #14)
> > Cool, thanks a lot. What do you think of backporting this to F-10 and
> > F-9?
> 2.6.29 is in updates-testing, and I think Chuck will pull the patch in there.
> That's good enough, I reckon, unless someone specifically asks for it for F9.
OK, sounds good to me. Thanks!