Bug 480880

Summary: f10 x86_64 xen guests fail to boot on f8 host (NX issue)
Product: [Fedora] Fedora
Reporter: Naoki <naoki>
Component: xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 10
CC: berrange, jon.swanson, kraxel, markmc, mathieu-acct, virt-maint, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
URL: http://forums.fedoraforum.org/showthread.php?t=210763
Whiteboard:
Doc Type: Bug Fix
Last Closed: 2009-04-03 07:30:14 EDT
Bug Depends On:    
Bug Blocks: 480594    

Description Naoki 2009-01-20 21:59:31 EST
Description of problem:
f10 x86_64 xen guests fail to boot on f8 hosts which lack the NX cpu flag.

For logs, xen dmesg, and xenctx output in an easier-to-read format, see:
http://forums.fedoraforum.org/showthread.php?t=210763


Version-Release number of selected component (if applicable):
Host :
Linux xen.test 2.6.21.7-5.fc8xen #1 SMP Thu Aug 7 12:44:22 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Virsh version:
Compiled against library: libvir 0.4.4
Using library: libvir 0.4.4
Using API: Xen 3.0.1
Running hypervisor: Xen 3.1.0

Guest Kernel:
2.6.27.5-117.fc10.x86_64 

How reproducible:
Always


Steps to Reproduce:
1. Install f8 on a machine without NX support
2. Update f8 to newest version to avoid python-virtinst bug
3. Try to boot/install an f10 x86_64 guest
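
To confirm step 1 before trying to reproduce, one option is to look for the "nx" entry in the flags line of /proc/cpuinfo on the host. The following is only a minimal sketch: the helper names flags_line_has and cpuinfo_has_nx are hypothetical, and it assumes the usual x86 Linux /proc/cpuinfo layout. Under a Xen dom0 kernel the flags shown may not exactly match what the hypervisor itself uses, so treat the result as a hint rather than proof.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears as a whole
 * whitespace-delimited word in `line`, 0 otherwise. */
static int flags_line_has(const char *line, const char *flag)
{
    assert(line != NULL && flag != NULL);

    size_t n = strlen(flag);
    if (n == 0)
        return 0;

    const char *p = line;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at a word boundary... */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' ||
                     p[-1] == ':';
        /* ...and end at one, so "nx" does not match inside "nxt". */
        char end = p[n];
        int ends = (end == '\0' || end == ' ' || end == '\t' ||
                    end == '\n');
        if (starts && ends)
            return 1;
        p += 1;
    }
    return 0;
}

/* Hypothetical helper: scan a cpuinfo-style file for a "flags" line
 * that lists "nx". Returns 1 if found, 0 if not, -1 if the file
 * cannot be opened. Assumes a flags line fits in the buffer. */
static int cpuinfo_has_nx(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0 &&
            flags_line_has(line, "nx")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling cpuinfo_has_nx("/proc/cpuinfo") from main() and getting 0 would suggest the host is a candidate for reproducing this bug; a BIOS setting can also hide the flag on hardware that supports it.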
  
Actual results:
Guest domain crashes immediately.


Expected results:
Guest domain should boot normally.


Additional info:
Mark on the fedora-virt list was able to boil it down to the following:
Here's the important bits:

  1) Host kernel is 2.6.21.7-5.fc8xen, that means the hypervisor is 
     xen-3.1.4

  2) The guest kernel is 2.6.27.5-117.fc10.x86_64

  3) Phill points out the faulting instruction is UD2. That just means 
     the guest kernel is hitting a BUG() assertion. See /asm-x86/bug.h:

       #define BUG()                                                   \
       do {                                                            \
               asm volatile("ud2");                                    \
               for (;;) ;                                              \
       } while (0)

  4) The backtrace shows the fault happens in set_page_prot()

  5) Jon's dmesg contains:

       (XEN) mm.c:1362:d46 Bad L1 flags 800000

That means the guest is faulting here:

static void set_page_prot(void *addr, pgprot_t prot) { ....
        if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
                BUG();
}

because the PTE update is failing in the HV here:

static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, 
                        unsigned long gl1mfn) { ...
        if ( unlikely(l1e_get_flags(nl1e) & L1_DISALLOW_MASK) )
        {
            MEM_LOG("Bad L1 flags %x",
                    l1e_get_flags(nl1e) & L1_DISALLOW_MASK);
            return 0;
        }
...
}

the PTE flags are 800000 which corresponds to:

#define _PAGE_NX_BIT (1U<<23)
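
The arithmetic behind that last step can be sketched as below. This is a simplified model, not the hypervisor's code: PAGE_NX_BIT mirrors the define quoted above, while disallow_mask, pte_flags and bad_l1_flags are hypothetical names. Two things are assumptions here: that the top 12 bits of the 64-bit PTE are folded down into flag bits 12..23 (which would put the hardware NX bit, bit 63, at bit 23), and that the NX flag is disallowed only when the host has no usable NX.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the define quoted in the report: 1U << 23 == 0x800000. */
#define PAGE_NX_BIT (1U << 23)

/* Hypothetical, simplified mask: in this model the NX flag is the
 * only disallowed bit, and only when the host lacks NX. */
static unsigned int disallow_mask(int host_has_nx)
{
    assert(host_has_nx == 0 || host_has_nx == 1);
    return host_has_nx ? 0U : PAGE_NX_BIT;
}

/* Assumption: fold PTE bits 63..52 into flag bits 23..12 and keep
 * the low 12 bits as they are. Bit 63 (NX) lands on bit 23. */
static unsigned int pte_flags(uint64_t pte)
{
    return (unsigned int)((pte >> 40) & 0xFFF000U) |
           (unsigned int)(pte & 0xFFFU);
}

/* Returns the offending flag bits, or 0 if this model would accept
 * the update. A non-zero result corresponds to "Bad L1 flags". */
static unsigned int bad_l1_flags(uint64_t pte, int host_has_nx)
{
    return pte_flags(pte) & disallow_mask(host_has_nx);
}
```

With a present, NX-marked entry such as (1ULL << 63) | 0x1, bad_l1_flags(..., 0) evaluates to 0x800000, which matches the value in Jon's dmesg, while bad_l1_flags(..., 1) evaluates to 0. That is consistent with the guest only hitting BUG() on hosts where NX is missing or disabled.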

Other Links:
http://www.redhat.com/archives/fedora-xen/2009-January/thread.html#00022
http://www.redhat.com/archives/fedora-virt/2009-January/thread.html#00013
http://www.redhat.com/archives/fedora-virt/2009-January/msg00014.html

Comment 1 Mark McLoughlin 2009-02-13 05:05:21 EST
Phill points out that he can enable NX in his BIOS with "Enable Execute Disable":

  http://www.redhat.com/archives/fedora-virt/2009-February/msg00065.html

Ian Campbell posted a patch upstream:

  http://lkml.org/lkml/2009/1/30/238

But I don't think the patch has been merged yet.

Comment 2 Mark McLoughlin 2009-04-03 07:30:14 EDT
Okay, bug #492523 is the same bug and has a bunch more info.

*** This bug has been marked as a duplicate of bug 492523 ***