I've installed the Xen hypervisor packages on Fedora 22 but I'm getting a panic early during the boot process:

    (XEN) Xen call trace:
    (XEN)    [<ffff82d08011d160>] free_domheap_pages+0x240/0x430
    (XEN)    [<ffff82d08018c944>] mmio_ro_do_page_fault+0x114/0x160
    (XEN)    [<ffff82d0801a4c10>] do_page_fault+0x1a0/0x4f0
    (XEN)    [<ffff82d080239768>] handle_exception_saved+0x2e/0x6c
    (XEN)
    (XEN)
    (XEN) ****************************************
    (XEN) Panic on CPU 0:
    (XEN) Xen BUG at page_alloc.c:1738
    (XEN) ****************************************

Full output: https://gist.github.com/major/baa0e2eee7de51a2bcd1

Packages in use:
* kernel-4.0.1-300.fc22.x86_64
* xen-4.5.0-8.fc22.x86_64

I'm able to reproduce the failure on Dell/HP physical servers as well as within a KVM virtual machine (with nested virt enabled). I can't tell if this is a bug in the Linux kernel or within Xen. I'll be glad to reclassify the component in the bug if someone knows this better than I do.
FWIW, the error is identical with kernel-4.0.0-0.rc5.git4.1.fc22.x86_64.
The output is from Xen, so we'll start there.
The same error appears when using these kernels as well:
* kernel-3.19.5-200.fc21.x86_64
* kernel-3.18.8-201.fc21.x86_64
* kernel-3.17.8-300.fc21.x86_64
The crash occurs at the line BUG_ON((pg[i].u.inuse.type_info & PGT_count_mask) != 0); in xen/common/page_alloc.c.
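To make the failing check concrete, here is a minimal user-space sketch of the condition that BUG_ON guards. It is not Xen code: struct demo_page, DEMO_COUNT_MASK (with an arbitrary value) and demo_pages_safe_to_free() are hypothetical stand-ins, and the loop over 2^order pages is an assumption suggested by the pg[i] indexing in the quoted line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PGT_count_mask; the value is arbitrary. */
#define DEMO_COUNT_MASK 0xffffUL

/* Hypothetical stand-in for the per-page bookkeeping structure. */
struct demo_page {
    unsigned long type_info;
};

/*
 * Returns true when no page in the run of 2^order pages still has a
 * non-zero type count, i.e. when the BUG_ON condition would not fire.
 */
static bool demo_pages_safe_to_free(const struct demo_page *pg,
                                    unsigned int order)
{
    for (size_t i = 0; i < ((size_t)1 << order); i++) {
        if ((pg[i].type_info & DEMO_COUNT_MASK) != 0)
            return false; /* the real code would hit BUG_ON here */
    }
    return true;
}
```

In this model, a page whose count bits are still set when it is handed back to the allocator is exactly the situation the panic reports.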
Jan suggested on xen-devel that gcc 5.0.1 might be to blame [1]. Is Xen 4.5 working for anyone else on Fedora 22's latest package/kernel set?

[1] http://lists.xen.org/archives/html/xen-devel/2015-05/msg02604.html
Yes, it looks like gcc (or something else in the build chain) is at fault. My newly updated F22 system won't boot under Xen (4.5.0-8 or 4.5.1-rc1), but it does boot with the 4.5.1-rc1 xen.gz file built on F21.
From the thread http://marc.info/?l=xen-devel&m=143292326301633&w=2 on the xen-devel list GCC 5 is indeed miscompiling the code. Comparing the fc21 vs fc22 builds:

The C snippet from mmio_ro_do_page_fault():

    struct page_info *page = mfn_to_page(mfn);
    struct domain *owner = page_get_owner_and_reference(page);
    if ( owner )
        put_page(page);

In fc21 is:

    movabs $0xffff82e000000000,%rbp
    shr    %cl,%rax
    or     %rdx,%rax
    shl    $0x5,%rax
    add    %rax,%rbp
    mov    %rbp,%rdi
    callq  ffff82d080186900 <page_get_owner_and_reference>
    test   %rax,%rax
    mov    %rax,%r12
    je     ffff82d080189c4e <mmio_ro_do_page_fault+0x11e>
    mov    %rbp,%rdi
    callq  ffff82d080188ec0 <put_page>

and in fc22 is:

    movabs $0xffff82e000000000,%r8
    shr    %cl,%rax
    or     %rdx,%rax
    shl    $0x5,%rax
    lea    (%r8,%rax,1),%rdi
    callq  ffff82d0801874f0 <page_get_owner_and_reference>
    test   %rax,%rax
    mov    %rax,%rbp
    je     ffff82d08018ca14 <mmio_ro_do_page_fault+0x114>
    mov    %r8,%rdi
    callq  ffff82d080189a90 <put_page>

"lea (%r8,%rax,1),%rdi" in FC22 is slightly shorter than "add %rax,%rbp; mov %rbp,%rdi" in FC21. In both cases %rdi is now 'page' from the C snippet.

In FC21, the result is stored in %rbp, then reloaded from %rbp into %rdi for call to put_page().

However, in FC22, the result of the calculation is only held in %rdi, and clobbered by the call to page_get_owner_and_reference(). When it comes to call put_page(), %r8 is reloaded, which is still a pointer to the base of the frametable, not the page we actually took a reference on.

FC22 is miscompiling the C to:

    struct page_info *page = mfn_to_page(mfn);
    struct domain *owner = page_get_owner_and_reference(page);
    if ( owner )
        put_page(mfn_to_page(0));

which is wrong, and why free_domheap_pages() does legitimately complain about the wonky refcount.

Further testing links this to the -fcaller-saves option as if the file is built with -fno-caller-saves on F22 then the code snippet goes back to the F21 version. Possibly the mov %r8,%rdi line is incorrect.
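As a rough illustration of why the wrong put_page() argument corrupts the reference counts, here is a small user-space C model of the two behaviours described above. It is only a sketch: every demo_* name, the eight-entry frametable and the plain int refcount are hypothetical simplifications, not Xen's real types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct page_info. */
struct demo_page {
    int refcount;
    int has_owner;
};

/* Hypothetical stand-in for the frametable; entry 0 is its base. */
static struct demo_page demo_frametable[8];

static struct demo_page *demo_mfn_to_page(size_t mfn)
{
    return &demo_frametable[mfn];
}

/* Takes a reference if the page has an owner; returns non-zero on success. */
static int demo_get_owner_and_reference(struct demo_page *pg)
{
    if (!pg->has_owner)
        return 0;
    pg->refcount++;
    return 1;
}

static void demo_put_page(struct demo_page *pg)
{
    pg->refcount--;
}

/* What the C source asks for: drop the reference on the same page. */
static void demo_intended(size_t mfn)
{
    struct demo_page *page = demo_mfn_to_page(mfn);
    if (demo_get_owner_and_reference(page))
        demo_put_page(page);
}

/* What the fc22 binary effectively does: drop it on frametable entry 0. */
static void demo_miscompiled(size_t mfn)
{
    struct demo_page *page = demo_mfn_to_page(mfn);
    if (demo_get_owner_and_reference(page))
        demo_put_page(demo_mfn_to_page(0));
}
```

In this model, calling demo_intended() on an owned page leaves its count unchanged, whereas demo_miscompiled() leaks a reference on the target page and takes one away from entry 0, which is the kind of imbalance a later free would trip over.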
Please attach preprocessed source in which this happens and provide full gcc command line used to compile this file.
Created attachment 1035629: preprocessed source

The full compile line (with some duplications removed) is:

    gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fomit-frame-pointer -fno-strict-aliasing -std=gnu99 -Wstrict-prototypes -Wdeclaration-after-statement -Wno-unused-but-set-variable -Wno-unused-local-typedefs -DNDEBUG -I/home/michael/rpmbuild/BUILD/xen-4.5.0/xen/include -I/home/michael/rpmbuild/BUILD/xen-4.5.0/xen/include/asm-x86/mach-generic -I/home/michael/rpmbuild/BUILD/xen-4.5.0/xen/include/asm-x86/mach-default -msoft-float -fno-stack-protector -fno-exceptions -Wnested-externs -DHAVE_GAS_VMX -DHAVE_GAS_EPT -DHAVE_GAS_FSGSBASE -mno-red-zone -mno-sse -fpic -fno-asynchronous-unwind-tables -DGCC_HAS_VISIBILITY_ATTRIBUTE -fno-builtin -fno-common -Werror -Wredundant-decls -Wno-pointer-arith -pipe -D__XEN__ -include /home/michael/rpmbuild/BUILD/xen-4.5.0/xen/include/xen/config.h -nostdinc -DXSM_ENABLE -DFLASK_ENABLE -DHAS_ACPI -DHAS_GDBSX -DHAS_PASSTHROUGH -DHAS_MEM_ACCESS -DHAS_MEM_PAGING -DHAS_MEM_SHARING -DHAS_PCI -DHAS_IOPORTS -DHAS_PDX -MMD -MF .xen.d -MF .built_in.o.d -MF .mm.o.d -c mm.c -o mm.o
Thanks, filed upstream: PR66444.
It looks like the patch made it into upstream GCC if I am reading this ticket correctly: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66444#c12
Then it is already in the gcc-5.1.1-3.fc22 errata.