Bug 531311 - 2.6.32 boot fails as Xen PV guest with stackprotector
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
12
All Linux
high Severity medium
: ---
: ---
Assigned To: Justin M. Forbes
Fedora Extras Quality Assurance
: Triaged
Depends On:
Blocks: F13VirtBlocker 531313
Reported: 2009-10-27 13:26 EDT by Andrew Jones
Modified: 2010-02-11 11:01 EST
CC: 9 users

Doc Type: Bug Fix
Clone: 531313
Last Closed: 2010-02-11 11:01:47 EST


Attachments:
Set up mmu_ops before setting up gdt (1.12 KB, patch)
2009-10-29 19:15 EDT, Jeremy Fitzhardinge
Description Andrew Jones 2009-10-27 13:26:27 EDT
Booting upstream 2.6.32-rc5 and latest f13 kernels as Xen PV guests fails. The failure has something to do with the amount of memory allocated. No output goes to the console, but xm dmesg has some clues.

Steps and output from xm dmesg:

Reboot the host to make sure the hypervisor is in a "fresh" state.

Try booting the 2.6.32-rc5 kernel with only 128 MB of RAM allocated. xm dmesg shows the following:

(XEN) mm.c:649:d2 Error getting mfn ba5 (pfn 5555555555555555) from L1 entry 0000000000ba5061 for dom2
(XEN) traps.c:405:d2 Unhandled invalid opcode fault/trap [#6] in domain 2 on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 2 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.1.2  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e019:[<00000000c0b17aef>]
(XEN) RFLAGS: 0000000000000282   CONTEXT: guest
(XEN) rax: 00000000ffffffea   rbx: 00000000c0ba5000   rcx: 0000000000ba5061
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 00000000c0ab8638
(XEN) rbp: 00000000c0a67fbc   rsp: 00000000c0a67f74   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 00000001326a1000   cr2: 0000000000000000
(XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
(XEN) Guest stack trace from esp=c0a67f74:
(XEN)   c0b17aef 0001e019 00010082 00000000 00000000 00000000 00000000 00000000
(XEN)   00000ba5 001328f7 00000100 c0a67fc6 c0a67f80 c0a67f80 c0ba5000 00000010
(XEN)   c4862000 c0ab8638 c0a67fcc c0414b32 00ff8638 c0ba5000 c0a67ffc c0b177f6
(XEN)   dfc00018 c04090ce 00000000 00000000 00000000 00000000 00000000 00000000
(XEN)   c4862000 00000000 00000000

This is from the BUG() in xen_load_gdt_boot, triggered because HYPERVISOR_update_va_mapping fails when it cannot read the per_cpu__gdt_page.

Trying with 256 MB gives the same results. Trying with 384 MB boots, but there is still a complaint about reading the per_cpu__gdt_page.

(XEN) mm.c:649:d4 Error getting mfn 100ba5 (pfn 3b1a5) from L1 entry 0000000100ba5061 for dom4
(XEN) mm.c:3341:d4 ptwr_emulate: fixing up invalid PAE PTE 0000000100ba5061

Trying with 512 MB gives the same results. However, trying 384 MB again after booting with 512 MB fails: the guest hits the same BUG() as with 128 and 256 MB. Likewise, after booting with 1024 MB you can no longer boot with 512 MB. Booting with less memory works again only after rebooting the host (i.e. restarting the hypervisor).
Comment 1 Andrew Jones 2009-10-27 13:33:07 EDT
Adding Paolo to CC since he's currently working on bisecting this from the last bootable rev (upstream stable 2.6.31.5).
Comment 2 Paolo Bonzini 2009-10-27 14:49:07 EDT
I have this (partial) result so far:

bad 78f28b7
good 3240a77
Comment 3 Andrew Jones 2009-10-29 10:53:23 EDT
I played with this a bit and found that if I turn off CONFIG_CC_STACKPROTECTOR and also completely remove the xen_setup_stackprotector() call from xen_start_kernel(), then I can boot with as little as 135 MB, jump up to 1024 MB, and then go back down to whatever allocation I want. Less than 135 MB panics due to being out of memory. In other words, removing the stack protector seems to "fix" this problem. We need to investigate how to get stackprotector working for Xen PV guests.
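For reference, the CONFIG_CC_STACKPROTECTOR half of this workaround corresponds to the following .config fragment (standard Kconfig syntax for a disabled option; the source-level removal of the xen_setup_stackprotector() call is a separate edit):

```
# CONFIG_CC_STACKPROTECTOR is not set
```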
Comment 4 Jeremy Fitzhardinge 2009-10-29 19:15:13 EDT
Created attachment 366730 [details]
Set up mmu_ops before setting up gdt

This should fix it.
Comment 5 Paolo Bonzini 2009-10-30 04:46:55 EDT
Makes a lot of sense, considering that the two patches adding xen_init_mmu_ops and xen_setup_stackprotector were very close in time:

- 6b18ae3 (x86: Move memory_setup to x86_init_ops, 2009-08-20)

- 577eebe (xen: make -fstack-protector work under Xen, 2009-08-27)

and they were conflicting.  They were merged with 577eebe first and 6b18ae3 second; your patch is "simply" ordering them the other way.
Comment 6 Paolo Bonzini 2009-10-30 04:51:00 EDT
Committed upstream as 973df35
Comment 7 Andrew Jones 2009-10-30 05:05:36 EDT
I tested latest upstream (v2.6.32-rc5-338-g2e2ec95) and it's good to go. Reassigning to Justin for Fedora integ/test.
Comment 8 Bug Zapper 2009-11-16 09:26:02 EST
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
