Red Hat Bugzilla – Bug 531311
2.6.32 boot fails as Xen PV guest with stackprotector
Last modified: 2010-02-11 11:01:47 EST
Booting upstream 2.6.32-rc5 and the latest f13 kernels as Xen PV guests fails. The failure appears to depend on the amount of memory allocated to the guest. No output goes to the console, but xm dmesg has some clues.
Steps and output from xm dmesg:
Reboot the host to make sure the hypervisor is in a "fresh" state.
Try booting the 2.6.32-rc5 kernel with only 128 MB of RAM allocated. xm dmesg shows the following:
(XEN) mm.c:649:d2 Error getting mfn ba5 (pfn 5555555555555555) from L1 entry 0000000000ba5061 for dom2
(XEN) traps.c:405:d2 Unhandled invalid opcode fault/trap [#6] in domain 2 on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 2 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.1.2 x86_64 debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e019:[<00000000c0b17aef>]
(XEN) RFLAGS: 0000000000000282 CONTEXT: guest
(XEN) rax: 00000000ffffffea rbx: 00000000c0ba5000 rcx: 0000000000ba5061
(XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 00000000c0ab8638
(XEN) rbp: 00000000c0a67fbc rsp: 00000000c0a67f74 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 00000001326a1000 cr2: 0000000000000000
(XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
(XEN) Guest stack trace from esp=c0a67f74:
(XEN) c0b17aef 0001e019 00010082 00000000 00000000 00000000 00000000 00000000
(XEN) 00000ba5 001328f7 00000100 c0a67fc6 c0a67f80 c0a67f80 c0ba5000 00000010
(XEN) c4862000 c0ab8638 c0a67fcc c0414b32 00ff8638 c0ba5000 c0a67ffc c0b177f6
(XEN) dfc00018 c04090ce 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) c4862000 00000000 00000000
This is from the BUG() in xen_load_gdt_boot: the HYPERVISOR_update_va_mapping call fails because it cannot read the per_cpu__gdt_page.
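For reference, a simplified sketch of the failing path, roughly as it appears in arch/x86/xen/enlighten.c around 2.6.32 (an approximation, not the verbatim source):

/* Approximate sketch of xen_load_gdt_boot(): each page backing the boot GDT
 * must be remapped read-only before the hypervisor will accept it via
 * HYPERVISOR_set_gdt().  If the pfn<->mfn machinery is not in place yet, the
 * update_va_mapping hypercall fails and we hit the BUG() seen above. */
static __init void xen_load_gdt_boot(const struct desc_ptr *dtr)
{
	unsigned long va = dtr->address;
	unsigned int size = dtr->size + 1;
	unsigned long frames[16];	/* a GDT is at most 16 pages */
	int f;

	for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
		unsigned long pfn = virt_to_pfn(va);
		unsigned long mfn = pfn_to_mfn(pfn);
		pte_t pte = pfn_pte(pfn, PAGE_KERNEL_RO);

		if (HYPERVISOR_update_va_mapping(va, pte, 0))
			BUG();		/* <-- the crash reported here */

		frames[f] = mfn;
	}

	if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
		BUG();
}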
Trying with 256 MB gives the same results. Trying with 384 MB boots, but there is still a complaint about reading the per_cpu__gdt_page:
(XEN) mm.c:649:d4 Error getting mfn 100ba5 (pfn 3b1a5) from L1 entry 0000000100ba5061 for dom4
(XEN) mm.c:3341:d4 ptwr_emulate: fixing up invalid PAE PTE 0000000100ba5061
Trying with 512 MB gives the same results. However, trying 384 MB again after having booted with 512 MB, the guest fails to boot: we hit the same BUG() as with 128 and 256 MB. Likewise, if you boot with 1024 MB you can no longer boot with 512 MB. You can boot with less memory again only after rebooting the host (i.e. restarting the hypervisor).
Adding Paolo to CC since he's currently working on bisecting this from the last bootable rev (upstream stable 18.104.22.168).
I have this (partial) result so far:
I played with this a bit and found that if I turn off CONFIG_CC_STACKPROTECTOR and also completely remove the xen_setup_stackprotector() call from xen_start_kernel(), then I can boot with as little as 135 MB, jump up to 1024 MB, and go back down to whatever allocation I want. Less than 135 MB panics due to being out of memory. In other words, removing the stack protector seems to "fix" this problem. We need to investigate how to get stackprotector working for Xen PV guests.
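For context, the reason xen_setup_stackprotector() is involved: it temporarily installs boot-time GDT ops and loads a fresh GDT so the %gs-based stack canary can be used this early in boot. A rough sketch of the function as added by 577eebe (details approximate, not the verbatim source):

/* Rough sketch of xen_setup_stackprotector() (approximate).  Loading the
 * new GDT goes through xen_load_gdt_boot(), which is where the failing
 * HYPERVISOR_update_va_mapping call lives. */
static void __init xen_setup_stackprotector(void)
{
	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
	pv_cpu_ops.load_gdt = xen_load_gdt_boot;

	setup_stack_canary_segment(0);
	switch_to_new_gdt(0);	/* ends up calling xen_load_gdt_boot() */

	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry;
	pv_cpu_ops.load_gdt = xen_load_gdt;
}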
Created attachment 366730
Set up mmu_ops before setting up gdt
This should fix it.
Makes a lot of sense, considering that the two patches adding xen_init_mmu_ops and xen_setup_stackprotector were very close in time:
- 6b18ae3 (x86: Move memory_setup to x86_init_ops, 2009-08-20)
- 577eebe (xen: make -fstack-protector work under Xen, 2009-08-27)
and they conflicted. They were merged with 577eebe first and 6b18ae3 second; your patch is "simply" ordering them the other way.
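Schematically, the fix just ensures the Xen MMU ops are in place before the boot GDT is loaded in xen_start_kernel(); a simplified sketch of the intended ordering (not the exact upstream diff):

asmlinkage void __init xen_start_kernel(void)
{
	/* ... early paravirt op setup ... */

	xen_init_mmu_ops();	/* install Xen pagetable ops first */

	/* ... */

	/* Set up the GDT and segment registers so that code built with
	 * -fstack-protector can run; this needs the MMU ops above so that
	 * HYPERVISOR_update_va_mapping can remap the gdt_page read-only. */
	xen_setup_stackprotector();

	/* ... rest of early boot ... */
}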
Committed upstream as 973df35
I tested latest upstream (v2.6.32-rc5-338-g2e2ec95) and it's good to go. Reassigning to Justin for Fedora integ/test.
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.
More information and the reason for this action are here: