Bug 459067

Summary: 32 bit 2.6.27-rc3 Xen guest crashing on 32 bit 3.1.2 HV
Product: [Fedora] Fedora
Reporter: Chris Lalancette <clalance>
Component: kernel
Assignee: Chris Lalancette <clalance>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: clalance, ijc, jeremy, kernel-maint, markmc, orion, riek, sputhenp, syeghiay, xen-maint
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-11-17 08:25:05 UTC
Attachments:
  Output from the domU console during the crash (flags: none)
  Output from the dom0 serial console during the F-10 kernel bootup/crash (flags: none)
  Patch to fix booting problem with 768MB of memory (flags: none)
  Patch to fix booting problem using virt_addr_valid (flags: none)

Description Chris Lalancette 2008-08-14 07:34:58 UTC
Created attachment 314288 [details]
Output from the domU console during the crash

Description of problem:
I'm running an i386 RHEL-5 dom0.  On top of that, I successfully installed a F-9 i386 PV domU.  After I upgraded the kernel in the F-9 guest to 2.6.27-0.256.rc3.git1.fc10.i686.PAE, I'm getting a kernel crash on boot (I'll attach a full console output).  I also get exactly the same crash in a hand-built kernel from Linus' upstream tree (pulled as of 13-Aug-2008).  During the crash, there are also some messages on the serial console of the dom0; I'll also include those as an attachment.

Comment 1 Chris Lalancette 2008-08-14 07:35:35 UTC
Created attachment 314289 [details]
Output from the dom0 serial console during the F-10 kernel bootup/crash

Comment 2 Mark McLoughlin 2008-08-14 08:22:02 UTC
32 bit DomU crashing on 32 bit 3.1.2 HV

Also reported at:

  http://lists.xensource.com/archives/html/xen-devel/2008-08/msg00112.html

Traceback:

1 multicall(s) failed: cpu 0 Pid: 0, comm: swapper Tainted: G        W 2.6.27-rc1-xenU #4
 [<c0103afa>] xen_mc_flush+0x12a/0x190
 [<c0104a3d>] xen_set_pud_hyper+0x8d/0x90
 [<c0115965>] zap_low_mappings+0x55/0x80
 [<c02fbcb4>] start_kernel+0x1d4/0x2a0
 [<c02fb670>] unknown_bootoption+0x0/0x1f0
 [<c02ff16a>] xen_start_kernel+0x58a/0x650
 =======================
  call  1/1: op=1 arg=[c1203860] result=-22

this could be the relevant error:

 (XEN) mm.c:516:d8 Could not get page ref for pfn 7fffffff
 (XEN) mm.c:2372:d8 Could not get page for normal update

Might be worth seeing if it's something fixed with 3.1.4 HV
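
A note on reading the trace above: the multicall line reports result=-22. Assuming the hypercall return value follows Linux errno numbering (which I believe Xen's does), that is -EINVAL, which fits the hypervisor refusing the page update. A minimal sketch of the check, with a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true if a negative errno-style hypercall result,
 * such as the -22 in the multicall trace, means "invalid argument".
 * Assumes Linux errno numbering, where EINVAL is 22. */
bool result_is_einval(int result)
{
    assert(result <= 0);        /* hypercall errors are reported as -errno */
    return -result == EINVAL;
}
```

Calling result_is_einval(-22) on a Linux host is expected to return true.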

Comment 3 Jeremy Fitzhardinge 2008-08-14 09:40:05 UTC
Yes, this is the same Xen bug as the other set_pud crash, I think.  Though the Xen console messages aren't familiar.

Comment 4 Mark McLoughlin 2008-08-14 09:58:21 UTC
(In reply to comment #3)
> Yes, this is the same Xen bug as the other set_pud crash, I think.  Though the
> Xen console messages aren't familiar.

That was a 32-on-64 issue only, right?

Comment 5 Mark McLoughlin 2008-08-14 12:37:44 UTC
Works fine with 3.2.0 HV

Comment 6 Jeremy Fitzhardinge 2008-08-19 07:01:15 UTC
Hm, right, the other bug should only be 32-on-64.  But you're saying that a new Xen fixes it anyway?

Comment 7 Chris Lalancette 2008-08-19 11:58:28 UTC
OK.  It also seems to work fine with a 3.1.4 HV.  I haven't had time to bisect it yet, but it seems as though we should get a fix for this issue in, so that we can boot F-10 kernels on RHEL-5.  I'm changing the component to RHEL-5 as well, since it seems to be specific to the HV.

Chris Lalancette

Comment 8 Chris Lalancette 2008-08-19 12:50:33 UTC
Ug.  I've just found out that this bug happens because of a patch we are carrying in RHEL-5 that is not in the 3.1.x stream; namely, it's the "mprotect" performance enhancements patch.  Now, interestingly enough, I do believe these enhancements are also in the upstream (3.2.0) HV, so we must be missing a fix from there.  I'll have to dig into it more.

Chris Lalancette

Comment 11 Mark McLoughlin 2008-08-19 14:03:23 UTC
Also interesting is that the mprotect() patch and the fix for bug #457879 (the 32-on-64 issue) conflict with each other. Think that just might be a coincidence, though.

32-on-64 fix:

  http://xenbits.xensource.com/xen-3.1-testing.hg?rev/f1574ad9f702

mprotect() patch:

  http://xenbits.xensource.com/xen-unstable.hg?rev/fba4e7357744

Comment 12 Chris Lalancette 2008-08-22 18:31:46 UTC
Heh.  I just realized 3.2 is fine because it doesn't have the mprotect batching fixes.  We'll have to test with 3.3 to see how it fares (I assume it is OK, but we'll have to see).

Chris Lalancette

Comment 13 Chris Lalancette 2008-08-23 16:23:16 UTC
OK.  I just tested with a recently released 3.3.0 hypervisor, and I got the exact same crash I got with the RHEL-5 hypervisor.  I'm still not sure whether the bug is in the guest kernel or the HV code, but it needs to be looked at and fixed either way.

Chris Lalancette

Comment 14 Jeremy Fitzhardinge 2008-08-25 19:38:37 UTC
Just to be clear: does this happen with any unmodified kernel/Xen combination, or only with some local RH patches in place?

Comment 15 Chris Lalancette 2008-08-26 08:40:09 UTC
Yes, this does happen with basically any combination I've tried (except for the 3.2.0 HV, which doesn't have the batched mprotect patches).  Here are the combinations I've tried (all i386):

RHEL-5 HV (w/ mprotect patch) + RHEL-5 PV guest - good
RHEL-5 HV (w/ mprotect patch) + F-9 2.6.25 pv-ops guest - good
RHEL-5 HV (w/ mprotect patch) + F-10 2.6.27 pv-ops guest - crash
RHEL-5 HV (w/ mprotect patch) + 2.6.27 pv-ops (Linus' tree) guest - crash

Xensource 3.1.2 HV + F-10 2.6.27 pv-ops guest - good
Xensource 3.1.4 HV + F-10 2.6.27 pv-ops guest - good

Xensource 3.2.0 HV + F-10 2.6.27 pv-ops guest - good (according to markmc)

Xensource 3.3.0 HV + F-9 2.6.25 pv-ops guest - good
Xensource 3.3.0 HV + F-10 2.6.27 pv-ops guest - crash
Xensource 3.3.0 HV + 2.6.27 pv-ops (Linus' tree) guest - crash

So that last combination has no Red Hat patches at all; only the upstream Xensource HV and the 2.6.27-rc3 pv-ops kernel from Linus' tree.

Chris Lalancette

Comment 16 Chris Lalancette 2008-09-01 13:00:22 UTC
Just as a quick update on this:

The hypercall is failing because the MFN that the guest is passing down to the hypervisor is completely bogus.  After adding some debugging, I found that the guest is asking to change protections on page_nr 0x7fffffff, whereas this machine only has a max_page of 0x400000 (16GB).  What I don't quite understand about that, however, is why 3.2 would work; maybe it is missing a check that both 3.3 and the RHEL-5 HV have.  I'm still investigating.

Chris Lalancette
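
To make the numbers in the comment above concrete, here is a minimal sketch of the range check, assuming 4KB pages (PAGE_SHIFT of 12). The function names are made up for illustration and are not the hypervisor's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages on x86 (assumed) */

/* Bytes of RAM covered by `pages` page frames. */
uint64_t pages_to_bytes(uint64_t pages)
{
    return pages << PAGE_SHIFT;
}

/* Illustrative check: a frame number is only usable if it is below max_page. */
bool frame_in_range(uint64_t frame, uint64_t max_page)
{
    assert(max_page > 0);
    return frame < max_page;
}
```

With max_page of 0x400000, pages_to_bytes() gives 2^34 bytes (16GB), and frame_in_range(0x7fffffff, 0x400000) is false, which matches the "Could not get page ref for pfn 7fffffff" message on the Xen console.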

Comment 17 Chris Lalancette 2008-09-01 15:34:16 UTC
Ah, now I'm getting somewhere.  This bug is memory dependent.  I'm now trying to boot the 2.6.27 kernel with various amounts of memory, and I get different behaviour.

400MB = boot
512MB = boot
602MB = boot
768MB = crash
2000MB = boot
4000MB = boot

Chris Lalancette

Comment 18 Jeremy Fitzhardinge 2008-09-01 16:40:12 UTC
OK, interesting.  The report you referred to was on a 64-bit hypervisor (I think), so this isn't a 32-on-32 specific problem.

The funny thing about the report is that the backtrace is to zap_low_mappings, which simply plugs empty_zero_page into the unused pgd slots.  The only way that could have a bad mfn is if the pfn->mfn table is corrupted or incorrectly updated.

Comment 19 Jeremy Fitzhardinge 2008-10-02 06:05:56 UTC
Is this bug still an issue?

Comment 20 Bill Burns 2008-10-02 10:41:51 UTC
I believe it is. Chris is out this week, so there won't be an answer until
next week.

Comment 21 Chris Lalancette 2008-10-07 14:22:31 UTC
As far as I know, yes.  I've been pulled away to various other things for the time being, but I can reproduce this 100% with the combinations in comment #15 and a 768MB PV guest.

Chris Lalancette

Comment 22 Chris Lalancette 2008-10-10 11:22:41 UTC
*** Bug 449566 has been marked as a duplicate of this bug. ***

Comment 23 Chris Lalancette 2008-10-14 06:58:03 UTC
OK.  I've made a little progress here, but I still haven't found the real cause.  I've added a bunch of debugging in the guest, and what is basically happening is that mm/mprotect.c:change_pte_range() is calling ptep_modify_prot_commit(), which resolves to arch/x86/xen/mmu.c:xen_ptep_modify_prot_commit().  In there, we are calling virt_to_machine(ptep), which is where our woes start.  The PTE pointer that is being passed in is something like 0xf57a8500, which is run through __pa to get 0x357a8500.  Then we PAGE_SHIFT it to get pfn 0x357a8, and then call pfn_to_mfn on that pfn.  But since this machine only has 768MB of memory, it only has 0x30000 pages of memory, which means that pfn_to_mfn looks in the p2m_top[] array, returns an INVALID_MFN, and it's all downhill from there.

So, the question becomes, why are we getting this bogus 0xf57a8500 address to begin with?  What I believe is happening is that in mm/mprotect.c:change_pte_range(), pte_offset_map_lock() is basically resolving down to a kmap_atomic().  So this page frame is up in the kmap fixmap area, meaning that I don't think there is physical memory to cover it.  What I don't quite understand is why this doesn't happen with 512MB of memory, for instance.  In any case, I'll keep digging on this.

Chris Lalancette
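
The address arithmetic in the comment above can be sketched as follows. This is a toy userspace model, not the kernel code. It assumes the default i386 PAGE_OFFSET of 0xC0000000 and 4KB pages. It also assumes the 32-bit definitions INVALID_P2M_ENTRY = ~0UL and FOREIGN_FRAME_BIT = 1UL << 31 from the kernel's Xen page header of that era, which is where I believe the 0x7fffffff seen by the hypervisor comes from. The toy_ and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT         12
#define PAGE_OFFSET        0xC0000000u  /* default i386 3G/1G split (assumed) */
#define INVALID_P2M_ENTRY  0xFFFFFFFFu  /* ~0UL on 32-bit (assumed) */
#define FOREIGN_FRAME_BIT  0x80000000u  /* 1UL << 31 (assumed) */

/* What __pa() does for a linear-mapping address: subtract the virtual base. */
uint32_t linear_pa(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET;
}

/* Physical address to page frame number. */
uint32_t pa_to_pfn(uint32_t paddr)
{
    return paddr >> PAGE_SHIFT;
}

/* Number of page frames in a guest with `mib` megabytes of memory. */
uint32_t guest_pages(uint32_t mib)
{
    return (uint32_t)(((uint64_t)mib << 20) >> PAGE_SHIFT);
}

/* Toy pfn_to_mfn(): in-range pfns map to themselves purely for
 * illustration; out-of-range pfns read as an invalid entry, and the
 * foreign-frame bit is masked off before the value is returned. */
uint32_t toy_pfn_to_mfn(uint32_t pfn, uint32_t nr_pages)
{
    uint32_t entry = (pfn < nr_pages) ? pfn : INVALID_P2M_ENTRY;
    return entry & ~FOREIGN_FRAME_BIT;
}
```

Feeding 0xf57a8500 through linear_pa() and pa_to_pfn() gives pfn 0x357a8, which is beyond guest_pages(768) = 0x30000, so toy_pfn_to_mfn() returns 0x7fffffff under these assumptions.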

Comment 24 Chris Lalancette 2008-10-14 08:46:51 UTC
Oh, and the reason that earlier kernels (say 2.6.25) don't exhibit this behavior is that they weren't using the "lazy" MMU updates at all, so xen_ptep_modify_prot_commit basically became "set_pte_at", and we never looked through the p2m table at all.  Indeed, replacing the whole body of xen_ptep_modify_prot_commit with just a set_pte_at() seems to make it work just fine, but obviously doesn't take advantage of the batching.

Chris Lalancette

Comment 25 Jeremy Fitzhardinge 2008-10-14 18:19:29 UTC
Oh.  So if it's kmap_atomic, and you have HIGHPTE, then a pagetable page will be kmapped.  That means that the pfn->mfn translation will need to do a full pagetable walk (arbitrary_virt_to_machine()).

Unfortunately that's relatively expensive.  But arbitrary_virt_to_machine could special-case vaddrs in the linear mapping.
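
The special case suggested above could look roughly like the sketch below: take the cheap arithmetic path for addresses in the linear mapping and fall back to the pagetable walk for everything else (such as kmapped pagetable pages). This is a toy model under stated assumptions (PAGE_OFFSET of 0xC0000000, 4KB pages, a flat memory layout without holes) and it only picks the path. The names are hypothetical and this is not the patch attached to this bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_OFFSET 0xC0000000u   /* default i386 split (assumed) */

typedef enum { PATH_LINEAR, PATH_PAGETABLE_WALK } xlate_path;

/* Toy stand-in for "is this a linear-mapping address backed by RAM?".
 * Assumes a flat layout: pfns 0..lowmem_pages-1 are all present. */
bool toy_in_linear_mapping(uint32_t vaddr, uint32_t lowmem_pages)
{
    if (vaddr < PAGE_OFFSET)
        return false;
    return ((vaddr - PAGE_OFFSET) >> PAGE_SHIFT) < lowmem_pages;
}

/* Pick the cheap __pa()-style path when possible, else the full walk. */
xlate_path choose_path(uint32_t vaddr, uint32_t lowmem_pages)
{
    assert(lowmem_pages > 0);
    return toy_in_linear_mapping(vaddr, lowmem_pages)
               ? PATH_LINEAR
               : PATH_PAGETABLE_WALK;
}
```

For the 768MB guest (0x30000 pages), the kmapped address 0xf57a8500 takes the walk, while a low address such as 0xc1203860 stays on the fast path.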

Comment 26 Chris Lalancette 2008-10-14 20:29:27 UTC
Bingo.  Using arbitrary_virt_to_machine() in xen_ptep_modify_prot_commit() fixed it.  So the next thing to do is to special case the vaddrs in the linear mapping like you said so that we can get the performance back.  I'll look at that next.

Chris Lalancette

Comment 27 Chris Lalancette 2008-10-15 07:09:24 UTC
Created attachment 320396 [details]
Patch to fix booting problem with 768MB of memory

Jeremy,
     This is the patch I've tested out, which seems to fix the bug for me.  Is this the kind of thing you had in mind?  If so, I'll do a little further testing on it and then submit it for you upstream.

Chris Lalancette

Comment 28 Jeremy Fitzhardinge 2008-10-15 07:23:36 UTC
Worried about the test against max_pfn because it precludes the use of sparsemem.  I think using __virt_addr_valid(vaddr) is the right test.
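
To illustrate the concern above with a toy model: when memory has holes, a pfn can be below max_pfn and still not be backed by RAM, so a "pfn < max_pfn" test accepts frames that a per-pfn validity test rejects. The layout and names below are invented for illustration; the real __virt_addr_valid() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A populated range of page frames, [start_pfn, end_pfn). */
typedef struct {
    uint32_t start_pfn;
    uint32_t end_pfn;
} pfn_range;

/* Hypothetical sparse layout: a hole between 0x10000 and 0x20000. */
static const pfn_range example_layout[2] = {
    { 0x00000, 0x10000 },
    { 0x20000, 0x30000 },
};

/* The max_pfn style test: only compares against the highest frame. */
bool pfn_below_max(uint32_t pfn, const pfn_range *ranges, size_t n)
{
    uint32_t max_pfn = 0;
    for (size_t i = 0; i < n; i++)
        if (ranges[i].end_pfn > max_pfn)
            max_pfn = ranges[i].end_pfn;
    return pfn < max_pfn;
}

/* The validity style test: the pfn must fall inside a populated range. */
bool pfn_is_backed(uint32_t pfn, const pfn_range *ranges, size_t n)
{
    assert(ranges != NULL);
    for (size_t i = 0; i < n; i++)
        if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
            return true;
    return false;
}
```

With example_layout, pfn 0x18000 passes pfn_below_max() but fails pfn_is_backed(), which is the case a max_pfn comparison would get wrong.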

Comment 29 Chris Lalancette 2008-10-15 09:13:16 UTC
Created attachment 320408 [details]
Patch to fix booting problem using virt_addr_valid

Comment 30 Chris Lalancette 2008-10-15 11:05:06 UTC
I sent the latest patch to Jeremy and CC'ed LKML and xen-devel, so switching to POST.

Chris Lalancette

Comment 31 Chris Lalancette 2008-11-17 08:25:05 UTC
Now fixed in upstream, so closing out this bug.

Chris Lalancette