Bug 254208
Summary: | Xen: Mapping random guest pages returns the wrong information | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Chris Lalancette <clalance> | ||||||
Component: | kernel-xen | Assignee: | Markus Armbruster <armbru> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 5.1 | CC: | xen-maint | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | RHBA-2008-0314 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2008-05-21 14:53:50 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 249409 | ||||||||
Bug Blocks: | |||||||||
Attachments: |
|
Description
Chris Lalancette
2007-08-24 19:16:25 UTC
Created attachment 182941 [details]
Simple test program to map and read arbitrary mfns
The foreign map is implemented as ioctl in privcmd_ioctl(). It uses
direct_remap_pfn_range() to do the actual work. direct_remap_pfn_range()
diagnoses failed hypercalls correctly and returns an error code.
privcmd_ioctl() case IOCTL_PRIVCMD_MMAP, which implements
xc_map_foreign_range(), returns the error code. Case IOCTL_PRIVCMD_MMAPBATCH,
which implements xc_map_foreign_batch(), does not. Instead, it changes the
mfns that failed in the argument array. This appears to be undocumented. Some
users of xc_map_foreign_batch() check the argument array for failure, some
don't.
I'm inclined to classify this broken-as-designed and close NOTABUG.
Revised test program attached.
On further reflection, I think we have a couple of issues here, and the bug should remain open to track them. One is behavior of IOCTL_PRIVCMD_MMAPBATCH: when it can't map a page, it changes the mfn to indicate the error, but the exact error code is not communicated. That's indeed broken as designed, and any fix needs to come from upstream. It also maps a page of zeros, which doesn't make much sense. Upstream code is different, and Chris Lalancette suspects that it doesn't map anything in that case. We better find out. Another are the users of xc_map_foreign_batch() that neglect to check for failed mappings. I'm working with upstream to get them fixed there. OK, I've been looking at this in more detail. First, I took a Xensource 3.1 i386 kernel, and ran it with a RHEL-5.1 HV and RHEL-5.1 userspace tools. In terms of basic functionality, things work, so it is a good test. When running the "crash249409" program on the upstream Xen kernel, it is indeed the case that the offending program gets a SIGBUS when it tries to map these invalid pages. So this seems to be the behavior we want to shoot for. Now, in RHEL-5, what I think we actually want to do is to put the do_no_page handler back into privcmd, and remove VM_PFNMAP from the vma having to do with privcmd. This fixes the problem on x86_64, and should also cause the appropriate SIGBUS to be returned to a bad process (as in upstream Xen). I have to make sure this is the case. However, when we do what I said above, all guests stop working on i386 (both PV and HVM). Interestingly, dom0 boots up absolutely fine. However, when trying to start a PV guest, they always fail with: [root@dhcp83-141 xenfb]# xm create -c rhel5u1pv Using config file "/etc/xen/rhel5u1pv". Error: (1, 'Internal error', 'arch_setup_bootlate: pin_table failed (pfn 0xe73, rc=1)\n') [root@dhcp83-141 xenfb]# Digging deeper, what ends up happening is that the hypercall to pin_table() is failing. That hypercall is MMUEXT, which gets handled by the hypervisor: do_mmuext_op() -> get_page_and_type_from_pagenr() -> get_page_type() -> alloc_page_type() -> alloc_l3_table() In alloc_l3_table(), there is: if ( is_pv_32bit_domain(d) && (i == 3) ) { if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) || (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) || !get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]), PGT_l2_page_table | PGT_pae_xen_l2, d) ) goto fail; After adding some more debugging to this, what's happening is that the flags of this page don't have _PAGE_PRESENT set; so this test fails, the hypercall fails, and the guest creation fails. Note that with the Xensource kernel and our HV and tools, this works fine, so it is pretty clearly a bug in the RHEL-5 kernel. I've been combing through the page table stuff in RHEL-5, and I haven't found the problem, but I haven't found it yet. It seems pretty clear that the kernel needs to mark the pages as PAGE_PRESENT once they get mmap'ed and the IOCTL is run; but I don't see how or where that would be done yet. Chris Lalancette After some further investigation, it is actually userland's responsibility to fill in the initial pagetables and set _PAGE_PRESENT on the entries, which it is, in fact, doing. I dumped the table right before hypercalling, and it looked correct. However, once in the hypervisor, that same page was filled with 0's. It seems to me that the dom0 kernel is somehow not mapping the pages properly and is actually faulting in a dom0 page on the writes, so that it is a *different* page than the one the HV looks at later. So I'm trying to find out what the difference between our kernel and the Xensource one is in this regard. Chris Lalancette Created attachment 217751 [details]
Use the nopfn() handler where nopage() was being used on privcmd
This patch (that needs the previous fix that was included on RHEL-5 to be
applied) changes the privcmd mmap vma to use nopfn(), that is supposed to be
used for VM_PFNMAP vmas.
This should make the program receive a SIGBUS (as the original nopage() handler
was supposed to do) instead of getting a all-zeros page.
Patch untested, as I don't have a RHEL-5 system for testing here right now.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. (In reply to comment #4) > After some further investigation, it is actually userland's responsibility to > fill in the initial pagetables and set _PAGE_PRESENT on the entries, which it > is, in fact, doing. I dumped the table right before hypercalling, and it looked > correct. However, once in the hypervisor, that same page was filled with 0's. This happens because of the missing VM_PFNMAP flag, that is necessary. It doesn't happen upstream because it doesn't have this patch, present on RHEL-5: linux-2.6-mm-tracking-dirty-pages.patch VM_PFNMAP was already supposed to be on the mapping upstream, but tracking-dirty-pages make it fatal. If the flag is not set, the VM doesn't know it is a special mapping and tries to catch the writes to the mapping by write-protecting it. Then, do_wp_page() gets confused and maps a zeroed page in the privcmd vma. The patch in comment#5 applied to 2.6.18-54.el5 works for me: the test program gets SIGBUS, and I can still start guests. in 2.6.18-62.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 -62 compiled by myself works fine with the test program from comment#1. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html |