Bug 253583
Summary: Live migration of HVM/Fully-Virt guests crashes target host/dom0

Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Jan Mark Holzer <jmh>
Assignee: Chris Lalancette <clalance>
QA Contact: Virtualization Bugs <virt-bugs>
CC: clalance, mjenner, xen-maint
Hardware: All
OS: Linux
Fixed In Version: RHEA-2007-0635
Doc Type: Bug Fix
Last Closed: 2007-11-07 17:11:31 UTC
Description
Jan Mark Holzer
2007-08-20 17:48:00 UTC
Ah, OK. Here we go. I hooked up a serial console and did:

# xm migrate rhel4u3fv <remote>

and got the following stack trace out of the remote side:

(XEN) save.c:170:d0 HVM restore: saved CPUID (0x100f20) does not match host (0x40f12).
(XEN) save.c:176:d0 HVM restore: Xen changeset was not saved.
(XEN) lapic_load to rearm the actimer: bus cycle is 10ns, saved tmict count 6250, period 1000000ns, irq=239
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/memory.c:2290
invalid opcode: 0000 [1] SMP
last sysfs file: /class/misc/evtchn/dev
CPU 0
Modules linked in: tun xt_physdev netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd sg soundcore snd_page_alloc pcspkr forcedeth shpchp serial_core i2c_nforce2 i2c_core serio_raw ide_cd cdrom k8_edac edac_mc k8temp hwmon sata_nv libata mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 4214, comm: qemu-dm Not tainted 2.6.18-41.el5xen #1
RIP: e030:[<ffffffff80208b30>] [<ffffffff80208b30>] __handle_mm_fault+0x379/0xf46
RSP: e02b:ffff88004a73dde8 EFLAGS: 00010202
RAX: ffffffff80514840 RBX: 0000000000000000 RCX: 00003ffffffff000
RDX: 00000000496d4000 RSI: 0000000000000067 RDI: ffff880074412040
RBP: ffff880074412040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: ffff8800496d4000 R14: 00002aaaac800000 R15: ffff880072bdf870
FS: 00002aaaab11e900(0000) GS:ffffffff80599000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process qemu-dm (pid: 4214, threadinfo ffff88004a73c000, task ffff8800720b10c0)
Stack: 00000000fffffff2 0000000180275e4c ffff880074412040 ffff88004a952b20
 0000000000001000 00002aaaace1b000 ffff880074412040 ffffffff80261889
 ffff8800744120a8 ffffffff80221f0c
Call Trace:
 [<ffffffff80261889>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80221f0c>] __up_read+0x19/0x7f
 [<ffffffff802641db>] do_page_fault+0xe48/0x11dc
 [<ffffffff8030b0dd>] file_has_perm+0x94/0xa3
 [<ffffffff8025d823>] error_exit+0x0/0x6e
Code: 0f 0b 68 ce 50 47 80 c2 f2 08 49 8b 87 90 00 00 00 48 c7 44
RIP [<ffffffff80208b30>] __handle_mm_fault+0x379/0xf46
 RSP <ffff88004a73dde8>
<0>Kernel panic - not syncing: Fatal exception
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

This is the same bug we had in an earlier BZ, namely crashing because we reached do_no_page() with VM_PFNMAP set. I thought it had been fixed by the 5.1 work, but apparently not. I'm going to try to fix this now.

Chris Lalancette
Created attachment 161969 [details]
Simple patch to fix the crash
This is a simple patch to remove the VM_PFNMAP flag from the privcmd mmap pages;
this matches upstream Xen, and also seems to correct the HVM live migrate
crash.
Chris Lalancette
Today's update: There are two bugs that have similar signatures:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253479
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249409

so they might also be fixed by fixing this bug.

After some additional testing, I found that this same bug can happen when live migrating HVM domains from i386 to i386. I've also found that although the above patch prevents the dom0 crash on x86_64, it doesn't actually allow guests to work. Additionally, the above patch makes it so that *no* guests will even boot on i686, so something else is going on here.

Talking to Rik, it seems that VM_PFNMAP is used for vmas that don't have a struct page associated with them. However, vmas that use VM_PFNMAP should *not* specify a nopage handler, since that can cause the crash (handle_pte_fault calls do_no_page iff the vma's nopage handler is non-NULL). Rik thinks that the entire reason we are hitting the nopage path, though, is that the tools have some sort of off-by-one error that is causing the page fault.

The questions to answer are:
1) What is causing that additional page fault?
2) Why does upstream Xen not have VM_PFNMAP, and how does it get away with not needing it? Does it have the same problem and we are just missing something, or is it broken as well?

Chris Lalancette
Created attachment 162501 [details]
A different patch to the privcmd stuff
This is another patch to attempt to solve this issue. As pointed out by Rik,
when we use VM_PFNMAP, we should not have a nopage handler, otherwise
handle_pte_fault() causes us to die. This patch totally removes the nopage
handler from privcmd. This has currently been tested to make x86_64 off-line
and live migrations work successfully. I was also able to start i686 PV and
HVM guests with this patch applied; I still need to test that migrate works on
that arch. So, to confirm that this patch fixes things, I need to:
1) Make sure that I can start, save, restore, off-line migrate, live migrate
PV and HVM guests on x86_64
2) Make sure that I can start, save, restore, off-line migrate, live migrate
PV and HVM guests on i686
3) Make sure that I can start, save, restore, off-line migrate, live migrate
PV and HVM guests on ia64
4) Make sure that this patch fixes the other, related BZ's
Chris Lalancette
Testing results:

1) x86_64: successfully started, save/restored, off-line migrated, and live migrated PV and HVM guests.
2) i686: successfully started, save/restored, off-line migrated, and live migrated PV and HVM guests.
3) ia64: ?? I don't have the hardware to test.
4) Succeeds.

However, despite this testing, there is still a problem; that will be enumerated in a second post.

Chris Lalancette

So, despite the successful testing with the above patch, there is still a problem. I'll try to explain what the problem is, and why the above patch is needed as *part* of the solution. Note that "local" will mean the machine the live migrate is initiated from, and "remote" will mean the machine the live migrate will end up on.

1) When an HVM live migrate is started, the local machine sends over information about how much memory the new domain will need on the remote machine. The python tools on the remote machine dutifully take that value and balloon down dom0, freeing up memory for the domain.

2) Next, the tools on the remote machine allocate the memory for that domain. On a machine where dom0 was using all available memory before the migrate started, this means that after this allocation there will be *precisely* 0 additional pages for the hypervisor to hand out to domains (note that it keeps memory around for itself, but that is not relevant here).

3) However, when qemu starts up on the remote side, it needs a few additional pages for the device emulation. In particular, the Cirrus VGA does a populate_physmap followed by an xc_map_foreign_pages.

It is this 3rd step that causes all of the issues. Since the HV has no more memory to hand out to domains, it actually fails the populate_physmap. However, QEMU does *not* check the return code after the populate_physmap, so it blindly goes ahead and does an xc_map_foreign_pages on pages that have not been successfully populated. This causes the page fault to happen in dom0, and causes the do_no_page() call and the BUG_ON().

So the fix here is in multiple parts:

1) Since userland should not be able to crash dom0, regardless of whether it is doing the wrong thing (as stated in BZ 249409), I believe we need the patch that is already attached to this BZ.
2) QEMU *should* check whether populate_physmap failed, and take appropriate action. In this case, if it fails because it is out of memory, the migration will actually succeed; it just won't have emulated video available.
3) QEMU should attempt to balloon dom0 down by the number of additional pages it needs to succeed.

With the above 3 fixes in place, I am able not only to prevent the crash, but also to succeed in the migration. Note that I am still having problems with video, but I believe that is a secondary bug. Tomorrow I will work on making better versions of items 2) and 3), and pushing those upstream and internally.

Chris Lalancette

Comment on attachment 162501 [details]
A different patch to the privcmd stuff

I'm tracking the kernel side of this problem in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249409, so this patch is stale now.

Chris Lalancette

Fix built into:

* Fri Aug 31 2007 Daniel P. Berrange <berrange> - 3.0.3-38.el5
- Fixed memory ballooning for HVM restore (rhbz #253583)

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0635.html