Description of problem:

I've been testing out migration with various PV guests. Here's my setup:

machine1 - i686, running 5.1 Beta bits on dom0, export /var/lib/xen/images via NFS, starts guests
machine2 - i686, running 5.1 Beta bits on dom0, mount machine1:/var/lib/xen/images /var/lib/xen/images

Both machines have had their relocation servers turned on, iptables disabled, etc.

If I start up a rhel5 GA PV guest on machine1, and:

    xm migrate rhel5pv machine2

It migrates fine, and comes up on machine2. If I take that very same guest, install the RHEL-5.1 Beta kernel (2.6.18-37 as of this writing), then:

    xm migrate rhel5pv machine2

completes, but when I connect to the console on machine2 (xm console rhel5pv), I see:

------------[ cut here ]------------
kernel BUG at drivers/xen/core/smpboot.c:417!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /block/dm-1/range
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0061:[<c05432c6>]    Not tainted VLI
EFLAGS: 00010282   (2.6.18-37.el5xen #1)
EIP is at __cpu_up+0x7b/0x8a
eax: ffffffea   ebx: 00000001   ecx: 00000001   edx: 00000000
esi: 00000001   edi: 00000000   ebp: c06d0900   esp: ed71ef10
ds: 007b   es: 007b   ss: 0069
Process suspend (pid: 1982, ti=ed71e000 task=c0c03000 task.ti=ed71e000)
Stack: 00000001 fffffff0 c0624c07 ed71ef30 c0431f78 00000001 ed71ef37 c0542176
       696c6e6f c000656e c0c2ba00 00000000 c0546863 c067cdb0 c05fab04 00000000
       2f757063 00000031 00000000 c053be62 c067d08c c067d090 c07a6640 00007ff0
Call Trace:
 [<c0431f78>] cpu_up+0x84/0xd9
 [<c0542176>] vcpu_hotplug+0x87/0xcc
 [<c0546863>] watch_otherend+0x16/0x19
 [<c05fab04>] klist_next+0xc/0x43
 [<c053be62>] bus_for_each_dev+0x4f/0x59
 [<c05421cf>] smp_resume+0x14/0x29
 [<c0542e5e>] __do_suspend+0x3c9/0x3d5
 [<c0416993>] complete+0x2b/0x3d
 [<c0542a95>] __do_suspend+0x0/0x3d5
 [<c042cca9>] kthread+0xc0/0xeb
 [<c042cbe9>] kthread+0x0/0xeb
 [<c0403005>] kernel_thread_helper+0x5/0xb
 =======================
Code: 28 6d c0 89 04 b5 00 29 6d c0 c6 44 2a 12 01 89 f0 e8 d3 fe ff ff f0 0f ab 35 44 66 7a c0 89 f1 89 fa e8 3e e0 eb ff 85 c0 74 08 <0f> 0b a1 01 d0 1d 63 c0 89 f8 5b 5e 5f 5d c3 e8 92 8e ec ff e8
EIP: [<c05432c6>] __cpu_up+0x7b/0x8a SS:ESP 0069:ed71ef10
 <0>Kernel panic - not syncing: Fatal exception

Preliminary investigation shows that this is a failed VCPUOP_up hypercall; I'm not quite sure why that is failing now, but it is.

The rhel5pv guest in question has 1500MB of memory, 4 vCPUs, and is using PVFB. I'll attach the full configuration file.
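On the failed VCPUOP_up hypercall: the register dump appears consistent with that reading. The code bytes just before the trap (85 c0 74 08 <0f> 0b) decode as test %eax,%eax / je / ud2, so eax is probably the return value that was checked right before the BUG. This is an inference from the dump, not something verified against the source. Interpreted as a signed 32-bit integer, eax (ffffffea) is -22, which is -EINVAL on Linux. A minimal sketch of that conversion (to_signed32 is a helper name made up for this example):

```shell
#!/usr/bin/env bash

# to_signed32 is a hypothetical helper for this example only.
# It interprets an 8-digit hex register value (no 0x prefix)
# as a signed 32-bit integer.
to_signed32() {
    local value=$(( 16#$1 ))
    # Values with the top bit set are negative in two's complement.
    if (( value >= 0x80000000 )); then
        value=$(( value - 0x100000000 ))
    fi
    printf '%d\n' "$value"
}

# eax from the register dump above
to_signed32 ffffffea
```

The same helper can be applied to any other register value in the dump.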
Created attachment 160417 [details] Config file for PV guest
danpb pointed out that this is equivalent to just a "save" and "restore", so I broke it down to a more basic case. On machine1, I just ran:

    xm create -c rhel5pv    # domain running -37 kernel
    xm save rhel5pv /var/lib/xen/save/rhel5pv-save
    xm restore /var/lib/xen/save/rhel5pv-save

And I got the same crash. So it doesn't really have to do with migrating at all, just with the restore path.

Chris Lalancette
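For repeated testing, the save/restore reproduction above could be wrapped in a small helper. This is only a sketch: the XM variable and the save_restore_cycle function are names invented here, and it assumes the guest is already running on a dom0 where xm is available. It uses only the xm save and xm restore invocations shown in this report.

```shell
#!/usr/bin/env bash

# XM and save_restore_cycle are hypothetical names for this sketch.
# XM can be overridden (e.g. XM=echo) to print the commands instead
# of running them.
XM="${XM:-xm}"

# Save a running domain to a file, then restore it from that file.
# Stops before the restore if the save step fails.
save_restore_cycle() {
    local domain="$1" savefile="$2"
    "$XM" save "$domain" "$savefile" || return 1
    "$XM" restore "$savefile"
}

# Example, on a dom0 with the guest already running:
#   save_restore_cycle rhel5pv /var/lib/xen/save/rhel5pv-save
```

After the restore, connecting with xm console rhel5pv shows whether the guest hit the BUG.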
Grr. I may have missed a patch when pushing the save/restore fixes to Gerd for 5.1. Going to test out adding that patch back in, and see if things are better. Chris Lalancette
Yep, that was it. I added that patch back in, and things started working again. I'm still seeing "softlockup" warnings after the restore, but I also saw that on RHEL5 GA, so that is not something new. I'll roll up the patch and post it soon. Chris Lalancette
Created attachment 160464 [details] Patch that fixes the crash for me
The patch is in 2.6.18-40.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html