Description of problem:
I've been testing out migration with various PV guests. Here's my setup:
machine1 - i686, running 5.1 Beta bits on dom0, export /var/lib/xen/images via
NFS, starts guests
machine2 - i686, running 5.1 Beta bits on dom0, mount
Both machines have had their relocation servers turned on, iptables disabled, etc.
If I start up a rhel5 GA PV guest on machine1, and:
xm migrate rhel5pv machine2
It migrates fine, and comes up on machine2.
If I take that very same guest, install the RHEL-5.1 Beta kernel (2.6.18-37 as
of this writing), then:
xm migrate rhel5pv machine2
completes, but when I connect to the console on machine2 (xm console rhel5pv), I see:
------------[ cut here ]------------
kernel BUG at drivers/xen/core/smpboot.c:417!
invalid opcode: 0000 [#1]
last sysfs file: /block/dm-1/range
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod
xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
EIP: 0061:[<c05432c6>] Not tainted VLI
EFLAGS: 00010282 (2.6.18-37.el5xen #1)
EIP is at __cpu_up+0x7b/0x8a
eax: ffffffea ebx: 00000001 ecx: 00000001 edx: 00000000
esi: 00000001 edi: 00000000 ebp: c06d0900 esp: ed71ef10
ds: 007b es: 007b ss: 0069
Process suspend (pid: 1982, ti=ed71e000 task=c0c03000 task.ti=ed71e000)
Stack: 00000001 fffffff0 c0624c07 ed71ef30 c0431f78 00000001 ed71ef37 c0542176
696c6e6f c000656e c0c2ba00 00000000 c0546863 c067cdb0 c05fab04 00000000
2f757063 00000031 00000000 c053be62 c067d08c c067d090 c07a6640 00007ff0
Code: 28 6d c0 89 04 b5 00 29 6d c0 c6 44 2a 12 01 89 f0 e8 d3 fe ff ff f0 0f ab
35 44 66 7a c0 89 f1 89 fa e8 3e e0 eb ff 85 c0 74 08 <0f> 0b a1 01 d0 1d 63 c0
89 f8 5b 5e 5f 5d c3 e8 92 8e ec ff e8
EIP: [<c05432c6>] __cpu_up+0x7b/0x8a SS:ESP 0069:ed71ef10
<0>Kernel panic - not syncing: Fatal exception
Preliminary investigation shows that this is a failed VCPUOP_up hypercall; I'm
not quite sure why that is failing now, but it is.
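As a rough cross-check of that reading, parts of the oops above can be decoded by hand. The sketch below is Python and was not part of the original investigation. It assumes that eax still holds the return value of the failed call at the time of the BUG, which the oops itself does not state, and interprets it as a signed 32-bit errno. It also decodes two pairs of stack words as little-endian ASCII, since i686 is little-endian. The helper names are made up for illustration.

```python
import errno
import struct

def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value

def words_to_ascii(words):
    """Pack 32-bit words little-endian; show printable bytes, '.' otherwise."""
    raw = b"".join(struct.pack("<I", w) for w in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# eax from the oops. ASSUMPTION: it is the return code of the failed call.
rc = to_signed32(0xffffffea)
rc_name = errno.errorcode.get(-rc, "unknown")

# Stack words copied from the second and third lines of the "Stack:" dump.
first = words_to_ascii([0x696c6e6f, 0xc000656e])
second = words_to_ascii([0x2f757063, 0x00000031])

if __name__ == "__main__":
    print(rc, rc_name)
    print(first)
    print(second)
```

If the assumption about eax holds, the value is -22 (EINVAL on Linux). The stack fragments read as "online" and "cpu/1", which would fit a vCPU being brought back online during resume, though that is only an interpretation of the dump.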
The rhel5pv guest in question has 1500MB of memory, 4 vCPUs, and is using PVFB.
I'll attach the full configuration file.
Created attachment 160417
Config file for PV guest
danpb pointed out that this is equivalent to a "save" followed by a "restore", so I
broke it down to a more basic case. On machine1, I just ran:
xm create -c rhel5pv # domain running -37 kernel
xm save rhel5pv /var/lib/xen/save/rhel5pv-save
xm restore /var/lib/xen/save/rhel5pv-save
And I got the same crash. So it doesn't really have anything to do with
migration at all, just with the restore path.
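For repeated testing, the three steps above could be wrapped in a small script. This is only a sketch under some assumptions: it targets a modern Python with subprocess.run, which is not what a RHEL 5 dom0 ships, it drops the -c flag from xm create so the script does not attach to the console, and it does not wait for the guest to finish booting before the save. The function names build_commands and run are hypothetical.

```python
import subprocess

def build_commands(domain, save_path):
    """Return the create/save/restore sequence from the report as argv lists."""
    return [
        ["xm", "create", domain],           # report used "-c"; omitted here
        ["xm", "save", domain, save_path],
        ["xm", "restore", save_path],
    ]

def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if xm exits non-zero

if __name__ == "__main__":
    # Dry run by default, so nothing is started on the host.
    run(build_commands("rhel5pv", "/var/lib/xen/save/rhel5pv-save"))
```

In practice a pause between the create and the save would be needed so that the guest is fully booted on the -37 kernel before it is suspended.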
Grr. I may have missed a patch when pushing the save/restore fixes to Gerd for
5.1. Going to test out adding that patch back in, and see if things are better.
Yep, that was it. I added that patch back in, and things started working again.
I'm still seeing "softlockup" warnings after the restore, but I also saw that
on RHEL5 GA, so that is not something new. I'll roll up the patch and post it soon.
Created attachment 160464
Patch that fixes the crash for me
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.