Bug 250420 - [RHEL5.1]: Off-line (non-live) migrate of a RHEL5.1 PV guest panics the guest
Summary: [RHEL5.1]: Off-line (non-live) migrate of a RHEL5.1 PV guest panics the guest
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.1
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Chris Lalancette
QA Contact: Martin Jenner
Keywords: Regression
Depends On:
Reported: 2007-08-01 14:34 UTC by Chris Lalancette
Modified: 2007-11-17 01:14 UTC
CC List: 2 users

Clone Of:
Last Closed: 2007-11-07 19:57:21 UTC

Attachments
Config file for PV guest (337 bytes, text/plain)
2007-08-01 14:35 UTC, Chris Lalancette
Patch that fixes the crash for me (2.37 KB, patch)
2007-08-01 20:58 UTC, Chris Lalancette

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2007:0959
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages for Red Hat Enterprise Linux 5 Update 1
Last Updated: 2007-11-08 00:47:37 UTC

Description Chris Lalancette 2007-08-01 14:34:31 UTC
Description of problem:
I've been testing out migration with various PV guests.  Here's my setup:

machine1 - i686, running 5.1 Beta bits on dom0, exports /var/lib/xen/images via
NFS, starts guests
machine2 - i686, running 5.1 Beta bits on dom0, mounts
machine1:/var/lib/xen/images on /var/lib/xen/images

Both machines have had their relocation servers turned on, iptables disabled, etc.

If I start up a RHEL5 GA PV guest on machine1, and run:

xm migrate rhel5pv machine2

It migrates fine, and comes up on machine2.

If I take that very same guest, install the RHEL-5.1 Beta kernel (2.6.18-37 as
of this writing), then:

xm migrate rhel5pv machine2

completes, but when I connect to the console on machine2 (xm console rhel5pv), I see:

------------[ cut here ]------------
kernel BUG at drivers/xen/core/smpboot.c:417!
invalid opcode: 0000 [#1]
last sysfs file: /block/dm-1/range
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod
xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0061:[<c05432c6>]    Not tainted VLI
EFLAGS: 00010282   (2.6.18-37.el5xen #1) 
EIP is at __cpu_up+0x7b/0x8a
eax: ffffffea   ebx: 00000001   ecx: 00000001   edx: 00000000
esi: 00000001   edi: 00000000   ebp: c06d0900   esp: ed71ef10
ds: 007b   es: 007b   ss: 0069
Process suspend (pid: 1982, ti=ed71e000 task=c0c03000 task.ti=ed71e000)
Stack: 00000001 fffffff0 c0624c07 ed71ef30 c0431f78 00000001 ed71ef37 c0542176 
       696c6e6f c000656e c0c2ba00 00000000 c0546863 c067cdb0 c05fab04 00000000 
       2f757063 00000031 00000000 c053be62 c067d08c c067d090 c07a6640 00007ff0 
Call Trace:
 [<c0431f78>] cpu_up+0x84/0xd9
 [<c0542176>] vcpu_hotplug+0x87/0xcc
 [<c0546863>] watch_otherend+0x16/0x19
 [<c05fab04>] klist_next+0xc/0x43
 [<c053be62>] bus_for_each_dev+0x4f/0x59
 [<c05421cf>] smp_resume+0x14/0x29
 [<c0542e5e>] __do_suspend+0x3c9/0x3d5
 [<c0416993>] complete+0x2b/0x3d
 [<c0542a95>] __do_suspend+0x0/0x3d5
 [<c042cca9>] kthread+0xc0/0xeb
 [<c042cbe9>] kthread+0x0/0xeb
 [<c0403005>] kernel_thread_helper+0x5/0xb
Code: 28 6d c0 89 04 b5 00 29 6d c0 c6 44 2a 12 01 89 f0 e8 d3 fe ff ff f0 0f ab
35 44 66 7a c0 89 f1 89 fa e8 3e e0 eb ff 85 c0 74 08 <0f> 0b a1 01 d0 1d 63 c0
89 f8 5b 5e 5f 5d c3 e8 92 8e ec ff e8 
EIP: [<c05432c6>] __cpu_up+0x7b/0x8a SS:ESP 0069:ed71ef10
 <0>Kernel panic - not syncing: Fatal exception

Preliminary investigation shows that the BUG is triggered by a failed VCPUOP_up
hypercall; I'm not yet sure why that hypercall is failing now, but it is.
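The register dump is consistent with that reading. In the `Code:` line, the bytes just before the faulting `<0f> 0b` (ud2, the BUG trap) are `85 c0 74 08`, i.e. `test %eax,%eax; je`, so eax most likely still holds the return value of the call just before it (presumably the VCPUOP_up hypercall). A small decoding sketch follows: read as a signed 32-bit int, `ffffffea` is -22, which is -EINVAL on Linux, and `__cpu_up+0x7b/0x8a` (offset/size within the function) gives the function's absolute bounds. The helper names are illustrative, and treating eax as the hypercall's return code is an assumption drawn from the disassembly, not from the smpboot.c source.

```python
import ctypes
import errno
import os

# Values copied from the oops: "eax: ffffffea" and "EIP is at __cpu_up+0x7b/0x8a".
EAX = 0xFFFFFFEA
EIP = 0xC05432C6
OFFSET, SIZE = 0x7B, 0x8A  # offset of EIP into __cpu_up, and size of __cpu_up


def as_signed32(value: int) -> int:
    """Reinterpret a raw 32-bit register value as a signed C int."""
    return ctypes.c_int32(value).value


def describe_errno(value: int) -> str:
    """Render a negative kernel return code as '-N (-ENAME: message)'."""
    rc = as_signed32(value)
    if rc < 0 and -rc in errno.errorcode:
        return f"{rc} (-{errno.errorcode[-rc]}: {os.strerror(-rc)})"
    return str(rc)


def function_bounds(eip: int, offset: int, size: int) -> tuple:
    """Return (start, end) of the function containing eip; end is exclusive."""
    start = eip - offset
    return start, start + size


if __name__ == "__main__":
    print("eax =", describe_errno(EAX))
    start, end = function_bounds(EIP, OFFSET, SIZE)
    print(f"__cpu_up spans {start:#010x}..{end:#010x}")
```

On a Linux host the first line printed is `eax = -22 (-EINVAL: Invalid argument)`; the errno names and messages come from the host running the script, not from the guest.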

The rhel5pv guest in question has 1500MB of memory, 4 vCPUs, and is using PVFB.
I'll attach the full configuration file.

Comment 1 Chris Lalancette 2007-08-01 14:35:30 UTC
Created attachment 160417
Config file for PV guest

Comment 3 Chris Lalancette 2007-08-01 14:49:30 UTC
danpb pointed out that this is equivalent to just a "save" and "restore", so I
broke it down to a more basic case. On machine1, I just ran:

xm create -c rhel5pv # domain running -37 kernel
xm save rhel5pv /var/lib/xen/save/rhel5pv-save
xm restore /var/lib/xen/save/rhel5pv-save

And I got the same crash.  So it doesn't really have to do with migrating at
all, just with the restore stuff.
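Because the failure reproduces with a plain save/restore, the reproducer is easy to loop. A minimal sketch, assuming `xm` is on the PATH in dom0 and the guest is already running; `save_restore_cycles` and its `run` parameter are hypothetical names, and `run` is injectable so the loop can be exercised without Xen. A zero exit status only means `xm` finished: in this bug the restore itself completes and the panic appears on the guest console, so that still has to be checked separately.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes one command (argv list) and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status (no exception on failure)."""
    return subprocess.run(list(cmd)).returncode


def save_restore_commands(domain: str, save_file: str) -> List[List[str]]:
    """The two xm steps from the reproducer above, in order."""
    return [
        ["xm", "save", domain, save_file],
        ["xm", "restore", save_file],
    ]


def save_restore_cycles(domain: str, save_file: str, cycles: int,
                        run: Runner = run_command) -> int:
    """Repeat save+restore; return how many full cycles exited with status 0.

    Stops at the first non-zero exit status. A clean return does NOT prove
    the guest survived: the guest console must still be inspected.
    """
    for completed in range(cycles):
        for cmd in save_restore_commands(domain, save_file):
            if run(cmd) != 0:
                return completed
    return cycles
```

For example, `save_restore_cycles("rhel5pv", "/var/lib/xen/save/rhel5pv-save", 1)` issues the same `xm save` / `xm restore` pair shown above once.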

Chris Lalancette

Comment 4 Chris Lalancette 2007-08-01 15:24:10 UTC
Grr.  I may have missed a patch when pushing the save/restore fixes to Gerd for
5.1.  Going to test out adding that patch back in, and see if things are better.

Chris Lalancette

Comment 5 Chris Lalancette 2007-08-01 15:56:47 UTC
Yep, that was it.  I added that patch back in, and things started working again.
I'm still seeing "softlockup" warnings after the restore, but I also saw that
on RHEL5 GA, so that is not something new.  I'll roll up the patch and post it soon.
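If restore testing is scripted, the captured console output has to be triaged so that the pre-existing softlockup warnings are not mistaken for this regression. A sketch, assuming the console has already been captured as text (for instance from `xm console`); the regular expressions are assumptions based on the oops in this report plus a generic "soft lockup" substring, not a complete catalogue of kernel messages, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed markers of a fatal oops, taken from the console output in this report.
BUG_AT = re.compile(r"kernel BUG at (\S+):(\d+)!")
FATAL_PATTERNS = (BUG_AT, re.compile(r"Kernel panic - not syncing"))
# Assumed marker of the (pre-existing, non-fatal) softlockup warnings.
SOFTLOCKUP = re.compile(r"soft lockup", re.IGNORECASE)


def classify_console(log: str) -> Tuple[str, List[str]]:
    """Return ("fatal" | "softlockup" | "clean", matching lines).

    Fatal markers take priority, so a log with both kinds is "fatal".
    """
    lines = log.splitlines()
    fatal = [ln for ln in lines if any(p.search(ln) for p in FATAL_PATTERNS)]
    if fatal:
        return "fatal", fatal
    soft = [ln for ln in lines if SOFTLOCKUP.search(ln)]
    if soft:
        return "softlockup", soft
    return "clean", []


def bug_location(log: str) -> Optional[Tuple[str, int]]:
    """Extract (file, line) from the first 'kernel BUG at file:line!' message."""
    m = BUG_AT.search(log)
    return (m.group(1), int(m.group(2))) if m else None
```

Giving fatal markers priority means a run that shows both a softlockup warning and the BUG is still reported as the regression.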

Chris Lalancette

Comment 8 Chris Lalancette 2007-08-01 20:58:32 UTC
Created attachment 160464
Patch that fixes the crash for me

Comment 9 Don Zickus 2007-08-15 19:06:02 UTC
in 2.6.18-40.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
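To check whether a particular guest kernel already carries this fix, its `uname -r` string can be compared with the 2.6.18-40.el5 build. A minimal sketch, assuming the usual RHEL5 `2.6.18-<release>...` naming; `has_restore_fix` is a hypothetical helper. A release below 40 only means this patch is absent: as the description notes, the RHEL5 GA kernel migrated fine, since the regression apparently came in with the 5.1 save/restore changes.

```python
import re

# Per comment 9, the fix first appears in kernel 2.6.18-40.el5.
FIXED_BASE = (2, 6, 18)
FIXED_RELEASE = 40

# Assumed RHEL5 naming: "<major>.<minor>.<patch>-<release>[.<more>]...",
# e.g. "2.6.18-37.el5xen" as reported by `uname -r` inside the guest.
_UNAME_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def has_restore_fix(uname_r: str) -> bool:
    """True if a RHEL5 2.6.18 kernel release is at or past the fixed build."""
    m = _UNAME_RE.match(uname_r)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {uname_r!r}")
    base = tuple(int(part) for part in m.group(1, 2, 3))
    if base != FIXED_BASE:
        # The release-number comparison only makes sense within 2.6.18.
        raise ValueError(f"not a RHEL5 2.6.18 kernel: {uname_r!r}")
    return int(m.group(4)) >= FIXED_RELEASE
```

Only the first release field is compared, so errata builds such as `2.6.18-53.1.4.el5xen` are treated as release 53.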

Comment 12 errata-xmlrpc 2007-11-07 19:57:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

