Bug 250420 - [RHEL5.1]: Off-line (non-live) migrate of a RHEL5.1 PV guest panics the guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Assigned To: Chris Lalancette
QA Contact: Martin Jenner
Keywords: Regression
Reported: 2007-08-01 10:34 EDT by Chris Lalancette
Modified: 2007-11-16 20:14 EST

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 14:57:21 EST


Attachments
Config file for PV guest (337 bytes, text/plain)
2007-08-01 10:35 EDT, Chris Lalancette
Patch that fixes the crash for me (2.37 KB, patch)
2007-08-01 16:58 EDT, Chris Lalancette

Description Chris Lalancette 2007-08-01 10:34:31 EDT
Description of problem:
I've been testing out migration with various PV guests.  Here's my setup:

machine1 - i686, running 5.1 Beta bits in dom0; exports /var/lib/xen/images
via NFS and starts the guests
machine2 - i686, running 5.1 Beta bits in dom0; mounts
machine1:/var/lib/xen/images on /var/lib/xen/images

Both machines have had their relocation servers turned on, iptables disabled, etc.

If I start up a RHEL5 GA PV guest on machine1 and run:

xm migrate rhel5pv machine2

It migrates fine, and comes up on machine2.

If I take that very same guest, install the RHEL-5.1 Beta kernel (2.6.18-37 as
of this writing), then:

xm migrate rhel5pv machine2

completes, but when I connect to the console on machine2 (xm console rhel5pv), I
see:

------------[ cut here ]------------
kernel BUG at drivers/xen/core/smpboot.c:417!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /block/dm-1/range
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc xennet
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 parport_pc lp parport pcspkr dm_snapshot dm_zero dm_mirror dm_mod
xenblk ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0061:[<c05432c6>]    Not tainted VLI
EFLAGS: 00010282   (2.6.18-37.el5xen #1) 
EIP is at __cpu_up+0x7b/0x8a
eax: ffffffea   ebx: 00000001   ecx: 00000001   edx: 00000000
esi: 00000001   edi: 00000000   ebp: c06d0900   esp: ed71ef10
ds: 007b   es: 007b   ss: 0069
Process suspend (pid: 1982, ti=ed71e000 task=c0c03000 task.ti=ed71e000)
Stack: 00000001 fffffff0 c0624c07 ed71ef30 c0431f78 00000001 ed71ef37 c0542176 
       696c6e6f c000656e c0c2ba00 00000000 c0546863 c067cdb0 c05fab04 00000000 
       2f757063 00000031 00000000 c053be62 c067d08c c067d090 c07a6640 00007ff0 
Call Trace:
 [<c0431f78>] cpu_up+0x84/0xd9
 [<c0542176>] vcpu_hotplug+0x87/0xcc
 [<c0546863>] watch_otherend+0x16/0x19
 [<c05fab04>] klist_next+0xc/0x43
 [<c053be62>] bus_for_each_dev+0x4f/0x59
 [<c05421cf>] smp_resume+0x14/0x29
 [<c0542e5e>] __do_suspend+0x3c9/0x3d5
 [<c0416993>] complete+0x2b/0x3d
 [<c0542a95>] __do_suspend+0x0/0x3d5
 [<c042cca9>] kthread+0xc0/0xeb
 [<c042cbe9>] kthread+0x0/0xeb
 [<c0403005>] kernel_thread_helper+0x5/0xb
 =======================
Code: 28 6d c0 89 04 b5 00 29 6d c0 c6 44 2a 12 01 89 f0 e8 d3 fe ff ff f0 0f ab
35 44 66 7a c0 89 f1 89 fa e8 3e e0 eb ff 85 c0 74 08 <0f> 0b a1 01 d0 1d 63 c0
89 f8 5b 5e 5f 5d c3 e8 92 8e ec ff e8 
EIP: [<c05432c6>] __cpu_up+0x7b/0x8a SS:ESP 0069:ed71ef10
 <0>Kernel panic - not syncing: Fatal exception

Preliminary investigation shows that this is a failed VCPUOP_up hypercall; I'm
not quite sure why that is failing now, but it is.
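The register dump is consistent with that reading. The bytes just before the faulting "0f 0b" (ud2, the BUG trap) are "e8 3e e0 eb ff 85 c0 74 08": a call, "test %eax,%eax", and a "je" that would skip the trap, so eax most likely holds the return value being checked. A minimal POSIX-shell sketch that reinterprets that 32-bit value as signed; the helper name to_signed32 is hypothetical, and it assumes shell arithmetic wider than 32 bits (true for bash and dash on common platforms):

```shell
#!/bin/sh
# Reinterpret a 32-bit register value (e.g. eax from an oops) as a signed
# integer. Hypothetical helper; needs shell arithmetic wider than 32 bits.
to_signed32() {
    v=$(( $1 ))                        # accepts hex input such as 0xffffffea
    if [ "$v" -ge 2147483648 ]; then   # bit 31 set: negative in two's complement
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

to_signed32 0xffffffea                 # eax from the oops above; prints -22
```

-22 is -EINVAL (EINVAL is 22 in both the Linux and Xen errno numbering), so on this reading the VCPUOP_up hypercall returned -EINVAL.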

The rhel5pv guest in question has 1500MB of memory, 4 vCPUs, and is using PVFB.
I'll attach the full configuration file.
Comment 1 Chris Lalancette 2007-08-01 10:35:30 EDT
Created attachment 160417
Config file for PV guest
Comment 3 Chris Lalancette 2007-08-01 10:49:30 EDT
danpb pointed out that this is equivalent to just a "save" and "restore", so I
broke it down to a more basic case.  On machine1, I just ran:

xm create -c rhel5pv # domain running -37 kernel
xm save rhel5pv /var/lib/xen/save/rhel5pv-save
xm restore /var/lib/xen/save/rhel5pv-save

And I got the same crash.  So it doesn't really have to do with migration at
all, just with the restore path.
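The three commands above can be wrapped in a small reproducer. This is a sketch under stated assumptions, not part of the original report: it drops "-c" from "xm create" so the script does not attach to the console, waits a fixed BOOT_WAIT seconds (an assumed value) for the guest to boot before saving, and lets XM be overridden (for example XM=echo for a dry run). The function name reproduce is hypothetical.

```shell
#!/bin/sh
# Save/restore reproducer (sketch). Assumptions not taken from the report:
# a fixed boot wait, and an overridable XM command for dry runs.
XM=${XM:-xm}
BOOT_WAIT=${BOOT_WAIT:-60}

reproduce() {
    dom=$1    # guest name, e.g. rhel5pv (running the -37 kernel)
    img=$2    # save file, e.g. /var/lib/xen/save/rhel5pv-save
    $XM create "$dom" || return 1          # no -c: do not attach to the console
    sleep "$BOOT_WAIT"                     # let the guest finish booting
    $XM save "$dom" "$img" || return 1     # the source half of a migration
    $XM restore "$img" || return 1         # the guest crashes after this step
    echo "restored; now run: xm console $dom"
}
```

For example, "reproduce rhel5pv /var/lib/xen/save/rhel5pv-save"; on an affected guest kernel, "xm console rhel5pv" should then show the smpboot.c:417 BUG from the description.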

Chris Lalancette
Comment 4 Chris Lalancette 2007-08-01 11:24:10 EDT
Grr.  I may have missed a patch when pushing the save/restore fixes to Gerd for
5.1.  I'm going to test adding that patch back in and see if things improve.

Chris Lalancette
Comment 5 Chris Lalancette 2007-08-01 11:56:47 EDT
Yep, that was it.  I added that patch back in, and things started working again.
I'm still seeing "softlockup" warnings after the restore, but I also saw those
on RHEL5 GA, so they are not new.  I'll roll up the patch and post it soon.

Chris Lalancette
Comment 8 Chris Lalancette 2007-08-01 16:58:32 EDT
Created attachment 160464
Patch that fixes the crash for me
Comment 9 Don Zickus 2007-08-15 15:06:02 EDT
Fix included in kernel 2.6.18-40.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
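To check whether a given guest kernel already carries the fix, its release string (as printed by "uname -r", e.g. 2.6.18-37.el5xen from the oops above) can be compared against build 40. A minimal sketch; the function name has_fix is hypothetical, and it assumes the "2.6.18-N.el5[xen]" naming used in this report and that later builds keep the patch:

```shell
#!/bin/sh
# Succeeds if a RHEL 5 kernel release string is at or past 2.6.18-40.
# Hypothetical helper; assumes the "2.6.18-N.el5[xen]" naming seen in this report.
has_fix() {
    rel=${1:-$(uname -r)}            # default: the running kernel
    case $rel in
        2.6.18-*) ;;                 # only this kernel series is handled
        *) return 2 ;;
    esac
    n=${rel#2.6.18-}                 # strip the base version, e.g. "37.el5xen"
    n=${n%%.*}                       # keep the build number, e.g. "37"
    case $n in
        ''|*[!0-9]*) return 2 ;;     # not a plain number: unknown format
    esac
    [ "$n" -ge 40 ]
}
```

has_fix returns 0 for 2.6.18-40 and later, 1 for earlier builds, and 2 for strings it cannot parse. A result of 1 does not by itself mean a kernel is affected: the RHEL5 GA kernel predates build 40 but migrated fine in the original report.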
Comment 12 errata-xmlrpc 2007-11-07 14:57:21 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
