Bug 323031 - kernel BUG at mm/memory.c:2292 while/after xen live migration
Summary: kernel BUG at mm/memory.c:2292 while/after xen live migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: 7
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Eduardo Habkost
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 245314
Depends On:
Blocks:
Reported: 2007-10-08 11:42 UTC by Markus Schade
Modified: 2009-12-14 20:41 UTC

Fixed In Version: 2.6.20-2943.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-29 01:46:07 UTC
Type: ---
Embargoed:


Description Markus Schade 2007-10-08 11:42:15 UTC
Description of problem:
The kernel reports a BUG at mm/memory.c on the target system upon
"xm migrate --live" of an HVM guest between two identical machines. The guest
OS is running from an iSCSI target (exported from a third machine).

Version-Release number of selected component (if applicable): 2931.fc7,
2934.fc7, 2936.fc7

How reproducible:
frequently

Steps to Reproduce:
1. use the iSCSI initiator to log in to the LUN on both hosts
2. start hvm guest (wait till finished booting)
3. initiate a live migration (xm migrate --live guest target)
  
Actual results:
The migration starts, and just before or during its final stage the
BUG occurs, sending the guest OS to nirvana

Expected results:
no BUG causing crash of guest ;-)

Additional info:
------------[ cut here ]------------
kernel BUG at mm/memory.c:2292!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/misc/evtchn/dev
Modules linked in: tun crc32c libcrc32c nfs lockd nfs_acl netbk xenblktap blkbk
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack
nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4
nls_utf8 cifs sunrpc ipv6 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_multipath video sbs i2c_ec
dock button battery asus_acpi backlight ac snd_hda_intel snd_hda_codec
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm e1000 snd_timer i2c_i801 snd soundcore snd_page_alloc
i2c_core serial_core serio_raw sr_mod cdrom pcspkr sg dm_snapshot dm_zero
dm_mirror dm_mod pata_marvell ata_generic ata_piix libata sd_mod scsi_mod ext3
jbd mbcache ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0061:[<c105c740>]    Not tainted VLI
EFLAGS: 00010202   (2.6.20-2936.fc7xen #1)
EIP is at __handle_mm_fault+0x29c/0x1169
eax: c12e48b8   ebx: 00000000   ecx: 00000000   edx: 00000178
esi: eb6b4cd4   edi: 00000000   ebp: eb06af10   esp: eb06ae0c
ds: 007b   es: 007b   ss: 0069
Process qemu-dm (pid: 3718, ti=eb06a000 task=ebb174f0 task.ti=eb06a000)
Stack: 00000000 00697d60 c12cbe0c c12cbd80 eb191000 00000001 fffffff2 eb06ae30
       c10529e5 b522f000 eb6b4cd4 c0795a80 00000000 00000000 00000000 00000000
       00000000 00000000 c2063000 c1697d48 eb190178 c0795a80 eb191000 00000001
Call Trace:
 [<c1005e65>] show_trace_log_lvl+0x1a/0x2f
 [<c1005f15>] show_stack_log_lvl+0x9b/0xa3
 [<c10060b1>] show_registers+0x194/0x26a
 [<c10062c1>] die+0x13a/0x24f
 [<c11fb943>] do_trap+0x79/0x91
 [<c100686a>] do_invalid_op+0x97/0xa1
 [<c11fb82d>] error_code+0x35/0x3c
 [<c11fd168>] do_page_fault+0x70a/0xbe0
 [<c11fb82d>] error_code+0x35/0x3c
 =======================
Code: 07 00 00 8b b5 24 ff ff ff 8b 46 44 85 c0 0f 84 9a 04 00 00 83 78 08 00 0f
84 54 04 00 00 c7 45 f0 02 00 00 00 f6 46 19 04 74 04 <0f> 0b eb fe 8b bd 24 ff
ff ff 8b 47 4c c7 85 6c ff ff ff 00 00
EIP: [<c105c740>] __handle_mm_fault+0x29c/0x1169 SS:ESP 0069:eb06ae0c
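
For triage, the key fields of an oops like the one above can be pulled out mechanically. The following is a minimal sketch in Python, not part of the original report; the helper name summarize_oops and the shortened OOPS excerpt are illustrative assumptions. It extracts the BUG location and the faulting function, and checks whether the bytes marked with <..> in the Code: line are 0f 0b, the x86 ud2 instruction that BUG() compiles to and that explains the "invalid opcode" trap.

```python
import re

# Excerpt of the oops text from this report (module list and stack omitted).
OOPS = """\
kernel BUG at mm/memory.c:2292!
invalid opcode: 0000 [#1]
EIP is at __handle_mm_fault+0x29c/0x1169
Code: 07 00 00 8b b5 24 ff ff ff 8b 46 44 85 c0 0f 84 9a 04 00 00 83 78 08 00 0f
84 54 04 00 00 c7 45 f0 02 00 00 00 f6 46 19 04 74 04 <0f> 0b eb fe 8b bd 24 ff
ff ff 8b 47 4c c7 85 6c ff ff ff 00 00
EIP: [<c105c740>] __handle_mm_fault+0x29c/0x1169 SS:ESP 0069:eb06ae0c
"""


def summarize_oops(text):
    """Return the BUG location, faulting function, and a ud2 check (hypothetical helper)."""
    bug = re.search(r"kernel BUG at (\S+):(\d+)!", text)
    eip = re.search(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    # The Code: dump may wrap across lines; \s+ also matches the line breaks.
    code = re.search(r"Code:((?:\s+<?[0-9a-f]{2}>?)+)", text)
    tokens = code.group(1).split()
    # The byte wrapped in <..> is the one at the faulting EIP.
    marked = next(n for n, tok in enumerate(tokens) if tok.startswith("<"))
    at_eip = [tokens[marked].strip("<>")] + tokens[marked + 1:marked + 2]
    return {
        "file": bug.group(1),
        "line": int(bug.group(2)),
        "function": eip.group(1),
        "offset": int(eip.group(2), 16),
        "is_ud2": at_eip == ["0f", "0b"],
    }


if __name__ == "__main__":
    print(summarize_oops(OOPS))
```
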

Comment 1 Eduardo Habkost 2007-10-08 13:22:09 UTC
It seems to be the same BUG_ON() as in bug #254208.

Comment 2 Eduardo Habkost 2007-10-08 16:36:48 UTC
Could you test the package at:
http://koji.fedoraproject.org/koji/taskinfo?taskID=186943

It includes a fix for the Oops you are getting.

It is possible that the problem moves to user-space, as the expected behaviour
instead of the Oops is the process getting a SIGBUS, and I don't know if
qemu-dm is really expecting a SIGBUS.
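
As a generic illustration of what "getting a SIGBUS" means for a process that does not handle it, here is a minimal Python sketch. It is not the qemu-dm code path and not the kernel fix: it assumes a Linux host and provokes SIGBUS the textbook way, by touching a file-backed mapping after the file has been truncated. The names CHILD and run_child are made up for this example.

```python
import signal
import subprocess
import sys
import tempfile
import textwrap

# Child program: map one page of a file, truncate the file, touch the page.
CHILD = textwrap.dedent("""
    import mmap, os, sys
    fd = os.open(sys.argv[1], os.O_RDWR)
    os.ftruncate(fd, mmap.PAGESIZE)
    m = mmap.mmap(fd, mmap.PAGESIZE)
    os.ftruncate(fd, 0)   # the mapped page has no backing store any more
    m[0]                  # the access faults and the kernel delivers SIGBUS
""")


def run_child():
    """Run the child and return its exit status (negative means killed by a signal)."""
    with tempfile.NamedTemporaryFile() as f:
        proc = subprocess.run([sys.executable, "-c", CHILD, f.name])
    return proc.returncode


if __name__ == "__main__":
    rc = run_child()
    if rc == -signal.SIGBUS:
        print("child had no SIGBUS handler and was terminated by the signal")
    else:
        print("child exited with status", rc)
```

A process that wants to survive such a fault has to install a SIGBUS handler of its own; without one, the default action terminates it.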

Comment 3 Markus Schade 2007-10-09 08:18:59 UTC
The BUG doesn't occur anymore. But while the migration is now successful most of
the time, it occasionally leaves a dead (no status, all dashes) guest behind
without any visible error in any log. However, I guess this is a problem with xen,
not the kernel.

Comment 4 Eduardo Habkost 2007-10-09 12:32:06 UTC
I am marking this as MODIFIED, as the fix is in CVS and will go into the next
F-7 update.

This bug can be cloned to the 'xen' component for the dead domain problem.

When the dead guest appears, is the migration successful, or is it aborted?

Comment 5 Markus Schade 2007-10-09 12:49:33 UTC
The dead guest is on the target, so the migration itself is successful.
Thanks for the fix!

Comment 6 Eduardo Habkost 2007-10-15 18:08:33 UTC
*** Bug 245314 has been marked as a duplicate of this bug. ***

Comment 7 Fedora Update System 2007-11-09 23:38:29 UTC
kernel-xen-2.6-2.6.20-2943.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
su -c 'yum --enablerepo=updates-testing update kernel-xen-2.6'

Comment 8 Fedora Update System 2007-11-29 01:46:00 UTC
kernel-xen-2.6-2.6.20-2943.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.
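
To check whether a given host already runs a build with the fix, the release string can be compared against the fixed build number. This is a rough sketch, not an official tool: the function names are made up, and it assumes that Fedora 7 kernel-xen build numbers (2931, 2934, 2936, 2943 in this report) increase monotonically and that builds after 2943 keep the fix.

```python
import re

# Build number taken from "Fixed In Version: 2.6.20-2943.fc7" in this report.
FIXED_BUILD = 2943


def fedora_build(release):
    """Extract the build number from a string like '2.6.20-2936.fc7xen'."""
    m = re.search(r"-(\d+)\.fc7", release)
    if m is None:
        raise ValueError("not a Fedora 7 kernel release string: %r" % release)
    return int(m.group(1))


def has_fix(release):
    """True if the release is at or after the build that carries the fix (assumption)."""
    return fedora_build(release) >= FIXED_BUILD


if __name__ == "__main__":
    for rel in ("2.6.20-2936.fc7xen", "2.6.20-2943.fc7xen"):
        print(rel, "fixed" if has_fix(rel) else "affected")
```

On a live system the string to test would come from `uname -r` (or `platform.release()` in Python).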

