360791 – Live migration fails to complete

Summary: Live migration fails to complete

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	xen
Sub Component:
Version:	5.1
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Xen Maintainance List
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-31 19:37 UTC by Rodrigo roldan
Modified:	2010-11-09 13:34 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-10-22 15:14:07 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Rodrigo roldan 2007-10-31 19:37:27 UTC

Description of problem:
I can't migrate a VM.

Version-Release number of selected component (if applicable):
Kernel:2.6.18-53.el5xen

How reproducible:
root@xen1# xm migrate squid xen2

Steps to Reproduce:
  
Actual results:
migrating-Squid                           1      511     1 ---s--   3403.1

Expected results:
Squid                           1      511     1 r-----   3403.1

Additional info:
xm log:
[2007-10-31 16:30:52 xend.XendDomainInfo 13238] INFO (XendDomainInfo:941) Domain has shutdown: name=migrating-Nagios id=1 reason=suspend.
[2007-10-31 16:30:52 xend.XendDomainInfo 13238] INFO (XendDomainInfo:941) Domain has shutdown: name=migrating-Nagios id=1 reason=suspend.

And the VM never migrates.

Comment 1 Daniel Berrangé 2007-10-31 19:47:17 UTC

This is not nearly enough information to diagnose the problem. Please provide

  - /var/log/xen/xend.log   from the source host
  - /var/log/xen/xend-error.log  from the source host

  - /var/log/xen/xend.log   from the destination host
  - /var/log/xen/xend-error.log  from the destination host

  - The /etc/xen/[DOMAIN NAME]  config file for the guest in question
  - The /etc/xen/xend-config.sxp file from both hosts.

Comment 2 Rodrigo roldan 2007-10-31 20:37:24 UTC

/var/log/xen/xend.log from the destination host:
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1321, in check_name
    raise VmError("VM name '%s' already in use by domain %d" %
VmError: VM name 'Nagios' already in use by domain 3

But xm list on the destination host shows only Domain-0:
# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      733     8 r-----    438.4


The /etc/xen/xend-config.sxp file from both hosts:
Both hosts have the same config, differing only in IP address.

(xend-unix-server yes)
(xend-relocation-server yes)
(xend-port            8000)
(xend-relocation-port 8002)
(xend-address '10.10.1.147')
(xend-relocation-address '10.10.1.147')
(xend-relocation-hosts-allow '')
(network-script network-bridge)
(vif-script vif-bridge)
(dom0-min-mem 196)
(dom0-cpus 0)
(vnc-listen '10.10.1.147')
(vncpasswd 'xxxxxx')
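The relocation settings in a config like the one above can be compared between two hosts with a short script. This is only a minimal sketch, assuming the file consists of flat "(key value)" entries, one per line, as in the excerpt; nested s-expressions are not handled. The names parse_flat_sxp and relocation_summary are illustrative and are not part of xend.

```python
import re

# One flat "(key value)" entry per line, as in the excerpt above.
ENTRY_RE = re.compile(r"^\(\s*([\w-]+)\s+(.*?)\s*\)\s*$")


def parse_flat_sxp(text):
    """Parse flat (key value) lines into a dict; comments and other lines are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = ENTRY_RE.match(line)
        if m:
            key, value = m.groups()
            # Drop surrounding quotes so '' becomes an empty string.
            settings[key] = value.strip("'\"")
    return settings


def relocation_summary(settings):
    """Pick out the settings that concern migration between hosts."""
    keys = (
        "xend-relocation-server",
        "xend-relocation-port",
        "xend-relocation-address",
        "xend-relocation-hosts-allow",
    )
    return {k: settings.get(k) for k in keys}
```

Running relocation_summary(parse_flat_sxp(...)) on the file from each host and comparing the two dictionaries would show whether anything other than the address differs.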

Comment 3 Daniel Berrangé 2007-10-31 20:50:21 UTC

I need the *FULL* log files I asked for, not merely a couple of lines. Please
attach the full logs to this ticket. I also still need the guest configuration file.

Comment 4 Alain RICHARD 2007-12-24 09:14:08 UTC

I get the same problem: since the upgrade to 5.1, I am unable to successfully migrate from one server to another. The setup has not changed; only the 5.0 -> 5.1 upgrade was done (and a reboot under kernel 2.6.18-53.el5xen).

During the migration, I always get a problem during xm save (from /var/log/xen/xend.log):

[2007-12-24 09:53:18 xend 7487] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib/xen/bin/xc_save 22 3 0 0 1
[2007-12-24 09:53:18 xend 7487] INFO (XendCheckpoint:351) ERROR Internal error: Couldn't enable shadow mode
[2007-12-24 09:53:18 xend 7487] INFO (XendCheckpoint:351) Save exit rc=1
[2007-12-24 09:53:18 xend 7487] ERROR (XendCheckpoint:133) Save failed on domain cube1 (3).
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 110, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 339, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_save 22 3 0 0 1 failed
[2007-12-24 09:53:18 xend.XendDomainInfo 7487] DEBUG (XendDomainInfo:1598) XendDomainInfo.resumeDomain(3)
[2007-12-24 09:53:18 xend.XendDomainInfo 7487] WARNING (XendDomainInfo:923) Domain has crashed: name=migrating-cube1 id=3.
[2007-12-24 09:53:18 xend.XendDomainInfo 7487] INFO (XendDomainInfo:1719) Dev 51712 still active, looping...
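
For triage, the failure lines in a log excerpt like the one above can be pulled out with a short script. This is a minimal sketch, assuming every entry follows the layout seen in this comment ("[timestamp logger pid] LEVEL (module:line) message"); the function name and the regular expression are illustrative and not part of xend.

```python
import re

# Matches xend.log entries of the form seen above, e.g.
# [2007-12-24 09:53:18 xend 7487] ERROR (XendCheckpoint:133) Save failed ...
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<logger>\S+) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<module>\w+):(?P<lineno>\d+)\) (?P<msg>.*)$"
)


def find_failures(lines):
    """Return (timestamp, module, message) for entries that look like failures.

    An entry counts as a failure if its level is ERROR or WARNING, or if the
    message itself starts with "ERROR" (xc_save output is logged at INFO).
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # traceback lines and wrapped continuations are skipped
        if m["level"] in ("ERROR", "WARNING") or m["msg"].startswith("ERROR"):
            hits.append((m["ts"], m["module"], m["msg"]))
    return hits
```

Applied to the excerpt in this comment, such a filter would surface the "Couldn't enable shadow mode" line from xc_save, which is the first failure in the sequence.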

Comment 5 Alain RICHARD 2008-07-09 12:27:26 UTC

I have solved this problem by adding a dom0_mem parameter to the Xen kernel line in grub.conf on the domain 0 servers:

title CentOS (2.6.18-92.1.6.el5xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-92.1.6.el5 dom0_mem=256M com2=57600,8n1 console=com2
	module /vmlinuz-2.6.18-92.1.6.el5xen ro root=/dev/vg00/root xencons=xvc console=xvc0
	module /initrd-2.6.18-92.1.6.el5xen.img

It seems that without this parameter, physical memory is allocated automatically between dom0 and the domUs, and this appears to fail during migration.

With this parameter, dom0 receives only the memory it needs (here 256M) and I no longer get migration failures. Limiting dom0 memory this way is probably a better setting anyway.
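
To check whether a host already boots with this workaround, the hypervisor line can be inspected with a short script. This is a minimal sketch, assuming a GRUB legacy stanza laid out like the one above, where the Xen hypervisor is loaded by a "kernel" line whose image name contains "xen.gz"; the function name dom0_mem_settings is illustrative.

```python
import re


def dom0_mem_settings(grub_conf_text):
    """Return (image, dom0_mem value or None) for each Xen hypervisor line.

    Only lines whose first word is 'kernel' and whose image name contains
    'xen.gz' are considered, matching the stanza layout shown above.
    """
    results = []
    for raw in grub_conf_text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0] != "kernel" or "xen.gz" not in parts[1]:
            continue
        m = re.search(r"\bdom0_mem=(\S+)", raw)
        results.append((parts[1], m.group(1) if m else None))
    return results
```

A value of None for an entry means that stanza does not set dom0_mem.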

Comment 6 Daniel Berrangé 2008-07-09 12:44:16 UTC

Artificially limiting Dom0 is not an acceptable fix to this issue. To fix this
properly we need the complete log files and config files from a time immediately
after the migration failed:

  - /var/log/xen/xend.log   from the source host
  - /var/log/xen/xend-error.log  from the source host

  - /var/log/xen/xend.log   from the destination host
  - /var/log/xen/xend-error.log  from the destination host

  - The /etc/xen/[DOMAIN NAME]  config file for the guest in question
  - The /etc/xen/xend-config.sxp file from both hosts.

And also 'xm info' output from both nodes, and 'xm list --long' output from both
nodes.
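
The files requested above could be bundled into one archive per host before attaching them. This is a minimal sketch using Python's standard tarfile module; the function name, the archive name, and the GUESTNAME placeholder are illustrative, and the 'xm info' and 'xm list --long' output would still need to be saved separately.

```python
import os
import tarfile

# Paths requested in this comment; the guest config is added by the caller.
DEFAULT_PATHS = [
    "/var/log/xen/xend.log",
    "/var/log/xen/xend-error.log",
    "/etc/xen/xend-config.sxp",
]


def bundle_files(paths, archive_path):
    """Write the existing files from `paths` into a gzip-compressed tarball.

    Returns (added, missing) so the reporter can see what was left out.
    """
    added, missing = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path):
                tar.add(path)
                added.append(path)
            else:
                missing.append(path)
    return added, missing
```

On each host this might be called as bundle_files(DEFAULT_PATHS + ["/etc/xen/GUESTNAME"], "xen-diag.tar.gz"), with GUESTNAME replaced by the name of the guest's config file.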

Comment 7 Chris Lalancette 2009-06-16 08:14:47 UTC

Actually, we recently put a patch into RHEL-5.4 that reduces the likelihood of live migration failing due to fragmentation (https://bugzilla.redhat.com/show_bug.cgi?id=469130).  Given that limiting dom0 memory helped, this actually could explain this situation.  Is there any chance one of the original reporters can boot their dom0 with the latest kernel here:

http://people.redhat.com/dzickus/el5/

And see if it improves the situation?

Chris Lalancette

Comment 8 Chris Lalancette 2009-10-22 15:14:07 UTC

No response from the reporters in many months, and I believe this issue is now fixed in 5.4.  I'm going to close this out as CURRENTRELEASE; if it is still a problem, please feel free to reopen the bug.

Chris Lalancette

Comment 9 Paolo Bonzini 2010-04-08 15:51:13 UTC

This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).
