Bug 360791 - Live migration fails to complete
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.1
Hardware: All Linux
Priority: low Severity: high
Assigned To: Xen Maintainance List
QA Contact: Virtualization Bugs
Reported: 2007-10-31 15:37 EDT by Rodrigo roldan
Modified: 2010-11-09 08:34 EST

Doc Type: Bug Fix
Last Closed: 2009-10-22 11:14:07 EDT


Attachments: None
Description Rodrigo roldan 2007-10-31 15:37:27 EDT
Description of problem:
I can't migrate a VM.

Version-Release number of selected component (if applicable):
Kernel:2.6.18-53.el5xen

How reproducible:
root@xen1# xm migrate squid xen2

Steps to Reproduce:
  
Actual results:
migrating-Squid                           1      511     1 ---s--   3403.1

Expected results:
Squid                           1      511     1 r-----   3403.1
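
For readers comparing the two State columns above, here is a minimal sketch of how the six-character field can be read. It assumes the usual xm list flag order (r, b, p, s, c, d); decode_state is a hypothetical helper written for illustration, not part of xm.

```python
# Flag order assumed from the xm list State column: r b p s c d.
STATE_FLAGS = (
    ("r", "running"),
    ("b", "blocked"),
    ("p", "paused"),
    ("s", "shutdown"),
    ("c", "crashed"),
    ("d", "dying"),
)


def decode_state(state):
    """Return the names of the flags set in a six-character State string."""
    if len(state) != len(STATE_FLAGS):
        raise ValueError("expected %d characters, got %r" % (len(STATE_FLAGS), state))
    names = []
    for ch, (flag, name) in zip(state, STATE_FLAGS):
        if ch == flag:
            names.append(name)
        elif ch != "-":
            raise ValueError("unexpected character %r in %r" % (ch, state))
    return names


print(decode_state("---s--"))  # state shown under "Actual results"
print(decode_state("r-----"))  # state shown under "Expected results"
```

Read this way, the domain under "Actual results" is flagged as shut down rather than running, which matches the reason=suspend entries in the log below.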

Additional info:
xm log
[2007-10-31 16:30:52 xend.XendDomainInfo 13238] INFO (XendDomainInfo:941) Domain has shutdown: name=migrating-Nagios id=1 reason=suspend.
[2007-10-31 16:30:52 xend.XendDomainInfo 13238] INFO (XendDomainInfo:941) Domain has shutdown: name=migrating-Nagios id=1 reason=suspend.

And the VM never migrates.
Comment 1 Daniel Berrange 2007-10-31 15:47:17 EDT
This is not nearly enough information to diagnose the problem. Please provide

  - /var/log/xen/xend.log   from the source host
  - /var/log/xen/xend-error.log  from the source host

  - /var/log/xen/xend.log   from the destination host
  - /var/log/xen/xend-error.log  from the destination host

  - The /etc/xen/[DOMAIN NAME]  config file for the guest in question
  - The /etc/xen/xend-config.sxp file from both hosts.
Comment 2 Rodrigo roldan 2007-10-31 16:37:24 EDT
/var/log/xen/xend.log from the destination host:
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1321, in check_name
    raise VmError("VM name '%s' already in use by domain %d" %
VmError: VM name 'Nagios' already in use by domain 3

But xm list on the destination host shows only Domain-0:
# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      733     8 r-----    438.4


The /etc/xen/xend-config.sxp file from both hosts.
Both hosts have the same config, apart from their IP addresses.

(xend-unix-server yes)
(xend-relocation-server yes)
(xend-port            8000)
(xend-relocation-port 8002)
(xend-address '10.10.1.147')
(xend-relocation-address '10.10.1.147')
(xend-relocation-hosts-allow '')
(network-script network-bridge)
(vif-script vif-bridge)
(dom0-min-mem 196)
(dom0-cpus 0)
(vnc-listen '10.10.1.147')
(vncpasswd 'xxxxxx')
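
Since the claim is that both hosts differ only by IP address, one way to check that is to parse each xend-config.sxp and compare the settings. The following is a minimal sketch under stated assumptions: parse_sxp and diff_settings are hypothetical helpers, the parser only handles flat one-line "(key value)" entries like those above, and the second host address is a made-up placeholder.

```python
import re

# Matches one flat "(key value)" line; nested s-expressions are not handled.
LINE_RE = re.compile(r"^\(\s*([\w-]+)\s+(.*?)\s*\)\s*$")


def parse_sxp(text):
    """Return a dict of the flat (key value) settings found in text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip("'\"")
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


HOST_A = """
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '10.10.1.147')
"""
# Placeholder address for the other host; not taken from the report.
HOST_B = HOST_A.replace("10.10.1.147", "10.10.1.148")

print(diff_settings(parse_sxp(HOST_A), parse_sxp(HOST_B)))
```

Any key in the result other than the address settings would point to a real difference between the two hosts.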
Comment 3 Daniel Berrange 2007-10-31 16:50:21 EDT
I need the *FULL* log files I asked for, not merely a couple of lines. Please
attach the full logs to this ticket. I also still need the guest configuration file.
Comment 4 Alain RICHARD 2007-12-24 04:14:08 EST
I get the same problem: since the upgrade to 5.1, I am unable to migrate successfully from one server to another. The setup has not changed; only the 5.0 -> 5.1 upgrade was done (and a reboot under kernel 2.6.18-53.el5xen).

During the migration, I always get a failure during xm save (from /var/log/xen/xend.log):

[2007-12-24 09:53:18 xend 7487] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib/xen/bin/xc_save 22 3 0 0 1
[2007-12-24 09:53:18 xend 7487] INFO (XendCheckpoint:351) ERROR Internal error: Couldn't enable shadow mode
[2007-12-24 09:53:18 xend 7487] INFO (XendCheckpoint:351) Save exit rc=1
[2007-12-24 09:53:18 xend 7487] ERROR (XendCheckpoint:133) Save failed on domain cube1 (3).
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 110, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 339, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_save 22 3 0 0 1 failed
[2007-12-24 09:53:18 xend.XendDomainInfo 7487] DEBUG (XendDomainInfo:1598) XendDomainInfo.resumeDomain(3)
[2007-12-24 09:53:18 xend.XendDomainInfo 7487] WARNING (XendDomainInfo:923) Domain has crashed: name=migrating-cube1 id=3.
[2007-12-24 09:53:18 xend.XendDomainInfo 7487] INFO (XendDomainInfo:1719) Dev 51712 still active, looping...
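
When a full xend.log is long, the failing entries can be pulled out mechanically. This is a rough sketch that assumes every entry follows the one-line layout seen in the excerpt above ([timestamp logger pid] LEVEL (Module:line) message); parse_entry and find_problems are hypothetical helpers, and traceback lines are simply skipped.

```python
import re
from collections import namedtuple

Entry = namedtuple("Entry", "timestamp logger pid level source line message")

# Layout assumed from the excerpt: [timestamp logger pid] LEVEL (Module:line) message
ENTRY_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (\d+)\] "
    r"(\w+) \((\w+):(\d+)\) (.*)$"
)


def parse_entry(text):
    """Parse one log line into an Entry, or return None if it does not match."""
    match = ENTRY_RE.match(text.rstrip("\n"))
    if match is None:
        return None
    ts, logger, pid, level, source, line, message = match.groups()
    return Entry(ts, logger, int(pid), level, source, int(line), message)


def find_problems(lines):
    """Return entries logged at ERROR/WARNING, or whose message starts with ERROR."""
    problems = []
    for text in lines:
        entry = parse_entry(text)
        if entry is None:
            continue  # traceback lines and wrapped text are ignored
        if entry.level in ("ERROR", "WARNING") or entry.message.startswith("ERROR"):
            problems.append(entry)
    return problems


SAMPLE = [
    "[2007-12-24 09:53:18 xend 7487] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib/xen/bin/xc_save 22 3 0 0 1",
    "[2007-12-24 09:53:18 xend 7487] INFO (XendCheckpoint:351) ERROR Internal error: Couldn't enable shadow mode",
    "[2007-12-24 09:53:18 xend 7487] INFO (XendCheckpoint:351) Save exit rc=1",
    "[2007-12-24 09:53:18 xend 7487] ERROR (XendCheckpoint:133) Save failed on domain cube1 (3).",
]

for entry in find_problems(SAMPLE):
    print(entry.timestamp, entry.source, entry.line, entry.message)
```

The message check matters here because the xc_save failure is logged at INFO level with ERROR only in the message text.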

Comment 5 Alain RICHARD 2008-07-09 08:27:26 EDT
I have solved this problem by adding a dom0_mem parameter to the Xen kernel line in grub on the domain 0 servers:

title CentOS (2.6.18-92.1.6.el5xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-92.1.6.el5 dom0_mem=256M com2=57600,8n1 console=com2
	module /vmlinuz-2.6.18-92.1.6.el5xen ro root=/dev/vg00/root xencons=xvc console=xvc0
	module /initrd-2.6.18-92.1.6.el5xen.img

It seems that without this parameter, domain 0 tries to divide physical memory between dom0 and domU automatically, but this seems to fail during migration.

With this parameter, dom0 receives only the memory it needs (here 256M) and I don't get migration failures anymore. This is probably also a better setting for limiting dom0 memory.
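
To see whether a dom0 host already has this setting, the grub kernel line can be inspected. A minimal sketch, assuming the line has the same shape as the one quoted above; hypervisor_options is a hypothetical helper and only reports what is set, it does not change anything.

```python
def hypervisor_options(kernel_line):
    """Split a grub 'kernel' line into (image, {option: value})."""
    parts = kernel_line.split()
    if len(parts) < 2 or parts[0] != "kernel":
        raise ValueError("not a kernel line: %r" % kernel_line)
    image = parts[1]
    options = {}
    for token in parts[2:]:
        # Options without "=" are stored with an empty value.
        key, _, value = token.partition("=")
        options[key] = value
    return image, options


LINE = "kernel /xen.gz-2.6.18-92.1.6.el5 dom0_mem=256M com2=57600,8n1 console=com2"
image, options = hypervisor_options(LINE)
print(image, options.get("dom0_mem", "not set"))
```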

Comment 6 Daniel Berrange 2008-07-09 08:44:16 EDT
Artificially limiting Dom0 is not an acceptable fix to this issue. To fix this
properly we need the complete log files and config files from a time immediately
after the migration failed

  - /var/log/xen/xend.log   from the source host
  - /var/log/xen/xend-error.log  from the source host

  - /var/log/xen/xend.log   from the destination host
  - /var/log/xen/xend-error.log  from the destination host

  - The /etc/xen/[DOMAIN NAME]  config file for the guest in question
  - The /etc/xen/xend-config.sxp file from both hosts.

And also 'xm info' output from both nodes, and 'xm list --long' output from both
nodes.
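
The files requested above can be gathered into one archive per host before attaching them. This is only a sketch: requested_files and bundle are hypothetical helpers, the guest name is passed in by the caller, and the two xm commands are listed but not executed here, so their output still has to be captured separately on each node.

```python
import os
import tarfile


def requested_files(guest_name):
    """Return the log and config paths requested for one host."""
    return [
        "/var/log/xen/xend.log",
        "/var/log/xen/xend-error.log",
        os.path.join("/etc/xen", guest_name),
        "/etc/xen/xend-config.sxp",
    ]


# Commands whose output was also requested; run them by hand on each node.
REQUESTED_COMMANDS = [["xm", "info"], ["xm", "list", "--long"]]


def bundle(paths, archive_path):
    """Write the files that exist into a gzip tarball; return the missing paths."""
    missing = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path):
                tar.add(path)
            else:
                missing.append(path)
    return missing


# Example: list what would be collected for the guest named in the report.
for path in requested_files("squid"):
    print(path)
```

Calling bundle(requested_files("squid"), "xen1-diag.tar.gz") on each host, shortly after a failed migration, would produce one attachment per host; the archive name is just an example.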
Comment 7 Chris Lalancette 2009-06-16 04:14:47 EDT
Actually, we recently put a patch into RHEL-5.4 that reduces the likelihood of live migration failing due to fragmentation (https://bugzilla.redhat.com/show_bug.cgi?id=469130).  Given that limiting dom0 memory helped, this actually could explain this situation.  Is there any chance one of the original reporters can boot their dom0 with the latest kernel here:

http://people.redhat.com/dzickus/el5/

And see if it improves the situation?

Chris Lalancette
Comment 8 Chris Lalancette 2009-10-22 11:14:07 EDT
No response from the reporters in many months, and I believe this issue is now fixed in 5.4.  I'm going to close this out as CURRENTRELEASE; if it is still a problem, please feel free to reopen the bug.

Chris Lalancette
Comment 9 Paolo Bonzini 2010-04-08 11:51:13 EDT
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).
