Bug 513431

Summary: PV guest migrations happen even if there is no image file on the destination side.
Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.4
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Gurhan Ozen <gozen>
Assignee: Michal Novotny <minovotn>
QA Contact: Virtualization Bugs <virt-bugs>
CC: areis, casmith, clalance, jburke, leiwang, llim, minovotn, pbonzini, syeghiay, xen-maint, yuzhang
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: xen-3.0.3-112.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 22:17:58 UTC
Bug Blocks: 514498
Attachments:
  xend.log file from *source* machine
  Add check for disk image existence on restore/migrate

Description Gurhan Ozen 2009-07-23 15:55:34 UTC
Description of problem:
# xm migrate x86_64_hvm_guest tyan-gt24-07.rhts.bos.redhat.com
Error: /usr/lib64/xen/bin/xc_save 19 9 0 0 4 failed
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-r=MBIT, --resource=MBIT
                     Set level of resource usage for migration.


From xend.log:

[2009-07-23 11:43:57 xend 5146] DEBUG (XendCheckpoint:89) [xc_save]: /usr/lib64/xen/bin/xc_save 19 9 0 0 4
[2009-07-23 11:43:57 xend 5146] DEBUG (XendCheckpoint:324) suspend
[2009-07-23 11:43:57 xend 5146] DEBUG (XendCheckpoint:92) In saveInputHandler suspend
[2009-07-23 11:43:57 xend 5146] DEBUG (XendCheckpoint:94) Suspending 9 ...
[2009-07-23 11:43:57 xend.XendDomainInfo 5146] DEBUG (XendDomainInfo:1256) XendDomainInfo.handleShutdownWatch
[2009-07-23 11:43:57 xend.XendDomainInfo 5146] DEBUG (XendDomainInfo:1256) XendDomainInfo.handleShutdownWatch
[2009-07-23 11:43:57 xend.XendDomainInfo 5146] INFO (XendDomainInfo:1214) Domain has shutdown: name=migrating-x86_64_hvm_guest id=9 reason=suspend.
[2009-07-23 11:43:57 xend 5146] INFO (XendCheckpoint:99) Domain 9 suspended.
[2009-07-23 11:43:57 xend 5146] INFO (XendCheckpoint:104) release_devices for hvm domain
[2009-07-23 11:43:57 xend 5146] INFO (image:482) use sigusr1 to signal qemu 3044
[2009-07-23 11:43:57 xend 5146] DEBUG (XendCheckpoint:108) Written done
[2009-07-23 11:43:57 xend 5146] INFO (XendCheckpoint:353) Saving memory pages: iter 1   0%ERROR Internal error: Error when writing to state file (2) (errno 32)
[2009-07-23 11:43:57 xend 5146] INFO (XendCheckpoint:353) Save exit rc=1
[2009-07-23 11:43:57 xend 5146] ERROR (XendCheckpoint:133) Save failed on domain x86_64_hvm_guest (9).
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 110, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 341, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 19 9 0 0 4 failed
[2009-07-23 11:43:57 xend.XendDomainInfo 5146] DEBUG (XendDomainInfo:2095) XendDomainInfo.resumeDomain(9)
[2009-07-23 11:43:57 xend 5146] DEBUG (XendCheckpoint:136) XendCheckpoint.save: resumeDomain


This seems to be a recent regression, as migration worked as recently as the 0715.0 trees. It only happens with HVM guests; paravirt guests seem to be fine. Also, even though the error message points to the save stage, I can successfully save/restore these domains. I ran into the same issue with both the xen and libvirt APIs.

Another bit of info: after the migration failure, the guests seem to crash:
# xm console x86_64_hvm_guest
xenconsole: Could not open tty `/dev/pts/1': No such file or directory


Version-Release number of selected component (if applicable):
# rpm -qa | grep xen 
xen-3.0.3-90.el5
kernel-xen-2.6.18-159.el5
xen-libs-3.0.3-90.el5
xen-libs-3.0.3-90.el5
kernel-xen-devel-2.6.18-159.el5


How reproducible:
Every time.

Steps to Reproduce:
1. Install the latest nightly. Install hvm guests. Try to migrate the hvm guests.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Jiri Denemark 2009-07-24 09:53:55 UTC
Hmm, I can't reproduce it... Could you provide xend.log from the target machine? When migrating, xend.log from the source machine is rarely useful.

Thanks

Comment 3 Paolo Bonzini 2009-07-24 11:19:39 UTC
I can reproduce this.  After migration, the guest is indeed in a "confused" state:

Name                                      ID Mem(MiB) VCPUs State   Time(s)
RHEL5-64-HVM                               2      511     1 ------      0.4

Comment 4 Paolo Bonzini 2009-07-24 11:21:26 UTC
Created attachment 355000 [details]
xend.log file from *source* machine

Comment 5 Paolo Bonzini 2009-07-24 11:42:53 UTC
I stand corrected, I cannot reproduce this.  The target machine didn't have VT enabled.  The xend.log file there is quite clear:

[2009-07-24 13:17:26 xend 5739] ERROR (XendDomain:285) Restore failed
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 280, in domain_restore_fd
    return XendCheckpoint.restore(self, fd, relocating=relocating)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 166, in restore
    dominfo = xd.restore_(vmconfig, relocating)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 305, in restore_
    dominfo = XendDomainInfo.restore(config, relocating)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 292, in restore
    vm.construct()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1787, in construct
    raise VmError("HVM guest support is unavailable: is VT/AMD-V "
VmError: HVM guest support is unavailable: is VT/AMD-V supported by your CPU and enabled in your BIOS?

Maybe something similar is happening for you?

Comment 6 Paolo Bonzini 2009-07-24 11:43:26 UTC
I'm creating another bug for the "confusion upon failed migration".

Comment 7 Gurhan Ozen 2009-07-24 14:49:14 UTC
(In reply to comment #5)
>     raise VmError("HVM guest support is unavailable: is VT/AMD-V "
> VmError: HVM guest support is unavailable: is VT/AMD-V supported by your CPU
> and enabled in your BIOS?
> 
> Maybe something similar is happening for you?  

My tests were run over RHTS, which was scheduled to run on machines that were HVM enabled. However, I'll go in and check to make sure, grab xend.log from the destination machine, and update the BZ.

Comment 8 Gurhan Ozen 2009-07-24 15:44:44 UTC
(In reply to comment #7)
> (In reply to comment #5)
> >     raise VmError("HVM guest support is unavailable: is VT/AMD-V "
> > VmError: HVM guest support is unavailable: is VT/AMD-V supported by your CPU
> > and enabled in your BIOS?
> > 
> > Maybe something similar is happening for you?  
> 
> My tests were run over rhts which was scheduled to run in machines that were
> hvm enabled. However, i'll go in and check to ensure and also to get xend.log
> from the destination machine and will update the BZ.  

Oy.. Looks like the setup was wrong: both sides had HVM-capable machines, but during installation the destination didn't get the shared drive with the image files set up correctly.
  What really confused me was that the paravirt migrations still happen, even though their images aren't on the destination machine either. How could this happen? Shouldn't it error out like the HVM guests do?
  Upon closer inspection, the paravirt guests aren't usable after the migration: you can connect to the console but can't do much; nonetheless the migration still completes. Should this be an issue? Paolo, what's the number of the other bug you've created?

Comment 9 Paolo Bonzini 2009-07-24 16:29:49 UTC
Jiri told me he was already aware of that bug, it is https://bugzilla.redhat.com/show_bug.cgi?id=486308

Comment 11 Paolo Bonzini 2009-07-25 01:15:57 UTC
Gurhan, are you really sure this is a regression (since you changed the summary)?

Comment 12 Jiri Denemark 2009-07-27 12:50:24 UTC
Yeah, since this is a completely different bug which has been with us for-probably-ever, we should remove Regression keyword... Any objections, Gurhan?

Comment 14 Gurhan Ozen 2009-07-27 14:58:17 UTC
(In reply to comment #12)
> Yeah, since this is a completely different bug which has been with us
> for-probably-ever, we should remove Regression keyword... Any objections,
> Gurhan?  

Nope not at all, sorry i forgot to remove the keyword when i made changes to the BZ.

Comment 16 Michal Novotny 2010-03-22 13:04:36 UTC
So, does this affect PV or HVM guests? The comments mention HVM guests, but the title refers to PV guest migration.

Thanks,
Michal

Comment 18 Michal Novotny 2010-06-09 11:46:56 UTC
Well, I was able to reproduce it, so this is still an issue. Unfortunately the situation is not that simple: the guest memory gets transferred to the target machine, but then the restore fails because the device could not be connected:

[2010-06-09 13:12:57 xend 4400] ERROR (XendCheckpoint:279) Device 51728 (vbd) could not be connected. /home2/test.img does not exist.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 277, in restore
    dominfo.waitForDevices() # Wait for backends to set up
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2423, in waitForDevices
    self.waitForDevices_(c)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1442, in waitForDevices_
    return self.getDeviceController(deviceClass).waitForDevices()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 162, in waitForDevices
    return map(self.waitForDevice, self.deviceIDs())
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 183, in waitForDevice
    raise VmError("Device %s (%s) could not be connected. "
VmError: Device 51728 (vbd) could not be connected. /home2/test.img does not exist.

The guest kept working for me because I had been using the image as an xvdb device, and that device was simply absent in the guest; with the boot disk inaccessible, the guest would not work at all. However, since the configuration data is transferred at the beginning of the connection (before the guest memory), the check should be done there, and an error could be returned before any memory is transferred, so the guest is not migrated and resumes on the source machine. I'll investigate this further.

Michal

Comment 19 Michal Novotny 2010-06-09 12:38:55 UTC
Created attachment 422529 [details]
Add check for disk image existence on restore/migrate

Well, investigation revealed a good place to fix it: the VM configuration is transferred before the guest's memory, so I defined a function that parses the disk unames for vbd and tap devices and checks that the files exist. If a file doesn't exist, an error is returned, as can be seen on the following lines:

[2010-06-09 14:22:20 xend 16856] ERROR (XendDomain:284) Restore failed
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 279, in domain_restore_fd
    return XendCheckpoint.restore(self, fd, relocating=relocating)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 182, in restore
    check_file_exists(vmconfig)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 160, in check_file_exists
    raise XendError("Disk image %s doesn't exist" % fn)
XendError: Disk image /home2/test.img doesn't exist
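A minimal sketch of such a check, assuming hypothetical helper names (`disk_unames`, `check_files_exist`); the actual patch is in the attachment above. The VM configuration is an SXP-style nested list, so the helper walks the top-level entries looking for `device` nodes of class `tap` or `vbd` and extracts the path from their `uname` fields:

```python
import os

def disk_unames(vmconfig):
    """Collect file paths from 'uname' entries of tap/vbd devices.

    vmconfig is an SXP-style nested list as used by xend, e.g.
    ['vm', ['device', ['vbd', ['uname', 'file:/home2/test.img'], ...]], ...]
    """
    paths = []
    for x in vmconfig:
        if type(x) == list and x[0] == 'device' and x[1][0] in ('tap', 'vbd'):
            for p in x[1]:
                if type(p) == list and p[0] == 'uname':
                    # unames look like 'file:/path' or 'tap:aio:/path';
                    # the path is everything after the last ':'
                    paths.append(p[1].split(':')[-1])
    return paths

def check_files_exist(vmconfig):
    """Raise an error if any referenced disk image is missing."""
    for fn in disk_unames(vmconfig):
        if not os.path.exists(fn):
            raise RuntimeError("Disk image %s doesn't exist" % fn)
```

Because the restore side parses the configuration before any memory pages arrive, calling such a check at the top of restore lets the source keep (and resume) the domain instead of losing it mid-transfer.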

It's been tested with both PV and HVM guests and works fine.

Upstream doesn't seem to have this fixed, but unfortunately I don't have two machines with working upstream Xen, so I need to install a second machine with upstream Xen 4.1 to see whether the issue exists there.

Michal

Comment 20 Paolo Bonzini 2010-06-10 13:48:13 UTC
+        if (type(x) != str):
+            if x[0] == 'device':
+                if x[1][0] in ('tap', 'vbd'):

 if type(x) != str and x[0] == 'device' and x[1][0] in ('tap', 'vbd'):

?

or maybe

 if type(x) == list and x[0] == 'device' and x[1][0] in ('tap', 'vbd'):

+                        if (type(p) != str):
+                            if p[0] == 'uname':

Likewise,

 if type(p) == list and p[0] == 'uname':

would help save some indentation. Otherwise looks fine!
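The suggestion above can be illustrated with a small sketch (hypothetical helper name): because `and` short-circuits left to right, the type check runs first, so string entries never reach the indexing and no nesting is needed:

```python
def is_disk_device(x):
    # Equivalent to the three nested ifs in the patch: the type check
    # short-circuits, so x[0] and x[1][0] are only evaluated for lists.
    return type(x) == list and x[0] == 'device' and x[1][0] in ('tap', 'vbd')
```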

Comment 27 errata-xmlrpc 2011-01-13 22:17:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html