Bug 630989

Summary: HVM guest w/ UP and PV driver hangs after live migration or suspend/resume [rhel-5.5.z]
Product: Red Hat Enterprise Linux 5
Reporter: RHEL Program Management <pm-rhel>
Component: kernel-xen
Assignee: Jiri Pirko <jpirko>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: urgent
Docs Contact:
Version: 5.5
CC: dhoward, dmair, drjones, jpirko, jwest, leiwang, mjenner, moshiro, mrezanin, myamazak, plyons, pm-eus, rkhan, tao, xen-maint
Target Milestone: rc
Target Release: ---
Keywords: ZStream
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Previously, migrating a uniprocessor (UP) hardware virtual machine (HVM) guest with paravirtualized (PV) drivers may have caused the guest to stop responding. With this update, HVM guest migration works as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-09 18:07:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 629773
Bug Blocks:

Description RHEL Program Management 2010-09-07 14:51:06 UTC
This bug has been copied from bug #629773 and has been proposed
to be backported to 5.5 z-stream (EUS).

Comment 2 Jiri Pirko 2010-10-11 08:55:24 UTC
in kernel 2.6.18-194.20.1.el5

xen-hvm-fix-up-suspend-resume-migration-w-pv-drivers.patch
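To check whether a given host already carries the fix, a minimal sketch (assuming the patch name or bug number appears in the package changelog, which is the usual convention for RHEL kernels; the grep pattern is an assumption based on the patch name above):

# Confirm the installed kernel-xen build and look for the fix in its changelog.
rpm -q kernel-xen
rpm -q --changelog kernel-xen | grep -i 'suspend-resume-migration'
# Make sure the host actually booted the fixed kernel.
uname -r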

Comment 4 Lei Wang 2010-10-27 10:17:14 UTC
I can reproduce and verify this bug on the x86_64 platform; details below:

Reproduce this issue with:
host:
RHEL-5.5 x86_64
kernel-xen-2.6.18-194.el5

guest:
RHEL-5.5 x86_64
guest with one vCPU and a netfront vif.
(Could not reproduce this issue with a 32-bit guest.)

The HVM guest hangs 100% of the time after restore.
The HVM guest hangs after migration approximately 50% of the time.
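For reference, a minimal guest configuration sketch matching the setup above (file name, memory size, disk path, and bridge are hypothetical; only vcpus = 1 and the netfront vif come from this report):

# /etc/xen/hvm-up-guest -- hypothetical domain config for an HVM guest with PV drivers
kernel  = "/usr/lib/xen/boot/hvmloader"
builder = "hvm"
name    = "hvm-up-guest"
memory  = 1024
vcpus   = 1                                  # UP guest, as described above
vif     = [ "type=netfront, bridge=xenbr0" ] # PV network frontend in the HVM guest
disk    = [ "file:/var/lib/xen/images/hvm-up-guest.img,hda,w" ]
boot    = "c"
vnc     = 1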

Verified this issue with:
kernel-xen-2.6.18-194.24.1.el5
Save/restore and migration work correctly.
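The save/restore and migration cycles above follow the standard xm workflow; a minimal sketch (guest name, save file, and destination host are hypothetical):

# Save/restore cycle -- guest hangs 100% of the time on the unfixed kernel.
xm save hvm-up-guest /tmp/hvm-up-guest.save
xm restore /tmp/hvm-up-guest.save

# Live migration -- guest hangs roughly 50% of the time on the unfixed kernel.
xm migrate --live hvm-up-guest dest-host.example.com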
======================================================
But on an IA64 host with 2.6.18-194.24.1.el5xen, restoring the HVM guest from an image produced the following error:

[root@dhcp-66-82-141 bug630989]# xm restore vm3.save
Error: Restore failed
Usage: xm restore <CheckpointFile>

Restore a domain from a saved state.

Found "ERROR Internal error: HVM Restore is unsupported" in xend.log
... ...
[2010-09-18 02:05:27 xend 2879] INFO (XendCheckpoint:181) restore hvm domain 7, apic=0, pae=0
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: boot, val: dc
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: fda, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: fdb, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: soundhw, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: localtime, val: 0
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: serial, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: std-vga, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: isa, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: vcpus, val: 1
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: acpi, val: 1
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: usb, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: usbdevice, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:330) args: k, val: None
[2010-09-18 02:05:27 xend 2879] DEBUG (image:390) No VNC passwd configured for vfb access
[2010-09-18 02:05:27 xend 2879] DEBUG (XendCheckpoint:200) restore:shadow=0x0, _static_max=0x400, _static_min=0x400,
[2010-09-18 02:05:27 xend 2879] DEBUG (balloon:145) Balloon: 3076320 KiB free; need 1065024; done.
[2010-09-18 02:05:27 xend 2879] DEBUG (XendCheckpoint:217) [xc_restore]: /usr/lib/xen/bin/xc_restore 19 7 1 2 1 0 0
[2010-09-18 02:05:27 xend 2879] INFO (XendCheckpoint:353) ERROR Internal error: HVM Restore is unsupported
[2010-09-18 02:05:27 xend 2879] INFO (XendCheckpoint:353) Restore exit with rc=1
[2010-09-18 02:05:27 xend.XendDomainInfo 2879] DEBUG (XendDomainInfo:2189) XendDomainInfo.destroy: domid=7
[2010-09-18 02:05:27 xend.XendDomainInfo 2879] ERROR (XendDomainInfo:2198) XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2194, in destroy
    xc.domain_pause(self.domid)
Error: (3, 'No such process')
[2010-09-18 02:05:27 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:27 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:27 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] INFO (XendDomainInfo:2330) Dev 768 still active, looping...
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] DEBUG (XendDomainInfo:2114) UUID Created: True
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] DEBUG (XendDomainInfo:2115) Devices to release: [5], domid = 7
[2010-09-18 02:05:28 xend.XendDomainInfo 2879] DEBUG (XendDomainInfo:2127) Releasing PVFB backend devices ...
[2010-09-18 02:05:28 xend 2879] ERROR (XendDomain:284) Restore failed
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 279, in domain_restore_fd
    return XendCheckpoint.restore(self, fd, relocating=relocating)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 221, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 341, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_restore 19 7 1 2 1 0 0 failed
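For reference, the relevant lines above can be pulled straight from the daemon log (the path is the RHEL 5 default):

grep -n 'HVM Restore is unsupported' /var/log/xen/xend.log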

Comment 5 Lei Wang 2010-10-27 10:29:23 UTC
Hi Miroslav,

Would you please help confirm this issue?
Do we need to verify this issue on the IA64 platform?
I saw "Architecture: x86, IA64" in the Description of bug 629773, from which this bug was cloned.
If not, I think this bug can be verified based on the x86_64 test results in Comment 4.

Thanks
Lei Wang

Comment 6 Miroslav Rezanina 2010-10-27 11:03:57 UTC
No, this is not an issue on IA64, as HVM save/migrate is not supported there.

Comment 7 Lei Wang 2010-10-28 01:47:26 UTC
Thanks, Miroslav

According to comment 4 and comment 6, moving to VERIFIED.

Comment 9 errata-xmlrpc 2010-11-09 18:07:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0839.html

Comment 10 Martin Prpič 2010-11-11 13:56:45 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, migrating a uniprocessor (UP) hardware virtual machine (HVM) guest with paravirtualized (PV) drivers may have caused the guest to stop responding. With this update, HVM guest migration works as expected.