Bug 629773 - HVM guest w/ UP and PV driver hangs after live migration or suspend/resume
Summary: HVM guest w/ UP and PV driver hangs after live migration or suspend/resume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.5
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Miroslav Rezanina
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 573926
Depends On:
Blocks: 630987 630989
Reported: 2010-09-02 22:29 UTC by Bill Braswell
Modified: 2018-10-27 13:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, migrating a hardware virtual machine (HVM) guest with both, UP and PV drivers, may have caused the guest to stop responding. With this update, HVM guest migration works as expected.
Clone Of:
Environment:
Last Closed: 2011-01-13 21:15:18 UTC


Attachments
Patch fixing save/restore with xen vnif device (1.26 KB, patch)
2010-09-07 09:10 UTC, Miroslav Rezanina


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2011:0017 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update | 2011-01-13 10:37:42 UTC

Description Bill Braswell 2010-09-02 22:29:21 UTC
Description of problem:
Live migration of an HVM guest with UP and the PV driver causes the guest to hang approximately 50% of the time. The customer does not experience this problem without UP and without the PV driver. Live migration is similar to resume/restore. If the customer does a restore on the same machine, the domain hangs 100% of the time.

The dump trace is as follows.
crash> bt -a
PID: 2595   TASK: ffff81001362f0c0  CPU: 0   COMMAND: "suspend"
#0 [ffff8100115abca0] schedule at ffffffff80063f96
#1 [ffff8100115abca8] thread_return at ffffffff80063ff8
#2 [ffff8100115abd78] read_reply at ffffffff880fc1fa
#3 [ffff8100115abe68] _spin_unlock_irqrestore at ffffffff80065b50
#4 [ffff8100115abe98] __xen_suspend at ffffffff880fb96a
#5 [ffff8100115abed8] xen_suspend at ffffffff880fb605
#6 [ffff8100115abee8] kthread at ffffffff80032bdc
#7 [ffff8100115abf48] kernel_thread at ffffffff8005efb1
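
When triaging similar hangs, it can help to check mechanically whether the "suspend" thread is sleeping in read_reply beneath __xen_suspend, as in the trace above. The following is only a sketch: the helper names parse_frames and called_beneath are hypothetical, and the regular expression assumes the frame layout shown in this report ("#N [stack] function at address").

```python
import re

# Matches frame lines such as:
#   #2 [ffff8100115abd78] read_reply at ffffffff880fc1fa
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")

# Trace copied from this report.
SAMPLE_BT = """\
PID: 2595   TASK: ffff81001362f0c0  CPU: 0   COMMAND: "suspend"
#0 [ffff8100115abca0] schedule at ffffffff80063f96
#1 [ffff8100115abca8] thread_return at ffffffff80063ff8
#2 [ffff8100115abd78] read_reply at ffffffff880fc1fa
#3 [ffff8100115abe68] _spin_unlock_irqrestore at ffffffff80065b50
#4 [ffff8100115abe98] __xen_suspend at ffffffff880fb96a
#5 [ffff8100115abed8] xen_suspend at ffffffff880fb605
#6 [ffff8100115abee8] kthread at ffffffff80032bdc
#7 [ffff8100115abf48] kernel_thread at ffffffff8005efb1
"""


def parse_frames(bt_text):
    """Return a list of (frame_number, function, address) tuples."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(3), match.group(4)))
    return frames


def called_beneath(frames, callee, caller):
    """True if callee appears closer to the top of the stack than caller."""
    names = [name for _, name, _ in frames]
    if callee not in names or caller not in names:
        return False
    return names.index(callee) < names.index(caller)


frames = parse_frames(SAMPLE_BT)
print([name for _, name, _ in frames])
print(called_beneath(frames, "read_reply", "__xen_suspend"))
```

The check only reports where the names appear in the stack; it does not prove a call relationship, since crash can also list stale entries left on the stack.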

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Version Number: 5.5
Release Number:
Architecture: x86, IA64
Kernel Version: 2.6.18-194.el5xen
Related Package Version: none
Related Middleware / Application: none

How reproducible:
1. Create an HVM guest with UP and the PV driver.
2. Do a suspend/resume.
  
Actual results:
HVM guest hangs

Expected results:
Guest does not hang

Additional info:
I am seeing if we can get the dump

Comment 4 Miroslav Rezanina 2010-09-03 06:12:18 UTC
This is UP variant of BZ #555910.

Comment 5 Andrew Jones 2010-09-05 14:05:57 UTC
As Miroslav says, this is the UP variant of bug 555910, but we can take a fresh look at both cases, starting with the UP case this time. I've tried a couple experimental patches, but haven't had any luck keeping xenbus from jumping in on the suspend. Unfortunately upstream code is quite a bit different in this area, but we should still attempt this with an upstream 2.6.18-based kernel running on the full virt guest to see what happens.

Comment 8 Miroslav Rezanina 2010-09-07 09:10:56 UTC
Created attachment 443458 [details]
Patch fixing save/restore with xen vnif device

This is a backport of c/s 15691 fixing this problem.

Comment 9 Miroslav Rezanina 2010-09-07 11:11:45 UTC
*** Bug 573926 has been marked as a duplicate of this bug. ***

Comment 13 Miroslav Rezanina 2010-09-15 10:23:46 UTC
Test packages containing fix can be downloaded from people.redhat.com/mrezanin/bz629773.

Comment 16 Jarod Wilson 2010-09-17 14:03:19 UTC
in kernel-2.6.18-222.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 19 Jaromir Hradilek 2010-10-12 22:42:27 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, migrating a hardware virtual machine (HVM) guest with both UP and PV drivers may have caused the guest to stop responding. With this update, HVM guest migration works as expected.

Comment 20 Martin Prpič 2010-11-11 13:56:55 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, migrating a hardware virtual machine (HVM) guest with both UP and PV drivers may have caused the guest to stop responding. With this update, HVM guest migration works as expected.
+Previously, migrating a hardware virtual machine (HVM) guest with both, UP and PV drivers, may have caused the guest to stop responding. With this update, HVM guest migration works as expected.

Comment 21 Yufang Zhang 2010-11-18 05:40:59 UTC
QA verified this bug with kernel-xen-2.6.18-232.el5:

1. Start a RHEL-5.4 HVM guest with vcpus=1 and memory=1024, also using netfront as the network device type.

2. save and restore the guest

With the kernel-xen package without the patch (kernel-xen-2.6.18-194.8.1.el5.x86_64.rpm), the guest hangs after migration. With kernel-xen-2.6.18-232.el5, the guest runs well after restore.


So change this bug to VERIFIED.
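
For anyone who wants to repeat the save/restore check from steps 1 and 2 several times, it could be scripted roughly as below. This is a sketch under assumptions, not the procedure QA actually ran: the guest name, checkpoint path and guest address in the usage comment are hypothetical, the helper names are made up for illustration, and the responsiveness probe (a single ping) is only one possible way to detect a hang. It relies on the xm save <Domain> <CheckpointFile> and xm restore <CheckpointFile> commands on the host.

```python
import os
import subprocess
import time


def save_cmd(domain, checkpoint):
    """argv for saving a running domain to a checkpoint file."""
    return ["xm", "save", domain, checkpoint]


def restore_cmd(checkpoint):
    """argv for restoring a domain from a checkpoint file."""
    return ["xm", "restore", checkpoint]


def ping_once(host, timeout_s=5):
    """Return True if the guest answers one ping within timeout_s seconds."""
    devnull = open(os.devnull, "w")
    try:
        rc = subprocess.call(["ping", "-c", "1", "-W", str(timeout_s), host],
                             stdout=devnull, stderr=devnull)
    finally:
        devnull.close()
    return rc == 0


def save_restore_cycle(domain, checkpoint, probe, run=subprocess.call,
                       settle_s=30):
    """Save and restore the domain once, then report whether it responds.

    probe is a zero-argument callable returning True if the guest is alive;
    run executes an argv list and returns its exit status.
    """
    if run(save_cmd(domain, checkpoint)) != 0:
        raise RuntimeError("xm save failed")
    if run(restore_cmd(checkpoint)) != 0:
        raise RuntimeError("xm restore failed")
    time.sleep(settle_s)  # give the guest time to resume before probing
    return bool(probe())


# Example usage (hypothetical guest name, path and address):
#   for i in range(10):
#       ok = save_restore_cycle("rhel54-hvm", "/var/tmp/rhel54-hvm.save",
#                               lambda: ping_once("192.0.2.10"))
#       print("cycle %d: %s" % (i, "ok" if ok else "guest not responding"))
```

Passing run and probe as parameters keeps the cycle logic separate from the host commands, so the sequence can be exercised without a Xen host.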

Comment 23 errata-xmlrpc 2011-01-13 21:15:18 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

