Bug 1271145 - Guest OS paused after migration.
Summary: Guest OS paused after migration.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1265902
Reported: 2015-10-13 09:03 UTC by Fangge Jin
Modified: 2015-12-04 16:47 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.3.0-31.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-04 16:47:06 UTC
Target Upstream Version:


Attachments
libvirtd log from target host (8.62 MB, text/plain)
2015-10-13 09:03 UTC, Fangge Jin
libvirtd log from source host (12.03 MB, text/plain)
2015-10-13 09:05 UTC, Fangge Jin
qemu log from source host (6.72 KB, text/plain)
2015-10-13 09:06 UTC, Fangge Jin
qemu log from target host (6.56 KB, text/plain)
2015-10-13 09:06 UTC, Fangge Jin
The guest XML (2.66 KB, application/xml)
2015-10-13 09:07 UTC, Fangge Jin
libvirtd log when test managedsave (1.15 MB, application/x-gzip)
2015-10-13 09:11 UTC, Fangge Jin


Links
System: Red Hat Product Errata
ID: RHBA-2015:2546
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm-rhev bug fix and enhancement update
Last Updated: 2015-12-04 21:11:56 UTC

Description Fangge Jin 2015-10-13 09:03:38 UTC
Created attachment 1082306 [details]
libvirtd log from target host

Description of problem:
After migrating a guest whose machine type is pc-i440fx-rhel7.2.0 without the --live option, the guest OS does not respond, although "virsh list" reports the guest as running:
# virsh list
 Id    Name                           State
----------------------------------------------------
 10    rhel7.2                          running

After I run "virsh suspend rhel7.2" followed by "virsh resume rhel7.2", the guest OS responds again.
# virsh suspend rhel7.2
Domain rhel7.2 suspended

# virsh resume rhel7.2
Domain rhel7.2 resumed


Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7.x86_64
qemu-kvm-rhev-2.3.0-30.el7.x86_64

How reproducible:
maybe >80%

Steps to Reproduce:
1. Prepare a running guest with machine type ='pc-i440fx-rhel7.2.0'

2. Migrate the guest without --live:
# virsh migrate rhel7.2 qemu+ssh://10.66.5.20/system --verbose
Migration: [100 %]

3. Check the guest status on target:
# virsh list
 Id    Name                           State
----------------------------------------------------
 10    rhel7.2                          running

However, when I try any operation in the guest, the guest OS does not respond.

4. Suspend and resume the guest:
# virsh suspend rhel7.2
Domain rhel7.2 suspended

# virsh resume rhel7.2
Domain rhel7.2 resumed

Now, when I try an operation in the guest, the guest OS responds again.


Actual results:
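As described in steps 3 and 4: the guest is reported as running but does not respond until it is suspended and resumed.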

Expected results:
The guest works normally after migration.

Additional info:
1) When migrating with the --live option, the issue does not occur.
2) Machine types "pc-i440fx-rhel7.1.0" and "pc-i440fx-rhel7.0.0" do not show the issue.
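
The workaround from steps 3 and 4 can be scripted. The following is a minimal sketch, not part of the original report: it only wraps the two virsh commands shown above ("virsh suspend" and "virsh resume"). The helper names workaround_commands and apply_workaround are hypothetical, and the script assumes virsh is on PATH and that the caller has permission to manage the domain.

```python
import subprocess


def workaround_commands(domain):
    """Return the virsh command lines from step 4 of the report, in order."""
    return [
        ["virsh", "suspend", domain],
        ["virsh", "resume", domain],
    ]


def apply_workaround(domain, run=subprocess.run):
    """Suspend and then resume the domain.

    `run` is injectable so the sequence can be checked without a hypervisor;
    by default it is subprocess.run, which raises if a command fails.
    """
    for cmd in workaround_commands(domain):
        run(cmd, check=True)
```

For the guest in this report the call would be apply_workaround("rhel7.2"). This only unsticks the guest after the fact; it does not address the underlying cause.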

Comment 1 Fangge Jin 2015-10-13 09:05:44 UTC
Created attachment 1082307 [details]
libvirtd log from source host

Comment 2 Fangge Jin 2015-10-13 09:06:20 UTC
Created attachment 1082308 [details]
qemu log from source host

Comment 3 Fangge Jin 2015-10-13 09:06:53 UTC
Created attachment 1082309 [details]
qemu log from target host

Comment 4 Fangge Jin 2015-10-13 09:07:27 UTC
Created attachment 1082310 [details]
The guest XML

Comment 5 Fangge Jin 2015-10-13 09:10:10 UTC
Please also have a look at this test case:

My guest paused unexpectedly when I tested managedsave. The behaviour is similar, but I'm not sure whether it is the same issue, and it is not 100% reproducible. What I did, in general, was:

How reproducible:
< 10%

Versions:
libvirt-1.2.17-13.el7.x86_64
qemu-kvm-rhev-2.3.0-29.el7.x86_64

Steps:
1. Prepare a running guest.

2. Restart the libvirt-guests service.

3. Check the state of the guest:
# virsh list
 Id    Name                           State
----------------------------------------------------
 2     rhel7.2                          running

However, when I try any operation in the guest, the guest OS does not respond.

4. After I run "virsh suspend rhel7.2" and "virsh resume rhel7.2", the guest responds again.

Comment 6 Fangge Jin 2015-10-13 09:11:29 UTC
Created attachment 1082312 [details]
libvirtd log when test managedsave

Comment 7 Fangge Jin 2015-10-13 09:14:10 UTC
There is a related bug in libvirt which has already been resolved: 1265902.
Refer to this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1265902#c29

Comment 9 Dr. David Alan Gilbert 2015-10-13 09:43:55 UTC
migration.c/process_incoming_migration_co sends the COMPLETED event before the announce self, before the bdrv_invalidate_cache_all, and before the global_state stuff, so there's a race there.

quintela: Any reason not to just move the generate_event to the bottom of that function? (and add a failed event in the other exit path)
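
The ordering problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not QEMU code: the function incoming_migration, the callback manager_resume, and the dictionary layout are all hypothetical, and the model assumes that the management layer resumes the guest as soon as it sees COMPLETED and that the global state step then applies a "paused" run state recorded by the source.

```python
def incoming_migration(event_at_end, on_completed):
    """Toy model of the tail of an incoming migration (not QEMU code).

    event_at_end=False models the reported ordering: COMPLETED is emitted
    before the remaining steps. event_at_end=True models the proposed fix:
    COMPLETED is emitted only after everything else has run.
    """
    vm = {"state": "inmigrate", "log": []}

    def emit_completed():
        vm["log"].append("COMPLETED")
        on_completed(vm)  # the listener may react immediately

    if not event_at_end:
        emit_completed()

    vm["log"].append("announce_self")
    vm["log"].append("invalidate_cache_all")
    # Assumption for illustration: applying the saved global state sets the
    # run state to what the source recorded, here "paused".
    vm["log"].append("global_state")
    vm["state"] = "paused"

    if event_at_end:
        emit_completed()
    return vm


def manager_resume(vm):
    """Hypothetical management layer: resume the guest on COMPLETED."""
    vm["state"] = "running"
```

In this model, incoming_migration(False, manager_resume) ends with the guest paused even though the manager believes it resumed it, while incoming_migration(True, manager_resume) ends with the guest running, because the resume is the last thing to touch the state.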

Comment 12 Juan Quintela 2015-10-13 11:00:45 UTC
David: nothing wrong that I can see.  I stopped there because the migration "proper" had finished, but I have nothing against moving it to the end of the function.

Comment 13 Dr. David Alan Gilbert 2015-10-13 11:42:42 UTC
Posted fix upstream:

Migration: Generate the completed event only when we complete

Comment 15 Miroslav Rezanina 2015-10-14 12:33:17 UTC
Fix included in qemu-kvm-rhev-2.3.0-31.el7

Comment 18 juzhang 2015-10-15 05:42:39 UTC
According to comment 17, setting this issue as verified.

Best Regards,
Junyi

Comment 20 errata-xmlrpc 2015-12-04 16:47:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html

