Bug 1413707 - live migrated domain shown as paused when receiving host does not already have a definition for the domain [NEEDINFO]
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Virtualization Tools
Classification: Community
Component: virt-manager
Version: unspecified
Hardware: Unspecified
OS: Linux
Target Milestone: ---
Assignee: Cole Robinson
QA Contact:
URL:
Whiteboard:
Duplicates: 1247593 1388403
Depends On:
Blocks:
 
Reported: 2017-01-16 18:12 UTC by Jamin W. Collins
Modified: 2020-03-25 17:58 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 17:58:15 UTC
Embargoed:
crobinso: needinfo?


Attachments
requested virt-manager output (22.75 KB, text/plain)
2017-08-10 17:53 UTC, Jamin W. Collins
VMM migrate logs (11.21 KB, text/plain)
2018-08-20 21:21 UTC, phatfish

Description Jamin W. Collins 2017-01-16 18:12:14 UTC
When live migrating a domain from one server to another, Virtual Machine Manager incorrectly shows the migrated domain's state as "Paused" on the receiving server after live migration if the receiving server does not already have a definition for the domain.

However, if the receiving server already has a definition for the domain, Virtual Machine Manager correctly displays the domain's state as "Running" after live migration.

In either case, checking the domain's state via "virsh -c qemu://${recipient}/system list" correctly shows the domain as "running".

Likewise, disconnecting and reconnecting to the receiving host will correct the domain's displayed state. Additionally, even when the domain is displayed in Virtual Machine Manager as paused, opening it will connect to the "running" domain.

This leads me to believe that Virtual Machine Manager is simply missing the state change when the receiving host does not have a domain definition.
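For what it's worth, the same check can be scripted against the receiving host through the libvirt Python bindings. This is only a sketch; the connection URI and domain name below are placeholders, not values from this report:

    import libvirt

    # Ask the receiving host's libvirtd for the real domain state,
    # independent of what virt-manager is displaying.
    conn = libvirt.open("qemu+tls://secondary.example.com/system")
    dom = conn.lookupByName("example")

    # dom.state() returns (state, reason); VIR_DOMAIN_RUNNING means the
    # guest really is running.
    state, reason = dom.state()
    print("running" if state == libvirt.VIR_DOMAIN_RUNNING else "state=%d" % state)
    conn.close()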

=====

A management laptop monitors both the primary and secondary VM servers using virt-manager, connected to both over qemu+tls and qemu+ssh.

Live migrate the domain from the primary to the secondary VM server using either the Virtual Machine Manager interface or the following command:

virsh \
    -c qemu:///system \
    migrate \
    --live \
    --persistent \
    --p2p \
    --tunneled \
    --verbose \
    ${DOMAIN}
    qemu+tls://${SECONDARY}/system


All machines are running Arch Linux.
Domains are backed by Ceph RBD volumes.

management laptop
$ yaourt -Q libvirt virt-manager
community/libvirt 2.4.0-2
community/virt-manager 1.4.0-2

primary VM server
$ yaourt -Q qemu libvirt qemu-block-rbd
extra/qemu 2.8.0-1
community/libvirt 2.4.0-2
extra/qemu-block-rbd 2.8.0-1

secondary VM server
$ yaourt -Q qemu libvirt qemu-block-rbd
extra/qemu 2.8.0-1
community/libvirt 2.4.0-2
extra/qemu-block-rbd 2.8.0-1

Comment 1 Jamin W. Collins 2017-01-16 18:17:23 UTC
The above migration command should have been:

virsh \
    -c qemu:///system \
    migrate \
    --live \
    --persistent \
    --p2p \
    --tunneled \
    --verbose \
    ${DOMAIN} \
    qemu+tls://${SECONDARY}/system
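
As a side note, a roughly equivalent migration through the libvirt Python bindings would look something like the sketch below; the destination URI and domain name are placeholders, not values from this report:

    import libvirt

    flags = (libvirt.VIR_MIGRATE_LIVE
             | libvirt.VIR_MIGRATE_PERSIST_DEST
             | libvirt.VIR_MIGRATE_PEER2PEER
             | libvirt.VIR_MIGRATE_TUNNELLED)

    src = libvirt.open("qemu:///system")
    dom = src.lookupByName("example")

    # With PEER2PEER set, the source libvirtd connects to the destination
    # itself, so only the destination URI is passed (as virsh --p2p does).
    dom.migrateToURI("qemu+tls://secondary.example.com/system", flags, None, 0)
    src.close()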

Comment 2 Jamin W. Collins 2017-07-17 14:34:32 UTC
It's been a while, any update?

Comment 3 Cole Robinson 2017-07-17 14:56:10 UTC
Sorry for the delay. Please provide virt-manager --debug output, from app startup to app shutdown, when reproducing the issue.

Comment 4 Jamin W. Collins 2017-08-10 17:53:45 UTC
Created attachment 1311854 [details]
requested virt-manager output

Here's the requested virt-manager output.

Comment 5 Cole Robinson 2017-08-17 19:02:32 UTC
Thanks for the info. It's tough to tell from the log whether this is a libvirt issue or a virt-manager issue. I'll need to get a setup to reproduce it.

Comment 6 Jamin W. Collins 2017-08-18 15:12:28 UTC
I can gather more information and run whatever tests you need.  Just let me know.

Comment 7 Michael Chapman 2018-02-23 05:10:37 UTC
I am seeing something similar, though not related to migration.

If a domain is defined and then immediately started (through some other libvirt API client, not from virt-manager), virt-manager shows the newly created guest as Paused.

From my virt-manager debug logs I see:

  (connection:788) domain lifecycle event: domain=example event=0 reason=0
  (connection:788) domain lifecycle event: domain=example event=4 reason=0
  (connection:788) domain lifecycle event: domain=example event=2 reason=0
  (connection:1196) domain=example status=Paused added
  (connection:1190) New domain=example requested, but it's already tracked.
  (connection:1190) New domain=example requested, but it's already tracked.

Those three events are defined, resumed, started.
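
(Those numeric values come from libvirt's virDomainEventType enum. A standalone listener along the lines of the sketch below, using the libvirt Python bindings rather than virt-manager's own handlers, prints them by name; the connection URI is a placeholder.)

    import libvirt

    # event=0/2/4 in the log above map to these virDomainEventType values.
    NAMES = {
        libvirt.VIR_DOMAIN_EVENT_DEFINED: "defined",   # 0
        libvirt.VIR_DOMAIN_EVENT_STARTED: "started",   # 2
        libvirt.VIR_DOMAIN_EVENT_RESUMED: "resumed",   # 4
    }

    def lifecycle_cb(conn, dom, event, detail, opaque):
        print("domain=%s event=%s" % (dom.name(), NAMES.get(event, event)))

    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open("qemu:///system")
    conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                                lifecycle_cb, None)
    while True:
        libvirt.virEventRunDefaultImpl()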

If I put a small delay between the point where the domain is defined and it is started, virt-manager seems happier:

  (connection:788) domain lifecycle event: domain=example event=0 reason=0
  (connection:1196) domain=example status=Shutoff added
  (connection:788) domain lifecycle event: domain=example event=4 reason=0
  (connection:788) domain lifecycle event: domain=example event=2 reason=0

It looks to me like virt-manager gets the domain state when it sees it being defined, and this can race against the domain's state transitions (though I'm not sure why that would end up with status=Paused...).
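
A minimal way to reproduce that timing from another libvirt client might be the sketch below (the XML path and domain name are placeholders; it only illustrates the define-then-start sequence, not virt-manager's internals):

    import libvirt

    conn = libvirt.open("qemu:///system")
    with open("/tmp/example.xml") as f:
        xml = f.read()

    dom = conn.defineXML(xml)   # emits the DEFINED (event=0) lifecycle event
    # Inserting a small delay here (e.g. time.sleep(1)) made virt-manager pick
    # up the correct state in the testing above; without it the race is hit.
    dom.create()                # emits the RESUMED/STARTED events (event=4/2
                                # in the log above) right away
    conn.close()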

Comment 8 Cole Robinson 2018-02-27 21:39:52 UTC
*** Bug 1247593 has been marked as a duplicate of this bug. ***

Comment 9 Cole Robinson 2018-02-27 21:40:21 UTC
*** Bug 1388403 has been marked as a duplicate of this bug. ***

Comment 10 Cole Robinson 2018-03-03 20:57:03 UTC
Thanks, Michael, for the details. Indeed, after going over the code there are a few race conditions here. I couldn't reproduce them through regular activity, but I can manually trigger them by adding some hacks into the code. Fixing them in an efficient manner is not trivial, but it's on my todo list for the next release.

Comment 11 Dr. David Alan Gilbert 2018-03-05 09:40:20 UTC
(In reply to Cole Robinson from comment #10)
> Thanks, Michael, for the details. Indeed, after going over the code there
> are a few race conditions here. I couldn't reproduce them through regular
> activity, but I can manually trigger them by adding some hacks into the
> code. Fixing them in an efficient manner is not trivial, but it's on my
> todo list for the next release.

It might be worth trying to add some latency on your connection; it's not too unusual for me to get this type of failure with live-migration-related things, but I do tend to have ~150ms of latency to the host with the VMs.

Comment 12 phatfish 2018-08-20 21:19:08 UTC
I have this issue as well. VMM 1.4.3 on Fedora 27.

Migrating results in VMM showing the VM as paused, but it is still running and can be accessed through the console fine.

A temporary migration to a new host shows it as paused; migrating back to the original host shows it as running.

A "persistent" migration will show it as paused as well, but closing and reopening VMM will show it as running (this also works with the temporary migration).

I attached some debug logs from VMM.

Comment 13 phatfish 2018-08-20 21:21:22 UTC
Created attachment 1477371 [details]
VMM migrate logs

Comment 14 Cole Robinson 2020-01-26 21:17:22 UTC
It's been a while, and virt-manager internals have changed quite a lot since then. Is anybody still reproducing this issue?

