Bug 1925556 - After a live migration, VMs cannot be paused/resumed
Summary: After a live migration, VMs cannot be paused/resumed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 2.6.1
Assignee: Jed Lejosne
QA Contact: vsibirsk
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-05 14:39 UTC by Fabian Deutsch
Modified: 2021-04-07 08:46 UTC
CC List: 7 users

Fixed In Version: virt-operator-container-v2.6.1-4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-07 08:46:03 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt/kubevirt pull 5010 | 0 | None | open | Migrate to persistent target domains instead of transient ones | 2021-02-21 07:08:17 UTC
Red Hat Product Errata | RHEA-2021:1126 | 0 | None | None | None | 2021-04-07 08:46:36 UTC

Description Fabian Deutsch 2021-02-05 14:39:45 UTC
Description of problem:
After a live migration, the VMI is backed by a transient libvirt domain.
Transient domains have several downsides, which are described in the libvirt documentation.
The most obvious is that pausing the VM triggers its deletion.
Transient domains also do not survive libvirtd restarts, which can happen for various reasons.
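A quick way to tell which kind of domain backs a VMI is to ask libvirt directly. The sketch below is an illustration, not part of the original report: it assumes a shell inside the virt-launcher `compute` container, where `virsh` is available, and `domain_is_persistent` is a hypothetical helper name. It relies on the `Persistent:` line that `virsh dominfo` prints.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if the libvirt domain is persistent,
# "no" if it is transient. Run inside the virt-launcher "compute" container.
# Usage: domain_is_persistent <domain-name>
domain_is_persistent() {
    # virsh dominfo prints a "Persistent:   yes|no" line; extract the value
    # and strip the padding spaces/tabs around it.
    virsh dominfo "$1" |
        awk -F: '$1 == "Persistent" { gsub(/[ \t]+/, "", $2); print $2 }'
}
```

For example, `domain_is_persistent <namespace>_<vm-name>` would be expected to print `no` on an affected migration target and `yes` on the migration source.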

Version-Release number of selected component (if applicable):
2.6.0

How reproducible:
Always

Steps to Reproduce:
1. Create and start a migratable VM
2. Live-migrate the VM
3. Open a shell in the new virt-launcher pod and run `virsh list --transient`
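Step 3 can also be run from outside the pod. A minimal sketch, assuming `oc` access to the VM's namespace and the `compute` container name used later in this bug; `list_transient_domains` is a hypothetical helper, and `--name` only makes `virsh list` print bare domain names instead of a table.

```shell
#!/bin/sh
# Hypothetical helper: print the names of transient libvirt domains in a
# virt-launcher pod. An empty result means no transient domain exists.
# Usage: list_transient_domains <namespace> <pod>
list_transient_domains() {
    ns="$1"
    pod="$2"
    # --name prints one domain name per line followed by a blank line;
    # sed drops the blank line(s).
    oc -n "$ns" exec "$pod" -c compute -- virsh list --transient --name |
        sed '/^$/d'
}
```

On an affected build, running it against the new virt-launcher pod should print the domain name; once the fix is in place it should print nothing.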

Actual results:
The domain is listed, i.e. it is transient

Expected results:
The domain should stay persistent after a migration and be listed by `virsh list --persistent` instead.

Additional info:

Comment 2 lpivarc 2021-02-08 09:08:17 UTC
Addressing GA in https://github.com/kubevirt/kubevirt/pull/4982

Comment 6 sgott 2021-02-16 19:00:21 UTC
Jed, does the PR linked in this BZ address the problem? Are there other PRs that will be linked, or should this BZ be in POST?

Comment 7 Jed Lejosne 2021-02-16 20:07:59 UTC
The problem here is that the title is very vague ("all kind of functionality").
To be just as vague, the linked PR fixes many kinds of functionality, but not all :)

More seriously, the PR makes the migration target persistent (instead of transient), just like the source already is.
This means that after a migration, the VM no longer suffers from the limitations associated with being transient, which are scattered throughout the libvirt documentation.
For example, one obvious downside of transient domains is that they can't be suspended to RAM.
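For reference, libvirt exposes this choice as a migration flag: by default the domain created on the destination is transient, and `VIR_MIGRATE_PERSIST_DEST` (`--persistent` in `virsh migrate`) defines it persistently instead. The snippet below is only the virsh equivalent, assuming that the linked PR amounts to requesting persistence on the destination; KubeVirt's virt-launcher calls the libvirt API directly rather than running virsh, and `migrate_persistent` and the destination URI are hypothetical.

```shell
#!/bin/sh
# Illustration only: virsh equivalent of a live migration whose target
# domain is defined persistently. --persistent maps to libvirt's
# VIR_MIGRATE_PERSIST_DEST flag; without it, the destination domain is
# transient, which is the behavior this bug describes.
# Usage: migrate_persistent <domain> <destination-uri>
migrate_persistent() {
    virsh migrate --live --persistent "$1" "$2"
}
```

A call such as `migrate_persistent <domain> qemu+ssh://<target-host>/system` would leave a domain on the target that shows up in `virsh list --persistent`.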

However, the fix does not address host resource allocation, like the CPU pinning mentioned in the reproduction steps.
How would you feel about renaming this bug to "Migrated VMs can't be paused/resumed" and moving it to POST?
Then we could create a new one about CPU allocation, and potentially others for other problems (not sure what's up with the guest agent).
One bug to address "all kind" of issues is just not practical.

Let me know if you'd like me to make the necessary changes.

Comment 8 Shaul Garbourg 2021-02-18 13:25:52 UTC
@sgott What Jed proposed above was also agreed with QE, since it will make it easier for them to create test plans and retest the bug.
Please modify the bug and create additional bugs, so that it is easier to retest and to make sure everything is reflected properly.

Comment 11 Shaul Garbourg 2021-03-29 06:08:57 UTC
Backported to release-0.36.

Comment 12 vsibirsk 2021-03-29 17:11:08 UTC
Verified on v2.6.1-5.

After migration, the VM's domain remains persistent:

$ oc -n supported-os-common-templates-fedora-test-fedora-os-support exec pod/virt-launcher-fedora-33-1617036829-4956977-bpv7k -c compute -it /bin/bash
[root@fedora-33-1617036829-4956977 /]# virsh list --persistent
 Id   Name                                                                                       State
----------------------------------------------------------------------------------------------------------
 1    supported-os-common-templates-fedora-test-fedora-os-support_fedora-33-1617036829-4956977   running

$ virtctl migrate fedora-33-1617036829-4956977 -n supported-os-common-templates-fedora-test-fedora-os-support
VM fedora-33-1617036829-4956977 was scheduled to migrate

$ oc get pods -n supported-os-common-templates-fedora-test-fedora-os-support
NAME                                               READY   STATUS    RESTARTS   AGE
virt-launcher-fedora-33-1617036829-4956977-bpv7k   1/1     Running   0          8m48s
virt-launcher-fedora-33-1617036829-4956977-xpmhc   1/1     Running   0          13s

$ oc -n supported-os-common-templates-fedora-test-fedora-os-support exec pod/virt-launcher-fedora-33-1617036829-4956977-xpmhc -c compute -it /bin/bash
[root@virt-launcher-fedora-33-1617036829-4956977-xpmhc /]# virsh list --persistent
 Id   Name                                                                                       State
----------------------------------------------------------------------------------------------------------
 1    supported-os-common-templates-fedora-test-fedora-os-support_fedora-33-1617036829-4956977   running
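The manual verification above can be condensed into one check. A sketch under the naming conventions visible in the transcript: launcher pods are named `virt-launcher-<vm>-<suffix>`, the newest one is the migration target, and the libvirt domain is named `<namespace>_<vm>`. The function name is hypothetical, and it is meant to be run after `virtctl migrate` has completed.

```shell
#!/bin/sh
# Hypothetical helper: check that a VM's domain is persistent in its newest
# virt-launcher pod (the migration target).
# Usage: verify_persistent_after_migration <namespace> <vm-name>
verify_persistent_after_migration() {
    ns="$1"
    vm="$2"
    # Pods sorted oldest to newest; keep the last launcher pod for this VM.
    pod=$(oc -n "$ns" get pods --sort-by=.metadata.creationTimestamp -o name |
        grep "^pod/virt-launcher-$vm-" | tail -n 1)
    if [ -z "$pod" ]; then
        echo "no virt-launcher pod found for $vm" >&2
        return 2
    fi
    domain="${ns}_${vm}"
    # Exact whole-line match against the persistent domain names.
    if oc -n "$ns" exec "$pod" -c compute -- virsh list --persistent --name |
        grep -Fxq "$domain"; then
        echo "$domain is persistent in $pod"
    else
        echo "$domain is NOT persistent in $pod" >&2
        return 1
    fi
}
```

Checking the newest pod rather than a hard-coded name means the same call works regardless of the random pod suffix.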

Comment 17 errata-xmlrpc 2021-04-07 08:46:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (CNV 2.6.1 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1126

