Bug 1925556

Summary: After a live migration, VMs cannot be paused/resumed
Product: Container Native Virtualization (CNV)
Reporter: Fabian Deutsch <fdeutsch>
Component: Virtualization
Assignee: Jed Lejosne <jlejosne>
Status: CLOSED ERRATA
QA Contact: vsibirsk
Severity: high
Docs Contact:
Priority: urgent
Version: 2.6.0
CC: cnv-qe-bugs, jlejosne, lpivarc, rgarcia, sgarbour, sgott, vromanso
Target Milestone: ---
Target Release: 2.6.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: virt-operator-container-v2.6.1-4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-07 08:46:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Fabian Deutsch 2021-02-05 14:39:45 UTC
Description of problem:
After a live migration, VMIs end up backed by a transient libvirt domain.
Transient domains have several downsides, which are covered in the libvirt documentation.
The most obvious one is that pausing a VM triggers its deletion.
Transient domains also do not survive libvirtd restarts, which can happen for various reasons.

Version-Release number of selected component (if applicable):
2.6.0

How reproducible:
Always

Steps to Reproduce:
1. Create and start a (migratable) VM
2. Live-migrate the VM
3. Get a shell into the new virt-launcher and run `virsh list --transient`
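
The check in step 3 could be scripted roughly as follows. This is a minimal sketch, not part of the original report: `domain_listed` is a hypothetical helper, `DOMAIN` is a placeholder for the libvirt domain name, and the sketch assumes `virsh` is available in the compute container of the new virt-launcher pod.

```shell
#!/bin/sh
# Sketch only: domain_listed is a hypothetical helper, not part of virsh or KubeVirt.
# It succeeds when the domain name in $1 appears as a whole line on stdin.
# stdin is expected to be the output of `virsh list ... --name` (one name per line).
domain_listed() {
    grep -Fxq -- "$1"
}

# Intended use inside the migration target's compute container
# (DOMAIN is a placeholder for the libvirt domain name):
#   if virsh list --transient --name | domain_listed "$DOMAIN"; then
#       echo "$DOMAIN is transient (the behavior reported here)"
#   elif virsh list --persistent --name | domain_listed "$DOMAIN"; then
#       echo "$DOMAIN is persistent (the expected behavior)"
#   fi
```

Matching whole lines with `grep -Fx` avoids false positives when one domain name is a prefix of another.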

Actual results:
The domain is listed

Expected results:
The domain should stay persistent after a migration and show in `virsh list --persistent` instead.

Additional info:

Comment 2 lpivarc 2021-02-08 09:08:17 UTC
Addressing GA in https://github.com/kubevirt/kubevirt/pull/4982

Comment 6 sgott 2021-02-16 19:00:21 UTC
Jed, does the PR linked in this BZ address the problem? Are there other PRs that will be linked, or should this BZ be in POST?

Comment 7 Jed Lejosne 2021-02-16 20:07:59 UTC
The problem here is that the title is very vague. ("all kind of functionality")
To be just as vague, the linked PR fixes many kinds of functionality, but not all :)

More seriously, the PR makes the migration target persistent (instead of transient), just like the source already is.
This means that after a migration, the VM no longer suffers from the limitations associated with being transient, which are scattered across the libvirt documentation.
For example, one obvious downside of transient domains is that they can't be suspended to RAM.
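
One way to see which kind of domain a VM is backed by is the `Persistent:` field that `virsh dominfo` prints. The helper below is a hedged sketch rather than anything taken from the PR: `persistent_flag` is a hypothetical name, and it only parses `dominfo` output supplied on stdin.

```shell
#!/bin/sh
# Sketch only: persistent_flag is a hypothetical helper.
# It reads `virsh dominfo <domain>` output on stdin and prints the value of
# the "Persistent" field ("yes" or "no"). It returns non-zero when the field
# is missing from the input.
persistent_flag() {
    awk -F': *' '$1 == "Persistent" { print $2; found = 1 } END { exit !found }'
}

# Intended use inside a virt-launcher compute container
# (DOMAIN is a placeholder for the libvirt domain name):
#   virsh dominfo "$DOMAIN" | persistent_flag
```

With the fix described above, the expectation would be that the migration target reports the same value as the source.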

However, the fix does not address host resource allocation, like the CPU pinning mentioned in the reproduction steps.
How would you feel about renaming this bug to "Migrated VMs can't be paused/resumed" and moving it to POST?
Then we could create a new one about CPU allocation, and potentially other ones for other problems (not sure what's up with guest agent).
One bug to address "all kind" of issues is just not practical.

Let me know if you'd like me to make the necessary changes.

Comment 8 Shaul Garbourg 2021-02-18 13:25:52 UTC
@sgott What Jed proposed above was also agreed with QE, since it makes it easier for them to create test plans and retest the bug.
Please modify the bug and create additional bugs so that it is easier to retest and to make sure everything is reflected properly.

Comment 11 Shaul Garbourg 2021-03-29 06:08:57 UTC
Backported to release-0.36.

Comment 12 vsibirsk 2021-03-29 17:11:08 UTC
Verified on v2.6.1-5.

After migrating the VM, the domain remains persistent:

$ oc -n supported-os-common-templates-fedora-test-fedora-os-support exec pod/virt-launcher-fedora-33-1617036829-4956977-bpv7k -c compute -it /bin/bash
[root@fedora-33-1617036829-4956977 /]# virsh list --persistent
 Id   Name                                                                                       State
----------------------------------------------------------------------------------------------------------
 1    supported-os-common-templates-fedora-test-fedora-os-support_fedora-33-1617036829-4956977   running

$ virtctl migrate fedora-33-1617036829-4956977 -n supported-os-common-templates-fedora-test-fedora-os-support
VM fedora-33-1617036829-4956977 was scheduled to migrate

$ oc get pods -n supported-os-common-templates-fedora-test-fedora-os-support
NAME                                               READY   STATUS    RESTARTS   AGE
virt-launcher-fedora-33-1617036829-4956977-bpv7k   1/1     Running   0          8m48s
virt-launcher-fedora-33-1617036829-4956977-xpmhc   1/1     Running   0          13s

$ oc -n supported-os-common-templates-fedora-test-fedora-os-support exec pod/virt-launcher-fedora-33-1617036829-4956977-xpmhc -c compute -it /bin/bash
[root@virt-launcher-fedora-33-1617036829-4956977-xpmhc /]# virsh list --persistent
 Id   Name                                                                                       State
----------------------------------------------------------------------------------------------------------
 1    supported-os-common-templates-fedora-test-fedora-os-support_fedora-33-1617036829-4956977   running
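
The manual steps above could be partly automated along these lines. This is a rough sketch under assumptions: `running_launchers` is a hypothetical helper that only parses the default `oc get pods` table, and `NS` and `VM` are placeholders for the namespace and VM name.

```shell
#!/bin/sh
# Sketch only: running_launchers is a hypothetical helper.
# $1 is the VM name; stdin is the default table printed by `oc get pods`.
# It prints the names of Running pods whose name starts with
# "virt-launcher-<vm>-", one per line.
running_launchers() {
    awk -v prefix="virt-launcher-$1-" \
        'index($1, prefix) == 1 && $3 == "Running" { print $1 }'
}

# Intended use (NS and VM are placeholders):
#   for pod in $(oc get pods -n "$NS" | running_launchers "$VM"); do
#       echo "== $pod"
#       oc -n "$NS" exec "pod/$pod" -c compute -- virsh list --persistent --name
#   done
```

Shortly after a migration both the source and target launcher pods can be Running, as in the listing above, so the loop checks each of them.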

Comment 17 errata-xmlrpc 2021-04-07 08:46:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (CNV 2.6.1 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1126