Bug 1907973 - Unable to migrate VM after upgrading OS to centos 8.3
Summary: Unable to migrate VM after upgrading OS to centos 8.3
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.35.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.5-1
Target Release: 4.40.50.10
Assignee: Michal Skrivanek
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On: 1939013
Blocks: 1910282 1940673 1944724
Reported: 2020-12-15 15:44 UTC by caignec
Modified: 2021-11-04 19:28 UTC

Fixed In Version: vdsm-4.40.50.10
Clone Of:
Clones: 1910282 1940673
Environment:
Last Closed: 2021-04-15 07:28:51 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
pm-rhel: exception+


Attachments
host with 8.2 os (13.70 KB, text/plain)
2020-12-15 15:44 UTC, caignec
node with 8.3 os (81.38 KB, text/plain)
2020-12-15 15:46 UTC, caignec


Links
System        ID      Private  Priority     Status  Summary                         Last Updated
oVirt gerrit  113985  0        master       MERGED  spec: update kernel dependency  2021-03-24 08:51:46 UTC
oVirt gerrit  113986  0        ovirt-4.4.5  MERGED  spec: update kernel dependency  2021-03-24 09:37:08 UTC

Description caignec 2020-12-15 15:44:39 UTC
Created attachment 1739366
host with 8.2 os

Description of problem:
After upgrading a host to CentOS 8.3, I'm unable to migrate my guest from an 8.2 host to the newly updated 8.3 host.

If I do a full power off / power on cycle of the guest, it can be migrated wherever I want.


Version-Release number of selected component (if applicable):
vdsm.x86_64              4.40.35.1-1
ovirt-engine.noarch      4.4.3.12-1
Cluster CPU type : Secure Intel Cascadelake Server Family

How reproducible:
Upgrade nodes of cluster to centos 8.3



Steps to Reproduce:
1. Get a cluster (Cascadelake Server family) with RHEL 8.2 hosts.
2. Upgrade one node to 8.3, reboot and activate it.
3. Try to migrate a guest to it.

Actual results:
Migration failed with error: guest CPU doesn't match specification: missing features: tsx-ctrl (migration:294)
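
For anyone triaging similar failures, the feature names can be pulled out of such a message with a short script. This is only a minimal sketch: the helper name `missing_features` is hypothetical, and the assumption that multiple features would be separated by commas or whitespace is not confirmed by this report, which shows a single feature.

```python
import re
from typing import List


def missing_features(message: str) -> List[str]:
    """Return the CPU feature names listed after 'missing features:'."""
    # Capture everything up to the trailing "(module:line)" suffix, if any.
    match = re.search(r"missing features:\s*([^()]+)", message)
    if match is None:
        return []
    # Assumption: several features would be separated by commas or spaces.
    return [name for name in re.split(r"[,\s]+", match.group(1).strip()) if name]


error = ("guest CPU doesn't match specification: "
         "missing features: tsx-ctrl (migration:294)")
print(missing_features(error))
```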



Expected results:
Migration runs fine

Additional info:

Comment 1 caignec 2020-12-15 15:46:21 UTC
Created attachment 1739367
node with 8.3 os

Comment 2 RHEL Program Management 2020-12-15 15:52:23 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Arik 2020-12-15 16:04:30 UTC
That's important because VMs that were started before the cluster settings were updated to the ones introduced in ovirt 4.4.3 cannot migrate to centos 8.3 hosts.

We discussed in a separate context that we probably can't change much in the CPU settings of the VM in the destination host, but need to check if we can drop that +tsx-ctrl on VMs that migrate to a centos/rhel 8.3 or what's the alternative way to enable such a migration.

Comment 4 Milan Zamazal 2020-12-16 18:48:12 UTC
> but need to check if we can drop that +tsx-ctrl on VMs that migrate to a centos/rhel 8.3

I don't have a machine with TSX around, but I tried to remove another CPU feature in the libvirt hook on the destination (both the source and destination are 8.3) and libvirt is not happy about it:

  libvirtd[1579]: unsupported configuration: Target CPU feature count 2 does not match source 3

So this is clearly not going to work.
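
To illustrate the kind of check behind that message, here is a rough Python sketch that counts the explicit `<feature>` children of two CPU definitions. It is not libvirt's implementation; the two XML snippets and the function names are hypothetical and only mimic a hook that drops one feature on the destination.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical source CPU definition with three explicit features.
SOURCE_CPU = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='require' name='pku'/>
</cpu>
"""

# The same definition after a hypothetical hook removed one feature.
TARGET_CPU = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='mds-no'/>
</cpu>
"""


def feature_names(cpu_xml: str) -> List[str]:
    """Return the names of the direct <feature> children of a <cpu> element."""
    root = ET.fromstring(cpu_xml)
    return [feature.get("name") for feature in root.findall("feature")]


def count_mismatch(source_xml: str, target_xml: str) -> Optional[str]:
    """Return a description if the feature counts differ, else None."""
    source = feature_names(source_xml)
    target = feature_names(target_xml)
    if len(source) != len(target):
        return (f"Target CPU feature count {len(target)} "
                f"does not match source {len(source)}")
    return None


print(count_mismatch(SOURCE_CPU, TARGET_CPU))
```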

>  or what's the alternative way to enable such a migration

We will have to find some.

Comment 5 Milan Zamazal 2020-12-18 09:56:18 UTC
Apparently the only, but crucial, problem is tsx-ctrl feature presence. The feature doesn't make sense when TSX is disabled but it may still modify guests in some way.

As discussed with Jiří Denemark, libvirt cannot help us with the feature removal, whether the feature modifies guests under given circumstances or not. If removal of the feature while the guest is running would be harmless to the guest then QEMU could be modified not to fail on tsx-ctrl feature request when TSX is disabled (which is the case, since `rtm' and `hle' features are disabled). Arik, do we want to file a QEMU bug for requesting such a change and discussing whether it is possible?

The only other options are to instruct users either to restart the VMs or to enable TSX on the destination hosts (with all the implications regarding security and performance). Note that we have the same problem with file migrations; in theory there is an additional danger with them that a VM is suspended and once there is no 8.2 host present then the VM can no longer be resumed and must be powered off.
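
A possible way to spot VMs in this state from their domain XML is sketched below. It assumes the CPU element looks like the one from a VM started on an 8.2 host (see the `virsh -r dumpxml` excerpt in comment 14; the closing tag is added here so the excerpt parses). The function names are hypothetical and this is not code from vdsm or the engine.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# CPU element of a VM started on an 8.2 host, closing tag added for parsing.
GUEST_CPU = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <topology sockets='16' dies='1' cores='1' threads='1'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='require' name='tsx-ctrl'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='mpx'/>
  <feature policy='require' name='pku'/>
</cpu>
"""


def feature_policies(cpu_xml: str) -> Dict[str, str]:
    """Map each explicit feature name to its policy ('require', 'disable', ...)."""
    root = ET.fromstring(cpu_xml)
    return {f.get("name"): f.get("policy") for f in root.findall("feature")}


def requires_tsx_ctrl_without_tsx(cpu_xml: str) -> bool:
    """True if tsx-ctrl is required while both hle and rtm are disabled."""
    policies = feature_policies(cpu_xml)
    tsx_disabled = all(policies.get(name) == "disable" for name in ("hle", "rtm"))
    return policies.get("tsx-ctrl") == "require" and tsx_disabled


print(requires_tsx_ctrl_without_tsx(GUEST_CPU))
```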

Comment 6 Arik 2020-12-23 10:04:30 UTC
(In reply to Milan Zamazal from comment #5)
> As discussed with Jiří Denemark, libvirt cannot help us with the feature
> removal, whether the feature modifies guests under given circumstances or
> not. If removal of the feature while the guest is running would be harmless
> to the guest then QEMU could be modified not to fail on tsx-ctrl feature
> request when TSX is disabled (which is the case, since `rtm' and `hle'
> features are disabled). Arik, do we want to file a QEMU bug for requesting
> such a change and discussing whether it is possible?

Yes please, I think it will significantly simplify the upgrade process.
 
> The only other options are to instruct users either to restart the VMs or to
> enable TSX on the destination hosts (with all the implications regarding
> security and performance). Note that we have the same problem with file
> migrations; in theory there is an additional danger with them that a VM is
> suspended and once there is no 8.2 host present then the VM can no longer be
> resumed and must be powered off.

I don't think that enabling TSX on the destination hosts would be recommended - but worth documenting that issue and that restarting the VMs can solve this.
I'll create a documentation bug.

Comment 7 Milan Zamazal 2021-01-04 14:10:54 UTC
A platform bug filed: https://bugzilla.redhat.com/1912448

Comment 14 Qin Yuan 2021-03-30 07:47:54 UTC
Test Versions:
ovirt-engine-4.4.2.6-0.2.el8ev.noarch
ovirt-engine-4.4.5.11-0.1.el8ev.noarch
rhel 8.2 host:
- kernel-4.18.0-193.19.1.el8_2.x86_64
- vdsm-4.40.26.3-1.el8ev.x86_64
rhel 8.3 host:
- kernel-4.18.0-240.22.1.el8_3.x86_64
- vdsm-4.40.50.10-1.el8ev.x86_64

Test Steps:
1. Set up 4.4.2 engine
2. Create 4.4 Data Center, add a cluster with Secure Intel Cascadelake Server Family cpu type
3. Add a rhel 8.2 host with kernel-4.18.0-193.19.1.el8_2.x86_64
# virsh domcapabilities
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Cascadelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='umip'/>
      <feature policy='require' name='pku'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='ibrs-all'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='require' name='tsx-ctrl'/>
    </mode>

4. Create and run a VM
# virsh -r dumpxml vm_82
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Cascadelake-Server</model>
    <topology sockets='16' dies='1' cores='1' threads='1'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='mds-no'/>
    <feature policy='disable' name='hle'/>
    <feature policy='disable' name='rtm'/>
    <feature policy='require' name='tsx-ctrl'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='mpx'/>
    <feature policy='require' name='pku'/>

5. Upgrade engine to 4.4.5
6. Add a rhel 8.3 host with kernel-4.18.0-240.22.1.el8_3.x86_64
# virsh domcapabilities
  <cpu>
    <mode name='host-passthrough' supported='yes'>
      <enum name='hostPassthroughMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Cascadelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='umip'/>
      <feature policy='require' name='pku'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='amd-stibp'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='ibrs-all'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='require' name='tsx-ctrl'/>
      <feature policy='disable' name='hle'/>
      <feature policy='disable' name='rtm'/>
    </mode>

7. Migrate the VM from rhel 8.2 host to rhel 8.3 host
2021-03-30 06:57:25,491+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-17) [5d162d86] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: vm_82, Source: host_82, Destination: host_83, Duration: 3 seconds, Total: 3 seconds, Actual downtime: (N/A))

VM with Cascadelake-Server,-hle,-rtm,+tsx-ctrl cpu configuration can be migrated from rhel 8.2 host to rhel 8.3 host with kernel-4.18.0-240.22.1.el8_3.x86_64.
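
The comparison between the VM's CPU definition (step 4) and the destination host-model (step 6) can be approximated with a small script. This is only a rough sketch: the function names are hypothetical, the host feature list is abridged from step 6, and the check looks at explicitly listed features only, so it ignores anything implied by the base model. libvirt's own comparison (for example `virsh cpu-compare`) remains the authoritative check.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Required/disabled features of the VM from step 4 (closing tag added).
GUEST_CPU = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='require' name='tsx-ctrl'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='mpx'/>
  <feature policy='require' name='pku'/>
</cpu>
"""

# Host-model of the 8.3 host from step 6, abridged to the relevant features.
HOST_MODEL = """
<mode name='host-model' supported='yes'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='require' name='tsx-ctrl'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
</mode>
"""


def features_with_policy(xml_text: str, policy: str) -> Set[str]:
    """Return names of direct <feature> children that carry the given policy."""
    root = ET.fromstring(xml_text)
    return {f.get("name") for f in root.findall("feature")
            if f.get("policy") == policy}


def unmet_requirements(guest_cpu_xml: str, host_model_xml: str) -> Set[str]:
    """Features the guest requires that the host-model does not list as required."""
    return (features_with_policy(guest_cpu_xml, "require")
            - features_with_policy(host_model_xml, "require"))


print(sorted(unmet_requirements(GUEST_CPU, HOST_MODEL)))
```

An empty result means every explicitly required guest feature is also listed by the destination; a host-model without `tsx-ctrl` would yield that name instead.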

