Bug 2092269

Summary: We can't migrate to a newer target node and then return to the source node when using host-model CPU
Product: Container Native Virtualization (CNV) Reporter: Barak <bmordeha>
Component: Virtualization Assignee: Barak <bmordeha>
Status: CLOSED ERRATA QA Contact: Kedar Bidarkar <kbidarka>
Severity: high Docs Contact:
Priority: high    
Version: 4.9.0 CC: acardace, akrgupta, cnv-qe-bugs, ibezukh, sgott
Target Milestone: ---   
Target Release: 4.9.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.9.6-26 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-22 08:17:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Barak 2022-06-01 08:19:27 UTC
Description of problem:
- We can't migrate to a newer target node and then return to the source node when using host-model CPU.

- A VMI with host-model can't migrate even when it should be able to.

Version-Release number of selected component (if applicable):


How reproducible:
For instance:
(1) If a VMI starts with host-model CPU on node01, which doesn't have the AES feature, and then migrates to node02, which has AES, we won't be able to migrate it back to node01.
Also:
(2) A VMI with host-model can't migrate if the target node doesn't have the same host model, even if the target node supports the host model of the source node and
    has all the required features.

Steps to Reproduce (1):
1. Deploy KubeVirt in a heterogeneous cluster that has a node with a unique feature
   (after deploying KubeVirt you can use the following command to see which
   features exist on the node:
   `kubectl get node <node_name> -oyaml | grep host-model-required-features`)
2. Start a VM with host-model CPU on a node without any unique feature
3. Migrate it to a node with a unique feature
4. Try to migrate back to the initial node

Actual results:
The migration in step 4 fails because of a node selector on the virt-launcher pod that shouldn't be there.
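
One way to see which selector blocks the move back is to compare the nodeSelector of the pending target virt-launcher pod with the labels of the initial node. The helper below is a minimal sketch for that comparison and is not part of KubeVirt; the function name `missing_labels` and the two input files are assumptions made for illustration. Each file is expected to hold one `key=value` label per line.

```shell
#!/bin/sh
# Hypothetical helper: print every selector entry (file $1) that is absent
# from the node's labels (file $2). Empty output means the node satisfies
# the selector.
missing_labels() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits non-zero when it prints nothing, so that status is ignored.
    grep -Fxv -f "$2" "$1" || true
}

# Possible way to produce the two files on a live cluster (placeholders in <>):
#   kubectl get pod <virt-launcher-pod> \
#     -o go-template='{{range $k, $v := .spec.nodeSelector}}{{$k}}={{$v}}{{"\n"}}{{end}}' > selector.txt
#   kubectl get node <node_name> --show-labels --no-headers \
#     | awk '{print $NF}' | tr ',' '\n' > labels.txt
#   missing_labels selector.txt labels.txt
```

If the output lists a `host-model-required-features` entry for a feature that only the newer node has, that would match the failure described here.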

Steps to Reproduce (2):
1. Deploy KubeVirt in a heterogeneous cluster with at least two nodes that have different host-model CPU models
   (after deploying KubeVirt you can use the following command to see which host model a node has:
   `kubectl get node <node_name> -oyaml | grep host-model-cpu.node`)
2. Start a VM with host-model CPU on a node
3. Try to migrate it to a node that supports the source node's host-model CPU model but has a different host model of its own

Actual results:
The migration fails because of a node selector on the virt-launcher pod.
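
To check by hand whether a target node should be able to accept the VMI in this scenario, one rough approach is to read the source node's host model from its labels and look for that model among the target's supported CPU models. The sketch below assumes the `host-model-cpu.node.kubevirt.io/<model>` and `cpu-model.node.kubevirt.io/<model>` label formats; the function names are hypothetical, and the labels KubeVirt actually uses for migration scheduling may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: read a node's labels (one key=value per line) on stdin
# and print the host model name taken from the host-model-cpu label.
host_model() {
    sed -n 's|^host-model-cpu\.node\.kubevirt\.io/\([^=]*\)=true$|\1|p' | head -n 1
}

# Hypothetical helper: succeed if the label file $2 lists CPU model $1
# as supported (assumed label: cpu-model.node.kubevirt.io/<model>=true).
supports_model() {
    grep -Fxq "cpu-model.node.kubevirt.io/$1=true" "$2"
}

# Possible usage, with source.txt and target.txt holding the labels of the
# source and target nodes, one key=value per line:
#   model=$(host_model < source.txt)
#   supports_model "$model" target.txt && echo "target supports $model"
```

A positive result here combined with a failed migration would point at the selector rather than at a real CPU incompatibility.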

Expected results:
The migration should succeed.

Additional info:

Comment 1 Barak 2022-06-01 11:57:47 UTC
Update:
Backporting the fix to 0.44, 0.49, and 0.53.

Comment 2 sgott 2022-06-06 14:28:46 UTC
Deferring to the next point release as the backport is complicated and will take some time.

Comment 5 Akriti Gupta 2022-09-19 21:11:53 UTC
Verified with 

[akrgupta@fedora auth]$ for n in $(oc get node -o name | grep worker); do echo ""; echo $n; oc describe $n | grep "cpu-model.node.kubevirt.io"; done
node/virt-akr-49-pcdnh-worker-0-ft7rw
                    cpu-model.node.kubevirt.io/Haswell-noTSX=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/IvyBridge=true
                    cpu-model.node.kubevirt.io/IvyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Nehalem=true
                    cpu-model.node.kubevirt.io/Nehalem-IBRS=true
                    cpu-model.node.kubevirt.io/Opteron_G1=true
                    cpu-model.node.kubevirt.io/Opteron_G2=true
                    cpu-model.node.kubevirt.io/Penryn=true
                    cpu-model.node.kubevirt.io/SandyBridge=true
                    cpu-model.node.kubevirt.io/SandyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Westmere=true
                    cpu-model.node.kubevirt.io/Westmere-IBRS=true

node/virt-akr-49-pcdnh-worker-0-ph99k
                    cpu-model.node.kubevirt.io/Broadwell=true
                    cpu-model.node.kubevirt.io/Broadwell-IBRS=true
                    cpu-model.node.kubevirt.io/Broadwell-noTSX=true
                    cpu-model.node.kubevirt.io/Broadwell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Haswell=true
                    cpu-model.node.kubevirt.io/Haswell-IBRS=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/IvyBridge=true
                    cpu-model.node.kubevirt.io/IvyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Nehalem=true
                    cpu-model.node.kubevirt.io/Nehalem-IBRS=true
                    cpu-model.node.kubevirt.io/Opteron_G1=true
                    cpu-model.node.kubevirt.io/Opteron_G2=true
                    cpu-model.node.kubevirt.io/Penryn=true
                    cpu-model.node.kubevirt.io/SandyBridge=true
                    cpu-model.node.kubevirt.io/SandyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Skylake-Client=true
                    cpu-model.node.kubevirt.io/Skylake-Client-IBRS=true
                    cpu-model.node.kubevirt.io/Skylake-Client-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Skylake-Server=true
                    cpu-model.node.kubevirt.io/Skylake-Server-IBRS=true
                    cpu-model.node.kubevirt.io/Skylake-Server-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Westmere=true
                    cpu-model.node.kubevirt.io/Westmere-IBRS=true

node/virt-akr-49-pcdnh-worker-0-s7rqn
                    cpu-model.node.kubevirt.io/Broadwell=true
                    cpu-model.node.kubevirt.io/Broadwell-IBRS=true
                    cpu-model.node.kubevirt.io/Broadwell-noTSX=true
                    cpu-model.node.kubevirt.io/Broadwell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/Haswell=true
                    cpu-model.node.kubevirt.io/Haswell-IBRS=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX=true
                    cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS=true
                    cpu-model.node.kubevirt.io/IvyBridge=true
                    cpu-model.node.kubevirt.io/IvyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Nehalem=true
                    cpu-model.node.kubevirt.io/Nehalem-IBRS=true
                    cpu-model.node.kubevirt.io/Opteron_G1=true
                    cpu-model.node.kubevirt.io/Opteron_G2=true
                    cpu-model.node.kubevirt.io/Penryn=true
                    cpu-model.node.kubevirt.io/SandyBridge=true
                    cpu-model.node.kubevirt.io/SandyBridge-IBRS=true
                    cpu-model.node.kubevirt.io/Westmere=true
                    cpu-model.node.kubevirt.io/Westmere-IBRS=true

[akrgupta@fedora auth]$ oc get nodes
NAME                               STATUS                     ROLES    AGE     VERSION
virt-akr-49-pcdnh-master-0         Ready                      master   3h24m   v1.22.8+9e95cb9
virt-akr-49-pcdnh-master-1         Ready                      master   3h24m   v1.22.8+9e95cb9
virt-akr-49-pcdnh-master-2         Ready                      master   3h23m   v1.22.8+9e95cb9
virt-akr-49-pcdnh-worker-0-ft7rw   Ready                      worker   3h7m    v1.22.8+9e95cb9
virt-akr-49-pcdnh-worker-0-ph99k   Ready,SchedulingDisabled   worker   3h5m    v1.22.8+9e95cb9
virt-akr-49-pcdnh-worker-0-s7rqn   Ready,SchedulingDisabled   worker   3h7m    v1.22.8+9e95cb9
[akrgupta@fedora auth]$ oc get vm
NAME                  AGE   STATUS    READY
vm-fedora-hostmodel   23m   Stopped   False
[akrgupta@fedora auth]$ cat vm_yaml | grep spec -A 5
    spec:
      domain:
        cpu:
          cores: 1
          model: host-model
        devices:

[akrgupta@fedora auth]$ virtctl start vm-fedora-hostmodel
VM vm-fedora-hostmodel was scheduled to start
[akrgupta@fedora auth]$ oc get vmi
NAME                  AGE   PHASE     IP            NODENAME                           READY
vm-fedora-hostmodel   73s   Running   10.131.0.44   virt-akr-49-pcdnh-worker-0-ft7rw   True
[akrgupta@fedora auth]$ oc adm uncordon virt-akr-49-pcdnh-worker-0-ph99k
node/virt-akr-49-pcdnh-worker-0-ph99k uncordoned
[akrgupta@fedora auth]$ oc get nodes
NAME                               STATUS                     ROLES    AGE     VERSION
virt-akr-49-pcdnh-master-0         Ready                      master   3h26m   v1.22.8+9e95cb9
virt-akr-49-pcdnh-master-1         Ready                      master   3h26m   v1.22.8+9e95cb9
virt-akr-49-pcdnh-master-2         Ready                      master   3h25m   v1.22.8+9e95cb9
virt-akr-49-pcdnh-worker-0-ft7rw   Ready                      worker   3h9m    v1.22.8+9e95cb9
virt-akr-49-pcdnh-worker-0-ph99k   Ready                      worker   3h7m    v1.22.8+9e95cb9
virt-akr-49-pcdnh-worker-0-s7rqn   Ready,SchedulingDisabled   worker   3h9m    v1.22.8+9e95cb9
[akrgupta@fedora auth]$ virtctl migrate vm-fedora-hostmodel
VM vm-fedora-hostmodel was scheduled to migrate
[akrgupta@fedora auth]$ oc get vmi
NAME                  AGE     PHASE     IP            NODENAME                           READY
vm-fedora-hostmodel   3m37s   Running   10.129.2.65   virt-akr-49-pcdnh-worker-0-ph99k   True
[akrgupta@fedora auth]$ virtctl migrate vm-fedora-hostmodel
VM vm-fedora-hostmodel was scheduled to migrate
[akrgupta@fedora auth]$ oc get vmi
NAME                  AGE     PHASE     IP            NODENAME                           READY
vm-fedora-hostmodel   5m23s   Running   10.131.0.46   virt-akr-49-pcdnh-worker-0-ft7rw   True

We can migrate to a newer target node and then return to the source node when using host-model CPU.
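
For repeating this check without reading the tables by eye, the node can be pulled out of the `oc get vmi` output before and after each migration. This is a minimal sketch; the function name `vmi_node` is hypothetical, and it assumes the default column order shown in the transcript above (NAME, AGE, PHASE, IP, NODENAME, READY).

```shell
#!/bin/sh
# Hypothetical helper: read `oc get vmi` table output on stdin and print the
# NODENAME column (5th field) of the row whose NAME equals $1.
vmi_node() {
    awk -v name="$1" 'NR > 1 && $1 == name { print $5 }'
}

# Possible usage on a live cluster:
#   before=$(oc get vmi | vmi_node vm-fedora-hostmodel)
#   virtctl migrate vm-fedora-hostmodel
#   ... wait for the migration to finish ...
#   after=$(oc get vmi | vmi_node vm-fedora-hostmodel)
#   [ "$before" != "$after" ] && echo "moved from $before to $after"
```

Reading `.status.nodeName` with `oc get vmi <name> -o jsonpath='{.status.nodeName}'` is likely a sturdier alternative, since it does not depend on the column layout.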

Comment 6 Akriti Gupta 2022-09-19 21:16:52 UTC
(In reply to Akriti Gupta from comment #5)
 Verified with v4.9.6-51 

Comment 11 errata-xmlrpc 2022-09-22 08:17:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.9.6 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6681