Bug 1945586 - CPU pinning is incorrect after live migration
Summary: CPU pinning is incorrect after live migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jed Lejosne
QA Contact: zhe peng
URL:
Whiteboard:
Depends On: 2029343
Blocks:
 
Reported: 2021-04-01 10:52 UTC by Fabian Deutsch
Modified: 2022-10-10 16:57 UTC
CC List: 11 users

Fixed In Version: virt-operator-container-v4.10.0-195 hco-bundle-registry-container-v4.10.0-593
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:50:56 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 6200 0 None None None 2021-08-09 07:49:50 UTC
Github kubevirt kubevirt pull 6821 0 None open [WIP] Update dedicated CPUs on migration 2021-11-19 15:25:42 UTC
Github kubevirt kubevirt pull 7043 0 None open [release-0.49] Update dedicated CPUs on migration 2022-01-12 13:21:50 UTC
Red Hat Product Errata RHSA-2022:0947 0 None None None 2022-03-16 15:51:05 UTC

Description Fabian Deutsch 2021-04-01 10:52:46 UTC
Description of problem:
A VM with CPU pinning enabled will likely pin its vCPUs to the wrong CPU set after a migration.

When the VM is started, it is given a set (A) of cores to pin to and pins accordingly.
After migration the VM is given a different set (B), so the pinning needs to be redone, but this does not happen today.

Version-Release number of selected component (if applicable):
2.6

How reproducible:
always

Steps to Reproduce:
1. Start a VM with CPU pinning enabled
2. Live migrate it
3. Check whether the VM is pinned to the new CPU set
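The check in step 3 amounts to comparing the cpuset= values from "virsh dumpxml" against the launcher pod's cgroup cpuset. A minimal, self-contained sketch of that comparison with illustrative values (in practice, "pins" comes from the <cputune> section and "cgroup" from /sys/fs/cgroup/cpuset/cpuset.cpus inside the virt-launcher pod):

```shell
# Illustrative values; on a real cluster these are read from the
# virt-launcher pod (virsh dumpxml / the cpuset cgroup).
pins="2 42 4 44 6 46 8 48"          # cpuset= values, in vcpu order
cgroup="2,4,6,8,42,44,46,48"        # contents of cpuset.cpus

# Normalize the pin list (sort numerically, join with commas) and compare.
sorted=$(printf '%s\n' $pins | sort -n | paste -sd, -)
if [ "$sorted" = "$cgroup" ]; then
    echo "pinning matches the pod cpuset"
else
    echo "MISMATCH: pinned to $sorted, pod cpuset is $cgroup"
fi
```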

Actual results:
The VM is not pinned to the new CPU set; it is still pinned to the old one.

Expected results:
The VM is pinned to the new CPU set.

Additional info:

Comment 8 sgott 2021-08-18 12:52:09 UTC
Omer,

Any update on this?

Comment 9 Omer Yahud 2021-08-24 07:11:38 UTC
(In reply to sgott from comment #8)
> Omer,
> 
> Any update on this?

Hi Stu!

Yes, a PR is open (linked in the bug), and is being reviewed by Vladik while I am writing functional tests

Comment 23 Kedar Bidarkar 2022-02-04 19:53:42 UTC
Both the nodes have CPU Manager Enabled.
-------------------------------------------

[kbidarka@localhost migration]$ oc describe node node-11.redhat.com | grep cpumanager
                    cpumanager=true
[kbidarka@localhost migration]$ oc describe node node-12.redhat.com | grep cpumanager
                    cpumanager=true

Cordoned node-13 so that the live migration happens between node-11 and node-12.
---------------------------------------------------------------------------------

[kbidarka@localhost migration]$ oc get nodes
NAME                                             STATUS                     ROLES    AGE   VERSION
cnv-qe-infra-08.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      master   26h   v1.23.0+20a057a
cnv-qe-infra-09.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      master   26h   v1.23.0+20a057a
cnv-qe-infra-10.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      master   26h   v1.23.0+20a057a
cnv-qe-infra-11.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker   25h   v1.23.0+20a057a
cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker   25h   v1.23.0+20a057a
cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker   26h   v1.23.0+20a057a




1) Created the first VM "vm-rhel84-ocs-cpupin" on node-12 with dedicated CPUs.
[cloud-user@vm-rhel84-ocs-cpupin ~]$ [kbidarka@localhost cpu-pinning]$ 
[kbidarka@localhost cpu-pinning]$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-cpupin-xh85m   1/1     Running   0          3m1s
[kbidarka@localhost cpu-pinning]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin-xh85m
sh-4.4# virsh list
 Id   Name                           State
----------------------------------------------
 1    default_vm-rhel84-ocs-cpupin   running
 
2) Below is the CPU Set found on node-12 with "vm-rhel84-ocs-cpupin"

sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,4,6,8,42,44,46,48
sh-4.4# virsh dumpxml default_vm-rhel84-ocs-cpupin
...
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='42'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='44'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='46'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='48'/>
  </cputune>

2)
a) Created the second VM "vm-rhel84-ocs-cpupin2" on node-11 with dedicated CPUs.
It too was created with the same CPU set.
-----------------------------------------------------------

[cloud-user@vm-rhel84-ocs-cpupin2 ~]$ [kbidarka@localhost cpu-pinning]$ 
[kbidarka@localhost cpu-pinning]$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-cpupin-xh85m    1/1     Running   0          26m
virt-launcher-vm-rhel84-ocs-cpupin2-qsv8b   1/1     Running   0          92s
[kbidarka@localhost cpu-pinning]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin2-qsv8b 
sh-4.4# virsh list
 Id   Name                            State
-----------------------------------------------
 1    default_vm-rhel84-ocs-cpupin2   running
 
b) Below is the CPU Set found on node-11 with "vm-rhel84-ocs-cpupin2"

sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,4,6,8,42,44,46,48
sh-4.4# virsh dumpxml default_vm-rhel84-ocs-cpupin2
...
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='42'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='44'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='46'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='48'/>
  </cputune>

3) As seen below, vm-rhel84-ocs-cpupin is running on node-12,
and vm-rhel84-ocs-cpupin2 is running on node-11.


------------------------------------------------------

[kbidarka@localhost migration]$ oc get vmi 
NAME                    AGE   PHASE     IP             NODENAME             READY
vm-rhel84-ocs-cpupin    36m   Running   xx.yyy.z.78    node-12.redhat.com   True
vm-rhel84-ocs-cpupin2   10m   Running   xx.yyy.zz.68   node-11.redhat.com   True
---------------------------------------------------------------------------------------------


4) Trigger a LiveMigration
---------------------------

[kbidarka@localhost migration]$ cat migration-job-vm-rhel84-ocs-cpupin.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: vm-rhel84-ocs-cpupin-vmim1
  namespace: default
spec:
  vmiName: vm-rhel84-ocs-cpupin
status: {}
[kbidarka@localhost migration]$ oc apply -f migration-job-vm-rhel84-ocs-cpupin.yaml
virtualmachineinstancemigration.kubevirt.io/vm-rhel84-ocs-cpupin-vmim1 created
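The manifest above can also be generated for an arbitrary VMI with a small here-document (the "-vmim1" name suffix and /tmp path are illustrative, following the naming used here):

```shell
# Generate a VirtualMachineInstanceMigration manifest for a given VMI.
# The VMI name below matches the one used in this bug; any VMI name works.
vmi="vm-rhel84-ocs-cpupin"
cat > /tmp/vmim.yaml <<EOF
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: ${vmi}-vmim1
  namespace: default
spec:
  vmiName: ${vmi}
EOF
# Then trigger the migration with:
#   oc apply -f /tmp/vmim.yaml
grep "vmiName" /tmp/vmim.yaml
```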


--------------------------------------------------------
[kbidarka@localhost migration]$ oc get vmi 
NAME                    AGE   PHASE     IP             NODENAME             READY
vm-rhel84-ocs-cpupin    38m   Running   xx.yyy.zz.69   node-11.redhat.com   True
vm-rhel84-ocs-cpupin2   12m   Running   xx.yyy.zz.68   node-11.redhat.com   True
[kbidarka@localhost migration]$ oc get pods
NAME                                        READY   STATUS      RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-cpupin-c4gfm    1/1     Running     0          33s
virt-launcher-vm-rhel84-ocs-cpupin-xh85m    0/1     Completed   0          38m
virt-launcher-vm-rhel84-ocs-cpupin2-qsv8b   1/1     Running     0          12m

[kbidarka@localhost migration]$ virtctl console vm-rhel84-ocs-cpupin
Successfully connected to vm-rhel84-ocs-cpupin console. The escape sequence is ^]

[cloud-user@vm-rhel84-ocs-cpupin ~]$ [kbidarka@localhost migration]$ 


5) After the live migration, the CPU set for the VM using dedicated CPUs is different.

----------------------------------------------------------

$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin-c4gfm
sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
10,12,14,16,50,52,54,56
sh-4.4# exit
exit
[kbidarka@localhost migration]$ 
[kbidarka@localhost migration]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin-c4gfm
sh-4.4# virsh list  
 Id   Name                           State
----------------------------------------------
 1    default_vm-rhel84-ocs-cpupin   running

sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
10,12,14,16,50,52,54,56
sh-4.4# virsh dumpxml default_vm-rhel84-ocs-cpupin
...
  <cputune>
    <vcpupin vcpu='0' cpuset='10'/>
    <vcpupin vcpu='1' cpuset='50'/>
    <vcpupin vcpu='2' cpuset='12'/>
    <vcpupin vcpu='3' cpuset='52'/>
    <vcpupin vcpu='4' cpuset='14'/>
    <vcpupin vcpu='5' cpuset='54'/>
    <vcpupin vcpu='6' cpuset='16'/>
    <vcpupin vcpu='7' cpuset='56'/>
  </cputune>
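The verification above can be scripted: extract the cpuset= values from the <cputune> section of the dumpxml output and compare the sorted set against the cgroup. A self-contained sketch using a trimmed copy of the post-migration XML (on a real pod, "xml" would hold the output of "virsh dumpxml default_vm-rhel84-ocs-cpupin" and "cgroup" would be read from /sys/fs/cgroup/cpuset/cpuset.cpus):

```shell
# Trimmed sample of the post-migration <cputune> section shown above.
xml="<cputune>
  <vcpupin vcpu='0' cpuset='10'/>
  <vcpupin vcpu='1' cpuset='50'/>
  <vcpupin vcpu='2' cpuset='12'/>
  <vcpupin vcpu='3' cpuset='52'/>
</cputune>"
cgroup="10,12,50,52"   # sample value; would come from cpuset.cpus

# Pull out the cpuset= numbers, sort them, and join with commas.
pins=$(printf '%s\n' "$xml" | grep -o "cpuset='[0-9]*'" \
       | grep -o '[0-9]\+' | sort -n | paste -sd, -)
[ "$pins" = "$cgroup" ] && echo "pinning matches cgroup" || echo "MISMATCH"
```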


Summary: CPU pinning is now correct after live migration; the VM is pinned to the new CPU set.

Comment 24 Kedar Bidarkar 2022-02-04 19:54:40 UTC
VERIFIED with v4.10.0-648

Comment 29 errata-xmlrpc 2022-03-16 15:50:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

