Description of problem:
A VM with CPU pinning enabled will pin its vCPUs to (likely) the wrong CPU set after a migration. When the VM is started, it gets a set (A) of cores to pin to and does so. After a migration the VM gets a different set (B), so the pinning needs to be re-done, but this does not happen today.

Version-Release number of selected component (if applicable):
2.6

How reproducible:
Always

Steps to Reproduce:
1. Start a VM with CPU pinning enabled (see the example spec below)
2. Live-migrate the VM
3. Check whether the VM is pinned to the new CPU set

Actual results:
It is not pinned to the new CPU set; it is still pinned to the old one.

Expected results:
It is pinned to the new CPU set.

Additional info:
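For reference, a minimal sketch of a VM spec with pinning enabled; the name and resource values are placeholders, not taken from this bug. KubeVirt requests pinning via dedicatedCpuPlacement, which needs the CPU Manager enabled on the nodes:

  apiVersion: kubevirt.io/v1alpha3
  kind: VirtualMachine
  metadata:
    name: vm-cpupin-example            # placeholder name
  spec:
    running: true
    template:
      spec:
        domain:
          cpu:
            cores: 8                         # 8 vCPUs, as in the verification below
            dedicatedCpuPlacement: true      # pin each vCPU to a dedicated host CPU
          devices: {}
          resources:
            requests:
              memory: 4Gi                    # placeholder; pinned pods need guaranteed (explicit) resources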
Omer,

Any update on this?
(In reply to sgott from comment #8)
> Omer,
>
> Any update on this?

Hi Stu! Yes, a PR is open (linked in the bug) and is being reviewed by Vladik while I am writing functional tests.
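For context, the fix conceptually requires the target virt-launcher to recompute the domain's vCPU pinning from the CPU set granted to the new pod and re-apply it after migration. A rough illustration only, not the actual PR; the domain name is a placeholder and the values are example output (cgroup v1 path, as on the nodes used below):

  # Read the CPU set granted to the target launcher pod:
  sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
  10,12,14,16,50,52,54,56
  # Re-pin vCPU 0 to one of those host CPUs (repeated for each vCPU):
  sh-4.4# virsh vcpupin default_<vm-name> 0 10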
Both nodes have the CPU Manager enabled.
-------------------------------------------
[kbidarka@localhost migration]$ oc describe node node-11.redhat.com | grep cpumanager
cpumanager=true
[kbidarka@localhost migration]$ oc describe node node-12.redhat.com | grep cpumanager
cpumanager=true

Cordoned node-13 so that the LiveMigration happens between node-11 and node-12.
---------------------------------------------------------------------------------
[kbidarka@localhost migration]$ oc get nodes
NAME                                             STATUS                     ROLES    AGE   VERSION
cnv-qe-infra-08.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      master   26h   v1.23.0+20a057a
cnv-qe-infra-09.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      master   26h   v1.23.0+20a057a
cnv-qe-infra-10.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      master   26h   v1.23.0+20a057a
cnv-qe-infra-11.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker   25h   v1.23.0+20a057a
cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker   25h   v1.23.0+20a057a
cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker   26h   v1.23.0+20a057a

1) Created the first VM "vm-rhel84-ocs-cpupin" on node-12 with dedicated CPUs.

[cloud-user@vm-rhel84-ocs-cpupin ~]$
[kbidarka@localhost cpu-pinning]$ oc get pods
NAME                                       READY   STATUS    RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-cpupin-xh85m   1/1     Running   0          3m1s
[kbidarka@localhost cpu-pinning]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin-xh85m
sh-4.4# virsh list
 Id   Name                           State
----------------------------------------------
 1    default_vm-rhel84-ocs-cpupin   running

2) Below is the CPU set found on node-12 for "vm-rhel84-ocs-cpupin".

sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,4,6,8,42,44,46,48
sh-4.4# virsh dumpxml default_vm-rhel84-ocs-cpupin
...
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='42'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='44'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='46'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='48'/>
  </cputune>

2a) Created the second VM "vm-rhel84-ocs-cpupin2" on node-11 with dedicated CPUs. This one, too, got the same CPU set (on its own node).
-----------------------------------------------------------
[cloud-user@vm-rhel84-ocs-cpupin2 ~]$
[kbidarka@localhost cpu-pinning]$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-cpupin-xh85m    1/1     Running   0          26m
virt-launcher-vm-rhel84-ocs-cpupin2-qsv8b   1/1     Running   0          92s
[kbidarka@localhost cpu-pinning]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin2-qsv8b
sh-4.4# virsh list
 Id   Name                            State
-----------------------------------------------
 1    default_vm-rhel84-ocs-cpupin2   running

2b) Below is the CPU set found on node-11 for "vm-rhel84-ocs-cpupin2".

sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,4,6,8,42,44,46,48
sh-4.4# virsh dumpxml default_vm-rhel84-ocs-cpupin2
...
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='42'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='44'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='46'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='48'/>
  </cputune>

3) As seen below, vm-rhel84-ocs-cpupin is running on node-12 and vm-rhel84-ocs-cpupin2 is running on node-11.
------------------------------------------------------
[kbidarka@localhost migration]$ oc get vmi
NAME                    AGE   PHASE     IP             NODENAME             READY
vm-rhel84-ocs-cpupin    36m   Running   xx.yyy.z.78    node-12.redhat.com   True
vm-rhel84-ocs-cpupin2   10m   Running   xx.yyy.zz.68   node-11.redhat.com   True

4) Triggered a LiveMigration.
---------------------------
[kbidarka@localhost migration]$ cat migration-job-vm-rhel84-ocs-cpupin.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: vm-rhel84-ocs-cpupin-vmim1
  namespace: default
spec:
  vmiName: vm-rhel84-ocs-cpupin
status: {}
[kbidarka@localhost migration]$ oc apply -f migration-job-vm-rhel84-ocs-cpupin.yaml
virtualmachineinstancemigration.kubevirt.io/vm-rhel84-ocs-cpupin-vmim1 created
--------------------------------------------------------
[kbidarka@localhost migration]$ oc get vmi
NAME                    AGE   PHASE     IP             NODENAME             READY
vm-rhel84-ocs-cpupin    38m   Running   xx.yyy.zz.69   node-11.redhat.com   True
vm-rhel84-ocs-cpupin2   12m   Running   xx.yyy.zz.68   node-11.redhat.com   True
[kbidarka@localhost migration]$ oc get pods
NAME                                        READY   STATUS      RESTARTS   AGE
virt-launcher-vm-rhel84-ocs-cpupin-c4gfm    1/1     Running     0          33s
virt-launcher-vm-rhel84-ocs-cpupin-xh85m    0/1     Completed   0          38m
virt-launcher-vm-rhel84-ocs-cpupin2-qsv8b   1/1     Running     0          12m
[kbidarka@localhost migration]$ virtctl console vm-rhel84-ocs-cpupin
Successfully connected to vm-rhel84-ocs-cpupin console. The escape sequence is ^]
[cloud-user@vm-rhel84-ocs-cpupin ~]$

5) The CPU set for the VM using dedicated CPUs is different now, after the LiveMigration.
----------------------------------------------------------
[kbidarka@localhost migration]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin-c4gfm
sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
10,12,14,16,50,52,54,56
sh-4.4# exit
exit
[kbidarka@localhost migration]$ oc rsh virt-launcher-vm-rhel84-ocs-cpupin-c4gfm
sh-4.4# virsh list
 Id   Name                           State
----------------------------------------------
 1    default_vm-rhel84-ocs-cpupin   running
sh-4.4# cat /sys/fs/cgroup/cpuset/cpuset.cpus
10,12,14,16,50,52,54,56
sh-4.4# virsh dumpxml default_vm-rhel84-ocs-cpupin
...
  <cputune>
    <vcpupin vcpu='0' cpuset='10'/>
    <vcpupin vcpu='1' cpuset='50'/>
    <vcpupin vcpu='2' cpuset='12'/>
    <vcpupin vcpu='3' cpuset='52'/>
    <vcpupin vcpu='4' cpuset='14'/>
    <vcpupin vcpu='5' cpuset='54'/>
    <vcpupin vcpu='6' cpuset='16'/>
    <vcpupin vcpu='7' cpuset='56'/>
  </cputune>

Summary: CPU pinning is now correct after LiveMigration; the VM is now pinned to the new CPU set.
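The same check can be repeated on any migrated VM by comparing the target launcher pod's cpuset with the pinning libvirt reports; after the fix the two should agree. A short sketch (pod and domain names are placeholders; the cgroup path assumes cgroup v1, as on these nodes):

  POD=virt-launcher-<vm-name>-<hash>             # the new (target) launcher pod
  oc rsh $POD cat /sys/fs/cgroup/cpuset/cpuset.cpus
  oc rsh $POD virsh vcpupin default_<vm-name>    # without a cpulist, lists the current pinning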
VERIFIED with v4.10.0-648
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947