Created attachment 1869690 [details]
engine log dump xmls

Description of problem:
The engine must not run a resize-and-pin policy VM when there are no free host CPUs for it.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. On a host with topology 1:4:1 (sockets:cores:threads), run one dedicated 1:2:1 VM and one none-policy 1:2:1 VM - both run successfully.
2. Try to run a resize_and_pin policy VM on the same host. It was expected to fail, but it ran, and the same pCPUs are now assigned twice:

[root@lynx09 ~]# virsh -r dumpxml 30 | grep vcpu
  <vcpu placement='static'>3</vcpu>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='3'/>
[root@lynx09 ~]# virsh -r dumpxml 29 | grep vcpu
  <vcpu placement='static' current='2'>32</vcpu>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
[root@lynx09 ~]# virsh -r dumpxml 25 | grep vcpu
  <vcpu placement='static' current='2'>32</vcpu>
    <vcpupin vcpu='0' cpuset='1,3'/>
    <vcpupin vcpu='1' cpuset='1,3'/>

30 - resize VM
29 - dedicated
25 - none

Actual results:
The resize policy VM is started although the host has no free CPUs to host it.

Expected results:
The resize_and_pin VM must not start on that host.

Additional info:
1. Attached dump.xmls for the three VMs.
2. The scenario can easily be reproduced on hosted-engine-04.lab.eng.tlv2.redhat.com, where we have hosts with topology 1:4:1.
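A quick way to confirm the double assignment shown above is to extract every cpuset value from the saved dumpxml files and look for duplicates. This is only a sketch; the file names and inlined XML fragments below stand in for the real dump files:

```shell
# Hypothetical helper: find pCPUs pinned by more than one VM.
# The two files below are stand-ins for the attached dumpxml files.
cat > vm30.xml <<'EOF'
<vcpupin vcpu='0' cpuset='1'/>
<vcpupin vcpu='1' cpuset='2'/>
<vcpupin vcpu='2' cpuset='3'/>
EOF
cat > vm29.xml <<'EOF'
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='2'/>
EOF
# Pull out each cpuset value, expand comma-separated lists, report duplicates.
dup=$(grep -ho "cpuset='[0-9,]*'" vm30.xml vm29.xml \
      | sed "s/cpuset='//; s/'//" \
      | tr ',' '\n' | sort -n | uniq -d)
echo "pCPUs assigned more than once: $dup"
```

Run against the real dump files, a non-empty result means the scheduler has handed the same pCPU to two VMs.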
Liran, should be fixed in the beta version, no?
Adding another example: on a 2:8:2 host (serval18.lab.eng.tlv2.redhat.com) I ran three dedicated VMs, each one 1:8:1, so the remaining free cores = 16 - 24/2 = 4. I then ran a resize_and_pin VM, expecting it to take 16 - 12 - 1 = 3 cores (1 left for the host). This resize_and_pin VM ran with topology 2:7:2 under dynamic_cpu and 5:1:1 under cpu, and was assigned pCPUs which are already taken by the other dedicated VMs.

pCPUs assigned:

[root@serval18 ~]# grep "cpuset=" VM1.dumpxml | awk -F" cpuset='" '{print $2}' | rev | cut -c 4- | rev | sort -n
0
2
4
6
16
18
20
22
[root@serval18 ~]# grep "cpuset=" VM2.dumpxml | awk -F" cpuset='" '{print $2}' | rev | cut -c 4- | rev | sort -n
3
5
7
9
19
21
23
25
[root@serval18 ~]# grep "cpuset=" VM3.dumpxml | awk -F" cpuset='" '{print $2}' | rev | cut -c 4- | rev | sort -n
8
10
12
14
24
26
28
30
[root@serval18 ~]# grep "cpuset=" VM4_resize_and_pin.dumpxml | awk -F" cpuset='" '{print $2}' | rev | cut -c 4- | rev | sort -n
2,18
2,18
3,19
3,19
4,20
4,20
5,21
5,21
6,22
6,22
7,23
7,23
8,24
8,24
9,25
9,25
10,26
10,26
11,27
11,27
12,28
12,28
13,29
13,29
14,30
14,30
15,31
15,31
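The core arithmetic in this example can be spelled out; the variable names below are mine, and the numbers come from the scenario above:

```shell
# Free-core math for a 2:8:2 host running three dedicated 1:8:1 VMs.
host_cores=$((2 * 8))                        # 2 sockets x 8 cores = 16
dedicated_vcpus=$((3 * 8))                   # 3 dedicated VMs x 8 vCPUs = 24
dedicated_cores=$((dedicated_vcpus / 2))     # 2 threads per core -> 12 cores
free_cores=$((host_cores - dedicated_cores)) # 16 - 12 = 4
resize_cores=$((free_cores - 1))             # leave 1 core for the host -> 3
echo "expected cores for resize_and_pin: $resize_cores"
```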
Resize and pin shouldn't fit itself around the dedicated VMs; it should consume the host's resources. I think we will need to forbid running dedicated VMs combined with a resize-and-pin VM. If the resize-and-pin VM is already running, we should already hit a filter. The other way around is more problematic because of the timing: we filter and set the VM dynamic details only when running the VM.

(In reply to Arik from comment #1)
> Liran, should be fixed in the beta version, no?

That would be the best.
So we don't take into account the exclusively pinned resources when scheduling VMs with resize-and-pin - doing that should be relatively simple
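As an illustration of the check being discussed (not the actual engine code), a scheduling filter could simply refuse a resize-and-pin candidate whenever the host already has exclusively pinned pCPUs. The function name and sample data here are hypothetical:

```shell
# Hypothetical filter: a resize-and-pin VM is only schedulable on a host
# with no exclusively pinned pCPUs.
exclusively_pinned="0 2"   # example: pCPUs already taken by dedicated VMs

can_run_resize_and_pin() {
    # $1: space-separated list of exclusively pinned pCPUs on the host
    [ -z "$1" ]
}

if can_run_resize_and_pin "$exclusively_pinned"; then
    verdict="schedule"
else
    verdict="filter out"
fi
echo "verdict: $verdict"
```

With any exclusive pinning present, the host is filtered out, which matches the all-or-nothing behavior described in the later comments.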
Verified on ovirt-engine-4.5.0.2-0.7.el8ev.noarch.

A resize policy VM is now forbidden to run together with a dedicated VM, even if there are free CPU resources. That's why I'm not sure the error text we get now is good:

"the host host_mixed_1 did not satisfy internal filter CpuPinning because doesn't have enough CPUs for the resize and pin NUMA CPU policy that the VM is set with."

Such an error is returned in any case of a simultaneous launch of a dedicated and a resize VM on a host, even if the dedicated VM is set with 1 CPU. Maybe the error should mention the mutual-exclusion rule instead?
We forbid any run of `Resize and Pin NUMA` when a dedicated VM runs on that host. Basically the message is true: `Resize and Pin NUMA` consumes all the resources on that host; the core we don't pin is left to give the host some breathing space. Based on that logic, you don't have enough CPUs even when only 1 CPU is taken. But yes, we may change the message if that seems important. I am not sure whether that needs another bug or can use this one (IMO, a new one), Arik?
(In reply to Liran Rotenberg from comment #6) > I am not sure in that case we need another bug or using this one (IMO, a new > one), Arik? Yes, a new one with lower severity Users won't necessarily know the accurate meaning of the "resize" part and may not notice that a VM with exclusive pinning runs on the host, so it would be better to be more explicit about it (.. because some of the physical CPUs on the host are exclusively pinned), even if just to ease debugging by us
Reported a low-priority BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2078189

This one is closed on the basis of https://bugzilla.redhat.com/show_bug.cgi?id=2070536#c5
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.