Description of problem:

Currently, the provided VM templates let the user easily overcommit the cluster's CPU resources. For example, in the RHEL 8 template, we ask the user how many CPUs they want and define them in the template like this:

~~~
cpu:
  sockets: {{ item.cpus }}
  cores: 1
  threads: 1
~~~

However, this can easily lead to CPU overcommit and bad scheduling decisions, as this configuration does not reserve the requested amount of CPU. A user can easily create 4 VMs with 16 CPUs each on a worker node with only 16 physical CPUs.

Version-Release number of selected component (if applicable):
Tested on CNV 4.8.2, but this should affect every version.

How reproducible:
Every time a template is used.

Steps to Reproduce:
1. Create a VM using a template in the web UI
2. Specify the amount of CPUs you want
3. The VM is created with the amount of CPUs specified

Actual results:
The amount of CPUs specified is not reserved, and multiple VMs can easily overcommit the CPU of a worker node.

Expected results:
Maybe we can use the following in the template instead (as we already do for memory requests):

~~~
resources:
  requests:
    cpu: 8
~~~

Someone also suggested I look at the newer cpuAllocationRatio:
https://github.com/kubevirt/kubevirt/pull/4162
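For illustration, a minimal sketch of how the proposed request could sit next to the existing topology definition in the template (the surrounding structure follows the KubeVirt VirtualMachine spec; treat it as an assumption, not the shipped template):

~~~
spec:
  template:
    spec:
      domain:
        cpu:
          sockets: {{ item.cpus }}
          cores: 1
          threads: 1
        resources:
          requests:
            # Reserve one CPU per vCPU with the scheduler, mirroring
            # what the templates already do for memory (sketch only).
            cpu: {{ item.cpus }}
~~~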
1. Created several VMs using:

~~~
resources:
  requests:
    cpu: 4
~~~

This is also mentioned here:
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu

This gave the expected result: the VMs were distributed to the nodes according to usage. Once no node could accommodate the VM's requested CPU, VM creation was halted with an error (PodDisruptionBudget). The moment a running VM was deleted (freeing the CPU resource), a new VM could be started.

2. Created several VMs without requesting CPU. (With the CPU manager on and off, the result was the same.)

The VMs were created without an issue while going over the node limit, and they were still being distributed equally as far as I could tell. The nodes' CPU usage stats from "oc adm top nodes" looked good (avg of 45%). I presume an overcommit of the CPU would occur if I started running processes on the VMs and used some CPU power; since they're idle, they aren't using much CPU.
(In reply to Roni Kishner from comment #9)
> 1. Created several VMs using:
> ~~~
> resources:
>   requests:
>     cpu: 4
> ~~~
> This is also mentioned here:
> https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu
>
> This gave the expected result: the VMs were distributed to the nodes
> according to usage. Once no node could accommodate the VM's requested
> CPU, VM creation was halted with an error (PodDisruptionBudget).
> The moment a running VM was deleted (freeing the CPU resource), a new VM
> could be started.

What is the behavior with the CPU manager on? And off?

> 2. Created several VMs without requesting CPU. (With the CPU manager on
> and off, the result was the same.)
>
> The VMs were created without an issue while going over the node limit,
> and they were still being distributed equally as far as I could tell.
> The nodes' CPU usage stats from "oc adm top nodes" looked good
> (avg of 45%). I presume an overcommit of the CPU would occur if I
> started running processes on the VMs and used some CPU power; since
> they're idle, they aren't using much CPU.

From this check I understand that overcommit of CPU is working, but can you give more info, like:
1. Number of VMs
2. How much CPU you set on the VM
3. How much CPU does the node have? (run lscpu on the node)
When putting a CPU request on the VM and turning off the CPU manager, I noticed the VMs were being randomly distributed between the nodes (I could tell that now by the error given). The main behavior of stopping VM creation at full CPU capacity was still in effect, only you had to stop/start a VM until it landed on an available node.

For the check without a CPU request, I created 12 VMs, each with 3 cores. All of them were running without any issue.

The setup was running on 3 worker nodes, each with 8 CPUs. This corresponds to the previous check where I put a CPU request of 3 on the VMs: VM creation was halted when I tried to create the 7th VM, since each node already had 6 CPUs in use.

*Note: besides VMs, there are other resources requesting CPU, so in theory, if we want to create a 16-CPU VM, we need a 17-CPU node.
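Spelling out the arithmetic behind that halt (approximate, since system pods also reserve some CPU, per the note above):

~~~
3 worker nodes x 8 CPUs           = 24 CPUs in the cluster
VMs 1-6 x 3 requested CPUs each   = 18 CPUs reserved (6 per node)
7th VM requests 3 CPUs, but each node has fewer than 3 allocatable
CPUs left (8 - 6 reserved - system/infra requests), so it cannot
be scheduled.
~~~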
Based on the above, requests.cpu will be added to the high-performance templates.
The addition needs to be synced with the UI.
@tnisan
(In reply to Ruth Netser from comment #12)
> Based on the above, requests.cpu will be added to the high-performance
> templates.

Added as a template parameter?

(In reply to Ruth Netser from comment #13)
> The addition needs to be synced with the UI.
> @tnisan

The create virtual machine wizard in the UI supports editing requests.cpu by overriding the static value, even if it's not a parameter, so AFAIK no UI update will be needed.

Note: we will need to test the UI with requests.cpu as a parameter; AFAIU it should just work.
The latest version of 4.10 as of this moment has this issue fixed.

When creating a VM using the high-performance template, the VM will be created only if enough CPU is available; if there isn't a node with enough CPU, the VM creation will be halted until enough CPU is freed.

This is done by applying the "dedicatedCpuPlacement: true" parameter under cpu. In case one does not want to use the high-performance template, adding this parameter will prevent overcommit of CPU, and turning it off (false) will allow overcommit.
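For reference, a minimal sketch of where that parameter lives in a VirtualMachine spec (the VM name and CPU topology here are hypothetical, for illustration only):

~~~
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm            # hypothetical name
spec:
  template:
    spec:
      domain:
        cpu:
          sockets: 1
          cores: 4
          threads: 1
          # Pin the VM to dedicated host CPUs; the scheduler will only
          # place it on a node with enough exclusive CPUs available.
          dedicatedCpuPlacement: true
~~~

Note that dedicated CPU placement relies on the node's CPU manager (static policy), so worker nodes need to be configured accordingly.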
(In reply to Roni Kishner from comment #15)
> The latest version of 4.10 as of this moment has this issue fixed.
>
> When creating a VM using the high-performance template, the VM will be
> created only if enough CPU is available; if there isn't a node with
> enough CPU, the VM creation will be halted until enough CPU is freed.
>
> This is done by applying the "dedicatedCpuPlacement: true" parameter
> under cpu. In case one does not want to use the high-performance
> template, adding this parameter will prevent overcommit of CPU, and
> turning it off (false) will allow overcommit.

@jsaucier does this solve the issue from your point of view?
@dholler yes, this seems to effectively solve the issue on my end!
@jsaucier Thanks for the report and the feedback!