Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 871726

Summary: CPU pinning validation missing
Product: Red Hat Enterprise Virtualization Manager Reporter: Ido Begun <ibegun>
Component: ovirt-engine Assignee: Noam Slomianko <nslomian>
Status: CLOSED CURRENTRELEASE QA Contact: Ido Begun <ibegun>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.0 CC: dfediuck, dyasny, ecohen, iheim, lpeer, nslomian, omasad, oramraz, pstehlik, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: sla
Fixed In Version: sf4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Logs none

Description Ido Begun 2012-10-31 08:33:11 UTC
Created attachment 635983 [details]
Logs

Description of problem:
When starting a VM with bad CPU pinning (i.e. pinning a non-existing virtual CPU to a physical CPU, or a virtual CPU to a non-existing physical CPU), no proper error is given.
The UI attempts to start the VM (displaying VM status "Waiting for launch" for a moment), then fails.
The events tab logs those failures as:
"VM <name> is down. Exit message: internal error vcpu id must be less than maxvcpus." when pinning a non-existing vCPU to a pCPU.
"VM <name> is down. Exit message: cannot set CPU affinity on process <pid>: Invalid argument." when pinning a vCPU to a non-existing pCPU.

Also, engine.log logs CanDoAction errors properly.
However, the UI shows no popup error, unlike for other CanDoAction errors.

Version-Release number of selected component (if applicable):
rhevm-3.1.0-22.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Set CPU pinning on a VM with a non-existing vCPU.
2. Run VM.

&

1. Set CPU pinning on a VM with a non-existing pCPU.
2. Run VM.
  
Actual results:
No proper error is given.

Expected results:
A popup error should be displayed with proper details regarding the wrong CPU pinning.

Additional info:

Comment 1 Doron Fediuck 2012-11-26 10:23:18 UTC
Ido,
Can you specify which can-do-action error you refer to?

The scenario I see in your log files is that you get a CDA error due to the VM state and not CPU topology.


Adding some more info:
To handle this, we need to evaluate the pinning topology on run VM, compared to vds.getcpu_cores.
Verification is needed regarding the way HT is considered, i.e. whether threads count as cores or not.

Comment 2 Ido Begun 2012-11-26 11:27:38 UTC
I'm referring to:
2012-10-31 10:16:45,472 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (pool-4-thread-46) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VM_PINNED_TO_HOST_CANNOT_RUN_ON_THE_DEFAULT_VDS,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_MIGRATION_TO_SAME_HOST
(appears at the end of engine.log)

(or when trying to pin an existing vCPU to a non-existing pCPU:
2012-11-26 13:11:37,251 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (pool-4-thread-50) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VM_PINNED_TO_HOST_CANNOT_RUN_ON_THE_DEFAULT_VDS,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_CLUSTER)

Comment 3 Doron Fediuck 2012-11-26 14:00:21 UTC
Thanks Ido,
neither error is directly related to CPU pinning; both have to do with the host state.

So testing and handling it in the backend should prevent it.

Comment 4 Noam Slomianko 2012-12-16 16:33:59 UTC
Currently, VDSM by default reports to the engine the number of cores and not the number of threads (i.e. it does not factor in hyperthreading).

This means that with the current configuration, on a machine with 1 socket * 4 cores * 2 threads the engine only "sees" 4 CPUs and will treat pinning to CPU 5 as an error (i.e. 0#5 will now fail validation).
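
To make the arithmetic concrete, here is a minimal Python sketch of the two candidate limits. The function names are hypothetical and are not taken from VDSM or the engine; the sketch only assumes that pCPU ids are numbered from 0.

```python
def host_cpu_limits(sockets, cores_per_socket, threads_per_core):
    """Return (core_count, thread_count) for a host topology."""
    cores = sockets * cores_per_socket
    threads = cores * threads_per_core
    return cores, threads


def pcpu_exists(pcpu, limit):
    """A pCPU id is considered valid when it is below the chosen limit."""
    return 0 <= pcpu < limit


# Topology from this comment: 1 socket * 4 cores * 2 threads.
cores, threads = host_cpu_limits(1, 4, 2)

# Pinning 0#5 targets pCPU 5.
rejected_by_core_count = not pcpu_exists(5, cores)    # engine sees 4 CPUs
accepted_by_thread_count = pcpu_exists(5, threads)    # host has 8 threads
```

With the core count as the limit, pCPU 5 is rejected; with the thread count as the limit, the same pinning would be accepted.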


Any input on the subject?

Comment 5 Noam Slomianko 2012-12-23 14:10:26 UTC
Important notes:

1) Validation assumes that cpu pinning is only valid when the VM is pinned to host

2) Validation types are:
 - syntax is correct
 - all given vcpus exist (i.e. vcpu number < max cpu# on vm)
 - only one definition for each vcpu
 - if defined, a vcpu is pinned to at least one pcpu (e.g. 0#1-2,^1,^2 isn't valid)
 - all given pcpu exist on host*
    *this definition is still under debate as I asked in the previous comment ^
    *it might mean  pcpu number < max cpu# on host or pcpu number < max thread# on host
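
The checks listed above could be sketched roughly as follows. This is an illustration in Python, not the engine's actual code. It assumes each entry has the form vcpu#pcpus, that pcpus is a comma-separated list of N, N-M or ^N (exclusion), and that multiple entries are joined with "_"; that separator and the names validate_pinning and expand_pcpus are assumptions made for this example.

```python
import re

# Assumed entry syntax: "<vcpu>#<pcpu list>", list items are N, N-M or ^N.
_ITEM = r"(?:\d+-\d+|\^?\d+)"
_ENTRY = re.compile(r"^(\d+)#(" + _ITEM + r"(?:," + _ITEM + r")*)$")


def expand_pcpus(spec):
    """Return the pCPU ids left after applying ranges and ^ exclusions."""
    included, excluded = set(), set()
    for part in spec.split(","):
        if part.startswith("^"):
            excluded.add(int(part[1:]))
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            included.update(range(low, high + 1))
        else:
            included.add(int(part))
    return included - excluded


def validate_pinning(pinning, vm_vcpus, host_pcpus=None):
    """Return a list of error strings; an empty list means valid.

    host_pcpus=None skips the host check (its exact limit is still open).
    """
    errors, seen = [], set()
    for entry in pinning.split("_"):  # "_" separator is an assumption
        match = _ENTRY.match(entry)
        if not match:
            errors.append(f"syntax error in {entry!r}")
            continue
        vcpu = int(match.group(1))
        pcpus = expand_pcpus(match.group(2))
        if vcpu >= vm_vcpus:
            errors.append(f"vCPU {vcpu} does not exist on the VM")
        if vcpu in seen:
            errors.append(f"vCPU {vcpu} is defined more than once")
        seen.add(vcpu)
        if not pcpus:
            errors.append(f"vCPU {vcpu} is not pinned to any pCPU")
        if host_pcpus is not None:
            missing = sorted(p for p in pcpus if p >= host_pcpus)
            if missing:
                errors.append(f"pCPUs {missing} do not exist on the host")
    return errors
```

For example, validate_pinning("0#1-2,^1,^2", 2, 4) reports that vCPU 0 is not pinned to any pCPU, and validate_pinning("0#5", 1, 4) reports a missing host pCPU. Passing host_pcpus as either the core count or the thread count covers both readings of the last check.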

Comment 6 Noam Slomianko 2013-01-01 10:14:23 UTC
It was decided:
 - cluster version < 3.2: do not validate host CPUs (data not reliable)
 - cluster version >= 3.2: validate if pinned to host is set
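
As a rough sketch of this decision (the function name and the tuple representation of the cluster version are assumptions for illustration, not the engine's code):

```python
def should_validate_host_cpus(cluster_version, pinned_to_host):
    """Decide whether pCPUs should be checked against the host.

    cluster_version is a (major, minor) tuple, e.g. (3, 2).
    Below 3.2 the host CPU data is not considered reliable, so the
    host check is skipped; from 3.2 on it runs only when the VM is
    pinned to a host.
    """
    return cluster_version >= (3, 2) and pinned_to_host
```

Only the host-side check is gated this way; nothing in this sketch affects the syntax and vCPU checks.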

Comment 9 Ido Begun 2013-02-11 14:15:38 UTC
OK - SF7

The cases mentioned on comments 5+6 seem to include all cases, and they're all checked when setting CPU pinning.

Comment 10 Itamar Heim 2013-06-11 08:31:04 UTC
3.2 has been released

Comment 11 Itamar Heim 2013-06-11 08:31:15 UTC
3.2 has been released

Comment 12 Itamar Heim 2013-06-11 08:32:48 UTC
3.2 has been released