Bug 871726
| Summary: | CPU pinning validation missing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ido Begun <ibegun> |
| Component: | ovirt-engine | Assignee: | Noam Slomianko <nslomian> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Ido Begun <ibegun> |
| Severity: | high | Priority: | unspecified |
| Version: | 3.1.0 | Target Release: | 3.2.0 |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | sla | oVirt Team: | SLA |
| Fixed In Version: | sf4 | Doc Type: | Bug Fix |
| Type: | Bug | | |
| CC: | dfediuck, dyasny, ecohen, iheim, lpeer, nslomian, omasad, oramraz, pstehlik, Rhev-m-bugs, sgrinber, yeylon, ykaul | | |
Ido, can you specify which CanDoAction error you are referring to? The scenario I see in your log files is that you get a CDA error because of the VM state, not the CPU topology.

Adding some more info: to handle this, we need to evaluate the pinning topology when running the VM and compare it with `vds.getcpu_cores`. We also need to verify how HT (hyperthreading) is treated, i.e., whether threads count as cores or not.

I'm referring to:

`2012-10-31 10:16:45,472 WARN [org.ovirt.engine.core.bll.RunVmCommand] (pool-4-thread-46) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VM_PINNED_TO_HOST_CANNOT_RUN_ON_THE_DEFAULT_VDS,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_MIGRATION_TO_SAME_HOST`

(This appears at the end of engine.log.) Or, when trying to pin an existing vCPU to a non-existing pCPU:

`2012-11-26 13:11:37,251 WARN [org.ovirt.engine.core.bll.RunVmCommand] (pool-4-thread-50) CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VM_PINNED_TO_HOST_CANNOT_RUN_ON_THE_DEFAULT_VDS,VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_CLUSTER`

Thanks Ido. Neither error is directly related to CPU pinning. Both have to do with the host state, so testing and handling this in the backend should prevent them.

Currently, by default, VDSM reports the number of cores to the engine rather than the number of threads (i.e., it does not factor in hyperthreading). With the current configuration, on a machine with 1 socket × 4 cores × 2 threads, the engine therefore only "sees" 4 CPUs and will treat pinning to CPU 5 as an error (i.e., `0#5` will now fail validation). Any input on the subject?

Important notes:
1) Validation assumes that CPU pinning is only valid when the VM is pinned to a host.
2) The validations are:
   - the syntax is correct
   - all given vCPUs exist (i.e., vCPU number < max CPU# on the VM)
   - there is only one definition for each vCPU
   - if defined, a vCPU is pinned to at least one pCPU (e.g., `0#1-2,^1,^2` isn't valid)
   - all given pCPUs exist on the host*

*This definition is still under debate, as I asked in the previous comment.
*It might mean pCPU number < max CPU# on the host, or pCPU number < max thread# on the host.
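The validations listed above can be sketched in Java, the engine's language. This is a minimal, hypothetical sketch and not the actual ovirt-engine code. The following are all assumptions: the class name `CpuPinningValidator`, the `_` separator between vCPU entries, the 4-digit cap on ids, and the convention that a negative `hostCpus` skips the host check. The host-CPU rule is still under debate (cores vs. threads), so the host limit is simply a parameter chosen by the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of the five checks above; not the ovirt-engine code.
// Assumed format: "vcpu#spec" entries joined by '_'; spec is a comma list of
// pCPU ids ("3"), ranges ("1-2") and exclusions ("^1").
final class CpuPinningValidator {

    // Ids are capped at 4 digits (an assumption of this sketch) so parsing
    // cannot overflow and range expansion stays small.
    private static final String ID = "\\d{1,4}";
    private static final String TOKEN = "\\^?" + ID + "(-" + ID + ")?";
    private static final Pattern ENTRY =
            Pattern.compile(ID + "#" + TOKEN + "(," + TOKEN + ")*");

    /**
     * Returns all problems found (an empty list means the string is valid).
     * vmCpus: number of vCPUs defined on the VM.
     * hostCpus: pCPU ids must be below this; a negative value skips check 5.
     */
    static List<String> validate(String pinning, int vmCpus, int hostCpus) {
        List<String> errors = new ArrayList<>();
        Set<Integer> seenVcpus = new HashSet<>();
        for (String entry : pinning.split("_")) {
            // Check 1: syntax is correct.
            if (!ENTRY.matcher(entry).matches()) {
                errors.add("bad syntax: " + entry);
                continue;
            }
            String[] parts = entry.split("#");
            int vcpu = Integer.parseInt(parts[0]);
            // Check 2: the vCPU exists on the VM.
            if (vcpu >= vmCpus) {
                errors.add("vCPU " + vcpu + " does not exist");
            }
            // Check 3: only one definition per vCPU.
            if (!seenVcpus.add(vcpu)) {
                errors.add("vCPU " + vcpu + " defined more than once");
            }
            // Expand single ids and ranges, then remove the ^exclusions.
            Set<Integer> included = new TreeSet<>();
            Set<Integer> excluded = new HashSet<>();
            for (String token : parts[1].split(",")) {
                boolean exclude = token.startsWith("^");
                String[] bounds = (exclude ? token.substring(1) : token).split("-");
                int from = Integer.parseInt(bounds[0]);
                int to = bounds.length == 2 ? Integer.parseInt(bounds[1]) : from;
                for (int p = from; p <= to; p++) {
                    if (exclude) {
                        excluded.add(p);
                    } else {
                        included.add(p);
                    }
                }
            }
            included.removeAll(excluded);
            // Check 4: at least one pCPU is left after exclusions.
            if (included.isEmpty()) {
                errors.add("vCPU " + vcpu + " is not pinned to any pCPU");
            }
            // Check 5: every pCPU exists on the host (skipped if hostCpus < 0).
            if (hostCpus >= 0) {
                for (int p : included) {
                    if (p >= hostCpus) {
                        errors.add("pCPU " + p + " does not exist on host");
                    }
                }
            }
        }
        return errors;
    }
}
```

For example, `validate("0#1-2,^1,^2", 2, 4)` reports that vCPU 0 is not pinned to any pCPU. `validate("0#5", 1, 4)` rejects pCPU 5, while the same string passes with a host limit of 8 (threads instead of cores).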
It was decided:
- cluster version < 3.2: do not validate host CPUs (the data is not reliable)
- cluster version >= 3.2: validate if "pinned to host" is set

OK (SF7). The cases mentioned in comments 5 and 6 seem to cover all cases, and they are all checked when setting CPU pinning.

3.2 has been released
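The decision above, together with the cores-versus-threads question raised earlier, can be expressed as a small gate. This is a hypothetical sketch: the class and method names are illustrative, not ovirt-engine API. Host CPU ids are checked only for cluster level 3.2 and later, and only when the VM is pinned to a host. The upper bound is computed either from cores alone (what VDSM reports by default, according to the earlier comment) or from cores × threads. This thread does not settle which of the two the engine should use, so the sketch leaves it as a flag.

```java
// Hypothetical sketch of the agreed gating rule; names are illustrative only.
final class HostCpuCheckPolicy {

    // Host CPU ids are validated only for cluster level >= 3.2 and only when
    // the VM is pinned to a host; older levels skip the check (unreliable data).
    static boolean shouldValidateHostCpus(int major, int minor, boolean pinnedToHost) {
        boolean atLeast32 = major > 3 || (major == 3 && minor >= 2);
        return atLeast32 && pinnedToHost;
    }

    // Exclusive upper bound for pCPU ids. Counting only cores (VDSM's default,
    // per the bug discussion) ignores hyperthreads.
    static int hostCpuLimit(int sockets, int coresPerSocket, int threadsPerCore,
                            boolean countThreads) {
        int cores = sockets * coresPerSocket;
        return countThreads ? cores * threadsPerCore : cores;
    }

    // True if pCPU id p passes the host check, or if the check is skipped.
    static boolean pcpuAllowed(int p, int major, int minor, boolean pinnedToHost, int limit) {
        return !shouldValidateHostCpus(major, minor, pinnedToHost) || p < limit;
    }
}
```

For the 1 socket × 4 cores × 2 threads host from the discussion, `hostCpuLimit(1, 4, 2, false)` is 4, so pinning to pCPU 5 fails on a 3.2 cluster. With `countThreads` set, the limit is 8 and pCPU 5 is accepted. On a 3.1 cluster, the check is skipped entirely.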
Created attachment 635983: Logs

Description of problem:
When starting a VM with bad CPU pinning (i.e., pinning a non-existing virtual CPU to a physical CPU, or the other way around), no proper error is given. The UI attempts to start the VM (displaying the VM status "Waiting for launch" for a moment), then fails. The Events tab logs these failures as:
- `VM <name> is down. Exit message: internal error vcpu id must be less than maxvcpus.` when pinning a non-existing vCPU to a pCPU.
- `VM <name> is down. Exit message: cannot set CPU affinity on process <pid>: Invalid argument.` when pinning a vCPU to a non-existing pCPU.

engine.log also records the CanDoAction errors properly. However, the UI shows no error popup, unlike other CanDoAction errors.

Version-Release number of selected component (if applicable):
rhevm-3.1.0-22.el6ev

How reproducible:
100%

Steps to Reproduce:
Non-existing vCPU:
1. Set CPU pinning on a VM with a non-existing vCPU.
2. Run the VM.

Non-existing pCPU:
1. Set CPU pinning on a VM with a non-existing pCPU.
2. Run the VM.

Actual results:
No proper error is given.

Expected results:
An error popup should be displayed with details about the invalid CPU pinning.

Additional info: