Description of problem:
VDSM and engine assume that the CPU ids of each physical core are consecutive and that these ids never exceed the number of cores present. This is not true on ppc64 hosts, because in virtualized environments SMT must be disabled and several hardware threads are

Version-Release number of selected component (if applicable):
3.4.2-0.0.2.20140825gita78caee

How reproducible:
Always

Steps to Reproduce:
1. Run lscpu and choose the last available online CPU
2. Edit/create a VM, go to "Resource Allocation" and pin a vCPU to the CPU chosen in the first step
3. Press "OK"

Actual results:
The error "CPU pinning validation failed - CPU does not exist in host." is shown.

Expected results:
The VM should be pinned to the CPU.

Additional info:
lscpu output of a ppc64 host:

Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                160
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79,81-87,89-95,97-103,105-111,113-119,121-127,129-135,137-143,145-151,153-159
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             4
NUMA node(s):          4
Model:                 8247-22L
CPU max MHz:           3690.0000
CPU min MHz:           2061.0000
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72
NUMA node16 CPU(s):    80,88,96,104,112
NUMA node17 CPU(s):    120,128,136,144,152
More details about the bug: ...and several hardware threads are disabled, resulting in a number of offline CPUs between the valid ids of the online CPUs.
*** Bug 1148906 has been marked as a duplicate of this bug. ***
For now leaving this on 3.4.4, although it may slip.

Vitor, there are several issues here:
1. The first CPU is 0 and not 1. This may be related to the problem you see.
2. The actual pinning is handled by libvirt, and we only provide the mapping based on the API. So if the mapping is correct and the result is wrong, this goes further into libvirt.
3. The one thing we have here for sure is a validation issue, where we assume we cannot have a CPU id higher than the CPU count. So this is what actually needs to be fixed.

Can you please review the above and comment?
The problem is related to how VDSM/engine assume the CPU ids are distributed: they assume that all CPU ids are consecutive and that no CPUs are disabled. In the CPU pinning validation in the engine backend, a physical CPU id is considered valid if it is smaller than the number of CPUs. This causes two problems: some disabled CPUs (which cannot have vCPUs pinned to them) are considered valid, and CPUs with an id larger than this number are considered invalid (in the example above there are 20 CPUs and the last id is 152). The proposed solution passes the list of online CPUs from the host to the engine and uses this information in the pinning validation, accepting only configurations that use CPUs present in this list.
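To illustrate the difference between the two checks, here is a minimal sketch in Python. It is not the engine code (the engine backend is a separate code base); the function names `parse_cpu_list`, `count_based_check` and `list_based_check` are hypothetical and only model the behaviour described in this comment, using the on-line CPU list from the lscpu output in this report.

```python
def parse_cpu_list(text):
    """Expand an lscpu-style list such as "0,8,16" or "1-7,9-15" into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def count_based_check(cpu_id, cpu_count):
    # Hypothetical model of the check described above:
    # an id is accepted when it is smaller than the number of CPUs.
    return 0 <= cpu_id < cpu_count


def list_based_check(cpu_id, online_cpus):
    # Hypothetical model of the proposed check:
    # an id is accepted only when the host reports it as online.
    return cpu_id in online_cpus


# On-line CPU(s) list taken from the lscpu output in this report.
ONLINE_CPUS = parse_cpu_list(
    "0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152"
)

if __name__ == "__main__":
    for cpu_id in (1, 152):
        print(
            cpu_id,
            count_based_check(cpu_id, len(ONLINE_CPUS)),
            list_based_check(cpu_id, ONLINE_CPUS),
        )
```

With 20 online CPUs, the count-based check accepts the offline CPU 1 and rejects the online CPU 152, while the list-based check does the opposite in both cases.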
Hi Michal, Can you provide the doc text for release notes? Cheers, Julie
Is the gerrit changeset 33872 only optional? If not, the bug should not be in MODIFIED.
On ppc64 this change is needed because, on this platform, VDSM currently bypasses libvirt to get topology information and uses lscpu instead.
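As a rough sketch of what reading the online CPUs from lscpu could look like (this is not VDSM's actual code; `online_cpus_from_lscpu` and `read_lscpu` are hypothetical names, and the field label is assumed to appear untranslated as in the output attached to this report):

```python
import subprocess


def online_cpus_from_lscpu(output):
    """Return the ids from the "On-line CPU(s) list" field of lscpu output."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "On-line CPU(s) list":
            cpus = []
            for part in value.split(","):
                part = part.strip()
                if not part:
                    continue
                # A part is either a single id ("8") or a range ("0-3").
                low, _, high = part.partition("-")
                cpus.extend(range(int(low), int(high or low) + 1))
            return cpus
    return []


def read_lscpu():
    # Runs the real lscpu binary; requires util-linux on the host.
    result = subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(online_cpus_from_lscpu(read_lscpu()))
```

The parsing is kept separate from the subprocess call so it can be exercised with captured output, such as the lscpu text in this report.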
actually the last patch is missing
Let's unblock the UI check; it doesn't require any VDSM changes. Patch 33872 is optional.
Removing the release notes flag as the bug status is still on POST and doc text has been provided. Please provide the doc text and set the flag again if this bug is indeed identified for the release notes. Cheers, Julie
not really critical since we didn't fix 3.4.z, so unblocking GA and moving this to 3.5.1 to minimize risk of regressions
This patch http://gerrit.ovirt.org/#/c/35782/ broke one of the automation tests. Also, it's not a good idea to cancel the CPU pinning validation: now, if you enter incorrect pinning information and run the VM, the VM will fail and you will get a libvirt error in the vdsm log, which is not the desired behavior at all.
I still have problem with validation of cpu pinning, please fix it or revert this patch http://gerrit.ovirt.org/#/c/35782/
(In reply to Artyom from comment #21)
> I still have problem with validation of cpu pinning, please fix it or revert
> this patch http://gerrit.ovirt.org/#/c/35782/

what problem?
See comment 16
This is a 3.6 bug where the solution is not finished yet (hence the bug is in POST). However, the behavior in 3.5 is exactly as you've described, and it was intentional. If you're complaining about 3.6, it belongs here; if about 3.5, please continue the discussion there (the test there probably needs to be adjusted).
I wrote the same comment under bug https://bugzilla.redhat.com/show_bug.cgi?id=1171724
still relevant now that we moved to ppc64le?
Roy, do you want to check that CPU pinning validation? It should be possible to add it back (I believe it is removed in master as well as in 3.5)
(In reply to Michal Skrivanek from comment #27)
> Roy, do you want to check that CPU pinning validation? It should be possible
> to add it back (I believe it is removed in master as well as in 3.5)

Yes this should be back on. Roman comments?
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015. Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to
- 3.6.1 if severity >= high
- 4.0 if severity < high
This fix should be checked on 3.6.0 BUT we open a new BZ to revert the behaviour to validate CPU pinning. Roman can you please open a new bz for 3.6.1?
Verified on rhevm-3.6.0.3-0.1.el6.noarch. CPU pinning works as expected.
(In reply to Roy Golan from comment #32)
> This fix should be checked on 3.6.0 BUT we open a new BZ to revert the
> behaviour
> to validate CPU pinning.
>
> Roman can you please open a new bz for 3.6.1?

Bug 1279375 for reintroducing cpu pinning checks created.