Bug 1148092 - [PPC] Mismatch in CPU pinning support
Summary: [PPC] Mismatch in CPU pinning support
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware: ppc64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assignee: Roman Mohr
QA Contact: Artyom
URL:
Whiteboard:
Duplicates: 1148906
Depends On:
Blocks: 1122979 1155985 1171724 RHEV3.6PPC 1277183 1277184
Reported: 2014-09-30 16:55 UTC by Vitor de Lima
Modified: 2016-04-20 01:35 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: On ppc64, online CPUs were not reported correctly to ovirt-engine because both vdsm and the engine expected consecutive CPU ids, which is not always the case on ppc64. Consequence: A CPU that was online was not reported as online, which made it impossible to start VMs with otherwise correct CPU pinnings. Fix: As a workaround, the validation is disabled in the engine until 3.6.3. See https://bugzilla.redhat.com/show_bug.cgi?id=1279375 for versions >=3.6.3. Result: All valid pinnings now pass the validation, but some invalid pinnings pass as well.
Clone Of:
: 1155985 1171724 (view as bug list)
Environment:
Last Closed: 2016-04-20 01:35:46 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Links
System        ID     Private  Priority          Status  Summary                                                    Last Updated
oVirt gerrit  33788  0        master            MERGED  core, engine, webadmin: Consider only online logical CPUs  Never
oVirt gerrit  33789  0        master            MERGED  caps: Report online logical CPUs                           Never
oVirt gerrit  33872  0        master            MERGED  caps: Do not use lscpu on ppc64                            Never
oVirt gerrit  34768  0        ovirt-3.5         MERGED  caps: Report online logical CPUs                           Never
oVirt gerrit  34769  0        ovirt-engine-3.5  MERGED  core, engine, webadmin: Consider only online logical CPUs  Never
oVirt gerrit  35760  0        ovirt-engine-3.5  MERGED  core, engine: Disable CPU pinning validation               Never
oVirt gerrit  35782  0        master            MERGED  core, engine: Disable CPU pinning validation               Never

Description Vitor de Lima 2014-09-30 16:55:22 UTC
Description of problem:
VDSM and the engine assume that the CPU ids of each physical core are consecutive and that these ids never exceed the number of cores present. This is not true on ppc64 hosts because in virtualized environments SMT must be disabled and several hardware threads are

Version-Release number of selected component (if applicable):
3.4.2-0.0.2.20140825gita78caee

How reproducible:
Always

Steps to Reproduce:
1. Run lscpu and choose the last available online CPU
2. Edit/create a VM, go to "Resource Allocation" and pin a vCPU to the CPU chosen in the first step
3. Press "OK"

Actual results:
The error "CPU pinning validation failed - CPU does not exist in host." is shown.

Expected results:
The VM should be pinned to the CPU.

Additional info:

lscpu output of a ppc64 host:

Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                160
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79,81-87,89-95,97-103,105-111,113-119,121-127,129-135,137-143,145-151,153-159
Thread(s) per core:    1
Core(s) per socket:    5
Socket(s):             4
NUMA node(s):          4
Model:                 8247-22L
CPU max MHz:           3690.0000
CPU min MHz:           2061.0000
L1d cache:             64K
L1i cache:             32K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72
NUMA node16 CPU(s):    80,88,96,104,112
NUMA node17 CPU(s):    120,128,136,144,152

Comment 1 Vitor de Lima 2014-09-30 16:57:33 UTC
More details about the bug:

...and several hardware threads are disabled, resulting in a number of offline CPUs between the valid ids of the online CPUs.
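
To make the gaps concrete, here is a minimal Python sketch that expands the "On-line CPU(s) list" value from the lscpu output in the description. The helper name parse_cpu_list is hypothetical and is not taken from VDSM or the engine; it only illustrates that the host has 20 online CPUs while the highest id is 152.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,8,16-18" into a sorted list of ints.

    Hypothetical helper for illustration only; not VDSM or engine code.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "1-7" covers both end points.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# "On-line CPU(s) list" value copied from the lscpu output in the description.
ONLINE = "0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152"

online_cpus = parse_cpu_list(ONLINE)
print(len(online_cpus), max(online_cpus))  # prints "20 152"
```

The count (20) and the highest id (152) differ widely, which is why a check based on the CPU count alone cannot work on this host.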

Comment 3 Jiri Moskovcak 2014-10-06 11:29:43 UTC
*** Bug 1148906 has been marked as a duplicate of this bug. ***

Comment 4 Doron Fediuck 2014-10-07 14:44:53 UTC
For now leaving this on 3.4.4 although it may slip.

Vitor, there are several issues here:
1. The first CPU is 0 and not 1. This may be related to the problem you see.
2. The actual pinning is handled by libvirt, and we only provide the mapping based on the API. So if the mapping is correct and the result is wrong, this goes further into libvirt.
3. The one thing we have here for sure is a validation issue, where we assume we cannot have a CPU id higher than the CPU count. So this is what actually needs to be fixed.

Can you please review the above and comment?

Comment 5 Vitor de Lima 2014-10-07 15:34:24 UTC
The problem is related to how VDSM and the engine assume the CPU ids are distributed: they assume that all CPU ids are consecutive and that there are no disabled CPUs.

In the CPU pinning validation in the engine backend, a physical CPU id is considered valid if it is smaller than the number of CPUs. This causes two problems: some disabled CPUs (which cannot have vCPUs pinned to them) are considered valid, and CPUs with an id larger than this number are considered invalid (in the example above there are 20 CPUs and the last id is 152).

The proposed solution passes the list of online CPUs from the host to the engine and uses this information in the pinning validation, accepting only configurations that use CPUs present in this list.
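
The difference between the two checks can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's actual (Java) code: both function names are hypothetical, and the online set is rebuilt from the lscpu output in the description.

```python
def valid_by_count(pcpu_id, cpu_count):
    """Check as described above: the id must be smaller than the CPU count.

    Hypothetical function for illustration only.
    """
    return 0 <= pcpu_id < cpu_count


def valid_by_online_list(pcpu_id, online_cpus):
    """Proposed check: the id must appear in the host's list of online CPUs.

    Hypothetical function for illustration only.
    """
    return pcpu_id in online_cpus


# Online CPUs of the example host: 0, 8, 16, ..., 152 (20 ids in total).
online = set(range(0, 160, 8))
cpu_count = len(online)

for pcpu in (1, 152):
    print(pcpu, valid_by_count(pcpu, cpu_count), valid_by_online_list(pcpu, online))
# prints:
# 1 True False
# 152 False True
```

CPU 1 is offline but passes the count-based check, while CPU 152 is online but fails it. The list-based check gives the expected answer in both cases.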

Comment 8 Julie 2014-11-24 04:12:53 UTC
Hi Michal,
     Can you provide the doc text for release notes?

Cheers,
Julie

Comment 9 Michal Skrivanek 2014-11-25 07:45:04 UTC
Is the 33872 gerrit changeset only optional? If not, shouldn't the bug not be MODIFIED?

Comment 10 Vitor de Lima 2014-11-25 16:13:37 UTC
On ppc64 this change is needed because on this platform VDSM currently bypasses libvirt to get topology information and uses lscpu instead.
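
As a rough illustration of what reading this information from lscpu involves, the Python sketch below extracts a single field from lscpu text output. It is not VDSM's code: the names lscpu_field and read_online_cpu_spec are hypothetical, and SAMPLE is an abbreviated excerpt in the format of the output shown in the description.

```python
import subprocess


def lscpu_field(output, name):
    """Return the value of the lscpu line whose label equals `name`, or None.

    Hypothetical helper for illustration only; not VDSM code.
    """
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return value.strip()
    return None


def read_online_cpu_spec():
    """Run lscpu on the local host and return its "On-line CPU(s) list" value."""
    output = subprocess.check_output(["lscpu"], universal_newlines=True)
    return lscpu_field(output, "On-line CPU(s) list")


# Abbreviated sample in the format of the lscpu output from the description.
SAMPLE = """Architecture:          ppc64
CPU(s):                160
On-line CPU(s) list:   0,8,16
Thread(s) per core:    1
"""

print(lscpu_field(SAMPLE, "On-line CPU(s) list"))  # prints "0,8,16"
```

Matching the label exactly keeps "CPU(s)" from being confused with "On-line CPU(s) list" or the per-node "NUMA node0 CPU(s)" lines.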

Comment 11 Michal Skrivanek 2014-11-27 10:24:27 UTC
actually the last patch is missing

Comment 12 Michal Skrivanek 2014-11-27 15:05:03 UTC
let's unblock the UI check, it doesn't require any vdsm changes
the patch 33872 is optional

Comment 13 Julie 2014-12-03 06:45:21 UTC
Removing the release notes flag as the bug status is still POST and doc text has not been provided. Please provide the doc text and set the flag again if this bug is indeed identified for the release notes.

Cheers,
Julie

Comment 14 Michal Skrivanek 2014-12-08 13:20:24 UTC
not really critical since we didn't fix 3.4.z, so unblocking GA and moving this to 3.5.1 to minimize risk of regressions

Comment 16 Artyom 2014-12-28 11:27:15 UTC
This patch http://gerrit.ovirt.org/#/c/35782/ broke one of the automation tests.
Also, it's not a good idea to cancel CPU pinning validation: if you now enter incorrect pinning information and run the VM, the VM will fail and you will receive a libvirt error in the vdsm log, which is not the desired behavior at all.

Comment 21 Artyom 2015-01-07 09:10:56 UTC
I still have problem with validation of cpu pinning, please fix it or revert this patch http://gerrit.ovirt.org/#/c/35782/

Comment 22 Michal Skrivanek 2015-01-08 14:41:08 UTC
(In reply to Artyom from comment #21)
> I still have problem with validation of cpu pinning, please fix it or revert
> this patch http://gerrit.ovirt.org/#/c/35782/

what problem?

Comment 23 Artyom 2015-01-08 15:01:52 UTC
See comment 16

Comment 24 Michal Skrivanek 2015-01-08 15:23:29 UTC
this is a 3.6 bug where the solution is not finished yet (hence the bug is in POST)
however the behavior in 3.5 is exactly like you've described and was intentional.

if you're complaining about 3.6 it belongs here, if about 3.5 please continue the discussion there (the test there probably needs to be adjusted)

Comment 25 Artyom 2015-01-08 15:52:59 UTC
I wrote the same comment under bug https://bugzilla.redhat.com/show_bug.cgi?id=1171724

Comment 26 Eyal Edri 2015-06-03 20:30:50 UTC
still relevant now that we moved to ppc64le?

Comment 27 Michal Skrivanek 2015-06-05 13:10:47 UTC
Roy, do you want to check that CPU pinning validation? It should be possible to add it back (I believe it is removed in master as well as in 3.5)

Comment 30 Roy Golan 2015-10-07 08:26:38 UTC
(In reply to Michal Skrivanek from comment #27)
> Roy, do you want to check that CPU pinning validation? It should be possible
> to add it back (I believe it is removed in master as well as in 3.5)

Yes this should be back on. Roman comments?

Comment 31 Sandro Bonazzola 2015-10-26 12:43:50 UTC
this is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 32 Roy Golan 2015-11-04 10:21:37 UTC
This fix should be checked on 3.6.0 BUT we open a new BZ to revert the behaviour 
to validate CPU pinning.

Roman can you please open a new bz for 3.6.1?

Comment 33 Artyom 2015-11-05 09:59:31 UTC
Verified on rhevm-3.6.0.3-0.1.el6.noarch
CPU pinning works as expected

Comment 34 Roman Mohr 2015-11-09 10:20:11 UTC
(In reply to Roy Golan from comment #32)
> This fix should be checked on 3.6.0 BUT we open a new BZ to revert the
> behaviour 
> to validate CPU pinning.
> 
> Roman can you please open a new bz for 3.6.1?

Bug 1279375 for reintroducing cpu pinning checks created.

