Bug 1256836

Summary: Improvement: start vm on host with unusual numa architecture failed
Product: [oVirt] ovirt-engine
Reporter: Artyom <alukiano>
Component: General
Assignee: Nobody <nobody>
Status: CLOSED WONTFIX
QA Contact:
Severity: low
Docs Contact:
Priority: unspecified
Version: 3.6.0
CC: alukiano, bugs, dfediuck, gklein, lsurette, mgoldboi, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ---
Keywords: Improvement
Target Release: ---
Flags: dfediuck: ovirt-future?
       rule-engine: planning_ack?
       alukiano: devel_ack?
       rule-engine: testing_ack?
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-17 21:13:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: vdsm (flags: none)

Description Artyom 2015-08-25 14:49:52 UTC
Created attachment 1066894 [details]
vdsm

Description of problem:
Starting a VM with 1 CPU and no NUMA nodes, on a host with the following NUMA architecture:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 12276 MB
node 0 free: 11218 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 12288 MB
node 1 free: 11502 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 
and the following CPU layout:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz
Stepping:              2
CPU MHz:               2660.000
BogoMIPS:              5066.55
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23

fails with a libvirt error.

Version-Release number of selected component (if applicable):
host - vdsm-4.17.3-1.el7ev.noarch
engine - rhevm-3.6.0-0.12.master.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add a host with the NUMA architecture shown above to the engine
2. Create a VM with one CPU, pinned to the host, and without NUMA nodes
3. Start the VM

Actual results:
The VM fails to start with the error:
libvirtError: internal error: CPU IDs in <numa> exceed the <vcpu> count

Expected results:
The VM runs without errors.

Additional info:
If I define one NUMA node, the VM starts successfully.

dumpxml of the VM with one NUMA node:
<cpu mode='custom' match='exact'>
    <model fallback='allow'>Conroe</model>
    <topology sockets='16' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>


dumpxml of the VM without a NUMA node:
<cpu match="exact">
                <model>Conroe</model>
                <topology cores="1" sockets="16" threads="1"/>
                <numa>
                        <cell cpus="0,1,2,3,4,5,12,13,14,15,16,17" memory="1048576"/>
                </numa>
        </cpu>
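
For reference, here is a minimal sketch (plain Python, not vdsm or engine code) of the consistency rule behind the libvirt error above: every CPU ID listed in a <numa><cell> must stay below the maximum vCPU count implied by <topology> (sockets * cores * threads). Against the second snippet, IDs 16 and 17 exceed the 16 vCPUs implied by sockets="16":

import xml.etree.ElementTree as ET

# <cpu> element generated for the VM without a user-defined NUMA node,
# copied from the dumpxml above.
CPU_XML = """
<cpu match="exact">
  <model>Conroe</model>
  <topology cores="1" sockets="16" threads="1"/>
  <numa>
    <cell cpus="0,1,2,3,4,5,12,13,14,15,16,17" memory="1048576"/>
  </numa>
</cpu>
"""

def max_vcpus(topology):
    """Maximum vCPU count implied by <topology sockets/cores/threads>."""
    return (int(topology.get("sockets", 1))
            * int(topology.get("cores", 1))
            * int(topology.get("threads", 1)))

def check_numa_cells(cpu_xml):
    cpu = ET.fromstring(cpu_xml)
    vcpus = max_vcpus(cpu.find("topology"))
    for cell in cpu.findall("numa/cell"):
        ids = [int(i) for i in cell.get("cpus").split(",")]
        too_big = [i for i in ids if i >= vcpus]
        if too_big:
            print(f"cell CPU IDs {too_big} exceed the vcpu count {vcpus}")
        else:
            print(f"cell OK: CPU IDs fit within {vcpus} vCPUs")

check_numa_cells(CPU_XML)  # -> cell CPU IDs [16, 17] exceed the vcpu count 16

The first snippet (cell cpus='0') passes the same check, which matches the observation that the VM starts once a single vNUMA node is defined.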

Comment 1 Doron Fediuck 2015-08-26 13:29:53 UTC
Please verify that this topology is 'legal' from a NUMA perspective,
i.e. each cell hosts cores and RAM with some reasonable correlation.
What you describe may break the topology, since the data will end up interleaved.

Comment 2 Artyom 2015-08-27 10:37:47 UTC
It is from an IBM BladeCenter, so I believe it is legal, but the question here is why we send a NUMA cell when I did not define one via the engine:
<numa>
                        <cell cpus="0,1,2,3,4,5,12,13,14,15,16,17" memory="1048576"/>
I see a difference between PPC hosts and regular x86_64 hosts in the vdsm log when I start a VM that is pinned to the host without creating a vNUMA node.

vdsm-4.17.3-1.el7ev.noarch

On a PPC host, we do not send a NUMA node:
u'cpuType': u'power8', u'smp': u'1', u'smartcardEnable': u'false'

On an x86_64 host:
u'cpuType': u'Conroe', u'smp': u'1', u'guestNumaNodes': [{u'nodeIndex': 0, u'cpus': u'0,1,2,3,4,5,12,13,14,15,16,17', u'memory': u'1024'}], u'smartcardEnable': u'false'
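
To illustrate the point of this comment, a hypothetical sketch (the guard is an assumption for illustration, not actual engine or vdsm behaviour; parameter names are taken from the log lines above) of forwarding guestNumaNodes only when the user actually defined vNUMA nodes:

# Hypothetical illustration: forward 'guestNumaNodes' only when the user
# defined vNUMA nodes for the VM; otherwise send the same minimal parameter
# set that is already sent for PPC hosts.
def build_create_params(base_params, user_defined_numa_nodes):
    params = dict(base_params)
    if user_defined_numa_nodes:
        params['guestNumaNodes'] = user_defined_numa_nodes
    return params

# x86_64 case from the log: smp=1 and no vNUMA nodes defined via the engine.
x86_base = {'cpuType': 'Conroe', 'smp': '1', 'smartcardEnable': 'false'}
print(build_create_params(x86_base, user_defined_numa_nodes=[]))
# -> {'cpuType': 'Conroe', 'smp': '1', 'smartcardEnable': 'false'}
#    (no guestNumaNodes entry, matching the PPC create call)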

Comment 4 Doron Fediuck 2015-11-17 16:21:22 UTC
(In reply to Artyom from comment #2)
> It is from an IBM BladeCenter, so I believe it is legal, but the question
> here is why we send a NUMA cell when I did not define one via the engine:
> <numa>
>                         <cell cpus="0,1,2,3,4,5,12,13,14,15,16,17"
> memory="1048576"/>
> I see a difference between PPC hosts and regular x86_64 hosts in the vdsm
> log when I start a VM that is pinned to the host without creating a vNUMA
> node.
> 
> vdsm-4.17.3-1.el7ev.noarch
> 
> On a PPC host, we do not send a NUMA node:
> u'cpuType': u'power8', u'smp': u'1', u'smartcardEnable': u'false'
> 
> On an x86_64 host:
> u'cpuType': u'Conroe', u'smp': u'1', u'guestNumaNodes': [{u'nodeIndex': 0,
> u'cpus': u'0,1,2,3,4,5,12,13,14,15,16,17', u'memory': u'1024'}],
> u'smartcardEnable': u'false'

PPC currently does not support NUMA.
Did you find this issue on a PPC machine or a standard AMD64 machine?

Comment 5 Artyom 2015-11-29 10:17:48 UTC
1) Where did you find the information that NUMA is not supported on the ppc64 architecture? I checked it on our Power8 hosts and it works fine.
2) I found this issue on x86_64, but with vdsm-4.17.11-0.el7ev.noarch and libvirt-1.2.17-13.el7.x86_64 the error does not appear and the cpu element looks like:
<cpu match="exact">
                <model>Conroe</model>
                <topology cores="1" sockets="16" threads="1"/>
                <numa>
                        <cell cpus="0" memory="1048576"/>
                </numa>
        </cpu>
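
With this newer vdsm the generated cell references only CPU ID 0, which fits within the 16 vCPUs implied by sockets="16"; this is consistent with the check sketched in the description and with the error no longer appearing.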