Bug 1794304

Summary: correctly configured VM fails on start with error: CPU IDs in <numa> exceed the <vcpu> count.
Product: [oVirt] ovirt-engine
Component: BLL.Virt
Version: 4.4.0
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Reporter: Polina <pagranat>
Assignee: Michal Skrivanek <michal.skrivanek>
QA Contact: meital avital <mavital>
Docs Contact:
CC: bugs, rbarry
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-23 16:01:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1437559

Attachments: logs (flags: none)

Description Polina 2020-01-23 08:55:48 UTC
Created attachment 1654785 [details]
logs

Description of problem: the VM fails to start with ERROR: EVENT_ID: VM_DOWN_ERROR(119), VM vm is down with error. Exit message: internal error: CPU IDs in <numa> exceed the <vcpu> count.

Version-Release number of selected component (if applicable):
http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-14

How reproducible: 100%


Steps to Reproduce:
1. Configure the VM with 5 CPUs (5 Virtual Sockets, 1 Core per Virtual Socket, 1 Thread per Core) and 2 NUMA nodes. Start the VM and check in the engine log that the NUMA configuration is correct:
     <numa>
      <cell id='0' cpus='0-7,16-71' memory='524288' unit='KiB'/>
      <cell id='1' cpus='8-15,72-127' memory='524288' unit='KiB'/>
    </numa>


2. Reconfigure the VM to have 16 CPUs (2 Virtual Sockets, 2 Cores per Virtual Socket, 4 Threads per Core). Leave the NUMA node count at 2. Restart the VM (see the arithmetic sketch after these steps).
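
For reference, the vCPU count libvirt derives from <topology> is sockets x cores x threads, and every CPU ID listed in a <numa> cell must be smaller than that count. A minimal Python sketch of this arithmetic (the cell ranges are taken from the XML quoted under "Actual results" below; the helper names are illustrative, not engine code):

    def vcpu_count(sockets, cores, threads):
        # libvirt derives the vCPU count from the <topology> element
        return sockets * cores * threads

    def cell_ids_valid(cells, vcpus):
        # every CPU ID referenced by a <numa> cell must be < vcpus
        return all(cpu < vcpus for cpus in cells for cpu in cpus)

    # Step 2 topology: 2 sockets x 2 cores x 4 threads = 16 vCPUs
    vcpus = vcpu_count(sockets=2, cores=2, threads=4)

    # Stale cell ranges carried over from the old configuration
    # (cpus="3-4,16-77" and cpus="0-2,78-138" in the generated XML)
    stale_cells = [range(3, 5), range(16, 78), range(0, 3), range(78, 139)]
    # A fresh split of CPUs 0-15 across the two NUMA nodes
    fresh_cells = [range(0, 8), range(8, 16)]

    print(cell_ids_valid(stale_cells, vcpus))   # False -> libvirt rejects the domain
    print(cell_ids_valid(fresh_cells, vcpus))   # True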

Actual results: the VM fails on startup with ERROR: EVENT_ID: VM_DOWN_ERROR(119), VM vm is down with error. Exit message: internal error: CPU IDs in <numa> exceed the <vcpu> count.
In engine.log:
    ...
    <cpu match="exact">
    <model>Westmere</model>
    <topology cores="2" threads="4" sockets="16"/>
    <numa>
      <cell id="1" cpus="3-4,16-77" memory="524288"/>
      <cell id="0" cpus="0-2,78-138" memory="524288"/>
    </numa>
    </cpu>
    ...

Expected results: the VM starts and the generated XML contains:
    <numa>
      <cell id='0' cpus='0-7,16-71' memory='524288' unit='KiB'/>
      <cell id='1' cpus='8-15,72-127' memory='524288' unit='KiB'/>
    </numa>

Additional info: 
In the attached logs:
2020-01-23 10:07:07,864+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-9) [2b14e989] EVENT_ID: VM_DOWN_ERROR(119), VM vm is down with error. Exit message: internal error: CPU IDs in <numa> exceed the <vcpu> count.

Comment 1 Michal Skrivanek 2020-01-23 09:01:54 UTC
When you change the topology, the NUMA assignment/reservation is not necessarily valid anymore. It should probably be dropped and recreated on any such change.
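
A hedged sketch of what "drop and recreate" could look like: on any topology change, discard the stored per-cell CPU ranges and re-split CPU IDs 0..vcpus-1 across the configured NUMA nodes (illustrative only; the even split and the function name are assumptions, not the engine's actual implementation):

    def recreate_numa_cells(vcpus, numa_nodes):
        # Drop any stored per-cell CPU ranges and re-split 0..vcpus-1
        # as evenly as possible across the configured NUMA nodes.
        base, extra = divmod(vcpus, numa_nodes)
        cells, start = [], 0
        for node in range(numa_nodes):
            count = base + (1 if node < extra else 0)
            cells.append({"id": node, "cpus": list(range(start, start + count))})
            start += count
        return cells

    # After the step-2 change (16 vCPUs, 2 NUMA nodes) this yields
    # cell 0 -> CPUs 0-7 and cell 1 -> CPUs 8-15, which satisfies libvirt's check.
    print(recreate_numa_cells(vcpus=16, numa_nodes=2))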

Comment 2 Ryan Barry 2020-01-23 16:01:11 UTC
This is part of testing rhbz#1437559, not a blocker. Closing. Let's resolve it there.

*** This bug has been marked as a duplicate of bug 1437559 ***