Description of problem:
Latest generation CPUs introduced a new level in the topology referred to as a "die", sitting between the socket & core.

To see the problem, first assume bug 1785207 is implemented, then configure a guest with:

  <vcpu placement='static'>12</vcpu>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='2' dies='3' cores='2' threads='1'/>
  </cpu>

Inside that guest, libvirt reports:

# virsh capabilities
<capabilities>
  <host>
    <uuid>8a370a1f-7e1b-4c33-9533-5d44519f3a1d</uuid>
    <cpu>
      ...snip...
      <topology sockets='1' cores='12' threads='1'/>
      ...snip...
    </cpu>
    ...snip...
    <topology>
      <cells num='1'>
        <cell id='0'>
          ...snip...
          <cpus num='12'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='0' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='0' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='1' siblings='5'/>
            <cpu id='6' socket_id='1' core_id='0' siblings='6'/>
            <cpu id='7' socket_id='1' core_id='1' siblings='7'/>
            <cpu id='8' socket_id='1' core_id='0' siblings='8'/>
            <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
            <cpu id='10' socket_id='1' core_id='0' siblings='10'/>
            <cpu id='11' socket_id='1' core_id='1' siblings='11'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>

Notice that 'core_id' is not unique with respect to 'socket_id'. This is because we're missing reporting of 'dies'.

Also the <topology> reports 1 socket, 12 cores. Something here is getting confused by the dies. It should be reporting 2 sockets, 6 cores.

Ideally we should report the dies:

  <topology sockets='2' dies='3' cores='2' threads='1'/>

This would be a semantic change in libvirt output, as sockets*cores*threads would no longer multiply out to the total CPU count. So we probably have to *not* report dies in the topology here, but we should at least make sure we get 'sockets' reported correctly, and have reported cores equal to real cores * dies.
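The arithmetic behind the last paragraph can be made concrete. The sketch below is illustrative only: the helper names are made up for this comment and are not libvirt functions. It takes the guest topology from the example, shows why folding dies into cores keeps the three-field product equal to the vCPU count, and lists the (socket_id, core_id) pairs that become ambiguous when die_id is not reported.

```python
from collections import Counter
from itertools import product


def total_cpus(sockets, dies, cores, threads):
    """Logical CPU count implied by a four-level topology."""
    return sockets * dies * cores * threads


def legacy_topology(sockets, dies, cores, threads):
    """Fold dies into cores so sockets*cores*threads still equals the total.

    Hypothetical helper mirroring the proposal in the description:
    keep the real socket count and report cores = real cores * dies.
    """
    return sockets, cores * dies, threads


def colliding_core_keys(sockets, dies, cores):
    """Return (socket_id, core_id) pairs shared by more than one physical core."""
    seen = Counter(
        (s, c) for s, _die, c in product(range(sockets), range(dies), range(cores))
    )
    return sorted(key for key, count in seen.items() if count > 1)


# Guest from the description: sockets=2, dies=3, cores=2, threads=1.
print(total_cpus(2, 3, 2, 1))        # 12
print(legacy_topology(2, 3, 2, 1))   # (2, 6, 1)
print(colliding_core_keys(2, 3, 2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With dies=1 the last call returns an empty list, which is why the ambiguity only shows up on multi-die topologies.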
There are new sysfs files "die_id", "die_cpus" and "die_cpus_list" at /sys/devices/system/cpu/cpuXXX/topology/ from which libvirt can / should fetch this info.

Note for real hardware, AFAIK, only CascadeLake-AP CPUs use 'dies' right now. Intel labels these "Xeon Platinum" in /proc/cpuinfo. AFAICT, AMD EPYC boxes that I've seen do *not* report 'dies', even though IIUC they do use this concept in silicon. I'm not sure if I simply haven't tested the right generation of EPYC, or if the kernel isn't correctly reporting 'dies' for EPYC, or something else.

Version-Release number of selected component (if applicable):
libvirt-5.10.0-1
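For anyone wanting to check what a given kernel exposes, here is a minimal sketch that reads the raw die_id values from the sysfs location named above. It is not libvirt code; the function names and the `root` parameter (added so the code can be pointed at a fake directory tree) are assumptions made for illustration.

```python
from pathlib import Path

SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_die_id(cpu, root=SYSFS_CPU_ROOT):
    """Return the raw die_id for one CPU, or None if the file is absent."""
    path = Path(root) / f"cpu{cpu}" / "topology" / "die_id"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        # Older kernels (or offline CPUs) have no die_id file.
        return None


def die_ids(root=SYSFS_CPU_ROOT):
    """Map CPU number -> raw die_id for every cpuN directory under root."""
    result = {}
    # "cpu[0-9]*" skips sibling directories such as cpufreq and cpuidle.
    for entry in Path(root).glob("cpu[0-9]*"):
        cpu = int(entry.name[3:])
        result[cpu] = read_die_id(cpu, root)
    return dict(sorted(result.items()))


if __name__ == "__main__":
    for cpu, die in die_ids().items():
        print(f"cpu{cpu}: die_id={die}")
```

The raw value is returned unchanged, so the caller decides how to treat a missing file or a negative number.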
Patches at https://www.redhat.com/archives/libvir-list/2019-December/msg01249.html
Testing this will require RHEL 8 kernel-4.18.0-147.4.el8 or newer, via bug 1616309
Verified this bug with libvirt-6.0.0-3.module+el8.2.0+5633+b0e06c1a.x86_64. Version: libvirt-6.0.0-3.module+el8.2.0+5633+b0e06c1a.x86_64 qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 kernel-4.18.0-175.el8.x86_64 Steps: 1. Check the output of "virsh capabilities" # virsh capabilities ... <host> ... <topology> <cells num='8'> <cell id='0'> <memory unit='KiB'>16142068</memory> <pages unit='KiB' size='4'>4035517</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='10'/> <sibling id='1' value='16'/> <sibling id='2' value='16'/> <sibling id='3' value='16'/> <sibling id='4' value='32'/> <sibling id='5' value='32'/> <sibling id='6' value='32'/> <sibling id='7' value='32'/> </distances> <cpus num='4'> <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0,16'/> <cpu id='1' socket_id='0' die_id='0' core_id='4' siblings='1,17'/> <cpu id='16' socket_id='0' die_id='0' core_id='0' siblings='0,16'/> <cpu id='17' socket_id='0' die_id='0' core_id='4' siblings='1,17'/> </cpus> </cell> <cell id='1'> <pages unit='KiB' size='4'>0</pages> <distances> <sibling id='0' value='16'/> <sibling id='1' value='10'/> <sibling id='2' value='16'/> <sibling id='3' value='16'/> <sibling id='4' value='32'/> <sibling id='5' value='32'/> <sibling id='6' value='32'/> <sibling id='7' value='32'/> </distances> <cpus num='4'> <cpu id='2' socket_id='0' die_id='0' core_id='8' siblings='2,18'/> <cpu id='3' socket_id='0' die_id='0' core_id='12' siblings='3,19'/> <cpu id='18' socket_id='0' die_id='0' core_id='8' siblings='2,18'/> <cpu id='19' socket_id='0' die_id='0' core_id='12' siblings='3,19'/> </cpus> </cell> <cell id='2'> <pages unit='KiB' size='4'>0</pages> <distances> <sibling id='0' value='16'/> <sibling id='1' value='16'/> <sibling id='2' value='10'/> <sibling id='3' value='16'/> <sibling id='4' value='32'/> <sibling id='5' value='32'/> <sibling id='6' value='32'/> <sibling id='7' value='32'/> </distances> <cpus 
num='4'> <cpu id='4' socket_id='0' die_id='0' core_id='16' siblings='4,20'/> <cpu id='5' socket_id='0' die_id='0' core_id='20' siblings='5,21'/> <cpu id='20' socket_id='0' die_id='0' core_id='16' siblings='4,20'/> <cpu id='21' socket_id='0' die_id='0' core_id='20' siblings='5,21'/> </cpus> </cell> <cell id='3'> <pages unit='KiB' size='4'>0</pages> <distances> <sibling id='0' value='16'/> <sibling id='1' value='16'/> <sibling id='2' value='16'/> <sibling id='3' value='10'/> <sibling id='4' value='32'/> <sibling id='5' value='32'/> <sibling id='6' value='32'/> <sibling id='7' value='32'/> </distances> <cpus num='4'> <cpu id='6' socket_id='0' die_id='0' core_id='24' siblings='6,22'/> <cpu id='7' socket_id='0' die_id='0' core_id='28' siblings='7,23'/> <cpu id='22' socket_id='0' die_id='0' core_id='24' siblings='6,22'/> <cpu id='23' socket_id='0' die_id='0' core_id='28' siblings='7,23'/> </cpus> </cell> <cell id='4'> <memory unit='KiB'>16462792</memory> <pages unit='KiB' size='4'>4115698</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='32'/> <sibling id='1' value='32'/> <sibling id='2' value='32'/> <sibling id='3' value='32'/> <sibling id='4' value='10'/> <sibling id='5' value='16'/> <sibling id='6' value='16'/> <sibling id='7' value='16'/> </distances> <cpus num='4'> <cpu id='8' socket_id='1' die_id='0' core_id='0' siblings='8,24'/> <cpu id='9' socket_id='1' die_id='0' core_id='4' siblings='9,25'/> <cpu id='24' socket_id='1' die_id='0' core_id='0' siblings='8,24'/> <cpu id='25' socket_id='1' die_id='0' core_id='4' siblings='9,25'/> </cpus> </cell> <cell id='5'> <pages unit='KiB' size='4'>0</pages> <distances> <sibling id='0' value='32'/> <sibling id='1' value='32'/> <sibling id='2' value='32'/> <sibling id='3' value='32'/> <sibling id='4' value='16'/> <sibling id='5' value='10'/> <sibling id='6' value='16'/> <sibling id='7' value='16'/> </distances> <cpus num='4'> <cpu id='10' socket_id='1' 
die_id='0' core_id='8' siblings='10,26'/> <cpu id='11' socket_id='1' die_id='0' core_id='12' siblings='11,27'/> <cpu id='26' socket_id='1' die_id='0' core_id='8' siblings='10,26'/> <cpu id='27' socket_id='1' die_id='0' core_id='12' siblings='11,27'/> </cpus> </cell> <cell id='6'> <pages unit='KiB' size='4'>0</pages> <distances> <sibling id='0' value='32'/> <sibling id='1' value='32'/> <sibling id='2' value='32'/> <sibling id='3' value='32'/> <sibling id='4' value='16'/> <sibling id='5' value='16'/> <sibling id='6' value='10'/> <sibling id='7' value='16'/> </distances> <cpus num='4'> <cpu id='12' socket_id='1' die_id='0' core_id='16' siblings='12,28'/> <cpu id='13' socket_id='1' die_id='0' core_id='20' siblings='13,29'/> <cpu id='28' socket_id='1' die_id='0' core_id='16' siblings='12,28'/> <cpu id='29' socket_id='1' die_id='0' core_id='20' siblings='13,29'/> </cpus> </cell> <cell id='7'> <pages unit='KiB' size='4'>0</pages> <distances> <sibling id='0' value='32'/> <sibling id='1' value='32'/> <sibling id='2' value='32'/> <sibling id='3' value='32'/> <sibling id='4' value='16'/> <sibling id='5' value='16'/> <sibling id='6' value='16'/> <sibling id='7' value='10'/> </distances> <cpus num='4'> <cpu id='14' socket_id='1' die_id='0' core_id='24' siblings='14,30'/> <cpu id='15' socket_id='1' die_id='0' core_id='28' siblings='15,31'/> <cpu id='30' socket_id='1' die_id='0' core_id='24' siblings='14,30'/> <cpu id='31' socket_id='1' die_id='0' core_id='28' siblings='15,31'/> </cpus> </cell> </cells> </topology> ... </host> ... The "die_id" parameter can be seen in the "topology" element, which is expected. Move this bug to be verified.
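A quick way to see at a glance which die_id values a host reports is to summarise the capabilities XML per NUMA cell. The sketch below assumes the XML has the same element layout as the output above; the function name and the trimmed sample (three CPUs taken from cells 0 and 4 of that output) are only for illustration.

```python
import xml.etree.ElementTree as ET


def die_ids_per_cell(capabilities_xml):
    """Map NUMA cell id -> sorted list of distinct die_id values in that cell."""
    root = ET.fromstring(capabilities_xml)
    result = {}
    for cell in root.iterfind("./host/topology/cells/cell"):
        dies = {
            int(cpu.get("die_id"))
            for cpu in cell.iterfind("./cpus/cpu")
            if cpu.get("die_id") is not None
        }
        result[int(cell.get("id"))] = sorted(dies)
    return result


# Trimmed sample using values from the EPYC output above.
SAMPLE = """
<capabilities>
  <host>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='2'>
            <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0,16'/>
            <cpu id='1' socket_id='0' die_id='0' core_id='4' siblings='1,17'/>
          </cpus>
        </cell>
        <cell id='4'>
          <cpus num='1'>
            <cpu id='8' socket_id='1' die_id='0' core_id='0' siblings='8,24'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

summary = die_ids_per_cell(SAMPLE)
print(summary)  # {0: [0], 4: [0]}
# False here means the host only exercises the die_id='0' case.
print(any(die != 0 for dies in summary.values() for die in dies))  # False
```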
Hi Daniel,

I have noticed the following info in the downstream patch.

"AMD has confirmed they do *NOT* expect it to be reported for
EPYC CPUs, as they expect apps should just use the NUMA
topology that (optionally per BIOS config) puts each die
in a separate NUMA cell."

And unfortunately, I tested the scenario in comment 5 with an EPYC machine.
So Q1: I am not sure whether it will be okay to verify this bug. Will the test result differ on different machines?

And in the other bug: Bug 1785207
"Anyway, the upshot is I've not found any real hardware to test this
series on. I've tested it only inside a QEMU guest with the suitable
-smp arg to fake dies."

Q2: What kind of physical machine can support this function?

Q3: I have tested the "dies" parameter with the following scenario.
The "current" vcpus value may cause the VM to fail to start, which seems unreasonable. (The following scenario was also tested on the EPYC physical machine.)

# rpm -qa libvirt qemu-kvm kernel
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
libvirt-6.0.0-3.module+el8.2.0+5633+b0e06c1a.x86_64
kernel-4.18.0-175.el8.x86_64

# virsh domstate test82
shut off

# virsh dumpxml test82 |grep -E "vcpu|topology"
  <vcpu placement='static' current='52'>128</vcpu>
    <topology sockets='8' dies='2' cores='4' threads='2'/>

# virsh start test82
error: Failed to start domain test82
error: internal error: qemu didn't report thread id for vcpu '48'

# virsh dumpxml test82 |grep -E "vcpu|topology"
  <vcpu placement='static' current='48'>128</vcpu>
    <topology sockets='8' dies='2' cores='4' threads='2'/>

# virsh start test82
Domain test82 started
(In reply to jiyan from comment #6)
> Hi Daniel
> I have noticed the following info in the downstream patch.
>
> "AMD has confirmed they do *NOT* expect it to be reported for
> EPYC CPUs, as they expect apps should just use the NUMA
> topology that (optionally per BIOS config) puts each die
> in a separate NUMA cell."
>
> And unfortunately, I tested the scenario in comment 5 with an EPYC machine.
> So Q1: I am not sure whether it will be okay to verify this bug. Will the
> test result differ on different machines?

All machines with 8.2 kernels should support reporting a die_id value, but almost all of them (including EPYC) will report a value of 0. This test shows that libvirt exposes the info in the XML, but it would be desirable to test on a machine reporting a non-zero value.

> And in the other bug: Bug 1785207
> "Anyway, the upshot is I've not found any real hardware to test this
> series on. I've tested it only inside a QEMU guest with the suitable
> -smp arg to fake dies."
>
> Q2: What kind of physical machine can support this function?

AFAICT, the only CPUs which report a non-zero die_id currently are CascadeLake-AP. These are pretty rare - I only see a small number of machines with these CPUs in beaker, so it might be hard for you to find one to test on.

If you can't find real hardware, then the next best thing to do is to create a KVM guest which has multiple dies in the <topology>. Inside that guest, run libvirtd and look at 'virsh capabilities', which should report non-zero values for die_id.

> Q3: I have tested the "dies" parameter with the following scenario.
> The "current" vcpus value may cause the VM to fail to start, which seems
> unreasonable. (The following scenario was also tested on the EPYC physical
> machine.)
>
> # rpm -qa libvirt qemu-kvm kernel
> qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
> libvirt-6.0.0-3.module+el8.2.0+5633+b0e06c1a.x86_64
> kernel-4.18.0-175.el8.x86_64
>
> # virsh domstate test82
> shut off
>
> # virsh dumpxml test82 |grep -E "vcpu|topology"
>   <vcpu placement='static' current='52'>128</vcpu>
>     <topology sockets='8' dies='2' cores='4' threads='2'/>
>
> # virsh start test82
> error: Failed to start domain test82
> error: internal error: qemu didn't report thread id for vcpu '48'

Oh, this is very unexpected, and will need to be investigated.

> # virsh dumpxml test82 |grep -E "vcpu|topology"
>   <vcpu placement='static' current='48'>128</vcpu>
>     <topology sockets='8' dies='2' cores='4' threads='2'/>
>
> # virsh start test82
> Domain test82 started
According to comment 7, verified this bug on a Cascadelake-AP physical host. Version: qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64 kernel-4.18.0-187.el8.x86_64 libvirt-6.0.0-10.module+el8.2.0+5984+dce93708.x86_64 Steps: 1. Check host cpu related info # lscpu ... Model name: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz ... Flags: ...avx512_vnni... 2. Check "virsh capabilities" # virsh capabilities <capabilities> <host> <uuid>00597cf4-0879-e911-906e-001635649f5c</uuid> <cpu> <arch>x86_64</arch> <model>Cascadelake-Server</model> <vendor>Intel</vendor> <microcode version='83886124'/> <counter name='tsc' frequency='2294609000' scaling='yes'/> <topology sockets='1' dies='1' cores='24' threads='2'/> <feature name='ds'/> <feature name='acpi'/> <feature name='ss'/> <feature name='ht'/> <feature name='tm'/> <feature name='pbe'/> <feature name='dtes64'/> <feature name='monitor'/> <feature name='ds_cpl'/> <feature name='vmx'/> <feature name='smx'/> <feature name='est'/> <feature name='tm2'/> <feature name='xtpr'/> <feature name='pdcm'/> <feature name='dca'/> <feature name='osxsave'/> <feature name='tsc_adjust'/> <feature name='cmt'/> <feature name='intel-pt'/> <feature name='pku'/> <feature name='ospke'/> <feature name='md-clear'/> <feature name='stibp'/> <feature name='arch-capabilities'/> <feature name='xsaves'/> <feature name='mbm_total'/> <feature name='mbm_local'/> <feature name='invtsc'/> <feature name='rdctl-no'/> <feature name='ibrs-all'/> <feature name='skip-l1dfl-vmentry'/> <feature name='mds-no'/> <feature name='tsx-ctrl'/> <pages unit='KiB' size='4'/> <pages unit='KiB' size='2048'/> <pages unit='KiB' size='1048576'/> </cpu> <power_management> <suspend_mem/> <suspend_disk/> <suspend_hybrid/> </power_management> <iommu support='no'/> <migration_features> <live/> <uri_transports> <uri_transport>tcp</uri_transport> <uri_transport>rdma</uri_transport> </uri_transports> </migration_features> <topology> <cells num='4'> <cell id='0'> <memory 
unit='KiB'>97407380</memory> <pages unit='KiB' size='4'>24351845</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='10'/> <sibling id='1' value='21'/> <sibling id='2' value='21'/> <sibling id='3' value='21'/> </distances> <cpus num='48'> <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0,96'/> <cpu id='1' socket_id='0' die_id='0' core_id='1' siblings='1,97'/> <cpu id='2' socket_id='0' die_id='0' core_id='2' siblings='2,98'/> <cpu id='3' socket_id='0' die_id='0' core_id='3' siblings='3,99'/> <cpu id='4' socket_id='0' die_id='0' core_id='4' siblings='4,100'/> <cpu id='5' socket_id='0' die_id='0' core_id='5' siblings='5,101'/> <cpu id='6' socket_id='0' die_id='0' core_id='6' siblings='6,102'/> <cpu id='7' socket_id='0' die_id='0' core_id='8' siblings='7,103'/> <cpu id='8' socket_id='0' die_id='0' core_id='10' siblings='8,104'/> <cpu id='9' socket_id='0' die_id='0' core_id='11' siblings='9,105'/> <cpu id='10' socket_id='0' die_id='0' core_id='12' siblings='10,106'/> <cpu id='11' socket_id='0' die_id='0' core_id='13' siblings='11,107'/> <cpu id='12' socket_id='0' die_id='0' core_id='16' siblings='12,108'/> <cpu id='13' socket_id='0' die_id='0' core_id='17' siblings='13,109'/> <cpu id='14' socket_id='0' die_id='0' core_id='18' siblings='14,110'/> <cpu id='15' socket_id='0' die_id='0' core_id='19' siblings='15,111'/> <cpu id='16' socket_id='0' die_id='0' core_id='20' siblings='16,112'/> <cpu id='17' socket_id='0' die_id='0' core_id='21' siblings='17,113'/> <cpu id='18' socket_id='0' die_id='0' core_id='24' siblings='18,114'/> <cpu id='19' socket_id='0' die_id='0' core_id='25' siblings='19,115'/> <cpu id='20' socket_id='0' die_id='0' core_id='26' siblings='20,116'/> <cpu id='21' socket_id='0' die_id='0' core_id='27' siblings='21,117'/> <cpu id='22' socket_id='0' die_id='0' core_id='28' siblings='22,118'/> <cpu id='23' socket_id='0' die_id='0' core_id='29' siblings='23,119'/> <cpu id='96' 
socket_id='0' die_id='0' core_id='0' siblings='0,96'/> <cpu id='97' socket_id='0' die_id='0' core_id='1' siblings='1,97'/> <cpu id='98' socket_id='0' die_id='0' core_id='2' siblings='2,98'/> <cpu id='99' socket_id='0' die_id='0' core_id='3' siblings='3,99'/> <cpu id='100' socket_id='0' die_id='0' core_id='4' siblings='4,100'/> <cpu id='101' socket_id='0' die_id='0' core_id='5' siblings='5,101'/> <cpu id='102' socket_id='0' die_id='0' core_id='6' siblings='6,102'/> <cpu id='103' socket_id='0' die_id='0' core_id='8' siblings='7,103'/> <cpu id='104' socket_id='0' die_id='0' core_id='10' siblings='8,104'/> <cpu id='105' socket_id='0' die_id='0' core_id='11' siblings='9,105'/> <cpu id='106' socket_id='0' die_id='0' core_id='12' siblings='10,106'/> <cpu id='107' socket_id='0' die_id='0' core_id='13' siblings='11,107'/> <cpu id='108' socket_id='0' die_id='0' core_id='16' siblings='12,108'/> <cpu id='109' socket_id='0' die_id='0' core_id='17' siblings='13,109'/> <cpu id='110' socket_id='0' die_id='0' core_id='18' siblings='14,110'/> <cpu id='111' socket_id='0' die_id='0' core_id='19' siblings='15,111'/> <cpu id='112' socket_id='0' die_id='0' core_id='20' siblings='16,112'/> <cpu id='113' socket_id='0' die_id='0' core_id='21' siblings='17,113'/> <cpu id='114' socket_id='0' die_id='0' core_id='24' siblings='18,114'/> <cpu id='115' socket_id='0' die_id='0' core_id='25' siblings='19,115'/> <cpu id='116' socket_id='0' die_id='0' core_id='26' siblings='20,116'/> <cpu id='117' socket_id='0' die_id='0' core_id='27' siblings='21,117'/> <cpu id='118' socket_id='0' die_id='0' core_id='28' siblings='22,118'/> <cpu id='119' socket_id='0' die_id='0' core_id='29' siblings='23,119'/> </cpus> </cell> <cell id='1'> <memory unit='KiB'>99079824</memory> <pages unit='KiB' size='4'>24769956</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='10'/> <sibling id='2' value='21'/> <sibling id='3' 
value='21'/> </distances> <cpus num='48'> <cpu id='24' socket_id='0' die_id='1' core_id='0' siblings='24,120'/> <cpu id='25' socket_id='0' die_id='1' core_id='1' siblings='25,121'/> <cpu id='26' socket_id='0' die_id='1' core_id='2' siblings='26,122'/> <cpu id='27' socket_id='0' die_id='1' core_id='3' siblings='27,123'/> <cpu id='28' socket_id='0' die_id='1' core_id='4' siblings='28,124'/> <cpu id='29' socket_id='0' die_id='1' core_id='5' siblings='29,125'/> <cpu id='30' socket_id='0' die_id='1' core_id='6' siblings='30,126'/> <cpu id='31' socket_id='0' die_id='1' core_id='8' siblings='31,127'/> <cpu id='32' socket_id='0' die_id='1' core_id='9' siblings='32,128'/> <cpu id='33' socket_id='0' die_id='1' core_id='10' siblings='33,129'/> <cpu id='34' socket_id='0' die_id='1' core_id='11' siblings='34,130'/> <cpu id='35' socket_id='0' die_id='1' core_id='12' siblings='35,131'/> <cpu id='36' socket_id='0' die_id='1' core_id='13' siblings='36,132'/> <cpu id='37' socket_id='0' die_id='1' core_id='16' siblings='37,133'/> <cpu id='38' socket_id='0' die_id='1' core_id='17' siblings='38,134'/> <cpu id='39' socket_id='0' die_id='1' core_id='18' siblings='39,135'/> <cpu id='40' socket_id='0' die_id='1' core_id='19' siblings='40,136'/> <cpu id='41' socket_id='0' die_id='1' core_id='20' siblings='41,137'/> <cpu id='42' socket_id='0' die_id='1' core_id='21' siblings='42,138'/> <cpu id='43' socket_id='0' die_id='1' core_id='25' siblings='43,139'/> <cpu id='44' socket_id='0' die_id='1' core_id='26' siblings='44,140'/> <cpu id='45' socket_id='0' die_id='1' core_id='27' siblings='45,141'/> <cpu id='46' socket_id='0' die_id='1' core_id='28' siblings='46,142'/> <cpu id='47' socket_id='0' die_id='1' core_id='29' siblings='47,143'/> <cpu id='120' socket_id='0' die_id='1' core_id='0' siblings='24,120'/> <cpu id='121' socket_id='0' die_id='1' core_id='1' siblings='25,121'/> <cpu id='122' socket_id='0' die_id='1' core_id='2' siblings='26,122'/> <cpu id='123' socket_id='0' die_id='1' 
core_id='3' siblings='27,123'/> <cpu id='124' socket_id='0' die_id='1' core_id='4' siblings='28,124'/> <cpu id='125' socket_id='0' die_id='1' core_id='5' siblings='29,125'/> <cpu id='126' socket_id='0' die_id='1' core_id='6' siblings='30,126'/> <cpu id='127' socket_id='0' die_id='1' core_id='8' siblings='31,127'/> <cpu id='128' socket_id='0' die_id='1' core_id='9' siblings='32,128'/> <cpu id='129' socket_id='0' die_id='1' core_id='10' siblings='33,129'/> <cpu id='130' socket_id='0' die_id='1' core_id='11' siblings='34,130'/> <cpu id='131' socket_id='0' die_id='1' core_id='12' siblings='35,131'/> <cpu id='132' socket_id='0' die_id='1' core_id='13' siblings='36,132'/> <cpu id='133' socket_id='0' die_id='1' core_id='16' siblings='37,133'/> <cpu id='134' socket_id='0' die_id='1' core_id='17' siblings='38,134'/> <cpu id='135' socket_id='0' die_id='1' core_id='18' siblings='39,135'/> <cpu id='136' socket_id='0' die_id='1' core_id='19' siblings='40,136'/> <cpu id='137' socket_id='0' die_id='1' core_id='20' siblings='41,137'/> <cpu id='138' socket_id='0' die_id='1' core_id='21' siblings='42,138'/> <cpu id='139' socket_id='0' die_id='1' core_id='25' siblings='43,139'/> <cpu id='140' socket_id='0' die_id='1' core_id='26' siblings='44,140'/> <cpu id='141' socket_id='0' die_id='1' core_id='27' siblings='45,141'/> <cpu id='142' socket_id='0' die_id='1' core_id='28' siblings='46,142'/> <cpu id='143' socket_id='0' die_id='1' core_id='29' siblings='47,143'/> </cpus> </cell> <cell id='2'> <memory unit='KiB'>99079828</memory> <pages unit='KiB' size='4'>24769957</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='21'/> <sibling id='2' value='10'/> <sibling id='3' value='21'/> </distances> <cpus num='48'> <cpu id='48' socket_id='1' die_id='0' core_id='0' siblings='48,144'/> <cpu id='49' socket_id='1' die_id='0' core_id='1' siblings='49,145'/> <cpu id='50' socket_id='1' die_id='0' 
core_id='2' siblings='50,146'/> <cpu id='51' socket_id='1' die_id='0' core_id='3' siblings='51,147'/> <cpu id='52' socket_id='1' die_id='0' core_id='4' siblings='52,148'/> <cpu id='53' socket_id='1' die_id='0' core_id='5' siblings='53,149'/> <cpu id='54' socket_id='1' die_id='0' core_id='6' siblings='54,150'/> <cpu id='55' socket_id='1' die_id='0' core_id='9' siblings='55,151'/> <cpu id='56' socket_id='1' die_id='0' core_id='10' siblings='56,152'/> <cpu id='57' socket_id='1' die_id='0' core_id='11' siblings='57,153'/> <cpu id='58' socket_id='1' die_id='0' core_id='12' siblings='58,154'/> <cpu id='59' socket_id='1' die_id='0' core_id='13' siblings='59,155'/> <cpu id='60' socket_id='1' die_id='0' core_id='16' siblings='60,156'/> <cpu id='61' socket_id='1' die_id='0' core_id='17' siblings='61,157'/> <cpu id='62' socket_id='1' die_id='0' core_id='18' siblings='62,158'/> <cpu id='63' socket_id='1' die_id='0' core_id='19' siblings='63,159'/> <cpu id='64' socket_id='1' die_id='0' core_id='20' siblings='64,160'/> <cpu id='65' socket_id='1' die_id='0' core_id='21' siblings='65,161'/> <cpu id='66' socket_id='1' die_id='0' core_id='24' siblings='66,162'/> <cpu id='67' socket_id='1' die_id='0' core_id='25' siblings='67,163'/> <cpu id='68' socket_id='1' die_id='0' core_id='26' siblings='68,164'/> <cpu id='69' socket_id='1' die_id='0' core_id='27' siblings='69,165'/> <cpu id='70' socket_id='1' die_id='0' core_id='28' siblings='70,166'/> <cpu id='71' socket_id='1' die_id='0' core_id='29' siblings='71,167'/> <cpu id='144' socket_id='1' die_id='0' core_id='0' siblings='48,144'/> <cpu id='145' socket_id='1' die_id='0' core_id='1' siblings='49,145'/> <cpu id='146' socket_id='1' die_id='0' core_id='2' siblings='50,146'/> <cpu id='147' socket_id='1' die_id='0' core_id='3' siblings='51,147'/> <cpu id='148' socket_id='1' die_id='0' core_id='4' siblings='52,148'/> <cpu id='149' socket_id='1' die_id='0' core_id='5' siblings='53,149'/> <cpu id='150' socket_id='1' die_id='0' core_id='6' 
siblings='54,150'/> <cpu id='151' socket_id='1' die_id='0' core_id='9' siblings='55,151'/> <cpu id='152' socket_id='1' die_id='0' core_id='10' siblings='56,152'/> <cpu id='153' socket_id='1' die_id='0' core_id='11' siblings='57,153'/> <cpu id='154' socket_id='1' die_id='0' core_id='12' siblings='58,154'/> <cpu id='155' socket_id='1' die_id='0' core_id='13' siblings='59,155'/> <cpu id='156' socket_id='1' die_id='0' core_id='16' siblings='60,156'/> <cpu id='157' socket_id='1' die_id='0' core_id='17' siblings='61,157'/> <cpu id='158' socket_id='1' die_id='0' core_id='18' siblings='62,158'/> <cpu id='159' socket_id='1' die_id='0' core_id='19' siblings='63,159'/> <cpu id='160' socket_id='1' die_id='0' core_id='20' siblings='64,160'/> <cpu id='161' socket_id='1' die_id='0' core_id='21' siblings='65,161'/> <cpu id='162' socket_id='1' die_id='0' core_id='24' siblings='66,162'/> <cpu id='163' socket_id='1' die_id='0' core_id='25' siblings='67,163'/> <cpu id='164' socket_id='1' die_id='0' core_id='26' siblings='68,164'/> <cpu id='165' socket_id='1' die_id='0' core_id='27' siblings='69,165'/> <cpu id='166' socket_id='1' die_id='0' core_id='28' siblings='70,166'/> <cpu id='167' socket_id='1' die_id='0' core_id='29' siblings='71,167'/> </cpus> </cell> <cell id='3'> <memory unit='KiB'>99049064</memory> <pages unit='KiB' size='4'>24762266</pages> <pages unit='KiB' size='2048'>0</pages> <pages unit='KiB' size='1048576'>0</pages> <distances> <sibling id='0' value='21'/> <sibling id='1' value='21'/> <sibling id='2' value='21'/> <sibling id='3' value='10'/> </distances> <cpus num='48'> <cpu id='72' socket_id='1' die_id='1' core_id='0' siblings='72,168'/> <cpu id='73' socket_id='1' die_id='1' core_id='1' siblings='73,169'/> <cpu id='74' socket_id='1' die_id='1' core_id='2' siblings='74,170'/> <cpu id='75' socket_id='1' die_id='1' core_id='3' siblings='75,171'/> <cpu id='76' socket_id='1' die_id='1' core_id='4' siblings='76,172'/> <cpu id='77' socket_id='1' die_id='1' core_id='5' 
siblings='77,173'/> <cpu id='78' socket_id='1' die_id='1' core_id='6' siblings='78,174'/> <cpu id='79' socket_id='1' die_id='1' core_id='8' siblings='79,175'/> <cpu id='80' socket_id='1' die_id='1' core_id='9' siblings='80,176'/> <cpu id='81' socket_id='1' die_id='1' core_id='10' siblings='81,177'/> <cpu id='82' socket_id='1' die_id='1' core_id='11' siblings='82,178'/> <cpu id='83' socket_id='1' die_id='1' core_id='12' siblings='83,179'/> <cpu id='84' socket_id='1' die_id='1' core_id='13' siblings='84,180'/> <cpu id='85' socket_id='1' die_id='1' core_id='16' siblings='85,181'/> <cpu id='86' socket_id='1' die_id='1' core_id='17' siblings='86,182'/> <cpu id='87' socket_id='1' die_id='1' core_id='18' siblings='87,183'/> <cpu id='88' socket_id='1' die_id='1' core_id='19' siblings='88,184'/> <cpu id='89' socket_id='1' die_id='1' core_id='20' siblings='89,185'/> <cpu id='90' socket_id='1' die_id='1' core_id='21' siblings='90,186'/> <cpu id='91' socket_id='1' die_id='1' core_id='25' siblings='91,187'/> <cpu id='92' socket_id='1' die_id='1' core_id='26' siblings='92,188'/> <cpu id='93' socket_id='1' die_id='1' core_id='27' siblings='93,189'/> <cpu id='94' socket_id='1' die_id='1' core_id='28' siblings='94,190'/> <cpu id='95' socket_id='1' die_id='1' core_id='29' siblings='95,191'/> <cpu id='168' socket_id='1' die_id='1' core_id='0' siblings='72,168'/> <cpu id='169' socket_id='1' die_id='1' core_id='1' siblings='73,169'/> <cpu id='170' socket_id='1' die_id='1' core_id='2' siblings='74,170'/> <cpu id='171' socket_id='1' die_id='1' core_id='3' siblings='75,171'/> <cpu id='172' socket_id='1' die_id='1' core_id='4' siblings='76,172'/> <cpu id='173' socket_id='1' die_id='1' core_id='5' siblings='77,173'/> <cpu id='174' socket_id='1' die_id='1' core_id='6' siblings='78,174'/> <cpu id='175' socket_id='1' die_id='1' core_id='8' siblings='79,175'/> <cpu id='176' socket_id='1' die_id='1' core_id='9' siblings='80,176'/> <cpu id='177' socket_id='1' die_id='1' core_id='10' 
siblings='81,177'/> <cpu id='178' socket_id='1' die_id='1' core_id='11' siblings='82,178'/> <cpu id='179' socket_id='1' die_id='1' core_id='12' siblings='83,179'/> <cpu id='180' socket_id='1' die_id='1' core_id='13' siblings='84,180'/> <cpu id='181' socket_id='1' die_id='1' core_id='16' siblings='85,181'/> <cpu id='182' socket_id='1' die_id='1' core_id='17' siblings='86,182'/> <cpu id='183' socket_id='1' die_id='1' core_id='18' siblings='87,183'/> <cpu id='184' socket_id='1' die_id='1' core_id='19' siblings='88,184'/> <cpu id='185' socket_id='1' die_id='1' core_id='20' siblings='89,185'/> <cpu id='186' socket_id='1' die_id='1' core_id='21' siblings='90,186'/> <cpu id='187' socket_id='1' die_id='1' core_id='25' siblings='91,187'/> <cpu id='188' socket_id='1' die_id='1' core_id='26' siblings='92,188'/> <cpu id='189' socket_id='1' die_id='1' core_id='27' siblings='93,189'/> <cpu id='190' socket_id='1' die_id='1' core_id='28' siblings='94,190'/> <cpu id='191' socket_id='1' die_id='1' core_id='29' siblings='95,191'/> </cpus> </cell> </cells> </topology>

In the CPU topology part of the "virsh capabilities" output, die_id is non-zero in cells 1 and 3. The test result is as expected, so marking this bug as verified.
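The output above also shows the original problem being resolved: core_id values repeat across the two dies of a socket, so a core is only identified uniquely once die_id is part of the key. A small sketch of that check follows. The function name is hypothetical, and the sample is a hand-trimmed subset of the CPUs listed above (cpus 0, 96, 24 and 120), not the full output.

```python
import xml.etree.ElementTree as ET


def ambiguous_cores(capabilities_xml, key_attrs):
    """Return keys (tuples of key_attrs values) that map to more than one sibling set."""
    root = ET.fromstring(capabilities_xml)
    groups = {}
    for cpu in root.iter("cpu"):
        if cpu.get("siblings") is None:
            continue  # skips the <host><cpu> element describing the CPU model
        key = tuple(cpu.get(attr) for attr in key_attrs)
        groups.setdefault(key, set()).add(cpu.get("siblings"))
    return sorted(key for key, siblings in groups.items() if len(siblings) > 1)


# Trimmed sample: one core from die 0 and one from die 1 of socket 0.
SAMPLE = """
<capabilities>
  <host>
    <cpu><arch>x86_64</arch></cpu>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='2'>
            <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0,96'/>
            <cpu id='96' socket_id='0' die_id='0' core_id='0' siblings='0,96'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='2'>
            <cpu id='24' socket_id='0' die_id='1' core_id='0' siblings='24,120'/>
            <cpu id='120' socket_id='0' die_id='1' core_id='0' siblings='24,120'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

# Without die_id, socket 0 / core 0 refers to two different physical cores.
print(ambiguous_cores(SAMPLE, ("socket_id", "core_id")))            # [('0', '0')]
# With die_id in the key, every key maps to exactly one sibling set.
print(ambiguous_cores(SAMPLE, ("socket_id", "die_id", "core_id")))  # []
```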
Also adding the following info to make the previous comment clearer:

# numactl --hard
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
node 0 size: 95124 MB
node 0 free: 94396 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
node 1 size: 96757 MB
node 1 free: 96528 MB
node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
node 2 size: 96757 MB
node 2 free: 96545 MB
node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
node 3 size: 96727 MB
node 3 free: 95104 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10

# cat /sys/devices/system/cpu/cpu21/topology/die_id
0
# cat /sys/devices/system/cpu/cpu52/topology/die_id
0
# cat /sys/devices/system/cpu/cpu24/topology/die_id
1

# ll /sys/devices/system/cpu/cpu24/topology/die*
-r--r--r--. 1 root root 4096 Mar 13 2020 /sys/devices/system/cpu/cpu24/topology/die_cpus
-r--r--r--. 1 root root 4096 Mar 13 2020 /sys/devices/system/cpu/cpu24/topology/die_cpus_list
-r--r--r--. 1 root root 4096 Mar 13 2020 /sys/devices/system/cpu/cpu24/topology/die_id
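The die_cpus_list file holds a CPU list in the kernel's usual range notation, so comparing it against the numactl node lists needs a small parser. This is a minimal sketch: the function name is made up, and the example string is an assumed value chosen to match the node 1 CPU list above, since the file contents themselves are not shown in this comment. Stride syntax is not handled.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "24-47,120-143" into a sorted list of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # tolerate an empty file or a stray comma
        first, _, last = chunk.partition("-")
        # A chunk without "-" is a single CPU, so the range ends where it starts.
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


# Assumed example value, consistent with "node 1 cpus" above.
die1 = parse_cpulist("24-47,120-143\n")
print(len(die1))          # 48
print(die1[0], die1[-1])  # 24 143
```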
(In reply to jiyan from comment #6)
> # virsh dumpxml test82 |grep -E "vcpu|topology"
>   <vcpu placement='static' current='52'>128</vcpu>
>     <topology sockets='8' dies='2' cores='4' threads='2'/>
>
> # virsh start test82
> error: Failed to start domain test82
> error: internal error: qemu didn't report thread id for vcpu '48'

This problem is being tracked in a new bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1813395
Test on ppc64le, die_id is set to zero as expected.

# rpm -q libvirt qemu-kvm
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.ppc64le
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.ppc64le

# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              160
On-line CPU(s) list: 0-159
Thread(s) per core:  4
Core(s) per socket:  20
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9, altivec supported
...
NUMA node0 CPU(s):   0-79
NUMA node8 CPU(s):   80-159

# numactl --hard
available: 2 nodes (0,8)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 0 size: 126761 MB
node 0 free: 123442 MB
node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 8 size: 130774 MB
node 8 free: 108024 MB
node distances:
node   0   8
  0:  10  40
  8:  40  10

# virsh capabilities
<capabilities>
  <host>
    <uuid>7714c83d-2719-4a89-8119-156778df62e7</uuid>
    <cpu>
      <arch>ppc64le</arch>
      <model>POWER9</model>
      <vendor>IBM</vendor>
      <topology sockets='1' dies='1' cores='20' threads='4'/>
      <pages unit='KiB' size='64'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
    ...
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>263716608</memory>
          <cpus num='80'>
            <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0-3'/>   <=== zero
            <cpu id='1' socket_id='0' die_id='0' core_id='0' siblings='0-3'/>
            <cpu id='2' socket_id='0' die_id='0' core_id='0' siblings='0-3'/>
            <cpu id='3' socket_id='0' die_id='0' core_id='0' siblings='0-3'/>
            <cpu id='4' socket_id='0' die_id='0' core_id='1' siblings='4-7'/>
            ...
            <cpu id='79' socket_id='0' die_id='0' core_id='19' siblings='76-79'/>
          </cpus>
        </cell>
        <cell id='0'>
          <memory unit='KiB'>263716608</memory>
          <cpus num='80'>
            <cpu id='80' socket_id='0' die_id='0' core_id='0' siblings='80-83'/>
            <cpu id='81' socket_id='0' die_id='0' core_id='0' siblings='80-83'/>
            <cpu id='82' socket_id='0' die_id='0' core_id='0' siblings='80-83'/>
            <cpu id='83' socket_id='0' die_id='0' core_id='0' siblings='80-83'/>
            ...
            <cpu id='159' socket_id='0' die_id='0' core_id='19' siblings='156-159'/>
          </cpus>
        </cell>
      </cells>
    </topology>
Continuing from comment 11:

# cat /sys/devices/system/cpu/cpu0/topology/die_id
-1

The other CPUs report the same value.
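Taken together with the ppc64le capabilities output, this suggests that a raw sysfs value of -1 ends up as die_id='0' in the XML. A minimal sketch of that mapping is below. It is inferred from the output in this bug rather than copied from libvirt source, the function name is hypothetical, and treating a missing file (None) the same way is an additional assumption.

```python
def reported_die_id(raw):
    """Map a raw sysfs die_id to the value seen in the capabilities XML.

    raw is the integer read from sysfs, or None if the file does not exist
    (assumption: a missing file is treated like "no die support").
    """
    if raw is None or raw < 0:
        return 0
    return raw


print(reported_die_id(-1))    # 0  (ppc64le case above)
print(reported_die_id(1))     # 1  (cpu24 on the Cascadelake-AP host)
print(reported_die_id(None))  # 0
```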
Test on s390x, die_id is set to zero as expected.

# rpm -q libvirt qemu-kvm
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.s390x
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.s390x

# virsh capabilities
setlocale: No such file or directory
<capabilities>
...
    <cpu>
      <arch>s390x</arch>
      <topology sockets='2' dies='1' cores='1' threads='1'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='1024'/>
    </cpu>
...
    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>6006264</memory>
          <cpus num='2'>
            <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0'/>   <=== Zero
            <cpu id='1' socket_id='1' die_id='0' core_id='0' siblings='1'/>
          </cpus>
        </cell>
      </cells>
    </topology>
...
</capabilities>

# cat /sys/devices/system/cpu/cpu[0,1]/topology/die_id
-1
-1

# lscpu
Architecture:        s390x
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s) per book:  1
Book(s) per drawer:  1
Drawer(s):           2
NUMA node(s):        1
Vendor ID:           IBM/S390
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017