Description of problem:
When a guest is configured with multiple dies in the CPU topology:

  <cpu...>
    <topology sockets='8' dies='2' cores='4' threads='2'/>
  </cpu>

and is started with a reduced number of active vCPUs (to allow for later hotplug), the guest sometimes fails to start. For example, this works:

  <vcpu placement='static' current='48'>128</vcpu>

but this fails:

  <vcpu placement='static' current='52'>128</vcpu>

  # virsh start test82
  error: Failed to start domain test82
  error: internal error: qemu didn't report thread id for vcpu '48'

Likewise, current='49', current='47' and certain other current values fail.

The root cause is that libvirt does not take die_id into account when matching up the vCPUs reported by QEMU.

Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64
kernel-4.18.0-187.el8.x86_64
libvirt-6.0.0-10.module+el8.2.0+5984+dce93708.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure a guest with:

  <cpu...>
    <topology sockets='8' dies='2' cores='4' threads='2'/>
  </cpu>
  <vcpu placement='static' current='47'>128</vcpu>

2. Start the guest.

Actual results:
It fails to start.

Expected results:
It starts successfully.

Additional info:
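A short worked example of the topology arithmetic shows why a key without the die is ambiguous. This is a sketch under two assumptions: vCPUs are laid out in the x86 order QEMU uses (thread varies fastest, then core, then die, then socket), and the libvirt vCPU number is taken as that linear index. The helper name `topology_ids` is hypothetical.

```python
def topology_ids(index, dies, cores, threads):
    """Split a linear vCPU index into (socket, die, core, thread).

    Assumes thread varies fastest, then core, then die, then socket.
    """
    thread = index % threads
    core = (index // threads) % cores
    die = (index // (threads * cores)) % dies
    socket = index // (threads * cores * dies)
    return socket, die, core, thread


# Topology from the description above.
SOCKETS, DIES, CORES, THREADS = 8, 2, 4, 2
MAX_VCPUS = SOCKETS * DIES * CORES * THREADS  # 128

for vcpu in (47, 48, 52):
    print(vcpu, topology_ids(vcpu, DIES, CORES, THREADS))
# 47 (2, 1, 3, 1)
# 48 (3, 0, 0, 0)
# 52 (3, 0, 2, 0)

# Drop the die from each position and count the distinct keys left over.
keys_without_die = {
    (s, c, t)
    for s, _d, c, t in (topology_ids(i, DIES, CORES, THREADS) for i in range(MAX_VCPUS))
}
print(MAX_VCPUS, len(keys_without_die))  # 128 64
```

Every (socket, core, thread) tuple is shared by two vCPUs, one per die, so a lookup on that tuple alone cannot tell them apart. With the topology used in the verification comment (dies=3, cores=4, threads=3), `topology_ids(72, 3, 4, 3)` gives (2, 0, 0, 0), i.e. vCPU 72 is the first vCPU of the third socket.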
Fix posted at https://www.redhat.com/archives/libvir-list/2020-March/msg00509.html
Upstream commit 8b789c657445 ("qemu: fix detection of vCPU pids when multiple dies are present")
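The matching problem the commit addresses can be illustrated with a toy matcher. This is not libvirt's code: the entries are hypothetical, simplified versions of the CPU property objects QEMU reports (keys `socket-id`, `die-id`, `core-id`, `thread-id`), for a made-up topology of 1 socket, 2 dies, 2 cores and 1 thread with three of the four vCPUs online. The host thread IDs (TIDs) and the helper `candidate_tids` are invented for illustration.

```python
# Toy model: pair running vCPU threads with hotpluggable CPU slots.
# All data is hypothetical. In these props, "thread-id" is the SMT
# thread index within a core, not the host TID.

# Topology: sockets=1, dies=2, cores=2, threads=1 -> 4 slots.
HOTPLUGGABLE = [
    {"socket-id": 0, "die-id": 0, "core-id": 0, "thread-id": 0},
    {"socket-id": 0, "die-id": 0, "core-id": 1, "thread-id": 0},
    {"socket-id": 0, "die-id": 1, "core-id": 0, "thread-id": 0},
    {"socket-id": 0, "die-id": 1, "core-id": 1, "thread-id": 0},
]

# Three vCPUs online, each paired with a made-up host TID.
ONLINE = [
    ({"socket-id": 0, "die-id": 0, "core-id": 0, "thread-id": 0}, 1001),
    ({"socket-id": 0, "die-id": 0, "core-id": 1, "thread-id": 0}, 1002),
    ({"socket-id": 0, "die-id": 1, "core-id": 0, "thread-id": 0}, 1003),
]

WITHOUT_DIE = ("socket-id", "core-id", "thread-id")
WITH_DIE = ("socket-id", "die-id", "core-id", "thread-id")


def candidate_tids(keys):
    """For each slot, list the host TIDs whose props match on `keys`."""
    result = []
    for slot in HOTPLUGGABLE:
        wanted = tuple(slot[k] for k in keys)
        result.append([tid for props, tid in ONLINE
                       if tuple(props[k] for k in keys) == wanted])
    return result


print(candidate_tids(WITHOUT_DIE))  # [[1001, 1003], [1002], [1001, 1003], [1002]]
print(candidate_tids(WITH_DIE))     # [[1001], [1002], [1003], []]
```

Without `die-id`, the offline slot 3 picks up the thread of slot 1, and slots 0 and 2 each see two candidates. However a matcher resolves that ambiguity, it can attach a thread to the wrong vCPU or leave an online vCPU without one, which is the kind of mismatch that can end in the "didn't report thread id" error. Including `die-id` in the key gives every slot at most one thread.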
Reproduced this bug with libvirt-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64 and verified the fix with libvirt-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64.

Version:
libvirt-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64
qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64
kernel-4.18.0-193.2.1.el8_2.x86_64

Steps:
1. Prepare a shut-off VM with the following configuration:

# virsh domstate test82
shut off

# virsh dumpxml test82 --inactive
...
  <vcpu placement='static' current='96'>144</vcpu>
  <iothreads>2</iothreads>
  <iothreadids>
    <iothread id='2'/>
    <iothread id='1'/>
  </iothreadids>
  <cputune>
    <shares>2048</shares>
    <vcpupin vcpu='2' cpuset='0-7'/>
    <vcpupin vcpu='19' cpuset='7,170,191'/>
    <emulatorpin cpuset='1-3'/>
    <iothreadpin iothread='2' cpuset='7-8'/>
    <iothreadpin iothread='1' cpuset='5-6'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-2'/>
    <memnode cellid='0' mode='strict' nodeset='1'/>
    <memnode cellid='2' mode='preferred' nodeset='2'/>
  </numatune>
...
  <cpu mode='host-model' check='partial'>
    <topology sockets='4' dies='3' cores='4' threads='3'/>
    <numa>
      <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'/>
      <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
        <distances>
          <sibling id='1' value='10'/>
        </distances>
      </cell>
      <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
        <distances>
          <sibling id='2' value='10'/>
        </distances>
      </cell>
      <cell id='3' cpus='3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

2. Start the VM:

# virsh start test82
error: Failed to start domain test82
error: internal error: qemu didn't report thread id for vcpu '72'

3. Upgrade libvirt and restart libvirtd:

# yum upgrade libvirt* -y
# systemctl restart libvirtd
# rpm -qa libvirt
libvirt-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64

4. Repeat step 2:

# virsh start test82
Domain test82 started

# virsh vcpucount test82
maximum      config       144
maximum      live         144
current      config        96
current      live          96

# ps -ef | grep test82
-smp 96,maxcpus=144,sockets=4,dies=3,cores=4,threads=3

# virsh console test82
Connected to domain test82
Escape character is ^]

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.el8.x86_64 on an x86_64

localhost login: root
Password:
[root@localhost ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           3
NUMA node(s):        4
Vendor ID:           GenuineIntel

5. Test hot-plugging/unplugging vCPUs:

# virsh setvcpus test82 137
# virsh setvcpu test82 109 --disable
# virsh setvcpu test82 123 --disable
# virsh setvcpu test82 143 --enable
# virsh vcpucount test82
maximum      config       144
maximum      live         144
current      config        96
current      live         136

# virsh console test82
Connected to domain test82
Escape character is ^]

[root@localhost ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              136
On-line CPU(s) list: 0-108,110-122,124-137
Thread(s) per core:  2
Core(s) per socket:  13
Socket(s):           4
NUMA node(s):        4
Vendor ID:           GenuineIntel

All tests behaved as expected; moving this bug to VERIFIED.
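The live count in step 5 can be cross-checked against the guest. `virsh setvcpus test82 137` brings 137 vCPUs online, the two `--disable` calls remove two and the `--enable` call adds one, giving 137 - 2 + 1 = 136. The minimal sketch below counts the CPUs in the range-list format printed in lscpu's "On-line CPU(s) list" field; the helper `parse_cpulist` is hypothetical and handles only plain ranges and single numbers.

```python
def parse_cpulist(text):
    """Expand a range list such as '0-3,5,7-8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


before = parse_cpulist("0-95")                  # lscpu after step 4
after = parse_cpulist("0-108,110-122,124-137")  # lscpu after step 5
print(len(before), len(after))                  # 96 136
print(137 - 2 + 1)                              # 136
```

Both totals agree with the "current live" values reported by `virsh vcpucount`, and CPUs 109 and 123 are absent from the guest's online list.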
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172