Bug 1813395
| Summary: | Unable to start guest with a reduced number of active CPUs and multiple dies | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Daniel Berrangé <berrange> | |
| Component: | libvirt | Assignee: | Daniel Berrangé <berrange> | |
| Status: | CLOSED ERRATA | QA Contact: | jiyan <jiyan> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 8.2 | CC: | dyuan, jdenemar, jiyan, jsuchane, lhuang, lmen, mtessun, pkrempa, toneata, virt-maint, xuzhang | |
| Target Milestone: | rc | Keywords: | Triaged, ZStream | |
| Target Release: | 8.3 | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | libvirt-6.0.0-19.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1821592 (view as bug list) | Environment: | ||
| Last Closed: | 2020-07-28 07:12:15 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | commit in BZ | |
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1785207, 1819060, 1821592 | |||
Upstream commit:
commit 8b789c657445 ("qemu: fix detection of vCPU pids when multiple dies are present")
Reproduced this bug with libvirt-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64, and verified this bug with libvirt-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64.
Version:
libvirt-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64
qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64
kernel-4.18.0-193.2.1.el8_2.x86_64
Steps:
1. Prepare a shut-off VM with the following configuration
# virsh domstate test82
shut off
# virsh dumpxml test82 --inactive
...
<vcpu placement='static' current='96'>144</vcpu>
<iothreads>2</iothreads>
<iothreadids>
<iothread id='2'/>
<iothread id='1'/>
</iothreadids>
<cputune>
<shares>2048</shares>
<vcpupin vcpu='2' cpuset='0-7'/>
<vcpupin vcpu='19' cpuset='7,170,191'/>
<emulatorpin cpuset='1-3'/>
<iothreadpin iothread='2' cpuset='7-8'/>
<iothreadpin iothread='1' cpuset='5-6'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0-2'/>
<memnode cellid='0' mode='strict' nodeset='1'/>
<memnode cellid='2' mode='preferred' nodeset='2'/>
</numatune>
...
<cpu mode='host-model' check='partial'>
<topology sockets='4' dies='3' cores='4' threads='3'/>
<numa>
<cell id='0' cpus='0,4-7' memory='512000' unit='KiB'/>
<cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
<distances>
<sibling id='1' value='10'/>
</distances>
</cell>
<cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
<distances>
<sibling id='2' value='10'/>
</distances>
</cell>
<cell id='3' cpus='3' memory='512000' unit='KiB'/>
</numa>
</cpu>
2. Start the VM
# virsh start test82
error: Failed to start domain test82
error: internal error: qemu didn't report thread id for vcpu '72'
3. Upgrade libvirt and restart libvirtd
# yum upgrade libvirt* -y
# systemctl restart libvirtd
# rpm -qa libvirt
libvirt-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64
4. Repeat step 2
# virsh start test82
Domain test82 started
# virsh vcpucount test82
maximum config 144
maximum live 144
current config 96
current live 96
# ps -ef | grep test82
-smp 96,maxcpus=144,sockets=4,dies=3,cores=4,threads=3
# virsh console test82
Connected to domain test82
Escape character is ^]
Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.el8.x86_64 on an x86_64
localhost login: root
Password:
[root@localhost ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 3
NUMA node(s): 4
Vendor ID: GenuineIntel
5. Test hot-plugging/unplugging vCPUs
# virsh setvcpus test82 137
# virsh setvcpu test82 109 --disable
# virsh setvcpu test82 123 --disable
# virsh setvcpu test82 143 --enable
# virsh vcpucount test82
maximum config 144
maximum live 144
current config 96
current live 136
# virsh console test82
Connected to domain test82
Escape character is ^]
[root@localhost ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 136
On-line CPU(s) list: 0-108,110-122,124-137
Thread(s) per core: 2
Core(s) per socket: 13
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
All test results are as expected; moving this bug to VERIFIED.
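The counts in the log above can be cross-checked with a short script. This is only a sketch for sanity-checking the numbers; the helper name `parse_cpu_list` is hypothetical and not part of libvirt or virsh. It expands the "On-line CPU(s) list" string from lscpu and compares it with the `current live` value from `virsh vcpucount`, and it checks that the topology product matches the maximum vCPU count.

```python
def parse_cpu_list(spec):
    """Expand an lscpu-style CPU list such as '0-3,5' into a set of ints.

    Hypothetical helper for illustration only.
    """
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the verification log in this report.
online = parse_cpu_list("0-108,110-122,124-137")
current_live = 136

# sockets * dies * cores * threads from the <topology> element.
max_vcpus = 4 * 3 * 4 * 3

print(len(online) == current_live)  # guest sees as many CPUs as libvirt reports
print(max_vcpus)                    # should equal the <vcpu> maximum of 144
```

The two gaps in the on-line list (109 and 123) correspond to the two vCPUs disabled with `virsh setvcpu ... --disable`.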
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172
Description of problem:
When configuring a guest with multiple 'dies' in the CPU topology
<cpu...>
<topology sockets='8' dies='2' cores='4' threads='2'/>
</cpu>
If attempting to start the guest with a reduced number of active CPUs, to allow for later hotplug, the guest will sometimes fail to start.
e.g. this will work:
<vcpu placement='static' current='48'>128</vcpu>
but this will fail:
<vcpu placement='static' current='52'>128</vcpu>
# virsh start test82
error: Failed to start domain test82
error: internal error: qemu didn't report thread id for vcpu '48'
likewise
<vcpu placement='static' current='49'>128</vcpu>
and
<vcpu placement='static' current='47'>128</vcpu>
and certain other current values.
The root cause is that libvirt fails to take into account die_id when matching up CPUs reported by QEMU.
Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64
kernel-4.18.0-187.el8.x86_64
libvirt-6.0.0-10.module+el8.2.0+5984+dce93708.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Configure a guest with
<cpu...>
<topology sockets='8' dies='2' cores='4' threads='2'/>
</cpu>
<vcpu placement='static' current='47'>128</vcpu>
2. Start the guest
Actual results:
It fails to start
Expected results:
It starts successfully
Additional info:
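To illustrate why ignoring die_id is a problem, here is a toy model in Python. It is not libvirt's code and makes no claim about how libvirt matches CPUs internally; the function names are hypothetical. It only shows that, for the topology in this report, a key made of (socket, core, thread) is shared by more than one CPU, while adding the die makes every key unique.

```python
from collections import Counter
from itertools import product


def topology_ids(sockets, dies, cores, threads):
    """Enumerate every (socket, die, core, thread) tuple in the topology."""
    return list(product(range(sockets), range(dies), range(cores), range(threads)))


def ambiguous_keys(ids, use_die):
    """Return the identification keys that are shared by more than one CPU."""
    keys = Counter((s, d if use_die else None, c, t) for s, d, c, t in ids)
    return {key: count for key, count in keys.items() if count > 1}


# Topology from the description: sockets='8' dies='2' cores='4' threads='2'
ids = topology_ids(8, 2, 4, 2)
without_die = ambiguous_keys(ids, use_die=False)
with_die = ambiguous_keys(ids, use_die=True)

print(len(ids))          # 128 CPUs in total
print(len(without_die))  # 64 keys, each shared by two CPUs
print(len(with_die))     # 0, every CPU has a unique key
```

With dies='1' the two key shapes are equivalent, which is consistent with the problem only appearing once multiple dies are configured.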