Bug 1813395 - Unable to start guest with a reduced number of active CPUs and multiple dies
Summary: Unable to start guest with a reduced number of active CPUs and multiple dies
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Daniel Berrangé
QA Contact: jiyan
URL:
Whiteboard:
Depends On:
Blocks: 1785207 1819060 1821592
 
Reported: 2020-03-13 16:52 UTC by Daniel Berrangé
Modified: 2020-07-28 07:13 UTC (History)

Fixed In Version: libvirt-6.0.0-19.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1821592
Environment:
Last Closed: 2020-07-28 07:12:15 UTC
Type: Bug
Target Upstream Version: commit in BZ
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:3172  0        None      None    None     2020-07-28 07:13:29 UTC

Description Daniel Berrangé 2020-03-13 16:52:14 UTC
Description of problem:
Consider a guest configured with multiple 'dies' in the CPU topology:

   <cpu...>
    <topology sockets='8' dies='2' cores='4' threads='2'/>
   </cpu>

When the guest is started with a reduced number of active CPUs, to allow for later hotplug, it fails to start for some values of 'current'.

e.g. this will work:

  <vcpu placement='static' current='48'>128</vcpu>

but this will fail:

  <vcpu placement='static' current='52'>128</vcpu>

  # virsh start test82 
  error: Failed to start domain test82
  error: internal error: qemu didn't report thread id for vcpu '48'

The following likewise fail:

  <vcpu placement='static' current='49'>128</vcpu>

and

  <vcpu placement='static' current='47'>128</vcpu>
 
and certain other current values.

The root cause is that libvirt fails to take into account die_id when matching up CPUs reported by QEMU.
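
To illustrate why leaving out die_id breaks the matching, here is a minimal Python sketch. It is not libvirt's code: the function names and the tuple layout are assumptions made for illustration only. It enumerates every CPU slot of the 8 x 2 x 4 x 2 topology above and counts how many distinct identities remain when the die is dropped from the comparison key.

```python
from itertools import product

def cpu_slots(sockets, dies, cores, threads):
    """Return every (socket_id, die_id, core_id, thread_id) tuple of a topology."""
    return list(product(range(sockets), range(dies), range(cores), range(threads)))

def distinct_keys(slots, use_die):
    """Count distinct CPU identities; the key layout here is hypothetical."""
    keys = set()
    for socket_id, die_id, core_id, thread_id in slots:
        if use_die:
            keys.add((socket_id, die_id, core_id, thread_id))
        else:
            # die_id ignored: CPUs on different dies share the same key
            keys.add((socket_id, core_id, thread_id))
    return len(keys)

slots = cpu_slots(sockets=8, dies=2, cores=4, threads=2)
print(len(slots))                           # total CPU slots
print(distinct_keys(slots, use_die=True))   # identities when die_id is compared
print(distinct_keys(slots, use_die=False))  # identities when die_id is ignored
```

In this sketch, ignoring die_id leaves half as many keys as CPU slots, so each key is shared by two CPUs (one per die) and they cannot be told apart.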


Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64
kernel-4.18.0-187.el8.x86_64
libvirt-6.0.0-10.module+el8.2.0+5984+dce93708.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure a guest with

  <cpu...>
    <topology sockets='8' dies='2' cores='4' threads='2'/>
  </cpu>
  <vcpu placement='static' current='47'>128</vcpu>

2. Start the guest

Actual results:
It fails to start

Expected results:
It starts successfully

Additional info:

Comment 1 Daniel Berrangé 2020-03-13 16:53:14 UTC
Fix posted at

https://www.redhat.com/archives/libvir-list/2020-March/msg00509.html

Comment 6 Eduardo Habkost 2020-04-01 22:11:40 UTC
Upstream commit:
commit 8b789c657445 ("qemu: fix detection of vCPU pids when multiple dies are present")

Comment 12 jiyan 2020-05-15 08:39:38 UTC
Reproduced this bug with libvirt-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64, and verified this bug with libvirt-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64.

Version:
libvirt-6.0.0-18.module+el8.2.1+6456+a6d62e4e.x86_64
qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9.x86_64
kernel-4.18.0-193.2.1.el8_2.x86_64

Steps:
1. Prepare a shut-off VM with the following configuration
# virsh domstate test82
shut off

# virsh dumpxml test82 --inactive
...
  <vcpu placement='static' current='96'>144</vcpu>
  <iothreads>2</iothreads>
  <iothreadids>
    <iothread id='2'/>
    <iothread id='1'/>
  </iothreadids>
  <cputune>
    <shares>2048</shares>
    <vcpupin vcpu='2' cpuset='0-7'/>
    <vcpupin vcpu='19' cpuset='7,170,191'/>
    <emulatorpin cpuset='1-3'/>
    <iothreadpin iothread='2' cpuset='7-8'/>
    <iothreadpin iothread='1' cpuset='5-6'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-2'/>
    <memnode cellid='0' mode='strict' nodeset='1'/>
    <memnode cellid='2' mode='preferred' nodeset='2'/>
  </numatune>
...
  <cpu mode='host-model' check='partial'>
    <topology sockets='4' dies='3' cores='4' threads='3'/>
    <numa>
      <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'/>
      <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
        <distances>
          <sibling id='1' value='10'/>
        </distances>
      </cell>
      <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
        <distances>
          <sibling id='2' value='10'/>
        </distances>
      </cell>
      <cell id='3' cpus='3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

2. Start the VM
# virsh start test82
error: Failed to start domain test82
error: internal error: qemu didn't report thread id for vcpu '72'

3. Upgrade libvirt and restart libvirtd
# yum upgrade libvirt* -y

# systemctl restart libvirtd

# rpm -qa libvirt
libvirt-6.0.0-19.module+el8.2.1+6538+c148631f.x86_64

4. Repeat step 2
# virsh start test82
Domain test82 started

# virsh vcpucount test82 
maximum      config       144
maximum      live         144
current      config        96
current      live          96

# ps -ef | grep test82
-smp 96,maxcpus=144,sockets=4,dies=3,cores=4,threads=3

# virsh console test82
Connected to domain test82
Escape character is ^]

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.el8.x86_64 on an x86_64

localhost login: root
Password: 
[root@localhost ~]# lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           3
NUMA node(s):        4
Vendor ID:           GenuineIntel
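
As a cross-check of step 4, the -smp arguments on the QEMU command line should agree with the domain XML: sockets * dies * cores * threads should equal maxcpus, and the leading value should equal the 'current' vCPU count. A small Python sketch of that check follows; parse_smp is a hypothetical helper written for this note, not part of libvirt or QEMU.

```python
def parse_smp(arg):
    """Parse a QEMU '-smp' value such as '96,maxcpus=144,sockets=4,...' into a dict."""
    fields = {}
    for part in arg.split(","):
        if "=" in part:
            name, value = part.split("=", 1)
        else:
            # A bare leading number is the count of CPUs present at boot.
            name, value = "cpus", part
        fields[name] = int(value)
    return fields

smp = parse_smp("96,maxcpus=144,sockets=4,dies=3,cores=4,threads=3")
slots = smp["sockets"] * smp["dies"] * smp["cores"] * smp["threads"]

# Boot-time CPUs, maximum CPUs, and the product of the topology fields.
print(smp["cpus"], smp["maxcpus"], slots)
```

The boot-time count and the topology product can then be compared with the 'current' and 'maximum' rows of the virsh vcpucount output.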

5. Test hot-plugging/unplugging vCPUs
# virsh setvcpus test82 137

# virsh setvcpu test82 109 --disable

# virsh setvcpu test82 123 --disable

# virsh setvcpu test82 143 --enable

# virsh vcpucount test82 
maximum      config       144
maximum      live         144
current      config        96
current      live         136

# virsh console test82
Connected to domain test82
Escape character is ^]

[root@localhost ~]# lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              136
On-line CPU(s) list: 0-108,110-122,124-137
Thread(s) per core:  2
Core(s) per socket:  13
Socket(s):           4
NUMA node(s):        4
Vendor ID:           GenuineIntel
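
The live count of 136 in step 5 can be checked by replaying the operations on a set of vCPU ids. The sketch below is plain Python arithmetic, not libvirt API usage. It assumes that 'virsh setvcpus test82 137' leaves vCPUs 0-136 enabled, and count_cpu_list is a hypothetical helper for counting the entries of an lscpu-style range list.

```python
def count_cpu_list(cpu_list):
    """Count CPUs in a range list such as '0-108,110-122,124-137'."""
    total = 0
    for chunk in cpu_list.split(","):
        first, _, last = chunk.partition("-")
        # A chunk without '-' is a single CPU id.
        total += int(last or first) - int(first) + 1
    return total

enabled = set(range(137))   # assumed result of: virsh setvcpus test82 137
enabled.discard(109)        # virsh setvcpu test82 109 --disable
enabled.discard(123)        # virsh setvcpu test82 123 --disable
enabled.add(143)            # virsh setvcpu test82 143 --enable

print(len(enabled))                             # vCPUs enabled on the host side
print(count_cpu_list("0-108,110-122,124-137"))  # CPUs listed online in the guest
```

Both numbers should agree with the 'current live' row reported by virsh vcpucount.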

All test results are as expected; moving this bug to VERIFIED.

Comment 14 errata-xmlrpc 2020-07-28 07:12:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3172

