Bug 1479694 - Guest CPUs are not enabled when the guest starts [NEEDINFO]
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.4
Hardware: ppc64le Linux
Priority: high   Severity: high
Target Milestone: rc
Target Release: 7.5
Assigned To: David Gibson
QA Contact: Virtualization Bugs
Duplicates: 1482437
Blocks: 1399177
Reported: 2017-08-09 04:15 EDT by junli
Modified: 2017-08-21 01:36 EDT (History)
CC: 15 users

Type: Bug
Flags: qzhang: needinfo? (junli)


Attachments: None
Description junli 2017-08-09 04:15:23 EDT
Description of problem:
Guest CPUs are not enabled when the guest starts

Version-Release number of selected component (if applicable):
libvirt version: 3.2.0, package: 14.virtcov.el7_4.2
QEMU emulator version 2.9.0 (qemu-kvm-rhev-2.9.0-16.el7_4.3)
Red Hat Enterprise Linux Server release 7.4 (Maipo)
3.10.0-693.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Prepare a domain XML:

<domain type='kvm' id='17'>
  <name>avocado-vt-vm1</name>
  <uuid>aafc3e8a-ce65-44c5-86ab-1d39bab26887</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='4'>8</vcpu>
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes' order='3'/>
    <vcpu id='2' enabled='no' hotpluggable='yes'/>
    <vcpu id='3' enabled='yes' hotpluggable='yes' order='2'/>
    <vcpu id='4' enabled='no' hotpluggable='yes'/>
    <vcpu id='5' enabled='yes' hotpluggable='yes' order='4'/>
    <vcpu id='6' enabled='no' hotpluggable='yes'/>
    <vcpu id='7' enabled='no' hotpluggable='yes'/>
  </vcpus>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64le' machine='pseries-rhel7.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-25-64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:1f:3b:f9'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </interface>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c120,c334</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c120,c334</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

2. Define this XML
3. Start the guest
4. Run "lscpu" in the guest
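
For reference, a minimal sketch of the equivalent commands (the XML file name here is hypothetical; everything else follows the steps above):

# virsh define avocado-vt-vm1.xml     (file containing the XML above; name is assumed)
# virsh start avocado-vt-vm1
# virsh vcpucount avocado-vt-vm1      (host view; "current live" should report 4)

Then, inside the guest:

# lscpu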

Actual results:
CPU(s) is 1

Expected results:
CPU(s) is 4

Additional info:
Comment 1 junli 2017-08-09 04:17:36 EDT
(In reply to junli from comment #0)
> [full description from comment 0 quoted verbatim]

QEMU emulator version 2.9.0 (qemu-kvm-rhev-2.9.0-16.el7_4.3)

Red Hat Enterprise Linux Server release 7.4 (Maipo)

3.10.0-693.el7.ppc64le
Comment 3 Peter Krempa 2017-08-09 05:01:02 EDT
Please post the full output of "lscpu" from the guest, and on the host please run:

virsh qemu-monitor-command --pretty $VMNAME '{"execute":"query-cpus"}'

and 

virsh qemu-monitor-command --pretty $VMNAME '{"execute":"query-hotpluggable-cpus"}'
Comment 4 junli 2017-08-09 05:16:25 EDT
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8E (raw), altivec supported
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0


# virsh qemu-monitor-command --pretty avocado-vt-vm1 '{"execute":"query-cpus"}'
{
  "return": [
    {
      "arch": "ppc",
      "current": true,
      "CPU": 0,
      "nip": -4611686018426742380,
      "qom_path": "/machine/unattached/device[0]/thread[0]",
      "halted": false,
      "thread_id": 72669
    },
    {
      "arch": "ppc",
      "current": false,
      "CPU": 3,
      "nip": 0,
      "qom_path": "/machine/peripheral/vcpu3/thread[0]",
      "halted": true,
      "thread_id": 72685
    },
    {
      "arch": "ppc",
      "current": false,
      "CPU": 1,
      "nip": 0,
      "qom_path": "/machine/peripheral/vcpu1/thread[0]",
      "halted": true,
      "thread_id": 72688
    },
    {
      "arch": "ppc",
      "current": false,
      "CPU": 5,
      "nip": 0,
      "qom_path": "/machine/peripheral/vcpu5/thread[0]",
      "halted": true,
      "thread_id": 72689
    }
  ],
  "id": "libvirt-19"
}


# virsh qemu-monitor-command --pretty avocado-vt-vm1 '{"execute":"query-hotpluggable-cpus"}'
{
  "return": [
    {
      "props": {
        "core-id": 7
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 6
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 5
      },
      "vcpus-count": 1,
      "qom-path": "/machine/peripheral/vcpu5",
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 4
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 3
      },
      "vcpus-count": 1,
      "qom-path": "/machine/peripheral/vcpu3",
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 2
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 1
      },
      "vcpus-count": 1,
      "qom-path": "/machine/peripheral/vcpu1",
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 0
      },
      "vcpus-count": 1,
      "qom-path": "/machine/unattached/device[0]",
      "type": "host-spapr-cpu-core"
    }
  ],
  "id": "libvirt-20"
}
Comment 5 David Gibson 2017-08-10 20:33:28 EDT
Looks like the CPUs have been hotplugged, but not onlined in the guest.  Probably a guest-side problem.

Can you check if 'rtas_errd' is running within the guest?
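
For reference (not part of the original report), the guest-side online state can also be inspected directly through the standard Linux sysfs CPU-hotplug interface:

# cat /sys/devices/system/cpu/online             (CPUs the guest kernel has onlined)
# cat /sys/devices/system/cpu/cpu1/online        (per-CPU state: 1 = online, 0 = offline)
# echo 1 > /sys/devices/system/cpu/cpu1/online   (attempt to online vCPU 1 by hand)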
Comment 6 junli 2017-08-10 21:39:12 EDT
(In reply to David Gibson from comment #5)
> Looks like the CPUs have been hotplugged, but not onlined in the guest. 
> Probably a guest side problem.
> 
> Can you check if 'rtas_errd' is running within the guest?

It is running

root    823    1  0 09:32 ?      00:00:00 /usr/sbin/rtas_errd
Comment 7 David Gibson 2017-08-11 01:44:26 EDT
OK, the next thing would be to attach dmesg from the guest after hotplugging the CPUs.  Maybe we'll see some useful errors in there.
Comment 8 junli 2017-08-13 23:14:09 EDT
(In reply to David Gibson from comment #7)
> Ok, next thing would be to attach dmesg from the guest after hotplugging the
> cpus.  Maybe we'll see some useful errors in there.

After hotplugging the CPUs, lscpu's result is correct.

And this is the dmesg log:

# dmesg | grep cpu
[    0.000000] Partition configured for 8 cpus.
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] PERCPU: Embedded 3 pages/cpu @c000000001700000 s124952 r0 d71656 u262144
[    0.000000] pcpu-alloc: s124952 r0 d71656 u262144 alloc=1*1048576
[    0.000000] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 
[    0.000000] 	RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=8.
[    0.000003] clockevent: decrementer mult[83126e98] shift[32] cpu[0]
[    0.384418] cpuidle: using governor menu
Comment 9 David Gibson 2017-08-14 21:13:30 EDT
Junli,

To clarify, if you just boot the guest with online/offline cpus specified in the XML, then only the boot cpu actually appears online?  But if you hotplug the cpus at runtime then the correct cpus appear online?

Is that right?  Are you hotplugging the cpus in addition to having them specified as online in the XML, or are you altering the XML then hotplugging them at runtime instead?


Andrea,

IIUC, for the (non-boot) CPUs specified online in the XML, libvirt will hotplug them after starting qemu with -S but before allowing the guest to continue.  Is that right?


I suspect this is because the PAPR hotplugging logic isn't working when invoked before the OS has booted properly.  I think my upstream patches to clean up the DRC code will fix this, but they're fairly extensive, so I wasn't intending to backport them for Pegas 1.0.

If this use case and libvirt behaviour is important enough I might need to reconsider that.
Comment 10 junli 2017-08-14 22:01:07 EDT
Yes.
When I specify that vCPUs 0-3 are enabled in the XML, vcpuinfo's result is correct but lscpu in the guest is wrong (just one CPU in the guest).
But when I don't specify the individual vCPUs and vCPUs 0-3 are enabled automatically, both results are correct.

Then, when I hotplug any vCPU (whether enabling or disabling it), lscpu's result in the guest becomes correct (I don't change the XML).
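
For reference, a sketch of the kind of runtime hotplug described above, using virsh setvcpu (present in the libvirt version used here; the vCPU id chosen is illustrative):

# virsh setvcpu avocado-vt-vm1 4 --enable --live    (hotplug vCPU 4, which is disabled in the XML above)

Then, inside the guest:

# lscpu    (CPU(s) now reports the expected count)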
Comment 11 Peter Krempa 2017-08-15 04:37:12 EDT
(In reply to David Gibson from comment #9)
> Junli,
> 
> To clarify, if you just boot the guest with online/offline cpus specified in
> the XML, then only the boot cpu actually appears online?  But if you hotplug
> the cpus at runtime then the correct cpus appear online?
> 
> Is that right?  Are you hotplugging the cpus in addition to having them
> specified as online in the XML, or are you altering the XML then hotplugging
> them at runtime instead?
> 
> 
> Andrea,
> 
> IIUC, for the (non boot) CPUs specified online in the XML, libvirt will
> hotplug them after starting qemu with -S but before allowing the guest to
> continue.  Is that right?

Yes, that is right. Libvirt needs to query what the topology will look like, and it also tries not to start throwaway processes. That is the reason for configuring the vCPUs via hotplug.

> I suspect this is because the PAPR hotplugging logic isn't working when
> invoked before the OS has booted properly.  I think my upstream patches to
> cleanup the DRC code will fix this, but they're fairly extensive so I wasn't
> intending to backport them for Pegas 1.0.
> 
> If this use case and libvirt behaviour is important enough I might need to
> reconsider that.

We don't want to start a throwaway qemu just to query this information before starting the guest, and since the data depends on the machine type and topology, it can't be cached earlier, when we load the qemu capabilities.
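
For reference, a sketch of the startup sequence being described (illustrative; the device type and core-id properties match the query-hotpluggable-cpus output in comment 4):

1. qemu is started paused: /usr/libexec/qemu-kvm ... -S ...
2. libvirt plugs the non-boot vCPUs over QMP while the guest is still paused:

{"execute":"device_add","arguments":{"driver":"host-spapr-cpu-core","id":"vcpu1","core-id":1}}
{"execute":"device_add","arguments":{"driver":"host-spapr-cpu-core","id":"vcpu3","core-id":3}}
{"execute":"device_add","arguments":{"driver":"host-spapr-cpu-core","id":"vcpu5","core-id":5}}

3. Only then is the guest resumed: {"execute":"cont"}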
Comment 12 Peter Krempa 2017-08-17 06:03:10 EDT
According to the data in comment 4, libvirt configured the vCPUs properly, so it looks like this bug should be moved to qemu or the kernel, since libvirt does the same for x86 and it works there.

David, which component is appropriate in this case?
Comment 13 Peter Krempa 2017-08-17 06:03:27 EDT
*** Bug 1482437 has been marked as a duplicate of this bug. ***
Comment 14 David Gibson 2017-08-21 00:35:59 EDT
Moved to qemu.  We've identified at least one upstream bug that's related to this, though it may not be the only one.

How to fix it in that timeframe is... going to be tricky.
Comment 15 Qunfang Zhang 2017-08-21 01:36:31 EDT
Hi, junli

Could you provide the qemu command line via "# ps aux | grep kvm" when you hit the bug? Thanks!
