Bug 1479694 - Guest CPUs are not enabled when the guest starts
Status: CLOSED DUPLICATE of bug 1448344
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: ppc64le Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: 7.5
Assigned To: David Gibson
QA Contact: Xujun Ma
Duplicates: 1482437 1487415
Depends On: 1448344
Blocks: 1399177 1438583 1440030
 
Reported: 2017-08-09 04:15 EDT by junli
Modified: 2017-10-09 05:07 EDT (History)
18 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-09 05:07:29 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
guest xml (3.71 KB, application/octet-stream)
2017-09-04 03:10 EDT, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 157816 None None None 2017-09-05 09:10 EDT

Description junli 2017-08-09 04:15:23 EDT
Description of problem:
Guest CPUs are not enabled when the guest starts

Version-Release number of selected component (if applicable):
libvirt version: 3.2.0, package: 14.virtcov.el7_4.2
QEMU emulator version 2.9.0 (qemu-kvm-rhev-2.9.0-16.el7_4.3)
Red Hat Enterprise Linux Server release 7.4 (Maipo)
3.10.0-693.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Prepare a domain XML:

<domain type='kvm' id='17'>
  <name>avocado-vt-vm1</name>
  <uuid>aafc3e8a-ce65-44c5-86ab-1d39bab26887</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='4'>8</vcpu>
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes' order='3'/>
    <vcpu id='2' enabled='no' hotpluggable='yes'/>
    <vcpu id='3' enabled='yes' hotpluggable='yes' order='2'/>
    <vcpu id='4' enabled='no' hotpluggable='yes'/>
    <vcpu id='5' enabled='yes' hotpluggable='yes' order='4'/>
    <vcpu id='6' enabled='no' hotpluggable='yes'/>
    <vcpu id='7' enabled='no' hotpluggable='yes'/>
  </vcpus>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64le' machine='pseries-rhel7.4.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-25-64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:1f:3b:f9'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </interface>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c120,c334</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c120,c334</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

2. Define this XML
3. Start the guest
4. Run "lscpu" in the guest

Actual results:
CPU(s) is 1

Expected results:
CPU(s) is 4
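Per the <vcpus> element in the XML above, four vCPUs (ids 0, 1, 3, 5) carry enabled='yes', which is why 4 is the expected count. A short script (an illustrative sketch, not part of the original report) computes that expectation directly from the XML:

```python
import xml.etree.ElementTree as ET

# Minimal excerpt of the domain XML from step 1 (only the elements used here).
VCPUS_XML = """
<domain>
  <vcpu placement='static' current='4'>8</vcpu>
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes' order='3'/>
    <vcpu id='2' enabled='no' hotpluggable='yes'/>
    <vcpu id='3' enabled='yes' hotpluggable='yes' order='2'/>
    <vcpu id='4' enabled='no' hotpluggable='yes'/>
    <vcpu id='5' enabled='yes' hotpluggable='yes' order='4'/>
    <vcpu id='6' enabled='no' hotpluggable='yes'/>
    <vcpu id='7' enabled='no' hotpluggable='yes'/>
  </vcpus>
</domain>
"""

def expected_online_vcpus(xml_text):
    """Count the <vcpu> children of <vcpus> marked enabled='yes'."""
    root = ET.fromstring(xml_text)
    return sum(1 for v in root.findall('./vcpus/vcpu')
               if v.get('enabled') == 'yes')

print(expected_online_vcpus(VCPUS_XML))  # -> 4, matching current='4'
```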

Additional info:
Comment 1 junli 2017-08-09 04:17:36 EDT
(In reply to junli from comment #0)
> [description and domain XML quoted verbatim from comment #0 - snipped]

QEMU emulator version 2.9.0 (qemu-kvm-rhev-2.9.0-16.el7_4.3)

Red Hat Enterprise Linux Server release 7.4 (Maipo)

3.10.0-693.el7.ppc64le
Comment 3 Peter Krempa 2017-08-09 05:01:02 EDT
Please post the full output of "lscpu" from the guest, and on the host please run:

virsh qemu-monitor-command --pretty $VMNAME '{"execute":"query-cpus"}'

and 

virsh qemu-monitor-command --pretty $VMNAME '{"execute":"query-hotpluggable-cpus"}'
Comment 4 junli 2017-08-09 05:16:25 EDT
# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8E (raw), altivec supported
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0


# virsh qemu-monitor-command --pretty avocado-vt-vm1 '{"execute":"query-cpus"}'
{
  "return": [
    {
      "arch": "ppc",
      "current": true,
      "CPU": 0,
      "nip": -4611686018426742380,
      "qom_path": "/machine/unattached/device[0]/thread[0]",
      "halted": false,
      "thread_id": 72669
    },
    {
      "arch": "ppc",
      "current": false,
      "CPU": 3,
      "nip": 0,
      "qom_path": "/machine/peripheral/vcpu3/thread[0]",
      "halted": true,
      "thread_id": 72685
    },
    {
      "arch": "ppc",
      "current": false,
      "CPU": 1,
      "nip": 0,
      "qom_path": "/machine/peripheral/vcpu1/thread[0]",
      "halted": true,
      "thread_id": 72688
    },
    {
      "arch": "ppc",
      "current": false,
      "CPU": 5,
      "nip": 0,
      "qom_path": "/machine/peripheral/vcpu5/thread[0]",
      "halted": true,
      "thread_id": 72689
    }
  ],
  "id": "libvirt-19"
}


# virsh qemu-monitor-command --pretty avocado-vt-vm1 '{"execute":"query-hotpluggable-cpus"}'
{
  "return": [
    {
      "props": {
        "core-id": 7
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 6
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 5
      },
      "vcpus-count": 1,
      "qom-path": "/machine/peripheral/vcpu5",
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 4
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 3
      },
      "vcpus-count": 1,
      "qom-path": "/machine/peripheral/vcpu3",
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 2
      },
      "vcpus-count": 1,
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 1
      },
      "vcpus-count": 1,
      "qom-path": "/machine/peripheral/vcpu1",
      "type": "host-spapr-cpu-core"
    },
    {
      "props": {
        "core-id": 0
      },
      "vcpus-count": 1,
      "qom-path": "/machine/unattached/device[0]",
      "type": "host-spapr-cpu-core"
    }
  ],
  "id": "libvirt-20"
}
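Taken together, the two replies above already localize the problem: QEMU has instantiated all four vCPU threads (so libvirt's hotplug requests succeeded), yet only the boot CPU is executing. A short script (a sketch over abbreviated data from this comment, keeping only the fields it uses) makes that visible:

```python
import json

# Abbreviated "query-cpus" reply from this comment.
QUERY_CPUS = json.loads("""
{"return": [
  {"CPU": 0, "halted": false, "qom_path": "/machine/unattached/device[0]/thread[0]"},
  {"CPU": 3, "halted": true,  "qom_path": "/machine/peripheral/vcpu3/thread[0]"},
  {"CPU": 1, "halted": true,  "qom_path": "/machine/peripheral/vcpu1/thread[0]"},
  {"CPU": 5, "halted": true,  "qom_path": "/machine/peripheral/vcpu5/thread[0]"}
]}
""")

present = sorted(c["CPU"] for c in QUERY_CPUS["return"])
running = sorted(c["CPU"] for c in QUERY_CPUS["return"] if not c["halted"])
print(present)  # [0, 1, 3, 5] - QEMU knows about all four vCPU threads
print(running)  # [0]          - but only the boot CPU is actually running
```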
Comment 5 David Gibson 2017-08-10 20:33:28 EDT
Looks like the CPUs have been hotplugged, but not onlined in the guest.  Probably a guest side problem.

Can you check if 'rtas_errd' is running within the guest?
Comment 6 junli 2017-08-10 21:39:12 EDT
(In reply to David Gibson from comment #5)
> Looks like the CPUs have been hotplugged, but not onlined in the guest. 
> Probably a guest side problem.
> 
> Can you check if 'rtas_errd' is running within the guest?

It is running

root    823    1  0 09:32 ?      00:00:00 /usr/sbin/rtas_errd
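The guest-side online state David is probing lives in sysfs: /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online, both in the kernel's cpulist format ("0", "0-7", "0,2-3"). A small parser (a sketch; the sample strings reflect this report's symptom of maxcpus=8 with only the boot CPU online) shows what lscpu is summarizing:

```python
def parse_cpulist(s):
    """Expand a kernel cpulist string such as "0-7" or "0,2-3" into CPU ids."""
    cpus = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

# On the broken guest: "possible" covers all eight CPUs, "online" is just "0".
print(parse_cpulist("0-7"))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(parse_cpulist("0"))    # [0]
```

A hot-added CPU that is present but never onlined can also be brought up by hand in the guest, e.g. `echo 1 > /sys/devices/system/cpu/cpu1/online` as root.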
Comment 7 David Gibson 2017-08-11 01:44:26 EDT
Ok, next thing would be to attach dmesg from the guest after hotplugging the cpus.  Maybe we'll see some useful errors in there.
Comment 8 junli 2017-08-13 23:14:09 EDT
(In reply to David Gibson from comment #7)
> Ok, next thing would be to attach dmesg from the guest after hotplugging the
> cpus.  Maybe we'll see some useful errors in there.

After hotplugging the cpus, lscpu's output is correct.

And this is the dmesg log:

# dmesg | grep cpu
[    0.000000] Partition configured for 8 cpus.
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] PERCPU: Embedded 3 pages/cpu @c000000001700000 s124952 r0 d71656 u262144
[    0.000000] pcpu-alloc: s124952 r0 d71656 u262144 alloc=1*1048576
[    0.000000] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 
[    0.000000] 	RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=8.
[    0.000003] clockevent: decrementer mult[83126e98] shift[32] cpu[0]
[    0.384418] cpuidle: using governor menu
Comment 9 David Gibson 2017-08-14 21:13:30 EDT
Junli,

To clarify, if you just boot the guest with online/offline cpus specified in the XML, then only the boot cpu actually appears online?  But if you hotplug the cpus at runtime then the correct cpus appear online?

Is that right?  Are you hotplugging the cpus in addition to having them specified as online in the XML, or are you altering the XML then hotplugging them at runtime instead?


Andrea,

IIUC, for the (non boot) CPUs specified online in the XML, libvirt will hotplug them after starting qemu with -S but before allowing the guest to continue.  Is that right?


I suspect this is because the PAPR hotplugging logic isn't working when invoked before the OS has booted properly.  I think my upstream patches to cleanup the DRC code will fix this, but they're fairly extensive so I wasn't intending to backport them for Pegas 1.0.

If this use case and libvirt behaviour is important enough I might need to reconsider that.
Comment 10 junli 2017-08-14 22:01:07 EDT
Yes.
When I specify in the XML that vCPUs 0-3 are enabled, vcpuinfo's result is correct but lscpu in the guest is wrong (just one CPU visible in the guest).
But when I don't specify the vCPUs and vCPUs 0-3 are enabled automatically, both results are correct.

Then, when I hotplug any vCPU (whether enabled or disabled), lscpu's result in the guest becomes correct (I don't change the XML).
Comment 11 Peter Krempa 2017-08-15 04:37:12 EDT
(In reply to David Gibson from comment #9)
> Junli,
> 
> To clarify, if you just boot the guest with online/offline cpus specified in
> the XML, then only the boot cpu actually appears online?  But if you hotplug
> the cpus at runtime then the correct cpus appear online?
> 
> Is that right?  Are you hotplugging the cpus in addition to having them
> specified as online in the XML, or are you altering the XML then hotplugging
> them at runtime instead?
> 
> 
> Andrea,
> 
> IIUC, for the (non boot) CPUs specified online in the XML, libvirt will
> hotplug them after starting qemu with -S but before allowing the guest to
> continue.  Is that right?

Yes, that is right. Libvirt needs to query what the topology will look like, and also tries not to start throwaway processes. That is the reason for configuring the vCPUs via hotplug.

> I suspect this is because the PAPR hotplugging logic isn't working when
> invoked before the OS has booted properly.  I think my upstream patches to
> cleanup the DRC code will fix this, but they're fairly extensive so I wasn't
> intending to backport them for Pegas 1.0.
> 
> If this use case and libvirt behaviour is important enough I might need to
> reconsider that.

We don't want to start a throwaway qemu just to query this before starting the guest, and since this data is dependent on the machine type and topology, it can't be cached earlier, when we are loading the qemu capabilities.
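The mechanism Peter describes - start qemu paused with -S, then plug the remaining vCPUs over QMP before resuming - can be sketched from the query-hotpluggable-cpus data in comment 4: entries without a "qom-path" are still unplugged, and each one gets a device_add. (Illustrative sketch with made-up ids, not libvirt's actual code.)

```python
import json

# Two sample entries in the shape returned by "query-hotpluggable-cpus":
# the boot CPU already has a qom-path, core 3 does not and still needs plugging.
hotpluggable = [
    {"props": {"core-id": 3}, "type": "host-spapr-cpu-core"},
    {"props": {"core-id": 0}, "type": "host-spapr-cpu-core",
     "qom-path": "/machine/unattached/device[0]"},
]

def device_add_commands(entries):
    """Build one QMP device_add per entry that is not yet instantiated."""
    cmds = []
    for e in entries:
        if "qom-path" in e:      # already plugged (e.g. the boot CPU) - skip
            continue
        args = dict(e["props"], driver=e["type"],
                    id="vcpu%d" % e["props"]["core-id"])  # id naming assumed
        cmds.append({"execute": "device_add", "arguments": args})
    return cmds

for c in device_add_commands(hotpluggable):
    print(json.dumps(c))
```

The "vcpu%d" device ids here mirror the /machine/peripheral/vcpu3 paths seen in comment 4, but are an assumption of this sketch.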
Comment 12 Peter Krempa 2017-08-17 06:03:10 EDT
So according to the data in comment 4, libvirt configured the vcpus properly, thus it looks like this bug should be moved to qemu or kernel, since libvirt is doing the same for x86 and it works there.

David, which component is appropriate in this case?
Comment 13 Peter Krempa 2017-08-17 06:03:27 EDT
*** Bug 1482437 has been marked as a duplicate of this bug. ***
Comment 14 David Gibson 2017-08-21 00:35:59 EDT
Moved to qemu.  We've identified at least one upstream bug that's related to this, though it may not be the only one.

How to fix it in the timeframe is... going to be tricky.
Comment 15 Qunfang Zhang 2017-08-21 01:36:31 EDT
Hi, junli

Could you provide qemu command line by "#ps -aux | grep kvm" when you hit the bug? Thanks!
Comment 16 junli 2017-08-21 20:53:43 EDT
(In reply to Qunfang Zhang from comment #15)
> Hi, junli
> 
> Could you provide qemu command line by "#ps -aux | grep kvm" when you hit
> the bug? Thanks!

/usr/libexec/qemu-kvm \
  -name guest=avocado-vt-vm1,debug-threads=on \
  -S \
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-avocado-vt-vm1/master-key.aes \
  -machine pseries-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off \
  -m 1024 \
  -realtime mlock=off \
  -smp 1,maxcpus=8,sockets=8,cores=1,threads=1 \
  -uuid aafc3e8a-ce65-44c5-86ab-1d39bab26887 \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-avocado-vt-vm1/monitor.sock,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc \
  -no-shutdown \
  -boot strict=on \
  -device qemu-xhci,id=usb,bus=pci.0,addr=0x2 \
  -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-25-64.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1f:3b:f9,bus=pci.0,addr=0x1 \
  -device usb-kbd,id=input0,bus=usb.0,port=1 \
  -device usb-mouse,id=input1,bus=usb.0,port=2 \
  -vnc 127.0.0.1:0 \
  -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x5 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
  -msg timestamp=on
Comment 17 David Gibson 2017-08-31 00:24:29 EDT
I've staged an upstream fix in my ppc-for-2.11 tree.

However, 2.11 development hasn't opened yet, so it can't be pushed to the master tree yet.

Setting blocker?

Case for blocker
================

As well as the specific triggering case described in this bug, the underlying problem could be triggered in a number of guest configurations, with libvirt performing early hotplug operations that aren't correctly picked up by the guest.

We have an upstream patch drafted for this, however the upstream 2.10/2.11 schedule means we may have to wait before a final upstream merge.  For that reason the downstream backport may come quite late in the Pegas 1.0 release cycle.
Comment 18 David Gibson 2017-08-31 02:25:34 EDT
There's another complication here.  The upstream fix is based on the substantial rework of PAPR specific hotplug code that went into qemu 2.10, which we don't have in 2.9 based downstream.

We will have to weigh the risk of backporting that large set of patches versus implementing a downstream only version of the fix for this bug.
Comment 19 David Gibson 2017-09-04 00:08:39 EDT
I've made a (very preliminary) backport of the necessary patches, brewed at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13996120
Comment 20 David Gibson 2017-09-04 01:03:02 EDT
junli,

IIUC, this is not a regression from RHEL 7.4.

Since CPU hotplug isn't expected in the CORAL use case, I'm inclined to postpone this fix until RHEL 7.5 - fixing it in Pegas 1.0 will require a very late backport of a lot of patches, which seems pretty risky.
Comment 21 junli 2017-09-04 01:14:51 EDT
(In reply to David Gibson from comment #20)
> junli,
> 
> IIUC, this is not a regression from RHEL 7.4.
> 
> Since CPU hotplug isn't expected in the CORAL use case, I'm inclined to
> postpone this fix until RHEL 7.5 - fixing it in Pegas 1.0 will require a
> very late backport of a lot of patches, which seems pretty risky.

I have no opinion and follow your plan.
Comment 22 David Gibson 2017-09-04 01:37:11 EDT
*** Bug 1487415 has been marked as a duplicate of this bug. ***
Comment 23 IBM Bug Proxy 2017-09-04 03:10:33 EDT
Created attachment 1321708 [details]
guest xml
Comment 24 Frank Novak 2017-09-04 09:02:10 EDT
Hmm, AFAIK, cpu hot plug (add) was working and I thought supported in RHV for Power..

Having said that, CPU hot plug is not required for CORAL, though that wouldn't be the case for the general KVM support in Pegas..
Comment 25 Karen Noel 2017-09-21 13:16:56 EDT
Move to qemu-kvm-rhev. This fix will apply to both RHEL KVM and qemu-kvm-rhev for RHV and RHOSP. Both packages are using the same code base.
Comment 27 David Gibson 2017-10-04 01:46:38 EDT
I believe this should be fixed by the same patches backported for bug 1448344.

When you have time, can you retest with the scratch kernel from bug 1448344 comment 12.
Comment 28 junli 2017-10-08 23:48:03 EDT
(In reply to David Gibson from comment #27)
> I believe this should be fixed by the same patches backported for bug
> 1448344.
> 
> When you have time, can you retest with the scratch kernel from bug 1448344
> comment 12.

Sorry, it is still reproducible on
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.10.0-1.el7.ppc64le
Comment 29 Qunfang Zhang 2017-10-09 00:12:58 EDT
(In reply to David Gibson from comment #27)
> I believe this should be fixed by the same patches backported for bug
> 1448344.
> 
> When you have time, can you retest with the scratch kernel from bug 1448344
> comment 12.

Hello, David

The build you mentioned above is closed; could you do a new rebuild? Thanks.
Comment 30 David Gibson 2017-10-09 02:53:52 EDT
Ah, yes.  I've rebuilt here:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14217635
Comment 31 Qunfang Zhang 2017-10-09 04:16:05 EDT
Thanks David!

Hi, Junxiang

Could you please help re-test the above build?

Thanks,
Qunfang
Comment 32 junli 2017-10-09 04:59:40 EDT
(In reply to David Gibson from comment #30)
> Ah, yes.  I've rebuilt here:
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14217635

Congratulations! It has been fixed.
Comment 33 David Gibson 2017-10-09 05:07:29 EDT

*** This bug has been marked as a duplicate of bug 1448344 ***
