Bug 1448344 - Failed to hot unplug CPU core which was hotplugged in early boot stages
Summary: Failed to hot unplug CPU core which was hotplugged in early boot stages
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 7.5
Assignee: David Gibson
QA Contact: Xujun Ma
URL:
Whiteboard:
Keywords: Patch, Regression
Duplicates: 1459017 1479694
Depends On: 1432382
Blocks: 1399177 1438583 1476742 1522983 1469590 1473046 1479694
 
Reported: 2017-05-05 09:03 UTC by Xujun Ma
Modified: 2018-04-11 00:21 UTC (History)
12 users

Clone Of:
Last Closed: 2018-04-11 00:19:31 UTC


Attachments (Terms of Use)
guest xml (3.71 KB, application/octet-stream)
2017-10-09 09:41 UTC, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 None None None 2018-04-11 00:21 UTC
IBM Linux Technology Center 155954 None None None 2019-03-22 07:24 UTC

Description Xujun Ma 2017-05-05 09:03:00 UTC
Description of problem:
Failed to hot unplug a CPU core which was hotplugged in early boot stages.

Version-Release number of selected component (if applicable):
host and guest: 3.10.0-657.el7.ppc64le
qemu-kvm-rhev-2.9.0-2.el7.ppc64le


How reproducible:
100%


Steps to Reproduce:
1. Boot up the guest with the QEMU command line:
 -name vm \
 -m 8192 \
 -monitor stdio \
 -smp 8,maxcpus=16,cores=2,threads=8,sockets=1 \
 -vnc :99 \
 -vga std \
 -device virtio-scsi-pci,bus=pci.0,addr=0x5 \
 -device scsi-hd,id=scsi-hd0,drive=scsi-hd-dr0,bootindex=0 \
 -drive file=RHEL-7.4-20170426.4-ppc64le.qcow2,if=none,id=scsi-hd-dr0,format=qcow2,cache=none \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on
 
2. Hotplug a CPU core immediately after starting QEMU:
(qemu) device_add host-spapr-cpu-core,core-id=8,id=core8

3. Hot unplug the core after the guest boots up:
(qemu) device_del core8
4. Check the CPUs:
(qemu) info cpus
* CPU #0: nip=0xc00000000009c0d4 thread_id=49315
  CPU #1: nip=0xc00000000009c0d4 thread_id=49316
  CPU #2: nip=0xc00000000009c0d4 thread_id=49317
  CPU #3: nip=0xc00000000009c0d4 thread_id=49318
  CPU #4: nip=0xc00000000009c0d4 thread_id=49319
  CPU #5: nip=0xc00000000009c0d4 thread_id=49320
  CPU #6: nip=0xc00000000009c0d4 thread_id=49321
  CPU #7: nip=0xc00000000009c0d4 thread_id=49322
  CPU #8: nip=0x0000000000000000 (halted) thread_id=49372
  CPU #9: nip=0x0000000000000000 (halted) thread_id=49373
  CPU #10: nip=0x0000000000000000 (halted) thread_id=49374
  CPU #11: nip=0x0000000000000000 (halted) thread_id=49375
  CPU #12: nip=0x0000000000000000 (halted) thread_id=49376
  CPU #13: nip=0x0000000000000000 (halted) thread_id=49377
  CPU #14: nip=0x0000000000000000 (halted) thread_id=49378
  CPU #15: nip=0x0000000000000000 (halted) thread_id=49379

Actual results:
The CPU core can't be hot unplugged with the device_del command on the first attempt, but it can be hot unplugged on the second attempt.

Expected results:
The CPU core can be hot unplugged on the first attempt.

Additional info:
ppc only.
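For scripted reproduction, the HMP commands above have QMP equivalents. The following is a minimal sketch that only builds the command payloads; the QMP capabilities handshake and the socket connection to a running QEMU instance are assumed and omitted:

```python
import json

def qmp_cmd(execute, arguments=None):
    """Serialize a QMP command as a JSON line."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

# HMP: device_add host-spapr-cpu-core,core-id=8,id=core8
add_core = qmp_cmd("device_add",
                   {"driver": "host-spapr-cpu-core", "core-id": 8, "id": "core8"})

# HMP: device_del core8
del_core = qmp_cmd("device_del", {"id": "core8"})

print(add_core)
print(del_core)
```

Each line would be written to the QMP socket after the `qmp_capabilities` negotiation.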

Comment 2 Igor Mammedov 2017-05-05 13:26:22 UTC
Looks to me as if the guest doesn't know about the hotplugged CPU (probably a lost dynamically added FDT).
Reassigning BZ to David as he probably knows better how ppc should behave.

Comment 3 Laurent Vivier 2017-05-09 13:09:02 UTC
It really looks like BZ1432382 for memory.

Fixed by:

    fe6824d spapr: fix memory hot-unplugging

But it seems something is missing for the CPU case.

Comment 4 Laurent Vivier 2017-05-09 14:48:08 UTC
In fact, we have the same bug with device/memory hotplug.

Moreover, it crashes on the second device_del:

on boot:

(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) info memdev 
memory backend: mem1
  size:  1073741824
  merge: true
  dump: true
  prealloc: false
  policy: default
  host nodes: 
(qemu) info memory-devices
Memory device [dimm]: "dimm1"
  addr: 0x100000000
  slot: 0
  node: 0
  size: 1073741824
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true

After boot:

(qemu) device_del dimm1
(qemu) info memdev 
memory backend: mem1
  size:  1073741824
  merge: true
  dump: true
  prealloc: false
  policy: default
  host nodes: 

(qemu) info memory-devices
Memory device [dimm]: "dimm1"
  addr: 0x100000000
  slot: 0
  node: 0
  size: 1073741824
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true
(qemu) device_del dimm1
used ring relocated for ring 2
qemu-system-ppc64: /home/lvivier/Projects/qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed.

Comment 5 Laurent Vivier 2017-05-23 12:51:45 UTC
As device hotplug cannot be managed by SLOF (and can't be cleanly canceled), I propose to disable it until the OS is started.

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05226.html

Comment 6 David Gibson 2017-06-07 03:23:46 UTC
*** Bug 1459017 has been marked as a duplicate of this bug. ***

Comment 9 Xujun Ma 2017-07-27 06:37:02 UTC
The same issue occurs on RHEL-ALT-7.4 with qemu-kvm-2.9.0-18.el7a.ppc64le:

localhost login: 
(qemu) info cpus
* CPU #0: nip=0x000000001001b35c thread_id=8860
  CPU #1: nip=0x000000001009a558 thread_id=8861
  CPU #2: nip=0x0000000000000000 (halted) thread_id=8892
(qemu) device_del core2 
(qemu) [   33.982250] pseries-hotplug-cpu: Failed to acquire DRC, rc: -22, drc index: 10000002
[   33.982405] pseries-hotplug-cpu: Cannot find CPU (drc index 10000002) to remove
(qemu) info cpus
* CPU #0: nip=0x00000000100030d4 thread_id=8860
  CPU #1: nip=0xc0000000000c9390 thread_id=8861
  CPU #2: nip=0x0000000000000000 (halted) thread_id=8892
(qemu) device_del core2 
(qemu) [   52.758000] pseries-hotplug-cpu: Cannot find CPU (drc index 10000002) to remove

(qemu) info cpus
* CPU #0: nip=0x00000000100030ac thread_id=8860
  CPU #1: nip=0xc0000000000c9390 thread_id=8861

Comment 10 IBM Bug Proxy 2017-07-27 20:00:33 UTC
------- Comment From danielhb@br.ibm.com 2017-07-27 15:52 EDT-------
I've tested this scenario using upstream QEMU (which contains most of the hotplug changes we're going to ship in 2.10) to see if it was reproducible. The behavior changed a little from what was verified in the original report: the CPU unplug works, but the hotplugged CPU remains in the halted state and is not recognized by the guest kernel. This is the setup I've used:

- Host: CentOS

$ uname -a
Linux localhost 4.11.0-7.gitd255e14.el7.centos.ppc64le #1 SMP Wed Jul 26 11:46:31 BRT 2017 ppc64le ppc64le ppc64le GNU/Linux

- Guest: Fedora 26 ppc64le

[danielhb@localhost ~]$ uname -a
Linux localhost.localdomain 4.11.11-300.fc26.ppc64le #1 SMP Mon Jul 17 16:14:56 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
[danielhb@localhost ~]$

On the host, the added CPU remained halted and wasn't recognized by the guest:

localhost login: (qemu)
* CPU #0: nip=0xc0000000000a9c0c thread_id=20516
CPU #1: nip=0x0000000000000000 (halted) thread_id=20517
(qemu)

[danielhb@localhost ~]$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Model:               2.0 (pvr 004d 0200)
Model name:          POWER8 (raw), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0
[danielhb@localhost ~]$

Hot unplugging the CPU works, but not without leaving warnings in /var/log/messages:

[danielhb@localhost ~]$ (qemu)
(qemu) device_del core1
* CPU #0: nip=0xc0000000000a9c0c thread_id=20516
(qemu)

[danielhb@localhost ~]$ dmesg | tail -n 3
[   18.770840] ip_set: protocol 6
[   49.519635] pseries-hotplug-cpu: Failed to acquire DRC, rc: -22, drc index: 10000008
[   49.519642] pseries-hotplug-cpu: Cannot find CPU (drc index 10000008) to remove
[danielhb@localhost ~]$

Any subsequent CPU hot plug/unplug operations work as expected.

One thing I've noticed is that this can also be reproduced by passing -S on the QEMU command line, adding the CPU, and then issuing 'cont' in the QEMU monitor to resume the boot. I mention it because this is a testing scenario I discussed with David in the community a couple of weeks ago. At that time David mentioned that he couldn't reproduce this bug, which led us to believe that host/guest configuration specifics were impacting the outcome.

Given that this also affects memory hotplug (with a worse outcome in my tests: hot unplugging memory that was hotplugged in early boot panics the guest kernel), I'll resume the investigation and the discussions in the community about it.

Comment 11 IBM Bug Proxy 2017-08-31 13:30:32 UTC
------- Comment From danielhb@br.ibm.com 2017-08-31 09:29 EDT-------
This problem ended up having not only a QEMU side but also a kernel side. When hotplugging a device in early boot, before CAS, this is what happens:

- QEMU treats the device as hotplugged. This means that an IRQ pulse is sent to warn the guest that a new device was attached, and the event is stored in an internal queue. It can be retrieved later by the kernel using the RTAS call 'check_exception'.

- At this point in early boot, the pulse is ignored. The firmware does not deal with hotplugged devices.

- The kernel, during boot, does not dequeue the existing events. The device remains in the 'halted' state, waiting for activation.

This is why we are facing problems with early hotplug. It is also why the early hotplugged device starts working after any later device hotplug/unplug happens: the kernel executes check_exception after each of these operations and becomes aware of the device that was hotplugged in the early stages.
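The sequence described above can be illustrated with a toy model (plain Python, not QEMU code; all names are illustrative only):

```python
from collections import deque

class HotplugQueueModel:
    """Toy model of the pre-CAS hotplug event flow: events are queued by
    QEMU, the boot-time IRQ pulse is ignored, and the guest only learns
    about pending devices when check_exception drains the queue."""

    def __init__(self):
        self.events = deque()    # QEMU-side pending hotplug events
        self.guest_sees = set()  # devices the guest kernel knows about
        self.booted = False

    def hotplug(self, dev):
        self.events.append(dev)      # event queued in QEMU
        if self.booted:
            self.check_exception()   # IRQ pulse -> kernel drains queue
        # before boot: the pulse is ignored, the event stays queued

    def boot(self):
        # the kernel does not drain pre-existing events during boot
        self.booted = True

    def check_exception(self):
        # RTAS call: the kernel fetches *all* pending events
        while self.events:
            self.guest_sees.add(self.events.popleft())

model = HotplugQueueModel()
model.hotplug("core8")                   # early hotplug, before boot
model.boot()
assert "core8" not in model.guest_sees   # device stays halted / unknown
model.hotplug("dimm1")                   # any later hotplug drains the queue
assert model.guest_sees == {"core8", "dimm1"}
```

This mirrors the observed symptom: the early-hotplugged core is invisible until a later hotplug operation indirectly drains the event queue.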

One obvious solution is to avoid hotplugging at these stages, but this is too extreme and breaks some Libvirt use cases.

Another solution I've tried was to make QEMU consider all devices hotplugged before CAS as coldplugged, but this proved to be too hard with the current QEMU code base.

Yet another solution I've tried, proposed by David, was to pulse the hotplug queue during CAS to try to make the kernel fetch the existing events using check_exception. A pulse at this time causes a kernel oops with signal 11 (bad access). I've reported this behavior on the linuxppc-dev kernel mailing list, and it turns out to be a kernel bug. We could sit and wait for the kernel community to fix it, but there are also older kernels/guests that will never see that fix.

The solution that was pushed to QEMU 2.11 was to fire a CAS-induced reset when a hotplugged device is detected during CAS. This reset is enough to set up the FDTs of those early hotplugged devices so that the kernel does not need a pulse to recognize them, while still allowing them to be hot unplugged as regular hotplugged devices. This solution is also independent of any kernel fixes.
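The logic of the fix can be sketched as a toy model as well (illustrative Python only, not the actual QEMU patch; all names are made up):

```python
def cas_complete(pending_hotplug, build_fdt):
    """Toy model of the CAS-induced reset: if any device was hotplugged
    before CAS, trigger a reset so the regenerated device tree already
    contains those devices; otherwise build the device tree normally."""
    if pending_hotplug:
        # reset path: devices become visible through the FDT itself,
        # so no event pulse is needed, yet they remain hot-unpluggable
        return build_fdt(extra=list(pending_hotplug)), True   # (fdt, reset_fired)
    return build_fdt(extra=[]), False

# hypothetical FDT builder: a dict listing CPU cores
fdt, reset_fired = cas_complete(
    ["core8"], lambda extra: {"cpus": ["core0"] + extra})
```

After the reset, the guest kernel finds "core8" in the device tree directly, which is why no kernel-side fix is required.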

To sum up: this problem is fixed in the upcoming QEMU 2.11.

Daniel

Comment 12 David Gibson 2017-10-04 05:34:54 UTC
Now that we have a qemu 2.10-based downstream tree, I've backported the relevant patches for this and built them at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14170740

Comment 15 David Gibson 2017-10-09 09:07:29 UTC
*** Bug 1479694 has been marked as a duplicate of this bug. ***

Comment 16 IBM Bug Proxy 2017-10-09 09:41:18 UTC
Created attachment 1336266 [details]
guest xml

Comment 17 Laurent Vivier 2017-10-10 10:03:02 UTC
Assigned to David, as he sent the patch.

Comment 18 Miroslav Rezanina 2017-10-13 10:26:37 UTC
Fix included in qemu-kvm-rhev-2.10.0-2.el7

Comment 20 Xujun Ma 2017-11-17 08:27:14 UTC
Tested the issue on the old version:

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-1.el7.ppc64le    


Steps to Reproduce:
1. The same steps as in the bug description.

Actual results:
Failed to hot unplug the CPU core.


Verified the issue on the latest build:
Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-6.el7.ppc64le


Steps to Reproduce:
1. The same steps as in the bug description.

Actual results:

Hot unplugging the CPU core succeeded.

Comment 21 IBM Bug Proxy 2017-12-07 17:31:22 UTC
------- Comment From satheera@in.ibm.com 2017-12-07 12:25 EDT-------
Tested with qemu-kvm-ma-2.10.0-10.el7.ppc64le and 4.14.0-6.el7a.ppc64le(host + guest)

1.  <vcpu placement='static' current='3'>4</vcpu>
    <vcpus>
      <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
      <vcpu id='1' enabled='yes' hotpluggable='yes' order='2'/>
      <vcpu id='2' enabled='yes' hotpluggable='yes' order='3'/>
      <vcpu id='3' enabled='no' hotpluggable='yes' order='4'/>
    </vcpus>
    <os>
      <type arch='ppc64le' machine='pseries-rhel7.5.0'>hvm</type>
      <boot dev='hd'/>
    </os>
    <cpu>
      <topology sockets='1' cores='4' threads='1'/>
      <numa>
        <cell id='0' cpus='0-1' memory='4194304' unit='KiB'/>
        <cell id='1' cpus='2-3' memory='4194304' unit='KiB'/>
      </numa>

2. [root@localhost ~]# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                3
On-line CPU(s) list:   0-2
Thread(s) per core:    1
Core(s) per socket:    3
Socket(s):             1
NUMA node(s):          2
Model:                 2.0 (pvr 004e 1200)
Model name:            POWER9 (architected), altivec supported
Hypervisor vendor:     KVM
Virtualization type:   para
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0,1
NUMA node1 CPU(s):     2

3. # virsh qemu-monitor-command vm1 --cmd 'info cpus' --hmp
* CPU #0: nip=0xc0000000000db9cc thread_id=4316
CPU #1: nip=0xc0000000000db9cc thread_id=4336
CPU #2: nip=0xc0000000000db9cc thread_id=4337

4. /usr/libexec/qemu-kvm -name guest=vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-vm1/master-key.aes -machine pseries-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -m 8192 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 -numa node,nodeid=0,cpus=0-1,mem=4096 -numa node,nodeid=1,cpus=2-3,mem=4096 -uuid 1bd94987-d571-4892-828c-eba1fcc2a58f -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=/var/lib/libvirt/images/workspace/pegas-1.0-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1a:19:93,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

Regards,
-Satheesh

Comment 23 errata-xmlrpc 2018-04-11 00:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

