Bug 1448344
| Field | Value |
|---|---|
| Summary | Failed to hot unplug CPU core which was hotplugged in early boot stages |
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Version | 7.4 |
| Target Milestone | rc |
| Target Release | 7.5 |
| Hardware | ppc64le |
| OS | Linux |
| Fixed In Version | qemu-kvm-rhev-2.10.0-2.el7 |
| Keywords | Patch, Regression |
| Reporter | Xujun Ma <xuma> |
| Assignee | David Gibson <dgibson> |
| QA Contact | Xujun Ma <xuma> |
| CC | bugproxy, dgibson, hannsj_uhl, hhuang, junli, knoel, lvivier, mdeng, michen, mrezanin, qzhang, virt-maint |
| Type | Bug |
| Last Closed | 2018-04-11 00:19:31 UTC |
| Bug Depends On | 1432382 |
| Bug Blocks | 1399177, 1438583, 1469590, 1473046, 1476742, 1479694, 1522983 |
| Attachments | guest xml (attachment 1336266) |
Description
Xujun Ma
2017-05-05 09:03:00 UTC
Looks to me as if the guest doesn't know about the hotplugged CPU (probably lost the dynamically added fdt). Reassigning BZ to David as he probably knows better how ppc should behave.

It really looks like BZ1432382 for memory, fixed by:

    fe6824d spapr: fix memory hot-unplugging

But it seems something is missing for the CPU case.

In fact, we have the same bug with device/memory hotplug. Moreover, it crashes on the second device_del.

On boot:

```
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) info memdev
memory backend: mem1
  size:  1073741824
  merge: true
  dump: true
  prealloc: false
  policy: default
  host nodes:
(qemu) info memory-devices
Memory device [dimm]: "dimm1"
  addr: 0x100000000
  slot: 0
  node: 0
  size: 1073741824
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true
```

After boot:

```
(qemu) device_del dimm1
(qemu) info memdev
memory backend: mem1
  size:  1073741824
  merge: true
  dump: true
  prealloc: false
  policy: default
  host nodes:
(qemu) info memory-devices
Memory device [dimm]: "dimm1"
  addr: 0x100000000
  slot: 0
  node: 0
  size: 1073741824
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true
(qemu) device_del dimm1
used ring relocated for ring 2
qemu-system-ppc64: /home/lvivier/Projects/qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed.
```

As device hotplug cannot be managed by SLOF (and can't be cleanly canceled), I propose to disable it until the OS is started: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05226.html

*** Bug 1459017 has been marked as a duplicate of this bug. ***
The same issue occurs on RHEL-ALT-7.4 with qemu-kvm-2.9.0-18.el7a.ppc64le:

```
localhost login:
(qemu) info cpus
* CPU #0: nip=0x000000001001b35c thread_id=8860
  CPU #1: nip=0x000000001009a558 thread_id=8861
  CPU #2: nip=0x0000000000000000 (halted) thread_id=8892
(qemu) device_del core2
(qemu) [   33.982250] pseries-hotplug-cpu: Failed to acquire DRC, rc: -22, drc index: 10000002
[   33.982405] pseries-hotplug-cpu: Cannot find CPU (drc index 10000002) to remove
(qemu) info cpus
* CPU #0: nip=0x00000000100030d4 thread_id=8860
  CPU #1: nip=0xc0000000000c9390 thread_id=8861
  CPU #2: nip=0x0000000000000000 (halted) thread_id=8892
(qemu) device_del core2
(qemu) [   52.758000] pseries-hotplug-cpu: Cannot find CPU (drc index 10000002) to remove
(qemu) info cpus
* CPU #0: nip=0x00000000100030ac thread_id=8860
  CPU #1: nip=0xc0000000000c9390 thread_id=8861
```

------- Comment From danielhb.com 2017-07-27 15:52 EDT -------

I've tested this scenario using upstream QEMU (which contains most of the hotplug changes we're going to ship in 2.10) to see if this was reproducible. The behavior changed a little from what was verified in the original report: the CPU unplug works, but the hotplugged CPU remained in the halted state, not being recognized by the guest kernel.
This is the setup I've used:

- Host: CentOS

```
$ uname -a
Linux localhost 4.11.0-7.gitd255e14.el7.centos.ppc64le #1 SMP Wed Jul 26 11:46:31 BRT 2017 ppc64le ppc64le ppc64le GNU/Linux
```

- Guest: Fedora 26 ppc64le

```
[danielhb@localhost ~]$ uname -a
Linux localhost.localdomain 4.11.11-300.fc26.ppc64le #1 SMP Mon Jul 17 16:14:56 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
```

In the host, the added CPU remained halted and wasn't recognized by the guest:

```
localhost login:
(qemu)
* CPU #0: nip=0xc0000000000a9c0c thread_id=20516
  CPU #1: nip=0x0000000000000000 (halted) thread_id=20517
(qemu)
[danielhb@localhost ~]$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                1
On-line CPU(s) list:   0
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 2.0 (pvr 004d 0200)
Model name:            POWER8 (raw), altivec supported
Hypervisor vendor:     KVM
Virtualization type:   para
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0
```

Hot unplugging the CPU works, but not without leaving a warning in /var/log/messages:

```
(qemu) device_del core1
* CPU #0: nip=0xc0000000000a9c0c thread_id=20516
(qemu)
[danielhb@localhost ~]$ dmesg | tail -n 3
[   18.770840] ip_set: protocol 6
[   49.519635] pseries-hotplug-cpu: Failed to acquire DRC, rc: -22, drc index: 10000008
[   49.519642] pseries-hotplug-cpu: Cannot find CPU (drc index 10000008) to remove
```

Any subsequent CPU hot plug/unplug operations work as expected.

One thing I've noticed is that this can also be reproduced by passing -S on the QEMU command line, adding the CPU, and then issuing 'cont' in the QEMU monitor to resume boot. I mention it because this is a testing scenario I discussed with David in the community a couple of weeks ago. At that time David said he couldn't reproduce this bug, which led us to believe that host/guest configuration specifics were affecting the outcome.
Given that this also affects memory hotplug (with a worse outcome in my tests: hot unplugging memory that was hotplugged in early boot panics the guest kernel), I'll resume the investigation and the discussions in the community about it.

------- Comment From danielhb.com 2017-08-31 09:29 EDT -------

This problem ended up having not only a QEMU side but also a kernel side. When hotplugging a device in early boot, before CAS, this is what happens:

- QEMU treats the device as hotplugged. This means that an IRQ pulse is sent to warn the guest that a new device was attached, and the event is stored in an internal queue. It can be retrieved later by the kernel using the RTAS call 'check_exception'.
- At this point in early boot, the pulse is ignored. The firmware does not deal with hotplugged devices.
- The kernel, during boot, does not dequeue the existing events. The device remains in the 'halted' state, waiting for activation.

This is why we are facing problems with early hotplug. It is also why the early hotplugged device starts working after any later device hotplug/unplug happens: the kernel executes check_exception after each of these operations and becomes aware of the device that was hotplugged in the early stages.

One obvious solution is to forbid hotplug at these stages, but this is too extreme and breaks some libvirt use cases. Another solution I tried was to make QEMU consider all devices hotplugged before CAS as coldplugged, but this proved too hard with the current QEMU code base. Yet another solution I tried, proposed by David, was to pulse the hotplug queue during CAS to make the kernel fetch the existing events using check_exception. A pulse at this time causes a kernel oops with sig 11 (bad access). I've reported this behavior on the linuxppc-dev kernel mailing list, and it turns out to be a bug.
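The event-queue behavior described above can be sketched as a toy Python model. This is an illustration of the flow in the comment, not QEMU code; all names (`HotplugModel`, `check_exception`, etc.) are made up for the sketch:

```python
from collections import deque

class HotplugModel:
    """Toy model of the early-boot hotplug flow (illustrative, not QEMU code)."""

    def __init__(self):
        self.event_queue = deque()  # events awaiting the RTAS check_exception call
        self.os_booted = False      # before boot/CAS, the IRQ pulse is ignored

    def device_add(self, name):
        # QEMU queues the event and pulses the hotplug interrupt.
        self.event_queue.append(name)
        if not self.os_booted:
            return []               # firmware ignores the pulse; event stays queued
        return self.check_exception()

    def check_exception(self):
        # The guest kernel drains all pending events via the RTAS call,
        # including any left over from early boot.
        drained = list(self.event_queue)
        self.event_queue.clear()
        return drained

m = HotplugModel()
m.device_add("core2")                    # early boot: pulse ignored, CPU stays halted
assert list(m.event_queue) == ["core2"]
m.os_booted = True
# Any later hotplug triggers check_exception, which also wakes the early device.
assert m.device_add("dimm1") == ["core2", "dimm1"]
```

This mirrors the symptom in the report: the early hotplugged CPU sits halted until some unrelated hotplug/unplug operation makes the kernel drain the queue.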
We could sit and wait for the kernel community to fix the bug, but there are also older kernels/guests that will never see such a fix. The solution that was pushed to QEMU 2.11 is to fire a CAS-induced reset when a hotplugged device is detected during CAS. This reset is enough to set up the FDTs of those early hotplugged devices so that the kernel does not need a pulse to recognize them, while still allowing them to be hot unplugged like regular hotplugged devices. This solution is also independent of any kernel fixes.

To sum up: this problem is fixed in the upcoming QEMU 2.11.

Daniel

Now that we have a QEMU 2.10 based downstream tree, I've backported the relevant patches for this and built them at: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14170740

*** Bug 1479694 has been marked as a duplicate of this bug. ***

Created attachment 1336266 [details]
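The CAS-induced-reset idea can be sketched in the same toy style. Again, this is a hypothetical model of the behavior described in the comment, not the actual spapr implementation; every name here is invented for illustration:

```python
def build_fdt(devices):
    # The flattened device tree the guest parses at (re)boot: a device
    # already listed here needs no hotplug interrupt to be recognized.
    return {d["name"]: "okay" for d in devices}

def cas(coldplugged, hotplugged_before_cas):
    # Toy sketch of the QEMU 2.11 approach: if anything was hotplugged
    # before CAS, fire a CAS-induced reset so the regenerated FDT already
    # describes those devices.  They keep their 'hotplugged' status, so
    # they can still be hot unplugged later like any hotplugged device.
    reset_needed = bool(hotplugged_before_cas)
    devices = list(coldplugged) + list(hotplugged_before_cas)
    return build_fdt(devices), reset_needed

fdt, reset = cas([{"name": "cpu@0"}], [{"name": "cpu@8"}])
assert reset and "cpu@8" in fdt
```

The key design point, per the comment, is that this works without any guest kernel change: the kernel simply parses a device tree that already contains the early hotplugged devices.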
guest xml
Assigned to David, as he sent the patch.

Fix included in qemu-kvm-rhev-2.10.0-2.el7.

Tested the issue on the old version:

Version-Release number of selected component (if applicable): qemu-kvm-rhev-2.10.0-1.el7.ppc64le

Steps to Reproduce:
1. The same steps as in the bug description.

Actual results: Failed to hot unplug the CPU core.

Verified the issue on the latest build:

Version-Release number of selected component (if applicable): qemu-kvm-rhev-2.10.0-6.el7.ppc64le

Steps to Reproduce:
1. The same steps as in the bug description.

Actual results: CPU core hot unplug succeeded.

------- Comment From satheera.com 2017-12-07 12:25 EDT -------

Tested with qemu-kvm-ma-2.10.0-10.el7.ppc64le and 4.14.0-6.el7a.ppc64le (host + guest).

1. Guest XML:

```
<vcpu placement='static' current='3'>4</vcpu>
<vcpus>
  <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
  <vcpu id='1' enabled='yes' hotpluggable='yes' order='2'/>
  <vcpu id='2' enabled='yes' hotpluggable='yes' order='3'/>
  <vcpu id='3' enabled='no' hotpluggable='yes' order='4'/>
</vcpus>
<os>
  <type arch='ppc64le' machine='pseries-rhel7.5.0'>hvm</type>
  <boot dev='hd'/>
</os>
<cpu>
  <topology sockets='1' cores='4' threads='1'/>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB'/>
    <cell id='1' cpus='2-3' memory='4194304' unit='KiB'/>
  </numa>
```

2. lscpu in the guest:

```
[root@localhost ~]# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                3
On-line CPU(s) list:   0-2
Thread(s) per core:    1
Core(s) per socket:    3
Socket(s):             1
NUMA node(s):          2
Model:                 2.0 (pvr 004e 1200)
Model name:            POWER9 (architected), altivec supported
Hypervisor vendor:     KVM
Virtualization type:   para
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0,1
NUMA node1 CPU(s):     2
```

3. CPU state in the monitor:

```
# virsh qemu-monitor-command vm1 --cmd 'info cpus' --hmp
* CPU #0: nip=0xc0000000000db9cc thread_id=4316
  CPU #1: nip=0xc0000000000db9cc thread_id=4336
  CPU #2: nip=0xc0000000000db9cc thread_id=4337
```

4. QEMU command line:
```
/usr/libexec/qemu-kvm -name guest=vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-vm1/master-key.aes -machine pseries-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -m 8192 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 -numa node,nodeid=0,cpus=0-1,mem=4096 -numa node,nodeid=1,cpus=2-3,mem=4096 -uuid 1bd94987-d571-4892-828c-eba1fcc2a58f -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=/var/lib/libvirt/images/workspace/pegas-1.0-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1a:19:93,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
```

Regards,
-Satheesh

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104