This bug report is based on the upstream bug https://gitlab.com/libvirt/libvirt/-/issues/309, but I have updated the description and reproduction based on the discussion in the upstream report.

Description of problem:
If a disk is detached from a guest while the guest OS is still booting, the disk gets stuck. The detach appears to succeed from the virsh perspective, but the disk remains visible as attached both from the guest and from virsh. When the detach is retried, even after the guest OS has fully booted, it fails with "Failed to detach disk error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug".

This was observed in the OpenStack upstream CI with a cirros 0.5.2 guest OS, but it has now been reproduced without OpenStack with a more typical guest (Ubuntu 22.04). The OpenStack bug is being worked around by changing the test in the CI to wait until the guest is fully booted before trying to attach the volume.

Version-Release number of selected component (if applicable):
Host:
* Operating system: Debian sid
* Architecture: x86_64
* Kernel version: 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux
* libvirt version: 8.2.0-1
* Hypervisor and version: qemu-system-x86_64 1:7.0+dfsg-1
Guest:
* Operating system: Ubuntu 22.04 (cloud image)

How reproducible:
If the guest OS boot is slowed down, it is 100% reproducible.

Steps to Reproduce:
1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args to simulate a slow host.
2. Start the Ubuntu domain and connect to the serial console to watch it boot.
3. Wait until the first messages appear in the console. This is around T+50 sec from the virsh start.
4. From a second terminal, attach an additional disk to the guest. It succeeds.
5. Wait a second.
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML: the disk is still attached.
8. Run lsblk in the guest (after it is fully booted): the disk is still attached.
9. Check the virsh domblklist output: the disk is still attached.
10. Try to detach the disk again. It fails with "Failed to detach disk error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug".

Actual results:
The disk cannot be detached even after the guest OS is fully booted. Retrying the detach always fails.

Expected results:
Either the disk is eventually detached from the guest after it is fully booted, or the detach can be successfully retried via libvirt / virsh.

Additional info:
Please see the debug logs and detailed reproduction sequence in the upstream bug https://gitlab.com/libvirt/libvirt/-/issues/309
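For reference, a minimal sketch of the virsh side of the reproduction (steps 2-10), assuming a domain named ubuntu22 and a scratch image under /var/lib/libvirt/images/ (both names are hypothetical):

  qemu-img create -f qcow2 /var/lib/libvirt/images/extra.qcow2 1G
  virsh start ubuntu22
  virsh console ubuntu22                      # wait for the first boot messages (~T+50 sec)
  # from a second terminal, once console output starts:
  virsh attach-disk ubuntu22 /var/lib/libvirt/images/extra.qcow2 vdb --subdriver qcow2 --live
  sleep 1
  virsh detach-disk ubuntu22 vdb --live       # hangs a few seconds, then reports success
  virsh domblklist ubuntu22                   # the disk is still listed
  virsh detach-disk ubuntu22 vdb --live       # fails: "already in the process of unplug"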
This looks to me like something that needs work in QEMU, since libvirt is retrying the device detach as requested. The linked issue confirms this. Therefore I am moving this to the QEMU component for further triage.
Reproduced it on Red Hat Enterprise Linux release 9.0 (Plow)
5.14.0-70.13.1.el9_0.x86_64
qemu-kvm-6.2.0-11.el9_0.2.x86_64
seabios-bin-1.15.0-1.el9.noarch
edk2-ovmf-20220126gitbb1bba3d77-3.el9.noarch

Test steps:
1. Create an image file if needed:
   qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg1.qcow2 1G
2. Boot the VM:
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 8G \
    -object memory-backend-ram,size=8G,id=mem-machine_mem \
    -smp 2 \
    -cpu host,vmx,+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/stg1.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
    -device virtio-net-pci,mac=9a:e1:e5:87:89:d2,id=idhDtYbt,netdev=id15e8Je,bus=pcie-root-port-5,addr=0x0 \
    -netdev tap,id=id15e8Je,vhost=on \
    -vnc :5 \
    -monitor stdio \
    -qmp tcp:0:5955,server,nowait \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=7 \
    -chardev socket,id=charserial1,path=/var/tmp/run-serial.log,server=on,wait=off \
    -device isa-serial,chardev=charserial1,id=serial1
3. Sleep 3 seconds.
4. Execute QMP commands to hot-plug and then unplug the disk:
   {"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "pcie-root-port-3"}}
   {"execute":"device_del","arguments":{"id":"stg1"}}
   Neither QMP command returns an error.
5. Wait for the guest to finish booting, then log in and check the disks with lsblk. The new disk is found in the guest, although it is expected to be absent.
6. Execute the QMP command to unplug the disk again:
   {"execute":"device_del","arguments":{"id":"stg1"}}
   It returns an error:
   {"error": {"class": "GenericError", "desc": "Device stg1 is already in the process of unplug"}}
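For completeness, one way to drive the QMP socket from step 2 (tcp:5955) through steps 3-6 is with nc. This is only a sketch and assumes nc is available on the host; the sleeps stand in for the manual timing, and the QMP greeting is answered with qmp_capabilities before any other command:

  {
    echo '{"execute":"qmp_capabilities"}'
    sleep 3
    echo '{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","id":"stg1","drive":"drive_stg1","write-cache":"on","bus":"pcie-root-port-3"}}'
    sleep 1
    echo '{"execute":"device_del","arguments":{"id":"stg1"}}'
    sleep 1
  } | nc localhost 5955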
Can reproduce this bug with both virtio-net-pci and virtio-blk-pci devices on the latest rhel9.1.0 host with the test steps of Comment 2.

host version:
qemu-kvm-7.0.0-4.el9.x86_64
kernel-5.14.0-96.el9.x86_64
seabios-1.16.0-2.el9.x86_64

guest: rhel9.1.0

Test result:

hot-plug/unplug virtio-net-pci device in QMP:
{ "execute": "netdev_add","arguments": { "type": "tap", "id": "hostnet0" } }
{ "execute": "device_add","arguments": { "driver": "virtio-net-pci", "id": "net1", "bus": "pcie-root-port-5", "mac": "52:54:00:12:34:56", "netdev": "hostnet0" } }
{ "execute": "device_del", "arguments": { "id": "net1" } }
{"return": {}}
{"return": {}}
{"return": {}}
{ "execute": "device_del", "arguments": { "id": "net1" } }
{"error": {"class": "GenericError", "desc": "Device net1 is already in the process of unplug"}}

hot-plug/unplug virtio-blk-pci device in QMP:
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "pcie-root-port-4"}}
{"execute":"device_del","arguments":{"id":"stg1"}}
{"return": {}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"stg1"}}
{"error": {"class": "GenericError", "desc": "Device stg1 is already in the process of unplug"}}

Boot a guest with cmd:
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -sandbox on \
 -machine q35,memory-backend=mem-machine_mem \
 -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
 -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x2 \
 -m 16G \
 -object memory-backend-ram,size=16G,id=mem-machine_mem \
 -smp 6,maxcpus=6,cores=2,threads=1,dies=1,sockets=3 \
 -cpu Icelake-Server-noTSX,enforce \
 -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
 -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel9.1-seabios.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
 -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
 -device virtio-net-pci,mac=9a:5d:b0:f5:04:0f,id=idlokhzs,netdev=id4YbMcO,bus=pcie-root-port-3,addr=0x0 \
 -netdev tap,id=id4YbMcO,vhost=on \
 -vnc :0 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot menu=off,order=cdn,once=c,strict=off \
 -enable-kvm \
 -monitor stdio \
 -S \
 -qmp tcp:0:4444,server=on,wait=off \
 -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
 -blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/test.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_stg1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg1 \
 -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6
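As an additional check (not part of the original test run above), the device state can also be inspected from the HMP monitor started with -monitor stdio; after the failed second device_del the hot-plugged device is still listed, e.g.:

  (qemu) info pci       # the hot-plugged virtio device still shows up behind its root port
  (qemu) info qtree     # net1 / stg1 is still attached to the corresponding pcie-root-port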
IMHO the issue is that the guest simply ignores release requests for devices it never learned exist: the device was added and its removal requested before the guest had initialized all devices and hotplug support. The hotplug mechanism we use everywhere just emulates the hotplug hardware meant for physical machines, where nobody is expected to plug and remove a device in the first milliseconds of boot. If we really want to solve this class of issue once and for all, we should probably invent a new "cloud-plug" hotplug device designed for virtual machines. However, some mitigation might be possible in some cases:
- the guest OS should acknowledge release requests for devices it never initialized (guest kernel modification)
- the guest kernel (requested by the init system?) should do another PCI rescan to pick up devices missed in the blind spot; the blind spot is between the PCI scan and the hotplug initialization (see the sketch after this comment)
The feature expected from the cloud-plug device: as long as the guest OS has not booted (yet), it simply allows devices to be removed, and the virtualization layer knows this is safe. The guest OS would then be expected to claim a device from the cloud-plug in order to prevent its removal, so proper handshaking is needed. The challenge is what to do with guests that do not support the new "cloud-plug"; we would probably have to wait 3+/5+ years before daring to make it the default expectation.
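The PCI rescan mitigation can already be approximated by hand from inside a booted guest via the standard sysfs interface. A rough sketch, run as root in the guest (the PCI address below is made up; take the real one from lspci):

  # re-enumerate the PCI bus to pick up devices hot-plugged in the blind spot
  echo 1 > /sys/bus/pci/rescan
  # if a stuck device has to be dropped from the guest side, its function can be removed
  echo 1 > /sys/bus/pci/devices/0000:06:00.0/remove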
*** Bug 2080893 has been marked as a duplicate of this bug. ***
We have been discussing this regression upstream in the virtual OpenStack Project Team Gathering (vPTG). I just wanted to pass on the feedback that this is still a pain point for us, both upstream and in our downstream product, and hopefully it is something that can be addressed with a higher priority. Feel free to reach out to me as the User Advocate for the OpenStack compute team, or to our PM Erwan Gallen <egallen>, if you need additional information; this is still impacting our downstream product and affecting our upstream CI stability.
Fix posted upstream: https://www.mail-archive.com/qemu-devel@nongnu.org/msg952944.html
It's too late for merging into this release, but it should make it into the next one.

In a nutshell, it was a regression introduced in QEMU:
* v5.0
  * 'pc' machine with ACPI hotplug
  * 'q35' native PCIe hotplug
* v6.1
  * + 'q35' with ACPI hotplug (default)

Fixed in:
* 6.2: 'q35' native PCIe hotplug
* TBD (8.1?): 'q35' and 'pc' ACPI hotplug (once it's merged upstream we can backport it)

Need to look into the SHPC case, which seems to be broken as well.
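Until the ACPI hotplug fix is merged and backported, one possible workaround on 'q35' might be to switch the machine back to the native PCIe hotplug path, which is already fixed in 6.2. This is only a sketch; the property below is the ACPI hotplug switch added alongside the 6.1 change and should be verified against the local QEMU build:

  /usr/libexec/qemu-kvm -machine q35 \
      -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off \
      ...   # rest of the command line unchanged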
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.