Description of problem:
libvirtError is raised when detaching a ceph disk in an OSP environment.

Version-Release number of selected component (if applicable):
OSP16.1:
  openstack-nova-compute-20.3.1-0.20200626213434.38ee1f3.el8ost.noarch
RHEL-AV 8.3.0:
  libvirt-daemon-kvm-6.6.0-4.module+el8.3.0+7883+3d717aa8.x86_64
  qemu-kvm-core-5.1.0-6.module+el8.3.0+8041+42ff16b8.x86_64
  kernel: 4.18.0-236.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP16.1 with ceph storage, update the nova_libvirt and nova_compute images to RHEL-AV 8.3.0, re-deploy OSP with the updated images, and update the kernel to RHEL 8.3.0 on the compute node.
2. Create an image, a flavor, and a network, create a VM from the image, and check the VM XML:
guest.xml
----------------------------------------------------------------
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='vms/3eb97e06-a772-4071-8fbe-4bdaad4294fe_disk' index='1'>
        <host name='172.17.3.57' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <iotune>
        <read_bytes_sec>1024000</read_bytes_sec>
        <write_bytes_sec>1024000</write_bytes_sec>
        <read_iops_sec>1000</read_iops_sec>
        <write_iops_sec>1000</write_iops_sec>
      </iotune>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>
----------------------------------------------------------------
3. Create another 5 volumes in OSP:
(overcloud) [stack@undercloud-0 ~]$ openstack volume list
+--------------------------------------+-------------------+-----------+------+----------------------------------------------+
| ID                                   | Name              | Status    | Size | Attached to                                  |
+--------------------------------------+-------------------+-----------+------+----------------------------------------------+
| a267ab71-7b44-4614-ad49-aa6cf00070c6 | asb-hotplugvol-5  | available |   10 |                                              |
| 38fed49b-35ea-4d85-8d57-c3af858514d4 | asb-hotplugvol-4  | available |   10 |                                              |
| 04895b6a-4135-4266-897a-5c19970b419f | asb-hotplugvol-3  | available |   10 |                                              |
| 3218f110-8fad-4c0e-a66d-06508959ceb0 | asb-hotplugvol-2  | available |   10 |                                              |
| 70ffe8d7-3a2e-4e4c-a52e-257b11bc6d97 | asb-hotplugvol-1  | available |   10 |                                              |
| 53a1e3f2-1cda-47a0-9eab-72ca14293c32 | asb-8.3-qcow2-vol | in-use    |   10 | Attached to asb-vm-8.3-qcow2-vol on /dev/vda |
+--------------------------------------+-------------------+-----------+------+----------------------------------------------+
4. Hotplug the 5 volumes to the VM:
(overcloud) [stack@undercloud-0 ~]$ for vol_id in $(openstack volume list| grep available| awk '{print $2}'); do sleep 30; nova volume-attach asb-vm-8.3-qcow2-img $vol_id; done
(overcloud) [stack@undercloud-0 ~]$ openstack volume list
+--------------------------------------+-------------------+--------+------+----------------------------------------------+
| ID                                   | Name              | Status | Size | Attached to                                  |
+--------------------------------------+-------------------+--------+------+----------------------------------------------+
| a267ab71-7b44-4614-ad49-aa6cf00070c6 | asb-hotplugvol-5  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdb |
| 38fed49b-35ea-4d85-8d57-c3af858514d4 | asb-hotplugvol-4  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdc |
| 04895b6a-4135-4266-897a-5c19970b419f | asb-hotplugvol-3  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdd |
| 3218f110-8fad-4c0e-a66d-06508959ceb0 | asb-hotplugvol-2  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vde |
| 70ffe8d7-3a2e-4e4c-a52e-257b11bc6d97 | asb-hotplugvol-1  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdf |
| 53a1e3f2-1cda-47a0-9eab-72ca14293c32 | asb-8.3-qcow2-vol | in-use |   10 | Attached to asb-vm-8.3-qcow2-vol on /dev/vda |
+--------------------------------------+-------------------+--------+------+----------------------------------------------+
5. Log in to the guest and check that there are 6 disks: /dev/vda to /dev/vdf.
6. Log in to the nova_libvirt container and check that there are 6 disks:
--------------------------------------------------------------------------
()[root@compute-1 /]# virsh dumpxml instance-00000003|grep "<disk" -A 5
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='vms/3eb97e06-a772-4071-8fbe-4bdaad4294fe_disk' index='1'>
--
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-a267ab71-7b44-4614-ad49-aa6cf00070c6' index='2'>
--
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-38fed49b-35ea-4d85-8d57-c3af858514d4' index='3'>
--
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-04895b6a-4135-4266-897a-5c19970b419f' index='4'>
--
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-3218f110-8fad-4c0e-a66d-06508959ceb0' index='5'>
--
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='openstack'>
        <secret type='ceph' uuid='98a93f05-f5c9-4b6c-974e-4c901dc614bf'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-70ffe8d7-3a2e-4e4c-a52e-257b11bc6d97' index='6'>
----------------------------------------------------------------------
7. Detach the 5 disks from the guest in OSP:
(overcloud) [stack@undercloud-0 ~]$ for vol_id in $(openstack volume list| grep hotplugvol| awk '{print $2}'); do sleep 30; nova volume-detach asb-vm-8.3-qcow2-img $vol_id; done
8. Check the volumes' status in OSP; they are still in-use:
(overcloud) [stack@undercloud-0 ~]$ openstack volume list
+--------------------------------------+-------------------+--------+------+----------------------------------------------+
| ID                                   | Name              | Status | Size | Attached to                                  |
+--------------------------------------+-------------------+--------+------+----------------------------------------------+
| a267ab71-7b44-4614-ad49-aa6cf00070c6 | asb-hotplugvol-5  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdb |
| 38fed49b-35ea-4d85-8d57-c3af858514d4 | asb-hotplugvol-4  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdc |
| 04895b6a-4135-4266-897a-5c19970b419f | asb-hotplugvol-3  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdd |
| 3218f110-8fad-4c0e-a66d-06508959ceb0 | asb-hotplugvol-2  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vde |
| 70ffe8d7-3a2e-4e4c-a52e-257b11bc6d97 | asb-hotplugvol-1  | in-use |   10 | Attached to asb-vm-8.3-qcow2-img on /dev/vdf |
| 53a1e3f2-1cda-47a0-9eab-72ca14293c32 | asb-8.3-qcow2-vol | in-use |   10 | Attached to asb-vm-8.3-qcow2-vol on /dev/vda |
+--------------------------------------+-------------------+--------+------+----------------------------------------------+
9. Log in to the guest and check that the disks are detached.
10. Check the libvirt XML; the disks are detached.
11. Check the libvirtd log; it has the error below, but the disks are detached in the end.
---------------------------------------------------------------------
2020-09-14 07:29:25.032+0000: 4501: error : qemuMonitorJSONCheckErrorFull:416 : internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug
----------------------------------------------------------------------
12. Check the nova-compute log; it has a similar error, and the disks are not detached in OSP.
----------------------------------------------------------------------
2020-09-14 00:38:08.340 8 ERROR nova.virt.libvirt.driver [req-86dd3be3-8395-4551-85ba-55fe631c8613 56324868f60749f0bc1f1b25fa90d92d 593476b1fecf432795d1cec96e5954f9 - default default] [instance: 3eb97e06-a772-4071-8fbe-4bdaad4294fe] detaching network adapter failed.: libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device net0 is already in the process of unplug
-----------------------------------------------------------------------

Actual results:
In steps 7-8, detaching the disks fails.

Expected results:
In steps 7-8, the disks are detached in OSP successfully.

Additional info:
- libvirtd.log, nova-compute.log, the guest XML, and the image/flavor metadata are attached.
- It seems the libvirt error message causes the OSP disk detach to fail, but this error message does not block the libvirt detach process.
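For re-checking steps 9-10, here is a minimal sketch (assuming python3-libvirt is available where libvirtd runs; the URI and the domain name "instance-00000003" are taken from step 6 and may differ) that lists which target devices libvirt still reports as attached, independent of what `openstack volume list` shows:
----------------------------------------------------------------------
# Sketch only: ask libvirt which disks are still attached, regardless of the
# volume status reported by OSP. URI and domain name are placeholders.
import libvirt
import xml.etree.ElementTree as ET

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000003")

root = ET.fromstring(dom.XMLDesc(0))
targets = [t.get("dev") for t in root.findall("./devices/disk/target")]
print("disks libvirt still reports attached:", targets)  # only ['vda'] once step 10 holds
----------------------------------------------------------------------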
Created attachment 1714744 [details]
logs, XML, and image/flavor metadata
Additional info:
When detaching volumes from the dashboard, there is no error in libvirtd.log or nova-compute.log and the volumes are detached successfully.
- Log in to the dashboard, click Project -> Compute -> Instances, select the VM "asb-vm-8.3-qcow2-img" -> Detach Volume, select the volumes one by one -> click "Detach Volume".
I don't think this is a libvirt bug, for the following reasons:
1. The error "unable to execute QEMU command 'device_del': Device XXXX is already in the process of unplug" only happens when another hot-unplug of the same device is issued right after a hot-unplug. Did you detach a device before the previous detachment of the same device finished? If so, is that a valid scenario? If not, it is nova's fault to detach it twice.
2. The detach device API in libvirt works asynchronously (see https://gitlab.com/libvirt/libvirt/-/blob/master/src/libvirt-domain.c#L8297). That means it is the upper layer's responsibility to check whether the detach has finished (via libvirt's detach event).

I can reproduce the error with a python script:

In [1]: import libvirt

In [5]: conn=libvirt.open("qemu+ssh://root@XXXX/system")

In [6]: dom=conn.lookupByName("test")

In [10]: xml=open("/tmp/test.xml").read()

In [11]: dom.detachDeviceFlags(xml)
Out[11]: 0

In [12]: dom.detachDeviceFlags(xml)
libvirt: QEMU Driver error : internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug
---------------------------------------------------------------------------
libvirtError                              Traceback (most recent call last)
<ipython-input-12-258037eff159> in <module>
----> 1 dom.detachDeviceFlags(xml)

/usr/lib64/python3.6/site-packages/libvirt.py in detachDeviceFlags(self, xml, flags)
   1406              may lead to unexpected results. """
   1407         ret = libvirtmod.virDomainDetachDeviceFlags(self._o, xml, flags)
-> 1408         if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self)
   1409         return ret
   1410

libvirtError: internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug

XML:
In [13]: cat /tmp/test.xml
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/tmp/test1' index='3'/>
  <backingStore/>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</disk>

BTW, the error is only about device detach. It has nothing to do with ceph.
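For reference, a minimal sketch of the event-based flow described above: issue one detach, then wait for libvirt's device-removed event instead of calling detachDeviceFlags() a second time. The URI, domain name, device XML path, and alias are the placeholders from the reproducer above.
----------------------------------------------------------------------
# Sketch only: detach once, then wait for VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
# instead of re-issuing detachDeviceFlags(). URI/domain/alias are placeholders.
import threading
import libvirt

libvirt.virEventRegisterDefaultImpl()            # event loop must exist before open()

def _event_loop():
    while True:
        libvirt.virEventRunDefaultImpl()

threading.Thread(target=_event_loop, daemon=True).start()

removed = threading.Event()

def on_device_removed(conn, dom, dev, opaque):
    if dev == opaque:                            # dev is the device alias, e.g. virtio-disk1
        removed.set()

conn = libvirt.open("qemu+ssh://root@XXXX/system")
dom = conn.lookupByName("test")
conn.domainEventRegisterAny(dom, libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED,
                            on_device_removed, "virtio-disk1")

dom.detachDeviceFlags(open("/tmp/test.xml").read())
if not removed.wait(timeout=30):
    print("guest has not released virtio-disk1 yet; only now consider retrying")
----------------------------------------------------------------------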
Hi, Han Han

Yes, the nova command line caused this error: we hit it when detaching a disk for the first time in the OSP16.1 env via the nova command line, while the same operation in the OSP16.1 dashboard does not hit it. I filed the bug against libvirt first because I wanted to see whether the libvirt message below should really be an "error" or a message of some other level. It seems this libvirt error message blocks the OSP disk detach process, but it does not block the libvirt detach process. If libvirt would like to keep this message as an "error", I'll change the component to "nova-compute".
-----------------------------------------------------------------------------------------------------------------------------------------------
libvirt: QEMU Driver error : internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug
------------------------------------------------------------------------------------------------------------------------------------------------
Hit the same error on an OSP16.1 env (cinder with nfs storage) when hot-unplugging disks/interfaces.

Version-Release number of selected component (if applicable):
OSP16.1:
- openstack-nova-compute-20.3.1-0.20200626213434.38ee1f3.el8ost.noarch
RHEL-AV 8.3.0:
- libvirt-daemon-kvm-6.6.0-5.module+el8.3.0+8092+f9e72d7e.x86_64
- qemu-kvm-core-5.1.0-7.module+el8.3.0+8099+dba2fe3e.x86_64
- kernel: 4.18.0-237.el8.x86_64

Steps are as in the "Description" part. Logs are in the file libvirtd-nova-compute-log.tgz.
Created attachment 1715202 [details] libvirtd.log, nova-compute.log
Removing ceph from the summary since the issue is not ceph-specific. See comment 3.
This looks like either a very unfortunate race condition or something that gets stuck in QEMU. The timing is as follows:

First attempt:

2020-09-14 07:29:19.999+0000: 7966: info : qemuMonitorIOWrite:433 : QEMU_MONITOR_IO_WRITE: mon=0x7fe44c03a5b0 buf={"execute":"device_del","arguments":{"id":"virtio-disk1"},"id":"libvirt-399"} len=79 ret=79 errno=0
2020-09-14 07:29:20.002+0000: 7966: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fe44c03a5b0 reply={"return": {}, "id": "libvirt-399"}

(libvirt waits 5 seconds for the DEVICE_DELETED event until it gives up. The qom-list is done after the timeout.)

2020-09-14 07:29:25.003+0000: 7966: info : qemuMonitorIOWrite:433 : QEMU_MONITOR_IO_WRITE: mon=0x7fe44c03a5b0 buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-400"} len=86 ret=86 errno=0
2020-09-14 07:29:25.007+0000: 7966: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fe44c03a5b0 reply={"return": [{"name": "type", "type": "string"}, {"name": "watchdog0", "type": "child<i6300esb>"}, {"name": "pci.9", "type": "child<pcie-root-port>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "input0", "type": "child<usb-tablet>"}, {"name": "pci.8", "type": "child<pcie-root-port>"}, {"name": "pci.7", "type": "child<pcie-root-port>"}, {"name": "pci.6", "type": "child<pcie-root-port>"}, {"name": "pci.5", "type": "child<pcie-root-port>"}, {"name": "pci.4", "type": "child<pcie-root-port>"}, {"name": "pci.3", "type": "child<pcie-root-port>"}, {"name": "pci.2", "type": "child<pcie-root-port>"}, {"name": "pci.1", "type": "child<pcie-root-port>"}, {"name": "pci.13", "type": "child<pcie-root-port>"}, {"name": "virtio-disk3", "type": "child<virtio-blk-pci>"}, {"name": "virtio-disk5", "type": "child<virtio-blk-pci>"}, {"name": "virtio-disk4", "type": "child<virtio-blk-pci>"}, {"name": "rng0", "type": "child<virtio-rng-pci>"}, {"name": "virtio-disk2", "type": "child<virtio-blk-pci>"}, {"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "virtio-disk0", "type": "child<virtio-blk-pci>"}, {"name": "virtio-disk1", "type": "child<virtio-blk-pci>"}, {"name": "pci.18", "type": "child<pcie-pci-bridge>"}, {"name": "pci.17", "type": "child<pcie-root-port>"}, {"name": "pci.16", "type": "child<pcie-root-port>"}, {"name": "pci.15", "type": "child<pcie-root-port>"}, {"name": "pci.14", "type": "child<pcie-root-port>"}, {"name": "net0", "type": "child<virtio-net-pci>"}, {"name": "pci.12", "type": "child<pcie-root-port>"}, {"name": "pci.11", "type": "child<pcie-root-port>"}, {"name": "pci.10", "type": "child<pcie-root-port>"}, {"name": "usb", "type": "child<qemu-xhci>"}, {"name": "channel0", "type": "child<virtserialport>"}, {"name": "serial0", "type": "child<isa-serial>"}, {"name": "scsi0", "type": "child<virtio-scsi-pci>"}, {"name": "video0", "type": "child<cirrus-vga>"}], "id": "libvirt-400"}

OpenStack then retries right away a second time.
We re-issue the command as device_del requests may be ignored:

2020-09-14 07:29:25.029+0000: 7966: info : qemuMonitorIOWrite:433 : QEMU_MONITOR_IO_WRITE: mon=0x7fe44c03a5b0 buf={"execute":"device_del","arguments":{"id":"virtio-disk1"},"id":"libvirt-401"} len=79 ret=79 errno=0
2020-09-14 07:29:25.032+0000: 7966: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"id": "libvirt-401", "error": {"class": "GenericError", "desc": "Device virtio-disk1 is already in the process of unplug"}}]
2020-09-14 07:29:25.032+0000: 7966: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fe44c03a5b0 reply={"id": "libvirt-401", "error": {"class": "GenericError", "desc": "Device virtio-disk1 is already in the process of unplug"}}

QEMU then reports this failure. Moving to QEMU for further investigation.
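The same double device_del exchange can be replayed by hand through libvirt's QEMU monitor passthrough; a rough sketch follows (debug-only, unsupported command channel; the domain name and device alias are placeholders taken from the logs above):
----------------------------------------------------------------------
# Sketch only: reproduce the monitor exchange above by hand. This uses the
# unsupported qemu-monitor-command passthrough and is for debugging only.
import libvirt
import libvirt_qemu

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000003")

cmd = '{"execute": "device_del", "arguments": {"id": "virtio-disk1"}}'

# First device_del: QEMU replies {"return": {}} and only *requests* the unplug.
print(libvirt_qemu.qemuMonitorCommand(dom, cmd, 0))

# Second device_del before the guest acknowledges the unplug: QEMU rejects it
# with "Device virtio-disk1 is already in the process of unplug".
try:
    print(libvirt_qemu.qemuMonitorCommand(dom, cmd, 0))
except libvirt.libvirtError as exc:
    print("second device_del rejected:", exc)
----------------------------------------------------------------------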
Created attachment 1758814 [details] detach_interface_log.tgz
Short story: looks like there is a QEMU bug to fix. Long story follows.

The "already in the process of unplug" error is set in qmp_device_del() when the device exists and has pending_deleted_event set. Goes back to upstream commit cce8944cc9 "qdev-monitor: Forbid repeated device_del".

But what does pending_deleted_event mean?

Member pending_deleted_event was introduced in upstream commit 352e8da743 "qdev: correctly send DEVICE_DELETED for recursively-deleted devices". Its purpose then was to mark devices that require sending a DEVICE_DELETED event on unparent. Straightforward.

Commit c000a9bd06 "pci: mark device having guest unplug request pending" and commit 9711cd0dfc "net/virtio: add failover support" reused the member.

The former commit sets ->pending_deleted_event in HotplugHandlerClass unplug_request method pcie_cap_slot_unplug_request_cb() of device TYPE_PCIE_SLOT, and clears it in pcie_unplug_device(), but only if ->partially_hotplugged. The commit message explains:

    Set pending_deleted_event in DeviceState for failover primary devices
    that were successfully unplugged by the Guest OS.

I'm afraid I'm not getting it. The commit message says "for failover primary devices", but I can't see that in the code. Why do we want to set it in the first place, and why do we want to clear it again? How do the two new assignments work together with the existing two assignments and the existing guard around qapi_event_send_device_deleted()?

The latter commit adds another test in new function primary_unplug_pending():

    return n->primary_dev ? n->primary_dev->pending_deleted_event : false;

I asked the people involved in these commits for advice, and Paolo Bonzini responded right away. He thinks the reuse of pending_deleted_event in these two commits is wrong, and suggested ways to fix it. Awesome, thanks, Paolo!

I'll find the right person to fix it.
And here is my comment for posterity:

pending_deleted_event was not the right flag to use in any of these commits. The purpose and meaning of ->pending_deleted_event should be the original one, and everything else is a bug.

The simplest fix would be to add a new bool unplug_requested and set it in all the places that were touched by Jens and Julia's patches. But there would be many more places where the flag should be set, and yet other places where the flag is cleared (for example the aforementioned case of cold reset). Keeping the flag in sync everywhere between the device and the HotplugHandler would be unmaintainable, so I'm not suggesting this approach.

A better possibility is to add a new *bool-returning function pointer* unplug_requested to HotplugHandlerClass, and then qmp_device_del would query the device via a function like

    bool device_unplug_requested(DeviceState *dev)
    {
        HotplugHandler *hp = qdev_get_hotplug_handler(dev);
        return dev->pending_deleted_event ||
               (hp && hotplug_handler_unplug_requested(hp, dev));
    }

and presumably the code from commit 9711cd0dfc would be adjusted to use it as well. The implementations wouldn't be hard to write using all the unplug_request_cb functions as a guideline. For example, the PCIe implementation would check the attention button state. It's still all but trivial to test the implementations, but my hunch is that it'd fix more bugs than it would introduce.
I've reopened the OSP bug; there was no reference to this underlying QEMU issue in the original report, and the reporter didn't reopen the bug when updating it after it had already been closed out due to INSUFFICIENT_DATA.

Apologies, but I couldn't decipher from c#14 and c#15 what the plan actually is at a high level within QEMU to handle this. Will QEMU ignore repeated calls to device_del or continue to raise an error to libvirt?

I ask as that obviously changes how we resolve this on the OpenStack Nova side. My initial feeling is that we can ignore these errors and allow our pretty basic retry logic to cycle again if the device is still present next time around.

FWIW we are planning on replacing the current dumb retry detach logic within OpenStack Nova with a libvirt events based flow when detaching devices, but that isn't going to be backported to OSP 16.2 as used in this bug.
(In reply to Lee Yarwood from comment #16)
[...]
> Apologies, but I couldn't decipher from c#14 and c#15 what the plan actually
> is at a high level within QEMU to handle this. Will QEMU ignore repeated
> calls to device_del or continue to raise an error to libvirt?

While commit cce8944cc9efa "qdev-monitor: Forbid repeated device_del" abuses pending_deleted_event (which was already abused by commit 9711cd0dfc "net/virtio: add failover support"), as pointed out in comment 14 and comment 15, which suggest a way to stop that abuse, fixing that won't change the way unplug behaves now: QEMU will continue to throw the error.

A duplicate device_del shouldn't cancel the unplug in the guest, and I'd say users should be notified that the device is being (or might be) unplugged when the guest decides to do it (the unplug time depends on the guest's drivers, assuming it wishes to unplug at all, i.e. the time range is [soon, never)).

> I ask as that obviously changes how we resolve this on the OpenStack Nova
> side. My initial feeling is that we can ignore these errors and allow our
> pretty basic retry logic to cycle again if the device is still present next
> time around.
>
> FWIW we are planning on replacing the current dumb retry detach logic within
> OpenStack Nova with a libvirt events based flow when detaching devices, but
> that isn't going to be backported to OSP 16.2 as used in this bug.

It should be safe to ignore this particular error and repeat device_del. Will it work for your use case?
(In reply to Igor Mammedov from comment #17)
> (In reply to Lee Yarwood from comment #16)
> [...]
> > Apologies, but I couldn't decipher from c#14 and c#15 what the plan actually
> > is at a high level within QEMU to handle this. Will QEMU ignore repeated
> > calls to device_del or continue to raise an error to libvirt?
>
> While commit cce8944cc9efa "qdev-monitor: Forbid repeated device_del" abuses
> pending_deleted_event (which was already abused by commit 9711cd0dfc
> "net/virtio: add failover support"), as pointed out in comment 14 and
> comment 15, which suggest a way to stop that abuse, fixing that won't change
> the way unplug behaves now: QEMU will continue to throw the error.
>
> A duplicate device_del shouldn't cancel the unplug in the guest, and I'd say
> users should be notified that the device is being (or might be) unplugged
> when the guest decides to do it (the unplug time depends on the guest's
> drivers, assuming it wishes to unplug at all, i.e. the time range is
> [soon, never)).

ACK, thanks, understood.

> > I ask as that obviously changes how we resolve this on the OpenStack Nova
> > side. My initial feeling is that we can ignore these errors and allow our
> > pretty basic retry logic to cycle again if the device is still present next
> > time around.
> >
> > FWIW we are planning on replacing the current dumb retry detach logic within
> > OpenStack Nova with a libvirt events based flow when detaching devices, but
> > that isn't going to be backported to OSP 16.2 as used in this bug.
>
> It should be safe to ignore this particular error and repeat device_del.
> Will it work for your use case?

Yes, that's the way we've gone for now and will backport to 16.2:

libvirt: Ignore device already in the process of unplug errors
https://review.opendev.org/c/openstack/nova/+/785682

This will then be replaced by the following libvirt event based approach shortly for future versions:

Replace blind retry with libvirt event waiting in detach
https://review.opendev.org/c/openstack/nova/+/770246
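For posterity, a rough sketch of that "ignore the error and let the retry loop come around again" idea (not the actual Nova patch; the helper name, alias handling, and timings are made up for illustration):
----------------------------------------------------------------------
# Sketch only, not the actual Nova change: treat "already in the process of
# unplug" as benign and let a basic retry loop poll until the device is gone.
import time
import libvirt

ALREADY_UNPLUGGING = "already in the process of unplug"

def detach_with_retry(dom, device_xml, alias, attempts=8, interval=5):
    for _ in range(attempts):
        try:
            dom.detachDeviceFlags(device_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
        except libvirt.libvirtError as exc:
            if ALREADY_UNPLUGGING not in str(exc):
                raise                       # a real failure, propagate it
            # otherwise the guest is still processing the unplug; keep waiting
        time.sleep(interval)
        if alias not in dom.XMLDesc(0):     # crude check that the alias is gone
            return True
    return False
----------------------------------------------------------------------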
Closing this BZ, as QEMU works the way it is expected to in this case.

PS: comment 15 is an unrelated issue and should be fixed upstream.