This bug report is based on the upstream bug https://gitlab.com/libvirt/libvirt/-/issues/309, but I have updated the description and reproduction based on the discussion in the upstream report.

Description of problem:
If a disk is detached from a guest while the guest OS is still booting, the disk gets stuck. The detach appears to succeed from the virsh perspective, but the disk remains visible as attached both from the guest and from virsh. When the detach is retried, even after the guest OS has fully booted, it fails with "Failed to detach disk error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug".

This was observed in the OpenStack upstream CI with a cirros 0.5.2 guest OS, but it has now been reproduced without OpenStack with a more typical guest (Ubuntu 22.04). The OpenStack bug is being worked around by changing the test in the CI to wait until the guest is fully booted before trying to attach the volume.

Version-Release number of selected component (if applicable):
Host:
* Operating system: Debian sid
* Architecture: x86_64
* Kernel version: 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux
* libvirt version: 8.2.0-1
* Hypervisor and version: qemu-system-x86_64 1:7.0+dfsg-1
Guest:
* Operating system: Ubuntu 22.04 (cloud image)

How reproducible:
If the guest OS boot is slowed down, it is 100% reproducible.

Steps to Reproduce:
1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args to simulate a slow host.
2. Start the Ubuntu domain and connect to the serial console to watch it boot.
3. Wait until the first messages appear in the console. This is around T+50 sec from the virsh start.
4. From a second terminal, attach an additional disk to the guest. It succeeds.
5. Wait a second.
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML: the disk is still attached.
8. Run lsblk in the guest (after it is fully booted): the disk is still attached.
9. Check the virsh domblklist output: the disk is still attached.
10. Try to detach the disk again. It fails with "Failed to detach disk error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug".

Actual results:
The disk cannot be detached even after the guest OS is fully booted. Retrying the detach always fails.

Expected results:
Either the disk is eventually detached from the guest after it is fully booted, or the detach can be successfully retried via libvirt / virsh.

Additional info:
Please see the debug logs and detailed reproduction sequence in the upstream bug https://gitlab.com/libvirt/libvirt/-/issues/309
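For reference, a minimal sketch of the virsh side of the reproduction (steps 2-10), assuming a domain named ubuntu22 and a scratch image under /var/lib/libvirt/images/ (both names are hypothetical):

  qemu-img create -f qcow2 /var/lib/libvirt/images/extra.qcow2 1G
  virsh start ubuntu22
  virsh console ubuntu22                      # wait for the first boot messages (~T+50 sec)
  # from a second terminal, once console output starts:
  virsh attach-disk ubuntu22 /var/lib/libvirt/images/extra.qcow2 vdb --subdriver qcow2 --live
  sleep 1
  virsh detach-disk ubuntu22 vdb --live       # hangs a few seconds, then reports success
  virsh domblklist ubuntu22                   # the disk is still listed
  virsh detach-disk ubuntu22 vdb --live       # fails: "already in the process of unplug"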
This looks to me like something that needs work in QEMU, since libvirt is retrying the device detach as requested. The linked issue confirms this. Therefore I am moving this to the QEMU component for further triage.
Reproduced it on Red Hat Enterprise Linux release 9.0 (Plow)
5.14.0-70.13.1.el9_0.x86_64
qemu-kvm-6.2.0-11.el9_0.2.x86_64
seabios-bin-1.15.0-1.el9.noarch
edk2-ovmf-20220126gitbb1bba3d77-3.el9.noarch

Test steps:
1. Create an image file if needed:
   qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg1.qcow2 1G
2. Boot the VM:
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 8G \
    -object memory-backend-ram,size=8G,id=mem-machine_mem \
    -smp 2 \
    -cpu host,vmx,+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/stg1.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
    -device virtio-net-pci,mac=9a:e1:e5:87:89:d2,id=idhDtYbt,netdev=id15e8Je,bus=pcie-root-port-5,addr=0x0 \
    -netdev tap,id=id15e8Je,vhost=on \
    -vnc :5 \
    -monitor stdio \
    -qmp tcp:0:5955,server,nowait \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=7 \
    -chardev socket,id=charserial1,path=/var/tmp/run-serial.log,server=on,wait=off \
    -device isa-serial,chardev=charserial1,id=serial1
3. Sleep 3 seconds.
4. Execute QMP commands to hot-plug and then unplug the disk:
   {"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "pcie-root-port-3"}}
   {"execute":"device_del","arguments":{"id":"stg1"}}
   Neither QMP command returns an error.
5. Wait for the guest to finish booting, then log in and check the disks with lsblk. The new disk is found in the guest, although it is expected to be absent.
6. Execute the QMP command to unplug the disk again:
   {"execute":"device_del","arguments":{"id":"stg1"}}
   It returns an error:
   {"error": {"class": "GenericError", "desc": "Device stg1 is already in the process of unplug"}}
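For completeness, one way to drive the QMP socket from step 2 (tcp:5955) through steps 3-6 is with nc. This is only a sketch and assumes nc is available on the host; the sleeps stand in for the manual timing, and the QMP greeting is answered with qmp_capabilities before any other command:

  {
    echo '{"execute":"qmp_capabilities"}'
    sleep 3
    echo '{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","id":"stg1","drive":"drive_stg1","write-cache":"on","bus":"pcie-root-port-3"}}'
    sleep 1
    echo '{"execute":"device_del","arguments":{"id":"stg1"}}'
    sleep 1
  } | nc localhost 5955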
Can reproduce this bug with both virtio-net-pci and virtio-blk-pci devices on the latest rhel9.1.0 host with the test steps of Comment 2.

host version:
qemu-kvm-7.0.0-4.el9.x86_64
kernel-5.14.0-96.el9.x86_64
seabios-1.16.0-2.el9.x86_64

guest: rhel9.1.0

Test result:

hot-plug/unplug virtio-net-pci device in QMP:
{ "execute": "netdev_add","arguments": { "type": "tap", "id": "hostnet0" } }
{ "execute": "device_add","arguments": { "driver": "virtio-net-pci", "id": "net1", "bus": "pcie-root-port-5", "mac": "52:54:00:12:34:56", "netdev": "hostnet0" } }
{ "execute": "device_del", "arguments": { "id": "net1" } }
{"return": {}}
{"return": {}}
{"return": {}}
{ "execute": "device_del", "arguments": { "id": "net1" } }
{"error": {"class": "GenericError", "desc": "Device net1 is already in the process of unplug"}}

hot-plug/unplug virtio-blk-pci device in QMP:
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "pcie-root-port-4"}}
{"execute":"device_del","arguments":{"id":"stg1"}}
{"return": {}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"stg1"}}
{"error": {"class": "GenericError", "desc": "Device stg1 is already in the process of unplug"}}

Boot a guest with cmd:
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -sandbox on \
 -machine q35,memory-backend=mem-machine_mem \
 -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
 -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x2 \
 -m 16G \
 -object memory-backend-ram,size=16G,id=mem-machine_mem \
 -smp 6,maxcpus=6,cores=2,threads=1,dies=1,sockets=3 \
 -cpu Icelake-Server-noTSX,enforce \
 -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
 -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel9.1-seabios.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
 -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
 -device virtio-net-pci,mac=9a:5d:b0:f5:04:0f,id=idlokhzs,netdev=id4YbMcO,bus=pcie-root-port-3,addr=0x0 \
 -netdev tap,id=id4YbMcO,vhost=on \
 -vnc :0 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot menu=off,order=cdn,once=c,strict=off \
 -enable-kvm \
 -monitor stdio \
 -S \
 -qmp tcp:0:4444,server=on,wait=off \
 -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
 -blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/test.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_stg1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg1 \
 -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6
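As an additional check (not part of the original test run above), the device state can also be inspected from the HMP monitor started with -monitor stdio; after the failed second device_del the hot-plugged device is still listed, e.g.:

  (qemu) info pci       # the hot-plugged virtio device still shows up behind its root port
  (qemu) info qtree     # net1 / stg1 is still attached to the corresponding pcie-root-port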
IMHO the issue is that the guest simply ignores release requests for devices it never learned exist: the device was added and its removal requested before the guest had initialized all devices and hotplug support. The hotplug mechanism we use everywhere just emulates the hotplug hardware meant for physical machines, where nobody is expected to plug and remove a device in the first milliseconds of boot. If we really want to solve this class of issue once and for all, we should probably invent a new "cloud-plug" hotplug device designed for virtual machines. However, some mitigation might be possible in some cases:
- the guest OS should acknowledge release requests for devices it never initialized (guest kernel modification)
- the guest kernel (requested by the init system?) should do another PCI rescan to pick up devices missed in the blind spot; the blind spot is between the PCI scan and the hotplug initialization (see the sketch after this comment)
The feature expected from the cloud-plug device: as long as the guest OS has not booted (yet), it simply allows devices to be removed, and the virtualization layer knows this is safe. The guest OS would then be expected to claim a device from the cloud-plug in order to prevent its removal, so proper handshaking is needed. The challenge is what to do with guests that do not support the new "cloud-plug"; we would probably have to wait 3+/5+ years before daring to make it the default expectation.
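The PCI rescan mitigation can already be approximated by hand from inside a booted guest via the standard sysfs interface. A rough sketch, run as root in the guest (the PCI address below is made up; take the real one from lspci):

  # re-enumerate the PCI bus to pick up devices hot-plugged in the blind spot
  echo 1 > /sys/bus/pci/rescan
  # if a stuck device has to be dropped from the guest side, its function can be removed
  echo 1 > /sys/bus/pci/devices/0000:06:00.0/remove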
*** Bug 2080893 has been marked as a duplicate of this bug. ***
We have been discussing this regression upstream in the virtual OpenStack Project Team Gathering (vPTG). I just wanted to pass on the feedback that this is still a pain point for us, both upstream and in our downstream product, and hopefully it is something that can be addressed with a higher priority. Feel free to reach out to me as the User Advocate for the OpenStack compute team, or to our PM Erwan Gallen <egallen>, if you need additional information; this is still impacting our downstream product and affecting our upstream CI stability.
Fix posted upstream: https://www.mail-archive.com/qemu-devel@nongnu.org/msg952944.html
It's too late for merging into this release, but it should make it into the next one.

In a nutshell, it was a regression introduced in QEMU:
* v5.0
  * 'pc' machine with ACPI hotplug
  * 'q35' native PCIe hotplug
* v6.1
  * + 'q35' with ACPI hotplug (default)

Fixed in:
* 6.2: 'q35' native PCIe hotplug
* TBD (8.1?): 'q35' and 'pc' ACPI hotplug (once it's merged upstream we can backport it)

Need to look into the SHPC case, which seems to be broken as well.
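Until the ACPI hotplug fix is merged and backported, one possible workaround on 'q35' might be to switch the machine back to the native PCIe hotplug path, which is already fixed in 6.2. This is only a sketch; the property below is the ACPI hotplug switch added alongside the 6.1 change and should be verified against the local QEMU build:

  /usr/libexec/qemu-kvm -machine q35 \
      -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off \
      ...   # rest of the command line unchanged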
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.