Bug 805461 - the windows guest hang and can't resume the VM after hot-unplug the data disk.
Summary: the windows guest hang and can't resume the VM after hot-unplug the data disk.
Keywords:
Status: CLOSED DUPLICATE of bug 751700
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Vadim Rozenfeld
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-21 11:19 UTC by Sibiao Luo
Modified: 2012-07-31 07:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-31 07:17:28 UTC
Target Upstream Version:
Embargoed:



Description Sibiao Luo 2012-03-21 11:19:50 UTC
Description of problem:
Hot-plug a virtio data disk to the guest, then initialize and format it in the guest. After that, hot-unplug the data disk (remove the drive first, then remove the device). The guest hangs and cannot be resumed any more; QEMU's reaction is unacceptable.

Version-Release number of selected component (if applicable):
host info:
2.6.32-251.el6.x86_64
qemu-kvm-0.12.1.2-2.249.el6.x86_64
Seabios:seabios-0.6.1.2-12.el6
virtio-win:virtio-win-prewhql-0.1-24 
guest info:
win7sp1-64

How reproducible:
100%

Steps to Reproduce:
1.boot a guest.
# /usr/libexec/qemu-kvm -M rhel6.2.0 -cpu Penryn -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 -usbdevice tablet -name win7-sp1-64 -uuid `uuidgen` -drive file=win7sp1-virtio-64.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=05:1a:4a:32:0b:26,bus=pci.0,addr=0x3,bootindex=2 -spice disable-ticketing,port=5931 -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -monitor stdio
2.hot-plug a virtio data disk to the guest.
(qemu) __com.redhat_drive_add file=/home/my_qcow2_disk.qcow2,format=qcow2,id=drive-virtio-disk1,cache=none,werror=stop,rerror=stop
(qemu) device_add virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=my_disk
(qemu) info pci
...
Bus 0, device 7, function 0:
SCSI controller: PCI device 1af4:1001
IRQ 0.
BAR0: I/O at 0xffc0 [0xffff].
BAR1: 32 bit memory at 0xfebff000 [0xfebfffff].
id "my_disk"
3.initialize and format the virtio data disk in the guest.
Computer ----> Manage ----> Computer Management ----> Disk Management
4.hot-unplug the virtio data disk.
(qemu) __com.redhat_drive_del drive-virtio-disk1
(qemu) info pci
...
Bus 0, device 7, function 0:
SCSI controller: PCI device 1af4:1001
IRQ 0.
BAR0: I/O at 0xffc0 [0xffff].
BAR1: 32 bit memory at 0xfebff000 [0xfebfffff].
id "my_disk"
(qemu) device_del my_disk
(qemu) block I/O error in device '': Input/output error (5)
handle_dev_stop: stop
5.resume the guest.
 
Actual results:
After step 5, the guest still hangs and the VM cannot be resumed any more.
(qemu) cont
(qemu) block I/O error in device '': Input/output error (5)
handle_dev_stop: stop 

Expected results:
We should be able to resume the VM with "cont" and continue using it.

Additional info:
If I remove the device directly using "(qemu) device_del my_disk" in step 4, it succeeds and the VM does not hang. I think removing the drive first in step 4 is indeed wrong, but QEMU's reaction to it is still bad; we should be able to resume the VM.

Comment 2 Kevin Wolf 2012-03-21 11:35:24 UTC
Windows seems to be trying to unmount the drive when doing the device_del, but as the qemu block device was removed already by drive_del, trying to write to it results in an I/O error. With werror=stop this means that the VM is stopped. You can't ever get the VM back to run because 'cont' resubmits the request, which will obviously fail again.

Possible solution for this specific case would be to reset or ignore werror when the BlockDriverState is closed. However, there seems to be a more general problem with non-recoverable I/O errors.
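
To illustrate the loop described in this comment, here is a toy Python model of the stop-and-resubmit behavior. It is only a sketch under stated assumptions: ToyDrive, ToyVM and the ignore_errors_when_closed flag are hypothetical names invented for this example and are not QEMU code or QEMU APIs. The model assumes that a write to a closed backend always fails with EIO, that a failed write stops the VM (as with werror=stop), and that 'cont' resubmits the failed request.

```python
import errno


class ToyDrive:
    """Hypothetical stand-in for a block backend (not QEMU's BlockDriverState)."""

    def __init__(self):
        self.closed = False

    def write(self, data):
        if self.closed:
            # A removed backend can never complete the request.
            raise OSError(errno.EIO, "Input/output error")
        return len(data)


class ToyVM:
    """Toy model of werror=stop: stop on a write error, retry on 'cont'."""

    def __init__(self, drive, ignore_errors_when_closed=False):
        self.drive = drive
        self.running = True
        self.pending = None  # failed request to resubmit on 'cont'
        self.ignore_errors_when_closed = ignore_errors_when_closed

    def guest_write(self, data):
        try:
            self.drive.write(data)
            self.pending = None
        except OSError:
            if self.ignore_errors_when_closed and self.drive.closed:
                # Mitigation suggested above: ignore the error policy
                # once the backend is closed, and drop the request.
                self.pending = None
                return
            self.pending = data
            self.running = False  # werror=stop

    def cont(self):
        """Resume the VM and resubmit the failed request, if any."""
        self.running = True
        if self.pending is not None:
            self.guest_write(self.pending)  # fails again if drive is closed
        return self.running
```

In this model, once the drive is closed and a write has failed, every call to cont() returns False, which mirrors the repeated "handle_dev_stop: stop" output in the report. With ignore_errors_when_closed=True the failed write is dropped and the toy VM keeps running. Whether dropping the request is acceptable in the real block layer is a separate question that this sketch does not answer.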

Comment 3 Markus Armbruster 2012-03-21 12:13:52 UTC
Removing the backend (__com.redhat_drive_del) before the device (device_del) is nasty.  The physical equivalent would be to first hit the disk with a hammer, then push the unplug button.  The unplug button asks the OS nicely to give up the disk, but since you first hammered it dead, the OS won't be happy.

Recommended usage is to unplug first.  Only if the unplug doesn't succeed within a reasonable time (most likely because the guest OS doesn't cooperate) should you switch to the __com.redhat_drive_del hammer.

Regardless, we probably need to handle persistent block I/O errors more gracefully.

Comment 4 Ronen Hod 2012-03-22 06:24:20 UTC
How about adding device_del at the beginning of drive_del?

Comment 5 Sibiao Luo 2012-03-22 06:57:14 UTC
(In reply to comment #4)
> How about adding device_del at the beginning of drive_del?

Hi rhod,

  If I remove the device directly using "(qemu) device_del $device_id", the device and drive are removed successfully and the VM does not hang. I know that my removal order in step 4 is indeed wrong, but the guest hang and QEMU's reaction are unacceptable.

Best wishes.

Comment 6 Markus Armbruster 2012-03-22 07:41:23 UTC
Adding device_del at the beginning of drive_del won't do, I'm afraid.  Let me explain.

device_del's behavior depends on the bus.

With some buses, such as USB, it unplugs the device immediately, no questions asked.

With other buses it merely initiates the unplug.  For instance, with PCI, it kicks off the ACPI hot unplug dance, which goes through a series of steps involving device model, guest BIOS, guest OS.  Takes an indeterminate time to complete, and it needn't complete at all.  In particular, if the guest doesn't have an ACPI driver, it takes forever without any notification to the device model.  Same if it absolutely cannot give up the device, say because it got its root partition there.

If you simply do a device_del at the beginning of drive_del, the unplug dance races with drive_del, and if it loses the race, you got the nasty drive_del before unplug scenario again.

That's why I wrote: Only if the unplug doesn't succeed within a reasonable time (most likely because the guest OS doesn't cooperate) should you switch to the __com.redhat_drive_del hammer.
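
The recommended order (unplug first, hammer only after a timeout) could be scripted roughly as follows. This is a minimal sketch, not an official tool: unplug_disk and its send parameter are hypothetical, where send is assumed to be a caller-supplied function that runs one monitor command and returns its text output. The sketch also assumes that a completed unplug can be detected by the id "..." line disappearing from "info pci", as in the output shown in the description. The timeout and poll values are arbitrary.

```python
import time


def unplug_disk(send, device_id, drive_id, timeout=30.0, poll=1.0,
                sleep=time.sleep, clock=time.monotonic):
    """Ask the guest to unplug; force-remove the drive only on timeout.

    `send` is a hypothetical callable: it takes one monitor command
    string and returns the monitor's text output.
    """
    # Step 1: initiate the unplug (for PCI this only starts the ACPI dance).
    send(f"device_del {device_id}")

    # Step 2: poll until the device is gone or the deadline passes.
    deadline = clock() + timeout
    while clock() < deadline:
        # 'info pci' shows a line such as: id "my_disk"
        if f'id "{device_id}"' not in send("info pci"):
            return "unplugged"
        sleep(poll)

    # Step 3: the guest did not cooperate in time, so use the hammer.
    send(f"__com.redhat_drive_del {drive_id}")
    return "forced"
```

For the reproducer in this bug, the call would be unplug_disk(send, "my_disk", "drive-virtio-disk1"). The function never issues __com.redhat_drive_del before the timeout, so it avoids the race described above. The forced path can still hit the persistent I/O error if the guest writes to the disk afterwards.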

Comment 7 Markus Armbruster 2012-03-22 07:42:24 UTC
The reporter is right: we need to handle this error more gracefully.

Comment 8 Qunfang Zhang 2012-05-08 05:56:09 UTC
Is this a duplicate of Bug 751700 - "block I/O error" while hot unplug a virtio disk?

Comment 9 Asias He 2012-07-16 08:22:39 UTC
I have fixed the hot-unplug problem in the upstream kernel, without any changes to qemu-kvm, and will backport the fix to RHEL once it hits Linus's tree.

http://lists.linuxfoundation.org/pipermail/virtualization/2012-June/020173.html

I think we need some changes in the Windows driver as well, so I am reassigning this bug to Vadim Rozenfeld.

Comment 10 Vadim Rozenfeld 2012-07-31 07:17:28 UTC
(In reply to comment #8)
> Is this a duplicate of Bug 751700 - "block I/O error" while hot unplug a
> virtio disk?

Yes.

*** This bug has been marked as a duplicate of bug 751700 ***

