Description of problem:
Hot-plug a virtio data disk into the guest, then initialize and format it in the guest. After that, hot-unplug the data disk by removing the drive first and then the device. The guest hangs and cannot be resumed any more. QEMU's reaction is unacceptable.

Version-Release number of selected component (if applicable):
host info:
2.6.32-251.el6.x86_64
qemu-kvm-0.12.1.2-2.249.el6.x86_64
Seabios: seabios-0.6.1.2-12.el6
virtio-win: virtio-win-prewhql-0.1-24
guest info:
win7sp1-64

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest.
# /usr/libexec/qemu-kvm -M rhel6.2.0 -cpu Penryn -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 -usbdevice tablet -name win7-sp1-64 -uuid `uuidgen` -drive file=win7sp1-virtio-64.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=05:1a:4a:32:0b:26,bus=pci.0,addr=0x3,bootindex=2 -spice disable-ticketing,port=5931 -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -monitor stdio

2. Hot-plug a virtio data disk into the guest.
(qemu) __com.redhat_drive_add file=/home/my_qcow2_disk.qcow2,format=qcow2,id=drive-virtio-disk1,cache=none,werror=stop,rerror=stop
(qemu) device_add virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=my_disk
(qemu) info pci
...
  Bus 0, device 7, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 0.
      BAR0: I/O at 0xffc0 [0xffff].
      BAR1: 32 bit memory at 0xfebff000 [0xfebfffff].
      id "my_disk"

3. Initialize and format the virtio data disk in the guest:
Computer -> Manage -> Computer Management -> Disk Management

4. Hot-unplug the virtio data disk.
(qemu) __com.redhat_drive_del drive-virtio-disk1
(qemu) info pci
...
  Bus 0, device 7, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 0.
      BAR0: I/O at 0xffc0 [0xffff].
      BAR1: 32 bit memory at 0xfebff000 [0xfebfffff].
      id "my_disk"
(qemu) device_del my_disk
(qemu) block I/O error in device '': Input/output error (5)
handle_dev_stop: stop

5. Resume the guest.

Actual results:
After step 5, the guest still hangs and the VM cannot be resumed any more.
(qemu) cont
(qemu) block I/O error in device '': Input/output error (5)
handle_dev_stop: stop

Expected results:
We should be able to resume the VM with "cont" and continue running it.

Additional info:
If I remove the device directly with "(qemu) device_del my_disk" in step 4, the removal succeeds and the VM does not hang. I realize that removing the drive first in step 4 is wrong, but QEMU's reaction to it is still bad; we should be able to resume the VM.
Windows seems to try to unmount the drive when it processes the device_del, but since drive_del has already removed the QEMU block device, the resulting write fails with an I/O error. With werror=stop, this stops the VM. The VM can never be brought back to running, because 'cont' resubmits the same request, which obviously fails again. A possible solution for this specific case would be to reset or ignore werror once the BlockDriverState is closed. However, there seems to be a more general problem with non-recoverable I/O errors.
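To make the stop/cont loop concrete, here is a small Python model of it. It is not QEMU code: `Backend`, `error_action` and `run_guest_write` are hypothetical names, and the `ignore_werror_when_closed` flag only stands in for the suggested fix of ignoring werror once the backend is closed.

```python
from enum import Enum


class ErrorAction(Enum):
    # Subset of QEMU's werror/rerror policies, enough for this model.
    REPORT = "report"  # fail the request and let the guest see the error
    STOP = "stop"      # pause the VM; 'cont' resubmits the same request


class Backend:
    """Hypothetical stand-in for a block backend (not QEMU's actual API)."""

    def __init__(self, werror: ErrorAction) -> None:
        self.werror = werror
        self.closed = False  # True once __com.redhat_drive_del has run

    def write(self) -> bool:
        # After drive_del every write fails with EIO, so retrying never helps.
        return not self.closed


def error_action(backend: Backend, ignore_werror_when_closed: bool) -> ErrorAction:
    """Choose what to do with a failed write.

    With ignore_werror_when_closed=True this models the proposed fix: an error
    on a closed backend is not recoverable, so it is reported to the guest
    instead of stopping the VM.
    """
    if ignore_werror_when_closed and backend.closed:
        return ErrorAction.REPORT
    return backend.werror


def run_guest_write(backend: Backend, ignore_werror_when_closed: bool,
                    max_conts: int = 3) -> list:
    """Submit one guest write, then try 'cont' up to max_conts times.

    Returns the VM state after each attempt: "stopped" while the request keeps
    stopping the VM, "running" once the guest gets a result (success or EIO).
    """
    states = []
    for _ in range(max_conts + 1):  # first submission plus each 'cont'
        if backend.write():
            states.append("running")
            break
        if error_action(backend, ignore_werror_when_closed) is ErrorAction.STOP:
            states.append("stopped")
            continue
        states.append("running")  # guest sees the I/O error and carries on
        break
    return states


if __name__ == "__main__":
    disk = Backend(ErrorAction.STOP)
    disk.closed = True  # drive removed before the guest released the device
    print(run_guest_write(disk, ignore_werror_when_closed=False))
    print(run_guest_write(disk, ignore_werror_when_closed=True))
```

With werror=stop, whether 'cont' gets anywhere depends entirely on whether the retried request can succeed. Once the backend is gone it never can, so in this model only a change in the error policy for closed backends breaks the loop.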
Removing the backend (__com.redhat_drive_del) before the device (device_del) is nasty. The physical equivalent would be to first hit the disk with a hammer, then push the unplug button. The unplug button asks the OS nicely to give up the disk, but since you first hammered it dead, the OS won't be happy. Recommended usage is to unplug first. Only if the unplug doesn't succeed within a reasonable time (most likely because the guest OS doesn't cooperate) should you switch to the __com.redhat_drive_del hammer. Regardless, we probably need to handle persistent block I/O errors more gracefully.
How about adding device_del at the beginning of drive_del?
(In reply to comment #4)
> How about adding device_del at the beginning of drive_del?

Hi rhod,
If I remove the device directly using "(qemu) device_del $device_id", both the device and the drive are removed successfully and the VM does not hang. I know that my removal order in step 4 is wrong, but the guest hang and QEMU's reaction are unacceptable.
Best wishes.
Adding device_del at the beginning of drive_del won't do, I'm afraid. Let me explain. device_del's behavior depends on the bus. With some buses, such as USB, it unplugs the device immediately, no questions asked. With other buses it merely initiates the unplug. For instance, with PCI, it kicks off the ACPI hot unplug dance, which goes through a series of steps involving the device model, the guest BIOS and the guest OS. It takes an indeterminate time to complete, and it needn't complete at all. In particular, if the guest doesn't have an ACPI driver, it takes forever without any notification to the device model. The same happens if the guest absolutely cannot give up the device, say because its root partition is on it. If you simply do a device_del at the beginning of drive_del, the unplug dance races with drive_del, and if it loses the race, you get the nasty drive_del-before-unplug scenario again. That's why I wrote: "Only if the unplug doesn't succeed within a reasonable time (most likely because the guest OS doesn't cooperate) should you switch to the __com.redhat_drive_del hammer."
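The recommended order (device_del first, and the __com.redhat_drive_del hammer only after a timeout) could be scripted roughly as follows. This is a sketch under assumptions: the `Monitor` methods are hypothetical wrappers around the monitor commands, `device_present` could for instance be implemented by checking `info pci` for the device id, and `FakeMonitor`/`FakeClock` merely simulate a guest so the example runs without QEMU. Following the reporter's observation that a completed device_del also removes the drive, drive_del is only issued on the timeout path.

```python
import time
from typing import Callable, Optional, Protocol


class Monitor(Protocol):
    """Hypothetical wrapper around the monitor commands (names are assumptions)."""

    def device_del(self, dev_id: str) -> None: ...
    def device_present(self, dev_id: str) -> bool: ...
    def drive_del(self, drive_id: str) -> None: ...


def unplug_disk(mon: Monitor, dev_id: str, drive_id: str,
                timeout: float = 30.0, poll: float = 0.5,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Unplug the device first; use drive_del only if the guest doesn't cooperate.

    Returns True if the guest released the device within `timeout` seconds,
    False if the drive had to be deleted out from under the guest.
    """
    mon.device_del(dev_id)  # on PCI this only *starts* the ACPI unplug dance
    deadline = clock() + timeout
    while clock() < deadline:
        if not mon.device_present(dev_id):
            return True  # unplug completed; the drive went away with the device
        sleep(poll)
    mon.drive_del(drive_id)  # last resort: the "hammer"
    return False


class FakeMonitor:
    """Simulated guest that releases the device after `steps` polls (None = never)."""

    def __init__(self, steps: Optional[int]):
        self.steps = steps
        self.polls = 0
        self.log = []

    def device_del(self, dev_id: str) -> None:
        self.log.append(("device_del", dev_id))

    def device_present(self, dev_id: str) -> bool:
        self.polls += 1
        return self.steps is None or self.polls <= self.steps

    def drive_del(self, drive_id: str) -> None:
        self.log.append(("drive_del", drive_id))


class FakeClock:
    """Virtual clock so the example runs instantly instead of sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    for steps in (2, None):  # a cooperative guest, then an uncooperative one
        mon, clk = FakeMonitor(steps), FakeClock()
        clean = unplug_disk(mon, "my_disk", "drive-virtio-disk1",
                            clock=clk, sleep=clk.sleep)
        print(clean, mon.log)
```

The timeout value is a judgment call: too short and the hammer hits a guest that was about to cooperate, too long and an uncooperative guest delays the cleanup.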
The reporter is right: we need to handle this error more gracefully.
Is this a duplicate of Bug 751700 - "block I/O error" while hot unplug a virtio disk?
I have fixed the hot-unplug problem in the upstream kernel, without any changes to qemu-kvm, and will backport the fix to RHEL once it hits Linus's tree. http://lists.linuxfoundation.org/pipermail/virtualization/2012-June/020173.html I think we need some changes in the Windows driver as well, so I am reassigning this bug to Vadim Rozenfeld.
(In reply to comment #8)
> Is this a duplicate of Bug 751700 - "block I/O error" while hot unplug a
> virtio disk?

Yes.

*** This bug has been marked as a duplicate of bug 751700 ***