Bug 805461
Summary: | the windows guest hang and can't resume the VM after hot-unplug the data disk. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Sibiao Luo <sluo> |
Component: | qemu-kvm | Assignee: | Vadim Rozenfeld <vrozenfe> |
Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.3 | CC: | acathrow, armbru, bsarathy, chayang, juzhang, kwolf, michen, mkenneth, qzhang, rhod, shuang, sluo, virt-maint, wdai |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-07-31 07:17:28 UTC | Type: | --- |
Description
Sibiao Luo
2012-03-21 11:19:50 UTC
Windows seems to be trying to unmount the drive when doing the device_del, but as the qemu block device was already removed by drive_del, trying to write to it results in an I/O error. With werror=stop this means that the VM is stopped. You can never get the VM running again because 'cont' resubmits the request, which will obviously fail again. A possible solution for this specific case would be to reset or ignore werror when the BlockDriverState is closed. However, there seems to be a more general problem with non-recoverable I/O errors.

Removing the backend (__com.redhat_drive_del) before the device (device_del) is nasty. The physical equivalent would be to first hit the disk with a hammer, then push the unplug button. The unplug button asks the OS nicely to give up the disk, but since you first hammered it dead, the OS won't be happy. Recommended usage is to unplug first. Only if the unplug doesn't succeed within a reasonable time (most likely because the guest OS doesn't cooperate) should you switch to the __com.redhat_drive_del hammer. Regardless, we probably need to handle persistent block I/O errors more gracefully.

How about adding device_del at the beginning of drive_del?

(In reply to comment #4)
> How about adding device_del at the beginning of drive_del?
Hi rhod, if I remove the device directly using "(qemu) device_del $device_id", it removes the device and drive successfully, and the VM does not hang. I know the removal order I used in step 4 is indeed wrong, but the guest hang and qemu's reaction are unacceptable. Best wishes.

Adding device_del at the beginning of drive_del won't do, I'm afraid. Let me explain. device_del's behavior depends on the bus. With some buses, such as USB, it unplugs the device immediately, no questions asked. With other buses it merely initiates the unplug. For instance, with PCI, it kicks off the ACPI hot unplug dance, which goes through a series of steps involving device model, guest BIOS, guest OS. This takes an indeterminate time to complete, and it needn't complete at all. In particular, if the guest doesn't have an ACPI driver, it takes forever without any notification to the device model. The same happens if the guest absolutely cannot give up the device, say because it has its root partition there. If you simply do a device_del at the beginning of drive_del, the unplug dance races with drive_del, and if it loses the race, you get the nasty drive_del-before-unplug scenario again. That's why I wrote: Only if the unplug doesn't succeed within a reasonable time (most likely because the guest OS doesn't cooperate) should you switch to the __com.redhat_drive_del hammer. The reporter is right: we need to handle this error more gracefully.

Is this a duplicate of Bug 751700 - "block I/O error" while hot unplug a virtio disk?

I have fixed the hot-unplug problem in the upstream kernel and will backport it to RHEL when it hits Linus's tree, without any changes to qemu-kvm. http://lists.linuxfoundation.org/pipermail/virtualization/2012-June/020173.html

I think we need some change in the Windows driver as well, so I am reassigning this bug to Vadim Rozenfeld.

(In reply to comment #8)
> Is this a duplicate of Bug 751700 - "block I/O error" while hot unplug a
> virtio disk?
Yes.

*** This bug has been marked as a duplicate of bug 751700 ***
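For reference, the ordering recommended in the comments (unplug first, and use the drive_del hammer only if the guest does not release the device within a reasonable time) could be sketched roughly as below. This is a minimal illustration, not qemu-kvm or management-tool code: `unplug_disk` and its three callables (`device_del`, `device_present`, `drive_del`) are hypothetical names standing in for whatever issues the monitor commands and checks whether the device is still plugged, and the timeout values are arbitrary assumptions.

```python
import time
from typing import Callable


def unplug_disk(
    device_del: Callable[[], None],       # hypothetical: issues device_del
    device_present: Callable[[], bool],   # hypothetical: is the device still plugged?
    drive_del: Callable[[], None],        # hypothetical: issues __com.redhat_drive_del
    timeout: float = 30.0,                # assumed "reasonable time", in seconds
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Unplug the device first; remove the backend only as a last resort.

    Returns "clean" if the guest released the device in time, or
    "forced" if the timeout expired and the backend was removed anyway.
    """
    # Step 1: ask the guest nicely. On PCI this only *initiates* the unplug.
    device_del()

    # Step 2: wait for the guest to give up the device.
    deadline = clock() + timeout
    while device_present():
        if clock() >= deadline:
            # Step 3: the guest did not cooperate; fall back to the hammer.
            drive_del()
            return "forced"
        sleep(poll_interval)

    # The unplug completed. Per the reporter's comment, device_del alone
    # removed both device and drive, so no drive_del is issued here.
    return "clean"
```

Waiting before the fallback is what avoids the race described above: drive_del is never issued while the unplug may still be in progress, except after the timeout has been exhausted.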