Bug 217859
Summary: | HVM device model 'qemu-dm' needs to handle ENOSPC sparse files
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Daniel Berrangé <berrange>
Component: | xen
Assignee: | Daniel Berrangé <berrange>
Status: | CLOSED CURRENTRELEASE
Severity: | urgent
Priority: | high
Version: | 5.0
CC: | xen-maint
Hardware: | All
OS: | Linux
Fixed In Version: | 5.0.0
Doc Type: | Bug Fix
Last Closed: | 2007-01-26 20:07:21 UTC
Bug Depends On: | 217765

Description
Daniel Berrangé
2006-11-30 14:04:47 UTC
Created attachment 142510 [details]
Propagate i/o errors back through IDE layer

Ref: http://post-office.corp.redhat.com/archives/virtualist/2006-November/msg00379.html

The core of the problem lies in the QEMU IDE codebase, tools/ioemu/hw/ide.c, in particular four functions: ide_read_dma_cb, ide_sector_read, ide_write_dma_cb and ide_sector_write. These functions call bdrv_read / bdrv_write, which are contracted to return 0 on success, -1 on failure. Fixing the code to check the return status here is pretty simple. My question is what are the best error conditions to return from the IDE protocol POV.

I'm attaching a proof-of-concept patch which returns ERR_STAT + ICRC_ERR (aka BadSector) for read failures, and returns WRERR_STAT (aka DeviceFault) for write failures. I am far from convinced I'm using the optimal status codes here though, since I know next-to-nothing about the IDE protocol.

Originally I had write failures also returning BadSector, but the effect of that was that the guest OS simply retried the request, writing to the next sector, which failed, so it retried on the next sector, and so on for the entire disk. This clearly isn't too useful, so I switched to DeviceFault for write failures. With this though, the guest OS sees the fault, resets the IDE device, and tries again, forever:

Nov 30 11:59:58 dhcp-4-205 kernel: hdc: dma_intr: status=0x20 { DeviceFault }
Nov 30 11:59:58 dhcp-4-205 kernel: ide: failed opcode was: unknown
Nov 30 11:59:58 dhcp-4-205 kernel: hdc: DMA disabled
Nov 30 11:59:58 dhcp-4-205 kernel: ide1: reset: success
Nov 30 12:00:28 dhcp-4-205 kernel: hdc: lost interrupt
Nov 30 12:00:28 dhcp-4-205 kernel: hdc: task_out_intr: status=0x20 { DeviceFault }
Nov 30 12:00:28 dhcp-4-205 kernel: ide: failed opcode was: unknown
Nov 30 12:00:28 dhcp-4-205 kernel: ide1: reset: success
Nov 30 12:00:58 dhcp-4-205 kernel: hdc: lost interrupt
Nov 30 12:00:58 dhcp-4-205 kernel: hdc: task_out_intr: status=0x20 { DeviceFault }
Nov 30 12:00:58 dhcp-4-205 kernel: ide: failed opcode was: unknown
Nov 30 12:00:58 dhcp-4-205 kernel: ide1: reset: success

What I think we want is for the process doing the I/O in the guest to get an -EIO from the write() call it is doing, and for the guest OS to then remount the filesystem read-only.

QE ack for RHEL5.

Two separate issues: one-off media errors need to get propagated upwards. But catastrophic failures of the backing store, as ENOSPC implies, may well require more serious handling, including potentially terminating the guest with prejudice.

Created attachment 142755 [details]
Propagate QEMU I/O errors back to guest through IDE layer
This is the updated patch merged in xen-unstable.hg to propagate QEMU I/O
errors back to the guest through the IDE layer.
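
For illustration only, the error mapping described in the original report (ERR_STAT + ICRC_ERR for failed reads, WRERR_STAT for failed writes) can be sketched roughly as below. This is not the attached or merged patch. The struct, the fake_bdrv_* backend, and the sketch_* function names are hypothetical stand-ins for the real ide.c code, and setting READY_STAT alongside ERR_STAT is an assumption. The bit values are the conventional ATA ones as defined in Linux's hdreg.h.

```c
#include <stdint.h>

/* ATA status register bits (values as defined in Linux's hdreg.h). */
#define ERR_STAT   0x01  /* error: details are in the error register */
#define WRERR_STAT 0x20  /* write/device fault, logged as DeviceFault */
#define READY_STAT 0x40  /* drive ready */

/* ATA error register bit (shares 0x80 with the older BBD_ERR meaning). */
#define ICRC_ERR   0x80  /* logged by the guest as BadSector/BadCRC */

/* Hypothetical, heavily simplified stand-in for the emulated drive state. */
struct ide_state_sketch {
    uint8_t status;   /* value the guest reads from the status register */
    uint8_t error;    /* value the guest reads from the error register */
    int fail_io;      /* test hook: force the fake backend to fail */
};

/* Hypothetical backends following the contract from the report:
 * return 0 on success, -1 on failure (e.g. ENOSPC on a sparse file). */
int fake_bdrv_read(const struct ide_state_sketch *s)
{
    return s->fail_io ? -1 : 0;
}

int fake_bdrv_write(const struct ide_state_sketch *s)
{
    return s->fail_io ? -1 : 0;
}

/* Read path: check the return value instead of ignoring it. */
void sketch_sector_read(struct ide_state_sketch *s)
{
    if (fake_bdrv_read(s) < 0) {
        /* Assumption: READY_STAT stays set alongside ERR_STAT. */
        s->status = READY_STAT | ERR_STAT;
        s->error = ICRC_ERR;
        return;
    }
    s->status = READY_STAT;
    s->error = 0;
}

/* Write path: report a device fault rather than a bad sector, so the
 * guest does not walk the whole disk retrying sector by sector. */
void sketch_sector_write(struct ide_state_sketch *s)
{
    if (fake_bdrv_write(s) < 0) {
        s->status = WRERR_STAT;  /* matches status=0x20 in the guest log */
        s->error = 0;
        return;
    }
    s->status = READY_STAT;
    s->error = 0;
}
```

As the report notes, a plain device fault still leaves the guest resetting the device and retrying, so the choice of status codes here is a starting point for discussion rather than a settled answer.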
in xen-3.0.3-11.el5

xen-3.0.3-22.el5 included in 20070125.0.