865336 – RFE: add werror=stop,rerror=stop to -drive parameter in QEMU command line (so VMs will pause upon IO errors)

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	1.0 (Essex)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	beta
Target Release:	4.0
Assignee:	Solly Ross
QA Contact:	Ami Jeain
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-10-11 09:09 UTC by Yaniv Kaul
Modified:	2023-09-18 10:02 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-06-11 18:07:26 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-28803	0	None	None	None	2023-09-18 09:59:21 UTC

Description Yaniv Kaul 2012-10-11 09:09:38 UTC

Description of problem:
Unless this is the default of downstream QEMU, we should have the VM pause upon IO errors to prevent data loss.

Version-Release number of selected component (if applicable):
Essex

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Alan Pevec 2012-12-13 17:16:41 UTC

What would be the recovery action / alert to the user?

Comment 3 Dave Allan 2013-04-30 15:27:17 UTC

Dan, stop is the qemu default, right, so there's nothing needed here, is there?

Comment 4 Daniel Berrangé 2013-04-30 15:30:02 UTC

Yeah, but the bigger question is what to do when this situation occurs. Just marking the VM as paused in libvirt is not a full solution.

Comment 5 Dave Allan 2013-04-30 17:43:31 UTC

What needs to happen?

Comment 6 Daniel Berrangé 2013-05-01 07:40:07 UTC

I don't know - that's what someone needs to figure out.

Comment 8 Solly Ross 2013-06-03 13:56:31 UTC

According to the official qemu documentation (http://qemu.weilnetz.de/qemu-doc.html, linked from qemu.org), the default flags are werror=enospc and rerror=report. That means read errors are reported to the guest, while write errors pause qemu if the host disk is full and are otherwise reported to the guest.
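
For illustration, here is a minimal sketch of how the requested flags would appear in a -drive option string. The helper name drive_option and the example path are hypothetical; only the werror/rerror keys and their values come from the qemu documentation, and this is not how nova builds its command line (nova goes through libvirt).

```python
# Actions qemu accepts for write and read errors on a -drive.
WERROR_ACTIONS = {"ignore", "stop", "report", "enospc"}
RERROR_ACTIONS = {"ignore", "stop", "report"}


def drive_option(path, werror="enospc", rerror="report"):
    """Build the value of a qemu -drive option (hypothetical helper).

    The defaults mirror the documented qemu defaults quoted above.
    """
    if werror not in WERROR_ACTIONS:
        raise ValueError("unsupported werror action: %s" % werror)
    if rerror not in RERROR_ACTIONS:
        raise ValueError("unsupported rerror action: %s" % rerror)
    return "file=%s,werror=%s,rerror=%s" % (path, werror, rerror)


# The behaviour requested in this RFE: pause the VM on any I/O error.
print("-drive", drive_option("/var/lib/images/disk.img", "stop", "stop"))
```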

HOWEVER:

This is supported via libvirt's error_policy attribute on the driver tag of the disk specification. error_policy applies to both read and write errors; rerror_policy overrides it for read errors. According to the libvirt doc, the default setting for libvirt is REPORT (I'm guessing that libvirt actually passes these to qemu, so the qemu defaults are moot).
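
A rough sketch of what such a disk specification could look like, generated with Python's standard xml.etree module. The function name disk_xml, the qcow2 driver type, the virtio bus and the example path are assumptions made for illustration; only the error_policy and rerror_policy attributes on the driver tag are taken from the libvirt domain format.

```python
import xml.etree.ElementTree as ET


def disk_xml(source_file, target_dev, error_policy="stop", rerror_policy=None):
    """Return a libvirt <disk> element as a string (hypothetical helper)."""
    disk = ET.Element("disk", type="file", device="disk")
    # error_policy covers read and write errors unless rerror_policy is set.
    driver_attrs = {"name": "qemu", "type": "qcow2",
                    "error_policy": error_policy}
    if rerror_policy is not None:
        driver_attrs["rerror_policy"] = rerror_policy
    ET.SubElement(disk, "driver", driver_attrs)
    ET.SubElement(disk, "source", file=source_file)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


# Pause on write errors, but keep reporting read errors to the guest.
print(disk_xml("/var/lib/images/disk.img", "vda", "stop", "report"))
```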

So, injecting it into the configuration should be pretty easy, but we probably also want to provide a configuration option.  As for state, perhaps we could use the metadata tag (http://libvirt.org/formatdomain.html#elementsMetadata) to store a flag, but we'd need to figure out how to detect whether the stopping of the VM was intentional or accidental (we could manually set the flag to "on_purpose" whenever we intentionally shut down the vm, then check whether the vm is shut down but on_purpose is not set).
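
The flag idea could be sketched roughly as follows. This is plain Python with no libvirt calls: VmRecord and both functions are hypothetical names, and where the flag would actually be stored (for example in the libvirt metadata tag) is left open, as it is in the proposal.

```python
from dataclasses import dataclass


@dataclass
class VmRecord:
    """Hypothetical per-VM bookkeeping kept by the management layer."""
    running: bool = True
    on_purpose: bool = False  # set only when we stop the VM ourselves


def stop_intentionally(vm):
    """Record that the management layer itself stopped the VM."""
    vm.on_purpose = True
    vm.running = False


def stopped_unexpectedly(vm):
    """True if the VM is stopped but nobody set the on_purpose flag."""
    return not vm.running and not vm.on_purpose


vm = VmRecord()
vm.running = False               # e.g. stopped by the hypervisor on an I/O error
print(stopped_unexpectedly(vm))  # no flag was set, so this needs attention
```

The flag would also have to be cleared again whenever the VM is restarted, otherwise a later unexpected stop would be misread as intentional.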

Comment 9 Solly Ross 2013-06-06 20:35:34 UTC

(ping -- see above)

Comment 10 Solly Ross 2013-06-10 16:48:25 UTC

Bug added upstream: https://bugs.launchpad.net/nova/+bug/1189543 (no review id yet, though)

Comment 11 Solly Ross 2013-06-11 18:07:26 UTC

Polled upstream, consensus was WORKS AS INTENDED --

The thought was that this would be confusing for people running software in VMs (why is my VM suddenly stopped when I have code inside to handle IO errors) and that much existing software would rather just have the default (REPORT), and have the guest software (OS, database, etc) deal with the IO errors instead.
