Currently, there are different error_policy values available for KVM guests: 'report' - propagate all errors to the guest 'enospc' - stop guest on ENOSPC, propagate all other errors to the guest 'stop' - stop guest 'ignore' - ignore it and let OS handle it. Today, RHEV supports only one option - stop. There are scenarios, when a VM has more then one disk attached to it and customers are willing to handle IO errors on the OS level, when the error is not on the system disk, instead of having the whole VM got paused. RHEV should add the ability of specifying error_policy for each disk. This option should be available to the end user - system Administrator. And it should support all the supported values by qemu-kvm, as stated above.
Engine/Vdsm API supports the control of stop/enospace since ever via the propagateErrors attribute. Would extending this to REST and GUI be enough? If not, please explain why it is useful to report "enospace" to the guest. Another option to solve/hack this bug is to extend http://www.ovirt.org/Features/Device_Custom_Properties to disk devices, and have a hook tweak the error_policy to whatever the user wants to.
Dan, enospace will still pause the VM. I am not interested in this resolution. In order not to stop the guest, we should specify report or ignore, and from what I see on vdsm code, those options are not supported right now.
Sorry, not only vdsm code, but everywhere in RHEV. base_disks table has a boolean field - propagate_errors, that vdsm uses later to decide if to use stop or enospc, but it is still not enough. If the device property defined here: http://www.ovirt.org/Features/Device_Custom_Properties would let the user specify error_policy per disk - would be fine with this RFE, as long as the rest of the code would be able to handle it.
*** Bug 970520 has been marked as a duplicate of this bug. ***
*** Bug 1064630 has been marked as a duplicate of this bug. ***
Right now we are using the following workaround: # /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "UPDATE base_disks SET propagate_errors = 'On'" This, together with the fix of vdsm in https://bugzilla.redhat.com/show_bug.cgi?id=1064630, will switch the guests from pausing on all I/O errors, to pausing only on out-of-space (which is what RHEV actually needs to expand the disk volumes). But this is not a convenient way to implement this. It should be run for each new guest added to the system. And it is a hack, not a proper solution.
Draft VDSM patch to allow Engine fine tune disk's error policy: http://gerrit.ovirt.org/#/c/38009/
The safest option would be to expose the propagateErrors option in the ui per disk. Supporting the "report" option should be ok, but needs lot of testing. I don't see how the "ignore" option can be useful - I don't think we should support this in the engine ui. Kevin, what happens in the guest when using error_policy="ignore" and qemu is having an io error reading or writing from a device?
(In reply to Nir Soffer from comment #29) > Supporting the "report" option should be ok, but needs lot of testing. I don't think that would actually be useful in a lot of cases, because the only thing this achieves is breaking sparse image allocation, but it can probably be made work. > I don't see how the "ignore" option can be useful - I don't think we should > support this in the engine ui. I agree. "ignore" is not only hardly ever useful, but it is dangerous and not a mode that should be used in production. > Kevin, what happens in the guest when using error_policy="ignore" and > qemu is having an io error reading or writing from a device? It pretends that the operation has succeeded while it really hasn't. In case of a write operation, nothing is written, and in case of a read, garbage is returned.
(In reply to Kevin Wolf from comment #31) > (In reply to Nir Soffer from comment #29) > > Supporting the "report" option should be ok, but needs lot of testing. > > I don't think that would actually be useful in a lot of cases, because the > only > thing this achieves is breaking sparse image allocation, but it can probably > be > made work. Can you elaborate on this? Do you think this can be useful in some cases, and we can support this option?
In the customer case where this originated, this feature was wanted as a workaround for the fact that in the qemu errors mode which RHEV defaults to, all IO errors always caused a pause, even if they were triggered by a specific guest action, and in no way the responsibility of the host. In that case, something on the guest was causing disk accesses beyond the bounds of the logical device - obviously pausing the guest isn't going to help with that. At the time, qemu didn't have a mode which would report those sorts of errors to the guest, while still pausing for "backend" IO errors - other bugs have been filed to address that. However qemu's "enospc" mode does do the right thing for that case, without breaking lazy image growing. Even once the qemu side bug of not always reporting guest-caused IO errors is fixed, there are cases where "enospc" mode would make more sense than "pause always". It's plausible to want the guest to see and deal with IO errors, rather than assuming they can be handled on the host side. Doing so without hacking the RHEV database requires the feature requested by this bug. Hope that background helps to clarify.
(In reply to David Gibson from comment #33) > In the customer case where this originated, this feature was wanted as a > workaround for the fact that in the qemu errors mode which RHEV defaults to, > all IO errors always caused a pause, even if they were triggered by a > specific guest action, and in no way the responsibility of the host. In > that case, something on the guest was causing disk accesses beyond the > bounds of the logical device - obviously pausing the guest isn't going to > help with that. Aha! Another good example why bug reports should be describing the problem instead of requesting what someone thinks is a solution. Stopping on such errors is a qemu bug and needs to be fixed there. Adding more options to the mangement UI isn't going to solve this in a helpful way, because while it may improve the behaviour for guest-caused errors, it breaks the behaviour for errors on the host side. > At the time, qemu didn't have a mode which would report those sorts of > errors to the guest, while still pausing for "backend" IO errors - other > bugs have been filed to address that. However qemu's "enospc" mode does do > the right thing for that case, without breaking lazy image growing. "enospc" has existed pretty much always. It's not the solution, though, because a short network outage for a network-backed image would break the guest now. The correct solution for the problem you described above is backporting upstream commits 3c2daac0 (virtio-blk) and 58ac3211 (IDE). This happened already in 6.6 in the context of bug 1064643. So to fix that specific use case no change to management is needed at all. > Even once the qemu side bug of not always reporting guest-caused IO errors > is fixed, there are cases where "enospc" mode would make more sense than > "pause always". It's plausible to want the guest to see and deal with IO > errors, rather than assuming they can be handled on the host side. Doing so > without hacking the RHEV database requires the feature requested by this bug. Yes, I agree that there could be cases where werror=enospc is a useful setting. These are corner cases, though, and the attached customer cases don't seem to be such cases, but rather about the qemu bug discussed above. Therefore, we may consider adding it, but it might have lower priority than this BZ may make it appear. I am very doubtful about [rw]error=report ever being useful and would advise against providing it in the UI unless there is a very specific need for it. Getting it right might be non-trivial and isn't worth the effort without a customer use case. As for [rw]error=ignore, I'm fully and decidedly against providing it. It has no use outside of testing and possibly debugging and is dangerous.
(In reply to Kevin Wolf from comment #34) > "enospc" has existed pretty much always. It's not the solution, though, > because > a short network outage for a network-backed image would break the guest now. > > The correct solution for the problem you described above is backporting > upstream commits 3c2daac0 (virtio-blk) and 58ac3211 (IDE). This happened > already in 6.6 in the context of bug 1064643. > > So to fix that specific use case no change to management is needed at all. > Are you suggesting then to have management defaulting to enospc rather then stop?
No, the management defaults should stay as they are. That problem was fixed in qemu, and it needed to be fixed only in qemu.
(In reply to Kevin Wolf from comment #36) > No, the management defaults should stay as they are. That problem was fixed > in qemu, and it needed to be fixed only in qemu. So, I probably misunderstand something. If we continue stopping the guest on each IO error (current default), how would we handle the case where we propagate the error to OS?
Marina, as I understand the current state, the idea of the qemu side fixes is that even in "errors=stop" mode, qemu will no longer stop on all disk errors. Specifically, it will always report to the guest errors which cannot be anything but the guest's fault - the main case being accesses outside the bounds of the device. Kevin, so to be clear. In comment 33 I'm explaining the context in which this bug was filed (which, yes, I should have better explained initially). But the intention isn't for this bug to specifically address that problem (other bugs were filed for that). Instead the idea of this bug is to allow easy configuration of a guest which is in errors=enospc mode for cases where you really do want errors reported to the guest (e.g. this might be useful for a guest with some secondary storage on a remote/flaky network connection for which the guest application has its own recovery mechanisms). At present RHEV sort of supports such a use case with the propagate errors flag, but there's no way to actually activate this flag other than direct database manipulation. Dropping the priority is appropriate. I concur that "errors=ignore" is a bad idea.
Let's summarize the above and understand what action items should be taken on RHEV side to cover all possible functionalities. Right now RHEV supports only "errors=stop". Qemu exposed the following options: 'report' - propagate all errors to the guest 'enospc' - stop guest on ENOSPC, propagate all other errors to the guest 'stop' - stop guest 'ignore' - ignore it and let OS handle it. However, today's qemu behavior has been changed so that: 'stop' -> will not stop the guest and always propagate the error to the guest. 'ignore' -> should be discarded as an option / should not be available. All the rest are unchanged. Is this correct? David, Kevin? Next - what options should be exposed to the end user (RHEV/RHOS)? And why? What is the expected behavior if qemu on each of the values today?
> Is this correct? David, Kevin? That's correct with one clarification. "stop" has been changed so that errors which can only be guest triggered (such as access beyond the bounds of the device) will always propagate to the guest. All other errors still stop the VM, as before. > Next - what options should be exposed to the end user (RHEV/RHOS)? Really that's a question for the RHEV and RHOS guys. As you say RHEV currently only supports errors=stop. It does also have kinda-sorta half-baked support for errors=enospc: setting the propagate_errors field in the db to true will switch qemu to errors=enospc mode, but there's no way to set that field short of direct database manipulation. When this cluster of bugs was started, I was suggesting changing the half-baked support for errors=enospc to full support. I think there are cases where it makes sense for the guest to handle genuine disk errors rather than handling them in the host. But since then I think I've seen some RHEV people indicating they're not convinced that's a good idea.
(In reply to Marina from comment #39) > 'report' - propagate all errors to the guest > 'enospc' - stop guest on ENOSPC, propagate all other errors to the guest > 'stop' - stop guest > 'ignore' - ignore it and let OS handle it. No, it's just "ignore it", period. qemu pretends that the request has succeeded even if it hasn't, so the guest OS can't handle anything because it doesn't even know that there was an error. > However, today's qemu behavior has been changed so that: > 'stop' -> will not stop the guest and always propagate the error to the > guest. 'stop' will stop the guest if there was a host-side error (e.g. network went away and image is stored on NFS). It won't stop the guest for guest-side errors like submitting invalid requests. > 'ignore' -> should be discarded as an option / should not be available. qemu won't remove the option, but it has very little use outside of debugging. RHEV shouldn't expose it in the GUI. I have no strong opinion on whether it should be accessible using some method like manipulating the database.
Thank you for clarification. So here is the summary of today's options: 'report' - propagate all errors to the guest 'enospc' - stop guest on ENOSPC, propagate all other errors to the guest 'stop' - stop guest on host errors, propagate to guests - guest OS related errors 'ignore' - ignore the error (used for internal debugging only). With this new 'stop' implementation/fix, it would make the life of end users easier. And with it, it is also more reasonable to have this as a default option in RHEV (as it is now already). Question: can you see any needs for exposing 'report' and 'enospc' to the end user? What can be the use case for it? And second question: assuming we decide there is no big need exposing it to the UI and enough to keep it in db, for special debugging cases; would we need to change rhev to support the 'report' option too, since right now we support only 'enospc' and 'stop'.
> With this new 'stop' implementation/fix, it would make the life of end users easier. And with it, it is also more reasonable to have this as a default option in RHEV (as it is now already). I don't quite follow what you're saying here. The new "stop" implementation is simply more correct. The old version would sometimes stop - essentially asking the host to intervene - for errors which the host cannot possibly fix. > Question: can you see any needs for exposing 'report' and 'enospc' to the end user? What can be the use case for it? I don't think "report" can reasonably be exposed, because without at least stopping for out of space errors, RHEV can't implement on-demand growing of guest volumes. "enospc" is almost the same as "report", except for the handling of out of space errors. I think there are use cases for this mode: e.g. (a) you have a physical disk with known errors, and you have a tool for analyzing / recovering those errors which you want to run inside a guest. (b) you are running some sort of fault-tolerant server application inside your guest - if the application has better information to recover from errors than the host, you want the errors reported to it, rather than just to the host. > And second question: assuming we decide there is no big need exposing it to the UI and enough to keep it in db, for special debugging cases; would we need to change rhev to support the 'report' option too, since right now we support only 'enospc' and 'stop'. As above, "report" would break on-demand volumes, so I don't think we can use it. "enospc" essentially exists for exactly this sort of case - most errors are reported to the guest, but the few required for on-demand volume resizing are reported to the host.
So, based on David's explanation, I think we should be ok as we are now - without exposing anything to UI and without adding additional fields to the database. If you all agree, I am ok closing this RFE as closed wontfix and cover the info in a kcs article.
(In reply to Marina from comment #45) > So, based on David's explanation, I think we should be ok as we are now - > without exposing anything to UI and without adding additional fields to the > database. > > If you all agree, I am ok closing this RFE as closed wontfix and cover the > info in a kcs article. Ack. Please close it.