Bug 1314160

Summary: [RFE] Option to Let VM handle EIO on Direct LUNs, not pausing it
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: RFEs
Assignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE
QA Contact: Avihai <aefrat>
Severity: medium
Docs Contact:
Priority: high
Version: 3.5.7
CC: aefrat, germano, gveitmic, jortialc, kwolf, lsurette, mkalinin, mtessun, mwest, nashok, nsoffer, pelauter, rhodain, sirao, sraje, srevivo, tnisan
Target Milestone: ovirt-4.4.3-2
Keywords: FutureFeature, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-09 16:20:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1862534
Bug Blocks:

Description Germano Veit Michel 2016-03-03 05:48:08 UTC
1. Proposed title of this feature request  

Option to Let VM handle EIO on Direct LUNs, not pausing it.          
   
3. What is the nature and description of the request?  

- Important application data is kept on Direct LUN B, attached to a VM.
- The storage side replicates LUN B to LUN A, but LUN A is read-only.
- A DR VM is kept up, with Direct LUN A attached.

Illustration:

+---------------------------+
|                           |  D. LUN  +--+
|LUN A     STORAGE     LUN B+--------->+VM|
|  +                        |    RW    +--+
+---------------------------+
   |                           D. LUN  +--+
   +---------------------------------->+DR|
                                 RO    |VM|
                                       +--+


If the DR VM tries to write to the Direct LUN, which is read-only on the storage side, the VM is paused with EIO. This is completely expected behavior from RHEV at this point, since the LUN was not marked as read-only.

Since this is a Direct LUN, perhaps we should pass the errors to the VM and let it deal with them. Apparently this is what VMware and Hyper-V do. This is currently preventing the deployment of a Disaster Recovery setup in RHEV, as the DR VM sporadically tries to write to the LUN (especially during system updates; not able to reproduce this part).

Possible solution:
- Have an option in RHEV-M, off by default, to let the VM deal with EIO errors on Direct LUNs, not pausing it. It's the VM's problem, and it's up to the sysadmin to make sure no data is lost.
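
For context, the mechanism for this already exists at the libvirt/QEMU level: a per-disk error policy decides whether the VM is paused or the error is reported to the guest. A minimal sketch of what such an option would need to toggle in the libvirt domain XML (error_policy is a real libvirt attribute; the surrounding disk element is illustrative, not an actual RHEV-generated config):

    <disk type='block' device='lun'>
      <!-- error_policy='stop' pauses the VM on I/O errors (current behavior);
           error_policy='report' passes the error through to the guest -->
      <driver name='qemu' type='raw' error_policy='report'/>
      <source dev='/dev/mapper/360014059fe638e559a2488aa4db314d9'/>
      <target dev='sda' bus='scsi'/>
    </disk>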

Additional info:
- Marking the LUN as Read-Only in RHEV-M works, but then when disaster happens the VM needs to be shut down, the LUN set to RW, and the VM started again.
- It is not trivial to guarantee the VM never writes to the LUN, even if it is not mounted. During system upgrades, applications may scan the disks and mount it, even read-only (which may trigger a journal replay on some filesystems, for example).

Or perhaps we could work something out using the RO information reported on the host, and do this automatically.

# lsblk
NAME                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                       8:0    0     5G  1 disk  <-------
`-360014059fe638e559a2488aa4db314d9     253:6    0     5G  0 mpath 
  `-360014059fe638e559a2488aa4db314d9p1 253:7    0     5G  0 part 
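
The same RO flag can also be read programmatically on the host, which is what an automatic approach could key off. A sketch (real lsblk/blockdev options; how vdsm would consume this is left open):

# lsblk -ndo RO /dev/sda
1
# blockdev --getro /dev/sda
1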

Guest-side RO is currently handled manually via the RHEV-M Read-Only check-box:

# lsblk
NAME                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
vdb                                    252:16    0     5G  0 disk  <-------
`-test_pause_vg_lvol0                  253:2     0   160M  0 lvm

      
4. Why does the customer need this? (List the business requirements here)  

- Because of this, a Disaster Recovery setup like the above currently only works on VMware and Hyper-V.
- It is not easy to guarantee VMs will never write to a RO Direct LUN while also keeping it in place, ready to be mounted RW in case disaster happens.

10. List any affected packages or components.  

- qemu, libvirt, vdsm, rhev-m

Comment 15 Marina Kalinin 2018-10-04 12:55:28 UTC
To share some background about the decisions behind how things work in RHEV/qemu:
https://bugzilla.redhat.com/show_bug.cgi?id=1024428#c43
https://access.redhat.com/solutions/526303
https://bugzilla.redhat.com/show_bug.cgi?id=1064630

Not sure if this all still behaves the same way today.
And it does not cover the Direct LUN portion requested in this RFE.

Comment 16 Marina Kalinin 2018-10-04 13:21:16 UTC
Specifically, check this comment on how to set up a disk in RHEV with the "report" error behavior:
https://bugzilla.redhat.com/show_bug.cgi?id=1064630#c20

Hopefully, it still works the same way in RHV and will be good enough to at least provide a workaround.
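
For reference, the database-side workaround boils down to flipping the disk's propagate-errors flag in the engine DB. A sketch only, assuming the engine schema exposes a base_disks.propagate_errors column (table name, column name, value, and disk UUID are all assumptions here, and direct SQL edits are exactly the "ugly" part the later comments try to avoid):

# sudo -u postgres psql engine -c \
    "UPDATE base_disks SET propagate_errors = 'On' WHERE disk_id = '<disk-uuid>';"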

Comment 17 Germano Veit Michel 2018-10-05 03:43:37 UTC
Thanks Marina!

I've tested it in 4.2.6 and wrote a script to manage this without using the DB (the SQL edits are a bit ugly and customers need to open support cases). I also rewrote the KCS.

We can keep the RFE open to make this configurable via the UI, or to change the default behavior of Direct LUNs as per comment #14. Otherwise this can be closed.
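
For the record, such a script can toggle this through the REST API instead of the database; propagate_errors is an existing disk property in the oVirt v4 API, though the exact invocation below is only a sketch (engine URL, credentials, and disk UUID are placeholders):

# curl -k -u 'admin@internal:password' -X PUT \
    -H 'Content-Type: application/xml' \
    -d '<disk><propagate_errors>true</propagate_errors></disk>' \
    https://engine.example.com/ovirt-engine/api/disks/<disk-uuid>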

Comment 18 Marina Kalinin 2018-10-05 13:39:01 UTC
I would like to keep this RFE open, since having it in the UI is a good thing to have.
Maybe we should change the component to UI then.

Comment 19 Marina Kalinin 2018-10-05 13:57:37 UTC
Germano, can you please open an RFE to have this script in Ansible?

Comment 20 Germano Veit Michel 2018-10-07 23:21:04 UTC
(In reply to Marina from comment #19)
> Germano, can you please open an RFE to have this script in Ansible?

I was planning a much simpler script, and also to submit a patch to Ansible, but had to do it that way due to this: https://bugzilla.redhat.com/show_bug.cgi?id=1636331

Once it's fixed (or it's clarified what I am doing wrong), I'm planning to do both: simplify the script in the KCS and submit an Ansible one.

Comment 22 Germano Veit Michel 2019-01-15 05:09:05 UTC
The customer would prefer if this could be done online (no VM shutdown). Even better if the disk also does not need to be detached and re-attached.

Comment 23 Roman Hodain 2019-11-25 13:55:19 UTC
*** Bug 1719166 has been marked as a duplicate of this bug. ***

Comment 25 Marina Kalinin 2020-06-09 20:17:05 UTC
(In reply to Germano Veit Michel from comment #17)
> Thanks Marina!
> 
> I've tested it in 4.2.6 and wrote a script to manage this without using
> the DB (the SQL edits are a bit ugly and customers need to open support
> cases). I also rewrote the KCS.
> 
This KCS, https://access.redhat.com/solutions/526303, has a workaround showing the customer how to enable it via a script (instead of modifying the database) and should work for customers with Direct LUNs.

> We can keep the RFE open to make this configurable via the UI, or to
> change the default behavior of Direct LUNs as per comment #14. Otherwise
> this can be closed.
It would be nice to have this behavior as the default for Direct LUNs. It does not sound like a lot of work for dev, but more for testing. However, it makes sense to have this configuration.
Tal, Avihai?

Comment 26 Avihai 2020-06-10 09:16:11 UTC
(In reply to Marina Kalinin from comment #25)
> (In reply to Germano Veit Michel from comment #17)
> > Thanks Marina!
> > 
> > I've tested it in 4.2.6 and wrote a script to manage this without using
> > the DB (the SQL edits are a bit ugly and customers need to open support
> > cases). I also rewrote the KCS.
> > 
> This KCS, https://access.redhat.com/solutions/526303, has a workaround
> showing the customer how to enable it via a script (instead of modifying
> the database) and should work for customers with Direct LUNs.
> 
> > We can keep the RFE open to make this configurable via the UI, or to
> > change the default behavior of Direct LUNs as per comment #14. Otherwise
> > this can be closed.
> It would be nice to have this behavior as the default for Direct LUNs. It
> does not sound like a lot of work for dev, but more for testing. However,
> it makes sense to have this configuration.
> Tal, Avihai?

To answer this question I need more details on how this will be implemented by DEV/PM/GSS, such as:

What is the most common use case? We need a clear verification scenario from DEV AFTER they announce what they can implement and how.

If all that is needed is to check that the VM does not pause when using a RO LUN, then sure, but it looks like much more than that.

How should the VM handle an EIO error without pausing, and what should be the expected result when testing this part?

Is this going to be added to the existing DR solution (as we already have a DR solution/Ansible script)?

As you can see, there are a lot of questions that rely on DEV/PM to answer/think about.

Comment 27 Nir Soffer 2020-07-28 12:47:58 UTC
I think the request makes sense, but it conflicts with the way we
handle errors on other types of disks.

For thin disks on block storage we must not propagate the error to the
guest. When we get an ENOSPC error, vdsm extends the disk and resumes
the VM. The guest should not be aware that the disk was extended;
anything else will break the guest.

For thin disks on file storage, the error may have been caused by a full
disk on the storage server. The storage admin can fix the issue, and
resuming the VM will then recover it. If we pass ENOSPC to the guest,
the guest will be broken.

For preallocated disks or LUNs, ENOSPC should not be possible, and we
don't have any way to fix it.

For other errors, I don't see why we should stop the VM or how this can
help, so maybe always propagating these errors should be the default.

Kevin, what do you think?

Comment 28 Kevin Wolf 2020-07-28 14:29:06 UTC
You don't have to use the same defaults for every type of disk if different configurations make sense for different disk types.

The idea with stopping the VM on I/O errors is for situations like the virtual disk being backed by a network connection that goes down temporarily (I'm sure you can think of other kinds of temporary failure, too). When a disk returns an error for a request, the guest usually assumes that its disk is broken and will never retry. If the VM is stopped instead, you can continue the guest as soon as the problem is fixed, and it will look to the guest as if there had never been a problem; but obviously the guest doesn't run and perform its job in the meantime.

There are probably valid use cases for both ways to respond to an I/O failure. This is what makes it policy, and why it's an option on the QEMU side.
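
On the QEMU command line this policy is exposed as the werror/rerror drive options (the option names and values are real; the command below is an illustrative sketch, not an RHV-generated invocation):

# 'stop' pauses the VM on an error, 'report' passes it to the guest,
# 'enospc' stops only on ENOSPC and reports everything else
qemu-kvm ... -drive file=/dev/mapper/<wwid>,format=raw,werror=report,rerror=report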

Comment 29 Nir Soffer 2020-07-28 17:04:45 UTC
Right, I don't think we can have any default that will work for all cases.

This must be configurable per disk, so users can configure the system
as needed.

Comment 34 Martin Tessun 2021-02-09 16:20:50 UTC
As there is a workaround that does provide the requested functionality, closing this one.

The feature works since RHV 4.4.3, as this release includes a system-wide
configuration to use the "report" error policy for Direct LUNs.

You can use something like:

    engine-config -s PropagateDiskErrors=true

With this configuration, Direct LUN disks will report errors to the guest
instead of pausing the VM.
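
To verify the policy took effect for a running VM, the disk's error policy can be inspected in the generated domain XML on the host (a sketch; the VM name is a placeholder):

# virsh -r dumpxml <vm-name> | grep error_policy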