Bug 1314160 - [RFE] -EIO on RO Direct LUN pauses VM. For DR setups it may be desirable to let the VM handle the error.
Status: NEW
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: RFEs
Version: 3.5.7
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Scott Herold
QA Contact: Raz Tamir
Keywords: FutureFeature
Depends On:
Blocks:
Reported: 2016-03-03 00:48 EST by Germano Veit Michel
Modified: 2017-05-28 09:08 EDT

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Germano Veit Michel 2016-03-03 00:48:08 EST
1. Proposed title of this feature request  

Option to let the VM handle EIO on Direct LUNs instead of pausing it.
   
3. What is the nature and description of the request?  

- Important application data is kept on Direct LUN B, attached to a VM.
- The storage side replicates LUN B to LUN A, but LUN A is read-only.
- The DR VM is kept running with Direct LUN A attached.

Illustration:

+---------------------------+
|                           |  D. LUN  +--+
|LUN A     STORAGE     LUN B+--------->+VM|
|  +                        |    RW    +--+
+---------------------------+
   |                           D. LUN  +--+
   +---------------------------------->+DR|
                                 RO    |VM|
                                       +--+


If the DR VM tries to write to its Direct LUN, which is RO on the storage side, the VM is paused with EIO. This is the expected behavior from RHEV at this point, since the LUN was not marked as Read-Only.

Since this is a Direct LUN, perhaps we should pass the errors to the VM and let it deal with them. Apparently this is what VMware and Hyper-V do. This is currently preventing the deployment of a Disaster Recovery setup in RHEV, as the DR VM sporadically tries to write to the LUN (especially during system updates; I was not able to reproduce this part).

Possible solution:
- Have an option in RHEV-M, off by default, to let the VM deal with EIO errors on Direct LUNs (they are direct, after all) instead of pausing it. It then becomes the VM's problem, and it is up to the sysadmin to make sure no data is lost.
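
For illustration only, here is a minimal Python sketch of what such an option might translate to at the libvirt level. It relies on the `error_policy` attribute of the disk `<driver>` element in libvirt domain XML, whose documented values are `stop`, `report`, `ignore` and `enospace`. The function name `build_direct_lun_disk_xml`, the `virtio` bus and the default of `stop` are assumptions made for this sketch; it is not how vdsm actually builds its disk XML.

```python
import xml.etree.ElementTree as ET

# Values documented for the error_policy attribute in libvirt domain XML.
VALID_ERROR_POLICIES = ("stop", "report", "ignore", "enospace")


def build_direct_lun_disk_xml(source_dev, target_dev, error_policy="stop"):
    """Return a <disk> element (as a string) for a block-backed Direct LUN.

    Hypothetical helper: error_policy="stop" pauses the VM on I/O errors,
    while "report" passes the error on to the guest.
    """
    if error_policy not in VALID_ERROR_POLICIES:
        raise ValueError("unsupported error_policy: %r" % (error_policy,))

    disk = ET.Element("disk", {"type": "block", "device": "disk"})
    ET.SubElement(disk, "driver", {
        "name": "qemu",
        "type": "raw",
        "cache": "none",
        "error_policy": error_policy,
    })
    ET.SubElement(disk, "source", {"dev": source_dev})
    # Bus choice is an assumption of this sketch.
    ET.SubElement(disk, "target", {"dev": target_dev, "bus": "virtio"})
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    # The requested option, when enabled, would select "report" for the LUN.
    print(build_direct_lun_disk_xml(
        "/dev/mapper/360014059fe638e559a2488aa4db314d9", "vdb",
        error_policy="report"))
```

With the option off, the helper keeps `stop`, which matches the pausing behavior described above.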

Additional info:
- Marking the LUN as Read-Only in RHEV-M works, but then when disaster happens the VM needs to be shut down, the LUN set to RW, and the VM started again.
- It is not trivial to guarantee that the VM never writes to the LUN, even if it is not mounted. System upgrades or applications may scan the disks and mount the LUN, and even a read-only mount may trigger a journal replay on some systems, for example.

Or perhaps we could work something out using the RO information from the host and do this automatically.

# lsblk
NAME                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                       8:0    0     5G  1 disk  <-------
`-360014059fe638e559a2488aa4db314d9     253:6    0     5G  0 mpath 
  `-360014059fe638e559a2488aa4db314d9p1 253:7    0     5G  0 part 
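
As a rough sketch of the automatic approach, the host-side RO flag shown by lsblk can also be read from sysfs (`/sys/block/<dev>/ro`). Note that in the output above only the underlying path (sda) reports RO=1 while the multipath map reports 0, so the sketch also checks the devices listed under `slaves/`. The function names, the `dm-6` device name and the mapping from RO to an error policy are assumptions for illustration, not existing vdsm behavior.

```python
import os


def _read_ro_flag(dev_name, sysfs_root):
    """Return True if sysfs reports the block device as read-only."""
    with open(os.path.join(sysfs_root, dev_name, "ro")) as f:
        return f.read().strip() == "1"


def lun_is_read_only(dev_name, sysfs_root="/sys/block"):
    """Check a device and, for device-mapper maps, its underlying paths.

    The multipath map itself may report RO=0 while the SCSI path below
    it reports RO=1, so every entry in slaves/ is checked as well.
    """
    if _read_ro_flag(dev_name, sysfs_root):
        return True
    slaves_dir = os.path.join(sysfs_root, dev_name, "slaves")
    if os.path.isdir(slaves_dir):
        return any(_read_ro_flag(slave, sysfs_root)
                   for slave in os.listdir(slaves_dir))
    return False


def suggested_error_policy(dev_name, sysfs_root="/sys/block"):
    """Hypothetical mapping: report errors to the guest for RO LUNs."""
    return "report" if lun_is_read_only(dev_name, sysfs_root) else "stop"


if __name__ == "__main__":
    # "dm-6" is an assumed kernel name for the multipath map (253:6).
    print(suggested_error_policy("dm-6"))
```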

Guest RO is currently handled manually via a check box in RHEV-M:

# lsblk
NAME                                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
vdb                                    252:16    0     5G  0 disk  <-------
`-test_pause_vg_lvol0                  253:2     0   160M  0 lvm
`-360014059fe638e559a2488aa4db314d9     253:6    0     5G  0 mpath 
  `-360014059fe638e559a2488aa4db314d9p1 253:7    0     5G  0 part 

      
4. Why does the customer need this? (List the business requirements here)  

- Because of this, the Disaster Recovery setup currently only works on VMware and Hyper-V.
- It is not easy to guarantee that VMs will never write to the RO Direct LUN while also keeping it in place and ready to be mounted RW in case a disaster happens.

10. List any affected packages or components.  

- qemu, libvirt, vdsm, rhev-m
