Bug 1082673 - [engine] [RO-disk] Direct-LUN connected by Virt-IO-SCSI which is configured to be RO to a VM is writeable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Vered Volansky
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Depends On:
Blocks: 1095666 1097754
 
Reported: 2014-03-31 15:20 UTC by Elad
Modified: 2016-02-10 18:30 UTC (History)

Fixed In Version: org.ovirt.engine-root-3.4.0-17
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1097754 (view as bug list)
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:
amureini: Triaged+


Attachments
engine, vdsm, libvirt, qemu and sanlock logs (1.05 MB, application/x-gzip)
2014-03-31 15:20 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 27354 0 master MERGED core: Disallow RO LUN ISCSI disks 2021-02-11 12:58:17 UTC
oVirt gerrit 27452 0 ovirt-engine-3.4 MERGED core: Disallow RO LUN ISCSI disks 2021-02-11 12:58:17 UTC
oVirt gerrit 27453 0 None ABANDONED core: Disallow RO LUN ISCSI disks in UI 2021-02-11 12:58:17 UTC
oVirt gerrit 27454 0 ovirt-engine-3.4 MERGED core: Disallow RO disks to be attached to VM 2021-02-11 12:58:17 UTC
oVirt gerrit 27475 0 master MERGED core: Disallow RO disks to be attached to VM 2021-02-11 12:58:17 UTC
oVirt gerrit 27652 0 master MERGED core: Consolidating DiskValidator's RO validations 2021-02-11 12:58:17 UTC
oVirt gerrit 27665 0 master MERGED webadmin: Repair disallow RO in UI 2021-02-11 12:58:17 UTC

Description Elad 2014-03-31 15:20:27 UTC
Created attachment 880817
engine, vdsm, libvirt, qemu and sanlock logs

Description of problem:
Attached a direct LUN to a RHEL-6 VM as read-only using Virt-IO-SCSI. Writing to the disk with 'dd' from the guest succeeded.
I then attached the same direct LUN as RO to the same VM, this time via Virt-IO. Writing to the disk from the guest was rejected, as expected.

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.12.beta2.el6ev.noarch
vdsm-4.14.2-0.2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.7.x86_64
sanlock-2.8-1.el6.x86_64
On the guest - RHEL6.5

How reproducible:
Always

Steps to Reproduce:
On a shared DC
1. Create a VM with disk attached, install RHEL OS
2. Expose a LUN to the hosts, attach it to the setup
3. Attach the LUN to the VM as direct LUN via Virt-IO-SCSI as read-only
4. Try to write to the disk from the guest. I tried with 'dd':

# dd if=/dev/zero of=/dev/sdc bs=1K count=50


Actual results:

Data is written on the device when it is connected via Virt-IO-SCSI:

[root@localhost ~]# dd if=/dev/zero of=/dev/sdc bs=1K count=50
50+0 records in
50+0 records out
51200 bytes (51 kB) copied, 0.0255978 s, 2.0 MB/s



When connecting a direct LUN using Virt-IO as RO, the dd fails:

[root@localhost ~]# dd if=/dev/zero of=/dev/vde bs=1K count=50
dd: writing `/dev/vde': Operation not permitted
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00173086 s, 0.0 kB/s



==================

lsblk on the guest:

[root@localhost ~]# lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                     11:0    1 1024M  0 rom
vda                    252:0    0    7G  0 disk
├─vda1                 252:1    0  500M  0 part /boot
└─vda2                 252:2    0  6.5G  0 part
  ├─vg0-lv_root (dm-0) 253:0    0  3.1G  0 lvm  /
  ├─vg0-lv_swap (dm-1) 253:1    0  3.1G  0 lvm  [SWAP]
  └─vg0-lv_home (dm-2) 253:2    0  308M  0 lvm  /home
vdb                    252:16   0    1G  1 disk
vdc                    252:32   0    1G  1 disk
vdd                    252:48   0    1G  1 disk
sdc                      8:32   0   50G  0 disk
sdb                      8:16   0    1G  1 disk
vde                    252:64   0   50G  1 disk
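
The RO column in the listing above can also be checked mechanically. The following is a minimal sketch, assuming the lsblk column layout shown in this report (NAME, MAJ:MIN, RM, SIZE, RO, TYPE); the function name disks_by_ro_flag is hypothetical and the sample is an excerpt of the rows above.

```python
import re

# Excerpt of the lsblk output from the guest (disk and rom rows only).
LSBLK_OUTPUT = """\
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                     11:0    1 1024M  0 rom
vda                    252:0    0    7G  0 disk
vdb                    252:16   0    1G  1 disk
sdc                      8:32   0   50G  0 disk
sdb                      8:16   0    1G  1 disk
vde                    252:64   0   50G  1 disk
"""

# NAME may contain spaces, so anchor on the MAJ:MIN token instead.
ROW = re.compile(
    r"^(?P<name>.*?)\s+(?P<majmin>\d+:\d+)\s+(?P<rm>\d)\s+"
    r"(?P<size>\S+)\s+(?P<ro>[01])\s+(?P<type>\S+)"
)


def disks_by_ro_flag(lsblk_text):
    """Map each whole-disk name to True if the guest sees it as read-only."""
    result = {}
    for line in lsblk_text.splitlines()[1:]:  # skip the header row
        match = ROW.match(line)
        if match and match.group("type") == "disk":
            result[match.group("name").strip()] = match.group("ro") == "1"
    return result


flags = disks_by_ro_flag(LSBLK_OUTPUT)
print(flags)
```

In this excerpt sdc, the device written to with dd, carries RO=0, while vde carries RO=1.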

==================
Both disks are RO as passed by the XML request presented in vdsm.log:


Hotplug call to the Direct LUN which is connected via Virt-IO-SCSI:

Thread-7394::DEBUG::2014-03-31 17:11:18,534::vm::3565::vm.Vm::(hotplugDisk) vmId=`8e50d783-1973-49e7-861a-b530ca22aa74`::Hotplug disk xml: <disk device="lun" snapshot="no" type="block">
        <address bus="0" controller="0" target="0" type="drive" unit="1"/>
        <source dev="/dev/mapper/3514f0c59af40010b"/>
        <target bus="scsi" dev="sdd"/>
        <readonly/>
        <serial></serial>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>



Hotplug call to the Direct LUN which is connected via Virt-IO:

Thread-7546::DEBUG::2014-03-31 17:17:06,322::vm::3565::vm.Vm::(hotplugDisk) vmId=`8e50d783-1973-49e7-861a-b530ca22aa74`::Hotplug disk xml: <disk device="lun" snapshot="no" type="block">
        <source dev="/dev/mapper/3514f0c59af40010c"/>
        <target bus="virtio" dev="vdf"/>
        <readonly/>
        <serial></serial>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>
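
To compare the two hotplug requests side by side, the XML can be parsed rather than read by eye. This is a small sketch using Python's standard xml.etree.ElementTree; the helper name summarize_disk is hypothetical, and the XML strings are copied from the vdsm.log excerpts above.

```python
import xml.etree.ElementTree as ET

SCSI_DISK_XML = """<disk device="lun" snapshot="no" type="block">
        <address bus="0" controller="0" target="0" type="drive" unit="1"/>
        <source dev="/dev/mapper/3514f0c59af40010b"/>
        <target bus="scsi" dev="sdd"/>
        <readonly/>
        <serial></serial>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>"""

VIRTIO_DISK_XML = """<disk device="lun" snapshot="no" type="block">
        <source dev="/dev/mapper/3514f0c59af40010c"/>
        <target bus="virtio" dev="vdf"/>
        <readonly/>
        <serial></serial>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>"""


def summarize_disk(xml_text):
    """Extract the device kind, target bus, and read-only flag of a <disk>."""
    disk = ET.fromstring(xml_text)
    return {
        "device": disk.get("device"),
        "bus": disk.find("target").get("bus"),
        # <readonly/> is an empty element; its presence is the flag.
        "readonly": disk.find("readonly") is not None,
    }


scsi_summary = summarize_disk(SCSI_DISK_XML)
virtio_summary = summarize_disk(VIRTIO_DISK_XML)
print(scsi_summary)
print(virtio_summary)
```

Both summaries report readonly as true, so the two requests differ only in the target bus (and the address element).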


libvirt.log:

2014-03-31 14:17:06.344+0000: 2818: debug : qemuMonitorAddDrive:2708 : mon=0x7f01500da980 drive=file=/dev/mapper/3514f0c59af40010c,if=none,id=drive-virtio-disk5,readonly=on,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native


==================

Expected results:
RO direct LUN disk connected via Virt-IO-SCSI is supposed to be write protected.

=================

Additional info: engine, vdsm, libvirt, qemu and sanlock logs
(note the 3-hour time difference between vdsm.log and libvirt.log)

Comment 1 Elad 2014-04-01 06:20:05 UTC
Note that an RO image disk (not a direct LUN) connected via Virt-IO-SCSI is write protected, as it should be.

Comment 2 Allon Mureinik 2014-04-01 10:04:17 UTC
Need to check if we're passing the RO flag properly in the direct lun scenario.
If we are, this should be moved to a lower component than RHEV.

Comment 3 Vered Volansky 2014-04-24 07:46:41 UTC
(In reply to Allon Mureinik from comment #2)
> Need to check if we're passing the RO flag properly in the direct lun
> scenario.
> If we are, this should be moved to a lower component than RHEV.

We are.
Some more related log snippets to prove that:

libvirt.log for the VirtIO-SCSI scenario:

2014-03-31 14:11:18.545+0000: 2822: debug : virDomainAttachDevice:9677 : dom=0x7f0154008670, (VM: name=1, uuid=8e50d783-1973-49e7-861a-b530ca22aa74), xml=<disk device="lun" snapshot="no" type="block">
        <address bus="0" controller="0" target="0" type="drive" unit="1"/>
        <source dev="/dev/mapper/3514f0c59af40010b"/>
        <target bus="scsi" dev="sdd"/>
        <readonly/>
        <serial></serial>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>

2014-03-31 14:11:18.551+0000: 2822: debug : qemuMonitorAddDrive:2708 : mon=0x7f01500da980 drive=file=/dev/mapper/3514f0c59af40010b,if=none,id=drive-scsi0-0-0-1,readonly=on,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native
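
The drive string that libvirt hands to the QEMU monitor can be split into key/value pairs to confirm that readonly=on is present. A minimal sketch follows; parse_drive_options is a hypothetical helper, and it assumes no value contains a comma (QEMU option strings escape commas by doubling them, which this sketch does not handle).

```python
def parse_drive_options(drive):
    """Split a comma-separated key=value drive string into a dict."""
    options = {}
    for item in drive.split(","):
        # partition() keeps everything after the first "=" as the value,
        # and yields an empty value for entries such as "serial=".
        key, _, value = item.partition("=")
        options[key] = value
    return options


# Value of "drive=" from the qemuMonitorAddDrive line above.
DRIVE = (
    "file=/dev/mapper/3514f0c59af40010b,if=none,id=drive-scsi0-0-0-1,"
    "readonly=on,format=raw,serial=,cache=none,werror=stop,rerror=stop,"
    "aio=native"
)

opts = parse_drive_options(DRIVE)
print(opts["id"], opts["readonly"])
```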

engine.log on VM creation with this device attached as RO:

{shared=false, iface=scsi, GUID=3514f0c59af40010c, address={unit=3, bus=0, target=0, controller=0, type=drive}, specParams={}, optional=false, propagateErrors=off, device=lun, format=raw, sgio=unfiltered, type=disk, readonly=true, deviceId=93a1d9ef-d226-4661-aaf8-60e65bf51c93}

(Time of last log message is: 2014-03-31 16:26:01,437 INFO )

Comment 4 Vered Volansky 2014-04-24 07:52:34 UTC
Engine passes RO=true for the device to vdsm, and vdsm passes this value correctly to libvirt. So this is a problem with qemu's handling of this value. Any chance this was already fixed in a newer qemu version? If not, in which version should we expect a fix?

Comment 5 Vered Volansky 2014-04-24 07:54:52 UTC
Note that if this is not supported yet, we need to block this option in the engine until it's resolved.

Comment 6 Allon Mureinik 2014-05-04 06:03:48 UTC
Elad, I'm trying to understand the scope here.

Does this reproduce with any Direct LUN (regardless of the interface)? Does this reproduce with image disks with VirtIO-SCSI?

Comment 7 Elad 2014-05-04 06:57:44 UTC
(In reply to Allon Mureinik from comment #6)
> Elad, I'm trying to understand the scope here.
> 
> Does this reproduce with any Direct LUN (regardless of the interface)? Does
> this reproduce with image disks with VirtIO-SCSI?

Reproduced only when using direct LUN connected with VirtIO-SCSI, as stated in comments #0 and #1

Comment 8 Allon Mureinik 2014-05-04 07:58:56 UTC
(In reply to Elad from comment #7)
> (In reply to Allon Mureinik from comment #6)
> > Elad, I'm trying to understand the scope here.
> > 
> > Does this reproduce with any Direct LUN (regardless of the interface)? Does
> > this reproduce with image disks with VirtIO-SCSI?
> 
> Reproduced only when using direct LUN connected with VirtIO-SCSI, as stated
> in comments #0 and #1
Missed that.
Thanks Elad!

Comment 9 Allon Mureinik 2014-05-04 08:00:01 UTC
Restoring the needinfo on Paolo, which was removed by mistake, to address the question in comment 4.

Comment 10 Allon Mureinik 2014-05-04 13:21:24 UTC
The patch in the external tracker is a last resort - it disables RO VirtIO SCSI Direct LUNs.
We're working on it as a contingency plan, but I'd much prefer to support this.

Comment 11 Paolo Bonzini 2014-05-05 08:34:15 UTC
I think disabling read-only virtio-scsi direct LUNs is actually the right fix, because the fix is not trivial.  What is the usecase for read-only direct LUNs (in general, not just virtio-scsi)?

Making a SCSI LUN read-only requires intercepting passed-through commands and modifying the results (for example change the inquiry data to report the LUN as read-only, and QEMU never needed to do this yet).

Is QEMU running as root?  If so, one problem is that the kernel doesn't filter write commands on read-only file descriptors when the process has CAP_SYS_RAWIO.  This would be a separate fix.

But even if QEMU is not running as root, the kernel will treat the disk as writable and write data to the page cache, only to fail later at writeback time.  If QEMU is not running as root, you should notice that adding "oflag=direct" causes writes to fail directly, because they bypass the page cache.
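
The difference between a buffered write and a direct write described above can be sketched in Python. This is an illustration only, not a test from this bug: try_write is a hypothetical helper, os.O_DIRECT is Linux-specific, and pointing it at a real block device is destructive. The demonstration below therefore runs only against a scratch file in buffered mode.

```python
import errno
import mmap
import os
import tempfile


def try_write(path, direct=False, size=4096):
    """Write `size` zero bytes to `path`; return None or the errno."""
    flags = os.O_WRONLY
    if direct:
        flags |= os.O_DIRECT  # bypass the page cache (Linux only)
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return exc.errno
    # An anonymous mmap is page-aligned and zero-filled, which satisfies
    # the buffer alignment that O_DIRECT requires.
    buf = mmap.mmap(-1, size)
    try:
        os.write(fd, buf)
        # Force writeback so a buffered write that only reached the page
        # cache gets a chance to report its error here.
        os.fsync(fd)
    except OSError as exc:
        return exc.errno  # e.g. EPERM, as in the Virt-IO dd output
    finally:
        buf.close()
        os.close(fd)
    return None


# Safe demonstration on a scratch file, which accepts the write.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch_path = scratch.name
buffered_result = try_write(scratch_path)
os.unlink(scratch_path)
missing_result = try_write(scratch_path)  # file is gone now
print(buffered_result, errno.errorcode.get(missing_result))
```

On a guest reproducing this bug, one would compare try_write(device) with try_write(device, direct=True), which mirrors running dd without and with oflag=direct.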

Comment 12 Dan Kenigsberg 2014-05-06 10:41:40 UTC
(It does not matter much, but for the record, oVirt runs qemu as non-root (user qemu))

Comment 13 Paolo Bonzini 2014-05-06 15:57:06 UTC
Thanks. Allon, can you check that "oflag=direct" works, and create a qemu-kvm RFE for this?  vdsm can disable the RO LUNs in the meanwhile.

Comment 14 Allon Mureinik 2014-05-08 07:39:16 UTC
The functionality to block "illegal" configurations is merged. The UX can be improved, but should not block 3.4.0.

Comment 15 Allon Mureinik 2014-05-08 10:24:25 UTC
(In reply to Paolo Bonzini from comment #13)
> Thanks. Allon, can you check that "oflag=direct" works, and create a
> qemu-kvm RFE for this?  vdsm can disable the RO LUNs in the meanwhile.

Created bug 1095663.
Thanks a lot for all your help!

Comment 16 Ori Gofen 2014-05-12 14:33:55 UTC
Verified in av9. Trying to create a read-only VirtIO-SCSI LUN disk fails with:

"Cannot add Virtual Machine Disk. A VirtIO-SCSI LUN disk can't be read-only."
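
The rule being verified here can be summarized as a small validation sketch. This is not the engine's actual code (the merged patches are in ovirt-engine itself); DiskRequest, its fields, and the interface labels are hypothetical, and only the error text is taken from this comment.

```python
from dataclasses import dataclass

RO_LUN_MESSAGE = (
    "Cannot add Virtual Machine Disk. "
    "A VirtIO-SCSI LUN disk can't be read-only."
)


@dataclass
class DiskRequest:
    interface: str       # hypothetical labels: "VirtIO" or "VirtIO-SCSI"
    is_direct_lun: bool  # True for a direct LUN, False for an image disk
    read_only: bool


def validate(disk):
    """Return the error message for the blocked combination, else None."""
    if disk.interface == "VirtIO-SCSI" and disk.is_direct_lun and disk.read_only:
        return RO_LUN_MESSAGE
    return None


# Only the combination reported in this bug is rejected.
print(validate(DiskRequest("VirtIO-SCSI", True, True)))
print(validate(DiskRequest("VirtIO", True, True)))
print(validate(DiskRequest("VirtIO-SCSI", False, True)))
```

The two accepted cases correspond to the configurations reported as correctly write protected in the description and in comment 1.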

Comment 17 Allon Mureinik 2014-05-14 13:08:22 UTC
This bug is verified. I've opened bug 1097754 to continue tracking disabling such illegal configurations in the GUI.

Comment 18 Itamar Heim 2014-06-12 14:11:10 UTC
Closing as part of 3.4.0

Comment 19 Rob Washburn 2014-07-10 15:08:12 UTC
(In reply to Paolo Bonzini from comment #11)
> I think disabling read-only virtio-scsi direct LUNs is actually the right
> fix, because the fix is not trivial.  What is the usecase for read-only
> direct LUNs (in general, not just virtio-scsi)?
> 
> Making a SCSI LUN read-only requires intercepting passed-through commands
> and modifying the results (for example change the inquiry data to report the
> LUN as read-only, and QEMU never needed to do this yet).
> 
> Is QEMU running as root?  If so, one problem is that the kernel doesn't
> filter write commands on read-only file descriptors when the process has
> CAP_SYS_RAWIO.  This would be a separate fix.
> 
> But even if QEMU is not running as root, the kernel will treat the disk as
> writable and write data to the page cache, only to fail later at writeback
> time.  If QEMU is not running as root, you should notice that adding
> "oflag=direct" causes writes to fail directly, because they bypass the page
> cache.

One important use case for direct LUN read-only with virtio-SCSI disks is CloudForms smart-state analysis, which requires the presentation of all LUNs backing a RHEV 3.4 storage domain to the CloudForms appliance via the Direct LUN read-only functionality that is native to RHEV 3.4.

It is not uncommon for RHEV storage domains in large deployments to be backed by large numbers of smaller LUNs.  In the CloudForms smart-state analysis scenario, which requires presenting the backing LUNs to the CloudForms appliance via direct LUN read-only, it is very easy to exceed the QEMU PCI device limit of the CloudForms appliance VM if you are constrained to using virtio-blk.

Presenting direct LUNs read-only to the CloudForms appliance as virtio-scsi devices is a natural way to get around the QEMU PCI device limitation problem, as now the CloudForms appliance can conduct smart-state analysis of RHEV VMs across hundreds of storage domain backing LUNs.

Because of the scope of access that CloudForms has to RHEV VM data across the enterprise, it is essential that it be confined to read-only access for the VM disks that it is analyzing.  If the direct LUN presentation of storage domain backing LUNs is the only mechanism for conducting smart-state analysis of RHEV VMs, and if virtio-SCSI is a necessity for deploying CloudForms on large RHEV environments, then RHEV's direct LUN read-only functionality needs to work with virtio-SCSI disks.

Comment 20 Paolo Bonzini 2014-07-10 15:27:28 UTC
RHEV's direct LUN functionality has the limitation of always doing SCSI passthrough, and not presenting the choice between emulation in QEMU and passthrough.

For the simple usecase of "might exceed the PCI device limit", emulation is a better match, and it already supports read-only.

Comment 21 Rob Washburn 2014-07-10 21:14:12 UTC
(In reply to Paolo Bonzini from comment #20)
> RHEV's direct LUN functionality has the limitation of always doing SCSI
> passthrough, and not presenting the choice between emulation in QEMU and
> passthrough.
> 
> For the simple usecase of "might exceed the PCI device limit", emulation is
> a better match, and it already supports read-only.

By SCSI emulation do you mean using RHEV's mechanism of sharing disks between VMs (which does support the use of a read-only flag)?

If so, the major catch is that CloudForms smart-state analysis functionality expects the LUNs backing the storage domain to be presented via Direct LUN. I don't know what CloudForms would do if VM disks were directly shared with it via RHEV's functionality.  If it worked, it would change the import/discovery process of VM disks at the CloudForms level.

Comment 22 Paolo Bonzini 2014-07-11 10:49:18 UTC
> By SCSI emulation do you mean using RHEV's mechanism of sharing disks between 
> VMs (which does support the use of a read-only flag)?

I don't know.  I mean using <disk ... device='disk'> instead of <disk ... device='lun'> in the libvirt domain XML.
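
In the hotplug XML quoted in the description, the passthrough request appears as device="lun" on a type="block" disk; in libvirt, the emulated alternative is device="disk". The sketch below rewrites that one attribute with the standard xml.etree.ElementTree module. The helper name use_emulated_scsi is hypothetical, and whether RHEV or vdsm exposes such a switch is not stated in this thread.

```python
import xml.etree.ElementTree as ET

# Hotplug XML for the VirtIO-SCSI direct LUN, as logged by vdsm.
PASSTHROUGH_XML = """<disk device="lun" snapshot="no" type="block">
        <address bus="0" controller="0" target="0" type="drive" unit="1"/>
        <source dev="/dev/mapper/3514f0c59af40010b"/>
        <target bus="scsi" dev="sdd"/>
        <readonly/>
        <serial></serial>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>"""


def use_emulated_scsi(disk_xml):
    """Return the disk XML with SCSI passthrough replaced by emulation."""
    disk = ET.fromstring(disk_xml)
    if disk.get("device") == "lun":
        # device="lun" asks for SCSI command passthrough;
        # device="disk" lets QEMU emulate the SCSI disk instead.
        disk.set("device", "disk")
    return ET.tostring(disk, encoding="unicode")


emulated_xml = use_emulated_scsi(PASSTHROUGH_XML)
emulated = ET.fromstring(emulated_xml)
print(emulated.get("device"), emulated.find("target").get("bus"))
```

The block source, the scsi target bus, and the readonly element are left untouched; only the device attribute changes.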

