Created attachment 880817 [details]
engine, vdsm, libvirt, qemu and sanlock logs

Description of problem:
Attached a direct LUN to a RHEL-6 VM via VirtIO-SCSI as read-only. I tried to write to the disk using 'dd' from the guest, and the write succeeded.
I then attached the same direct LUN as RO to the same VM, only via VirtIO. Writing to the disk from the guest was not allowed, as expected.

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.12.beta2.el6ev.noarch
vdsm-4.14.2-0.2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.7.x86_64
sanlock-2.8-1.el6.x86_64
On the guest - RHEL 6.5

How reproducible:
Always

Steps to Reproduce (on a shared DC):
1. Create a VM with a disk attached, install a RHEL OS
2. Expose a LUN to the hosts, attach it to the setup
3. Attach the LUN to the VM as a direct LUN via VirtIO-SCSI as read-only
4. Try to write to the disk from the guest. I tried with 'dd':
# dd if=/dev/zero of=/dev/sdc bs=1K count=50

Actual results:
Data is written to the device when it is connected via VirtIO-SCSI:

[root@localhost ~]# dd if=/dev/zero of=/dev/sdc bs=1K count=50
50+0 records in
50+0 records out
51200 bytes (51 kB) copied, 0.0255978 s, 2.0 MB/s

When the direct LUN is connected as RO via VirtIO, the dd fails as expected:

[root@localhost ~]# dd if=/dev/zero of=/dev/vde bs=1K count=50
dd: writing `/dev/vde': Operation not permitted
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00173086 s, 0.0 kB/s

==================
lsblk on the guest:

[root@localhost ~]# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                      11:0    1 1024M  0 rom
vda                     252:0    0    7G  0 disk
├─vda1                  252:1    0  500M  0 part /boot
└─vda2                  252:2    0  6.5G  0 part
  ├─vg0-lv_root (dm-0)  253:0    0  3.1G  0 lvm  /
  ├─vg0-lv_swap (dm-1)  253:1    0  3.1G  0 lvm  [SWAP]
  └─vg0-lv_home (dm-2)  253:2    0  308M  0 lvm  /home
vdb                     252:16   0    1G  1 disk
vdc                     252:32   0    1G  1 disk
vdd                     252:48   0    1G  1 disk
sdc                       8:32   0   50G  0 disk
sdb                       8:16   0    1G  1 disk
vde                     252:64   0   50G  1 disk

==================
Both disks are RO as passed by the
XML request presented in vdsm.log:

Hotplug call for the direct LUN connected via VirtIO-SCSI:

Thread-7394::DEBUG::2014-03-31 17:11:18,534::vm::3565::vm.Vm::(hotplugDisk) vmId=`8e50d783-1973-49e7-861a-b530ca22aa74`::Hotplug disk xml: <disk device="lun" snapshot="no" type="block">
    <address bus="0" controller="0" target="0" type="drive" unit="1"/>
    <source dev="/dev/mapper/3514f0c59af40010b"/>
    <target bus="scsi" dev="sdd"/>
    <readonly/>
    <serial></serial>
    <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>

Hotplug call for the direct LUN connected via VirtIO:

Thread-7546::DEBUG::2014-03-31 17:17:06,322::vm::3565::vm.Vm::(hotplugDisk) vmId=`8e50d783-1973-49e7-861a-b530ca22aa74`::Hotplug disk xml: <disk device="lun" snapshot="no" type="block">
    <source dev="/dev/mapper/3514f0c59af40010c"/>
    <target bus="virtio" dev="vdf"/>
    <readonly/>
    <serial></serial>
    <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>

libvirt.log:

2014-03-31 14:17:06.344+0000: 2818: debug : qemuMonitorAddDrive:2708 : mon=0x7f01500da980 drive=file=/dev/mapper/3514f0c59af40010c,if=none,id=drive-virtio-disk5,readonly=on,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native

==================
Expected results:
An RO direct LUN disk connected via VirtIO-SCSI is supposed to be write protected.

=================
Additional info:
engine, vdsm, libvirt, qemu and sanlock logs (note the time difference of 3 hours between vdsm.log and libvirt.log)
Note that an RO image disk (not a direct LUN) connected via VirtIO-SCSI is write protected, as it should be.
Need to check if we're passing the RO flag properly in the direct lun scenario. If we are, this should be moved to a lower component than RHEV.
(In reply to Allon Mureinik from comment #2)
> Need to check if we're passing the RO flag properly in the direct lun
> scenario.
> If we are, this should be moved to a lower component than RHEV.

We are. Some more related log snippets to prove that:

libvirt.log for the iSCSI scenario:

2014-03-31 14:11:18.545+0000: 2822: debug : virDomainAttachDevice:9677 : dom=0x7f0154008670, (VM: name=1, uuid=8e50d783-1973-49e7-861a-b530ca22aa74), xml=<disk device="lun" snapshot="no" type="block">
    <address bus="0" controller="0" target="0" type="drive" unit="1"/>
    <source dev="/dev/mapper/3514f0c59af40010b"/>
    <target bus="scsi" dev="sdd"/>
    <readonly/>
    <serial></serial>
    <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
</disk>

2014-03-31 14:11:18.551+0000: 2822: debug : qemuMonitorAddDrive:2708 : mon=0x7f01500da980 drive=file=/dev/mapper/3514f0c59af40010b,if=none,id=drive-scsi0-0-0-1,readonly=on,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native

engine.log on VM creation with this device attached as RO:

{shared=false, iface=scsi, GUID=3514f0c59af40010c, address={unit=3, bus=0, target=0, controller=0, type=drive}, specParams={}, optional=false, propagateErrors=off, device=lun, format=raw, sgio=unfiltered, type=disk, readonly=true, deviceId=93a1d9ef-d226-4661-aaf8-60e65bf51c93}

(Time of last log message: 2014-03-31 16:26:01,437 INFO)
The engine passes RO=true for the device to vdsm, and vdsm passes this value correctly to libvirt, so this is a problem with qemu's handling of the value. Any chance this was already fixed in a newer qemu version? If not, in which version should we expect it?
Note that if this is not supported yet, we need to block this option in the engine until it's resolved.
Elad, I'm trying to understand the scope here. Does this reproduce with any Direct LUN (regardless of the interface)? Does this reproduce with image disks with VirtIO-SCSI?
(In reply to Allon Mureinik from comment #6)
> Elad, I'm trying to understand the scope here.
>
> Does this reproduce with any Direct LUN (regardless of the interface)? Does
> this reproduce with image disks with VirtIO-SCSI?

Reproduced only when using a direct LUN connected with VirtIO-SCSI, as stated in comments #0 and #1.
(In reply to Elad from comment #7)
> (In reply to Allon Mureinik from comment #6)
> > Elad, I'm trying to understand the scope here.
> >
> > Does this reproduce with any Direct LUN (regardless of the interface)? Does
> > this reproduce with image disks with VirtIO-SCSI?
>
> Reproduced only when using direct LUN connected with VirtIO-SCSI, as stated
> in comments #0 and #1

Missed that. Thanks, Elad!
Restoring the needinfo on Paolo, which was removed by mistake, to address the question in comment 4.
The patch in the external tracker is a last resort - it disables RO VirtIO SCSI Direct LUNs. We're working on it as a contingency plan, but I'd much prefer to support this.
I think disabling read-only virtio-scsi direct LUNs is actually the right fix, because the real fix is not trivial. What is the use case for read-only direct LUNs (in general, not just virtio-scsi)?

Making a SCSI LUN read-only requires intercepting passed-through commands and modifying the results (for example, changing the inquiry data to report the LUN as read-only), and QEMU has never needed to do this yet.

Is QEMU running as root? If so, one problem is that the kernel doesn't filter write commands on read-only file descriptors when the process has CAP_SYS_RAWIO. This would be a separate fix.

But even if QEMU is not running as root, the guest kernel will treat the disk as writable and write data to the page cache, only to fail later at writeback time. If QEMU is not running as root, you should notice that adding "oflag=direct" causes writes to fail directly, because they bypass the page cache.
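The check Paolo suggests can be sketched as a guest-side script. This is illustrative, not captured from the report: the DEV default and the scratch-file path are assumptions, and inside the guest you would point DEV at the VirtIO-SCSI direct LUN (/dev/sdc in comment 0) instead.

```shell
#!/bin/sh
# Sketch of the buffered-vs-direct write check (assumed names throughout).
# Inside the guest, run as root with DEV=/dev/sdc (the direct LUN from
# comment 0); the scratch-file default just keeps the sketch runnable.
DEV=${DEV:-/tmp/ro-lun-sketch.bin}

# Buffered write: data lands in the guest page cache first, so this can
# appear to succeed even when the host-side descriptor is read-only
# (the failure would only surface later, at writeback time).
dd if=/dev/zero of="$DEV" bs=1K count=50 conv=notrunc

# Direct write: oflag=direct bypasses the page cache, so a read-only
# host-side descriptor should make the failure visible immediately
# (when QEMU is not running as root).
dd if=/dev/zero of="$DEV" bs=1K count=50 conv=notrunc oflag=direct \
  || echo "direct write failed (expected on a truly read-only LUN)"
```

On the scratch-file default both writes succeed; the interesting case is the real LUN, where only the oflag=direct write reports the error promptly.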
(It does not matter much, but for the record, oVirt runs qemu as non-root (user qemu))
Thanks. Allon, can you check that "oflag=direct" works, and create a qemu-kvm RFE for this? vdsm can disable the RO LUNs in the meanwhile.
The functionality to block "illegal" configurations is merged. The UX can be improved, but should not block 3.4.0.
(In reply to Paolo Bonzini from comment #13)
> Thanks. Allon, can you check that "oflag=direct" works, and create a
> qemu-kvm RFE for this? vdsm can disable the RO LUNs in the meanwhile.

Created bug 1095663.

Thanks a lot for all your help!
Verified in av9. Trying to create a read-only VirtIO-SCSI LUN disk now fails with:
"Cannot add Virtual Machine Disk. A VirtIO-SCSI LUN disk can't be read-only."
This bug is verified. I've opened bug 1097754 to continue tracking disabling such illegal configurations in the GUI.
Closing as part of 3.4.0
(In reply to Paolo Bonzini from comment #11)
> I think disabling read-only virtio-scsi direct LUNs is actually the right
> fix, because the fix is not trivial. What is the usecase for read-only
> direct LUNs (in general, not just virtio-scsi)?

One important use case for read-only direct LUNs with virtio-scsi disks is CloudForms smart-state analysis, which requires presenting all the LUNs backing a RHEV 3.4 storage domain to the CloudForms appliance via the direct LUN read-only functionality that is native to RHEV 3.4. It is not uncommon for RHEV storage domains in large deployments to be backed by large numbers of smaller LUNs.

In that scenario it is very easy to exceed the QEMU PCI device limit of the CloudForms appliance VM if you are constrained to virtio-blk. Presenting the direct LUNs read-only to the appliance as virtio-scsi devices is a natural way around the PCI device limit, since the appliance can then conduct smart-state analysis of RHEV VMs across hundreds of storage domain backing LUNs.
Because of the scope of access that CloudForms has to RHEV VM data across the enterprise, it is essential that it be confined to read-only access to the VM disks it is analyzing. If direct LUN presentation of the storage domain backing LUNs is the only mechanism for conducting smart-state analysis of RHEV VMs, and if virtio-scsi is a necessity for deploying CloudForms on large RHEV environments, then RHEV's direct LUN read-only functionality needs to work with virtio-scsi disks.
RHEV's direct LUN functionality has the limitation of always doing SCSI passthrough, and not presenting the choice between emulation in QEMU and passthrough. For the simple use case of "might exceed the PCI device limit", emulation is a better match, and it already supports read-only.
(In reply to Paolo Bonzini from comment #20)
> RHEV's direct LUN functionality has the limitation of always doing SCSI
> passthrough, and not presenting the choice between emulation in QEMU and
> passthrough.
>
> For the simple usecase of "might exceed the PCI device limit", emulation is
> a better match, and it already supports read-only.

By SCSI emulation, do you mean using RHEV's mechanism of sharing disks between VMs (which does support a read-only flag)? If so, the major catch is that CloudForms smart-state analysis expects the LUNs backing the storage domain to be presented via direct LUN - I don't know what CloudForms would do if VM disks were instead shared with it directly via that mechanism. If it worked, it would change the import/discovery process of VM disks at the CloudForms level.
> By SCSI emulation do you mean using RHEV's mechanism of sharing disks between
> VMs (which does support the use of a read-only flag)?

I don't know. I mean using <disk ... type='block'> instead of <disk ... type='lun'> in the libvirt domain XML.
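A minimal sketch of the two variants Paolo contrasts, reusing the multipath device path from this report. Note that in libvirt's domain XML the switch is the device attribute (device='lun' requests SCSI command passthrough, device='disk' an emulated disk); both variants keep type='block':

```xml
<!-- Passthrough: what RHEV generates for a direct LUN today. Guest SCSI
     commands reach the LUN via SG_IO, so <readonly/> cannot be enforced
     without filtering the passed-through commands. -->
<disk type="block" device="lun">
  <source dev="/dev/mapper/3514f0c59af40010b"/>
  <target bus="scsi" dev="sdd"/>
  <readonly/>
</disk>

<!-- Emulation: same backing block device, but QEMU emulates the SCSI
     disk itself, so <readonly/> is honored. -->
<disk type="block" device="disk">
  <source dev="/dev/mapper/3514f0c59af40010b"/>
  <target bus="scsi" dev="sdd"/>
  <readonly/>
</disk>
```

This is why emulation already covers the "too many PCI devices" use case from comment 19 without the non-trivial read-only passthrough work.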