Bug 1557769
Summary: Start VM with direct LUN attached with SCSI Pass-Through enabled fails on libvirtError
Product: Red Hat Enterprise Linux 7
Reporter: Elad <ebenahar>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: yisun
Severity: urgent
Priority: urgent
Version: 7.5
CC: adevolder, agk, areis, bmarzins, bugs, chorn, coughlan, dyuan, ebenahar, famz, jbrassow, jdenemar, jherrman, jiyan, jmoyer, jsuchane, knoel, lmen, michal.skrivanek, minlei, mprivozn, msnitzer, mtessun, pbonzini, ratamir, redhat, salmy, skozina, srodrigu, tnisan, vgoyal, xuzhang, yafu
Target Milestone: pre-dev-freeze
Keywords: Automation, Regression, Upstream, ZStream
Target Release: ---
Hardware: x86_64
OS: Unspecified
Fixed In Version: libvirt-4.3.0-1.el7
Doc Type: Bug Fix
Doc Text:
In Red Hat Enterprise Linux 7.5, guests with SCSI passthrough enabled failed to boot because of changes in kernel CGroup detection. With this update, libvirt fetches the dependencies of device-mapper targets and adds them to the device CGroup. As a result, the affected guests now start as expected.
Story Points: ---
Clones: 1562960 1562962 1564996 1568441 (view as bug list)
Last Closed: 2018-10-30 09:53:14 UTC
Type: Bug
oVirt Team: Storage
Bug Blocks: 1564996
Elad, a first hit from a Google search on the string "cannot get SG_IO version number: Operation not permitted. Is this a SCSI device" led me to https://bugzilla.redhat.com/show_bug.cgi?id=1525829 - is that it? I'm not sure, but worth asking. Alternatively, can you indeed perform an sg query on the device? Which storage is it? (Lastly, I wonder if it has anything to do with the device aliases.)

Elad, to check whether this is a domain XML issue, please run the following on your database, restart the Engine, and try to reproduce:

update vdc_options set option_value=false where option_name='DomainXML' and version='4.2';

Tal, the bug still happens with DomainXML set to false for 4.2. I'll check on RHEL 7.4.

Created attachment 1409768 [details]
logs7.4
Tested on RHEL7.4, VM starts successfully with direct LUN with SCSI Pass-Through enabled.
Although, in the domain XML in vdsm.log, sgio is set to filtered, so I'm a bit confused:
...
</disk>
<disk device="lun" sgio="filtered" snapshot="no" type="block">
...
2018-03-19 11:42:12,613+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.create succeeded in 0.02 seconds (__init__:539)
qemu-kvm-tools-rhev-2.10.0-21.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-guest-agent-2.8.0-2.el7.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7_4.9.x86_64
vdsm-yajsonrpc-4.19.48-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.19.48-1.el7ev.noarch
libvirt-libs-3.2.0-14.el7_4.9.x86_64
vdsm-xmlrpc-4.19.48-1.el7ev.noarch
libvirt-daemon-driver-nwfilter-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
vdsm-cli-4.19.48-1.el7ev.noarch
libvirt-daemon-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7_4.9.x86_64
vdsm-hook-localdisk-4.19.48-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-img-rhev-2.10.0-21.el7.x86_64
vdsm-api-4.19.48-1.el7ev.noarch
qemu-kvm-common-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-storage-core-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7_4.9.x86_64
vdsm-hook-ethtool-options-4.19.48-1.el7ev.noarch
libvirt-3.2.0-14.el7_4.9.x86_64
vdsm-python-4.19.48-1.el7ev.noarch
libvirt-daemon-driver-network-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-config-network-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7_4.9.x86_64
ovirt-imageio-common-1.0.0-0.el7ev.noarch
libvirt-python-3.2.0-3.el7_4.1.x86_64
libvirt-client-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
vdsm-jsonrpc-4.19.48-1.el7ev.noarch
vdsm-4.19.48-1.el7ev.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7_4.9.x86_64
libvirt-lock-sanlock-3.2.0-14.el7_4.9.x86_64
I don't think this is a libvirt bug. Firstly, I dug out the domain XML from the attached logs. The interesting part is this:

<disk snapshot="no" type="block" device="lun" sgio="filtered">
  <target dev="sda" bus="scsi"/>
  <source dev="/dev/mapper/3514f0c5a51600274"/>
  <driver name="qemu" io="native" type="raw" error_policy="stop" cache="none"/>
  <alias name="ua-4cb96609-0cd7-498d-992f-5c7008dc4b17"/>
  <address bus="0" controller="0" unit="0" type="drive" target="0"/>
  <boot order="1"/>
</disk>
<controller type="scsi" model="virtio-scsi" index="0">
  <alias name="ua-f5d3bb3c-5607-4db6-bed9-949425a07b11"/>
</controller>

Other parts of the domain XML are just syntactic sugar from this bug's POV. Now, I am able to reproduce locally (of course, only if I replace /dev/mapper/... with another non-SCSI device). However, as soon as I pass a SCSI device (an iSCSI target in my testing), qemu is able to start again, regardless of user aliases. Having said that, I think this is a dup of the bug that Yaniv linked earlier. What kind of device is /dev/mapper/3514f0c5a51600274?

Checked with Kevin Wolf that Bug 1525829 is only about improving the error message.

Elad,

Can you please confirm that /dev/mapper/3514f0c5a51600274 is a device that accepts the SG_IO ioctl?

Also, worth understanding if it happens with 4.1.10 and RHEL 7.5 hosts.

Created attachment 1410034 [details]
4.1-el7.5

(In reply to Ala Hino from comment #6)
> Can you please confirm that /dev/mapper/3514f0c5a51600274 is a device that
> accepts the SG_IO ioctl?

Ala, 3514f0c5a51600274 is a LUN provided by XtremIO. This was also tested with NetApp, with the same result.

(In reply to Yaniv Kaul from comment #7)
> Also, worth understanding if it happens with 4.1.10 and RHEL 7.5 hosts.
Yaniv,

The same on the latest 4.1.10 RHEL 7.5 host:

libvirt-daemon-driver-storage-gluster-3.9.0-14.el7.x86_64
vdsm-4.19.50-1.el7ev.x86_64
qemu-guest-agent-2.8.0-2.el7.x86_64
libvirt-daemon-driver-nwfilter-3.9.0-14.el7.x86_64
libvirt-daemon-driver-nodedev-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-14.el7.x86_64
vdsm-python-4.19.50-1.el7ev.noarch
libvirt-client-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-14.el7.x86_64
vdsm-xmlrpc-4.19.50-1.el7ev.noarch
vdsm-cli-4.19.50-1.el7ev.noarch
qemu-img-rhev-2.10.0-21.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-config-nwfilter-3.9.0-14.el7.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-kvm-common-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-storage-core-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.9.0-14.el7.x86_64
libvirt-daemon-3.9.0-14.el7.x86_64
libvirt-daemon-driver-interface-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-14.el7.x86_64
vdsm-api-4.19.50-1.el7ev.noarch
libvirt-libs-3.9.0-14.el7.x86_64
libvirt-lock-sanlock-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-14.el7.x86_64
libvirt-daemon-kvm-3.9.0-14.el7.x86_64
vdsm-hook-vmfex-dev-4.19.50-1.el7ev.noarch
vdsm-yajsonrpc-4.19.50-1.el7ev.noarch
libvirt-daemon-driver-network-3.9.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7.x86_64
vdsm-jsonrpc-4.19.50-1.el7ev.noarch
qemu-kvm-tools-rhev-2.10.0-21.el7.x86_64
kernel - 3.10.0-860.el7.x86_64

2018-03-19 19:05:39,355+0200 ERROR (vm/ecc627be) [virt.vm] (vmId='ecc627be-d05a-4846-ad27-d973d9b2524d') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2060, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3658, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-03-19T17:05:38.960301Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future

(In reply to Elad from comment #8)
> libvirtError: internal error: qemu unexpectedly closed the monitor:
> 2018-03-19T17:05:38.960301Z qemu-kvm: warning: All CPU(s) up to maxcpus
> should be described in NUMA config, ability to start up with partial NUMA
> mappings is obsoleted and will be removed in future

This is just a harmless warning.
The true error message is the one on the next line:

2018-03-19T17:05:39.035831Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Operation not permitted. Is this a SCSI device?

Unfortunately, it looks like /dev/mapper/3514f0c5a5160048f cannot handle SG_IO. What's the output of:

sginfo /dev/mapper/3514f0c5a5160048f

?

[root@storage-ge7-vdsm1 ~]# sginfo /dev/mapper/3514f0c5a5160048f
INQUIRY response (cmd: 0x12)
----------------------------
Device Type            0
Vendor:                XtremIO
Product:               XtremApp
Revision level:        40f0

The log says "operation not permitted", not "operation not supported". This could be incorrect cgroup management in libvirt. Moving the bug to libvirt.

Elad, can you please try to reproduce with libvirt out of the picture?

/usr/libexec/qemu-kvm \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
  -drive file=/dev/mapper/3514f0c5a5160048f,format=raw,if=none,id=drive-scsi0-0-0-1,werror=stop,rerror=stop,cache=none,aio=native \
  -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1

Also, are there any SELinux messages in the logs?

Elad, can you please also reproduce on a different setup?

Hi, sorry for the delay, we had a power outage here in the labs.
Michal Skrivanek, this was already reproduced on 3 environments (it happens every time): 4.2-el7.5-Netapp, 4.2-el7.5-Xtremio, 4.1-el7.4-Xtremio. See the comments above.
Michal Privoznik,
Seems like the VM starts successfully without libvirt:
[root@storage-ge13-vdsm1 ~]# /usr/libexec/qemu-kvm \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -drive file=/dev/mapper/3514f0c5a51601393,format=raw,if=none,id=drive-scsi0-0-0-1,werror=stop,rerror=stop,cache=none,aio=native \
> -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
warning: host doesn't support requested feature: CPUID.01H:ECX.cx16 [bit 13]
VNC server running on ::1:5900
[root@storage-ge13-vdsm1 ~]# ps aux |grep qemu
root 612 0.0 0.0 25036 1792 ? Ss 16:07 0:00 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook
root 20856 11.9 1.0 792380 59896 pts/0 Sl+ 17:22 0:18 /usr/libexec/qemu-kvm -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -drive file=/dev/mapper/3514f0c5a51601393,format=raw,if=none,id=drive-scsi0-0-0-1,werror=stop,rerror=stop,cache=none,aio=native -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
Sorry, on 4.1-el7.4-Xtremio the bug didn't reproduce (comment #4).

Just to put the findings of my investigation somewhere before I forget them. Here's a minimalistic domain XML which reproduces the bug:

<domain type='kvm'>
  <name>testdom</name>
  <uuid>9ecd05ac-a83d-497b-a9ab-a523b6239d73</uuid>
  <memory unit='KiB'>262144</memory>
  <currentMemory unit='KiB'>262144</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.5.0'>hvm</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/3514f0c5a5160138f'/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='scsi' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='static' model='dac' relabel='yes'>
    <label>root:root</label>
  </seclabel>
</domain>

If I disable cgroups in qemu.conf (cgroup_controllers = []) the domain is able to start. I've managed to reproduce this outside of libvirt too. The problem indeed is cgroup management. Libvirt allows /dev/mapper/XXX (which is a symlink to /dev/dm-N). However, /dev/dm-N is a multipath device, so we need to allow all the devices that the multipath device consists of.
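The dependency lookup described above (resolve /dev/mapper/XXX to its /dev/dm-N node, then find the devices it consists of) can be sketched outside libvirt via sysfs. This is a hedged illustration only, not libvirt's actual implementation (the eventual fix uses libdevmapper); the `sysfs` parameter exists purely to make the sketch testable:

```python
import os

def dm_slaves(mapper_path, sysfs="/sys/block"):
    """Resolve a /dev/mapper symlink to its dm-N node and list the
    underlying devices it consists of (e.g. ['sdb'] for a single-path
    multipath device)."""
    # /dev/mapper/3514f0c5a5160138f is a symlink to something like /dev/dm-2
    dm = os.path.basename(os.path.realpath(mapper_path))
    slaves_dir = os.path.join(sysfs, dm, "slaves")
    if not os.path.isdir(slaves_dir):
        return []  # not a device-mapper node, or it has no slaves
    return sorted(os.listdir(slaves_dir))
```

On a real host, `dm_slaves("/dev/mapper/3514f0c5a5160138f")` would list the same members that `multipath -l` shows.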
Indeed:

# multipath -l 3514f0c5a5160138f
3514f0c5a5160138f dm-2 XtremIO ,XtremApp
size=150G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
  `- 3:0:0:1 sdb 8:16 active undef unknown

so adding /dev/sdb to cgroup_device_acl in qemu.conf makes everything work again. Now the question is whether libvirt should try getting all devices belonging to a multipath device, or whether it is the admin's responsibility to allow them in qemu.conf. However, from the git log it seems like libvirt never cared. So if this has ever worked, something outside libvirt must have changed.

Regardless of my previous comment, we need to resolve this ASAP (instead of trying to find what has changed outside of libvirt), so I've proposed patches upstream:

https://www.redhat.com/archives/libvir-list/2018-March/msg01541.html

Reproduced on libvirt-3.9.0-14.el7_5.2.x86_64 with the following steps:
===================================================================
[root@hp-dl360eg8-06 15632705]# rpm -qa | grep libvirt-3
libvirt-3.9.0-14.el7_5.2.x86_64
[root@hp-dl360eg8-06 15632705]# virsh domblklist vm1
Target     Source
------------------------------------------------
sda        /dev/mapper/mpathb
[root@hp-dl360eg8-06 15632705]# virsh start vm1
error: Failed to start domain vm1
error: internal error: qemu unexpectedly closed the monitor: 2018-03-26T08:09:22.018327Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Operation not permitted. Is this a SCSI device?
And on the scratch build the issue is gone, so qa_ack+ this bug:
===================================================================
[root@hp-dl360eg8-06 15632705]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@hp-dl360eg8-06 15632705]# rpm -qa | grep libvirt-3
libvirt-3.9.0-15.el7_5.2mp.x86_64
[root@hp-dl360eg8-06 15632705]# virsh start vm1
Domain vm1 started

[root@hp-dl360eg8-06 15632705]# virsh dumpxml vm1 | grep sgio -A8
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

[root@hp-dl360eg8-06 15632705]# virsh edit vm1
Domain vm1 XML configuration edited.

[root@hp-dl360eg8-06 15632705]# virsh destroy vm1; virsh start vm1
Domain vm1 destroyed
Domain vm1 started

[root@hp-dl360eg8-06 15632705]# virsh dumpxml vm1 | grep sgio -A8
    <disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

So after some more investigation this looks like a kernel bug to me. I've even written a small reproducer that allowed me to reproduce this bug even without libvirt/qemu. All you need is a devmapper target; for instance I am using:

# dmsetup create blah --table "0 10 linear /dev/sdb 0"

and then I can run the reproducer like this:

# ./repro.sh /dev/mapper/blah

On 7.4 everything works and the script prints out the scsi version. However, on 7.5 I get this error: "ioctl: Operation not permitted". So it is a regression, but not in libvirt - in the kernel.
Created attachment 1414045 [details]
devmapper_repro.tar.gz
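The attached reproducer itself is not shown inline. The following is a hedged sketch of its core idea: issue the SG_GET_VERSION_NUM ioctl (the same probe qemu's scsi-block device performs) against a given block device, which fails with EPERM when the device cgroup filters the underlying device. The cgroup setup that repro.sh performs is not reproduced here, and this is an illustration, not the attached devmapper.c:

```python
import fcntl
import os
import struct
import sys

SG_GET_VERSION_NUM = 0x2282  # from <scsi/sg.h>

def sg_version(path):
    """Return the SG driver version number for a block device, or None
    if the device cannot be opened or does not accept the ioctl
    (e.g. EPERM when a device cgroup filters it, ENOTTY for non-SG nodes)."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return None
    try:
        buf = bytearray(struct.pack("i", 0))
        fcntl.ioctl(fd, SG_GET_VERSION_NUM, buf)  # mutates buf in place
        return struct.unpack("i", bytes(buf))[0]
    except OSError:
        return None
    finally:
        os.close(fd)

if __name__ == "__main__" and len(sys.argv) > 1:
    ver = sg_version(sys.argv[1])
    print("ioctl failed" if ver is None else "sg version: %d" % ver)
```

Run against a real SCSI device this prints the driver version (30527 in the comments above); run inside the restricted cgroup against the dm device it fails, matching the "ioctl: Operation not permitted" output.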
(In reply to Michal Privoznik from comment #32) > # dmsetup create blah --table "0 10 linear /dev/sdb 0" Is anything different if you make the size of that device 'blah' match the size of /dev/sdb? static int dm_get_bdev_for_ioctl(struct mapped_device *md, ... r = tgt->type->prepare_ioctl(tgt, bdev, mode); ... r = blkdev_get(*bdev, *mode, _dm_claim_ptr); ... return r; dm_blk_ioctl() calls this and expects to see the result of ->prepare_ioctl() but that gets clobbered by blkdev_get() ? (In reply to Michal Privoznik from comment #33) > Created attachment 1414045 [details] > devmapper_repro.tar.gz what is ./devmapper suppose to do in that script? It isn't in the tarball you included. Maybe try reverting this one: commit 8a589be04b93bfe27c5f6ea3d6781eea90794916 Author: Mike Snitzer <snitzer> Date: Thu Feb 22 21:02:50 2018 -0500 [md] dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl Message-id: <1519333370-21773-1-git-send-email-snitzer> Patchwork-id: 206006 O-Subject: [RHEL7.5 PATCH] dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl Bugzilla: 1513037 RH-Acked-by: Benjamin Marzinski <bmarzins> RH-Acked-by: Heinz Mauelshagen <heinzm> BZ: 1513037 Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15385976 The referenced commit is staged for 4.16-rc inclusion via linux-dm.git, see: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=51a05338a6f82d53843743c3813c52b02ca24ff5 Tested to pass the mptest tests on my testbed (which tests issuing pass-through ioctls using the dm_blk_ioctl() interface). (In reply to Jonathan Earl Brassow from comment #38) > (In reply to Michal Privoznik from comment #33) > > Created attachment 1414045 [details] > > devmapper_repro.tar.gz > > what is ./devmapper suppose to do in that script? It isn't in the tarball > you included. It is the included devmapper.c that once built is devmapper binary. I'll try this reproducer now. 
Could it be that DM is just blind to cgroups, especially so in the context of RHEL7, and as a result it allowed the cgroup-enforcing infrastructure to be blissfully unaware that it wasn't plumbing things in properly? Leaving said infrastructure exposed (thinking cgroups were working, when in reality DM was just ignoring and blind to it all)?

I added some extra debugging to the dm code:

# tail -f /var/log/messages &
...
# ./repro.sh /dev/mapper/blah
ioctl: Operation not permitted
Mar 29 20:30:04 thegoat kernel: device-mapper: core: dm_get_bdev_for_ioctl: blkdev_get failed with -1

So it is clear that the use of blkdev_get() is the limiting factor IFF cgroups are used (which we already basically knew, given that comment#43 details that reverting upstream commit 519049a "fixes" the issue).

I spoke with Vivek and now have a better handle on the scope of this issue: device cgroups are what is being used to control access. libvirt has enjoyed the ability to only add the top-level /dev/mapper/<multipath> device to the guest's device cgroup. The ability to open the top-level multipath device implicitly gives the guest the ability to issue IO to the multipath's underlying device(s).

So the question is: should we or shouldn't we carry this fiction through to the DM multipath passthrough ioctl interface? Doing so implies the need for a change to either bypass the device cgroup check (akin to __blkdev_get()'s 'for_part' flag) or some other solution (Vivek had an idea to pursue about flipping to the root cgroup if the ioctl wasn't issued to a partition).

But all said, the way to skin this device cgroup ioctl permission issue in the kernel needs further design and upstream discussion. Which is on a longer timescale than arriving at the 0day solution.
I'm left unsure which way we should go with the 0day.

Option A:
1) go with the libvirt 0day that adds all underlying devices (comment#23 and comment#24)
2) _and_ a kernel 0day that fixes the return code issue detailed in comment#40 - this would serve to preserve the fix, rhel7.git commit 8a589be04b9, that went in to address bug#1513037 (customer escalation issue).

Or option B:
1) revert rhel7.git commit 8a589be04b9
2) work upstream to establish consensus on the broader issue of whether the DM ioctl interface should just implicitly allow ioctls (as in, allow device cgroup permissions) to underlying devices if they are issued to a top-level device that covers the entire underlying device (as is the case with DM multipath... though a multipath device can be partitioned with linear dm devices on top).
Sadly both of these options imply a 0day kernel change is needed no matter what. Given that, I'm inclined to go with option A because we have a libvirt workaround; but could users just update to the RHEL7.5 kernel but _not_ update libvirt? If so, they'd get boot failures for the virt guest config in question, so option A may not be acceptable.

My preference is option A:
- a minimal/easy/safe kernel fix to correct the committed patch
- a long-term userspace change that accepts that all layers must be validated

B(2) seems wrong to me. Disk partitions are very tightly defined and the underlying 'whole disk' is merely an in-kernel implementation detail, so you can make a coherent argument that permission is necessarily implied there. But when you use dm, the device stacking is completely arbitrary - and dynamically changeable - so I think it's wrong to infer that permission to use a top layer automatically implies permission to use whatever happens to be underneath.

Well, even though we have the libvirt workaround, if we go with option A we will need a workaround for every other app that relies on CGroups and is using DM. This potentially includes customer-written applications.
A change in behaviour like this between minor releases is undesired IMO, therefore I vote for option B.

For doing IO to the underlying device, we don't have to add that device to the device cgroup; adding the top-level device is enough. But for issuing an ioctl, one has to add the underlying device. That feels like a contradiction to me. The device cgroup seems to be able to control 3 types of permissions: read (r), write (w) and mknod (m). So by adding the top-level device, one automatically gets permission to do r/w on the underlying device (through the dm device). I am wondering why ioctls should be any different.

(In reply to Michal Privoznik from comment #48)
> Well, even though we have libvirt workaround if we go with option A we will
> need workaround for every other app that relies on CGroups and is using DM.
> This potentially includes customer written applications. A change in
> behaviour like this is undesired IMO between minor releases, therefore I
> vote for option B.

We need to deal with what we know, not be paranoid about the unknown. The reality is that there are very few applications that are using cgroups and ioctls. If there were more, it wouldn't have taken until the 11th hour for us to become aware of this 7.5 problem.

(In reply to Vivek Goyal from comment #49)
> For doing IO to underlying device, we don't have to add that device to
> device cgroup and adding top level device is enough. But for issuing ioctl,
> one has to add underlying device, that feels like a contradiction to me.

OK, but ioctls aren't normal IO. An ioctl is inherently out-of-band and (potentially) invasive. So while this may feel like a contradiction, they are completely disjoint capabilities.

(In reply to Mike Snitzer from comment #51)
> OK but ioctls aren't normal IO. An ioctl is inherently out-of-band and
> (potentially) invasive. SO while this may feel like a contradiction they
> are completely disjoint capabilities.

Sure, if that's the desire then it should be implemented in the device cgroup.
That is, a separate control for ioctls. But as of now there are only 3 controls: read, write and mknod. Any restrictions on ioctls are pure side effects of how the code has been implemented. In the absence of any explicit control for ioctls in the device cgroup, I would think that ioctls fall into the same category as read/write operations and should be treated accordingly.

(In reply to Vivek Goyal from comment #52)
> Sure, if that's the desire then it should be implemented in device cgroup.
> That is a separate control for ioctls.
>
> But as of now there are only 3 controls. read, write and mknod. And any
> restrictions on ioctls are pure side affects of how code has been
> implemented.

As is, DM calls blkdev_get_by_dev() for each underlying device listed in the top-level multipath device's DM table. So I'm struggling to appreciate how the virt team isn't hitting the same device cgroup permission issue on DM multipath table load (the initial open for read/write) that they are hitting for this ioctl case.

But I'll look closer.

(In reply to Mike Snitzer from comment #53)
> As is DM calls blkdev_get_by_dev() for each underlying device listed in the
> top-level multipath device's DM table. So I'm struggling to appreciate how
> the virt team isn't hitting the same device cgroup permission issue on DM
> multipath table load (initial open for read/write) that they are for this
> ioctl case.
>
> But I'll look closer.
Jeff Moyer helped me reason through the difference: the initial DM multipath table load (or the reproducer's linear device creation/load) is done using the root cgroup, whereas the guest's ioctl is being issued from within, or using, the created cgroup (which only has the multipath device "allowed"). It just so happens that the DM passthrough ioctl code in 7.5's implementation now does a blkdev_get().

But in the end this isn't DM's cgroup inconsistency. It is the cgroup user's inconsistency (in this case: libvirt). Basically, the guest has _never_ been allowed, on a device cgroup level, to issue ioctls or read/write IO to the underlying DM devices. It is just that the device cgroup permission check was never performed until now (via dm's extra blkdev_get()).

Furthermore: normal IO is being issued to the multipath device, from within the restricted cgroup, without the need to blkdev_get() the multipath's underlying device(s). Therefore, even though a future open of the underlying devices would fail within the guest, the guest is blissfully unaware that DM multipath is actually issuing IO to the underlying devices _without_ validated cgroup permission.

Just spoke with Linda Wang: a 7.5 0day kernel is _not_ possible. So that leaves us with having to execute on a revised "option A" from comment#46:

1) go with the libvirt 0day that adds all underlying devices (comment#23 and comment#24) to the cgroup
2) _and_ fix the return code issue detailed in comment#40 via z-stream - this serves to preserve the fix, rhel7.git commit 8a589be04b9, that went in to address bug#1513037 (customer escalation issue).
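The libvirt-side fix discussed above boils down to adding a devices-cgroup allow entry not only for the top-level dm device but also for each underlying device. The cgroup-v1 devices controller accepts entries of the form "type major:minor access" (for example "b 8:16 rw" for /dev/sdb). The following is a hedged sketch of building such entries; the function names and the rw access mode are illustrative, not libvirt's actual code, and writing to devices.allow requires root plus an existing cgroup directory:

```python
import os
import stat

def format_entry(major, minor, access="rw"):
    """A cgroup-v1 devices.allow entry: 'b MAJOR:MINOR ACCESS'."""
    return "b %d:%d %s" % (major, minor, access)

def allow_entry(dev_path):
    """Build the allow entry for an existing block device node,
    using its st_rdev major/minor numbers."""
    st = os.stat(dev_path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("%s is not a block device" % dev_path)
    return format_entry(os.major(st.st_rdev), os.minor(st.st_rdev))

def allow_in_cgroup(cgroup_dir, dev_paths):
    """Write one allow entry per device (top-level dm node plus its
    slaves) into <cgroup_dir>/devices.allow; the controller treats
    each write as a separate entry."""
    allow = os.path.join(cgroup_dir, "devices.allow")
    for path in dev_paths:
        with open(allow, "w") as f:
            f.write(allow_entry(path))
```

For the reproducer's setup, allowing both the dm node and its slave amounts to something like `allow_in_cgroup(grp, ["/dev/dm-2", "/dev/sdb"])`, matching the comment#22 observation that whitelisting /dev/sdb makes everything work again.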
In addition, there is another z-stream fix that is needed for DM; this upstream commit needs backporting to various RHEL7 z-streams:

https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.17&id=e26a42c55b08ddaeac284ceea951ad379453473c

v3: https://www.redhat.com/archives/libvir-list/2018-April/msg00083.html

BTW: I was surprised that this bug did not reproduce on my 4.15-vanilla. But after upgrading to 4.16-vanilla it started to reproduce, so this is not RHEL-specific anymore.

(In reply to Michal Privoznik from comment #57)
> BTW: I was surprised that this bug did not reproduce on my 4.15-vanilla. But
> after upgrading to 4.16-vanilla it started to reproduce, so this is not RHEL
> specific anymore.

Right, the blkdev_get() change only just went upstream during the 4.16 merge window.

I've just pushed the patches upstream:

commit cd9bbb7fad5102013b202a8a066798ef23eb15ac
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Mar 26 07:11:42 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Apr 5 16:53:19 2018 +0200

    news: Document device mapper fix

    Signed-off-by: Michal Privoznik <mprivozn>

commit 6dd84f6850ca4379203d1e7b999430ed59041208
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 5 09:34:25 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Apr 5 16:52:55 2018 +0200

    qemu_cgroup: Handle device mapper targets properly

    https://bugzilla.redhat.com/show_bug.cgi?id=1557769

    Problem with device mapper targets is that there can be several
    other devices 'hidden' behind them. For instance, /dev/dm-1 can
    consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when
    setting up devices CGroup and namespaces we have to take this
    into account.

    This bug was exposed after Linux kernel was fixed. Initially,
    kernel used different functions for getting block device in
    open() and ioctl().
While CGroup permissions were checked in the former case, due to a bug in kernel they were not checked in the latter case. This changed with the upstream commit of 519049afead4f7c3e6446028c41e99fde958cc04 (v4.16-rc5~11^2~4). Signed-off-by: Michal Privoznik <mprivozn> commit fd9d1e686db64fa9481b9eab4dabafa46713e2cf Author: Michal Privoznik <mprivozn> AuthorDate: Mon Mar 26 14:48:07 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Thu Apr 5 09:58:44 2018 +0200 util: Introduce virDevMapperGetTargets This helper fetches dependencies for given device mapper target. At the same time, we need to provide a dummy log function because by default libdevmapper prints out error messages to stderr which we need to suppress. Signed-off-by: Michal Privoznik <mprivozn> v4.2.0-48-gcd9bbb7fad Tal, do we have a clone for this for consuming the fix in RHV? Seems like we don't, you've opened this bug on RHEL directly, the correct way IMO was to open a RHV bug and a RHEL bug that clocks him, can you please clone? No, I opened the bug on RHV and it was moved to RHEL in comment #12. Anyway, I was asked in https://bugzilla.redhat.com/show_bug.cgi?id=1564996#c7 to test it on RHV with the fix and I think it would be better if we have a clone for RHV to consume the fix OK done - bug 1568441 1. Prepare a multipath device as follow: # multipath -ll mpathb (3600140520321d9fc74c4a79bb492bd37) dm-3 LIO-ORG ,device.logical- size=2.0G features='0' hwhandler='0' wp=rw `-+- policy='service-time 0' prio=1 status=active `- 4:0:0:0 sdb 8:16 active ready running Verified with 1. Having a multipath device [root@amd-9600b-8-1 ~]# multipath -ll mpathb (3600140520321d9fc74c4a79bb492bd37) dm-3 LIO-ORG ,device.logical- size=2.0G features='0' hwhandler='0' wp=rw `-+- policy='service-time 0' prio=1 status=active `- 4:0:0:0 sdb 8:16 active ready running 2. Having a shutdown vm with following xml # virsh dumpxml avocado-vt-vm1 ... 
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

3. Start the vm and check that the image is in use by it

# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

# virsh domblklist avocado-vt-vm1
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
sda        /dev/mapper/mpathb

4. Check the same steps as above with disk hotplug

# cat disk
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
    </disk>

# virsh attach-device avocado-vt-vm1 disk
Device attached successfully

# virsh domblklist avocado-vt-vm1
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
sda        /dev/mapper/mpathb

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113
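A quick way to cross-check what the fixed libvirt has to whitelist for a stacked target like mpathb above is to walk the kernel's `slaves` links under /sys/block. The sketch below assumes the standard sysfs layout (/sys/block/<dev>/slaves) and is only an approximation of what virDevMapperGetTargets achieves, not its actual implementation:

```python
import os

def dm_slaves(dev, sysfs="/sys/block"):
    """Recursively collect the bottom-level block devices backing `dev`.

    For a multipath device such as dm-3 this yields the SCSI path
    devices (e.g. sdb) that the device cgroup must also allow; stacked
    DM devices are followed until a leaf with no slaves is reached.
    """
    slaves_dir = os.path.join(sysfs, dev, "slaves")
    if not os.path.isdir(slaves_dir) or not os.listdir(slaves_dir):
        return {dev}  # leaf device: no further dependencies
    result = set()
    for slave in os.listdir(slaves_dir):
        result |= dm_slaves(slave, sysfs)
    return result
```

On the reproducer host one would expect dm_slaves("dm-3") to report sdb, matching the `multipath -ll` output in the verification steps.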
Created attachment 1409486 [details]
logs

Description of problem:
Failure to start a VM that has a direct LUN attached with SCSI Pass-Through enabled (sgio unfiltered).

Version-Release number of selected component (if applicable):
RHEL7.5
kernel - 3.10.0-861.el7.x86_64
sanlock-python-3.6.0-1.el7.x86_64
libvirt-daemon-driver-nodedev-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-14.el7.x86_64
libselinux-utils-2.5-12.el7.x86_64
vdsm-yajsonrpc-4.20.22-1.el7ev.noarch
vdsm-http-4.20.22-1.el7ev.noarch
vdsm-hook-fcoe-4.20.22-1.el7ev.noarch
selinux-policy-3.13.1-192.el7.noarch
libvirt-daemon-driver-nwfilter-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-14.el7.x86_64
libvirt-3.9.0-14.el7.x86_64
vdsm-python-4.20.22-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.20.22-1.el7ev.noarch
sanlock-3.6.0-1.el7.x86_64
selinux-policy-targeted-3.13.1-192.el7.noarch
libvirt-libs-3.9.0-14.el7.x86_64
libvirt-daemon-3.9.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7.x86_64
libvirt-daemon-config-nwfilter-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-14.el7.x86_64
libvirt-daemon-kvm-3.9.0-14.el7.x86_64
qemu-img-rhev-2.10.0-21.el7_5.1.x86_64
vdsm-client-4.20.22-1.el7ev.noarch
vdsm-4.20.22-1.el7ev.x86_64
vdsm-hook-vhostmd-4.20.22-1.el7ev.noarch
vdsm-hook-openstacknet-4.20.22-1.el7ev.noarch
libselinux-python-2.5-12.el7.x86_64
sanlock-lib-3.6.0-1.el7.x86_64
libvirt-client-3.9.0-14.el7.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-driver-storage-core-3.9.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.9.0-14.el7.x86_64
libvirt-daemon-driver-lxc-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-14.el7.x86_64
libvirt-lock-sanlock-3.9.0-14.el7.x86_64
vdsm-api-4.20.22-1.el7ev.noarch
vdsm-jsonrpc-4.20.22-1.el7ev.noarch
qemu-kvm-common-rhev-2.10.0-21.el7_5.1.x86_64
qemu-guest-agent-2.8.0-2.el7.x86_64
vdsm-hook-vfio-mdev-4.20.22-1.el7ev.noarch
libselinux-2.5-12.el7.x86_64
libvirt-daemon-driver-network-3.9.0-14.el7.x86_64
libvirt-daemon-config-network-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.9.0-14.el7.x86_64
vdsm-common-4.20.22-1.el7ev.noarch
vdsm-network-4.20.22-1.el7ev.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
libvirt-daemon-driver-interface-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-14.el7.x86_64
vdsm-hook-ethtool-options-4.20.22-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a VM with a direct LUN attached with SCSI Pass-Through enabled
2. Start the VM

Actual results:

2018-03-18 15:23:00,000+0200 ERROR (vm/9afe8eaf) [virt.vm] (vmId='9afe8eaf-0ae7-4a00-b4af-374d4211a237') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2832, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: process exited while connecting to monitor: 2018-03-18T13:22:56.863904Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2018-03-18T13:22:56.915676Z qemu-kvm: -device scsi-block,bus=ua-459df768-ae29-42d1-a9cb-15a42ba29024.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-f9216966-e220-4b01-8ac1-db57fe227b06,id=ua-f9216966-e220-4b01-8ac1-db57fe227b06: cannot get SG_IO version
number: Operation not permitted. Is this a SCSI device?

Expected results:
Starting the VM should succeed

Additional info:
logs