1557769 – Start VM with direct LUN attached with SCSI Pass-Through enabled fails on libvirtError

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	7.5
Hardware:	x86_64
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	pre-dev-freeze
Target Release:	---
Assignee:	Michal Privoznik
QA Contact:	yisun
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1564996

Reported:	2018-03-18 13:54 UTC by Elad
Modified:	2021-12-10 15:49 UTC (History)
CC List:	33 users (show)
Fixed In Version:	libvirt-4.3.0-1.el7
Doc Type:	Bug Fix
Doc Text:	In Red Hat Enterprise Linux 7.5, guests with SCSI passthrough enabled failed to boot because of changes in kernel CGroup detection. With this update, libvirt fetches dependencies and adds them to the device CGroup. As a result, the affected guests now start as expected.
Clone Of:
Clones:	1562960 1562962 1564996 1568441 (view as bug list)
Environment:
Last Closed:	2018-10-30 09:53:14 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
logs (1.63 MB, application/x-gzip) 2018-03-18 13:54 UTC, Elad	no flags	Details
logs7.4 (274.92 KB, application/x-gzip) 2018-03-19 10:00 UTC, Elad	no flags	Details
4.1-el7.5 (193.11 KB, application/x-gzip) 2018-03-19 17:10 UTC, Elad	no flags	Details
devmapper_repro.tar.gz (851 bytes, application/x-gzip) 2018-03-28 07:58 UTC, Michal Privoznik	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Article)	3401221	0	None	None	None	2018-04-06 01:28:32 UTC
Red Hat Product Errata	RHSA-2018:3113	0	None	None	None	2018-10-30 09:55:20 UTC

Internal Links: 1562049 1562074

Description Elad 2018-03-18 13:54:41 UTC

Created attachment 1409486 [details]
logs

Description of problem:
Failure to start VM that has a direct LUN attached with SCSI Pass-Through enabled (sgio unfiltered).

Version-Release number of selected component (if applicable):
RHEL7.5
kernel - 3.10.0-861.el7.x86_64
sanlock-python-3.6.0-1.el7.x86_64
libvirt-daemon-driver-nodedev-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-14.el7.x86_64
libselinux-utils-2.5-12.el7.x86_64
vdsm-yajsonrpc-4.20.22-1.el7ev.noarch
vdsm-http-4.20.22-1.el7ev.noarch
vdsm-hook-fcoe-4.20.22-1.el7ev.noarch
selinux-policy-3.13.1-192.el7.noarch
libvirt-daemon-driver-nwfilter-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-14.el7.x86_64
libvirt-3.9.0-14.el7.x86_64
vdsm-python-4.20.22-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.20.22-1.el7ev.noarch
sanlock-3.6.0-1.el7.x86_64
selinux-policy-targeted-3.13.1-192.el7.noarch
libvirt-libs-3.9.0-14.el7.x86_64
libvirt-daemon-3.9.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7.x86_64
libvirt-daemon-config-nwfilter-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-14.el7.x86_64
libvirt-daemon-kvm-3.9.0-14.el7.x86_64
qemu-img-rhev-2.10.0-21.el7_5.1.x86_64
vdsm-client-4.20.22-1.el7ev.noarch
vdsm-4.20.22-1.el7ev.x86_64
vdsm-hook-vhostmd-4.20.22-1.el7ev.noarch
vdsm-hook-openstacknet-4.20.22-1.el7ev.noarch
libselinux-python-2.5-12.el7.x86_64
sanlock-lib-3.6.0-1.el7.x86_64
libvirt-client-3.9.0-14.el7.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-driver-storage-core-3.9.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.9.0-14.el7.x86_64
libvirt-daemon-driver-lxc-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-14.el7.x86_64
libvirt-lock-sanlock-3.9.0-14.el7.x86_64
vdsm-api-4.20.22-1.el7ev.noarch
vdsm-jsonrpc-4.20.22-1.el7ev.noarch
qemu-kvm-common-rhev-2.10.0-21.el7_5.1.x86_64
qemu-guest-agent-2.8.0-2.el7.x86_64
vdsm-hook-vfio-mdev-4.20.22-1.el7ev.noarch
libselinux-2.5-12.el7.x86_64
libvirt-daemon-driver-network-3.9.0-14.el7.x86_64
libvirt-daemon-config-network-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.9.0-14.el7.x86_64
vdsm-common-4.20.22-1.el7ev.noarch
vdsm-network-4.20.22-1.el7ev.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
libvirt-daemon-driver-interface-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-14.el7.x86_64
vdsm-hook-ethtool-options-4.20.22-1.el7ev.noarch


How reproducible:
Always

Steps to Reproduce:
1. Create a VM with a direct LUN attached with SCSI Pass-Through enabled
2. Start the VM


Actual results:

2018-03-18 15:23:00,000+0200 ERROR (vm/9afe8eaf) [virt.vm] (vmId='9afe8eaf-0ae7-4a00-b4af-374d4211a237') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2832, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: process exited while connecting to monitor: 2018-03-18T13:22:56.863904Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2018-03-18T13:22:56.915676Z qemu-kvm: -device scsi-block,bus=ua-459df768-ae29-42d1-a9cb-15a42ba29024.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-f9216966-e220-4b01-8ac1-db57fe227b06,id=ua-f9216966-e220-4b01-8ac1-db57fe227b06: cannot get SG_IO version number: Operation not permitted.  Is this a SCSI device?



Expected results:
Start VM should succeed

Additional info:
logs

Comment 1 Yaniv Kaul 2018-03-19 07:31:17 UTC

Elad, a first hit from the Google search on the string "cannot get SG_IO version number: Operation not permitted. Is this a SCSI device" led me to https://bugzilla.redhat.com/show_bug.cgi?id=1525829 - is that it? I'm not sure, but worth asking.

Alternatively, can you indeed perform a sg query on the device? Which storage is it?

(lastly I wonder if it has anything to do with the device aliases).

Comment 2 Tal Nisan 2018-03-19 09:14:50 UTC

Elad, to check if that's a domain XML issue please run:

update vdc_options set option_value=false where option_name='DomainXML' and version='4.2';

On your database, restart Engine and try to reproduce

Comment 3 Elad 2018-03-19 09:23:11 UTC

Tal, the bug still happens with DomainXML as false for 4.2

I'll check on RHEL7.4

Comment 4 Elad 2018-03-19 10:00:15 UTC

Created attachment 1409768 [details]
logs7.4

Tested on RHEL7.4, VM starts successfully with direct LUN with SCSI Pass-Through enabled.
However, in the domain XML in vdsm.log, sgio is set to filtered, so I'm a bit confused:



        </disk>
        <disk device="lun" sgio="filtered" snapshot="no" type="block">



2018-03-19 11:42:12,613+0200 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.create succeeded in 0.02 seconds (__init__:539)



 
qemu-kvm-tools-rhev-2.10.0-21.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-guest-agent-2.8.0-2.el7.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7_4.9.x86_64
vdsm-yajsonrpc-4.19.48-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.19.48-1.el7ev.noarch
libvirt-libs-3.2.0-14.el7_4.9.x86_64
vdsm-xmlrpc-4.19.48-1.el7ev.noarch
libvirt-daemon-driver-nwfilter-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
vdsm-cli-4.19.48-1.el7ev.noarch
libvirt-daemon-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7_4.9.x86_64
vdsm-hook-localdisk-4.19.48-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-img-rhev-2.10.0-21.el7.x86_64
vdsm-api-4.19.48-1.el7ev.noarch
qemu-kvm-common-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-storage-core-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7_4.9.x86_64
vdsm-hook-ethtool-options-4.19.48-1.el7ev.noarch
libvirt-3.2.0-14.el7_4.9.x86_64
vdsm-python-4.19.48-1.el7ev.noarch
libvirt-daemon-driver-network-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-config-network-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7_4.9.x86_64
ovirt-imageio-common-1.0.0-0.el7ev.noarch
libvirt-python-3.2.0-3.el7_4.1.x86_64
libvirt-client-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
vdsm-jsonrpc-4.19.48-1.el7ev.noarch
vdsm-4.19.48-1.el7ev.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.9.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7_4.9.x86_64
libvirt-lock-sanlock-3.2.0-14.el7_4.9.x86_64

Comment 5 Michal Privoznik 2018-03-19 10:35:44 UTC

I don't think this is a libvirt bug. Firstly, I dug out the domain XML from attached logs. The interesting part is this:

    <disk snapshot="no" type="block" device="lun" sgio="filtered">
      <target dev="sda" bus="scsi"/>
      <source dev="/dev/mapper/3514f0c5a51600274"/>
      <driver name="qemu" io="native" type="raw" error_policy="stop" cache="none"/>
      <alias name="ua-4cb96609-0cd7-498d-992f-5c7008dc4b17"/>
      <address bus="0" controller="0" unit="0" type="drive" target="0"/>
      <boot order="1"/>
    </disk>

    <controller type="scsi" model="virtio-scsi" index="0">
      <alias name="ua-f5d3bb3c-5607-4db6-bed9-949425a07b11"/>
    </controller>


The other parts of the domain XML are just syntactic sugar from this bug's POV. Now, I am able to reproduce this locally, but only if I replace /dev/mapper/... with a non-SCSI device. As soon as I pass a SCSI device (an iSCSI target in my testing), qemu is able to start again, regardless of user aliases. Having said that, I think this is a dup of the bug that Yaniv linked earlier.

What kind of device is /dev/mapper/3514f0c5a51600274?

Comment 6 Ala Hino 2018-03-19 14:47:51 UTC

Checked with Kevin Wolf that Bug 1525829 is only about improving the error message. 

Elad,

Can you please confirm that /dev/mapper/3514f0c5a51600274 is a device that accepts the SG_IO ioctl?

Comment 7 Yaniv Kaul 2018-03-19 14:57:45 UTC

Also, worth understanding if it happens with 4.1.10 and RHEL 7.5 hosts.

Comment 8 Elad 2018-03-19 17:10:14 UTC

Created attachment 1410034 [details]
4.1-el7.5

(In reply to Ala Hino from comment #6)
> Checked with Kevin Wolf that Bug 1525829 is only about improving the error
> message. 
> 
> Elad,
> 
> Can you please confirm that /dev/mapper/3514f0c5a51600274 is a device that
> accepts the SG_IO ioctl?

Ala,
3514f0c5a51600274 is a LUN provided by XtremIO.
This was also tested with Netapp with the same result


(In reply to Yaniv Kaul from comment #7)
> Also, worth understanding if it happens with 4.1.10 and RHEL 7.5 hosts.

Yaniv, The same on latest 4.1.10 RHEL7.5 host:


libvirt-daemon-driver-storage-gluster-3.9.0-14.el7.x86_64
vdsm-4.19.50-1.el7ev.x86_64
qemu-guest-agent-2.8.0-2.el7.x86_64
libvirt-daemon-driver-nwfilter-3.9.0-14.el7.x86_64
libvirt-daemon-driver-nodedev-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-14.el7.x86_64
vdsm-python-4.19.50-1.el7ev.noarch
libvirt-client-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-14.el7.x86_64
vdsm-xmlrpc-4.19.50-1.el7ev.noarch
vdsm-cli-4.19.50-1.el7ev.noarch
qemu-img-rhev-2.10.0-21.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-config-nwfilter-3.9.0-14.el7.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-kvm-common-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-storage-core-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.9.0-14.el7.x86_64
libvirt-daemon-3.9.0-14.el7.x86_64
libvirt-daemon-driver-interface-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-14.el7.x86_64
vdsm-api-4.19.50-1.el7ev.noarch
libvirt-libs-3.9.0-14.el7.x86_64
libvirt-lock-sanlock-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-14.el7.x86_64
libvirt-daemon-kvm-3.9.0-14.el7.x86_64
vdsm-hook-vmfex-dev-4.19.50-1.el7ev.noarch
vdsm-yajsonrpc-4.19.50-1.el7ev.noarch
libvirt-daemon-driver-network-3.9.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.9.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7.x86_64
vdsm-jsonrpc-4.19.50-1.el7ev.noarch
qemu-kvm-tools-rhev-2.10.0-21.el7.x86_64
kernel - 3.10.0-860.el7.x86_64




2018-03-19 19:05:39,355+0200 ERROR (vm/ecc627be) [virt.vm] (vmId='ecc627be-d05a-4846-ad27-d973d9b2524d') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2060, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3658, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-03-19T17:05:38.960301Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future

Comment 9 Michal Privoznik 2018-03-20 07:39:41 UTC

(In reply to Elad from comment #8)

> 2018-03-19 19:05:39,355+0200 ERROR (vm/ecc627be) [virt.vm]
> (vmId='ecc627be-d05a-4846-ad27-d973d9b2524d') The vm start process failed
> (vm:631)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
>     self._run()
>   File "/usr/share/vdsm/virt/vm.py", line 2060, in _run
>     self._connection.createXML(domxml, flags),
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line
> 123, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in
> wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3658, in
> createXML
>     if ret is None:raise libvirtError('virDomainCreateXML() failed',
> conn=self)
> libvirtError: internal error: qemu unexpectedly closed the monitor:
> 2018-03-19T17:05:38.960301Z qemu-kvm: warning: All CPU(s) up to maxcpus
> should be described in NUMA config, ability to start up with partial NUMA
> mappings is obsoleted and will be removed in future

This is just a harmless warning. The true error message is the one on the next line:

2018-03-19T17:05:39.035831Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Operation not permitted.  Is this a SCSI device?


Unfortunately, it looks like /dev/mapper/3514f0c5a5160048f cannot handle SG_IO. What's the output of: sginfo /dev/mapper/3514f0c5a5160048f  ?

Comment 10 Elad 2018-03-20 08:06:48 UTC

[root@storage-ge7-vdsm1 ~]# sginfo /dev/mapper/3514f0c5a5160048f
INQUIRY response (cmd: 0x12)
----------------------------
Device Type                        0
Vendor:                    XtremIO 
Product:                   XtremApp        
Revision level:            40f0

Comment 11 Paolo Bonzini 2018-03-21 10:52:01 UTC

The log says "operation not permitted", not "operation not supported".  This could be incorrect cgroup management in libvirt.
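The distinction drawn above can be sketched in Python. This is only an illustration: classify_ioctl_failure is a hypothetical helper, and the grouping of errno values into "denied" and "unsupported" is an assumption made for the example, not something taken from qemu or libvirt.

```python
import errno
import os


def classify_ioctl_failure(err: int) -> str:
    """Hypothetical helper: map the errno of a failed ioctl to a likely cause."""
    if err in (errno.EPERM, errno.EACCES):
        # The request was refused by some access policy
        # (device cgroup, SELinux, file permissions, ...).
        return "denied"
    if err in (errno.ENOTTY, errno.EOPNOTSUPP):
        # The device or its driver does not implement the request.
        return "unsupported"
    return "other"


if __name__ == "__main__":
    for code in (errno.EPERM, errno.EOPNOTSUPP, errno.ENOTTY):
        print(os.strerror(code), "->", classify_ioctl_failure(code))
```

An EPERM result points at policy rather than at the storage itself, which is why the cgroup setup is the next thing to look at.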

Comment 12 Ala Hino 2018-03-21 11:00:15 UTC

Moving the bug to libvirt

Comment 18 Michal Privoznik 2018-03-22 15:12:53 UTC

Elad,

can you please try to reproduce with libvirt out of the picture?

/usr/libexec/qemu-kvm \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-drive file=/dev/mapper/3514f0c5a5160048f,format=raw,if=none,id=drive-scsi0-0-0-1,werror=stop,rerror=stop,cache=none,aio=native \
-device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1


Also, are there any SELinux messages in the logs?

Comment 19 Michal Skrivanek 2018-03-23 14:08:07 UTC

Elad, can you please also reproduce on different setup?

Comment 20 Elad 2018-03-23 14:26:30 UTC

Hi, sorry for the delay, we had a power outage here in the labs.

Michal Skrivanek, this was reproduced on 3 environments already (happens every time): 4.2-el7.5-Netapp, 4.2-el7.5-Xtremio, 4.1-el7.4-Xtremio. See above comments


Michal Privoznik,

Seems like the VM starts successfully without libvirt:

[root@storage-ge13-vdsm1 ~]# /usr/libexec/qemu-kvm \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -drive file=/dev/mapper/3514f0c5a51601393,format=raw,if=none,id=drive-scsi0-0-0-1,werror=stop,rerror=stop,cache=none,aio=native \
> -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
warning: host doesn't support requested feature: CPUID.01H:ECX.cx16 [bit 13]
VNC server running on ::1:5900



[root@storage-ge13-vdsm1 ~]# ps aux |grep qemu
root       612  0.0  0.0  25036  1792 ?        Ss   16:07   0:00 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook
root     20856 11.9  1.0 792380 59896 pts/0    Sl+  17:22   0:18 /usr/libexec/qemu-kvm -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -drive file=/dev/mapper/3514f0c5a51601393,format=raw,if=none,id=drive-scsi0-0-0-1,werror=stop,rerror=stop,cache=none,aio=native -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1

Comment 21 Elad 2018-03-23 14:27:34 UTC

Sorry, on 4.1-el7.4-Xtremio the bug didn't reproduce (comment #4)

Comment 22 Michal Privoznik 2018-03-24 09:32:48 UTC

Just to put findings of my investigation somewhere before I forget them. Here's minimalistic domain XML which reproduces the bug:

<domain type='kvm'>
  <name>testdom</name>
  <uuid>9ecd05ac-a83d-497b-a9ab-a523b6239d73</uuid>
  <memory unit='KiB'>262144</memory>
  <currentMemory unit='KiB'>262144</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.5.0'>hvm</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/3514f0c5a5160138f'/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='scsi' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='static' model='dac' relabel='yes'>
    <label>root:root</label>
  </seclabel>
</domain>



If I disable cgroups in qemu.conf (cgroup_controllers = []) the domain is able to start. I've managed to reproduce this outside of libvirt too. The problem is indeed cgroup management. Libvirt allows /dev/mapper/XXX (which is a symlink to /dev/dm-N). However, /dev/dm-N is a multipath device, so we need to allow all the devices that the multipath device consists of. Indeed:

# multipath -l
3514f0c5a5160138f dm-2 XtremIO ,XtremApp        
size=150G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
  `- 3:0:0:1 sdb 8:16 active undef unknown


so adding /dev/sdb to cgroup_device_acl in qemu.conf makes everything work again.  Now the question is whether libvirt should try getting all devices belonging to a multipath device or whether it's the admin's responsibility to allow them in qemu.conf. However, from git log it seems like libvirt never cared. So if this has ever worked, something outside libvirt must have changed.
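For admins who want to find out which devices sit underneath a device-mapper node, the lookup can be sketched as follows. This is a minimal sketch, assuming the kernel exposes the lower devices under /sys/block/dm-N/slaves; dm_slaves is a hypothetical helper and is not how libvirt itself gathers the dependencies.

```python
import os
from typing import List


def dm_slaves(dev_path: str, sysfs_root: str = "/sys") -> List[str]:
    """Return /dev paths of the devices directly beneath a device-mapper node."""
    # /dev/mapper/<name> is normally a symlink to /dev/dm-N.
    real = os.path.realpath(dev_path)
    name = os.path.basename(real)
    slaves_dir = os.path.join(sysfs_root, "block", name, "slaves")
    if not os.path.isdir(slaves_dir):
        # Not a stacked device (or sysfs is not laid out as assumed).
        return []
    return sorted("/dev/" + entry for entry in os.listdir(slaves_dir))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, "->", ", ".join(dm_slaves(path)) or "(no slaves found)")
```

The helper only looks one level down; a stack of several device-mapper layers would need the same lookup repeated for each returned device.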

Comment 23 Michal Privoznik 2018-03-26 05:24:32 UTC

Regardless of my previous comment, we need to resolve this ASAP (instead of trying to find what has changed outside of libvirt) so I've proposed patches upstream:

https://www.redhat.com/archives/libvir-list/2018-March/msg01541.html

Comment 27 yisun 2018-03-26 09:07:54 UTC

reproduced on libvirt-3.9.0-14.el7_5.2.x86_64 with following steps:
===================================================================
[root@hp-dl360eg8-06 15632705]# rpm -qa | grep libvirt-3
libvirt-3.9.0-14.el7_5.2.x86_64
[root@hp-dl360eg8-06 15632705]# virsh domblklist vm1
Target     Source
------------------------------------------------
sda        /dev/mapper/mpathb

[root@hp-dl360eg8-06 15632705]# virsh start vm1
error: Failed to start domain vm1
error: internal error: qemu unexpectedly closed the monitor: 2018-03-26T08:09:22.018327Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Operation not permitted.  Is this a SCSI device?


And on the scratch build the issue is gone, so qa_ack+ this bug:
===================================================================
[root@hp-dl360eg8-06 15632705]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@hp-dl360eg8-06 15632705]# rpm -qa | grep libvirt-3
libvirt-3.9.0-15.el7_5.2mp.x86_64
[root@hp-dl360eg8-06 15632705]# virsh start vm1
Domain vm1 started
[root@hp-dl360eg8-06 15632705]# virsh dumpxml vm1 | grep sgio -A8
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
[root@hp-dl360eg8-06 15632705]# virsh edit vm1
Domain vm1 XML configuration edited.

[root@hp-dl360eg8-06 15632705]# virsh destroy vm1; virsh start vm1
Domain vm1 destroyed

Domain vm1 started

[root@hp-dl360eg8-06 15632705]# virsh dumpxml vm1 | grep sgio -A8
    <disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
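To check which disks of a domain are affected, the dumpxml output can also be inspected programmatically instead of with grep. The following is a rough sketch using only the Python standard library; lun_disks is a hypothetical helper, and it reports the sgio attribute exactly as written in the XML (None when the attribute is absent).

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def lun_disks(domain_xml: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (source device, sgio value) for every <disk device='lun'>."""
    root = ET.fromstring(domain_xml)
    found = []
    for disk in root.iter("disk"):
        if disk.get("device") != "lun":
            continue  # only pass-through LUNs are of interest here
        source = disk.find("source")
        dev = source.get("dev") if source is not None else None
        found.append((dev, disk.get("sgio")))
    return found


if __name__ == "__main__":
    import sys

    # Example: virsh dumpxml vm1 | python3 lun_disks.py
    for dev, sgio in lun_disks(sys.stdin.read()):
        print(dev, sgio)
```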

Comment 29 Michal Privoznik 2018-03-26 14:43:50 UTC

v2:

https://www.redhat.com/archives/libvir-list/2018-March/msg01599.html

Comment 32 Michal Privoznik 2018-03-28 07:57:25 UTC

So after some more investigation this looks like a kernel bug to me. I've even written a small reproducer that allowed me to reproduce this bug even without libvirt/qemu. All you need is a devmapper target, for instance I am using:

# dmsetup create blah --table "0 10 linear /dev/sdb 0"

and then I can run the reproducer like this:

# ./repro.sh /dev/mapper/blah

And on 7.4 everything works and the script prints out the SCSI version. However, on 7.5 I get this error: ioctl: Operation not permitted. So it is a regression, not in libvirt but rather in the kernel.
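The attached reproducer is not reproduced in this thread. As a rough Python equivalent of such a probe, assuming it issues the SG_GET_VERSION_NUM ioctl (request value 0x2282 from <scsi/sg.h>), something like the following could be used. The function names are hypothetical, and the sketch does not set up any cgroup, which the reproducer script does.

```python
import fcntl
import os
import struct

SG_GET_VERSION_NUM = 0x2282  # request number from <scsi/sg.h>


def decode_sg_version(num: int) -> str:
    """The sg driver encodes x.y.z as x*10000 + y*100 + z."""
    major, rest = divmod(num, 10000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def sg_version(path: str) -> int:
    """Ask the device for its SG version; raises OSError on failure."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # The kernel writes a C int into the buffer we pass in.
        buf = fcntl.ioctl(fd, SG_GET_VERSION_NUM, struct.pack("i", 0))
        return struct.unpack("i", buf)[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys

    try:
        num = sg_version(sys.argv[1])
        print("sg version:", num, "(%s)" % decode_sg_version(num))
    except OSError as exc:
        print("ioctl:", exc.strerror)
```

Run as root against a device node; an EPERM failure would show up as "ioctl: Operation not permitted".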

Comment 33 Michal Privoznik 2018-03-28 07:58:19 UTC

Created attachment 1414045 [details]
devmapper_repro.tar.gz

Comment 35 Alasdair Kergon 2018-03-29 19:44:37 UTC

(In reply to Michal Privoznik from comment #32)
> # dmsetup create blah --table "0 10 linear /dev/sdb 0"

Is anything different if you make the size of that device 'blah' match the size of /dev/sdb?
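For reference, the table line used in the reproducer maps only 10 sectors of /dev/sdb. A small sketch that decodes such a line, assuming the usual "start length linear device offset" layout with 512-byte sectors (parse_linear_table and LinearTarget are hypothetical names):

```python
from typing import NamedTuple

SECTOR_SIZE = 512  # device-mapper tables count in 512-byte sectors


class LinearTarget(NamedTuple):
    start: int    # first sector of the mapped range in the dm device
    length: int   # number of sectors mapped
    device: str   # underlying device
    offset: int   # starting sector on the underlying device


def parse_linear_table(line: str) -> LinearTarget:
    """Parse a single-line 'linear' table as passed to dmsetup --table."""
    fields = line.split()
    if len(fields) != 5 or fields[2] != "linear":
        raise ValueError("not a linear target line: %r" % line)
    start, length, _target, device, offset = fields
    return LinearTarget(int(start), int(length), device, int(offset))


if __name__ == "__main__":
    t = parse_linear_table("0 10 linear /dev/sdb 0")
    print(t, "maps", t.length * SECTOR_SIZE, "bytes")
```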

Comment 37 Alasdair Kergon 2018-03-29 20:54:30 UTC

static int dm_get_bdev_for_ioctl(struct mapped_device *md,
...
        r = tgt->type->prepare_ioctl(tgt, bdev, mode);
...
        r = blkdev_get(*bdev, *mode, _dm_claim_ptr);
...
        return r;

dm_blk_ioctl() calls this and expects to see the result of ->prepare_ioctl() but that gets clobbered by blkdev_get() ?

Comment 38 Jonathan Earl Brassow 2018-03-29 21:17:40 UTC

(In reply to Michal Privoznik from comment #33)
> Created attachment 1414045 [details]
> devmapper_repro.tar.gz

What is ./devmapper supposed to do in that script?  It isn't in the tarball you included.

Comment 39 Alasdair Kergon 2018-03-29 21:17:55 UTC

Maybe try reverting this one:

commit 8a589be04b93bfe27c5f6ea3d6781eea90794916
Author: Mike Snitzer <snitzer>
Date:   Thu Feb 22 21:02:50 2018 -0500

    [md] dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl
    
    Message-id: <1519333370-21773-1-git-send-email-snitzer>
    Patchwork-id: 206006
    O-Subject: [RHEL7.5 PATCH] dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl
    Bugzilla: 1513037
    RH-Acked-by: Benjamin Marzinski <bmarzins>
    RH-Acked-by: Heinz Mauelshagen <heinzm>
    
    BZ: 1513037
    Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15385976
    
    The referenced commit is staged for 4.16-rc inclusion via linux-dm.git, see:
    https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=51a05338a6f82d53843743c3813c52b02ca24ff5
    
    Tested to pass the mptest tests on my testbed (which tests issuing
    pass-through ioctls using the dm_blk_ioctl() interface).

Comment 42 Mike Snitzer 2018-03-29 23:01:13 UTC

(In reply to Jonathan Earl Brassow from comment #38)
> (In reply to Michal Privoznik from comment #33)
> > Created attachment 1414045 [details]
> > devmapper_repro.tar.gz
> 
> what is ./devmapper suppose to do in that script?  It isn't in the tarball
> you included.

It is the included devmapper.c, which once built becomes the devmapper binary.

I'll try this reproducer now.

Comment 43 Mike Snitzer 2018-03-29 23:18:53 UTC

(In reply to Mike Snitzer from comment #42)

> I'll try this reproducer now.

# modprobe scsi_debug
(scsi_debug created /dev/sdb)
# ./repro.sh /dev/sdb
sg version: 30527
# dmsetup create blah --table "0 10 linear /dev/sdb 0"
# ./repro.sh /dev/mapper/blah
ioctl: Operation not permitted
# uname -r
4.16.0-rc6.snitm+

So even with a very recent upstream kernel this doesn't work.

Reverting upstream commit 519049afead4f7c3e6446028c41e99fde958cc04 ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") enables the reproducer to work:

# ./repro.sh /dev/mapper/blah
sg version: 30527

But sadly just fixing the error code propagation, as proposed in comment#40, doesn't.

Comment 44 Mike Snitzer 2018-03-30 00:06:50 UTC

This is very much a DM meets cgroups issue.  There is something about using blkdev_get() that is clamping down on cgroup permissions:

# ./devmapper /dev/mapper/blah
sg version: 30527
# ./repro.sh /dev/mapper/blah
ioctl: Operation not permitted

(repro.sh is imposing cgroups)

I know next to nothing about cgroups or CONFIG_BLK_CGROUP related code.

It could easily be that DM is missing proper cgroup propagation.  In fact, Vivek proposed this change a while ago but I never took it (never mind that bio_associate_current wasn't exported for non-block-layer use, etc.):
https://patchwork.kernel.org/patch/8485451/

Upstream has seen __bio_clone_fast() and bio_clone_bioset() changes to call a new bio_clone_blkcg_association() interface that RHEL7.5 doesn't have.  But even with those advances upstream isn't working.

Cc'ing Vivek, Ming and Jeff in the hope that they have additional insight on why using blkdev_get(), like upstream commit 519049afea introduced, would cause issues with cgroup permissions.

comment#22 above says that adding the multipath device's underlying "/dev/sdb in cgroup_device_acl in qemu.conf makes everything work again."

Could it be that DM is just blind to cgroups, especially so in the context of RHEL7, and as a result it enabled cgroup enforcing infrastructure to be blissfully unaware that it wasn't plumbing things in properly?  Leaving said infrastructure exposed (thinking cgroups was working, but in reality DM was just ignoring and blind to it all?)

Comment 45 Mike Snitzer 2018-03-30 00:33:28 UTC

I added some extra debugging to dm code.

# tail -f /var/log/messages &
...
# ./repro.sh /dev/mapper/blah
ioctl: Operation not permitted
Mar 29 20:30:04 thegoat kernel: device-mapper: core: dm_get_bdev_for_ioctl: blkdev_get failed with -1

So it is clear that the use of blkdev_get() is the limiting factor IFF cgroups are used (which we already basically knew, given that comment#43 details how reverting upstream commit 519049a "fixes" the issue).

Comment 46 Mike Snitzer 2018-03-30 02:48:30 UTC

I spoke with Vivek and now have a better handle on the scope of this issue:
device cgroups are what is being used to control access, and libvirt has enjoyed the ability to add only the top-level /dev/mapper/<multipath> device to the guest's device cgroup.  The ability to open the top-level multipath device implicitly gives the guest the ability to issue IO to the multipath's underlying device(s).  So the question is: should we or shouldn't we carry this fiction through to the DM multipath passthrough ioctl interface?
 -- Doing so implies the need for a change to either bypass the device cgroup check (akin to __blkdev_get()'s 'for_part' flag) or some other solution (vivek had an idea to pursue about flipping to the root cgroup if the ioctl wasn't issued to a partition).

But all said, the way to skin this device cgroup ioctl permission issue in the kernel needs further design and upstream discussion.  Which is on a longer timescale than arriving at the 0day solution.

I'm left unsure which way we should go with the 0day:

option A:
1) go with the libvirt 0day that adds all underlying devices (comment#23 and comment#24)
2) _and_ a kernel 0day that fixes the return code issue detailed in comment#40
   - this would serve to preserve the fix, rhel7.git commit 8a589be04b9, that went in to address bug#1513037 (customer escalation issue).

Or

option B:
1) revert rhel7.git commit 8a589be04b9
2) work upstream to establish consensus on the broader issue of whether the DM ioctl interface should just implicitly allow ioctls (as in, grant device cgroup permissions) to underlying devices if they are issued to a top-level device that covers the entire underlying device (as is the case with DM multipath, though a multipath device can be partitioned with linear dm devices on top)


Sadly both of these options imply a 0day kernel change is needed no matter what.
Given that, I'm inclined to go with option A because we have a libvirt workaround; but could users just update to the RHEL7.5 kernel but _not_ update libvirt?  If so then they'd get boot failures for the virt guest config in question, so option A may not be acceptable.

Comment 47 Alasdair Kergon 2018-03-31 00:55:20 UTC

My preference is option A
- a minimal/easy/safe kernel fix to correct the committed patch
- long-term userspace change that accepts that all layers must be validated

B(2) seems wrong to me - while disk partitions are very tightly defined and the underlying 'whole disk' is merely an in-kernel implementation detail and so you can make a coherent argument that permission is necessarily implied, when you use dm, the device stacking is completely arbitrary - and dynamically changeable - and so I think it's wrong to infer that permission to use a top layer automatically implies permission to use whatever happens to be underneath.

Comment 48 Michal Privoznik 2018-03-31 06:54:38 UTC

Well, even though we have a libvirt workaround, if we go with option A we will need a workaround for every other app that relies on CGroups and is using DM. This potentially includes customer-written applications. A change in behaviour like this is undesirable IMO between minor releases, therefore I vote for option B.

Comment 49 Vivek Goyal 2018-03-31 12:24:18 UTC

For doing IO to underlying device, we don't have to add that device to device cgroup and adding top level device is enough. But for issuing ioctl, one has to add underlying device, that feels like a contradiction to me.

Comment 50 Vivek Goyal 2018-03-31 12:28:05 UTC

Device cgroup seems to be able to control 3 types of permissions: read (r), write (w) and mknod (m). So by adding the top-level device, one automatically gets permission to do r/w on the underlying device (through the dm device). I am wondering why ioctls should be any different.
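
As a rough illustration of the three access types mentioned above: a cgroup v1 devices.allow entry has the form "<type> <major>:<minor> <access>", where access is some combination of r, w and m, and there is no separate letter for ioctls. The Python sketch below only formats such an entry. The function name devices_allow_entry is hypothetical, and the 8:16 example is taken from the sdb path shown in the multipath -ll output later in this report.

```python
def devices_allow_entry(major, minor, access="rw", dev_type="b"):
    """Format one cgroup v1 devices.allow entry (hypothetical helper).

    dev_type: 'b' (block), 'c' (char) or 'a' (all).
    access:   any non-empty combination of 'r', 'w' and 'm' (mknod).
    """
    if dev_type not in ("a", "b", "c"):
        raise ValueError("unknown device type: %r" % dev_type)
    if not access or set(access) - set("rwm"):
        # There is no access letter for ioctls; only r, w and m exist.
        raise ValueError("access must be a combination of r, w, m")
    # Emit the flags in a stable r, w, m order.
    ordered = "".join(flag for flag in "rwm" if flag in access)
    return "%s %d:%d %s" % (dev_type, major, minor, ordered)


print(devices_allow_entry(8, 16, "rw"))  # b 8:16 rw
```

Asking for a hypothetical "i" (ioctl) access type raises ValueError here, which mirrors the point that ioctl permission is not a separate control in the device cgroup.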

Comment 51 Mike Snitzer 2018-03-31 15:07:01 UTC

(In reply to Michal Privoznik from comment #48)
> Well, even though we have libvirt workaround if we go with option A we will
> need workaround for every other app that relies on CGroups and is using DM.
> This potentially includes customer written applications. A change in
> behaviour like this is undesired IMO between minor releases, therefore I
> vote for option B.

We need to deal with what we know, not be paranoid about the unknown.

Reality is that there are very few applications that are using cgroups and ioctls.  If there were more it wouldn't have taken until the 11th hour for us to become aware of this 7.5 problem.

(In reply to Vivek Goyal from comment #49)
> For doing IO to underlying device, we don't have to add that device to
> device cgroup and adding top level device is enough. But for issuing ioctl,
> one has to add underlying device, that feels like a contradiction to me.

OK, but ioctls aren't normal IO.  An ioctl is inherently out-of-band and (potentially) invasive.  So while this may feel like a contradiction, they are completely disjoint capabilities.

Comment 52 Vivek Goyal 2018-04-02 12:54:49 UTC

(In reply to Mike Snitzer from comment #51)
> OK but ioctls aren't normal IO.  An ioctl is inherently out-of-band and
> (potentially) invasive.  SO while this may feel like a contradiction they
> are completely disjoint capabilities.

Sure, if that's the desire then it should be implemented in device cgroup. That is, a separate control for ioctls.

But as of now there are only 3 controls: read, write and mknod. And any restrictions on ioctls are pure side effects of how the code has been implemented.

In the absence of any explicit control for ioctls in device cgroup, I would think that ioctls fall into the same category as read/write operations and should be treated accordingly.

Comment 53 Mike Snitzer 2018-04-02 13:21:03 UTC

(In reply to Vivek Goyal from comment #52)
> (In reply to Mike Snitzer from comment #51)
> > OK but ioctls aren't normal IO.  An ioctl is inherently out-of-band and
> > (potentially) invasive.  SO while this may feel like a contradiction they
> > are completely disjoint capabilities.
> 
> Sure, if that's the desire then it should be implemented in device cgroup.
> That is a separate control for ioctls. 
> 
> But as of now there are only 3 controls. read, write and mknod. And any
> restrictions on ioctls are pure side affects of how code has been
> implemented.
> 
> In the absence of any explicit control for ioctl in device cgroup, I would
> think that ioctl fall into same category as read/write operation and should
> be treated accordingly.

As is, DM calls blkdev_get_by_dev() for each underlying device listed in the top-level multipath device's DM table.  So I'm struggling to appreciate how the virt team isn't hitting the same device cgroup permission issue on DM multipath table load (initial open for read/write) that they are for this ioctl case.

But I'll look closer.

Comment 54 Mike Snitzer 2018-04-02 17:35:44 UTC

(In reply to Mike Snitzer from comment #53)

> As is DM calls blkdev_get_by_dev() for each underlying device listed in the
> top-level multipath device's DM table.  So I'm struggling to appreciate how
> the virt team isn't hitting the same device cgroup permission issue on DM
> multipath table load (initial open for read/write) that they are for this
> ioctl case.
> 
> But I'll look closer.

Jeff Moyer helped me reason through the difference: the initial DM multipath table load (or the reproducer's linear device creation/load) is done using the root cgroup.

Whereas the guest's ioctl is being issued from within, or using, the created cgroup (which only has the multipath device being "allowed").  It just so happens that the DM passthrough ioctl code in 7.5's implementation now does a blkdev_get().

But in the end this isn't DM's cgroup inconsistency.  It is the cgroup user's inconsistency (in this case: libvirt).  Basically the guest has _never_ been allowed, on a device cgroup level, to issue ioctls or read/write IO to the underlying DM devices.  Just that the device cgroup permission check was never performed until now (via dm's extra blkdev_get()).

And furthermore: normal IO is being issued to the multipath device, from within the restricted cgroup, without the need to blkdev_get() the multipath's underlying device(s).  Therefore, even though a future open of the underlying devices would fail within the guest: the guest is blissfully unaware that DM multipath is actually issuing IO to the underlying devices _without_ validated cgroup permission.
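
To make the reasoning above concrete, here is a toy Python model. It is not kernel code, and every class and method name in it is hypothetical. It assumes only what is described above: the table load happens in the root cgroup, normal IO checks the top-level device only, and the passthrough ioctl path additionally re-opens the underlying device from the caller's cgroup.

```python
class DeviceCgroup:
    """Toy device cgroup: a whitelist of device paths (hypothetical model)."""

    def __init__(self, name, allowed=(), allow_all=False):
        self.name = name
        self.allowed = set(allowed)
        self.allow_all = allow_all

    def may_open(self, dev):
        return self.allow_all or dev in self.allowed


class ToyMultipath:
    """Toy stacked device whose table is loaded by 'creator'."""

    def __init__(self, name, paths, creator):
        # Table load: each underlying path is opened in the creator's cgroup.
        for path in paths:
            if not creator.may_open(path):
                raise PermissionError(path)
        self.name = name
        self.paths = list(paths)

    def submit_io(self, caller):
        # Normal IO: only the top-level device is checked against the caller.
        if not caller.may_open(self.name):
            raise PermissionError(self.name)
        return self.paths[0]

    def passthrough_ioctl(self, caller, recheck_underlying):
        # recheck_underlying=True models the extra open of the path device.
        if not caller.may_open(self.name):
            raise PermissionError(self.name)
        target = self.paths[0]
        if recheck_underlying and not caller.may_open(target):
            raise PermissionError(target)
        return target


root = DeviceCgroup("root", allow_all=True)
guest = DeviceCgroup("guest", allowed={"/dev/mapper/mpathb"})
mpath = ToyMultipath("/dev/mapper/mpathb", ["/dev/sdb"], creator=root)

print(mpath.submit_io(guest))                 # IO reaches /dev/sdb unchecked
print(mpath.passthrough_ioctl(guest, False))  # old behaviour: no recheck
try:
    mpath.passthrough_ioctl(guest, True)      # new behaviour: recheck fails
except PermissionError as err:
    print("denied:", err)
```

Adding "/dev/sdb" to the guest's allowed set makes the last call succeed, which corresponds to the workaround of adding the underlying devices to the guest's device cgroup.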

Comment 55 Mike Snitzer 2018-04-02 18:54:39 UTC

Just spoke with Linda Wang: a 7.5 0day kernel is _not_ possible.
So that leaves us with having to execute on a revised "option A" from comment#46:

1) go with the libvirt 0day that adds all underlying devices (comment#23 and comment#24) to the cgroup
2) _and_ fix the return code issue detailed in comment#40 via z-stream
   - this serves to preserve the fix, rhel7.git commit 8a589be04b9, that went in to address bug#1513037 (customer escalation issue).

In addition, there is another z-stream fix that is needed for DM, this upstream commit needs backporting to various RHEL7 z-streams:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.17&id=e26a42c55b08ddaeac284ceea951ad379453473c

Comment 57 Michal Privoznik 2018-04-03 09:04:36 UTC

v3:

https://www.redhat.com/archives/libvir-list/2018-April/msg00083.html

BTW: I was surprised that this bug did not reproduce on my 4.15-vanilla. But after upgrading to 4.16-vanilla it started to reproduce, so this is not RHEL specific anymore.

Comment 58 Mike Snitzer 2018-04-03 16:12:18 UTC

(In reply to Michal Privoznik from comment #57)
> v3:
> 
> https://www.redhat.com/archives/libvir-list/2018-April/msg00083.html
> 
> BTW: I was surprised that this bug did not reproduce on my 4.15-vanilla. But
> after upgrading to 4.16-vanilla it started to reproduce, so this is not RHEL
> specific anymore.

Right, the blkdev_get() change only just went upstream during the 4.16 merge window.

Comment 60 Michal Privoznik 2018-04-05 08:19:13 UTC

v4:

https://www.redhat.com/archives/libvir-list/2018-April/msg00321.html

Comment 61 Michal Privoznik 2018-04-05 14:58:24 UTC

I've just pushed the patches upstream:

commit cd9bbb7fad5102013b202a8a066798ef23eb15ac
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Mar 26 07:11:42 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Apr 5 16:53:19 2018 +0200

    news: Document device mapper fix
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit 6dd84f6850ca4379203d1e7b999430ed59041208
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 5 09:34:25 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Apr 5 16:52:55 2018 +0200

    qemu_cgroup: Handle device mapper targets properly
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1557769
    
    Problem with device mapper targets is that there can be several
    other devices 'hidden' behind them. For instance, /dev/dm-1 can
    consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when
    setting up devices CGroup and namespaces we have to take this
    into account.
    
    This bug was exposed after Linux kernel was fixed. Initially,
    kernel used different functions for getting block device in
    open() and ioctl(). While CGroup permissions were checked in the
    former case, due to a bug in kernel they were not checked in the
    latter case. This changed with the upstream commit of
    519049afead4f7c3e6446028c41e99fde958cc04 (v4.16-rc5~11^2~4).
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit fd9d1e686db64fa9481b9eab4dabafa46713e2cf
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Mar 26 14:48:07 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Apr 5 09:58:44 2018 +0200

    util: Introduce virDevMapperGetTargets
    
    This helper fetches dependencies for given device mapper target.
    
    At the same time, we need to provide a dummy log function because
    by default libdevmapper prints out error messages to stderr which
    we need to suppress.
    
    Signed-off-by: Michal Privoznik <mprivozn>


v4.2.0-48-gcd9bbb7fad
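
For readers who want to see what "several other devices hidden behind" a device mapper target means in practice, the following Python sketch collects the devices stacked beneath a dm device. It is not the libvirt implementation: the commits above use libdevmapper, whereas this sketch walks the slaves directories in sysfs instead. The function name dm_underlying_devices and the sysfs_block parameter are hypothetical.

```python
import os


def dm_underlying_devices(dev_name, sysfs_block="/sys/block"):
    """Return the sorted names of all block devices stacked beneath dev_name.

    Walks <sysfs_block>/<dev>/slaves transitively, so a dm device built on
    top of other dm devices is followed down to its leaves. A device with no
    readable slaves directory is treated as a leaf.
    """
    found = set()
    pending = [dev_name]
    while pending:
        current = pending.pop()
        slaves_dir = os.path.join(sysfs_block, current, "slaves")
        try:
            slaves = os.listdir(slaves_dir)
        except OSError:
            slaves = []
        for slave in slaves:
            if slave not in found:
                found.add(slave)
                pending.append(slave)
    return sorted(found)
```

Calling dm_underlying_devices("dm-1") on a host where dm-1 sits on sda, sdb and sdc would list those names, and each corresponding /dev node would then need to be allowed in the guest's device cgroup alongside the top-level device.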

Comment 63 Elad 2018-04-16 12:00:57 UTC

Tal, do we have a clone for this for consuming the fix in RHV?

Comment 64 Tal Nisan 2018-04-17 13:09:33 UTC

Seems like we don't. You opened this bug on RHEL directly; the correct way IMO would have been to open a RHV bug and a RHEL bug that blocks it. Can you please clone?

Comment 65 Elad 2018-04-17 13:26:34 UTC

No, I opened the bug on RHV and it was moved to RHEL in comment #12.
Anyway, I was asked in https://bugzilla.redhat.com/show_bug.cgi?id=1564996#c7 to test it on RHV with the fix and I think it would be better if we have a clone for RHV to consume the fix

Comment 66 Tal Nisan 2018-04-17 14:00:12 UTC

OK done - bug 1568441

Comment 70 yisun 2018-06-22 08:44:43 UTC


1. Prepare a multipath device as follows:
# multipath -ll
mpathb (3600140520321d9fc74c4a79bb492bd37) dm-3 LIO-ORG ,device.logical- 
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:0:0 sdb 8:16 active ready running
Verified with

1. Having a multipath device
[root@amd-9600b-8-1 ~]# multipath -ll
mpathb (3600140520321d9fc74c4a79bb492bd37) dm-3 LIO-ORG ,device.logical- 
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:0:0 sdb 8:16 active ready running


2. Having a shutdown vm with following xml
# virsh dumpxml avocado-vt-vm1
...
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

3. start the vm and check the image is in use by it
# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started


# virsh domblklist avocado-vt-vm1
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
sda        /dev/mapper/mpathb

4. check the same steps as above with disk hotplug
# cat disk
    <disk type='block' device='lun' sgio='filtered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
    </disk>

# virsh attach-device avocado-vt-vm1 disk
Device attached successfully

# virsh domblklist avocado-vt-vm1
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
sda        /dev/mapper/mpathb
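
When repeating the verification above, it can help to list which path devices sit beneath the multipath device, since those are the ones that must end up in the guest's device cgroup. The Python sketch below pulls them out of multipath -ll output. It assumes path lines look like the one in step 1 ("H:C:T:L name major:minor ..."); the function name multipath_paths is hypothetical, and the sample text is copied from step 1.

```python
import re

# A path line contains "H:C:T:L <name> <major>:<minor> ...".
PATH_LINE = re.compile(r"\d+:\d+:\d+:\d+\s+(\S+)\s+(\d+):(\d+)\s")


def multipath_paths(output):
    """Return (name, major, minor) for each path line in multipath -ll output."""
    paths = []
    for line in output.splitlines():
        match = PATH_LINE.search(line)
        if match:
            paths.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return paths


SAMPLE = """\
mpathb (3600140520321d9fc74c4a79bb492bd37) dm-3 LIO-ORG ,device.logical-
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:0:0 sdb 8:16 active ready running
"""

print(multipath_paths(SAMPLE))  # [('sdb', 8, 16)]
```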

Comment 72 errata-xmlrpc 2018-10-30 09:53:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113

