Bug 1754241 - Unable to start a VM with a Tape Library passed through from the hypervisor
Summary: Unable to start a VM with a Tape Library passed through from the hypervisor
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 7.8
Assignee: Michal Privoznik
QA Contact: gaojianan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-22 06:39 UTC by Miguel Martin
Modified: 2020-03-31 19:59 UTC (History)

Fixed In Version: libvirt-4.5.0-27.el7
Doc Type: If docs needed, set a value
Doc Text:
Cause: When libvirt starts a domain with a SCSI device, it enables so-called unprivileged SGIO, which allows an unprivileged process to issue privileged SCSI commands. This is done by writing a value to a file in sysfs. However, libvirt used a very device-specific path that also assumed the SCSI device is a block device. That assumption does not hold: some SCSI devices, for instance a SCSI tape, are not block devices.
Consequence: Libvirt failed to enable unpriv_sgio and therefore also failed to start qemu.
Fix: Libvirt now uses a more generic path to the unpriv_sgio file, which does not assume any device type and is built solely from the device address.
Result: Libvirt is now able to enable unpriv_sgio and start the domain.
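The difference between the two lookups can be illustrated with a minimal Python sketch. The block-device path is the one quoted in comment 32 of this report. The address-based path is an assumption inferred from the Doc Text ("consists solely from the device address") and is not confirmed here as the exact path used by the fix. Both function names are hypothetical.

```python
from pathlib import PurePosixPath

SYSFS = PurePosixPath("/sys")


def block_unpriv_sgio_path(major: int, minor: int) -> str:
    """Old-style lookup: needs MAJ:MIN, so it only works for block devices."""
    return str(SYSFS / "dev" / "block" / f"{major}:{minor}" / "device" / "unpriv_sgio")


def address_unpriv_sgio_path(host: int, bus: int, target: int, unit: int) -> str:
    """Address-only lookup (ASSUMED layout): no block-device assumption."""
    address = f"{host}:{bus}:{target}:{unit}"
    return str(SYSFS / "bus" / "scsi" / "devices" / address / "unpriv_sgio")
```

A tape device has a SCSI address but no block major:minor pair, so only the second form can be built for it.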
Clone Of:
Environment:
Last Closed: 2020-03-31 19:59:02 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:1094 0 None None None 2020-03-31 19:59:44 UTC

Description Miguel Martin 2019-09-22 06:39:26 UTC
Description of problem:

After attaching a tape library to a VM as described in the official documentation [1], the VM fails to start.

Version-Release number of selected component (if applicable):
OS Version:
RHEL - 7.7 - 3.el7ev
OS Description:
Red Hat Virtualization Host 4.3.5 (el7.7)
Kernel Version:
3.10.0 - 1062.el7.x86_64
KVM Version:
2.12.0 - 33.el7
LIBVIRT Version:
libvirt-4.5.0-23.el7
VDSM Version:
vdsm-4.30.24-2.el7ev
SPICE Version:
0.14.0 - 7.el7
GlusterFS Version:
glusterfs-3.12.2-47.2.el7rhgs
CEPH Version:
librbd1-10.2.5-4.el7
Open vSwitch Version:
openvswitch-2.11-4.el7ev
Kernel Features:
PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
VNC Encryption:
Disabled

How reproducible:
Always

Steps to Reproduce:
1. Attach a tape library to a VM as described in the documentation [1]
2. Try to start the VM

Actual results:
The VM fails to start

Expected results:
The VM starts successfully

Comment 25 Paolo Bonzini 2019-09-27 16:45:17 UTC
Yes, it's a libvirt bug.  The problem happens when libvirt tries to detect shared devices; the virSCSIDeviceGetDevName function lacks a fallback for non-disk devices.

If acceptable, a workaround is to remove rawio="yes", which could be unnecessary anyway.

Comment 26 Ryan Barry 2019-09-27 18:23:09 UTC
Is this bug being tracked?

Comment 31 Paolo Bonzini 2019-10-01 13:11:48 UTC
> This property of rawio="yes" is present in the VM XML, but this section of the XML is derived from the device
> which is passthrough to the VM. Therefore to disable this, have we to modify something from the RHV end or the
> customer has to modify something on the device which they are using to passthrough?

rawio is added by vdsm to the XML.  This is generally unnecessary, so if it can be removed with a hook, that
might fix the bug.  It should be easy to test this before giving the workaround/hook to the customer.
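
A minimal sketch of what such a hook could do to the domain XML, assuming rawio appears as an attribute on the <hostdev> element. The function name strip_rawio is hypothetical, and the VDSM hook plumbing (reading and writing the domain XML) is deliberately left out, so this only shows the XML transformation itself:

```python
import xml.etree.ElementTree as ET


def strip_rawio(domain_xml: str) -> str:
    """Return the domain XML with the rawio attribute removed from every <hostdev>."""
    root = ET.fromstring(domain_xml)
    for hostdev in root.iter("hostdev"):
        # pop() with a default is a no-op when the attribute is absent
        hostdev.attrib.pop("rawio", None)
    return ET.tostring(root, encoding="unicode")
```

All other attributes and child elements of the device are left untouched, so the passthrough definition itself does not change.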

Comment 32 Michal Privoznik 2019-10-02 10:28:33 UTC
I'll just put my findings here while I investigate further. I've run git bisect and it looks like this is the commit that's causing the problem:

5b24ffe0ec9bd2fb18d26e6261b84556097067b7 is the first bad commit
commit 5b24ffe0ec9bd2fb18d26e6261b84556097067b7
Author: John Ferlan <jferlan>
Date:   Wed Dec 5 08:49:31 2018 -0500

    RHEL: qemu: Alter qemuSetUnprivSGIO hostdev shareable logic
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1656360
    
    RHEL-only
    
    Fix the logic to handle the case where if the <shareable/> element
    was removed from the domain <hostdev.../>, then we have to reset the
    SGIO value back to 0. Without this patch the check for not shareable
    and return 0 would bypass resetting the value back to 0.
    
    Signed-off-by: John Ferlan <jferlan>
    Reviewed-by: Ján Tomko <jtomko>

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-December/msg00121.html

The problem is that, even without <shareable/> for <hostdev/>, libvirt tries to figure out major:minor for the SCSI device so that unpriv_sgio can be set:

Thread 3 "libvirtd" hit Breakpoint 1, virSCSIDeviceGetDevName (sysfs_prefix=0x0, adapter=0x7f57e41c2460 "scsi_host1", bus=0, target=0, unit=0) at util/virscsi.c:155
155     {
#0  virSCSIDeviceGetDevName (sysfs_prefix=0x0, adapter=0x7f57e41c2460 "scsi_host1", bus=0, target=0, unit=0) at util/virscsi.c:155
#1  0x00007f5815e8ef39 in qemuGetHostdevPath (hostdev=0x7f57e41c37b0) at qemu/qemu_conf.c:1454
#2  0x00007f5815e8f5fb in qemuSetUnprivSGIO (dev=0x7f5838005720) at qemu/qemu_conf.c:1670
#3  0x00007f5815e77def in qemuHostdevPrepareSCSIDevices (driver=0x7f57e402b240, name=0x7f57e417ada0 "fedora", hostdevs=0x7f57e41c2810, nhostdevs=1) at qemu/qemu_hostdev.c:287
#4  0x00007f5815e78083 in qemuHostdevPrepareDomainDevices (driver=0x7f57e402b240, def=0x7f57e41c0ef0, qemuCaps=0x7f58300020c0, flags=3) at qemu/qemu_hostdev.c:356
#5  0x00007f5815e9e72a in qemuProcessPrepareHost (driver=0x7f57e402b240, vm=0x7f57e41aefd0, flags=17) at qemu/qemu_process.c:6138
#6  0x00007f5815ea085d in qemuProcessStart (conn=0x7f5808001120, driver=0x7f57e402b240, vm=0x7f57e41aefd0, updatedCPU=0x0, asyncJob=QEMU_ASYNC_JOB_START, migrateFrom=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6717
#7  0x00007f5815ef3a15 in qemuDomainObjStart (conn=0x7f5808001120, driver=0x7f57e402b240, vm=0x7f57e41aefd0, flags=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7288
#8  0x00007f5815ef3c83 in qemuDomainCreateWithFlags (dom=0x7f5830001010, flags=0) at qemu/qemu_driver.c:7341
#9  0x00007f5815ef3d06 in qemuDomainCreate (dom=0x7f5830001010) at qemu/qemu_driver.c:7360
#10 0x00007f583c8c50cc in virDomainCreate (domain=0x7f5830001010) at libvirt-domain.c:6531
#11 0x000055def87cc65e in remoteDispatchDomainCreate (server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0, rerr=0x7f5838005ae0, args=0x7f5830000e40) at remote/remote_daemon_dispatch_stubs.h:4434
#12 0x000055def87cc577 in remoteDispatchDomainCreateHelper (server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0, rerr=0x7f5838005ae0, args=0x7f5830000e40, ret=0x7f5830000e70) at remote/remote_daemon_dispatch_stubs.h:4410
#13 0x00007f583c7aaf11 in virNetServerProgramDispatchCall (prog=0x55def93a75f0, server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0) at rpc/virnetserverprogram.c:437
#14 0x00007f583c7aaa74 in virNetServerProgramDispatch (prog=0x55def93a75f0, server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0) at rpc/virnetserverprogram.c:304
#15 0x00007f583c7b1b7e in virNetServerProcessMsg (srv=0x55def9365cb0, client=0x55def93b69e0, prog=0x55def93a75f0, msg=0x55def93b6ae0) at rpc/virnetserver.c:143
#16 0x00007f583c7b1c42 in virNetServerHandleJob (jobOpaque=0x55def9365490, opaque=0x55def9365cb0) at rpc/virnetserver.c:164
#17 0x00007f583c6a4780 in virThreadPoolWorker (opaque=0x55def9365180) at util/virthreadpool.c:167
#18 0x00007f583c6a3d0d in virThreadHelper (data=0x55def9375a30) at util/virthread.c:206
#19 0x00007f583b8ee458 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f583b81c6ef in clone () from /lib64/libc.so.6

qemuGetHostdevPath 3 # p *scsisrc 
$7 = {protocol = 0, sgio = 0, rawio = 1, u = {host = {adapter = 0x7f57e41c2460 "scsi_host1", bus = 0, target = 0, unit = 0}, iscsi = {src = 0x7f57e41c2460}}}


The path which libvirt tries to use to set unpriv_sgio is /sys/dev/block/MAJ:MIN/device/unpriv_sgio. John, since you are the original author of the referenced patch, do you perhaps have any idea where libvirt could learn those numbers for a SCSI device that's not a block device (i.e. the /sys/bus/scsi/devices/X:X:X:X/block dir doesn't exist)? Is such a device even represented in devtmpfs, and/or does it make sense to enable unpriv_sgio for it?

Comment 33 John Ferlan 2019-10-02 12:59:38 UTC
Well, it's been over a year for those patches and much longer for the overall sgio concept, so let's see how much I remember. I do recall that this series of changes was quite botched and resulted in a succession of changes to fix the overall problem.

BTW: It's not clear why this bz is filed under RHEL AV since it's pointing out a RHEL7 issue...  Also the 'sgio' code is purely a downstream option. I think it needs to be set to RHEL7... If a patch is generated, then a RHEL8 clone for this should be created too which would also set the zstream flag to cover those conditions. 

Anyway, qemuIsSharedHostdev was moved as a result of trying to cover an edge condition when the shareable attribute was removed from a hostdev that previously may have had "sgio" set (in which case, the desire was to reset it to 0), see bz 1656360. As you can tell from the subsequent patches/commits, the logic was not done properly.

In rethinking that decision in light of what's shown here, perhaps I should have gone with the wontfix option for that edge condition. The decision to remove the shared hostdev logic checking could cause a problem for non-block devices (see the comment before the if conditions you're in).

In hindsight the logic should have involved the VIR_DOMAIN_DEVICE_SGIO_UNFILTERED check as well. As in, if not shared, but yet set, then assume someone just removed the shared attribute and we need to reset the sgio @val to 0 - perhaps something like:

    if shared hostdev
        get hostdev_path
    else
        if hostdev->source.subsys.u.scsi.sgio == VIR_DOMAIN_DEVICE_SGIO_UNFILTERED
            get hostdev_path
        else
            return 0
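
The pseudocode above can be sketched as a small, testable predicate. This is only an illustration of the proposed decision, not libvirt code: the Sgio enum mirrors the names of libvirt's sgio settings with illustrative numeric values, and should_resolve_hostdev_path is a hypothetical name.

```python
from enum import Enum


class Sgio(Enum):
    # Names follow VIR_DOMAIN_DEVICE_SGIO_*; the numeric values are illustrative.
    DEFAULT = 0
    FILTERED = 1
    UNFILTERED = 2


def should_resolve_hostdev_path(shared: bool, sgio: Sgio) -> bool:
    """True when the hostdev path must be looked up so unpriv_sgio can be written."""
    if shared:
        return True
    # Not shared but still unfiltered: assume <shareable/> was just removed,
    # so the path is needed to reset the value back to 0.
    return sgio is Sgio.UNFILTERED
```

With this check, a non-shared tape device with the default sgio setting never reaches the path lookup that fails for non-block devices.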
 

Another option, though not fully thought out, is to use virDirOpenIfExists instead of virDirOpen in virSCSIDeviceGetDevName. That way we return NULL if the block device is not found and we don't error out.  I didn't check all callers and assumptions on that though.
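
The "open if exists" behaviour can be illustrated with a short Python sketch, assuming a sysfs-like layout in which a block-type SCSI device has a block/ subdirectory under its address directory. The function name, its parameters, and the sysfs_prefix convention are hypothetical and only model the idea of returning nothing instead of failing:

```python
import os
from typing import Optional


def scsi_block_dev_name(sysfs_prefix: str, host: int, bus: int,
                        target: int, unit: int) -> Optional[str]:
    """Return the block device name for a SCSI address, or None if it has none."""
    block_dir = os.path.join(sysfs_prefix, f"{host}:{bus}:{target}:{unit}", "block")
    if not os.path.isdir(block_dir):
        # Non-block device (e.g. a tape): report "no name" rather than an error.
        return None
    entries = sorted(os.listdir(block_dir))
    return entries[0] if entries else None
```

Callers then have to treat a missing name as a valid outcome, which is the part that would need checking across all call sites.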

A third option: revert the patch and the 3 subsequent ones, and declare the "someone removed shared when sgio was previously set" problem as something we won't fix.


FWIW: 
details on issues caused because of the change - IOW other patches that wouldn't be necessary if 5b24ffe0ec9bd2fb18d26e6261b84556097067b7 was reverted:

3560b106745b8d1ed16203858b4a9434de4d79cf
eec80321b1066ea326746fb70e99575e5d2f2954
e6623828b4e41fab3df1593a670341a55b3f6a71

If one digs into the rhvirt-patches archives, they'll find the sordid history:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2019-January/msg00796.html

which was based on what was pointed out during RHEL 8.0 testing and subsequent patches:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-December/msg00613.html

and of course an obligatory original RHEL 7.7 series:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-December/msg00119.html

Comment 34 Jaroslav Suchanek 2019-10-03 08:30:53 UTC
(In reply to John Ferlan from comment #33)
> 
> BTW: It's not clear why this bz is filed under RHEL AV since it's pointing
> out a RHEL7 issue...  Also the 'sgio' code is purely a downstream option. I
> think it needs to be set to RHEL7... If a patch is generated, then a RHEL8
> clone for this should be created too which would also set the zstream flag
> to cover those conditions. 
> 

You're right. Moving to RHEL7. It's under your radar now... ;)

Comment 38 Michal Privoznik 2019-10-03 15:03:47 UTC
I've cooked some patches and here's a scratch build with them on top:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23843040

If the build expires then RPMs are available here:

https://mprivozn.fedorapeople.org/scsi/

@Miguel, can we get them to the customer to test it?
Meanwhile, I'll polish them and send to the upstream list.

Comment 49 gaojianan 2019-11-20 06:06:55 UTC
Verified version:
libvirt-4.5.0-28.el7.x86_64

Steps to reproduce:
1. Prepare a tape-class SCSI device
# lshw -c tape -businfo
Bus info          Device      Class          Description
========================================================
scsi@2:0.0.0      /dev/nst0   tape           scsi_debug

2. Edit the guest XML and start it
    <hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host2'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </hostdev>

# virsh start demo
Domain demo started

Works as expected

Comment 51 errata-xmlrpc 2020-03-31 19:59:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1094

