Bug 1754241
Summary: | Unable to start a VM with a Tape Library passed through from the hypervisor | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Miguel Martin <mmartinv> |
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
Status: | CLOSED ERRATA | QA Contact: | gaojianan <jgao> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.7 | CC: | dyuan, jdenemar, jferlan, jgao, jsuchane, knoel, lmen, lsurette, mavital, michal.skrivanek, mprivozn, njajodia, pbonzini, pvilayat, rbarry, sirao, srevivo, xuzhang, ycui |
Target Milestone: | rc | ||
Target Release: | 7.8 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-4.5.0-27.el7 | Doc Type: | If docs needed, set a value |
Doc Text: |
Cause:
When libvirt starts a domain with a SCSI device, it sets so-called unprivileged SGIO, which enables an unprivileged process to issue privileged SCSI commands. This is done by writing a value to a file in sysfs. However, libvirt used a very device-specific path that also assumed the SCSI device is a block device. That assumption does not always hold: some SCSI devices, for instance a SCSI tape, are not block devices.
Consequence:
Libvirt failed to enable unpriv_sgio and therefore also failed to start qemu.
Fix:
Libvirt now uses a more generic path to the unpriv_sgio file, which doesn't assume any particular device type and is built solely from the device address.
Result:
Libvirt is now able to enable unpriv_sgio and start the domain.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-31 19:59:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Miguel Martin
2019-09-22 06:39:26 UTC
Yes, it's a libvirt bug. The problem happens when libvirt tries to detect shared devices; the virSCSIDeviceGetDevName function lacks a fallback for non-disk devices. If acceptable, a workaround is to remove rawio="yes", which could be unnecessary anyway.

Is this bug being tracked?

> This property of rawio="yes" is present in the VM XML, but this section of the XML is derived from the device
> which is passthrough to the VM. Therefore to disable this, have we to modify something from the RHV end or the
> customer has to modify something on the device which they are using to passthrough?
rawio is added by vdsm to the XML. This is generally unnecessary, so removing it with a hook
might fix the bug. It should be easy to test this before giving the workaround/hook to the customer.
I'll just put my findings here while I investigate further. I've run git bisect and it looks like this is the commit that's causing the problem:

5b24ffe0ec9bd2fb18d26e6261b84556097067b7 is the first bad commit
commit 5b24ffe0ec9bd2fb18d26e6261b84556097067b7
Author: John Ferlan <jferlan>
Date: Wed Dec 5 08:49:31 2018 -0500

    RHEL: qemu: Alter qemuSetUnprivSGIO hostdev shareable logic

    https://bugzilla.redhat.com/show_bug.cgi?id=1656360

    RHEL-only

    Fix the logic to handle the case where if the <shareable/> element was removed from the domain <hostdev.../>, then we have to reset the SGIO value back to 0. Without this patch the check for not shareable and return 0 would bypass resetting the value back to 0.

    Signed-off-by: John Ferlan <jferlan>
    Reviewed-by: Ján Tomko <jtomko>

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-December/msg00121.html

The problem is, even without <shareable/> for <hostdev/> libvirt is trying to figure out major:minor for the SCSI device so that unpriv_sgio can be set:

Thread 3 "libvirtd" hit Breakpoint 1, virSCSIDeviceGetDevName (sysfs_prefix=0x0, adapter=0x7f57e41c2460 "scsi_host1", bus=0, target=0, unit=0) at util/virscsi.c:155
155 {
#0 virSCSIDeviceGetDevName (sysfs_prefix=0x0, adapter=0x7f57e41c2460 "scsi_host1", bus=0, target=0, unit=0) at util/virscsi.c:155
#1 0x00007f5815e8ef39 in qemuGetHostdevPath (hostdev=0x7f57e41c37b0) at qemu/qemu_conf.c:1454
#2 0x00007f5815e8f5fb in qemuSetUnprivSGIO (dev=0x7f5838005720) at qemu/qemu_conf.c:1670
#3 0x00007f5815e77def in qemuHostdevPrepareSCSIDevices (driver=0x7f57e402b240, name=0x7f57e417ada0 "fedora", hostdevs=0x7f57e41c2810, nhostdevs=1) at qemu/qemu_hostdev.c:287
#4 0x00007f5815e78083 in qemuHostdevPrepareDomainDevices (driver=0x7f57e402b240, def=0x7f57e41c0ef0, qemuCaps=0x7f58300020c0, flags=3) at qemu/qemu_hostdev.c:356
#5 0x00007f5815e9e72a in qemuProcessPrepareHost (driver=0x7f57e402b240, vm=0x7f57e41aefd0, flags=17) at qemu/qemu_process.c:6138
#6 0x00007f5815ea085d in qemuProcessStart (conn=0x7f5808001120, driver=0x7f57e402b240, vm=0x7f57e41aefd0, updatedCPU=0x0, asyncJob=QEMU_ASYNC_JOB_START, migrateFrom=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6717
#7 0x00007f5815ef3a15 in qemuDomainObjStart (conn=0x7f5808001120, driver=0x7f57e402b240, vm=0x7f57e41aefd0, flags=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7288
#8 0x00007f5815ef3c83 in qemuDomainCreateWithFlags (dom=0x7f5830001010, flags=0) at qemu/qemu_driver.c:7341
#9 0x00007f5815ef3d06 in qemuDomainCreate (dom=0x7f5830001010) at qemu/qemu_driver.c:7360
#10 0x00007f583c8c50cc in virDomainCreate (domain=0x7f5830001010) at libvirt-domain.c:6531
#11 0x000055def87cc65e in remoteDispatchDomainCreate (server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0, rerr=0x7f5838005ae0, args=0x7f5830000e40) at remote/remote_daemon_dispatch_stubs.h:4434
#12 0x000055def87cc577 in remoteDispatchDomainCreateHelper (server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0, rerr=0x7f5838005ae0, args=0x7f5830000e40, ret=0x7f5830000e70) at remote/remote_daemon_dispatch_stubs.h:4410
#13 0x00007f583c7aaf11 in virNetServerProgramDispatchCall (prog=0x55def93a75f0, server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0) at rpc/virnetserverprogram.c:437
#14 0x00007f583c7aaa74 in virNetServerProgramDispatch (prog=0x55def93a75f0, server=0x55def9365cb0, client=0x55def93b69e0, msg=0x55def93b6ae0) at rpc/virnetserverprogram.c:304
#15 0x00007f583c7b1b7e in virNetServerProcessMsg (srv=0x55def9365cb0, client=0x55def93b69e0, prog=0x55def93a75f0, msg=0x55def93b6ae0) at rpc/virnetserver.c:143
#16 0x00007f583c7b1c42 in virNetServerHandleJob (jobOpaque=0x55def9365490, opaque=0x55def9365cb0) at rpc/virnetserver.c:164
#17 0x00007f583c6a4780 in virThreadPoolWorker (opaque=0x55def9365180) at util/virthreadpool.c:167
#18 0x00007f583c6a3d0d in virThreadHelper (data=0x55def9375a30) at util/virthread.c:206
#19 0x00007f583b8ee458 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f583b81c6ef in clone () from /lib64/libc.so.6

qemuGetHostdevPath 3 # p *scsisrc
$7 = {protocol = 0, sgio = 0, rawio = 1, u = {host = {adapter = 0x7f57e41c2460 "scsi_host1", bus = 0, target = 0, unit = 0}, iscsi = {src = 0x7f57e41c2460}}}

The path which libvirt tries to use to set unpriv_sgio is /sys/dev/block/MAJ:MIN/device/unpriv_sgio.

John, since you are the original author of the referenced patch, do you perhaps have any idea where libvirt could learn those numbers for a SCSI device that's not a block device (i.e. /sys/bus/scsi/devices/X:X:X:X/block dir doesn't exist)? Is such a device even represented in devtmpfs and/or does it make sense to enable unpriv_sgio for it?

Well it's been over a year for those patches and much longer for the overall sgio concept, let's see how much I remember - I do recall that this series of changes was quite botched and resulted in a succession of changes to fix the overall problem.

BTW: It's not clear why this bz is filed under RHEL AV since it's pointing out a RHEL7 issue... Also the 'sgio' code is purely a downstream option. I think it needs to be set to RHEL7... If a patch is generated, then a RHEL8 clone for this should be created too which would also set the zstream flag to cover those conditions.

Anyway, qemuIsSharedHostdev was moved as a result of trying to cover an edge condition when the shareable attribute was removed from the hostdev that previously may have had the "sgio" set (in which case, the desire was to reset it to 0), see bz 1656360. As you can tell from the subsequent patches/commits, the logic was not done properly. In re-thinking about that decision in light of what's shown here, perhaps I should have gone with the wontfix option for that edge condition. The decision to remove the shared hostdev logic checking could cause a problem for non block devices (see the comment before the if conditions you're in).
In hindsight the logic should have involved the VIR_DOMAIN_DEVICE_SGIO_UNFILTERED check as well. As in, if not shared, but yet set, then assume someone just removed the shared attribute and we need to reset the sgio @val to 0 - perhaps something like:

    if shared hostdev
        get hostdev_path
    else if hostdev->source.subsys.u.scsi.sgio == VIR_DOMAIN_DEVICE_SGIO_UNFILTERED
        get hostdev_path
    else
        return 0

Another option, but not fully thought out, is to use virDirOpenIfExists instead of virDirOpen in virSCSIDeviceGetDevName. That way we return NULL if the block device is not found and we don't error out. I didn't check all callers and assumptions on that though.

A third option: revert the patch and the 3 subsequent ones, and declare the "someone removed shared when sgio was previously set" problem to be undesired to fix.

FWIW: details on issues caused because of the change - IOW other patches that wouldn't be necessary if 5b24ffe0ec9bd2fb18d26e6261b84556097067b7 was reverted:

3560b106745b8d1ed16203858b4a9434de4d79cf
eec80321b1066ea326746fb70e99575e5d2f2954
e6623828b4e41fab3df1593a670341a55b3f6a71

If one digs into the rhvirt-patches archives, they'll find the sordid history:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2019-January/msg00796.html

which was based on what was pointed out during RHEL 8.0 testing and subsequent patches:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-December/msg00613.html

and of course an obligatory original RHEL 7.7 series:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2018-December/msg00119.html

(In reply to John Ferlan from comment #33)
> BTW: It's not clear why this bz is filed under RHEL AV since it's pointing
> out a RHEL7 issue... Also the 'sgio' code is purely a downstream option. I
> think it needs to be set to RHEL7... If a patch is generated, then a RHEL8
> clone for this should be created too which would also set the zstream flag
> to cover those conditions.

You're right. Moving to RHEL7.
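
The first option outlined above (only look up the host device path when the hostdev is shared or when sgio is set to unfiltered) can be sketched in C roughly as follows. This is a minimal sketch, not libvirt code: `enum sgio_mode` and `needs_hostdev_path` are hypothetical stand-ins, and only the shared/unfiltered conditions come from the comment itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for libvirt's SGIO setting. Only the notion of an
 * "unfiltered" value is taken from the comment; the other values are
 * illustrative. */
enum sgio_mode {
    SGIO_DEFAULT = 0,
    SGIO_FILTERED,
    SGIO_UNFILTERED
};

/* Returns true when the caller should go on to resolve the host device
 * path, and false when it can return 0 early without touching sysfs.
 * Hypothetical helper mirroring the pseudocode in the comment. */
bool needs_hostdev_path(bool shared, enum sgio_mode sgio)
{
    if (shared)
        return true;                /* shared hostdev: get hostdev_path */
    if (sgio == SGIO_UNFILTERED)
        return true;                /* sgio set but <shareable/> removed */
    return false;                   /* nothing to set or reset */
}
```

With a check like this, a tape device passed through without <shareable/> and without unfiltered sgio would skip the path lookup entirely, so the missing block directory would never be consulted.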
It's under your radar now... ;)

I've cooked some patches and here's a scratch build with them on top:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23843040

If the build expires then RPMs are available here:

https://mprivozn.fedorapeople.org/scsi/

@Miguel, can we get them to the customer to test it? Meanwhile, I'll polish them and send to the upstream list.

Verified version: libvirt-4.5.0-28.el7.x86_64

Reproduce steps:
1. Prepare a tape class scsi device
# lshw -c tape -businfo
Bus info          Device      Class      Description
========================================================
scsi@2:0.0.0      /dev/nst0   tape       scsi_debug

2. Edit guest xml and start it
<hostdev mode='subsystem' type='scsi' managed='no'>
  <source>
    <adapter name='scsi_host2'/>
    <address bus='0' target='0' unit='0'/>
  </source>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</hostdev>

# virsh start demo
Domain demo started

Works as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1094
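
The Doc Text says the fixed libvirt builds the unpriv_sgio path solely from the device address. A minimal C sketch of that idea follows. It assumes the file lives at /sys/bus/scsi/devices/H:B:T:L/unpriv_sgio (this report does not spell out the exact path) and that the host number H can be read from the adapter name; `build_unpriv_sgio_path` is a hypothetical helper, not a libvirt function.

```c
#include <stdio.h>

/* Build a device-type-agnostic path to unpriv_sgio from a SCSI address.
 * ASSUMPTION: the sysfs layout /sys/bus/scsi/devices/H:B:T:L/unpriv_sgio.
 * Returns 0 on success, -1 if the adapter name is not "scsi_hostN" or the
 * buffer is too small. */
int build_unpriv_sgio_path(char *buf, size_t buflen, const char *adapter,
                           unsigned bus, unsigned target,
                           unsigned long long unit)
{
    unsigned host;
    char trailing;
    int n;

    /* Exactly one conversion must succeed: a second one would mean there
     * are stray characters after the host number. */
    if (sscanf(adapter, "scsi_host%u%c", &host, &trailing) != 1)
        return -1;

    n = snprintf(buf, buflen, "/sys/bus/scsi/devices/%u:%u:%u:%llu/unpriv_sgio",
                 host, bus, target, unit);
    if (n < 0 || (size_t)n >= buflen)
        return -1;

    return 0;
}
```

For the hostdev in the reproduce steps (adapter scsi_host2, bus 0, target 0, unit 0), this sketch produces /sys/bus/scsi/devices/2:0:0:0/unpriv_sgio, with no dependency on a block major:minor pair.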