Bug 908888
Summary: | qemu now requires CAP_COMPROMISE_KERNEL for working PCI passthrough | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Laine Stump <laine> | ||||
Component: | kernel | Assignee: | Kyle McMartin <kmcmartin> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | rawhide | CC: | ajia, alex.williamson, berrange, clalancette, crobinso, cwei, dyuan, fweimer, gansalmon, honzhang, itamar, jason, jforbes, jonathan, jwboyer, jyang, kernel-maint, kmcmartin, laine, libvirt-maint, madhu.chinakonda, mathis.gavillon, mjg59, pbonzini, peterm, rjones, sassmann, veillard, virt-maint | ||||
Target Milestone: | --- | Keywords: | FutureFeature | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Enhancement | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-11-17 18:15:22 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Laine Stump
2013-02-07 19:07:42 UTC
I have a set of patches for libvirt that manage to set CAP_COMPROMISE_KERNEL for non-root qemu-kvm, and that does indeed solve the problem. During discussion in IRC, Dan Berrange pointed out that the need for this cap bit is likely due to the kernel checking for the cap at every read/write of the file, rather than just on open (as seems to be the case most of the rest of the time). While we could grudgingly patch libvirt to set CAP_COMPROMISE_KERNEL, that opens the host system up to a much wider range of attack from rogue guests than would exist if the kernel check for CAP_COMPROMISE_KERNEL happened only when the sysfs file was opened. (As aside: yes, we're aware that this means PCI passthrough via qemu's "pci-assign" device will by definition not work on any system using UEFI Secure Boot (since this capability bit is merely preventing us from doing anything that would anyway be prevented at a lower level by SB). newer qemu (starting with 1.3) will have vfio-pci to replace that, and libvirt will hopefully have that soon too. But Fedora 18 is using qemu-1.2, and as far as I know, qemu never rebases within a Fedora release (similar to libvirt). I just realized that the initial bug description was missing crucial information in the 2nd paragraph: The current implementation of CAPS_COMPROMISE_KERNEL (which is a patch to the F18 kernel that hasn't yet been applied upstream) checks this capability bit not only when open() is called on the sysfs file in question (/sys/bus/pci/devices/nnnn:nn:nn.nn/config) but also when read/write is called. Since libvirt (which has CAP_COMPROMISE_KERNEL) opens the file and passes the open fd to qemu, it's okay to restrict open(), but qemu (which has *no* cap bits set) needs to read/write this file, so it fails. Note that there was also an selinux issue: Bug 895161. selinux was essentially duplicating the checks of CAP_COMPROMISE_KERNEL, but the selinux policy has now been modified to allow that for all processes that are svirt_t (In reply to comment #2) > I just realized that the initial bug description was missing crucial > information in the 2nd paragraph: > > The current implementation of CAPS_COMPROMISE_KERNEL (which is a patch to > the F18 kernel that hasn't yet been applied upstream) checks this capability > bit not only when open() is called on the sysfs file in question > (/sys/bus/pci/devices/nnnn:nn:nn.nn/config) but also when read/write is > called. Since libvirt (which has CAP_COMPROMISE_KERNEL) opens the file and > passes the open fd to qemu, it's okay to restrict open(), but qemu (which > has *no* cap bits set) needs to read/write this file, so it fails. Actually, it doesn't check the capability on open() at all. That is essentially the problem. The sysfs filesystem shares a single open function for most of the files under /sys. It's part of the core sysfs functionality, and there are no existing hooks to override the open function so you cannot easily place per-file cap checks on files for open(). The cap checks in place are in read/write because those files are bin_attr types, and those do have hooks for read/write. Kyle was looking at providing a capability hook so that if it existed it the core sysfs open function would check it, and if not nothing additional was done. Not sure how that wound up. Created attachment 699527 [details]
sysfs cap check on open
Attaching unfinished diff for history reasons. (This was what was in the kernel build Laine tested.)
I see these patches are now posted upstream http://thread.gmane.org/gmane.linux.kernel/1459165 Can anyone confirm whether the upstream posting suffers the same flaw of requiring CAP_COMPROMISE_KERNEL for write(), instead of only for open() ? (In reply to comment #8) > I see these patches are now posted upstream > http://thread.gmane.org/gmane.linux.kernel/1459165 > > Can anyone confirm whether the upstream posting suffers the same flaw of > requiring CAP_COMPROMISE_KERNEL for write(), instead of only for open() ? Yes, the patches posted upstream are identical to what is currently in Fedora. (In reply to comment #9) > (In reply to comment #8) > > I see these patches are now posted upstream > > http://thread.gmane.org/gmane.linux.kernel/1459165 > > > > Can anyone confirm whether the upstream posting suffers the same flaw of > > requiring CAP_COMPROMISE_KERNEL for write(), instead of only for open() ? > > Yes, the patches posted upstream are identical to what is currently in > Fedora. Urgh. Can someone involved in LKML upstream highlight that these kernel patches cause a serious regression in userspace virt functionality & should not be merged without first fixing the sysfs code to check on open(). (In reply to comment #10) > (In reply to comment #9) > > (In reply to comment #8) > > > I see these patches are now posted upstream > > > http://thread.gmane.org/gmane.linux.kernel/1459165 > > > > > > Can anyone confirm whether the upstream posting suffers the same flaw of > > > requiring CAP_COMPROMISE_KERNEL for write(), instead of only for open() ? > > > > Yes, the patches posted upstream are identical to what is currently in > > Fedora. > > Urgh. Can someone involved in LKML upstream highlight that these kernel > patches cause a serious regression in userspace virt functionality & should > not be merged without first fixing the sysfs code to check on open(). Noted. Adding the check in open isn't trivial. I doubt those patches are going upstream anytime soon anyway. They're on the third round of posting, and the capability they rely on isn't merged yet and its name is still being bikeshedded. Laine, Daniel, just so I'm clear on things, you expect pci passthrough to break in the SB case right? Laine said as much in comment #1, and said that vfio will eventually be used. So if we're mostly just concerned about the cap check breaking the non-SB case, then there is a somewhat quick fix that can be done here. I doubt such a fix will be suitable as an upstream solution, but we can just replace that cap checks in the read/write functions with efi_enabled(EFI_SECURE_BOOT). That will evaluate to false on non-SB machines. (In reply to comment #12) > Laine, Daniel, just so I'm clear on things, you expect pci passthrough to > break in the SB case right? Laine said as much in comment #1, and said that > vfio will eventually be used. In the immediate term concerned with making sure we get working PCI passthrough when SB is not enabled, to solve the regression wrt previous Fedora's. vfio is our long term new architecture for PCI passthrough, which we're expecting to be compatible with SB enabled machines. I'm not fussed if traditional PCI passthrough does not work when SB is enabled, but it would be a nice bonus if it did, to ease the transition to vfio. > So if we're mostly just concerned about the cap check breaking the non-SB > case, then there is a somewhat quick fix that can be done here. I doubt > such a fix will be suitable as an upstream solution, but we can just replace > that cap checks in the read/write functions with > efi_enabled(EFI_SECURE_BOOT). That will evaluate to false on non-SB > machines. So you're saying replace if (!capable(CAP_COMPROMISE_KERNEL)) ...EPERM.. with if (efi_enabled(EFI_SECURE_BOOT) && !capable(CAP_COMPROMISE_KERNEL) ...EPERM.. if so that would certainly be fine with us as a immediate fix for Fedora, while upstream figures out the long term fix. (In reply to comment #13) > (In reply to comment #12) > > Laine, Daniel, just so I'm clear on things, you expect pci passthrough to > > break in the SB case right? Laine said as much in comment #1, and said that > > vfio will eventually be used. > > In the immediate term concerned with making sure we get working PCI > passthrough when SB is not enabled, to solve the regression wrt previous > Fedora's. > > vfio is our long term new architecture for PCI passthrough, which we're > expecting to be compatible with SB enabled machines. > > I'm not fussed if traditional PCI passthrough does not work when SB is > enabled, but it would be a nice bonus if it did, to ease the transition to > vfio. > > > So if we're mostly just concerned about the cap check breaking the non-SB > > case, then there is a somewhat quick fix that can be done here. I doubt > > such a fix will be suitable as an upstream solution, but we can just replace > > that cap checks in the read/write functions with > > efi_enabled(EFI_SECURE_BOOT). That will evaluate to false on non-SB > > machines. > > So you're saying replace > > if (!capable(CAP_COMPROMISE_KERNEL)) > ...EPERM.. > > with > > if (efi_enabled(EFI_SECURE_BOOT) && !capable(CAP_COMPROMISE_KERNEL) > ...EPERM.. Well, or just: if (efi_enabled(EFI_SECURE_BOOT)) ...EPERM... there's no reason to have the cap check still if you're using efi_enabled, since you already know the cap is unavailable in SB mode (the kernel refuses to grant it to anyone). > > So you're saying replace
> >
> > if (!capable(CAP_COMPROMISE_KERNEL))
> > ...EPERM..
> >
> > with
> >
> > if (efi_enabled(EFI_SECURE_BOOT) && !capable(CAP_COMPROMISE_KERNEL)
> > ...EPERM..
>
> Well, or just:
>
> if (efi_enabled(EFI_SECURE_BOOT))
> ...EPERM...
>
> there's no reason to have the cap check still if you're using efi_enabled,
> since you already know the cap is unavailable in SB mode (the kernel refuses
> to grant it to anyone).
Ah ok, that makes sense.
So I built a kernel with Kyle's patch, and assigned an ethernet controller to the guest I have here. After booting, it can clearly see the device including the config space: 00:06.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04) Subsystem: Lenovo Device 21ce Physical Slot: 6 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 11 Capabilities: [c8] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit- Address: 00000000 Data: 0000 00: 86 80 02 15 03 00 10 00 04 00 00 02 00 00 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 ce 21 30: 00 00 00 00 c8 00 00 00 00 00 00 00 0a 01 00 00 The e1000e driver doesn't load in the guest, failing with a -5. I'm not sure why that is, but it seems the file in question is accessible to qemu after it is started and /proc/<pid>/status clearly does not have any capabilities set. Laine, you said you had tested Kyle's patch and it didn't work at all. I'm seeing different results. Could you or Daniel explain what might possibly be going wrong here, since the config space is readable by qemu? What I had seen was that the guest booted successfully (with no errors encountered by either libvirt or qemu), but in the end there was no device visible in the guest; it seemed as if all the [reads/writes/whatever qemu is doing] were successful, but apparently either a read returned some different data, or perhaps a write accepted data from qemu but didn't actually do anything with it. If you have a new kernel handy, I can try it again; it's possible it was some strange anomaly. (BTW, I'm using an Intel 82576 card (sr-iov using igb driver) on AMD hardware.) I got the patch working. Or at least the device both shows up and the e1000e driver loads in my VM. Kyle's patch added the cap check for the <sysfs device path>/resourceX files, and the vpd file (if it exists). Kyle noticed yesterday that while QEMU is passed the fd for the config space, it opens the resourceX files by itself. With the checks he added there, those open calls were likely failing. I guess the question becomes is putting the cap check only on config sufficient, or do the resourceX files need it as well? (The patch also adds the check for legacy_io and legacy_mem files, but that seems to work fine). Matthew, any thoughts here? Oh. So one other thing I just noticed. If you are an unprivileged user, you can no longer run lspci because pcilib can't open the config file. That's probably worse than pci-passthru not working to be honest. Certainly more people would notice. [jwboyer@zod ~]$ lspci pcilib: Cannot open /sys/bus/pci/devices/0000:0e:00.0/config lspci: Unable to read the standard configuration space header of device 0000:0e:00.0 pcilib: Cannot open /sys/bus/pci/devices/0000:0d:00.0/config lspci: Unable to read the standard configuration space header of device 0000:0d:00.0 <etc etc> 1) if the resourceX files were previously readable by an unprivileged qemu, then they need to remain readable by unprivileged qemu. Adding in the checks at open would mean that either libvirt and qemu needed to come up with an mechanism for having them opened by libvirt and passed to qemu, or that libvirt would need to enable CAPS_COMPROMISE_KERNEL for qemu (which is now an *even worse* idea in light of learning that this name is probably not what will end up upstream). 2) can you make a scratch build for me to try out on my F18 system? 3) lspci works from a non-root shell for me here. Is the failure you note in Comment 19 something that's new with the patch that fixes our problem? The resourceX files *need* to be constrained by this capability, so if there's an expectation that an unprivileged qemu can open and read/write these files then qemu is going to need the capability. (In reply to comment #20) > 1) if the resourceX files were previously readable by an unprivileged qemu, > then they need to remain readable by unprivileged qemu. Adding in the checks > at open would mean that either libvirt and qemu needed to come up with an > mechanism for having them opened by libvirt and passed to qemu, or that > libvirt would need to enable CAPS_COMPROMISE_KERNEL for qemu (which is now > an *even worse* idea in light of learning that this name is probably not > what will end up upstream). Kyle moved the check to open instead of write as it was in the original patch (note: read did not have a check) after you suggested moving the check to open in comment #1. The resource files we covered via write in the previous patch, and I'm assuming Matthew did it that way on purpose. I doubt it's acceptable to drop the check for resourceX files. Unless a new idea presents itself, libvirt/qemu would have to make some change either way. > 2) can you make a scratch build for me to try out on my F18 system? I can. I doubt it's going to do anything more than what the scratch build you already tried did though. > 3) lspci works from a non-root shell for me here. Is the failure you note in > Comment 19 something that's new with the patch that fixes our problem? Yes. Namely, if you do capabilities checks on file open instead of file write, then lspci can't open the file at all and it fails. > > 2) can you make a scratch build for me to try out on my F18 system? > > I can. I doubt it's going to do anything more than what the scratch build > you already tried did though. Here you go (when it finishes building): http://koji.fedoraproject.org/koji/taskinfo?taskID=5185791 It doesn't include the change I described about dropping the check on the resourcesX files because those checks are needed. So it is a build with Kyle's patch. (In reply to comment #22) > (In reply to comment #20) > > 1) if the resourceX files were previously readable by an unprivileged qemu, > > then they need to remain readable by unprivileged qemu. Adding in the checks > > at open would mean that either libvirt and qemu needed to come up with an > > mechanism for having them opened by libvirt and passed to qemu, or that > > libvirt would need to enable CAPS_COMPROMISE_KERNEL for qemu (which is now > > an *even worse* idea in light of learning that this name is probably not > > what will end up upstream). > > Kyle moved the check to open instead of write as it was in the original > patch (note: read did not have a check) after you suggested moving the check > to open in comment #1. The resource files we covered via write in the > previous patch, and I'm assuming Matthew did it that way on purpose. I > doubt it's acceptable to drop the check for resourceX files. > > Unless a new idea presents itself, libvirt/qemu would have to make some > change either way. Especially in light of the fact that upstream has now refused the name CAPS_COMPROMISE_KERNEL, and is now considering a different name, it's completely not acceptable for libvirt to add CAPS_COMPROMISE_KERNEL to the qemu process. So that would leave us with only the choice of requiring libvirt to open these files and pass their fd's to qemu. There is no method in qemu to do that though. My understanding was that these kernel patches which were supposedly put in to enable Secure Boot must not (on a non-Secure Boot machine) prohibit *anything* that was previously allowed; doing otherwise breaks the ABI and applications that depend on it. (In reply to comment #24) > (In reply to comment #22) > > (In reply to comment #20) > > > 1) if the resourceX files were previously readable by an unprivileged qemu, > > > then they need to remain readable by unprivileged qemu. Adding in the checks > > > at open would mean that either libvirt and qemu needed to come up with an > > > mechanism for having them opened by libvirt and passed to qemu, or that > > > libvirt would need to enable CAPS_COMPROMISE_KERNEL for qemu (which is now > > > an *even worse* idea in light of learning that this name is probably not > > > what will end up upstream). > > > > Kyle moved the check to open instead of write as it was in the original > > patch (note: read did not have a check) after you suggested moving the check > > to open in comment #1. The resource files we covered via write in the > > previous patch, and I'm assuming Matthew did it that way on purpose. I > > doubt it's acceptable to drop the check for resourceX files. > > > > Unless a new idea presents itself, libvirt/qemu would have to make some > > change either way. > > Especially in light of the fact that upstream has now refused the name > CAPS_COMPROMISE_KERNEL, and is now considering a different name, it's > completely not acceptable for libvirt to add CAPS_COMPROMISE_KERNEL to the > qemu process. Yeah, the name is irrelevant. Some are even advocating for just using CAP_SYS_RAWIO everywhere, which at least eliminates a new cap. > So that would leave us with only the choice of requiring libvirt to open > these files and pass their fd's to qemu. There is no method in qemu to do > that though. > > My understanding was that these kernel patches which were supposedly put in > to enable Secure Boot must not (on a non-Secure Boot machine) prohibit > *anything* that was previously allowed; doing otherwise breaks the ABI and > applications that depend on it. So, yes the Secure Boot patches aren't supposed to impact things not using Secure Boot. However, this is really more of a general capabilities issue than anything. If you introduce a new capability, it is almost impossible not to change how things work. It should be noted that how libvirt and QEMU are getting by today might very well be a mistake/oversight upstream. The pciconfig_{read,write} syscalls actually require CAP_SYS_ADMIN. The fact that the equivalent sysfs files don't have these checks seems rather odd to me. (In reply to comment #25) > (In reply to comment #24) > > So that would leave us with only the choice of requiring libvirt to open > > these files and pass their fd's to qemu. There is no method in qemu to do > > that though. > > > > My understanding was that these kernel patches which were supposedly put in > > to enable Secure Boot must not (on a non-Secure Boot machine) prohibit > > *anything* that was previously allowed; doing otherwise breaks the ABI and > > applications that depend on it. > > So, yes the Secure Boot patches aren't supposed to impact things not using > Secure Boot. However, this is really more of a general capabilities issue > than anything. If you introduce a new capability, it is almost impossible > not to change how things work. > > It should be noted that how libvirt and QEMU are getting by today might very > well be a mistake/oversight upstream. The pciconfig_{read,write} syscalls > actually require CAP_SYS_ADMIN. The fact that the equivalent sysfs files > don't have these checks seems rather odd to me. If there is a security issue with the way caps checking is done on these sysfs files, then that changes the discussion somewhat. Such an issue probably ought to be dealt with / fixed separately from the SecureBoot work, so we can figure out the best solution without getting distracted by the more controversial SecureBoot proposals. > The pciconfig_{read,write} syscalls actually require CAP_SYS_ADMIN. The fact
> that the equivalent sysfs files don't have these checks seems rather odd to me.
The syscalls are equivalent to opening the sysfs file, doing a read/write and closing the it. Requiring CAP_SYS_ADMIN or CAP_SYS_RAWIO to open the sysfs file would make sense. Right now sysfs does not have that functionality (no ->open callback), but adding that infrastructure would help fixing this bug as well.
Of course, this would be a backwards-incompatible change. I think it would not be acceptable when opening for reads. It could be acceptable when opening for writes.
I've started a scratch build with the workaround from comment #12 here: http://koji.fedoraproject.org/koji/taskinfo?taskID=5231855 It isn't fully complete yet, but the x86_64 kernel is done. Please test and let us know if that allows device assignment to work for you in the non-SB case. I grabbed the build that Josh points to in Comment 28 and tried it out. It does definitely solve the problem; I can now configure qemu to run as non-root and drop all capabilities (i.e. I don't have to add CAP_COMPROMISE_KERNEL, or anything else), and PCI passthrough works as it did in the past. Thumbs up from me to push this into F18. Thanks! (In reply to comment #29) > I grabbed the build that Josh points to in Comment 28 and tried it out. It > does definitely solve the problem; I can now configure qemu to run as > non-root and drop all capabilities (i.e. I don't have to add > CAP_COMPROMISE_KERNEL, or anything else), and PCI passthrough works as it > did in the past. Thanks for testing. I'd say it works around the problem more than solves it. The work around isn't going to be acceptable to upstream, so the root of the issues still remain. > Thumbs up from me to push this into F18. I plan on doing that today. I'm moving this bug to rawhide and leaving it open until some kind of final solution presents itself though. As mentioned, the workaround is in this F18 update: https://admin.fedoraproject.org/updates/kernel-3.8.6-203.fc18 However, I did not mark it as containing the fix for this bug because bodhi would close it out. Did these patches not make it into F19? (and rawhide?) I just got F19 installed on my SRIOV machine and found that, once again, I have to run qemu as root with no capabilities dropped in order for PCI passthrough to work. This needinfo is here to remind kyle that he was going to provide an F19 kernel build for me to try. Sorry for the hold up, been busy with an internal transition... http://koji.fedoraproject.org/koji/taskinfo?taskID=5539911 Can you test if this fixes the "secure boot disabled but passthru is still broken" case with this changeset? Unfortunately that kernel actually makes matters worse :-(. When I boot with kernel 3.9.7-300.bz908888, PCI device assignment will not work at all, even if the qemu process is running as root with no capabilities dropped. When I reboot with the previous kernel 3.9.6-301, I am back to the previous situation - PCI device assignment *does* work as long as the qemu process is running as root, with no capabilities dropped. If I run qemu as a non-privileged user, with no extra capabilities, then it still fails. If there's some info you'd like me to gather, I'd be happy to do that, or if you'd like access to the machine we can figure out how to do that as well. (Do you have access to a machine running F19 that doesn't use secureboot and supports device assignment? I can help you setup libvirt for a quick and simple test so you don't have to keep waiting for me to get results) Can you be more specific with "will not work at all"? Did config space access work? http://koji.fedoraproject.org/koji/taskinfo?taskID=5549005 How about this build? Sorry for the lack of detail in my description. I really am coming at this from the outside. Below you'll see the qemu commandline and the error messages that qemu produces. Is there something I can do manually to get the answer to the question "Did config space access work"? I'll try out the new build and let you know the result. LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -name RHEL6 -S -machine pc-1.2,accel=kvm,usb=off -m 3907 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid cf9c00de-8615-9213-2e0d-5edc89e3b369 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/RHEL6.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/var/lib/libvirt/images/RHEL6.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:f9:c5:5e,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device pci-assign,configfd=25,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 Domain id=2 is tainted: high-privileges char device redirected to /dev/pts/1 (label charserial0) qemu-system-x86_64: -device pci-assign,configfd=25,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x3: assigned_dev_register_regions: Error: Couldn't mmap 0xfcba0000! qemu-system-x86_64: -device pci-assign,configfd=25,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x3: Device initialization failed. qemu-system-x86_64: -device pci-assign,configfd=25,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x3: Device 'kvm-pci-assign' could not be initialized 2013-06-26 18:25:10.670+0000: shutting down With the 3.9.7-300.bz908888.2 build it fails in a different way: (commandline is same as above, qemu is run as root, with all capabilities. selinux is permissive) I did reboot with 3.9.6-301 again to verify that nothing else has changed. qemu-system-x86_64: -device pci-assign,configfd=25,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x3: pci-assign: Cannot read from host /sys/bus/pci/devices/0000:02:00.0/rom Device option ROM contents are probably invalid (check dmesg). Skip option ROM probe with rombar=0, or load from file with romfile= ((null):2743): SpiceWorker-Warning **: red_worker.c:11477:dev_destroy_primary_surface: double destroy of primary surface ((null):2743): SpiceWorker-Warning **: red_worker.c:9663:red_create_surface: condition `surface->context.canvas' reached qemu: hardware error: pci write failed, ret = -1 errno = 1 CPU #0: EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =f000 ffff0000 0000ffff 00009b00 SS =0000 00000000 0000ffff 00009300 DS =0000 00000000 0000ffff 00009300 FS =0000 00000000 0000ffff 00009300 GS =0000 00000000 0000ffff 00009300 LDT=0000 00000000 0000ffff 00008200 TR =0000 00000000 0000ffff 00008b00 GDT= 00000000 0000ffff IDT= 00000000 0000ffff CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000 XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000 XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000 XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000 2013-06-28 00:33:10.266+0000: shutting down OK. So comment37 is the BARs failing to get MMAP'd because qemu opens them. So we need to fix qemu to get those passed in from the privileged libvirt. Comment38 is strange, because the kernel you tested there doesn't check the resource files at all, so I have no idea why that's failing. I'm on PTO Monday & Friday next week, but maybe we can figure out some easier way to test this on Tuesday. The updated F18 kernel-3.10.4-100.fc18.x86_64 seems to re-introduce the issue again, the previous kernel-3.9.11-200.fc18.x86_64 works fine. (i.e. PCI passthrough can be used without being root) Alex, this is a lingering bug with PCI passthrough in Fedora, maybe there's something you can add or help with debugging. *** Bug 999482 has been marked as a duplicate of this bug. *** Since this bug has been silent for a while, and rawhide libvirt now uses VFIO by default which works with out of the box configuration, closing. If someone is still affected, please reopen. FYI CAP_COMPROMISE_KERNEL has been killed in F20, so this should work even with legacy PCI device assignment. |