Description of problem:
KubeVirt currently runs libvirt as root in a container; however, this container does not have the SYS_ADMIN capability. When KubeVirt assigns PCIe devices, they are placed on a pcie-pci-bridge controller, and as a result some guest drivers fail to load.

Version-Release number of selected component (if applicable):
libvirt-daemon-driver-qemu-6.6.0-3

How reproducible:
Assign a device using KubeVirt. For example:

kind: VirtualMachineInstance
spec:
  domain:
    devices:
      gpus:
      - deviceName: nvidia.com/TU104GL_Tesla_T4
        name: gpu1

Expected results:
The PCIe host device will not be placed on the pcie-pci-bridge controller.
Created attachment 1733489 [details] example of a domain xml with a pcie device placed on a pcie-pci-bridge controller
I'm working on a patch to do the determination of conventional PCI vs. PCIe differently. Right now we do this:

  if (libvirt is running unprivileged (non-root))
      check the length of the PCI config file for the device in sysfs
      if it is 256 bytes long, then it's conventional PCI, else PCIe
      DONE

Otherwise we assume that, since we're running as root, we have full capabilities, and:

  Try to read the PCI config file for the device from sysfs.
  If there is any error, default to conventional PCI.
  Only decide it is PCIe if we can successfully read the entire PCI
  config file and find the bit that says "this is PCIe".

Instead, what we need to do is this:

  Try to read the PCI config file.
  if (there is an EPERM error while trying to read)
      if (the config file is 256 bytes or less) ---> conventional PCI
      else ---> PCIe
      DONE
  check the "this is PCIe" bit read from the config and determine
  PCIe vs. conventional PCI according to that.

(Or maybe we should just check the length of the file in sysfs, although that seems like a bit of a cop-out.)

All the code that reads the config file is currently set up to fail on a read error, so it was a "more than 10 minute" job and I couldn't do it today. I'll hopefully have a patch early next week.
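For illustration, the proposed decision logic above can be sketched as follows. This is a hypothetical Python helper, not libvirt's actual code (libvirt implements this in C, in virPCIDeviceIsPCIExpress()); the constants are the standard PCI config-space offsets, and is_pcie() is an invented name.

```python
import os

PCI_STATUS = 0x06              # status register offset (16-bit)
PCI_STATUS_CAP_LIST = 0x10     # "device has a capabilities list" bit
PCI_CAPABILITY_LIST = 0x34     # offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10          # PCI Express capability ID
PCI_CONF_LEN = 256             # conventional PCI config space size

def is_pcie(config_path):
    """Decide PCIe vs. conventional PCI from a device's sysfs config file."""
    try:
        with open(config_path, "rb") as f:
            cfg = f.read()
    except PermissionError:
        # EPERM while reading: fall back to the file length in sysfs.
        # 256 bytes or less means conventional PCI, larger means PCIe.
        return os.path.getsize(config_path) > PCI_CONF_LEN
    if len(cfg) < 64:          # not even a full standard header
        return False
    if not cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST:
        return False           # no capability list at all -> conventional PCI
    # Walk the capability list looking for the "this is PCIe" capability.
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return True
        pos = cfg[pos + 1] & 0xFC
    return False
```

The key difference from the old behavior is that a read failure no longer silently defaults to conventional PCI; only the EPERM case falls back to the length heuristic.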
Sent this series upstream, which solves the problem: https://www.redhat.com/archives/libvir-list/2020-December/msg00527.html
Pushed a slightly modified version of the patches in Comment 5 to upstream (just added some internal documentation):

commit 01e421c16abbb5e9d0e1773c48f09dca1332251f
Author: Laine Stump <laine>
Date:   Tue Nov 24 16:56:52 2020 -0500

    qemu: use g_autoptr for a virPCIDevice

commit 47ccca4fd3d37e47e0cc304e963980c5b6cb15a5
Author: Laine Stump <laine>
Date:   Sun Dec 6 16:05:03 2020 -0500

    util: simplify calling of virPCIDeviceDetectFunctionLevelReset()

commit b7a1eb6c65b8ebb8b3133fa9d12861c6948069bd
Author: Laine Stump <laine>
Date:   Sun Dec 6 16:20:23 2020 -0500

    util: simplify call to virPCIDeviceDetectPowerManagementReset()

commit 0003f5808ff12ca529ed52016dbf346195880c78
Author: Laine Stump <laine>
Date:   Sun Dec 6 20:26:01 2020 -0500

    util: make read error of PCI config file more detailed

commit 4b8245653d47dd0f875e3693db3d781f113a095c
Author: Laine Stump <laine>
Date:   Mon Dec 7 15:46:54 2020 -0500

    util: change call sequence for virPCIDeviceFindCapabilityOffset()

commit c00b6b1ae3bea8b5e6407b0991f0b1f16f129645
Author: Laine Stump <laine>
Date:   Tue Dec 8 14:34:19 2020 -0500

    util: make virPCIDeviceIsPCIExpress() more intelligent

commit cd338954b7056ba5c98fa860ce120358cbb74566
Author: Laine Stump <laine>
Date:   Tue Dec 8 14:44:30 2020 -0500

    qemu: remove redundant check for file length when determining PCIe vs. PCI
Reproduced with libvirt-daemon-6.6.0-2.el8.x86_64.

Steps:
1. Create a normal user:
#useradd test
#passwd test

2. Add permission on the libvirtd dirs for the test user:
#chown -R test:test /etc/libvirt
#chown -R test:test /var/lib/libvirt
#chown -R test:test /var/cache/libvirt
#chown -R test:test /run/libvirt/

3. Switch to the test user:
#su - test

4. Start a container with the test user and then start the virtqemud process as root inside the container:
$unshare --map-root-user -U -- virtqemud --verbose

5. Open another terminal and set libvirt.conf to connect to virtqemud:
$cat /etc/libvirt/libvirt.conf
uri_default = "qemu:///system"
remote_mode = "direct"

6. After virtqemud has started, open another terminal to add a hostdev device to the domain:
$unshare --map-root-user -U -- virsh -c qemu:///system edit avocado-vt-vm1
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
...

7. Check the controller the hostdev device is attached to:
$unshare --map-root-user -U -- virsh -c qemu:///system dumpxml avocado-vt-vm1
<domain>
...
  <controller type='pci' index='9' model='pcie-to-pci-bridge'>
    <model name='pcie-pci-bridge'/>
    <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
  </controller>
...
  <device>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
    </hostdev>
...
  </device>
...
</domain>
Verified with libvirt-daemon-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64.

Test steps:
1. Stop the libvirtd service:
#systemctl stop libvirtd

2. Create a normal user:
#useradd test
#passwd test

3. Add permission on the libvirt-related dirs for user test:
#chown -R test:test /etc/libvirt
#chown -R test:test /var/lib/libvirt
#chown -R test:test /var/cache/libvirt
#chown -R test:test /run/
#chown -R test:test /var/run

4. Start a container with the test user and then start the virtqemud process as root inside the container:
$unshare --map-root-user -U -- virtqemud --verbose

5. Open another terminal and set libvirt.conf to connect to virtqemud:
$cat /etc/libvirt/libvirt.conf
uri_default = "qemu:///system"
remote_mode = "direct"

6. After virtqemud has started, open another terminal to add a hostdev device to the domain:
$unshare --map-root-user -U -- virsh -c qemu:///system edit avocado-vt-vm1
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
...

7. Check the controller the hostdev device is attached to:
<domain>
...
  <controller type='pci' index='9' model='pcie-to-pci-bridge'>
    <model name='pcie-pci-bridge'/>
    <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
  </controller>
...
  <controller type='pci' index='10' model='pcie-root-port'>    ******
    <model name='pcie-root-port'/>
    <target chassis='10' port='0x18'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
...
  <device>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>    ******
    </hostdev>
...
  </device>
...
</domain>
I also tested libvirtd started by systemctl and libvirtd in session mode; both work well.
Hi Vladik,

I verified the bug by starting libvirtd as root in a container that was started by a non-root user. Does that meet the requirement of CNV? If not, would you help confirm whether the issue is fixed in a CNV environment, please?

Thanks,
Yan Fu
(In reply to yafu from comment #13)
> Hi Vladik,
>
> I verified the bug by starting libvirtd as root in a container that was
> started by a non-root user. Does that meet the requirement of CNV? If not,
> would you help confirm whether the issue is fixed in a CNV environment,
> please?
>
> Thanks,
> Yan Fu

Great, thanks. I think that should be sufficient.
Meina Li <meili> pointed out in email that I had forgotten to backport a patch that fixes a regression caused by the earlier patches listed in Comment 6.

commit 49b5ebad9c597a8caa767d55987201d3fb022056
Author: Laine Stump <laine>
Date:   Wed Jan 6 15:42:47 2021 -0500

    util: validate pcie_cap_pos != 0 in virDeviceHasPCIExpressLink()

Without that patch, some users will see harmless yet bothersome error messages (as described in the commit message) when they restart libvirtd.
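The fix is essentially a guard: if the PCI Express capability offset was never found (still 0, e.g. because the config space could not be fully read), bail out before reading link registers at that offset. A minimal sketch of the idea in Python, with hypothetical names (libvirt's actual code is C, in virDeviceHasPCIExpressLink()):

```python
PCI_EXP_LNKCAP = 0x0C  # Link Capabilities register, relative to the Express capability

def has_pcie_link(cfg, pcie_cap_pos):
    """Report whether the device's Express capability describes a link.

    pcie_cap_pos == 0 means the capability was never located (for
    example because the config file could not be fully read); reading
    at offset 0 would misinterpret the standard header and produce the
    spurious error messages the patch above suppresses.
    """
    if pcie_cap_pos == 0:
        return False
    off = pcie_cap_pos + PCI_EXP_LNKCAP
    if off + 4 > len(cfg):
        return False
    lnkcap = int.from_bytes(cfg[off:off + 4], "little")
    return bool(lnkcap & 0x3F0)  # bits 9:4 = maximum link width
```

The early return on a zero offset is the whole regression fix; everything after it is unchanged behavior.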
Verified with libvirt-6.6.0-12.module+el8.3.1+9458+e57b3fac.x86_64.

Test steps:
1. Stop the libvirtd service:
#systemctl stop libvirtd

2. Modify qemu.conf to start the qemu process as root:
#cat /etc/libvirt/qemu.conf
user = "root"
group = "root"

3. Create a normal user:
#useradd test
#passwd test

4. Add permission on the libvirt-related dirs for user test:
#chown -R test:test /etc/libvirt
#chown -R test:test /var/lib/libvirt
#chown -R test:test /var/cache/libvirt
#chown -R test:test /run/
#chown -R test:test /var/run

5. Start a container with the test user and then start the virtqemud process as root inside the container:
$unshare --map-root-user -U -- virtqemud --verbose

6. Open another terminal and set libvirt.conf to connect to virtqemud:
$cat /etc/libvirt/libvirt.conf
uri_default = "qemu:///system"
remote_mode = "direct"

7. After virtqemud has started, open another terminal to add a hostdev device to the domain:
$unshare --map-root-user -U -- virsh -c qemu:///system edit avocado-vt-vm1
...
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
...

8. Check the controller the hostdev device is attached to:
<domain>
...
  <controller type='pci' index='9' model='pcie-to-pci-bridge'>
    <model name='pcie-pci-bridge'/>
    <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
  </controller>
...
  <controller type='pci' index='10' model='pcie-root-port'>    ******
    <model name='pcie-root-port'/>
    <target chassis='10' port='0x18'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
...
  <device>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>    ******
    </hostdev>
...
  </device>
...
</domain>
Hi Laine,

Since the issue fixed by the patch in comment 19 is hard to reproduce, do we need to verify that patch as part of this bug?

Thanks,
Yan Fu
In my opinion it's not necessary to reproduce the behavior described in that patch; it's enough to check that when libvirtd starts there is no error message similar to the one described there. I only saw it on very specific hardware (which is sitting on the shelf next to my desk, and on which I've verified the bug is fixed), and it only happened when the other patches of this BZ were applied but not the final patch, which will NEVER happen in real life.

Even if it turned out that there was other hardware where the same regression occurred and that patch didn't fix it (which I seriously doubt), the presence of the regression doesn't cause anything other than an innocuous error message when libvirt starts; there is no functional consequence.
(In reply to Laine Stump from comment #24)
> In my opinion it's not necessary to reproduce the behavior described in
> that patch, only that when libvirtd starts there is no error message
> similar to the error described there. I only saw it on very specific
> hardware (that is sitting on the shelf next to my desk) where I've
> verified the bug is fixed, and it only happened when the other patches
> of this BZ were applied but not the final patch, which will NEVER happen
> in real life.
>
> Even if it turned out that there was other hardware where the same
> regression occurred and that patch didn't fix it (which I seriously
> doubt), presence of the regression doesn't cause anything other than an
> innocuous error message when libvirt starts - there is no functional
> consequence.

Thanks for your detailed clarification. I tried on several hosts and reproduced the issue in comment 19 on one host, and the issue cannot be reproduced with libvirt-daemon-6.6.0-13.el8.x86_64, so I am moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0639
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days