Bug 1901685 - PCIE Host devices end up on pcie-pci-bridge in KubeVirt
Summary: PCIE Host devices end up on pcie-pci-bridge in KubeVirt
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Laine Stump
QA Contact: yafu
URL:
Whiteboard:
Depends On:
Blocks: 1917827
 
Reported: 2020-11-25 19:59 UTC by Vladik Romanovsky
Modified: 2023-09-15 00:51 UTC
CC: 16 users

Fixed In Version: libvirt-6.6.0-12.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1917827
Environment:
Last Closed: 2021-02-22 15:39:41 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
example of a domain xml with a pcie device placed on a pcie-pci-bridge controller (6.35 KB, text/html)
2020-11-25 20:01 UTC, Vladik Romanovsky

Description Vladik Romanovsky 2020-11-25 19:59:17 UTC
Description of problem:
KubeVirt currently runs libvirt as root in a container. However, this container doesn't have the SYS_ADMIN capability.
When KubeVirt assigns PCIe host devices, they are placed on a pcie-pci-bridge controller and, as a result, some guest drivers fail to load.

Version-Release number of selected component (if applicable):
libvirt-daemon-driver-qemu-6.6.0-3


How reproducible:
Assign a device using KubeVirt.
For example:

kind: VirtualMachineInstance
spec:
  domain:
    devices:
      gpus:
      - deviceName: nvidia.com/TU104GL_Tesla_T4
        name: gpu1


Expected results:
The PCIe host device will not be placed on a pcie-pci-bridge controller.

Comment 1 Vladik Romanovsky 2020-11-25 20:01:32 UTC
Created attachment 1733489 [details]
example of a domain xml with a pcie device placed on a pcie-pci-bridge controller

Comment 2 Laine Stump 2020-11-26 03:19:41 UTC
I'm working on a patch to determine conventional PCI vs. PCIe differently. Right now we do this:


  if (libvirt is running unprivileged (non-root))
       check the length of the PCI config file for the device in sysfs
       if it is 256 bytes long, then it's conventional PCI, else PCIe
       DONE

  Otherwise, assume that since we're running as root we have full capabilities, and:
  Try to read the PCI config file for the device from sysfs. If there is any error, default to conventional PCI.
  Only decide it is PCIe if we can successfully read the entire PCI config file and find the bit that
  says "this is PCIe".


Instead, what we need to do is this:

Try to read the pci config file.
if (there is an EPERM error while trying to read)
    if (the config file is 256 bytes or less)
       ---> conventional PCI
    else
       ---> PCIe
    DONE

check the "this is PCIe" bit read from the config and determine PCIe vs conventional PCI according to that.


(or maybe we should just check the length of the file in sysfs, although that seems like a bit of a cop-out).

All the code that reads the config file is currently set up to fail on a read error, so it was a "more than 10 minute" job and I couldn't do it today. I'll hopefully have a patch early next week.
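
For illustration, here is a minimal standalone sketch of the new logic in C. This is not libvirt's actual code: the function name and path are made up, and it assumes (per the description above) that an unprivileged read of the config file fails with EPERM/EACCES, and that sysfs exposes 256 bytes of config space for a conventional PCI device and 4096 bytes for a PCIe device.

/* Sketch (not libvirt's actual code) of the detection logic described
 * above: try to read the config file; if the read is denied, fall back
 * to the file length; otherwise walk the capability list looking for
 * the PCI Express capability. */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

#define PCI_CONF_LEN   256   /* conventional PCI config space size */
#define PCI_CAP_ID_EXP 0x10  /* PCI Express capability ID */

/* path: e.g. "/sys/bus/pci/devices/0000:5b:00.0/config" */
static int pci_device_is_pcie(const char *path, bool *is_pcie)
{
    unsigned char buf[PCI_CONF_LEN];
    struct stat sb;
    ssize_t got;
    unsigned int pos;
    int ttl = 48;            /* guard against malformed capability loops */
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    got = read(fd, buf, sizeof(buf));
    if (got < 0 && (errno == EPERM || errno == EACCES)) {
        /* Unprivileged fallback: the file length alone tells us. */
        int ret = fstat(fd, &sb);
        close(fd);
        if (ret < 0)
            return -1;
        *is_pcie = sb.st_size > PCI_CONF_LEN;
        return 0;
    }
    close(fd);
    if (got < 0x40)          /* need at least the standard header */
        return -1;

    /* Privileged path: walk the capability list (the first pointer is
     * at offset 0x34) looking for the PCI Express capability. */
    *is_pcie = false;
    pos = buf[0x34] & 0xfc;
    while (pos >= 0x40 && pos + 1 < (unsigned int)got && ttl-- > 0) {
        if (buf[pos] == PCI_CAP_ID_EXP) {
            *is_pcie = true;
            break;
        }
        pos = buf[pos + 1] & 0xfc;   /* next capability pointer */
    }
    return 0;
}

In the fallback branch, the file-length check is the same test the unprivileged (non-root) code path already uses; the difference is that it now also applies to a root process that lacks the capability to read the full config space.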

Comment 5 Laine Stump 2020-12-09 01:39:01 UTC
Sent this series upstream, which solves the problem:

https://www.redhat.com/archives/libvir-list/2020-December/msg00527.html

Comment 6 Laine Stump 2020-12-13 02:24:15 UTC
Pushed a slightly modified version of the patches in Comment 5 to upstream (just added some internal documentation):

commit 01e421c16abbb5e9d0e1773c48f09dca1332251f
Author: Laine Stump <laine>
Date:   Tue Nov 24 16:56:52 2020 -0500

    qemu: use g_autoptr for a virPCIDevice   

commit 47ccca4fd3d37e47e0cc304e963980c5b6cb15a5
Author: Laine Stump <laine>
Date:   Sun Dec 6 16:05:03 2020 -0500

    util: simplify calling of virPCIDeviceDetectFunctionLevelReset()
    
commit b7a1eb6c65b8ebb8b3133fa9d12861c6948069bd
Author: Laine Stump <laine>
Date:   Sun Dec 6 16:20:23 2020 -0500

    util: simplify call to virPCIDeviceDetectPowerManagementReset()
    
commit 0003f5808ff12ca529ed52016dbf346195880c78
Author: Laine Stump <laine>
Date:   Sun Dec 6 20:26:01 2020 -0500

    util: make read error of PCI config file more detailed
    
commit 4b8245653d47dd0f875e3693db3d781f113a095c
Author: Laine Stump <laine>
Date:   Mon Dec 7 15:46:54 2020 -0500

    util: change call sequence for virPCIDeviceFindCapabilityOffset()
    
commit c00b6b1ae3bea8b5e6407b0991f0b1f16f129645
Author: Laine Stump <laine>
Date:   Tue Dec 8 14:34:19 2020 -0500

    util: make virPCIDeviceIsPCIExpress() more intelligent
    
commit cd338954b7056ba5c98fa860ce120358cbb74566
Author: Laine Stump <laine>
Date:   Tue Dec 8 14:44:30 2020 -0500

    qemu: remove redundant check for file length when determining PCIe vs. PCI

Comment 10 yafu 2020-12-21 01:13:04 UTC
Reproduced with libvirt-daemon-6.6.0-2.el8.x86_64.

Steps:
1.Create a normal user:
#useradd test
#passwd test

2.Grant the test user permission on the libvirt directories:
#chown -R test:test /etc/libvirt
#chown -R test:test /var/lib/libvirt
#chown -R test:test /var/cache/libvirt
#chown -R test:test /run/libvirt/

3.Switch to test user:
#su - test

4.Start a user namespace as the test user and run the virtqemud process as (mapped) root inside it:
$unshare --map-root-user -U -- virtqemud --verbose


5.Open another terminal and set libvirt.conf to connect directly to virtqemud:
$cat /etc/libvirt/libvirt.conf
uri_default = "qemu:///system"
remote_mode = "direct"


6.After virtqemud has started, open another terminal and add a hostdev device to the domain:
$unshare --map-root-user -U -- virsh -c qemu:///system edit avocado-vt-vm1
...
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
...


7.Check which controller the hostdev device is attached to:
$unshare --map-root-user -U -- virsh -c qemu:///system dumpxml avocado-vt-vm1
<domain>
...
<controller type='pci' index='9' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </controller>
...
<devices>
...
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x01' function='0x0'/>
    </hostdev>
...
</devices>
...
</domain>
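
For reference, the host-side type of the assigned device can be sanity-checked from the size of its sysfs config file (a PCIe device exposes 4096 bytes of config space, a conventional PCI device 256; this is the same check libvirt's unprivileged fallback relies on, per comment 2). A minimal C sketch, using the device address from the steps above:

/* Sketch: print the sysfs config file size for the assigned device
 * and classify it as PCIe or conventional PCI based on the length. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sb;
    const char *path = "/sys/bus/pci/devices/0000:5b:00.0/config";

    if (stat(path, &sb) != 0) {
        perror("stat");
        return 1;
    }
    printf("%s: %lld bytes (%s)\n", path, (long long)sb.st_size,
           sb.st_size > 256 ? "PCIe" : "conventional PCI");
    return 0;
}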

Comment 11 yafu 2020-12-21 03:41:02 UTC
Verified with libvirt-daemon-6.6.0-11.module+el8.3.1+9196+74a80ca4.x86_64.

Test steps:
1.Stop libvirtd service:
#systemctl stop libvirtd

2.Create a normal user:
#useradd test
#passwd test

3.Grant the test user permission on the libvirt-related directories:
#chown -R test:test /etc/libvirt
#chown -R test:test /var/lib/libvirt
#chown -R test:test /var/cache/libvirt
#chown -R test:test /run/
#chown -R test:test /var/run

4.Start a user namespace as the test user and run the virtqemud process as (mapped) root inside it:
$unshare --map-root-user -U -- virtqemud --verbose


5.Open another terminal and set libvirt.conf to connect directly to virtqemud:
$cat /etc/libvirt/libvirt.conf
uri_default = "qemu:///system"
remote_mode = "direct"


6.After virtqemud has started, open another terminal and add a hostdev device to the domain:
$unshare --map-root-user -U -- virsh -c qemu:///system edit avocado-vt-vm1
...
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
...


7.Check which controller the hostdev device is attached to:
<domain>
...
 <controller type='pci' index='9' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </controller>
  ...
    <controller type='pci' index='10' model='pcie-root-port'>   ******
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x18'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>

...
<devices>
...
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>  ******
    </hostdev>
...
</devices>
...
</domain>

Comment 12 yafu 2020-12-21 07:41:57 UTC
Also tested libvirtd started by systemctl and libvirtd in session mode; both work well.

Comment 13 yafu 2020-12-21 07:55:26 UTC
Hi Vladik,

I verified the bug by starting libvirtd as the root user in a container that was started by a non-root user. Does this meet the requirements of CNV? If not, would you help confirm whether the issue is fixed in a CNV environment?

Thanks,
Yan Fu

Comment 16 Vladik Romanovsky 2021-01-08 18:32:31 UTC
(In reply to yafu from comment #13)
> Hi Vladik,
> 
> I verified the bug by starting libvirtd as the root user in a container
> that was started by a non-root user. Does this meet the requirements of
> CNV? If not, would you help confirm whether the issue is fixed in a CNV
> environment?
> 
> Thanks,
> Yan Fu

Great, thanks. I think that should be sufficient.

Comment 19 Laine Stump 2021-01-14 18:43:35 UTC
Meina Li <meili> pointed out in email that I had forgotten to backport a patch that fixes a regression caused by the earlier patches listed in Comment 6.

commit 49b5ebad9c597a8caa767d55987201d3fb022056
Author: Laine Stump <laine>
Date:   Wed Jan 6 15:42:47 2021 -0500

    util: validate pcie_cap_pos != 0 in virDeviceHasPCIExpressLink()
    
Without that patch, some users will see harmless yet bothersome error messages (as described in the commit message) when they restart libvirtd.
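
Based only on that commit subject, the fix presumably amounts to an early-exit guard along these lines; the function and parameter names below are hypothetical sketches, not libvirt's actual code:

#include <stdbool.h>
#include <stddef.h>

/* Assumed shape of the follow-up fix (hypothetical names): bail out
 * early when no PCI Express capability offset was recorded for the
 * device, instead of reading link registers at a bogus offset and
 * logging a read error. */
static bool device_has_pcie_link(size_t pcie_cap_pos,
                                 const unsigned char *config, size_t len)
{
    if (pcie_cap_pos == 0)              /* the added validation */
        return false;

    /* The Link Capabilities register sits at cap + 0x0c; a nonzero
     * value means the device reports a PCIe link. */
    if (pcie_cap_pos + 0x0f >= len)
        return false;
    return (config[pcie_cap_pos + 0x0c] | config[pcie_cap_pos + 0x0d] |
            config[pcie_cap_pos + 0x0e] | config[pcie_cap_pos + 0x0f]) != 0;
}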

Comment 21 yafu 2021-01-19 01:42:03 UTC
Verified with libvirt-6.6.0-12.module+el8.3.1+9458+e57b3fac.x86_64.

Test steps:
1.Stop libvirtd service:
#systemctl stop libvirtd

2.Modify qemu.conf to start the qemu process as root:
#cat /etc/libvirt/qemu.conf
user = "root"
group = "root"

3.Create a normal user:
#useradd test
#passwd test

4.Grant the test user permission on the libvirt-related directories:
#chown -R test:test /etc/libvirt
#chown -R test:test /var/lib/libvirt
#chown -R test:test /var/cache/libvirt
#chown -R test:test /run/
#chown -R test:test /var/run

5.Start a user namespace as the test user and run the virtqemud process as (mapped) root inside it:
$unshare --map-root-user -U -- virtqemud --verbose


6.Open another terminal and set libvirt.conf to connect directly to virtqemud:
$cat /etc/libvirt/libvirt.conf
uri_default = "qemu:///system"
remote_mode = "direct"

7.After virtqemud has started, open another terminal and add a hostdev device to the domain:
$unshare --map-root-user -U -- virsh -c qemu:///system edit avocado-vt-vm1
...
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
...


8.Check which controller the hostdev device is attached to:
<domain>
...
 <controller type='pci' index='9' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </controller>
  ...
    <controller type='pci' index='10' model='pcie-root-port'>   ******
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x18'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>

...
<devices>
...
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>  ******
    </hostdev>
...
</devices>
...
</domain>

Comment 23 yafu 2021-01-22 09:04:34 UTC
Hi Laine,

As the issue fixed by the patch in comment 19 is hard to reproduce, do we need to verify that patch as part of this bug?


thanks,
Yan Fu

Comment 24 Laine Stump 2021-01-22 16:02:58 UTC
In my opinion it's not necessary to reproduce the behavior described in that patch, only to confirm that when libvirtd starts there is no error message similar to the one described there. I only saw it on very specific hardware (that is sitting on the shelf next to my desk), where I've verified the bug is fixed, and it only happened when the other patches of this BZ were applied but not the final patch, which will NEVER happen in real life.

Even if it turned out that there was other hardware where the same regression occurred and that patch didn't fix it (which I seriously doubt), the presence of the regression doesn't cause anything other than an innocuous error message when libvirt starts - there is no functional consequence.

Comment 25 yafu 2021-01-24 03:35:50 UTC
(In reply to Laine Stump from comment #24)
> In my opinion it's not necessary to reproduce the behavior described in
> that patch, only to confirm that when libvirtd starts there is no error
> message similar to the one described there. I only saw it on very specific
> hardware (that is sitting on the shelf next to my desk), where I've
> verified the bug is fixed, and it only happened when the other patches of
> this BZ were applied but not the final patch, which will NEVER happen in
> real life.
> 
> Even if it turned out that there was other hardware where the same
> regression occurred and that patch didn't fix it (which I seriously
> doubt), the presence of the regression doesn't cause anything other than
> an innocuous error message when libvirt starts - there is no functional
> consequence.

Thanks for your detailed clarification.

I tried several hosts and reproduced the issue from comment 19 on one of them. The issue can no longer be reproduced with libvirt-daemon-6.6.0-13.el8.x86_64, so I am moving the bug to VERIFIED.

Comment 27 errata-xmlrpc 2021-02-22 15:39:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0639

Comment 28 Red Hat Bugzilla 2023-09-15 00:51:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.

