Bug 1758330

Summary: open PCI config file in read-only mode when possible
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Ihar Hrachyshka <ihrachys>
Component: libvirt
Assignee: Ján Tomko <jtomko>
Status: CLOSED ERRATA
QA Contact: jiyan <jiyan>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.1
CC: chhu, jdenemar, jgao, jiyan, jsuchane, knoel, lmen, xuzhang, yalzhang
Target Milestone: rc
Target Release: 8.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-5.6.0-10.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-04 18:28:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1758964

Description Ihar Hrachyshka 2019-10-03 20:46:57 UTC
Description of problem:

Currently, libvirt always opens PCI config files in write mode. This is a problem in environments where the daemon doesn't have write access to these files. It happens in CNV, where libvirt runs inside a Kubernetes pod whose containers have the host's /sys path mounted read-only.

In most cases, opening the file in write mode is not necessary; for example, reset requests for VFIO-registered devices are no-ops. Regardless, because libvirt always opens these files in write mode, KubeVirt has to mount the host's /sys/devices path into the virt-launcher (libvirt) pod to accommodate libvirt. This has security implications, and we would like to avoid granting write access to the /sys/devices subtree to these pods, which are under user control.
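A quick way to see, from inside a pod, whether a process could open a PCI config file for writing is a small check like the one below. It is a hypothetical helper, not part of libvirt or KubeVirt, and it assumes bash on Linux, where `[ -w FILE ]` is answered by an access(2)-style check that fails with EROFS for a file on a read-only filesystem (so it should report "ro" even for root when /sys is mounted read-only).

```shell
#!/bin/bash
# Hypothetical helper (not libvirt/KubeVirt code): print the strongest
# access the current process has to a file such as a PCI config file.
#   rw   - readable and writable
#   ro   - readable only (e.g. the file lives on a read-only mount)
#   none - not readable, or does not exist
config_access() {
    local path=$1
    if [ -r "$path" ] && [ -w "$path" ]; then
        echo rw
    elif [ -r "$path" ]; then
        echo ro
    else
        echo none
    fi
}
```

For example, `config_access /sys/bus/pci/devices/0000:82:10.1/config` would be expected to print "ro" in a container where /sys is mounted read-only, which is exactly the case where a read-write open fails.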

A series of patches implementing this enhancement request has already been merged into the upstream libvirt tree and released as part of libvirt 5.7.0:

Author: Ján Tomko <jtomko>
Date:   Tue Aug 13 14:58:25 2019 +0200

    util: introduce virPCIDeviceConfigOpenInternal

    A thin wrapper to allow creating new functions.

    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>

Author: Ján Tomko <jtomko>
Date:   Tue Aug 13 15:07:53 2019 +0200

    util: Introduce virPCIDeviceConfigOpenWrite

    Only a handful of functions need write access to the PCI config
    space. Create a wrapper function for those so that we can
    open it read only by default.

    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>

Author: Ján Tomko <jtomko>
Date:   Tue Aug 13 15:11:14 2019 +0200

    util: introduce readonly attribute to virPCIDeviceConfigOpenInternal

    Allow wrappers to open PCI config as read-only.

    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>

Author: Ján Tomko <jtomko>
Date:   Tue Aug 13 15:14:05 2019 +0200

    util: introduce virPCIDeviceConfigOpenTry

    For callers that only need read-only access and don't want
    an error reported.

    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>

commit e95f9459d3ae875d36df1699d919f0651b840109
Author: Ján Tomko <jtomko>
Date:   Tue Aug 13 15:17:44 2019 +0200

    util: default to read-only in virPCIDeviceConfigOpen

    All the callers left require virPCIDeviceConfigOpen to be fatal
    and only use read-only access to the config file.

    Signed-off-by: Ján Tomko <jtomko>
    Reviewed-by: Michal Privoznik <mprivozn>
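Reading fields such as the vendor and device IDs only requires read access to the config file. As a rough illustration (a hypothetical shell helper, not libvirt's C implementation), the function below reads one 16-bit register from a PCI config space file. `od` opens the file read-only, and the two bytes are combined low byte first because PCI config space is little-endian.

```shell
#!/bin/bash
# Hypothetical helper: print the 16-bit little-endian register at byte
# offset $2 of the PCI config space file $1 (offset 0 = vendor ID,
# offset 2 = device ID). The file is only ever opened for reading.
pci_config_word() {
    local bytes
    # od prints the two bytes as hex, e.g. " 86 80" for vendor 0x8086.
    bytes=$(od -An -tx1 -j "$2" -N2 "$1") || return 1
    set -- $bytes
    [ $# -eq 2 ] || return 1
    # The byte at the lower offset is the low byte of the register.
    printf '0x%s%s\n' "$2" "$1"
}
```

For the 82599ES PF in this report, `pci_config_word /sys/bus/pci/devices/0000:82:00.1/config 0` would be expected to print 0x8086, and offset 2 would give 0x10fb.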

I've tested libvirt 5.7.0, which includes these patches, with KubeVirt VMIs that have SR-IOV VFs attached through VFIO, and it seems to work.

Obviously, we can't simply bump the libvirt version in RHEL. This bug therefore asks whether these patches can be backported to the RHEL libvirt version, so that CNV can remove the /sys/devices mount from the virt-launcher pod containers.

Comment 5 jiyan 2019-12-27 09:51:20 UTC
Reproduced this issue on libvirt-5.6.0-9.module+el8.1.1+4955+f0b25565.x86_64.

Version:
libvirt-5.6.0-9.module+el8.1.1+4955+f0b25565.x86_64
qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64
kernel-4.18.0-147.3.1.el8_1.x86_64

Steps:
1. Configure the number of VFs and the VF driver override while /sys is mounted read-write
# mount |grep sysfs
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)

# lspci |grep 82599
82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

# echo 3 > /sys/devices/pci0000\:80/0000\:80\:02.0/0000\:82\:00.1/sriov_numvfs 

# echo "vfio-pci" > /sys/devices/pci0000\:80/0000\:80\:02.0/0000\:82\:00.1/virtfn0/driver_override

2. Check the VF info with "lspci" and "virsh nodedev-dumpxml"
# lspci |grep 82599
82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
82:10.5 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

# virsh nodedev-dumpxml pci_0000_82_00_1 
<device>
  <name>pci_0000_82_00_1</name>
  <path>/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1</path>
  <parent>pci_0000_80_02_0</parent>
...
    <capability type='virt_functions' maxCount='63'>
      <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x82' slot='0x10' function='0x3'/>
      <address domain='0x0000' bus='0x82' slot='0x10' function='0x5'/>
    </capability>
...
</device>
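The VF addresses listed under virt_functions can also be read directly from sysfs, where the PF directory has virtfnN symlinks pointing at its VFs (step 1 already uses virtfn0). The hypothetical helper below is an illustration, not how libvirt gathers this data; it only reads sysfs, so it also works with /sys mounted read-only.

```shell
#!/bin/bash
# Hypothetical helper: print the PCI address of each VF of the PF whose
# sysfs directory is $1, following virtfn0, virtfn1, ... in order.
# Looping by index avoids the lexical-order problem (virtfn10 < virtfn2).
list_vfs() {
    local pf_dir=$1 link i=0
    while [ -L "$pf_dir/virtfn$i" ]; do
        link=$(readlink "$pf_dir/virtfn$i")
        # The link target ends in the VF address, e.g. ../0000:82:10.1
        echo "${link##*/}"
        i=$((i + 1))
    done
}
```

On the host above, `list_vfs /sys/bus/pci/devices/0000:82:00.1` would be expected to list the same three addresses as the nodedev XML.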

3. Detach the VF that will be used for the attach operation
# virsh nodedev-detach pci_0000_82_10_1 
Device pci_0000_82_10_1 detached
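virsh names PCI node devices after their address, with ':' and '.' replaced by '_' (0000:82:10.1 becomes pci_0000_82_10_1, as in the command above). A hypothetical one-line helper for scripting that conversion:

```shell
#!/bin/bash
# Hypothetical helper: map a PCI address (DDDD:BB:SS.F) to the libvirt
# node device name, e.g. 0000:82:10.1 -> pci_0000_82_10_1.
pci_to_nodedev() {
    printf 'pci_%s\n' "$1" | tr ':.' '__'
}
```

Usage: `virsh nodedev-detach "$(pci_to_nodedev 0000:82:10.1)"`.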

4. Remount /sys read-only and restart libvirtd
# mount /sys -o remount,ro

# mount | grep sysfs
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime,seclabel)

# systemctl restart libvirtd
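When scripting this check, the /sys mount options can be read from /proc/mounts instead of grepping `mount` output. The following is a hypothetical helper; it assumes the standard six-field /proc/mounts layout and reports the mode of the last (topmost) entry for the given mount point.

```shell
#!/bin/bash
# Hypothetical helper: print "ro" or "rw" for mount point $1, using the
# options field of a mounts table ($2, default /proc/mounts).
# Fields: device, mount point, type, options, dump, pass.
# Returns 1 if the mount point is not listed.
mount_mode() {
    local mnt=$1 table=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m {
        mode = "rw"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
        last = mode
    }
    END { if (last != "") print last; else exit 1 }' "$table"
}
```

After the remount above, `mount_mode /sys` would be expected to print "ro".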

5. Check the PF XML with "virsh nodedev-dumpxml" ==> Failed
# virsh nodedev-dumpxml pci_0000_82_00_1 
error: Could not find matching device 'pci_0000_82_00_1'
error: Node device not found: no node device with matching name 'pci_0000_82_00_1'

6. Prepare a running VM and a VF interface XML, then try to attach the VF to the VM ==> Failed
# virsh domstate test811 
running

# cat vf.xml 
<interface type='hostdev'>
  <source>
    <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
  </source>
  <target dev='test'/>
  <mac address='52:54:00:98:c4:a8'/>
  <model type='virtio'/>
</interface>

# virsh attach-device test811 vf.xml 
error: Failed to attach device from vf.xml
error: Failed to open config space file '/sys/bus/pci/devices/0000:82:10.1/config': Read-only file system
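The vf.xml used in step 6 can be generated from a VF's PCI address. The helper below is a hypothetical convenience for test scripts, not a libvirt tool; the XML it prints mirrors the vf.xml shown above, with the target device name and MAC address passed as arguments.

```shell
#!/bin/bash
# Hypothetical helper: print an <interface type='hostdev'> definition for
# the VF at PCI address $1 (DDDD:BB:SS.F), target dev $2 and MAC $3.
vf_interface_xml() {
    local addr=$1 dev=$2 mac=$3 domain bus slot fn
    # Split e.g. 0000:82:10.1 into domain=0000 bus=82 slot=10 fn=1.
    IFS=':.' read -r domain bus slot fn <<< "$addr"
    cat <<EOF
<interface type='hostdev'>
  <source>
    <address type='pci' domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$fn'/>
  </source>
  <target dev='$dev'/>
  <mac address='$mac'/>
  <model type='virtio'/>
</interface>
EOF
}
```

Usage: `vf_interface_xml 0000:82:10.1 test 52:54:00:98:c4:a8 > vf.xml` reproduces the file used with `virsh attach-device test811 vf.xml`.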



Verified this bug on libvirt-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
7. Update libvirt to libvirt-5.6.0-10 and restart libvirtd.
# yum update libvirt* -y

# rpm -qa libvirt
libvirt-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64

# systemctl restart libvirtd

# mount |grep sysfs
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime,seclabel)

8. Check the PF and its VFs with "virsh nodedev-dumpxml"
# virsh nodedev-dumpxml pci_0000_82_00_1 ==> Succeeded, without the error seen in step 5
<device>
  <name>pci_0000_82_00_1</name>
  <path>/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1</path>
  <parent>pci_0000_80_02_0</parent>
...
    <product id='0x10fb'>82599ES 10-Gigabit SFI/SFP+ Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions' maxCount='63'>
      <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x82' slot='0x10' function='0x3'/>
      <address domain='0x0000' bus='0x82' slot='0x10' function='0x5'/>
    </capability>
...
</device>

9. Attach the VF to the VM again and check the related info ==> Succeeded, without the error seen in step 6
# virsh attach-device test811 vf.xml 
Device attached successfully

# virsh dumpxml test811 |grep "<interface" -A10
...
    <interface type='hostdev'>
      <mac address='52:54:00:98:c4:a8'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
      </source>
      <target dev='test'/>
      <model type='virtio'/>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>

# virsh console test811 
Connected to domain test811
Escape character is ^]

Red Hat Enterprise Linux 8.1 (Ootpa)
Kernel 4.18.0-147.el8.x86_64 on an x86_64

localhost login: root
Password: 
Last login: Fri Dec 27 14:28:53 on ttyS0
[root@localhost ~]# lspci |grep Eth
...
07:00.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

Comment 6 jiyan 2020-01-02 08:07:49 UTC
Triggered the related automated jobs; no problems were found.
Marking this bug as verified.

Comment 8 errata-xmlrpc 2020-02-04 18:28:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0404