Bug 1408810 - PCIe: Add an option to PCIe ports to disable IO port space support
Summary: PCIe: Add an option to PCIe ports to disable IO port space support
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: 8.0
Assignee: Andrea Bolognani
QA Contact: Meina Li
URL:
Whiteboard:
Duplicates: 1504111
Depends On: 1344299
Blocks: 1410577 1410578
 
Reported: 2016-12-27 12:11 UTC by Marcel Apfelbaum
Modified: 2021-07-18 07:30 UTC (History)
CC List: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1344299
Clones: 1410577 1410578
Environment:
Last Closed: 2021-07-18 07:30:08 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments
lmn.xml for testing 15 e1000 devices (17.54 KB, text/plain)
2019-01-10 08:45 UTC, Meina Li

Description Marcel Apfelbaum 2016-12-27 12:11:03 UTC
+++ This bug was initially created as a clone of Bug #1344299 +++

Even if the firmware skips assigning IO ranges to PCIe ports (root
ports/downstream ports), Linux guests will still try to assign them IO.

We can add a parameter "disable-io" to PCIe ports to disable IO support.
It will work by making IO base/limit registers read-only so both firmware
and guest OSes will comply.
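Later comments exercise this through QEMU's generic resource-reservation property on pcie-root-port rather than a literal "disable-io" flag; as a sketch (the id/bus/addr values here are illustrative, not taken from any real configuration):

```
-device pcie-root-port,id=pci.1,bus=pcie.0,addr=0x1,io-reserve=0
```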

--- Additional comment from RHEL Product and Program Management on 2016-06-09 08:04:38 EDT ---

Since this bug report was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Marcel Apfelbaum on 2016-06-23 06:07:07 EDT ---

We intend to provide a 'thin' version of Q35 for 7.3 to be used mainly with
virtio devices, which are PCIe, so the IO limitation will not be an issue.


Add libvirt support for the device command line parameters.

Comment 5 Jaroslav Suchanek 2017-11-15 14:41:50 UTC
*** Bug 1504111 has been marked as a duplicate of this bug. ***

Comment 7 Meina Li 2019-01-10 08:42:53 UTC
Based on my understanding of this bug, I reproduced the following scenarios:

SC1: Boot a guest with 15 e1000 devices, each plugged into its own pcie-to-pci-bridge controller; the PCIe topology: 15 e1000 devices --> 15 pcie-to-pci-bridge --> 15 pcie-root-port
SC2: The same test with 10 e1000 devices.        ---The guest starts successfully in this scenario.

Reproduced version:
qemu-kvm-rhev-2.12.0-19.el7_6.2.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
kernel-3.10.0-986.el7.x86_64

Reproduced steps:

1. Prepare a guest XML with 15 e1000 devices, each plugged into its own pcie-to-pci-bridge controller.
(see the attached lmn.xml)

2. Define and start the guest.
# virsh define lmn.xml 
Domain lmn defined from lmn.xml
# virsh start lmn
Domain lmn started
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 16    lmn                            running

3. Check the boot process by virt-manager.
The screen doesn't display properly; the guest may panic and halt.
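The shape of one of the 15 chains from step 1 can be sketched like this (a hand-written fragment with illustrative index/bus values; the real file is the attached lmn.xml):

```xml
<controller type='pci' index='1' model='pcie-root-port'/>
<controller type='pci' index='16' model='pcie-to-pci-bridge'>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</controller>
<interface type='network'>
  <source network='default'/>
  <model type='e1000'/>
  <address type='pci' domain='0x0000' bus='0x10' slot='0x01' function='0x0'/>
</interface>
```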

Comment 8 Meina Li 2019-01-10 08:44:23 UTC
(In reply to Meina Li from comment #7)
> According to the understanding for this bug, I reproduced the following
> scenario:
> 
> SC1: Boot a guest with 15 e1000 devices which plugged into 15
> pcie-to-pci-bridge controllers, the pcie topology: 15 e1000 device --> 15
> pcie-to-pci-bridge --> 15 pcie-root-port
> SC2: Using 10 e1000 devices to test        ---The guest start successfully
> in this scenario.

Correct SC2: Using 9 e1000 devices to test


Comment 9 Meina Li 2019-01-10 08:45:45 UTC
Created attachment 1519673 [details]
lmn.xml for testing 15 e1000 devices

Comment 10 Andrea Bolognani 2019-06-07 11:45:06 UTC
Some relevant discussion happening on qemu-devel.

  https://lists.nongnu.org/archive/html/qemu-devel/2019-06/msg01093.html

Comment 14 Andrea Bolognani 2021-01-14 22:24:39 UTC
I spent some time trying to figure out whether this is something that
makes sense to expose in libvirt after all.

As I understand it, the situation that led to the QEMU option being
implemented is as follows:

  * PCI I/O port space is a fairly limited resource on q35 (in the
    order of 64 KiB if I'm not mistaken);

  * some devices, often those that are assigned from the host,
    require I/O space to work;

  * other devices, notably virtio 1.0 ones, don't;

  * since QEMU / the firmware / the kernel can't know in advance what
    kind of device will end up being hotplugged into a PCI slot, the
    only reasonable course of action is to set aside some I/O space
    "just in case";

  * however, with 4 KiB reserved for each pcie-root-port and only a
    single PCI function available on it, I/O space runs out
    relatively quickly, leading to various issues.

The plan was apparently to make *not* reserving I/O space the
default, but somehow that hasn't happened so far and each
pcie-root-port still reserves 4 KiB of I/O space unless told
otherwise.
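The back-of-the-envelope arithmetic, sketched in Python (treating the low 4 KiB as a legacy/chipset carve-out is my approximation):

```python
IO_SPACE_TOTAL = 0x10000   # x86 I/O port space: 64 KiB worth of ports
BRIDGE_WINDOW = 0x1000     # PCI bridge I/O windows are 4 KiB in size and alignment
LEGACY_RESERVED = 0x1000   # low ports kept for chipset/legacy devices (approximation)

# Upper bound on how many pcie-root-ports can each get a 4 KiB I/O window
max_ports_with_io = (IO_SPACE_TOTAL - LEGACY_RESERVED) // BRIDGE_WINDOW
print(max_ports_with_io)  # 15
```

This matches the dmesg output below, where the kernel places windows between 0x1000 and 0xffff and runs out before all 19 ports are served.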

To see this in action, create a q35 VM with 15 of these

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
  </interface>

in addition to the usual virtio devices: libvirt will automatically
allocate a pcie-root-port for each of them, and the result will look
like

  $ lspci -vt
  -[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
             +-01.0-[01]----00.0  Red Hat, Inc. Virtio network device
             +-01.1-[02]----00.0  Red Hat, Inc. Virtio console
             +-01.2-[03]----00.0  Red Hat, Inc. Virtio block device
             +-01.3-[04]----00.0  Red Hat, Inc. Virtio memory balloon
             +-01.4-[05]----00.0  Red Hat, Inc. Virtio RNG
             +-01.5-[06]----00.0  Red Hat, Inc. Virtio network device
             +-01.6-[07]----00.0  Red Hat, Inc. Virtio network device
             +-01.7-[08]----00.0  Red Hat, Inc. Virtio network device
             +-02.0-[09]----00.0  Red Hat, Inc. Virtio network device
             +-02.1-[0a]----00.0  Red Hat, Inc. Virtio network device
             +-02.2-[0b]----00.0  Red Hat, Inc. Virtio network device
             +-02.3-[0c]----00.0  Red Hat, Inc. Virtio network device
             +-02.4-[0d]----00.0  Red Hat, Inc. Virtio network device
             +-02.5-[0e]----00.0  Red Hat, Inc. Virtio network device
             +-02.6-[0f]----00.0  Red Hat, Inc. Virtio network device
             +-02.7-[10]----00.0  Red Hat, Inc. Virtio network device
             +-03.0-[11]----00.0  Red Hat, Inc. Virtio network device
             +-03.1-[12]----00.0  Red Hat, Inc. Virtio network device
             +-03.2-[13]----00.0  Red Hat, Inc. Virtio network device
             +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
             +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
             \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

If we look at the kernel messages, we can see that I/O space
allocation wasn't successful for all devices:

  $ sudo dmesg | grep BAR
  [    0.681478] pci 0000:00:01.0: BAR 13: assigned [io  0x1000-0x1fff]
  [    0.681479] pci 0000:00:01.1: BAR 13: assigned [io  0x2000-0x2fff]
  [    0.681480] pci 0000:00:01.2: BAR 13: assigned [io  0x3000-0x3fff]
  [    0.681481] pci 0000:00:01.3: BAR 13: assigned [io  0x4000-0x4fff]
  [    0.681482] pci 0000:00:01.4: BAR 13: assigned [io  0x5000-0x5fff]
  [    0.681482] pci 0000:00:01.5: BAR 13: assigned [io  0x6000-0x6fff]
  [    0.681483] pci 0000:00:01.6: BAR 13: assigned [io  0x7000-0x7fff]
  [    0.681484] pci 0000:00:01.7: BAR 13: assigned [io  0x8000-0x8fff]
  [    0.681485] pci 0000:00:02.0: BAR 13: assigned [io  0x9000-0x9fff]
  [    0.681486] pci 0000:00:02.1: BAR 13: assigned [io  0xa000-0xafff]
  [    0.681486] pci 0000:00:02.2: BAR 13: assigned [io  0xb000-0xbfff]
  [    0.681487] pci 0000:00:02.3: BAR 13: assigned [io  0xd000-0xdfff]
  [    0.681488] pci 0000:00:02.4: BAR 13: assigned [io  0xe000-0xefff]
  [    0.681489] pci 0000:00:02.5: BAR 13: assigned [io  0xf000-0xffff]
  [    0.681490] pci 0000:00:02.6: BAR 13: no space for [io  size 0x1000]
  [    0.681491] pci 0000:00:02.6: BAR 13: failed to assign [io  size 0x1000]
  [    0.681492] pci 0000:00:02.7: BAR 13: no space for [io  size 0x1000]
  [    0.681492] pci 0000:00:02.7: BAR 13: failed to assign [io  size 0x1000]
  [    0.681493] pci 0000:00:03.0: BAR 13: no space for [io  size 0x1000]
  [    0.681494] pci 0000:00:03.0: BAR 13: failed to assign [io  size 0x1000]
  [    0.681495] pci 0000:00:03.1: BAR 13: no space for [io  size 0x1000]
  [    0.681495] pci 0000:00:03.1: BAR 13: failed to assign [io  size 0x1000]
  [    0.681496] pci 0000:00:03.2: BAR 13: no space for [io  size 0x1000]
  [    0.681497] pci 0000:00:03.2: BAR 13: failed to assign [io  size 0x1000]
  [    0.681498] pci 0000:00:03.2: BAR 13: assigned [io  0x1000-0x1fff]
  [    0.681499] pci 0000:00:03.1: BAR 13: assigned [io  0x2000-0x2fff]
  [    0.681500] pci 0000:00:03.0: BAR 13: assigned [io  0x3000-0x3fff]
  [    0.681501] pci 0000:00:02.7: BAR 13: assigned [io  0x4000-0x4fff]
  [    0.681502] pci 0000:00:02.6: BAR 13: assigned [io  0x5000-0x5fff]
  [    0.681502] pci 0000:00:02.5: BAR 13: assigned [io  0x6000-0x6fff]
  [    0.681503] pci 0000:00:02.4: BAR 13: assigned [io  0x7000-0x7fff]
  [    0.681504] pci 0000:00:02.3: BAR 13: assigned [io  0x8000-0x8fff]
  [    0.681505] pci 0000:00:02.2: BAR 13: assigned [io  0x9000-0x9fff]
  [    0.681506] pci 0000:00:02.1: BAR 13: assigned [io  0xa000-0xafff]
  [    0.681507] pci 0000:00:02.0: BAR 13: assigned [io  0xb000-0xbfff]
  [    0.681507] pci 0000:00:01.7: BAR 13: assigned [io  0xd000-0xdfff]
  [    0.681508] pci 0000:00:01.6: BAR 13: assigned [io  0xe000-0xefff]
  [    0.681509] pci 0000:00:01.5: BAR 13: assigned [io  0xf000-0xffff]
  [    0.681510] pci 0000:00:01.4: BAR 13: no space for [io  size 0x1000]
  [    0.681511] pci 0000:00:01.4: BAR 13: failed to assign [io  size 0x1000]
  [    0.681511] pci 0000:00:01.3: BAR 13: no space for [io  size 0x1000]
  [    0.681512] pci 0000:00:01.3: BAR 13: failed to assign [io  size 0x1000]
  [    0.681513] pci 0000:00:01.2: BAR 13: no space for [io  size 0x1000]
  [    0.681513] pci 0000:00:01.2: BAR 13: failed to assign [io  size 0x1000]
  [    0.681514] pci 0000:00:01.1: BAR 13: no space for [io  size 0x1000]
  [    0.681515] pci 0000:00:01.1: BAR 13: failed to assign [io  size 0x1000]
  [    0.681516] pci 0000:00:01.0: BAR 13: no space for [io  size 0x1000]
  [    0.681516] pci 0000:00:01.0: BAR 13: failed to assign [io  size 0x1000]

If we check the amount of I/O space reserved by each controller and
used by each device, we can see that a few of the pcie-root-ports
don't have any:

  $ sudo lspci -vv | grep -E '^[0-9]|I/O '
  00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
  00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  00:01.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  00:01.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  00:01.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  00:01.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  00:01.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 0000f000-0000ffff [size=4K]
  00:01.6 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 0000e000-0000efff [size=4K]
  00:01.7 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 0000d000-0000dfff [size=4K]
  00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 0000b000-0000bfff [size=4K]
  00:02.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 0000a000-0000afff [size=4K]
  00:02.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00009000-00009fff [size=4K]
  00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00008000-00008fff [size=4K]
  00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00007000-00007fff [size=4K]
  00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00006000-00006fff [size=4K]
  00:02.6 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00005000-00005fff [size=4K]
  00:02.7 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00004000-00004fff [size=4K]
  00:03.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00003000-00003fff [size=4K]
  00:03.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00002000-00002fff [size=4K]
  00:03.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: 00001000-00001fff [size=4K]
  00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
  00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02) (prog-if 01 [AHCI 1.0])
  	Region 4: I/O ports at c040 [size=32]
  00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
  	Region 4: I/O ports at 0700 [size=64]
  01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  02:00.0 Communication controller: Red Hat, Inc. Virtio console (rev 01)
  03:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
  04:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon (rev 01)
  05:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG (rev 01)
  06:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  07:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  08:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  09:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  0a:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  0b:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  0c:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  0d:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  0e:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  0f:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  10:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  11:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  12:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
  13:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)

We can also see that none of the virtio-net devices are using any I/O
space: this explains why, despite the kernel messages, the system is
okay and the network interfaces all work fine. In particular:

  $ ip addr show dev enp1s0
  2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
      link/ether 52:54:00:7a:15:9b brd ff:ff:ff:ff:ff:ff
      inet 192.168.122.65/24 brd 192.168.122.255 scope global dynamic enp1s0
         valid_lft 3442sec preferred_lft 3442sec
      inet6 fe80::5054:ff:fe7a:159b/64 scope link
         valid_lft forever preferred_lft forever
  $ ls -l /sys/class/net/enp1s0
  lrwxrwxrwx 1 root root 0 Jan 14 19:43 /sys/class/net/enp1s0 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/virtio0/net/enp1s0

So enp1s0, aka 0000:01:00.0, is plugged into 0000:00:01.0, which
according to the output above has no I/O space reserved to it, and
yet that's the very interface that was used when I ssh'd into the VM.

So far so good! If you try, however, to replace all virtio-net
devices with e1000e devices, which apparently need a tiny amount of
I/O space to operate, then the guest will no longer boot even to the
point where the kernel is loaded. I guess SeaBIOS can't figure out
how to assign I/O space to all devices that want some, and so it
simply gives up?

Note that adding io-reserve=0 to all pcie-root-port does exactly
*nothing* to address this situation: whether or not the flag is
present, boot will always get stuck before the kernel is even loaded.

Another interesting / confusing thing that I noticed: if I disable
I/O space for all pcie-root-ports and then attach an e1000e network
device to one, I end up with this situation:

  -[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
             +-01.0-[01]----00.0  Intel Corporation 82574L Gigabit Network Connection

  00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
  00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
  	Region 2: I/O ports at 1000 [size=32]

However, I also have

  16: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
      link/ether 52:54:00:7a:15:9b brd ff:ff:ff:ff:ff:ff
      inet 192.168.122.65/24 brd 192.168.122.255 scope global dynamic enp1s0
         valid_lft 3474sec preferred_lft 3474sec
      inet6 fe80::5054:ff:fe7a:159b/64 scope link 
         valid_lft forever preferred_lft forever

and guest <-> host communication works perfectly.

I remember learning from someone, at some point, that PCI Express
devices are required to work without I/O space, so maybe the e1000e
would use it if available, but can cope with it being absent? That
would explain it... However, I also tried using e1000, which is a
conventional PCI device, instead of e1000e, and it still worked.

Comment 15 RHEL Program Management 2021-01-15 07:29:32 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 16 Laszlo Ersek 2021-01-15 08:34:28 UTC
(In reply to Andrea Bolognani from comment #14)

> So far so good! If you try, however, to replace all virtio-net
> devices with e1000e devices, which apparently need a tiny amount of
> I/O space to operate, then the guest will no longer boot even to the
> point where the kernel is loaded. I guess SeaBIOS can't figure out
> how to assign I/O space to all devices that want some, and so it
> simply gives up?

Basically: yes. If you capture the seabios log, you'll see more details, I believe.

> Note that adding io-reserve=0 to all pcie-root-port does exactly
> *nothing* to address this situation: whether or not the flag is
> present, boot will always get stuck before the kernel is even loaded.

It seems plausible that SeaBIOS's resource allocation for a PCIe root port is primarily driven by the needs of the device behind that port. After all, the property "io-reserve" says "reservation" in the name, and if there's a cold-plugged device in the port, it makes sense that you can't "un-reserve" the IO space that the device is actively asking for. IOW, I think io-reserve=0 might only make a difference if the port is empty, at boot. The property decides about reserving vs. not reserving for hotplug purposes; it likely cannot override actual resource needs coming from the downstream side of the port.

(I'm saying "likely" because I'm not familiar with the SeaBIOS internals.)

I think it does make sense to expose this property in the domain XML. You could want to have 20 *empty* PCIe root ports at boot, with a plan to hot-plug up to 20 PCI Express devices, with none of those devices needing any IO space.
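While the property has no first-class XML representation, one conceivable way to reach it today is libvirt's QEMU command-line passthrough namespace. This is a sketch, not a tested recipe: the user alias `ua-noio` is made up, and it assumes QEMU's `-set` option can reach the controller by its device id.

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <devices>
    <!-- an empty port kept around for hotplug; 'ua-noio' is a hypothetical user alias -->
    <controller type='pci' model='pcie-root-port'>
      <alias name='ua-noio'/>
    </controller>
  </devices>
  <qemu:commandline>
    <!-- -set device.<id>.<property>=<value> tweaks a property on a defined device -->
    <qemu:arg value='-set'/>
    <qemu:arg value='device.ua-noio.io-reserve=0'/>
  </qemu:commandline>
</domain>
```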

Comment 18 Andrea Bolognani 2021-01-18 19:08:43 UTC
(In reply to Laszlo Ersek from comment #16)
> (In reply to Andrea Bolognani from comment #14)
> > So far so good! If you try, however, to replace all virtio-net
> > devices with e1000e devices, which apparently need a tiny amount of
> > I/O space to operate, then the guest will no longer boot even to the
> > point where the kernel is loaded. I guess SeaBIOS can't figure out
> > how to assign I/O space to all devices that want some, and so it
> > simply gives up?
> 
> Basically: yes. If you capture the seabios log, you'll see more details, I
> believe.

You were right: when SeaBIOS gets stuck, it prints

  PCI: out of I/O address space

to the log.

> > Note that adding io-reserve=0 to all pcie-root-port does exactly
> > *nothing* to address this situation: whether or not the flag is
> > present, boot will always get stuck before the kernel is even loaded.
> 
> It seems plausible that SeaBIOS's resource allocation for a PCIe root port
> is primarily driven by the needs of the device behind that port. After all,
> the property "io-reserve" says "reservation" in the name, and if there's a
> cold-plugged device in the port, it makes sense that you can't "un-reserve"
> the IO space that the device is actively asking for. IOW, I think
> io-reserve=0 might only make a difference if the port is empty, at boot. The
> property decides about reserving vs. not reserving for hotplug purposes; it
> likely cannot override actual resource needs coming from the downstream side
> of the port.
> 
> (I'm saying "likely" because I'm not familiar with the SeaBIOS internals.)

This makes sense, and testing at least partially confirms this: if I
boot with an e1000e plugged into a pcie-root-port,io-reserve=0, I get

  00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
  	Region 2: I/O ports at 1000 [size=32]

but if I leave the pcie-root-port empty and hotplug the network adapter,
the result is

  00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
  	Region 2: I/O ports at 1000 [disabled] [size=32]

The network connection works in both cases.

I still can't understand how the e1000 can work in this scenario:
regardless of whether coldplug or hotplug are used, it still shows up
as

  00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
  	I/O behind bridge: [disabled]
  01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
  	Region 1: I/O ports at 1000 [size=64]

and works fine...

> I think it does make sense to expose this property in the domain XML. You
> could want to have 20 *empty* PCIe root ports at boot, with a plan to
> hot-plug up to 20 PCI Express devices, with none of those devices needing
> any IO space.

Yeah, I can see that being a reasonable use case, at least in theory.
In practice, it looks like whether or not I/O space is reserved,
required or used is not such a clear-cut question... But it probably
still makes sense to expose this knob at the libvirt level.

Reopening the bug. Thank you for all your useful input, Laszlo! :)

Comment 22 RHEL Program Management 2021-07-18 07:30:08 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

