Bug 1365623 - [PCI] The default MMIO range reserved by firmware for PCI bridges is not enough to hotplug virtio-1 devices
Summary: [PCI] The default MMIO range reserved by firmware for PCI bridges is not enough to hotplug virtio-1 devices
Keywords:
Status: CLOSED DUPLICATE of bug 1365613
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: seabios
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Marcel Apfelbaum
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1365613
Blocks: 1365619
 
Reported: 2016-08-09 17:31 UTC by Marcel Apfelbaum
Modified: 2016-10-31 16:52 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1365613
Environment:
Last Closed: 2016-09-08 09:35:56 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Bugzilla 1365613 (Priority: high, Status: CLOSED): [PCI] The default MMIO range reserved by firmware for PCI bridges is not enough to hotplug virtio-1 devices. Last updated 2021-02-22 00:41:40 UTC.
Red Hat Bugzilla 1390316 (Priority: medium, Status: CLOSED): PCIe: Add Generic PCIe Root Ports. Last updated 2021-02-22 00:41:40 UTC.

Internal Links: 1365613 1390316

Description Marcel Apfelbaum 2016-08-09 17:31:05 UTC
+++ This bug was initially created as a clone of Bug #1365613 +++

The virtio 1.0 devices need 8MB of MMIO space, while SeaBIOS (OVMF?) reserves only 2MB.

The reason the firmware reserves 'only' 2MB MMIO is because it is the minimum allowed by the PCI Spec. It will be unreasonable to default in firmware to a larger MMIO window because it is related only to virtio. What about assigned devices that may need more?

I propose the following solution. Add an mmio-window-size parameter to pci bridges and pass all the information to firmware using a para-virt channel like fw_cfg.

The firmware can use this info to increase the mmio range for our devices.

We can default the mmio-window-size to 8MB for PCIe ports (which are seen by the firmware as PCI bridges). This will allow hot-plugging virtio-1 devices into PCIe ports with no problem.

Regarding the legacy pci-bridges, the default size is not so clear. I propose to
leave it as is, with the following reasoning: on pc 'legacy' machines it is likely we don't need to hotplug virtio-1 'modern' devices; we can just hotplug virtio 0.9 devices instead.
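
To make the proposal concrete, here is a minimal sketch of what the knob could look like as a QEMU device property. The property name is the one proposed above; the state type, field, and default are illustrative, and this is not a merged patch:

    /* Hypothetical property on a PCIe port / PCI bridge; the firmware
     * would read the value over some channel and size the MMIO window
     * accordingly.  Usage might then look like:
     *   -device ioh3420,id=rp0,mmio-window-size=8M
     */
    static Property pcie_port_extra_props[] = {
        DEFINE_PROP_SIZE("mmio-window-size", PCIEPort,
                         mmio_window_size, 8 * 1024 * 1024),
        DEFINE_PROP_END_OF_LIST(),
    };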

Side-note:
Some guests, Linux for example, will try to re-balance the PCI root bus MMIO space and assign more memory space to pci-bridges/pcie-ports. Sometimes the guest OS will succeed and sometimes it will not.

Comment 1 Marcel Apfelbaum 2016-08-09 17:32:10 UTC
If the mechanism is accepted upstream, SeaBIOS should use the corresponding fw_cfg entry to set the size of the pci bridge ranges.

Comment 2 Gerd Hoffmann 2016-08-10 08:57:41 UTC
  Hi,

> The reason the firmware reserves 'only' 2MB MMIO is because it is the
> minimum allowed by the PCI Spec.

Well, mmio bars are typically small (mmio as in "register space", not device memory exported as pci bar).  2M is plenty usually.  Picked it because large pages (on x86) are 2M too.

> It will be unreasonable to default in
> firmware to a larger MMIO window because it is related only to virtio.

Why?  I don't think it is a big deal to make them larger by default.

Or simply don't assign anything to bridges without devices and expect the guest os to setup that on hotplug.  Would avoid address space fragmentation.

> What about assigned devices that may need more?

Let the guest os handle it?

> I propose the following solution. Add an mmio-window-size parameter to pci
> bridges and pass all the information to firmware using a para-virt channel
> like fw_cfg.

No fw_cfg please.

We want generic pcie ports anyway.  If we *really* have to make this configurable we could add a vendor-specific pci capability to these ports,
with recommended window sizes (io/mem/prefmem) for the firmware.
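
For illustration, such a capability might be laid out roughly as follows. The struct name, field names, and layout are assumptions for this sketch, not the capability that eventually landed upstream:

    /* Vendor-specific PCI capability (cap ID 0x09) carrying recommended
     * window sizes for the firmware; all fields are illustrative. */
    typedef struct QEMU_PACKED BridgeWindowHintCap {
        uint8_t  cap_id;    /* PCI_CAP_ID_VNDR */
        uint8_t  cap_next;  /* offset of the next capability */
        uint8_t  cap_len;   /* total length of this capability */
        uint8_t  type;      /* vendor-defined "window size hint" */
        uint32_t io;        /* recommended IO window, in bytes */
        uint32_t mem;       /* recommended non-prefetchable MMIO window */
        uint64_t prefmem;   /* recommended prefetchable MMIO window */
    } BridgeWindowHintCap;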

Comment 3 Marcel Apfelbaum 2016-08-14 15:42:43 UTC
(In reply to Gerd Hoffmann from comment #2)
>   Hi,
> 

Hi Gerd,

> > The reason the firmware reserves 'only' 2MB MMIO is because it is the
> > minimum allowed by the PCI Spec.
> 
> Well, mmio bars are typically small (mmio as in "register space", not device
> memory exported as pci bar).  2M is plenty usually.  Picked it because large
> pages (on x86) are 2M too.
> 

Looked OK to me (until virtio 1.0).

> > It will be unreasonable to default in
> > firmware to a larger MMIO window because it is related only to virtio.
> 
> Why?  I don't think it is a big deal to make them larger by default.
> 

Because we shouldn't change the firmware just because of a virtio-specific issue.
We'll say 8MB because of virtio; somebody else will say 10MB because of their
hardware. Where will it end?

Also, 8MB would be OK for a PCIe port, but what about a pci-bridge?
Should we reserve (32 - devices existing at boot time) * 8MB?

> Or simply don't assign anything to bridges without devices and expect the
> guest os to setup that on hotplug.  Would avoid address space fragmentation.
> 

This would be a problem for legacy PC (i440fx) machines. I encountered a few scenarios where the Linux guest OS would not trigger pci resource rebalancing on the i440fx machine, while on Q35 it would.

> > What about assigned devices that may need more?
> 
> Let the guest os handle it?
> 

Q35 will likely succeed in making room if you want to plug a device into
a PCIe root port or the latest switch downstream port. What about the other
switch ports? What about more complicated configurations like nested switches? I see this as a limitation.

> > I propose the following solution. Add an mmio-window-size parameter to pci
> > bridges and pass all the information to firmware using a para-virt channel
> > like fw_cfg.
> 
> No fw_cfg please.
> 

Understood.

> We want generic pcie ports anyway.  If we *really* have to make this
> configurable we could add a vendor-specific pci capability to these ports,
> with recommended window sizes (io/mem/prefmem) for the firmware.

I think a vendor-specific capability is a good idea.

I don't 'really' want to do anything :). I just think
the above limitations deserve some attention.

Before going forward: if you still think enlarging the default MMIO window
to 8MB is enough, I could live with that (pc machines being less usable).

Laszlo, do you have any opinion on this regarding OVMF?

Thanks,
Marcel

Comment 4 Michael S. Tsirkin 2016-08-14 17:41:39 UTC
I am inclined to say we can just make virtio use less MMIO memory for now.

Two ways to do this:


1. We currently allocate 4K per possible VQ; the idea was to enable
split drivers within guests. No one seems to be interested
in this, so I think we could simply add a property, e.g.
"page-per-vq", and default it to off for new machine types
(see the sketch after option 2 below).

When off, reserve something like 4 bytes per VQ.

2. Another approach is figuring out how many VQs we
might need and not reserving memory for those that won't
ever be used.
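
A sketch of how option 1 could be wired up through QEMU's machine-type compat properties, so that old machine types keep the old layout. The property name follows the suggestion above; the array name and placement are illustrative:

    /* pin the old 4K-per-VQ behavior on pre-change machine types */
    static GlobalProperty compat_props[] = {
        { .driver = "virtio-pci", .property = "page-per-vq", .value = "on" },
    };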

Comment 5 Michael S. Tsirkin 2016-08-14 21:42:48 UTC
3. Reduce max # of VQs by 1 (1023 and not 1024).
   Now we can cut the BAR size by half, since
   the config space will fit in the 1st 4K,
   so we can make do with a 4M BAR.
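
Back-of-envelope arithmetic for this, assuming the config/ISR registers occupy the first 4K of the BAR, each VQ gets a 4K notify page, and BAR sizes must be powers of two:

    /* 4K config + 1024 VQs * 4K notify = 4M + 4K -> rounds up to an 8M BAR
     * 4K config + 1023 VQs * 4K notify = 4M exactly -> fits in a 4M BAR
     */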

Comment 6 Laszlo Ersek 2016-08-15 18:51:31 UTC
For OVMF:

- fw_cfg would work, as a hint, but Gerd doesn't like it :)
- vendor capabilities might work, but I can't promise it until I try to
  implement them in the EFI PCI Hotplug Init Protocol
- statically raising the size from 2MB to 4MB (or 8MB) would be simplest, but
  IIRC my test results, edk2 forced the reservations into 32-bit address space
  (despite my code's explicit request to put them into 64-bit address space), so
  that's not great.

My vote is "whatever is simplest".

Comment 7 Gerd Hoffmann 2016-09-05 13:02:05 UTC
  Hi,

> 2. Another approach is figuring out how many VQs we
> might need and not reserving memory for those that won't
> ever be used.

I like that one.  Only virtio devices with multiqueue support will ever come even close to 1024, and those have a num_queues property and can calculate the number of queues needed based on that.  For all other devices something like 16 queues is probably more than enough, which would be 32 pages / 128k of mmio space.

Comment 8 Marcel Apfelbaum 2016-09-05 20:06:42 UTC
(In reply to Michael S. Tsirkin from comment #5)
> 3. Reduce max # of VQs by 1 (1023 and not 1024).
>    Now we can cut the BAR size by half, since
>    the config space will fit in the 1st 4K,
>    so we can make do with a 4M BAR.

Still not enough. The firmware reserves 2M today.

Comment 9 Marcel Apfelbaum 2016-09-05 20:09:46 UTC
(In reply to Gerd Hoffmann from comment #7)
>   Hi,
> 
> > 2. Another approach is figuring out how many VQs we
> > might need and not reserving memory for those that won't
> > ever be used.
> 
> I like that one.  Only virtio devices with multiqueue support will ever come
> even close to 1024, and those have a num_queues property and can calculate
> the number of queues needed based on that.  For all other devices something
> like 16 queues is probably more than enough, which would be 32 pages / 128k
> of mmio space.


Michael, can you please advise on how to actually do that?
I don't think I have enough virtio knowledge to start approaching this.

Comment 10 Michael S. Tsirkin 2016-09-06 01:40:35 UTC
Sorry, I'm not sure how to implement 2) exactly.
One would have to refactor code significantly since
queues are added in virtio device realize
callback, while pci regions are initialized earlier,
during bus initialization.

For 1) it's easy - we currently have:

#define QEMU_VIRTIO_PCI_QUEUE_MEM_MULT 0x1000

You'll have to make it 4 for new machine types,
and keep it at 4K for old ones, by converting
it from a macro to a property.

similar for 3).
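
A minimal sketch of that macro-to-property conversion; the field and property names here are illustrative, not the final patch:

    /* was: #define QEMU_VIRTIO_PCI_QUEUE_MEM_MULT 0x1000 */
    static Property virtio_pci_extra_props[] = {
        /* old machine types keep 0x1000 via compat properties;
         * new machine types would default this to 4 */
        DEFINE_PROP_UINT32("queue-mem-mult", VirtIOPCIProxy,
                           queue_mem_mult, 0x1000),
        DEFINE_PROP_END_OF_LIST(),
    };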

Comment 11 Gerd Hoffmann 2016-09-06 13:08:27 UTC
(In reply to Michael S. Tsirkin from comment #10)
> Sorry, I'm not sure how to implement 2) exactly.
> One would have to refactor code significantly since
> queues are added in virtio device realize
> callback, while pci regions are initialized earlier,
> during bus initialization.

All multiqueue devices have device properties you can use to calculate the number of virt-queues needed in advance.  For the other devices some fixed
limit should do the trick.
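
For instance, a device-specific helper could bound the VQ count from the device's own configuration instead of assuming the 1024 maximum (function and parameter names are assumed for this sketch):

    /* illustrative: virtio-net needs one RX and one TX queue per
     * queue pair, plus one control queue */
    static unsigned virtio_net_vqs_needed(unsigned max_queue_pairs)
    {
        return 2 * max_queue_pairs + 1;
    }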

Comment 12 Marcel Apfelbaum 2016-09-08 08:40:14 UTC
Patch posted upstream:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg395607.html

Comment 13 Laszlo Ersek 2016-09-08 08:51:19 UTC
So how does this change the MMIO BAR size in concrete numbers? Thanks.

Comment 14 Gerd Hoffmann 2016-09-08 08:58:33 UTC
8M -> 16k

Comment 15 Laszlo Ersek 2016-09-08 09:19:15 UTC
Okay, thanks. Are we considering this a temporary mitigation until the

  [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

discussion completes?

In other words, do we expect this QEMU patch to make virtio-1.0 devices hotpluggable *before* the PCIe guidelines materialize?

My question is motivated by OVMF's BZ-Buddy bug 1365619. The drop in the MMIO BAR size (from the QEMU side) will make the 2MB allocation from OVMF suffice, but there's another pending change (for hotplug in general) that IO allocation should be disabled for PCI Express downstream ports and root ports. Otherwise any kind of PCI Express hotplug might not work due to (independent) IO space exhaustion.

Is the IO allocation size also in scope, for this family of BZs? Because I've been planning to delay any OVMF patches until the guidelines materialize.

If the IO allocation tweaking is out of scope, then I think:
- this bug should be closed as NOTABUG,
- the OVMF bug 1365619 should also be closed as NOTABUG,
- the common clone-origin bug 1365613 should track Marcel's upstream patch
  (noted in comment 12 here).

Thanks.

Comment 17 Marcel Apfelbaum 2016-09-08 09:35:56 UTC
(In reply to Laszlo Ersek from comment #15)
> Okay, thanks. Are we considering this a temporary mitigation until the
> 
>   [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines
> 
> discussion completes?
> 
> In other words, do we expect this QEMU patch to make virtio-1.0 devices
> hotpluggable *before* the PCIe guidelines materialize?
> 

This solution is independent of the PCIe guidelines (we need it for 7.3; not much time left). Virtio 1.0 devices will be hot-pluggable after this patch, of course.

I do intend to implement the PCI capability advertising the expected pre-allocated MMIO/IO range, but within the scope of the new PCIe controllers (not the emulated Intel ones).

> My question is motivated by OVMF's BZ-Buddy bug 1365619. The drop in the
> MMIO BAR size (from the QEMU side) will make the 2MB allocation from OVMF
> suffice, but there's another pending change (for hotplug in general) that IO
> allocation should be disabled for PCI Express downstream ports and root
> ports. Otherwise any kind of PCI Express hotplug might not work due to
> (independent) IO space exhaustion.
> 
> Is the IO allocation size also in scope, for this family of BZs? Because
> I've been planning to delay any OVMF patches until the guidelines
> materialize.
> 
> If the IO allocation tweaking is out of scope, then I think:
> - this bug should be closed as NOTABUG,
> - the OVMF bug 1365619 should also be closed as NOTABUG,
> - the common clone-origin bug 1365613 should track Marcel's upstream patch
>   (noted in comment 12 here).
> 

Agreed, closing this BZ since we found a QEMU-only solution.

> Thanks.

*** This bug has been marked as a duplicate of bug 1365613 ***

