Bug 2006409

Summary: SeaBIOS fails with "PCI: out of I/O address space" after switch to ACPI based hotplug
Product: Red Hat Enterprise Linux 9 Reporter: David Vallee Delisle <dvd>
Component: qemu-kvm Assignee: Julia Suvorova <jusual>
qemu-kvm sub component: PCI QA Contact: jingzhao <jinzhao>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: high CC: ailan, apevec, berrange, bstinson, chayang, coli, jinzhao, jusual, juzhang, jwboyer, kchamart, kraxel, laine, michele, virt-maint, xuwei, yiwei
Version: CentOS Stream Keywords: Regression, Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-1.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-06-02 14:33:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1744438    
Attachments:
Description                      Flags
Seabios debug logs               none
instance pc-q35-rhel8.3.0 xml    none
qemu instance logs               none
instance pc-i440fx machine       none

Description David Vallee Delisle 2021-09-21 16:54:12 UTC
Created attachment 1824993
Seabios debug logs

Description of problem:
When nova spins a VM on CentOS 9 Stream, seabios fails with "PCI: out of I/O address space"

Version-Release number of selected component (if applicable):
qemu-img-6.1.0-2.el9.x86_64
qemu-kvm-tools-6.1.0-2.el9.x86_64
seabios-bin-1.14.0-6.el9.noarch
qemu-virtiofsd-6.1.0-2.el9.x86_64
qemu-pr-helper-6.1.0-2.el9.x86_64
qemu-kvm-docs-6.1.0-2.el9.x86_64
ipxe-roms-qemu-20200823-7.git4bd064de.el9.noarch
qemu-kvm-common-6.1.0-2.el9.x86_64
qemu-kvm-block-ssh-6.1.0-2.el9.x86_64
qemu-kvm-block-curl-6.1.0-2.el9.x86_64
qemu-kvm-hw-usbredir-6.1.0-2.el9.x86_64
qemu-kvm-ui-opengl-6.1.0-2.el9.x86_64
qemu-kvm-audio-pa-6.1.0-2.el9.x86_64
qemu-kvm-block-rbd-6.1.0-2.el9.x86_64
qemu-kvm-core-6.1.0-2.el9.x86_64
libvirt-libs-7.6.0-2.el9.x86_64
libvirt-daemon-7.6.0-2.el9.x86_64
libvirt-daemon-driver-nwfilter-7.6.0-2.el9.x86_64
libvirt-daemon-driver-qemu-7.6.0-2.el9.x86_64
libvirt-daemon-driver-secret-7.6.0-2.el9.x86_64
libvirt-daemon-driver-storage-core-7.6.0-2.el9.x86_64
libvirt-daemon-driver-nodedev-7.6.0-2.el9.x86_64
libvirt-client-7.6.0-2.el9.x86_64
python3-libvirt-7.6.0-1.el9.x86_64
libvirt-daemon-config-nwfilter-7.6.0-2.el9.x86_64
qemu-kvm-6.1.0-2.el9.x86_64
qemu-kvm-debugsource-6.1.0-2.el9.x86_64
qemu-kvm-debuginfo-6.1.0-2.el9.x86_64

How reproducible:
All the time

Steps to Reproduce:
1. Deploy TripleO upstream master branch using CentOS 9
2. Spin a simple VM

Actual results:
VM is not booting. With seabios debug enabled, the log shows "PCI: out of I/O address space" (see the attached Seabios debug logs).

Expected results:
Instance should boot.

Additional info:
- Possible related discussion in bz1408810#c16
- When setting /etc/nova/nova.conf libvirt num_pcie_ports 4, there's no issue. (default value is 16)
- When using pc-i440fx-rhel7.6.0 and num_pcie_ports=16, there's no issue
- Reproduced with pc-q35-rhel8.3.0 and pc-q35-rhel8.5.0
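
To check which value a compute node will actually use, something like the following may help. It is a minimal sketch, assuming nova.conf parses as plain INI and that the option sits in the [libvirt] section as written above. The sample text and the helper name are hypothetical; the default of 16 is the one quoted in this report.

```python
import configparser

# Sample text standing in for /etc/nova/nova.conf (hypothetical content).
SAMPLE_NOVA_CONF = """
[libvirt]
num_pcie_ports = 4
"""

# Default quoted in this report for when the option is not set.
REPORTED_DEFAULT_PORTS = 16


def configured_pcie_ports(conf_text: str) -> int:
    """Return [libvirt] num_pcie_ports, or the reported default if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getint(
        "libvirt", "num_pcie_ports", fallback=REPORTED_DEFAULT_PORTS
    )


print(configured_pcie_ports(SAMPLE_NOVA_CONF))
```

On a real host the text would be read from /etc/nova/nova.conf instead of the sample string.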

Comment 1 David Vallee Delisle 2021-09-21 16:55:30 UTC
Created attachment 1824994
instance pc-q35-rhel8.3.0 xml

Comment 2 David Vallee Delisle 2021-09-21 17:00:04 UTC
Created attachment 1824995
qemu instance logs

Comment 3 Daniel Berrangé 2021-09-21 17:21:36 UTC
I'm moving this to the qemu-kvm component and marking it as a regression, since the config provided works fine with the QEMU 6.0.0 RPMs and fails with 6.1.0.

Comment 4 David Vallee Delisle 2021-09-21 17:23:48 UTC
Created attachment 1825000
instance pc-i440fx machine

For what it's worth, it looks like i440fx machine type doesn't have any pcie-root-port as opposed to pc-q35 ones [1].

[1]
~~~
$ grep -c "model='pcie-root-port'" instance*.xml 
instance-pci440fx.xml:0
instance.xml:17
~~~
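
The grep above counts matching lines. A rough equivalent that counts controller elements instead, assuming the usual libvirt domain XML layout; the sample XML is hypothetical and heavily trimmed:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed libvirt domain XML with two root ports.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'/>
    <controller type='pci' index='2' model='pcie-root-port'/>
  </devices>
</domain>
"""


def count_root_ports(domain_xml: str) -> int:
    """Count <controller> elements whose model is exactly 'pcie-root-port'."""
    root = ET.fromstring(domain_xml)
    return sum(
        1
        for ctrl in root.iter("controller")
        if ctrl.get("model") == "pcie-root-port"
    )


print(count_root_ports(SAMPLE_DOMAIN_XML))  # prints 2
```

The exact match on the model attribute keeps the 'pcie-root' controller out of the count.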

Comment 5 Daniel Berrangé 2021-09-21 17:33:28 UTC
This is fixed by setting

  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off'/>
  </qemu:commandline>

aka

  -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off


IOW, looks like the change in PCI hotplug impl has broken the ability to run guests with many pcie-root-ports.
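
For anyone who wants to script the workaround, here is a minimal sketch that appends the same two arguments to a domain XML string. It assumes the standard libvirt QEMU passthrough namespace; the helper name and the tiny input document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace libvirt uses for QEMU command-line passthrough elements.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"
WORKAROUND = "ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off"


def add_hotplug_workaround(domain_xml: str) -> str:
    """Append <qemu:commandline> carrying the -global workaround (hypothetical helper)."""
    # Make the serializer emit the conventional 'qemu' prefix.
    ET.register_namespace("qemu", QEMU_NS)
    root = ET.fromstring(domain_xml)
    cmdline = ET.SubElement(root, f"{{{QEMU_NS}}}commandline")
    for value in ("-global", WORKAROUND):
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": value})
    return ET.tostring(root, encoding="unicode")


print(add_hotplug_workaround("<domain type='kvm'><name>demo</name></domain>"))
```

This sketch always adds a new commandline element; a domain that already has one would need the arguments merged into it instead.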

Comment 6 Daniel Berrangé 2021-09-21 17:49:02 UTC
> - When setting /etc/nova/nova.conf libvirt num_pcie_ports 4, there's no issue. (default value is 16)

The threshold for seeing trouble appears to be '15', i.e. it works with 14 pcie-root-ports and fails with 15 pcie-root-ports.

Comment 7 Laine Stump 2021-09-21 20:16:18 UTC
I had been told in a discussion a couple weeks ago that the QEMU change in defaults to enable ACPI hotplug on Q35 only affected new q35 machinetypes, not existing q35 machinetypes (I asked because, although I'm completely unfamiliar with QEMU source, I didn't see anything in the patch switching the default to indicate it was only on new machinetypes; I wrote it off to my ignorance of the source).

Since the config specifically says "pc-q35-rhel8.3.0", I guess that's not the case, and the change in default takes effect for *all* Q35 machinetypes? Isn't that a violation of guest ABI, which could potentially break cross-version migration?

Comment 8 Daniel Berrangé 2021-09-22 09:18:42 UTC
(In reply to Laine Stump from comment #7)
> I had been told in a discussion a couple weeks ago that the QEMU change in
> defaults to enable ACPI hotplug on Q35 only affected new q35 machinetypes,
> not existing q35 machinetypes (I asked because, although I'm completely
> unfamiliar with QEMU source, I didn't see anything in the patch switching
> the default to indicate it was only on new machinetypes; I wrote it off to
> my ignorance of the source).

What you were told is correct for *upstream* machine types. The
pc-q35-6.0 machine type references pc_compat_6_0 which contains
the setting { "ICH9-LPC", "acpi-pci-hotplug-with-bridge-support", "off" },

It would be correct for RHEL machine types too, except that we haven't
actually done the RHEL machine types when rebasing and I don't see
anything referencing pc_compat_6_0 which is needed to ensure old hotplug
for old machine types.

> Since the config specifically says "pc-q35-rhel8.3.0", I guess that's not
> the case, and the change in default takes effect for *all* Q35 machinetypes?
> Isn't that a violation of guest ABI, which could potentially break
> cross-version migration?

The test was originally using pc-q35-rhel8.5.0 and we suspected this hotplug
change, so switched to pc-q35-rhel8.3.0. This is when we discovered that the
6.1.0 rebase had broken all existing machine types because it didn't wire
up back compat yet.
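
The effect of the missing wiring can be shown with a toy model. This is only an illustration of the reasoning in this comment, not QEMU's code: the table names and machine-type labels are hypothetical, and only the property name and its on/off values come from this report.

```python
# Toy model only: names are hypothetical and do not mirror QEMU's source layout.
PROP = ("ICH9-LPC", "acpi-pci-hotplug-with-bridge-support")

# New built-in default after the rebase, as described in this report.
NEW_DEFAULTS = {PROP: "on"}

# Compat entries that pin the older behaviour (the pc_compat_6_0 setting quoted above).
COMPAT_6_0 = {PROP: "off"}

# Which compat tables each machine type applies (illustrative labels).
MACHINE_COMPAT = {
    "new-machine-type": [],
    "old-upstream-machine-type": [COMPAT_6_0],
    "old-rhel-machine-type": [],  # the missing wiring described in this comment
}


def effective_value(machine: str, prop=PROP) -> str:
    """Start from the new default, then let each compat table override it."""
    value = NEW_DEFAULTS[prop]
    for table in MACHINE_COMPAT[machine]:
        value = table.get(prop, value)
    return value


for name in MACHINE_COMPAT:
    print(name, effective_value(name))
```

In this model the old RHEL machine type ends up with the new default because nothing overrides it, which matches the behaviour seen with pc-q35-rhel8.3.0.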

Comment 9 Gerd Hoffmann 2021-09-23 08:03:35 UTC
(In reply to Daniel Berrangé from comment #6)
> > - When setting /etc/nova/nova.conf libvirt num_pcie_ports 4, there's no issue. (default value is 16)
> 
> The threshold for seeing trouble appears to be '15' - ie it works with 14
> pcie-root-ports, fails with 15 pcie-root-ports .

Yep.  Each pci bridge gets 1k of io address space, we have 16k in total,
some of that is used for isa io and pci root bus, leaving enough io address
space for 14 pci bridges.

seabios assigns io address space to bridges only in case it finds a device
with io ports behind the bridge, or qemu asks for it via reserve-io hint.
The latter happens unconditionally now with acpi hotplug.

commit e2a6290aab578b2170c1f5909fa556385dc0d820
Author: Marcel Apfelbaum <marcel.apfelbaum>
Date:   Mon Aug 2 12:00:57 2021 +0300

    hw/pcie-root-port: Fix hotplug for PCI devices requiring IO
    
    Q35 has now ACPI hotplug enabled by default for PCI(e) devices.
    As opposed to native PCIe hotplug, guests like Fedora 34
    will not assign IO range to pcie-root-ports not supporting
    native hotplug, resulting into a regression.
    
    Reproduce by:
        qemu-bin -M q35 -device pcie-root-port,id=p1 -monitor stdio
        device_add e1000,bus=p1
    In the Guest OS the respective pcie-root-port will have the IO range
    disabled.
    
    Fix it by setting the "reserve-io" hint capability of the
    pcie-root-ports so the firmware will allocate the IO range instead.
    
    Acked-by: Igor Mammedov <imammedo>
    Signed-off-by: Marcel Apfelbaum <marcel>
    Message-Id: <20210802090057.1709775-1-marcel>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
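
The numbers above can be turned into a quick back-of-the-envelope check. This is a sketch under stated assumptions: it uses only the 16k total, the 1k per bridge and the observed 14/15 threshold, and it ignores alignment and the exact layout seabios uses. The exact amount taken by isa io and the root bus is not stated here, so the code only derives the bounds the threshold implies.

```python
KIB = 1024
TOTAL_IO_SPACE = 16 * KIB   # "we have 16k in total"
IO_PER_BRIDGE = 1 * KIB     # "each pci bridge gets 1k"
MAX_WORKING_BRIDGES = 14    # observed: 14 work, 15 fail


def reserved_io_bounds():
    """Bounds on the I/O space used outside the bridges, implied by the threshold."""
    # 14 bridges fit, so reserved <= total - 14 * per_bridge.
    upper = TOTAL_IO_SPACE - MAX_WORKING_BRIDGES * IO_PER_BRIDGE
    # 15 bridges do not fit, so reserved > total - 15 * per_bridge.
    lower_exclusive = TOTAL_IO_SPACE - (MAX_WORKING_BRIDGES + 1) * IO_PER_BRIDGE
    return lower_exclusive, upper


def bridges_fit(num_bridges: int, reserved: int) -> bool:
    """True if the bridges plus the reserved space fit in the total I/O space."""
    return reserved + num_bridges * IO_PER_BRIDGE <= TOTAL_IO_SPACE


low, high = reserved_io_bounds()
print(low, high)
```

So the non-bridge consumers must take more than 1k and at most 2k, and with any value in that range the nova default of 16 root ports cannot fit.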

Comment 10 David Vallee Delisle 2021-09-27 14:14:08 UTC
(In reply to Gerd Hoffmann from comment #9)
> (In reply to Daniel Berrangé from comment #6)
> > > - When setting /etc/nova/nova.conf libvirt num_pcie_ports 4, there's no issue. (default value is 16)
> > 
> > The threshold for seeing trouble appears to be '15' - ie it works with 14
> > pcie-root-ports, fails with 15 pcie-root-ports .
> 
> Yep.  Each pci bridge gets 1k of io address space, we have 16k in total,
> some of that is used for isa io and pci root bus, leaving enough io address
> space for 14 pci bridges.
> 
> seabios assigns io address space to bridges only in case it finds a device
> with io ports behind the bridge, or qemu asks for it via reserve-io hint.
> The latter happens unconditionally now with acpi hotplug.
> 
> commit e2a6290aab578b2170c1f5909fa556385dc0d820
> Author: Marcel Apfelbaum <marcel.apfelbaum>
> Date:   Mon Aug 2 12:00:57 2021 +0300
> 
>     hw/pcie-root-port: Fix hotplug for PCI devices requiring IO
>     
>     Q35 has now ACPI hotplug enabled by default for PCI(e) devices.
>     As opposed to native PCIe hotplug, guests like Fedora 34
>     will not assign IO range to pcie-root-ports not supporting
>     native hotplug, resulting into a regression.
>     
>     Reproduce by:
>         qemu-bin -M q35 -device pcie-root-port,id=p1 -monitor stdio
>         device_add e1000,bus=p1
>     In the Guest OS the respective pcie-root-port will have the IO range
>     disabled.
>     
>     Fix it by setting the "reserve-io" hint capability of the
>     pcie-root-ports so the firmware will allocate the IO range instead.
>     
>     Acked-by: Igor Mammedov <imammedo>
>     Signed-off-by: Marcel Apfelbaum <marcel>
>     Message-Id: <20210802090057.1709775-1-marcel>
>     Reviewed-by: Michael S. Tsirkin <mst>
>     Signed-off-by: Michael S. Tsirkin <mst>

We have a concern that we haven't been able to validate yet: what if operators add PCI passthrough (PCI-PT) devices? Do those count toward this limit? If an operator needs 20 PCI-PT devices, is it going to work?

Comment 11 Daniel Berrangé 2021-09-27 14:50:25 UTC
(In reply to David Vallee Delisle from comment #10)
> We have a concern which we couldn't validate yet, but what if operators are
> adding PCI-PT devices. Is that counting toward this limitation? If an
> operator needs 20 PCI-PT, is it going to work?

The limit comes from the pcie-root-ports, and essentially every device needs its own pcie-root-port, so yes, PCI passthrough devices will count towards this limit.

QEMU needs to be fixed to get rid of this regression in consumption of I/O space.

Comment 12 John Ferlan 2021-09-30 11:49:23 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 14 Alan Pevec 2021-11-18 13:44:41 UTC
Is there a CS9 version of a patch/build from RHEL8 bug 2007129 ?

Comment 24 Chao Yang 2022-02-08 03:50:27 UTC
Hi David,

Can you please confirm the latest CentOS stream compose has fixed the issue? The fix should be already included in qemu-kvm-6.2.0-1.el9 and newer. Please close this bug if it works for you. Thank you.

Comment 25 Julia Suvorova 2022-06-02 14:33:07 UTC
Yes, it's fixed in the latest release.