Bug 1967187 - [aarch64][libvirt] Add Support of pcie-expander-bus controllers for ARM
Summary: [aarch64][libvirt] Add Support of pcie-expander-bus controllers for ARM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: aarch64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Assignee: Andrea Bolognani
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1967502
Blocks:
 
Reported: 2021-06-02 15:10 UTC by Eric Auger
Modified: 2022-05-17 13:02 UTC
CC List: 13 users

Fixed In Version: libvirt-7.7.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 12:45:04 UTC
Type: Feature Request
Target Upstream Version: 7.7.0
Embargoed:


Attachments
debug guest xml (6.72 KB, text/plain), 2021-09-13 10:34 UTC, Yiding Liu (Fujitsu)
qemu-cmdline (4.43 KB, text/plain), 2021-09-14 08:28 UTC, Yiding Liu (Fujitsu)
The new guest xml (6.72 KB, text/plain), 2021-09-14 08:30 UTC, Yiding Liu (Fujitsu)


Links
Red Hat Product Errata RHBA-2022:2390, last updated 2022-05-17 12:45:18 UTC

Description Eric Auger 2021-06-02 15:10:50 UTC
pcie-expander-bus controllers do not seem to be supported on the ARM virt machine.

Attempting to add:
<controller type='pci' index='1' model='pcie-expander-bus'>
  <model name='pxb-pcie'/>
  <target busNr='10'/>
</controller>
 
leads to the following error message:
error: unsupported configuration: pcie-expander-bus controllers are only supported on q35-based machinetypes

This BZ tracks support for this feature on ARM, at the libvirt level. This can be useful for NUMA and device assignment.

Comment 1 Eric Auger 2021-06-03 09:22:51 UTC
Besides the qemu dependency, a rhel9 guest seems to crash with an expander bridge/root port and virtio-net-pci. A Fedora guest works, though; the page size seems to be the cause.

Comment 2 Eric Auger 2021-06-24 08:06:05 UTC
As per the discussion held in BZ1967502, it turns out that we would need to prevent EDK2 from allocating Io16 for root ports attached to any PXBs.

For instance this must translate at qemu level into something like
-device pxb-pcie,bus_nr=4,id=bridge,bus=pcie.0 -device pcie-root-port,bus=bridge,chassis=4,id=pcie.11,io-reserve=0

Note io-reserve=0 could be set globally as well (-global pcie-root-port.io-reserve=0) but this may possibly impact existing use cases so I would prefer we set the limitation to root ports plugged onto the PXB.

Comment 3 Andrea Bolognani 2021-06-28 15:19:32 UTC
(In reply to Eric Auger from comment #0)
> pcie-expander-bus controllers do not seem to be support on ARM virt machine.
> 
> Attempting to add:
> <controller type='pci' index='1' model='pcie-expander-bus'>
>   <model name='pxb-pcie'/>
>   <target busNr='10'/>
> </controller>
>  
> leads to the following error message:
> error: unsupported configuration: pcie-expander-bus controllers are only
> supported on q35-based machinetypes

This restriction should be easy to lift.

(In reply to Eric Auger from comment #2)
> As per the discussion held in BZ1967502,

I believe you meant Bug 1967494 here?

> it turns out that we would need to
> prevent EDK2 from allocating Io16 for root ports attached to any PXBs.
>
> For instance this must translate at qemu level into something like
> -device pxb-pcie,bus_nr=4,id=bridge,bus=pcie.0 -device
> pcie-root-port,bus=bridge,chassis=4,id=pcie.11,io-reserve=0
>
> Note io-reserve=0 could be set globally as well (-global
> pcie-root-port.io-reserve=0) but this may possibly impact existing use cases
> so I would prefer we set the limitation to root ports plugged onto the PXB.

Several years ago, Bug 1408810 was opened to track exposing the
io-reserve option (which didn't yet exist in QEMU at the time) via
libvirt; earlier this year, it was decided that the usefulness of
that option was too limited to warrant the necessary development and
QE effort.

If using the option is the only way to get pxb-pcie devices to work
on aarch64, I think that changes the calculus and possibly requires
us to revisit the decision.

Note however that, even if we decide to expose the io-reserve option
at the libvirt level after all, it will be something that users and
management applications have to explicitly opt into on a per root
port basis rather than something that gets automatically added by
libvirt for root ports that are behind a pxb-pcie. Mechanism vs
policy and all that :)

Comment 5 Eric Auger 2021-06-28 15:54:01 UTC
> I believe you meant Bug 1967494 here?

yes sorry

> Several years ago, Bug 1408810 was opened to track exposing the
> io-reserve option (which didn't yet exist in QEMU at the time) via
> libvirt; earlier this year, it was decided that the usefulness of
> that option was too limited to warrant the necessary development and
> QE effort.
> 
> If using the option is the only way to get pxb-pcie devices to work
> on aarch64, I think that changes the calculus and possibly requires
> us to revisit the decision.
> 
> Note however that, even if we decide to expose the io-reserve option
> at the libvirt level after all, it will be something that users and
> management applications have to explicitly opt into on a per root
> port basis rather than something that gets automatically added by
> libvirt for root ports that are behind a pxb-pcie. Mechanism vs
> policy and all that :)

I would rather not expose the io-reserve option at libvirt level. I would rather hardcode it for root ports downstream to the PXB. In any case, allowing the end user to set its value to something different would lead to a guest kernel crash on aarch64.

Thanks

Eric

Comment 8 Andrea Bolognani 2021-06-29 16:21:41 UTC
(In reply to Eric Auger from comment #5)
> > Note however that, even if we decide to expose the io-reserve option
> > at the libvirt level after all, it will be something that users and
> > management applications have to explicitly opt into on a per root
> > port basis rather than something that gets automatically added by
> > libvirt for root ports that are behind a pxb-pcie. Mechanism vs
> > policy and all that :)
> 
> I would rather not expose the io-reserve option at libvirt level. I would
> rather hardcode it for root ports downstream to the PXB.

I don't think this approach would fly upstream. libvirt feels very
strongly about only offering mechanisms, and this smells an awful lot
like policy.

I'm also failing to understand why this was never a problem until
now. If edk2 will only do 4k alignment (because of the EFI
specification), and the kernel can't deal with anything but 64k
alignment, why is the presence of a PXB necessary to trigger the
issue?

As I understand it, the value passed to io-reserve is merely used as
a hint by the firmware: if the root port has io-reserve=0 but the
device that's plugged into it requests I/O space, then the firmware
will still dutifully comply with such a request.

So io-reserve=0 only really makes a difference for a root port that's
left empty for future hotplug purposes, which without it would get
some I/O space allocated to it. I would expect this issue to show up
even in basic hotplugging scenarios then... How come that was not the
case?

One last note: with q35, adding too many empty root ports to a
machine without disabling I/O space reservation for them results in
some of those ports not working correctly, but never in a kernel
crash. Can the aarch64 kernel be taught to deal with this scenario
in a similarly graceful manner?

Comment 11 Laszlo Ersek 2021-06-29 17:52:58 UTC
(In reply to Andrea Bolognani from comment #8)
> (In reply to Eric Auger from comment #5)
> > > Note however that, even if we decide to expose the io-reserve
> > > option at the libvirt level after all, it will be something that
> > > users and management applications have to explicitly opt into on a
> > > per root port basis rather than something that gets automatically
> > > added by libvirt for root ports that are behind a pxb-pcie.
> > > Mechanism vs policy and all that :)
> >
> > I would rather not expose the io-reserve option at libvirt level. I
> > would rather hardcode it for root ports downstream to the PXB.
>
> I don't think this approach would fly upstream. libvirt feels very
> strongly about only offering mechanisms, and this smells an awful lot
> like policy.

"Yes and no", as far as I can tell; for example the "virtio" display
model means virtio-vga for x86, but virtio-gpu-pci for aarch64.

> I'm also failing to understand why this was never a problem until now.
> If edk2 will only do 4k alignment (because of the EFI specification),
> and the kernel can't deal with anything but 64k alignment, why is the
> presence of a PXB necessary to trigger the issue?

The GPEX host bridge emulates the total IO Port aperture for the board
in the following MMIO guest-phys address window:

hw/arm/virt.c:    [VIRT_PCIE_PIO] =           { 0x3eff0000, 0x00010000 },

The base address is a whole multiple of 64KB.

In case the board configuration contains only one root bridge (bus=0),
the above is the base address to which the IO region of bridge#0 will
effectively be programmed. The guest kernel can map that MMIO area with
64KB page size fine.

The effective region *size* is usually smaller than the full 0x00010000,
dependent on the IO BAR needs of the legacy (non-express) endpoints
integrated into the root complex (that is, residing on bus 0). For
example, the logs that Eric sent to me earlier indicate a size of
0x3000, for three endpoints together.

If you add at least one extra root bridge, then the edk2 PCI bus driver
(PciBusDxe) will group the IO BARs assigned behind *that* bridge at
(0x3eff0000 + 0x3000), which triggers the guest kernel's mapping
failure.
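The alignment problem Laszlo describes can be checked with a short sketch. The base address and window size come from hw/arm/virt.c as quoted above; the 0x3000 bus-0 IO size is the example value from Eric's logs, not a fixed constant.

```python
# Sketch of the IO-aperture layout described above. The 0x3000 bus-0
# IO size is the example value from Eric's logs.
PAGE_64K = 0x10000                  # guest kernel page size
VIRT_PCIE_PIO_BASE = 0x3eff0000     # hw/arm/virt.c: [VIRT_PCIE_PIO]

def is_64k_aligned(addr: int) -> bool:
    return addr % PAGE_64K == 0

bus0_io_size = 0x3000               # IO BARs of the bus-0 legacy endpoints
pxb_io_base = VIRT_PCIE_PIO_BASE + bus0_io_size

print(is_64k_aligned(VIRT_PCIE_PIO_BASE))  # True: a 64K-page guest maps bus 0 fine
print(is_64k_aligned(pxb_io_base))         # False: the extra root bridge's IO
                                           # region starts mid-page, so the
                                           # guest kernel's mapping fails
```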

> As I understand it, the value passed to io-reserve is merely used as a
> hint by the firmware:

That's correct.

> if the root port has io-reserve=0 but the device that's plugged into
> it requests I/O space, then the firmware will still dutifully comply
> with such a request.

Indeed; however, PCI Express devices are required to function without
any IO space. Furthermore, legacy PCI devices cannot (should not) be
plugged into PCIe root ports.

The "docs/pcie.txt" file in the QEMU tree treats this topic extensively.

(In reply to Andrea Bolognani from comment #8)
> So io-reserve=0 only really makes a difference for a root port that's
> left empty for future hotplug purposes,

That's not correct; it also makes a difference for such PCIe root ports
whose subordinate hierarchies will only contain PCIe endpoints -- and
that's just a different way to say "almost all PCIe root ports".

> which without it would get some I/O space allocated to it. I would
> expect this issue to show up even in basic hotplugging scenarios
> then... How come that was not the case?

In the subordinate hierarchy of a PCIe root port, you'd normally hotplug
PCIe endpoints, which would not require IO BAR.

If you were to hotplug a legacy PCI endpoint (with IO port needs), you'd
do that behind a PCIe-to-PCI bridge. (The IO reservation would propagate
up from the PCIe-to-PCI bridge.) But even in that case, the PCIe-to-PCI
bridge itself would be integrated into the Root Complex (i.e., be on
bus=0), and therefore the symptom would still not be triggered. The
cumulative IO Port needs of the PCIe-to-PCI bridge (summed up from its
subordinate hierarchy) would be grouped together with the IO Port needs
of those legacy PCI endpoints that were directly integrated into the
Root Complex (that is, "siblings" of the PCIe-to-PCI bridge), and the
resultant base address would remain 0x3eff0000. Only the size would
change.

See again "docs/pcie.txt".

> One last note: with q35, adding too many empty root ports to a machine
> without disabling I/O space reservation for them results in some of
> those ports not working correctly, but never in a kernel crash.

That's technically correct: a guest kernel crash will not occur in this
case.

The reason for that fact is quite specific though: the (edk2-based)
guest firmware will reject booting, due to resource assignment failure
:)

> Can the aarch64 kernel be taught to deal with this scenario in a
> similarly graceful manner?

As far as I recall, Ard confirmed in the past that the x86 and aarch64
PCI(e) host driver code in Linux are separate beasts. I honestly forgot
the reason, but there *was* a reason.

That said, IMO, it's not a guest kernel job; the PCIe and PCI
hierarchies need to be built sensibly in the board configuration.
"docs/pcie.txt" is supposed to help with that.

Thanks
Laszlo

Comment 13 Eric Auger 2021-06-29 19:33:54 UTC
[snip]

> > > I would rather not expose the io-reserve option at libvirt level. I
> > > would rather hardcode it for root ports downstream to the PXB.
> >
> > I don't think this approach would fly upstream. libvirt feels very
> > strongly about only offering mechanisms, and this smells an awful lot
> > like policy.
> 
> "Yes and no", as much as I can tell; for example the "virtio" display
> model means virtio-vga for x86, but virtio-gpu-pci for aarch64.
> 
To me, if you expose io-reserve in the xml, you give the end user a chance to crash the guest kernel. It is a known issue that you cannot instantiate a PXB + root port with IO space on some guests, depending on their page size config; since we cannot detect the latter, it looks safer to hardcode it. This limitation would only apply to the topology downstream of the PXB, not affecting the primary root bridge. This can be documented.

Comment 18 Andrea Bolognani 2021-07-23 16:08:01 UTC
Upstream patches posted.

  https://listman.redhat.com/archives/libvir-list/2021-July/msg00755.html

Comment 19 Andrea Bolognani 2021-08-04 08:18:51 UTC
Patches merged upstream.

  commit f225ef2a04baa2d875c4bb6358958b8a6f82d58c
  Author: Andrea Bolognani <abologna>
  Date:   Thu Jul 22 15:37:25 2021 +0200

    qemu: Allow pcie-expander-bus for aarch64/virt guests
    
    Starting with QEMU 6.0, this controller is enabled by default
    on aarch64.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1967187
    
    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Michal Privoznik <mprivozn>

  v7.6.0-14-gf225ef2a04

Comment 20 Yiding Liu (Fujitsu) 2021-09-09 13:11:21 UTC
I built the latest libvirt and it works.

upstream libvirt
```
commit 5a3c35dc83f2918afa9fb5d36267e1bad80ada85 (HEAD -> master, origin/master, origin/HEAD)
Author: Peter Krempa <pkrempa>
Date:   Tue Sep 7 09:28:51 2021 +0200

    qemuxml2argvtest: Add test case for missing disk '<target>'
    
    Cover the case of missing disk target to cover the case fixed by
    previous commit.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Michal Privoznik <mprivozn>

```

Env:
[root@ampere-hr330a-04 aarch64]# rpm -q libvirt qemu-kvm kernel-core
libvirt-7.8.0-1.el9.aarch64
qemu-kvm-6.1.0-1.el9.aarch64
kernel-core-5.14.0-1.el9.aarch64

```
# virsh start fj-kvm-vm
Domain 'fj-kvm-vm' started

# virsh dumpxml fj-kvm-vm | grep pcie-expander-bus
    <controller type='pci' index='12' model='pcie-expander-bus'>
# rpm -q libvirt
libvirt-7.8.0-1.el9.aarch64
```

Comment 21 Yiding Liu (Fujitsu) 2021-09-13 02:47:26 UTC
Verified on RHEL9 + libvirt-7.7.0-1.el9.aarch64
```
# rpm -q libvirt
libvirt-7.7.0-1.el9.aarch64

# virsh dumpxml fj-kvm-vm | grep -A4 pcie-expander-bus
    <controller type='pci' index='12' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>

# virsh start fj-kvm-vm
Domain 'fj-kvm-vm' started


```

Comment 22 Yiding Liu (Fujitsu) 2021-09-13 10:32:48 UTC
I assigned 2 valid numa nodes to pcie-expander-bus controllers (busNr 100, busNr 200).
But only 1 pcie-expander-bus works.
		
# virsh dumpxml fj-kvm-vm | xmllint --xpath //numa -
<numa>
      <cell id="0" cpus="0-15" memory="14680064" unit="KiB" memAccess="shared"/>
      <cell id="1" cpus="16-31" memory="14680064" unit="KiB" memAccess="shared"/>
    </numa>


# virsh dumpxml fj-kvm-vm | grep -A4 pcie-expander-bus
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='200'>
        <node>1</node>
      </target>
--
    <controller type='pci' index='12' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>

# virsh dumpxml fj-kvm-vm | grep -E -B3 "0x08|0x0c"
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
--
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x02' function='0x2'/>


I attached the memballoon to the 0x0e (node 0) controller and the rng to the 0x0d (node 1) controller
```
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
    </rng>
```

Log in to the guest and check numa placement.

The address of the pcie-expander-bus (node 1) with busNr 200 should be c8:00.0 (device: rng)
The address of the pcie-expander-bus (node 0) with busNr 100 should be 64:00.0 (device: memballoon)
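The expected addresses follow from the busNr values in the libvirt XML being decimal while lspci prints bus numbers in hexadecimal; a quick sketch:

```python
# busNr in the libvirt XML is decimal; lspci shows the bus number in
# hexadecimal, so busNr 200 appears as bus c8 and busNr 100 as bus 64.
def lspci_bus(bus_nr: int) -> str:
    return format(bus_nr, "02x")

print(lspci_bus(200))  # c8
print(lspci_bus(100))  # 64
```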

I only got the RNG device
```
[root@localhost ~]# lspci
00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.6 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:03.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge
00:04.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge
02:00.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
03:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)
04:00.0 Communication controller: Red Hat, Inc. Virtio console (rev 01)
05:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
06:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
c8:00.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
c9:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG (rev 01)
[root@localhost ~]# cat /sys/devices/pci0000\:c8/0000\:c8\:00.0/0000\:c9\:00.0/numa_node 
1

```

Comment 23 Yiding Liu (Fujitsu) 2021-09-13 10:34:44 UTC
Created attachment 1822724 [details]
debug guest xml

Assign 2 valid numa nodes to pcie-expander-bus

Comment 24 Eric Auger 2021-09-13 12:22:50 UTC
Hi Yiding, please can you paste the qemu cmd line generated by libvirt? Thank you in advance. BR. Eric

Comment 25 Yiding Liu (Fujitsu) 2021-09-14 08:28:53 UTC
Created attachment 1822942 [details]
qemu-cmdline

qemu-cmdline obtained from 'virsh domxml-to-native qemu-argv --domain fj-kvm-vm'

Comment 26 Yiding Liu (Fujitsu) 2021-09-14 08:30:33 UTC
Created attachment 1822943 [details]
The new guest xml

The new guest xml

Comment 27 Yiding Liu (Fujitsu) 2021-09-14 08:38:59 UTC
(In reply to Eric Auger from comment #24)
> Hi Yiding, please can you paste the qemu cmd line generated by libvirt?
> Thank you in advance. BR. Eric

Hi Eric. I uploaded the qemu cmd line and the guest xml (the old env is gone).

I formatted the qemu cmdline for a quick check. I can't get the memballoon in the guest.

/usr/libexec/qemu-kvm -name guest=fj-kvm-vm,debug-threads=on \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-fj-kvm-vm/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/Test3_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine virt-rhel9.0.0,accel=kvm,usb=off,dump-guest-core=off,gic-version=3,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-cpu host,kvm-no-adjvtime=on -m 28672 -overcommit mem-lock=off -smp 32,sockets=1,dies=1,cores=32,threads=1 \
-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/var/lib/libvirt/qemu/ram/-1-fj-kvm-vm/ram-node0","share":true,"size":15032385536,"host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-15,memdev=ram-node0 -object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/var/lib/libvirt/qemu/ram/-1-fj-kvm-vm/ram-node1","share":true,"size":15032385536,"host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=1,cpus=16-31,memdev=ram-node1 -uuid 2d7762a5-1506-4c6b-b42d-095cbd481850 \
-display none -no-user-config -nodefaults \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain--1-fj-kvm-vm/monitor.sock,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc -no-shutdown -boot strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device pxb-pcie,bus_nr=200,id=pci.8,numa_node=1,bus=pcie.0,addr=0x4 \
-device pcie-root-port,port=0x10,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=10,id=pci.10,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=11,id=pci.11,bus=pcie.0,addr=0x2.0x2 \
-device pxb-pcie,bus_nr=100,id=pci.12,numa_node=0,bus=pcie.0,addr=0x3 \
-device pcie-root-port,port=0x0,chassis=13,id=pci.13,bus=pci.8,addr=0x0 \
-device pcie-root-port,port=0x12,chassis=14,id=pci.14,bus=pci.12,addr=0x2.0x2 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/fj-kvm-vm.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.5,addr=0x0,drive=libvirt-2-format,id=virtio-disk0,bootindex=1 \
-device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
-netdev tap,fd=26,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c9:96:ed,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 -serial chardev:charserial0 \
-chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain--1-fj-kvm-vm/org.qemu.guest_agent.0,server=on,wait=off \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-audiodev id=audio1,driver=none -device virtio-balloon-pci,id=balloon0,bus=pci.14,addr=0x0 -object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.13,addr=0x0 -device vmcoreinfo -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Comment 28 Eric Auger 2021-09-16 10:05:29 UTC
Hi Yiding,

I have installed a new machine with rhel9 latest bits and libvirt does not feature the right pxb commits yet (returns pxb only is supported on q35). So I have just tested at qemu level and it works for me with a rhel9-beta guest:
[root@localhost ~]# lspci -tv
-+-[0000:c8]---00.0-[c9]----00.0  Red Hat, Inc. Virtio RNG
 +-[0000:64]---00.0-[65]----00.0  Red Hat, Inc. Virtio memory balloon
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
             +-02.0-[01]--
             +-03.0-[02]----00.0  Red Hat, Inc. Virtio network device
             +-04.0-[03]----00.0  Red Hat, Inc. Virtio block device
             \-05.0  Red Hat, Inc. QEMU PCIe Expander bridge

I have tested with the following qemu options:
OTHER_DEVICE_OPTIONS="-object memory-backend-file,id=mem0,size=2G,mem-path=/dev/shm,share=on"
OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -object memory-backend-file,id=mem1,size=2G,mem-path=/dev/shm,share=on"
OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -numa node,memdev=mem0,cpus=0-3,nodeid=0 -numa node,memdev=mem1,cpus=4-7,nodeid=1"
OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -device pxb-pcie,bus_nr=200,id=bridge2,bus=pcie.0,numa_node=1 -device pcie-root-port,bus=bridge2,chassis=5,id=pcie.12 -object rng-random,id=obj0,filename=/dev/urandom  -device virtio-rng-pci,bus=pcie.12,rng=obj0,id=rng0"
OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -device pxb-pcie,bus_nr=100,id=bridge,bus=pcie.0,numa_node=0 -device pcie-root-port,bus=bridge,chassis=4,id=pcie.11 -device virtio-balloon-pci,bus=pcie.11"

What guest did you run?

Comment 29 Yiding Liu (Fujitsu) 2021-09-22 07:32:12 UTC
Hi Eric

(In reply to Eric Auger from comment #28)
> Hi Yiding,
> 
> I have installed a new machine with rhel9 latest bits and libvirt does not
> feature the right pxb commits yet (returns pxb only is supported on q35).

Yes. I built libvirt-7.7.0-1.el9.src.rpm to verify the test.
 
> So I have just tested at qemu level and it works for me with a rhel9-beta guest:
> [root@localhost ~]# lspci -tv
> -+-[0000:c8]---00.0-[c9]----00.0  Red Hat, Inc. Virtio RNG
>  +-[0000:64]---00.0-[65]----00.0  Red Hat, Inc. Virtio memory balloon
>  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>              +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>              +-02.0-[01]--
>              +-03.0-[02]----00.0  Red Hat, Inc. Virtio network device
>              +-04.0-[03]----00.0  Red Hat, Inc. Virtio block device
>              \-05.0  Red Hat, Inc. QEMU PCIe Expander bridge
> 
> I have tested with the followng qemu options:
> OTHER_DEVICE_OPTIONS="-object
> memory-backend-file,id=mem0,size=2G,mem-path=/dev/shm,share=on"
> OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -object
> memory-backend-file,id=mem1,size=2G,mem-path=/dev/shm,share=on"
> OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -numa
> node,memdev=mem0,cpus=0-3,nodeid=0 -numa node,memdev=mem1,cpus=4-7,nodeid=1"
> OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -device
> pxb-pcie,bus_nr=200,id=bridge2,bus=pcie.0,numa_node=1 -device
> pcie-root-port,bus=bridge2,chassis=5,id=pcie.12 -object
> rng-random,id=obj0,filename=/dev/urandom  -device
> virtio-rng-pci,bus=pcie.12,rng=obj0,id=rng0"
> OTHER_DEVICE_OPTIONS=$OTHER_DEVICE_OPTIONS" -device
> pxb-pcie,bus_nr=100,id=bridge,bus=pcie.0,numa_node=0 -device
> pcie-root-port,bus=bridge,chassis=4,id=pcie.11 -device
> virtio-balloon-pci,bus=pcie.11"

I used your qemu options and they work for me.

[root@localhost ~]# lspci -tv
-+-[0000:c8]---00.0-[c9]----00.0  Red Hat, Inc. Virtio RNG
 +-[0000:64]---00.0-[65]----00.0  Red Hat, Inc. Virtio memory balloon
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0-[01]--
             +-01.1-[02]----00.0  Red Hat, Inc. Virtio GPU
             +-01.3-[03]----00.0  Red Hat, Inc. Virtio SCSI
             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge

But the error still exists when I use the attached guest xml.
Maybe something is wrong with my guest xml? I will simplify the guest xml and test again.

> 
> What guest did you run?
RHEL9.
[root@localhost ~]# uname -r
5.14.0-1.2.1.el9.aarch64

Comment 30 Yiding Liu (Fujitsu) 2021-09-22 08:21:34 UTC
I simplified the guest xml and then pcie-expander-bus works.

# virsh dumpxml fj-kvm-vm
[snip]
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='200'>
        <node>1</node>
      </target>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x0'/>
      <alias name='pci.10'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x0'/>
      <alias name='pci.11'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </controller>
[snip]
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </rng>
[snip]

[root@localhost ~]# lspci -tv
-+-[0000:c8]---00.0-[c9]----00.0  Red Hat, Inc. Virtio RNG
 +-[0000:64]---00.0-[65]----00.0  Red Hat, Inc. Virtio memory balloon
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0-[01]--
             +-01.1-[02]----00.0  Red Hat, Inc. QEMU XHCI Host Controller
             +-01.2-[03]----00.0  Red Hat, Inc. Virtio SCSI
             +-01.3-[04]----00.0  Red Hat, Inc. Virtio network device
             +-01.4-[05]----00.0  Red Hat, Inc. Virtio console
             +-01.5-[06]----00.0  Red Hat, Inc. Virtio block device
             +-01.6-[07]--
             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge

Comment 31 Eric Auger 2021-09-22 08:36:18 UTC
(In reply to Yiding Liu (Fujitsu) from comment #30)
Interesting. So, by comparing the XMLs, do you have any idea what is causing the issue?

Comment 32 Yiding Liu (Fujitsu) 2021-09-22 08:55:03 UTC
Sorry for the noise :(

The root cause is an incorrect slot and function on the pcie-root-port (index 14).
Changing the slot and function to 0x00 fixes the issue.

    <controller type='pci' index='12' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
...
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x02' function='0x2'/>
    </controller>
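
For clarity, the corrected pcie-root-port entry would look like the fragment below (a sketch only: it keeps the original chassis and port values and moves the address to slot 0x00, function 0x0, as described above):

```
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
    </controller>
```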

Comment 33 Yiding Liu (Fujitsu) 2021-09-22 09:10:42 UTC
Verified with the steps below.

1. Assign a valid NUMA node to each pcie-expander-bus controller
```
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='200'>
        <node>1</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
```
[root@localhost ~]# cat /sys/devices/pci0000\:64/0000\:64\:00.0/0000\:65\:00.0/numa_node 
0
[root@localhost ~]# cat /sys/devices/pci0000\:c8/0000\:c8\:00.0/0000\:c9\:00.0/numa_node 
1


2. Configure the same NUMA node on two pcie-expander-bus controllers
```
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='200'>
        <node>0</node>
      </target>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
```
[root@localhost ~]# cat /sys/devices/pci0000\:64/0000\:64\:00.0/0000\:65\:00.0/numa_node
0
[root@localhost ~]# cat /sys/devices/pci0000\:c8/0000\:c8\:00.0/0000\:c9\:00.0/numa_node 
0

	
3. Attach devices/pcie-root-ports to the pcie-expander-bus controllers
```
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='100'>
        <node>0</node>
      </target>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='200'>
        <node>0</node>
      </target>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x0'/>
      <alias name='pci.10'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x0'/>
      <alias name='pci.11'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </controller>
...
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </rng>

```
-+-[0000:c8]---00.0-[c9]----00.0  Red Hat, Inc. Virtio RNG
 +-[0000:64]---00.0-[65]----00.0  Red Hat, Inc. Virtio memory balloon
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0-[01]--
             +-01.1-[02]----00.0  Red Hat, Inc. QEMU XHCI Host Controller
             +-01.2-[03]----00.0  Red Hat, Inc. Virtio SCSI
             +-01.3-[04]----00.0  Red Hat, Inc. Virtio network device
             +-01.4-[05]----00.0  Red Hat, Inc. Virtio console
             +-01.5-[06]----00.0  Red Hat, Inc. Virtio block device
             +-01.6-[07]--
             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge

Comment 34 Eric Auger 2021-09-22 09:27:30 UTC
(In reply to Yiding Liu (Fujitsu) from comment #32)
> Sorry for the noise :(
> 
> The root cause is the incorrect slot and function of pcie-root-port (index
> 14).
> change slot and function to 0x00 to fix the issue. 
> 
>     <controller type='pci' index='12' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='100'>
>         <node>0</node>
>       </target>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
> function='0x0'/>
>     </controller>
> ...
>     <controller type='pci' index='14' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='14' port='0x12'/>
>       <address type='pci' domain='0x0000' bus='0x0c' slot='0x02'
> function='0x2'/>
>     </controller>

Is it normal that libvirt allowed you to set those values, then? I guess you entered them manually instead of letting libvirt allocate them. Shouldn't libvirt reject them somehow?

Comment 35 Meina Li 2021-09-23 03:02:06 UTC
1) The unexpected result in comment 22 was caused by an inappropriate address:
<controller type='pci' index='14' model='pcie-root-port'>
  <model name='pcie-root-port'/>
  <target chassis='14' port='0x12'/>
  <address type='pci' domain='0x0000' bus='0x0c' slot='0x02' function='0x2'/>
</controller>

This address uses a function > '0x0', but 'multifunction' is not set to 'on'. In that situation, the guest will not detect the PCI device plugged into the related controller. This is the current design, and the developers are not going to change it. That is why the memory device was not visible in the guest in comment 22.
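
The rule above can be sketched as a small check: a slot that uses any function > 0 should have multifunction='on' on its function-0 address. The following is a minimal illustration, not libvirt's actual validation logic; the XML fragments are hypothetical, modeled on the addresses discussed in this bug:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragments modeled on the addresses in this bug; element and
# attribute names follow libvirt's domain XML.
BROKEN_XML = """
<devices>
  <controller type='pci' index='14' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x0c' slot='0x02' function='0x2'/>
  </controller>
</devices>
"""

FIXED_XML = """
<devices>
  <controller type='pci' index='14' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
  </controller>
</devices>
"""

def find_suspect_slots(xml_text):
    """Group PCI <address> elements by (domain, bus, slot) and flag slots
    that use a function > 0 while no function-0 address in the same slot
    carries multifunction='on'."""
    slots = defaultdict(list)
    for addr in ET.fromstring(xml_text).iter('address'):
        if addr.get('type') != 'pci':
            continue
        key = (addr.get('domain'), addr.get('bus'), addr.get('slot'))
        slots[key].append(addr)
    suspects = []
    for key, addrs in slots.items():
        uses_high_fn = any(int(a.get('function', '0x0'), 16) > 0 for a in addrs)
        fn0_multi = any(int(a.get('function', '0x0'), 16) == 0
                        and a.get('multifunction') == 'on' for a in addrs)
        if uses_high_fn and not fn0_multi:
            suspects.append(key)
    return suspects

print(find_suspect_slots(BROKEN_XML))  # flags the slot from comment 22
print(find_suspect_slots(FIXED_XML))   # []
```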

2) When testing the pcie-expander-bus controller in libvirt, we usually assign the addresses manually rather than letting libvirt allocate them. This saves test time without affecting the quality of the test.

Comment 36 Eric Auger 2021-09-23 06:49:50 UTC
(In reply to Meina Li from comment #35)
> 1) For the unexpected result in comment 22, it was caused by inappropriate
> address:
> <controller type='pci' index='14' model='pcie-root-port'>
>   <model name='pcie-root-port'/>
>   <target chassis='14' port='0x12'/>
>   <address type='pci' domain='0x0000' bus='0x0c' slot='0x02' function='0x2'/>
> </controller>
> 
> In this address, it use function > '0x0', but 'multifunction' is not set to
> 'on'. 
I guess you meant 0x02?

So in that case the XML was incorrect, and with the fixed XML it works, so I think we can move the BZ to VERIFIED, correct?

Comment 37 Yiding Liu (Fujitsu) 2021-09-23 07:02:31 UTC
(In reply to Eric Auger from comment #36)
> (In reply to Meina Li from comment #35)
> > 1) For the unexpected result in comment 22, it was caused by inappropriate
> > address:
> > <controller type='pci' index='14' model='pcie-root-port'>
> >   <model name='pcie-root-port'/>
> >   <target chassis='14' port='0x12'/>
> >   <address type='pci' domain='0x0000' bus='0x0c' slot='0x02' function='0x2'/>
> > </controller>
> > 
> > In this address, it use function > '0x0', but 'multifunction' is not set to
> > 'on'. 
> I guess you meant 0x02?
> 
> So in that case the xml is not corrected and with fixed xml it works so I
> think we can move the BZ to VERIFIED, correct?

Yes. I have verified the BZ with the fixed XML. For the verification steps, please refer to comment 33.

Comment 41 errata-xmlrpc 2022-05-17 12:45:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: libvirt), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2390

