Bug 1436051

Summary: qemu: Remove pci-expander-bus (PXB) device for Power (Libvirt)
Product: Red Hat Enterprise Linux 7 Reporter: Dan Zheng <dzheng>
Component: libvirt Assignee: Andrea Bolognani <abologna>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.4 CC: abologna, dgibson, gsun, juzhang, knoel, marcel, qzhang, rbalakri, virt-maint
Target Milestone: rc Keywords: TestOnly
Target Release: ---   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1400785 Environment:
Last Closed: 2017-08-02 07:52:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1400785    
Bug Blocks:    

Description Dan Zheng 2017-03-27 02:15:37 UTC
This is a TestOnly bug to ensure that removing the device has no side effects in libvirt.

+++ This bug was initially created as a clone of Bug #1400785 +++

Description of problem:

On further consideration, some qemu devices aren't really useful on Power:

1) ehci-hcd

EHCI is not well tested on Power, and has hit a number of bugs.  There's nothing it can do that can't be done with an XHCI device, which is the recommended USB controller for Power guests.

2) pxb

PCI Expander Bridge.  I'm trying to confirm, but this appears to be a hack to work around x86's poor support for multiple PCI domains / host bridges.  On Power we should prefer creating multiple independent host bridges.


We should consider removing these devices from qemu for RHEL 7.4.

--- Additional comment from David Gibson on 2016-12-04 23:57:34 EST ---

Marcel,

I'm trying to understand exactly what the pxb device is for.  My impression is that it's essentially a hack to deal with the fact that x86 doesn't really handle truly independent PCI host bridges nicely. Does that sound right?

I think we don't want it on Power, but I'm trying to make sure.

[Incidentally, pci_expander_bridge.c #includes hw/i386/pc.h, but doesn't seem to need it (no compile errors if I take it away).]

--- Additional comment from Marcel Apfelbaum on 2016-12-05 06:13:20 EST ---

(In reply to David Gibson from comment #1)
> Marcel,

Hi David,

> 
> I'm trying to understand exactly what the pxb device is for.  My impression
> is that it's essentially a hack to deal with the fact that x86 doesn't
> really handle truly independent PCI host bridges nicely. Does that sound
> right?
>

You are right. That said, the real reason is to be able
to "assign" a PCI device to the correct guest NUMA node.

If the guest RAM comes from different host NUMA nodes and you have
a host-assigned device, you need to associate it somehow with
the corresponding guest NUMA node.

The pxb/pxb-pcie device exposes an extra root bus that can be associated
with a specific guest NUMA node using the ACPI tables. 
 
> I think we don't want it on Power, but I'm trying to make sure.
> 

Do you have a solution for the problem above?

> [Incidentally, pci_expander_bridge.c #includes hw/i386/pc.h, but doesn't seem
> to need it (no compile errors if I take it away).]

Thanks,
Marcel

--- Additional comment from Andrea Bolognani on 2016-12-05 08:53:58 EST ---

(In reply to Marcel Apfelbaum from comment #2)
> > I'm trying to understand exactly what the pxb device is for.  My impression
> > is that it's essentially a hack to deal with the fact that x86 doesn't
> > really handle truly independent PCI host bridges nicely. Does that sound
> > right?
> 
> You are right. That said, the real reason is to be able
> to "assign" a PCI device to the correct guest NUMA node.
> 
> If the guest RAM comes from different host NUMA nodes and you have
> a host-assigned device, you need to associate it somehow with
> the corresponding guest NUMA node.
> 
> The pxb/pxb-pcie device exposes an extra root bus that can be associated
> with a specific guest NUMA node using the ACPI tables. 
>  
> > I think we don't want it on Power, but I'm trying to make sure.
> 
> Do you have a solution for the problem above?

You should be able to achieve the same end result on ppc64
guests by creating additional PHBs (spapr-pci-host-bridge):
Bug 1280542 is about adding support for this to libvirt.

--- Additional comment from David Gibson on 2016-12-06 18:50:53 EST ---

Right.  i.e. we have a solution on the qemu side, but not yet the libvirt side.

Well, we do upstream. In RHEL 7.3, I believe you can add PHBs, but you can't set their NUMA node. That's fixed upstream, so it should be in RHEL 7.4 as well.

--- Additional comment from David Gibson on 2017-01-05 23:30:42 EST ---

To track them separately, the EHCI-related work moves to bug 1410674, and this bug is refocused purely on removing the pxb.

--- Additional comment from David Gibson on 2017-01-05 23:34:49 EST ---

The PXB is currently included unconditionally upstream, so it will need to be made conditional there, rather than only requiring a downstream config change.

--- Additional comment from David Gibson on 2017-01-06 02:10:08 EST ---

I've posted an RFC patch for this upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg00715.html

--- Additional comment from David Gibson on 2017-02-08 19:13:07 EST ---

This is now merged upstream, so we should get it in the rebase to 2.9.

Qunfang, can we get a QA ack for this? It should reduce the overall testing load, since all the PXB-related tests can be dropped.

Comment 2 Dan Zheng 2017-05-10 08:42:01 UTC
Test package:
libvirt-3.2.0-4.el7.ppc64le
qemu-kvm-rhev-2.9.0-2.el7.ppc64le

Configure the guest XML by adding the following:
<controller type='pci' index='1' model='pci-expander-bus'/>

    # virsh edit vm1
    error: unsupported configuration: pci-expander-bus controllers are only supported on 440fx-based machinetypes
    Failed. Try again? [y,n,i,f,?]: 


The error message is clear and user-friendly, so marking this bug as verified.
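
For anyone auditing existing ppc64le guest definitions before libvirt rejects them as shown above, the following is a minimal sketch of a pre-check. It is not part of libvirt or of the verification done here. The helper name find_pxb_controllers, the PXB_MODELS set, and the SAMPLE domain are hypothetical; the only thing taken from this report is the controller element with model='pci-expander-bus'. Treating 'pcie-expander-bus' as a second model worth flagging is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: these two controller models are the PXB variants worth flagging.
PXB_MODELS = {"pci-expander-bus", "pcie-expander-bus"}


def find_pxb_controllers(domain_xml: str) -> List[int]:
    """Return the index of every PXB-style PCI controller in a domain XML string."""
    root = ET.fromstring(domain_xml)
    indexes = []
    for ctrl in root.iter("controller"):
        # Only PCI controllers whose model is one of the expander buses count.
        if ctrl.get("type") == "pci" and ctrl.get("model") in PXB_MODELS:
            indexes.append(int(ctrl.get("index", "0")))
    return indexes


# Hypothetical guest definition containing the controller from this report.
SAMPLE = """
<domain type='kvm'>
  <name>vm1</name>
  <devices>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='pci' index='1' model='pci-expander-bus'/>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(find_pxb_controllers(SAMPLE))
```

Feeding it the output of virsh dumpxml for each guest would list the controller indexes that need to be removed or replaced by additional PHBs.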