Bug 1330024 - Under q35 libvirt auto-plugs virtio devices into pci-bridge
Summary: Under q35 libvirt auto-plugs virtio devices into pci-bridge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 7.4
Assignee: Laine Stump
QA Contact: lijuan men
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Duplicates: 1414734 (view as bug list)
Depends On:
Blocks: 1343094 1395265
 
Reported: 2016-04-25 10:09 UTC by Yang Yang
Modified: 2017-08-01 23:51 UTC (History)
18 users

Fixed In Version: libvirt-3.1.0-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 17:09:12 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1846 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2017-08-01 18:02:50 UTC

Description Yang Yang 2016-04-25 10:09:12 UTC
Description of problem:
Under q35, libvirt auto-plugs virtio devices into a pci-bridge. However,
on the qemu side, virtio devices should be PCIe devices, so libvirt should auto-plug virtio devices into a pcie switch. AFAIK, qemu uses disable-modern=off to enable pcie. See
Bug 1272759 - [PCIe] Make virtio devices pci-express on Q35 machines

Version-Release number of selected component (if applicable):
libvirt-1.3.3-2.el7.x86_64
qemu-kvm-rhev-2.5.0-4.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. define/start vm with following xml, virtio devices and pcie
switches are provided

 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-7.2-20151008.0.qcow2'/>
      <target dev='sda' bus='sata'/>
      <boot order='1'/>
    </disk>
  <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/q35.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
<controller type='pci' index='3' model='pcie-root-port'/>
<controller type='pci' index='4' model='pcie-switch-upstream-port'/>
<controller type='pci' index='5' model='pcie-switch-downstream-port'/>


[root@rhel7_test ~]# virsh define vm1-q35.xml
Domain vm1-q35 defined from vm1-q35.xml

[root@rhel7_test ~]# virsh start vm1-q35
Domain vm1-q35 started

2. check vm xml
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-7.2-20151008.0.qcow2'/>
      <backingStore/>
      <target dev='sda' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/q35.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/>
    </disk>
 <controller type='pci' index='3' model='pcie-root-port'>
      <model name='ioh3420'/>
      <target chassis='3' port='0x10'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='4' model='pcie-switch-upstream-port'>
      <model name='x3130-upstream'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='5' model='pcie-switch-downstream-port'>
      <model name='xio3130-downstream'/>
      <target chassis='5' port='0x0'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>

Actual results:
Libvirt auto-plugs virtio device into pci bridge

Expected results:
Libvirt should auto-plug the virtio device into a pcie switch

Additional info:

Comment 1 Daniel Berrangé 2016-04-25 10:19:00 UTC
(In reply to yangyang from comment #0)
> Description of problem:
> Under q35, libvirt auto-plugs virtio devices into pci-bridge. However,
> in qemu side, virtio devices should be PCIe devices. So libvirt should
> auto-plug virtio devices into pcie switch. AFAIK, qemu uses
> disable-modern=off to enable pcie. See
> Bug 1272759 - [PCIe] Make virtio devices pci-express on Q35 machines

Sigh, that bug gives zero useful information

  "Under Q35 virtio devices should be PCIe devices."

Marcel, can you explain the rationale for this. What is the actual benefit to having the virtio devices be PCIe instead of PCI devices? Conversely, what downsides might there be to using PCIe over PCI? In particular, if we set disable-modern=off, is that going to break any existing guests with virtio support? If so, then we cannot just switch the default behaviour in this way.

So I think there's two separate libvirt items here

 - If the mgmt app has given an <address> for a device that points to a PCIe controller, then libvirt *must* set disable-modern=off to make the virtio device be PCIe. This is a clear bug we must fix

 - If the mgmt app has *not* given any <address>, then should libvirt generate an <address> that points to a PCI or a PCIe bus? This is really a feature/behavioural change; we should carefully consider the back-compat implications before we decide to do it.
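
For illustration only, a sketch of the first case (the controller index and the address are just an example, reusing the pcie-root-port with index='3' from the reproduction XML in comment 0):

    <controller type='pci' index='3' model='pcie-root-port'/>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/q35.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <!-- user-supplied address pointing at the pcie-root-port: bus 0x03 = controller index 3 -->
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>

With an explicit PCIe address like that we are in the first item; with no <address> at all we are in the second, and the question is which bus type the auto-generated address should point to.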

Comment 2 Laine Stump 2016-04-25 17:04:42 UTC
(I'll leave it to Marcel to explain why it's useful to have the virtio devices (or any other emulated device) connected via PCIe rather than PCI - it's a question I'd love to see a clear answer to as well :-)

I don't think it's as clear-cut as "must". My current understanding is that any PCI device can be plugged into a PCIe slot in qemu, but if it isn't a PCIe device then it will be missing the pcie capabilities info that the guest would normally see; this could confuse the guest OS, so we don't do it unless someone explicitly tells us to (by manually assigning the device to a PCIe slot).

I just learned about the disable_modern switch in an email thread on libvir-list:

https://www.redhat.com/archives/libvir-list/2016-April/msg01657.html

Based on that information, I *think* that what libvirt should do is this:

1) Check for presence of the disable_modern flag for each virtio-*-pci device and set a capability bit for each.

2) when auto-assigning addresses, if the device in question has disable_modern available and the domain has a pcie-root (i.e. Q35, arm/virt), prefer to assign it to a pcie-root-port or pcie-switch-downstream-port.

3) when building the commandline, if the device has the disable_modern cap bit, and it is being connected to a PCIe slot (either a pcie-*-port, or directly to pcie-root) add disable_modern=false to the commandline.

(I'm actually curious why such a generic name was chosen for the option. Seems like it could lead to confusion, both for people using it as well as for qemu people considering enabling/disabling some device functionality via this option).

Comment 3 Marcel Apfelbaum 2016-04-26 17:16:10 UTC
(In reply to Daniel Berrange from comment #1)
> (In reply to yangyang from comment #0)
> > Description of problem:
> > Under q35, libvirt auto-plugs virtio devices into pci-bridge. However,
> > in qemu side, virtio devices should be PCIe devices. So libvirt should
> > auto-plug virtio devices into pcie switch. AFAIK, qemu uses
> > disable-modern=off to enable pcie. See
> > Bug 1272759 - [PCIe] Make virtio devices pci-express on Q35 machines
> 
> Sigh, that bug gives zero useful information
> 
>   "Under Q35 virtio devices should be PCIe devices."
> 
> Marcel, can you explain the rational for this. What is the actual benefit to
> having the virtio devices be PCIe instead of PCI devices ?

Hi,
At first I thought it was a virtio-1 spec requirement, but it appears virtio-1
devices may be both PCI and PCIe:
    http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html

So, to answer your question, the immediate benefits should be PCIe (native)
hotplug and the possibility to have a system without legacy (PCI) bridges.

 Conversely, what
> downsides might there be to using PCIe over PCI.

There is a limitation of around 15 root-ports if the devices attached to them
need IO space. If the devices attached to them do not need IO space,
there is no downside. The PCIe spec requires the PCIe devices to function
properly without IO. (PC machines have the same max 15 pci-bridges limitation)

 In particular if we set
> disable-modern=off, is that going to break any existing guests with virtio
> support. If so, then we cannot just switch the default behaviour in this way.

If we set 'disable-modern=off' we add functionality to the device allowing
it to function both as legacy virtio and modern virtio, so a guest driver
supporting only legacy virtio should work as before. My main assumption
here is that the DEVICE ID doesn't change; you should check the virtio-1
spec or talk to the virtio maintainers. However, I think the ID doesn't change.

The problem starts when we set 'disable-legacy=off'. This mode disables
the IO BAR and I am not sure if the legacy virtio drivers will continue
to work. Again, please check with the virtio maintainers on this.

Thanks,
Marcel

> 
> So I think there's two separate libvirt items here
> 
>  - If the mgmt app has given an  <address> for a device that points to a
> PCIe controller, then libvirt *must* set disable-modern=off to make the
> virtiodevice be PCIe. This is a clear bug we must fix
> 
>  - If the mgmt app has *not* given any <address>, then should libvirt
> generate an <address> that points to a PCI or PCIe bus. This is really a
> feature/behavioural change, we carefully consider back compat implications
> of before we decide to do.

Comment 4 Laine Stump 2016-04-26 17:46:56 UTC
(In reply to Marcel Apfelbaum from comment #3)
> So, to answer your question, the immediate benefits should be PCIe (native)
> hotplug and the possibility to have a system without legacy (PCI) bridges.

...although there is still other work to be done to allow this. For example, I'm guessing the memory balloon device doesn't do PCIe, nor any of the emulated video devices, or the USB2 controllers (although the USB3 controller does - it autodetects the type of slot it's plugged into and does the right thing accordingly). I suppose the video and USB aren't a problem as long as they're plugged into pcie.0 (since that is explicitly allowed by the PCIe spec, if I understand correctly).

(/me still wondering why the name of this option is the very unhelpful "disable_modern" instead of something like simply "pcie"...)

(In reply to Daniel Berrange from comment #1)
>  - If the mgmt app has *not* given any <address>, then should libvirt
> generate an <address> that points to a PCI or PCIe bus. This is really a
> feature/behavioural change, we carefully consider back compat implications
> of before we decide to do.

Fortunately our backward compatibility requirements are limited here, since there is currently (still!) no combination of a released libvirt + released qemu that can successfully migrate a Q35 domain. In the past the problem was that the SATA controller in qemu didn't migrate properly, and it is an integrated device in  Q35, non-removable. The latest problem was that libvirt *always* adds a dmi-to-pci-bridge to every domain, and until just a couple weeks ago (if that) qemu's 82801b11-bridge device (used by libvirt for dmi-to-pci-bridge) didn't migrate properly.

So all we need to worry about is using config created by new libvirt to parse/start a domain on an old libvirt, and in this case the worst that will happen is that a non-Express device (due to no "disable_modern") would be plugged into an express slot, which so far has gotten the rating "not smart to do, but it has seemed to work so far".

I actually think that if we *ever* plan to prefer PCIe over PCI, we should do as much switching to PCIe slots as possible as soon as we can - up until now Q35 domains haven't been very useful outside of niche cases (particularly since they can't be migrated) so the impact of a change now will be relatively low. If we wait for a couple years to do it, it may be more problematic.

Comment 5 Michael S. Tsirkin 2016-04-27 10:23:44 UTC
Several points:
1. pcie is required since modern chipsets no longer support
   hotplug in legacy pci slots. We can try enabling it
   using hacks but we'd rather not.
2. playing with disable-modern should not be required.
   virtio device placed on a pcie bus set this flag
   automatically.
3. ability to disable legacy interfaces is useful but in my opinion
   not tied to pci express.

Comment 6 Laine Stump 2016-04-29 20:39:32 UTC
(In reply to Michael S. Tsirkin from comment #5)
> Several points:
> 1. pcie is required since modern chipsets no longer support
>    hotplug in legacy pci slots. We can try enabling it
>    using hacks but we'd rather not.

Every day a new surprise :-) (in the multiple years since I started putting pcie stuff into libvirt, this is the first time I've heard that hotplug in standard pci slots is officially not supported).

So does this mean that we can't hotplug any devices that don't publish PCIe capabilities? (we've already said that we don't want to plug non-express devices into express ports, and now you've said that hotplug into non-express isn't supported, so...)

> 2. playing with disable-modern should not be required.
>    virtio device placed on a pcie bus set this flag
>    automatically.

That is good on two counts: 1) we don't need to specify the option on the commandline, and 2) we can use introspection to learn which devices support it, and thus make intelligent decisions.


> 3. ability to disable legacy interfaces is useful but in my opinion
>    not tied to pci express.

I still don't understand the usefulness of this (and whether or not libvirt needs to support setting it / detecting it / making use of its presence in some way)

Comment 7 Michael S. Tsirkin 2016-05-03 10:35:45 UTC
> > 3. ability to disable legacy interfaces is useful but in my opinion
> >    not tied to pci express.

> I still don't understand the usefulness of this (and whether or not libvirt
> needs to support setting it / detecting it / making use of its presence in 
> some way)

why is disabling the legacy interface useful? for example, because legacy
interfaces consume valuable io resources.

Comment 8 Marcel Apfelbaum 2016-05-03 17:32:16 UTC
After a discussion with Michael we arrived at the following conclusions:

1. For legacy PCI devices (by the way, what are the current PCI devices used by libvirt?) we will try to enable hotplug for the dmi-to-pci bridge. All legacy devices should be attached to this bridge. It is possible that the operating systems will not be able to enable both ACPI (legacy) hot-plug and PCIe (native) hotplug, but it is worth the try. (In theory each slot may be controlled by another type of hot-plug controller)

2. Regarding the PCI virtio devices, starting with QEMU 2.7 we will set disable-modern=off by default and the devices would automatically be PCIe if attached to a Root Port or Switch downstream port. So virtio devices will use PCIe hotplug mechanism. This is the recommended way to use virtio devices with Q35. (A side problem will be enabling "modern" for virtio block devices, but we will work on that)

Thanks,
Marcel

Comment 9 Laine Stump 2016-05-03 18:41:59 UTC
Are there released (or to-be-released) versions of qemu where there is any device that has the ability to be a proper pcie device when plugged into a pcie slot, but the device doesn't do that by default?

After hearing the description of disable-modern and the auto-add of pcie capability info when being attached to a pcie slot, I was hoping that libvirt could just do this (without ever needing to add "disable_modern=no"):

1) if a device is on a white list (that would currently contain nec-xhci) or supports the option disable-modern, then plug it into a pcie slot

2) otherwise, plug it into a PCI slot.

If there is even a single qemu release where "add pcie capabilities when plugged into a pcie slot" isn't the default behavior (what happens with 2.6?), then we will have to always add disable_modern=no (or just disavow support for the qemu versions in question)

---

Topic 2: Since pci-bridge already doesn't support hotplug, I guess libvirt can immediately switch to plugging devices into dmi-to-pci-bridge instead of pci-bridge even before it supports hotplug, with no loss of functionality. 

What about when there are > 32 non-express devices? I made the
following setup and it *worked*:

  virtio-net
    -->dmi-to-pci-bridge
    -->downstream-port
    -->upstream-port
    -->root-port
    -->pcie.0

but is that okay according to the PCI spec? In order to avoid the
non-hotpluggability of pci-bridge, this is what we'll have to do when
auto-adding bridges on Q35.

Comment 10 Marcel Apfelbaum 2016-05-06 06:01:47 UTC
(In reply to Laine Stump from comment #9)
> Are there released (or to-be-released) versions of qemu where there is any
> device that has the ability to be a proper pcie device when plugged into a
> pcie slot, but the device doesn't do that by default?

Yes, there are. All virtio devices have 'disable-modern'=on until QEMU 2.6,
meaning that they are not PCIe devices by default. I am not sure when the disable-modern flag was introduced, but I think it was QEMU 2.5.

Starting with QEMU 2.7 virtio devices will have 'disable-modern'=off, so they
will be PCIe devices by default.

> 
> After hearing the description of disable-modern and the auto-add of pcie
> capability info when being attached to a pcie slot, I was hoping that
> libvirt could just do this (without ever needing to add "disable_modern=no"):
> 
> 1) if a device is on a white list (that would currently contain nec-xhci) or
> supports the option disable-modern, then plug it into a pcie slot

From QEMU 2.7 on it is OK. Also, the disable-modern flag is specific only to virtio
devices.

Another PCIe native device (for the white list) is the e1000e emulated device;
the patches are currently on the mailing list.

> 
> 2) otherwise, plug it into a PCI slot.
> 
> If there is even a single qemu release where "add pcie capabilities when
> plugged into a pcie slot" isn't the default behavior (what happens with
> 2.6?), then we will have to always add disable_modern=no (or just disavow
> support for the qemu versions in question)
> 

Michael said there was a bug with virtio block devices' support for disable-modern, so sadly we cannot set disable-modern=off for QEMU versions before QEMU 2.7 (at least until I figure that out).

Even if I don't like this, it seems that prior to 2.7 we should attach the virtio
devices to PCI slots.

> ---
> 
> Topic 2: Since pci-bridge already doesn't support hotplug, I guess libvirt
> can immediately switch to plugging devices into dmi-to-pci-bridge instead of
> pci-bridge even before it supports hotplug, with no loss of functionality. 
> 

Agreed. As discussed I will work on hot-plug support for the dmi-to-pci-bridge.

> What about when there are > 32 non-express devices?

Plug a pci-bridge in one of the slots of the dmi-to-pci-bridge

 I made the
> following setup and it *worked*:
> 
>   virtio-net
>     -->dmi-to-pci-bridge
>     -->downstream-port
>     -->upstream-port
>     -->root-port
>     -->pcie.0
> 
> but is that okay according to the PCI spec? 


I didn't quite understand that, but these are the supported configurations:
  -->pcie.0
           --> virtio-net  (Integrated device)

 -->dmi-to-pci-bridge
           --> virtio-net

 -->dmi-to-pci-bridge
           --> pci-bridge  (not enough slots on the dmi-to-pci bridge)
                  -->virtio-net

 -->root-port   ('standard' PCIe slot)
          --> virtio-net
 
 -->root-port
       -->upstream-port (can be plugged only into a root port
                          or a downstream-port of another switch)
             -->downstream-port (can be plugged only into an upstream port)
                     -->virtio-net

             --> we can have several downstream ports here
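
In libvirt XML terms the last chain could be written roughly as the following sketch (controller indexes and the bus number are only illustrative, reusing the layout from comment 0):

  <controller type='pci' index='3' model='pcie-root-port'/>
  <controller type='pci' index='4' model='pcie-switch-upstream-port'/>
  <controller type='pci' index='5' model='pcie-switch-downstream-port'/>
  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <!-- bus 0x05 = the pcie-switch-downstream-port (controller index 5) -->
    <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
  </interface>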


Thanks,
Marcel

In order to avoid the
> non-hotpluggability of pci-bridge, this is what we'll have to do when
> auto-adding bridges on Q35.

Comment 11 Laine Stump 2016-05-06 18:31:18 UTC
(In reply to Marcel Apfelbaum from comment #10)
> (In reply to Laine Stump from comment #9)
> > Are there released (or to-be-released) versions of qemu where there is any
> > device that has the ability to be a proper pcie device when plugged into a
> > pcie slot, but the device doesn't do that by default?
> 
> Yes, they are. All virtio devices have 'disable-modern'=on until QEMU 2.6,
> meaning that they are not PCIe devices by default. I am not sure when
> disable-modern flag was introduced, but I think QEMU 2.5.
> 
> Starting QEMU 2.7 virtio devices will have 'disable-modern'=off, so they
> will be PCIe devices by default.
> 
> > 
> > After hearing the description of disable-modern and the auto-add of pcie
> > capability info when being attached to a pcie slot, I was hoping that
> > libvirt could just do this (without ever needing to add "disable_modern=no"):
> > 
> > 1) if a device is on a white list (that would currently contain nec-xhci) or
> > supports the option disable-modern, then plug it into a pcie slot
> 
> From QEMU 2.7 is OK. Also disable-modern flag is specific only to virtio
> devices.

For any device that can be PCI or PCIe, I want/need a way to determine if this particular version of qemu can support it as PCIe, and a reliable option to set (if any is defined) to make sure that happens. So devices other than virtio-* will either need to have this option, or have a different option/property introduced simultaneously with the ability to behave as a PCIe device. Alternately all versions of that device ever in a released qemu must have always had the ability to be PCIe (in which case they'll go on a whitelist in libvirt). It would be greatly helpful if the option was named the same for all devices, of course.


> 
> Another PCIe native device (for the white list) is the e1000e emulated
> device,
> the patches are currently in the mailing list.

So what you're really saying is "after qemu 2.7, the e1000e device will be able to behave as a PCIe device". Although there are still instances of doing things based purely on qemu version, we try to avoid doing that as much as possible. Is there a possibility of adding the same options for e1000e (or starting a convention for a new option that will be the same everywhere)?

> 
> > 
> > 2) otherwise, plug it into a PCI slot.
> > 
> > If there is even a single qemu release where "add pcie capabilities when
> > plugged into a pcie slot" isn't the default behavior (what happens with
> > 2.6?), then we will have to always add disable_modern=no (or just disavow
> > support for the qemu versions in question)
> > 
> 
> Michael said there was a bug with virtio block devices support for
> disable-modern, so sadly we cannot set disable-modern=off for QEMU versions
> before QEMU 2.7. (at least until I will not figure that out)

Ugh. So we can't just look for the presence of the option even for virtio devices, but have to add caveats based on qemu version
 :-(

> 
> Even if I don't like this, it seems that prior 2.7 we should attach the
> virtio 
> devices to PCI slots.
> 
> > ---
> > 
> > Topic 2: Since pci-bridge already doesn't support hotplug, I guess libvirt
> > can immediately switch to plugging devices into dmi-to-pci-bridge instead of
> > pci-bridge even before it supports hotplug, with no loss of functionality. 
> > 
> 
> Agreed. As discussed I will work on hot-plug support for the
> dmi-to-pci-bridge.
> 
> > What about when there are > 32 non-express devices?
> 
> Plug a pci-bridge in one of the slots of the dmi-to-pci-bridge

But pci-bridge doesn't support hotplug! We can at least add a certain number of dmi-to-pci-bridge devices to pcie.0 to satisfy some additional devices; I'm wondering what can be done if there are no more free slots on pcie.0 - whether a dmi-to-pci-bridge can be plugged into a downstream-port for example. As I said, it worked in practice. But then plugging PCI devices into PCIe slots also works in practice, but we've been told not to do that.


> 
>  I made the
> > following setup and it *worked*:
> > 
> >   virtio-net
> >     -->dmi-to-pci-bridge
> >     -->downstream-port
> >     -->upstream-port
> >     -->root-port
> >     -->pcie.0
> > 
> > but is that okay according to the PCI spec? 
> 
> 
> I didn't quite understand that, 


That is a single chain of devices - virtio-net plugged into dmi-to-pci-bridge plugged into downstream-port plugged into upstream-port plugged into root-port plugged into pcie.0. I was trying to see if dmi-to-pci-bridge could be plugged into something other than pcie.0.

Comment 12 Marcel Apfelbaum 2016-05-08 18:05:23 UTC
(In reply to Laine Stump from comment #11)
> (In reply to Marcel Apfelbaum from comment #10)
> > (In reply to Laine Stump from comment #9)
> > > Are there released (or to-be-released) versions of qemu where there is any
> > > device that has the ability to be a proper pcie device when plugged into a
> > > pcie slot, but the device doesn't do that by default?
> > 
> > Yes, they are. All virtio devices have 'disable-modern'=on until QEMU 2.6,
> > meaning that they are not PCIe devices by default. I am not sure when
> > disable-modern flag was introduced, but I think QEMU 2.5.
> > 
> > Starting QEMU 2.7 virtio devices will have 'disable-modern'=off, so they
> > will be PCIe devices by default.
> > 
> > > 
> > > After hearing the description of disable-modern and the auto-add of pcie
> > > capability info when being attached to a pcie slot, I was hoping that
> > > libvirt could just do this (without ever needing to add "disable_modern=no"):
> > > 
> > > 1) if a device is on a white list (that would currently contain nec-xhci) or
> > > supports the option disable-modern, then plug it into a pcie slot
> > 
> > From QEMU 2.7 is OK. Also disable-modern flag is specific only to virtio
> > devices.

Hi Laine,

> 
> For any device that can be PCI or PCIe, I want/need a way to determine if
> this particular version of qemu can support it as PCIe, and a reliable
> option to set (if any is defined) to make sure that happens. So devices
> other than virtio-* will either need to have this option, or have a
> different option/property introduced simultaneously with the ability to
> behave as a PCIe device. Alternately all versions of that device ever in a
> released qemu must have always had the ability to be PCIe (in which case
> they'll go on a whitelist in libvirt). It would be greatly helpful if the
> option was named the same for all devices, of course.
> 

Devices other than virtio-* are, as far as I know, either always PCIe or never PCIe, so a
white list should be enough.
Exceptions are:
  - pvscsi and vmxnet3 were PCI before 2.6 and now are PCIe if plugged
    into a root port (or downstream port).
  - I am not sure what's up with megasas_scsi
  - vfio devices (assigned devices) will be express if the phys device
    is express (has PCIe capability) and they are plugged into a root
    port (or downstream port). More exactly, if there is no PCI bus from
    the device's bus up to the root bus.

Regarding virtio-* devices, would a QEMU version (say 2.7) and setting
disable-modern=off be enough?

QEMU < 2.7  ? => plug all virtio-* in PCI slots
QEMU >= 2.7 ? => set  disable-modern=off (not actually needed since it will
be the default value, and Michael prefers libvirt to not touch the modern flag,
but you can maybe convince him you want that to be 'extra' sure)
              => plug the virtio devices into a root port or a downstream port
                 (do not plug them in pcie.0 bus) => PCIe
> 
> > 
> > Another PCIe native device (for the white list) is the e1000e emulated
> > device,
> > the patches are currently in the mailing list.
> 
> So what you're really saying is "after qemu 2.7, the e1000e device will be
> able to behave as a PCIe device". Although there are still instances of
> doing things based purely on qemu version, we try to avoid doing that as
> much as possible. Is there a possibility of adding the same options for
> e1000e (or starting a convention for a new option that will be the same
> everywhere)?

Maybe I didn't explain myself clearly: from what I know, e1000e is a *new*
device and goes directly into the white-list. The legacy device e1000
remains unchanged - PCI.

I suppose adding a 'is-pcie' QMP property would be possible,
but you would need to run QEMU in order to query this property
and also the same device on different slots may return a different
value. Is this good enough?

> 
> > 
> > > 
> > > 2) otherwise, plug it into a PCI slot.
> > > 
> > > If there is even a single qemu release where "add pcie capabilities when
> > > plugged into a pcie slot" isn't the default behavior (what happens with
> > > 2.6?), then we will have to always add disable_modern=no (or just disavow
> > > support for the qemu versions in question)
> > > 
> > 
> > Michael said there was a bug with virtio block devices support for
> > disable-modern, so sadly we cannot set disable-modern=off for QEMU versions
> > before QEMU 2.7. (at least until I will not figure that out)
> 
> Ugh. So we can't just look for the presence of the option even for virtio
> devices, but have to add caveats based on qemu version
>  :-(
> 

I understand your concerns. Please let me re-check the *possible* virtio-blk
issue; maybe it is not there and you can check for the presence of the disable-modern flag.
If you accept the 'is-pcie' QMP property, I can implement all the logic
in QEMU and return the result.

> > 
> > Even if I don't like this, it seems that prior 2.7 we should attach the
> > virtio 
> > devices to PCI slots.
> > 
> > > ---
> > > 
> > > Topic 2: Since pci-bridge already doesn't support hotplug, I guess libvirt
> > > can immediately switch to plugging devices into dmi-to-pci-bridge instead of
> > > pci-bridge even before it supports hotplug, with no loss of functionality. 
> > > 
> > 
> > Agreed. As discussed I will work on hot-plug support for the
> > dmi-to-pci-bridge.
> > 
> > > What about when there are > 32 non-express devices?
> > 
> > Plug a pci-bridge in one of the slots of the dmi-to-pci-bridge
> 
> But pci-bridge doesn't support hotplug! We can at least add a certain number
> of dmi-to-pci-bridge devices to pcie.0 to satisfy some additional devices;

But the dmi-to-pci bridge does not support hotplug yet either. If I succeed
in implementing hotplug for it, I will do the same for the pci-bridge.

> I'm wondering what can be done if there are no more free slots on pcie.0 -
> whether a dmi-to-pci-bridge can be plugged into a downstream-port for
> example.

Well, DMI (Direct Media Interface) is a proprietary Intel bus used to
connect the North-bridge to South-bridge in order to attach the legacy
PCI devices. So in theory you should use a pcie-to-pci bridge for
downstream ports, but we don't have one in QEMU.
Maybe we should have, but I think
our plan was to use dmi-to-pci bridge and then add a pci-bridge into
one of dmi-to-pci bridge and so on.
  

> AS I said, it worked in practice. But then plugging PCI devices
> into PCIe slots also works in practice, but we've been told not to do that.

I agree, some OS-es may be "surprised" and we shouldn't do that.

> 
> 
> > 
> >  I made the
> > > following setup and it *worked*:
> > > 
> > >   virtio-net
> > >     -->dmi-to-pci-bridge
> > >     -->downstream-port
> > >     -->upstream-port
> > >     -->root-port
> > >     -->pcie.0
> > > 
> > > but is that okay according to the PCI spec? 
> > 
> > 
> > I didn't quite understand that, 
> 
> 
> That is a single chain of devices - virtio-net plugged into
> dmi-to-pci-bridge plugged into downstream-port 

As explained before, 'dmi-*' should be, at least in theory,
connected to root complex only.

plugged into upstream-port
> plugged into root-port plugged into pcie.0. I was trying to see if
> dmi-to-pci-bridge could be plugged into something other than pcie.0.

As above, it is preferable to avoid that.

Thanks,
Marcel

Comment 13 Laine Stump 2016-05-09 15:37:45 UTC
(In reply to Marcel Apfelbaum from comment #12)
> 
> Devices other than virtio-* as far as I know are always PCIe (or not), so a
> white list should be enough.
> Exceptions are:
>   - pvscsi and vmxnet3 were PCI before 2.6 and now are PCIe if plugged
>     into a root port (or downstream port).

My understanding is that this is how the nec-xhci device works - it is PCI if plugged into a PCI slot, and PCIe if plugged into a PCIe slot (and unless I'm told otherwise, I assume it has always been this way).

>   - I am not sure what's up with megasas_scsi
>   - vfio devices (assigned devices) will be express if the phys device
>     is express (has PCIe capability) and they are plugged into a root
>     port (or downstream port). More exactly, if there is no PCI bus from
>     the device's bus up to the root bus.
> 
> Regarding virtio-* devices, would a QEMU version (say 2.7) and setting
> disable-modern=off be enough?
> 
> QEMU < 2.7  ? => plug all virtio-* in PCI slots
> QEMU >= 2.7 ? => set  disable-modern=off

As I said before, libvirt greatly prefers to not base any decision on the qemu version. The reason for this is that bugfixes and even new features are quite often backported to older releases for downstream *qemu* builds of various distros, and libvirt has no way of knowing if it is dealing with an "upstream build of stock qemu 2.6" or a "downstream build of qemu 2.6 + tons of bugfixes and new features" - even patching the downstream libvirt to account for the bugfixes/features present in the downstream qemu is error prone (in the case that someone installs an upstream qemu build alongside downstream libvirt, or is running an older qemu build that has the same upstream version, but is missing some of the bugfixes in the newer downstream version).

So please, if at all possible, provide us with a way to determine whether or not a device is/can be PCIe by querying qemu, rather than forcing us to rely on an error-prone qemu version check.

> (not actually needed since it will
> be the default value, and Michael prefers libvirt to not touch the modern
> flag, but you can maybe convince him you want that to be 'extra' sure)

I would also prefer to never specify it on the commandline, but its presence/absence (at first!) seemed like an opportune thing we could look at to see if the device could be PCIe.

Ideally, we should be able to query whether or not a device can be PCIe, then place it on either a PCI or PCIe slot with no changes to the commandline aside from the different bus specification (I don't want to cloud up every single function that creates a -device arg with knowledge about which PCI bus is what type - that would get very ugly very fast).

>               => plug the virtio devices into a root port or a downstream
> port
>                  (do not plug them in pcie.0 bus) => PCIe

Wait, so we should never plug PCIe devices into pcie.0 directly? Even if we don't need to hotplug them? Instead of clearer, this is getting murkier all the time :-/

> Maybe I didn't explain myself clear, from what I know e1000e is a *new*
> device and enters directly into the white-list. The legacy device e1000
> remains unchanged - PCI.

Is e1000e PCIe-only? (i.e. it should be connected *only* to a pcie-(root|downstream)-port?) Or does it behave like virtio-*, nec-xhci, etc - changes PCI vs. PCIe depending on the type of slot it is plugged into?

> 
> I suppose adding a 'is-pcie' QMP property would be possible,
> but you would need to run QEMU in order to query this property
> and also the same device on different slots may return a different
> value. Is this good enough?

A property is only useful if it can be queried without actually instantiating a virtual machine and plugging in the device. Commandline options are like that - libvirt can easily check for the presence of a commandline option. If the type of properties you're talking about can be queried without instantiating the machine, then that's fine (and if it changes based on whether it's plugged into PCI or PCIe, then either the devices with your proposed "is-pcie" property must *all* behave that way, or you need to make that queryable as well).

> 
> I understand your concerns. Please let me re-check the *possible* virtio-blk
> issue, maybe is not there and you can check the presence of disable-modern
> flag.
> If you accept the 'is-pcie' QMP property, I can implement all the logic
> in QEMU and return the result.

Again, that's only useful if we can query it without starting up the virtual machine and attaching a device to a port. And we have to be careful that it means *exactly * the same thing in every case or it's pointless (well, cumbersome at best).

> 
> > I'm wondering what can be done if there are no more free slots on pcie.0 -
> > whether a dmi-to-pci-bridge can be plugged into a downstream-port for
> > example.
> 
> Well, DMI (Direct Media Interface) is a proprietary Intel bus used to
> connect the North-bridge to South-bridge in order to attach the legacy
> PCI devices. So in theory you should use a pcie-to-pci bridge for
> downstream ports, but we don't have one in QEMU.
> Maybe we should have, but I think
> our plan was to use dmi-to-pci bridge and then add a pci-bridge into
> one of dmi-to-pci bridge and so on.

So you didn't intend to make dmi-to-pci-bridge a replacement for *all* pci-bridge use on Q35, but just to decrease the total bridge count by 1.

Comment 14 Marcel Apfelbaum 2016-05-10 13:46:44 UTC
(In reply to Laine Stump from comment #13)
> (In reply to Marcel Apfelbaum from comment #12)
> > 
> > Devices other than virtio-* as far as I know are always PCIe (or not), so a
> > white list should be enough.
> > Exceptions are:
> >   - pvscsi and vmxnet3 were PCI before 2.6 and now are PCIe if plugged
> >     into a root port (or downstream port).
> 
> My understanding is that this is how the nec-xhci device works - it is PCI
> if plugged into a PCI slot, and PCIe if plugged into a PCIe slot

I checked and this is correct

 (and unless
> I'm told otherwise, I assume it has always been this way).
> 

No idea if it was always so.


> >   - I am not sure what's up with megasas_scsi
> >   - vfio devices (assigned devices) will be express if the phys device
> >     is express (has PCIe capability) and they are plugged into a root
> >     port (or downstream port). More exactly, if there is no PCI bus from
> >     the device's bus up to the root bus.
> > 
> > Regarding virtio-* devices, would a QEMU version (say 2.7) and setting
> > disable-modern=off be enough?
> > 
> > QEMU < 2.7  ? => plug all virtio-* in PCI slots
> > QEMU >= 2.7 ? => set  disable-modern=off
> 
> As I said before, libvirt greatly prefers to not base any decision on the
> qemu version. The reason for this is that bugfixes and even new features are
> quite often backported to older releases for downstream *qemu* builds of
> various distros, and libvirt has no way of knowing if it is dealing with an
> "upstream build of stock qemu 2.6" or a "downstream build of qemu 2.6 + tons
> of bugfixes and new features" - even patching the downstream libvirt to
> account for the bugfixes/features present in the downstream qemu is error
> prone (in the case that someone installs an upstream qemu build alongside
> downstream libvirt, or is running an older qemu build that has the same
> upstream version, but is missing some of the bugfixes in the newer
> downstream version).
> 
> So please, if at all possible, provide us with a way to determine whether or
> not a device is/can be PCIe by querying qemu, rather than forcing us to rely
> on an error-prone qemu version check.
> 

OK. I will discuss with Markus the possibility of adding a 'pcie' flag
with OnOffAuto values, which defaults to:
  - off - default value for legacy PCI devices
  - auto - on, if plugged into a pcie port, for
            pvscsi, vmxnet3, virtio-*, xhci, more?
  - on - default value for PCIe native devices

Does that sound acceptable, assuming Markus accepts the proposed solution?

> > (not actually needed since it will
> > be the default value, and Michael prefers libvirt to not touch the modern
> > flag, but you can maybe convince him you want that to be 'extra' sure)
> 
> I would also prefer to never specify it on the commandline, but its
> presence/absence (at first!) seemed like an opportune thing we could look at
> to see if the device could be PCIe.
> 
> Ideally, we should be able to query whether or not a device can be PCIe,
> then place it on either a PCI or PCIe slot with no changes to the
> commandline aside from the different bus specification (I don't want to
> cloud up the every single function that creates a -device arg with knowledge
> about which PCI bus is what type - that would get very ugly very fast).

The above solution should be fine, right?

> 
> >               => plug the virtio devices into a root port or a downstream
> > port
> >                  (do not plug them in pcie.0 bus) => PCIe
> 
> Wait, so we should never plug PCIe devices into pcie.0 directly? Even if we
> don't need to hotplug them? Instead of clearer, this is getting murkier all
> the time :-/
> 

We were talking only about virtio-* devices, but I am going to remove
this restriction.
Getting back to the proposed solution, pcie=auto will make the device PCIe
if plugged into a PCIe bus, not port. Since pcie.0 is PCIe it will be OK
to attach PCIe devices to the Root Complex.

> > Maybe I didn't explain myself clear, from what I know e1000e is a *new*
> > device and enters directly into the white-list. The legacy device e1000
> > remains unchanged - PCI.
> 
> Is e1000e PCIe-only? (i.e. it should be connected *only* to a
> pcie-(root|downstream)-port?)

From what I know it is PCIe only; it does not depend on the port it is connected to.

 Or does it behave like virtio-*, nec-xhci, etc
> - changes PCI vs. PCIe depending on the type of slot it is plugged into?
> 
> > 
> > I suppose adding a 'is-pcie' QMP property would be possible,
> > but you would need to run QEMU in order to query this property
> > and also the same device on different slots may return a different
> > value. Is this good enough?
> 
> A property is only useful if it can be queried without actually
> instantiating a virtual machine and plugging in the device. Commandline
> options are like that - libvirt can easily check for the presence of a
> commandline option. If the type of properties you're talking about can be
> queried without instantiating the machine, then that's fine (and if it
> changes based on whether it's plugged into PCI or PCIe, then either the
> devices with your proposed "is-pcie" property must *all* behave that way, or
> you need to make that queryable as well).
> 

Got it

> > 
> > I understand your concerns. Please let me re-check the *possible* virtio-blk
> > issue, maybe is not there and you can check the presence of disable-modern
> > flag.
> > If you accept the 'is-pcie' QMP property, I can implement all the logic
> > in QEMU and return the result.
> 
> Again, that's only useful if we can query it without starting up the virtual
> machine and attaching a device to a port. And we have to be careful that it
> means *exactly * the same thing in every case or it's pointless (well,
> cumbersome at best).
> 
> > 
> > > I'm wondering what can be done if there are no more free slots on pcie.0 -
> > > whether a dmi-to-pci-bridge can be plugged into a downstream-port for
> > > example.
> > 
> > Well, DMI (Direct Media Interface) is a proprietary Intel bus used to
> > connect the North-bridge to South-bridge in order to attach the legacy
> > PCI devices. So in theory you should use a pcie-to-pci bridge for
> > downstream ports, but we don't have one in QEMU.
> > Maybe we should have, but I think
> > our plan was to use dmi-to-pci bridge and then add a pci-bridge into
> > one of dmi-to-pci bridge and so on.
> 
> So you didn't intend to make dmi-to-pci-bridge a replacement for *all*
> pci-bridge use on Q35, but just to decrease the total bridge count by 1.

Right, the way I see it we need the dmi-to-pci bridge as the *first* in the
chain to comply with the real world architecture.
You attach the dmi-to-pci to the Root Complex to start a legacy PCI hierarchy.
If you need another level, you add a PCI bridge.

Thanks,
Marcel

Comment 15 Marcel Apfelbaum 2016-05-10 13:51:08 UTC
Hi Markus,

I would appreciate it if you could advise us on how to correctly expose
to libvirt the ability of a device to connect to a PCIe bus (PCIe port).

Please see comment #14 for my current proposal.

Thanks,
Marcel

Comment 16 Markus Armbruster 2016-05-30 14:37:58 UTC
I only skimmed most of this bug; apologies if I'm missing details, or
get confused.

General problem: libvirt needs to know into what kind of socket a
certain device can be plugged.

Special problem: libvirt needs to know into what kind of socket (PCI,
PCIe, either) a certain PCI device can be plugged.

These are introspection problems.  Problems that start with "libvirt
needs to know" generally are.

Aside: to find PCI devices, use QMP command

    { "execute": "qom-list-types",
      "arguments": { "implements": "pci-device", "abstract": false } }

Proposed solution (direct quote):

    Add a 'pcie' flag with OnOffAuto values [...]
    - off - default value legacy PCI devices
    - auto - on, if plugged into a pcie port for pvscsi, vmxnet3,
      virtio-*, xhci, more?
    - on - default value for PCie native devices

I read this multiple times, but still can't make heads or tails of it.
To what entity should this flag be added?  How can it be queried in
QMP?  Please give a usage example.

Comment 17 Marcel Apfelbaum 2016-06-01 11:52:02 UTC
(In reply to Markus Armbruster from comment #16)
> I only skimmed most of this bug; apologies if I'm missing details, or
> get confused.
> 

Hi Markus,

Thank you for your involvement.

> General problem: libvirt needs to know into what kind of socket a
> certain device can be plugged.
> 
> Special problem: libvirt needs to know into what kind of socket (PCI,
> PCIe, either) a certain PCI device can be plugged.
> 

And there are devices that can be PCI/PCIe depending on the
slot they are being plugged into, while other devices are
PCI only/PCIe only.

> These are introspection problems.  Problem that start with "libvirt
> needs to know" generally are.
> 

I understood from Laine that libvirt needs the above information before
running QEMU - just from the command line help.

> Aside: to find PCI devices, use QMP command
> 
>     { "execute": "qom-list-types",
>       "arguments": { "implements": "pci-device", "abstract": false } }
> 
> Proposed solution (direct quote):
> 
>     Add a 'pcie' flag with OnOffAuto values [...]
>     - off - default value legacy PCI devices
>     - auto - on, if plugged into a pcie port for pvscsi, vmxnet3,
>       virtio-*, xhci, more?
>     - on - default value for PCie native devices
> 
> I read this multiple times, but still can't make heads or tails of it.
> To what entity should this flag be added?  How can it be queried in
> QMP?  Please give a usage example.

As stated above, if I understood correctly, libvirt needs this information
before running QEMU. (I think they are parsing the command line help)

I was thinking of adding the property to pci devices.
It will appear in the device's help.
The possible values would depend on the device:
- if the device supports both PCI/PCIe, the "pcie" option would
  have three values: auto (default), "on" (forced), "off" (forced)
- for a PCI only device it would have only one value: "off"
- for a PCIe only device it would have only one value: "on"

The above proposal is very odd.
I was hoping maybe you can think of a better one.

Thanks,
Marcel

Comment 18 Markus Armbruster 2016-06-01 13:53:18 UTC
"Libvirt needs to know before starting the guest" does not imply
"libvirt should get it from command line help".

QEMU provides a number of introspection features, most of them via
QMP.  Parsing command line is not a sane way to introspect QEMU.
Libvirt still does for two valid reasons:

* QEMU provides no other way to introspect.  QEMU should be enhanced
  to provide one.  Then this reason morphs into the next one.

* Back when the libvirt code was written, QEMU provided no other way
  to introspect, but now it does.  Libvirt should be updated to use
  proper introspection.  No rush, of course.

It follows that adding more command line help parsing to libvirt is
technical debt.  Such debt needs to be minimized.

QEMU already provides device introspection, but it's rather limited.

Ideally, QMP schema introspection would let you introspect available
devices and properties via introspection of QMP command device_add,
but that's not implemented.  Until it is, you have to fall back to
other interfaces, which follow.

QMP command qom-list-types lets you introspect available devices.  Its
@implements argument lets you find available devices of certain types.
See comment#14 for an example.

QMP command device-list-properties lets you introspect device
properties.  For example, to list the properties of device model
"e1000", use

    { "execute": "device-list-properties",
      "arguments": { "typename": "e1000" } }

This returns an array of objects with members "name", "type" and
"description", all strings.  The meaning of "name" is obvious.  The
other two let you make educated guesses about possible values.  They
are not what I'd consider a decent API.  Still better than parsing
command line help, though.

Can we make these interfaces answer the general question "into what
kind(s) of socket can this device model be plugged?" or at least the
special question "can this PCI device model be plugged into PCI, PCIe
or both?"

We could abuse device properties to expose the information in
device-list-properties.  Say add a property "plug" with type and
description chosen so that they somehow list the sockets into which
the device can be plugged.  Begs the question what plug=... means with
-device and device_add.  Feels like a hack to me.  Gross hack if
there's no sane use of plug=...

We could extend qom-list-types to include information on valid QOM
parents.  If we can derive valid sockets from valid QOM parents, we're
home dry.  I don't know whether this makes any QOM sense, though.
Suggest to check with a QOM expert such as Paolo.

If all else fails, we can always add yet another ad hoc QMP
introspection command.

Aside: I never liked the idea to make a single device model serve both
as PCI and PCIe device.  Yes, we want to let users use the same name
for closely related devices, but that doesn't mean we have to make
them the same device model.  We could map a single, sugared name to
whatever device model is appropriate in its context.  Anyway, that's
not what we've created, and it's probably too much work to change now.

Comment 19 Laine Stump 2016-06-01 15:51:52 UTC
Note that libvirt no longer parses the help text unless the QMP method fails. So adding things to the help text gets you exactly nothing (unless you're backporting a patch to qemu 0.12 or something :-)

Comment 22 Paolo Bonzini 2016-06-06 18:35:45 UTC
Perhaps we could add a dummy PCIe interface or class for devices that could be plugged into a PCIe bus, and libvirt could enumerate that interface with 

    { "execute": "qom-list-types",
      "arguments": { "implements": "pcie-device", "abstract": false } }

This would have the disadvantage of suggesting PCIe buses for virtio on <=2.6 machine types, which would consume I/O resources, but otherwise it's fine.

Comment 23 Markus Armbruster 2016-06-07 11:37:00 UTC
Laine, would this suffice for libvirt?

If I/O resources turn out to be a problem, perhaps libvirt could special-case virtio on <= 2.6 machine types.

Comment 24 Laine Stump 2016-06-09 15:51:10 UTC
So if we run qom-list-types with no arguments (to get the properties for all devices) then every device that can be a pcie device will have a pcie-device property that shows up in a similar place to the commandline options that device has (e.g. "bootindex")? If the device *doesn't* support pcie, would the property still be there but with a value of false? Or would it just not be there? Currently libvirt runs qom-list-types then looks for the mere presence of specific properties on specific devices, but not for their value. If pcie-device is either there or not, and if it shows up in the same place as the commandline options for the device, then detecting this with the current qemu capabilities code would be as simple as adding a string and an enum to a table.

We need to make sure this works with those devices that change behavior based on the type of port they're plugged into (and we may need a way to force PCI even when plugged into a pcie slot).

As for "special casing" - I don't like special cases, and would prefer to avoid it unless absolutely necessary.

Does that help? If you want a less confused answer, maybe it's best to ask Dan, since I think he wrote qemu_capabilities.c :-)

Comment 25 Markus Armbruster 2016-06-09 16:35:13 UTC
No.

qom-list-types lists device model names, no more:

    QMP> { "execute": "qom-list-types" }
    {"return": [{"name": "virtio-tablet-pci"}, {"name": "pc-0.13-machine"}, ...]}

To list properties, you need to do use device-list-properties, like this:

    QMP> { "execute": "device-list-properties", "arguments": { "typename": "e1000" }  }
    {"return": [{"name": "rombar", "type": "uint32"}, {"name": "autonegotiation", "description": "on/off", "type": "bool"}, ...]}

This shows property name, type and description.  The latter two give
you an idea on possible values, but they don't show you *the* value.
That's because a device type doesn't have property values, only an
*instance* of the type has[*].

But Paolo didn't suggest to add a "pcie" *property*, he suggested a
"pcie-device" *type*.  There is no QMP command to list the QOM types a
device model implements.  But qom-list-types can list the devices that
implement a QOM type.  If we create a new type "pcie-device", and make
all PCIe-capable devices implement it, you can find them with

    QMP> { "execute": "qom-list-types", "arguments": { "implements": "pcie-device", "abstract": false } }

The reply would be a list just like the one shown above, only narrowed
from all the QOM types to the PCIe-capable devices.

Is this mechanism clear now?

We can use the mechanism to support any partition of PCI devices.
Possible buckets are "PCI" (can only be plugged into a PCI socket),
"PCIe" (PCIe socket only) and "PCImagic" (either socket, morphs itself
into either a PCI or a PCIe device depending on the socket it's
plugged into).

PCImagic devices are of course a virtualization artifact.


[*] The type has property value defaults, which could perhaps be
pressed into service somehow, but we'd then have to extend
device-list-properties to show them.  Let's not go there.

Comment 26 Laine Stump 2016-06-28 21:51:52 UTC
I thought I should get down some notes here since I'm about to be AFK for a couple weeks... Stepping back from the above specifics a bit, here is my current plan in sequence:

1) change libvirt to allow assigning "non-hotpluggable" devices directly to pcie-root (aka "pcie.0" aka "PCIe root complex") even if they are legacy PCI devices (the PCIe spec allows this; I think they are called "integrated legacy devices" or something like that). This by itself doesn't help anything, because we currently set the flag to require hotplug for *all* devices even if qemu is incapable of hotplugging them. But it opens the way to steps 2 and 3:

2) modify libvirt to set its internal HOTPLUGGABLE flag (used for finding an appropriate bus for auto-assignment) according to reality (currently it is a blanket requirement for all devices). e.g. USB controllers aren't hotpluggable, nor are video controllers or virtio-balloon (that's just a few; there are more). Once this is done, all of those devices will automatically be assigned to pcie-root rather than looking for a pci-bridge slot, but disks, network interfaces, etc. will still go to pci-bridge.

(NB: I have patches for the above items now. Haven't yet done what's below)

3) add a "hotpluggable" attribute to the XML for all device types that support hotplugging (default to "yes"). This will allow the user to mark certain devices with "Nope, not going to hotplug this!", and it can then be safely assigned to pcie-root.

4) add "availablePCISlots" and "availablePCIeSlots" (or some name, I can't think of a good one) attributes to the domain config [somewhere]. On any machinetype with a pcie-root, availablePCIeSlots will be 1 and availablePCISlots will be 0, and any machinetype with pci-root it will be the opposite. This will not only allow users to tune the domain down to eliminate extra slots that will never be used, it will by default stop libvirt from adding a dmi-to-pci-bridge and pci-bridge to every single Q35 domain just so that it can have at least one legacy PCI slot available.

5) This is where we need the "pcie" flag available from qemu - libvirt will start looking for the flag indicating exact bus-type and setting the appropriate internal flag when looking for an address to auto-assign. Precedence will be like this:

   1) if hotpluggable='no' or the device isn't capable of being hotplugged,
      it will be placed in pcie-root if a spot is available.
   2) if the device can be PCIe, then we will add everything necessary for
      a hotpluggable PCIe slot.
   3) if the device is legacy-PCI-only, we will need to find or create
      a legacy-PCI bus with a free slot, and use that.
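
In rough pseudocode (Python here, not libvirt's actual C; the device attributes and flag names below are made up purely for illustration), that precedence amounts to something like:

    from dataclasses import dataclass
    from enum import Flag, auto

    class Connect(Flag):
        PCI = auto()           # legacy PCI slot
        PCIE = auto()          # PCI Express slot
        HOTPLUG = auto()       # slot must support hotplug

    @dataclass
    class Dev:
        hotpluggable: str      # "yes"/"no" from the proposed XML attribute
        can_hotplug: bool      # whether qemu can hotplug this device model
        pcie_capable: bool     # whether the device model can be PCIe

    def wanted_slot(dev):
        """What kind of slot to search for when auto-assigning an address."""
        if dev.hotpluggable == "no" or not dev.can_hotplug:
            return Connect.PCIE                       # 1) integrated on pcie-root, no hotplug
        if dev.pcie_capable:
            return Connect.PCIE | Connect.HOTPLUG     # 2) hotpluggable PCIe port
        return Connect.PCI | Connect.HOTPLUG          # 3) legacy PCI bus with a free slot

    print(wanted_slot(Dev("yes", True, True)))        # -> a hotpluggable PCIe slot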

There are some warts here though:

1) what if someone actually *wants* a PCIe-capable device to be on a legacy PCI bus?

2) what if they want a PCIe-capable device to be placed on pcie-root, but only as a legacy-PCI device (Can qemu do that? Would there be any reason to do it?)

Comment 27 Marcel Apfelbaum 2016-06-29 11:52:17 UTC
(In reply to Laine Stump from comment #26)
> I thought I should get down some notes here since I'm about to be AFK for a
> couple weeks... Stepping back from the above specifics a bit, here is my
> current plan in sequence:
> 
> 1) change libvirt to allow assigning "non-hotpluggable" devices directly to
> pcie-root (aka "pcie.0" aka "PCIe root complex") even if they are legacy PCI
> devices (the PCIe spec allows this; I think they are called "integrated
> legacy devices" or something like that). This by itself doesn't help
> anything, because we currently set the flag to require hotplug for *all*
> devices even if qemu is incapable of hotplugging them. But it opens the way
> to steps 2 and 3:
> 
> 2) modify libvirt to set its internal HOTPLUGGABLE flag (used for finding an
> appropriate bus for auto-assignment) according to reality (currently it is a
> blanket requirement for all devices). e.g. USB controllers aren't
> hotpluggable, nor are video controllers or virtio-balloon (that's just a
> few; there are more). Once this is done, all of those devices will
> automatically be assigned to pcie-root rather than looking for a pci-bridge
> slot, but disks, network interfaces, etc. will still go to pci-bridge.
> 

Using pcie-root for legacy devices instead of adding a dmi-to-pci bridge is a good idea.

> (NB: I have patches for the above items now. Haven't yet done what's below)
> 
> 3) add a "hotpluggable" attribute to the XML for all device types that
> support hotplugging (default to "yes"). This will allow the user to mark
> certain devices with "Nope, not going to hotplug this!", and it can then be
> safely assigned to pcie-root.
> 

OK, though note that even if you can safely add a hotpluggable device into pcie-root,
it will work but you can't hot-unplug it.

> 4) add "availablePCISlots" and "availablePCIeSlots" (or some name, I can't
> think of a good one) attributes to the domain config [somewhere]. On any
> machinetype with a pcie-root, availablePCIeSlots will be 1 and
> availablePCISlots will be 0, and any machinetype with pci-root it will be
> the opposite. This will not only allow users to tune the domain down to
> eliminate extra slots that will never be used, it will by default stop
> libvirt from adding a dmi-to-pci-bridge and pci-bridge to every single Q35
> domain just so that it can have at least one legacy PCI slot available.
> 
> 5) This is where we need the "pcie" flag available from qemu - libvirt will
> start looking for the flag indicating exact bus-type and setting the
> appropriate internal flag when looking for an address to auto-assign.
> Precedence will be like this:
> 
>    1) if hotpluggable='no' or the device isn't capable of being hotplugged,
>       it will be placed in pcie-root if a spot is available.
>    2) if the device can be PCIe, then we will add everything necessary for
>       a hotpluggable PCIe slot.
>    3) if the device is legacy-PCI-only, we will need to find or create
>       a legacy-PCI bus with a free slot, and use that.
> 
> There are some warts here though:
> 
> 1) what if someone actually *wants* a PCIe-capable device to be on a legacy
> PCI bus?
> 

If it is a "PCImagic" type as proposed by Markus, it will become a PCI device.

> 2) what if they want a PCIe-capable device to be placed on pcie-root, but
> only as a legacy-PCI device

virtio devices will become PCI, other devices will remain PCIe...

> (Can qemu do that?

Yes.

The virtio devices are "PCIMagic type" devices and they will be:

PCI - if plugged into pcie-root or pci bus
PCIe - otherwise.

There are other "PCImagic" devices, but I don't know much about them.

> Would there be any reason to
> do it?)

The only reason is that "legacy PCIe integrated endpoints" *can* be PCI,
but we suspect (legacy) OSes might behave strangely when a legacy device
is PCIe. We don't have concrete evidence though.

I know that Jason is thinking about making virtio devices PCIe also
for pcie-root.

Thanks,
Marcel

Comment 28 Marcel Apfelbaum 2016-06-29 11:58:22 UTC
(In reply to Paolo Bonzini from comment #22)
> Perhaps we could add a dummy PCIe interface or class for devices that could
> be plugged into a PCIe bus, and libvirt could enumerate that interface with 
> 
>     { "execute": "qom-list-types",
>       "arguments": { "implements": "pcie-device", "abstract": false } }
> 
> This would have the disadvantage of suggesting PCIe buses for virtio on
> <=2.6 machine types, which would consume I/O resources, but otherwise it's
> fine.

The only problem I see with this approach is that virtio devices can be PCIe only if the disable-modern flag is off, so they cannot automatically implement this interface.

Comment 29 Laine Stump 2016-08-14 04:37:10 UTC
An update on this:

* I sent patches upstream adding a "hotpluggable" attribute, and assigning all devices with hotpluggable='no', as well as devices that we don't support hotplugging, to pcie-root (aka pcie.0), but this was rejected. Conversation here:

https://www.redhat.com/archives/libvir-list/2016-August/msg00394.html

(in particular, see the responses to Patch 4/6). This means that we won't be *automatically* assigning PCI-only devices to pcie-root (with the exception of the primary video and USB controllers, which we've always done). It will of course be possible for a user or management application to do that themselves though.

* In that same series I also included patches to use (hotpluggable) pcie ports for virtio-pci devices if they support it (detected via checking for presence of --disable-modern). Although nobody said anything bad about those patches, I'm reworking them in light of the rejection of "hotpluggable='no'" and will be posting them in a day or two.

Without any method of saying "I don't want hotplug", if libvirt is auto-assigning the address of a device, it cannot assign it to pcie-root (since pcie-root doesn't support hotplug), so the need to auto-add pcie-*-port controllers is now much more important.

* jtomko has also posted a few versions of patches to allow setting --disable-modern and --disable-legacy. We will need to force auto-assign of virtio devices with disable-modern=on to legacy PCI slots.
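
As a rough way to see the detection idea in action (this is not how libvirt probes capabilities internally -- it uses its own QMP channel against a freshly spawned QEMU), you can ask a running domain's QEMU whether virtio-blk-pci exposes the disable-legacy/disable-modern properties, e.g. with a small wrapper around "virsh qemu-monitor-command"; the domain name "test" is just a placeholder:

    #!/usr/bin/env python3
    import json
    import subprocess

    def device_properties(domain, typename):
        """Return the set of property names a device model exposes, via QMP."""
        qmp = json.dumps({"execute": "device-list-properties",
                          "arguments": {"typename": typename}})
        out = subprocess.check_output(
            ["virsh", "qemu-monitor-command", domain, qmp])
        return {p["name"] for p in json.loads(out)["return"]}

    props = device_properties("test", "virtio-blk-pci")
    print("disable-legacy" in props, "disable-modern" in props)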

Comment 33 Laine Stump 2016-09-20 20:32:17 UTC
A new patchset posted upstream:

 https://www.redhat.com/archives/libvir-list/2016-September/msg00840.html

This one has enough functionality to actually be useful (my benchmark is whether virt-manager can successfully create a working domain with no extra help, which is true for this patchset).

Open items:

1) detecting whether an assigned device is legacy or Express and assigning it an appropriate address (all vfio-assigned devices are still put on legacy slots unless manually addressed).

pcie-root-ports are auto-added as needed, but:

2) no extra unused pcie-root-ports are added to allow future hotplug.

3) the pcie-root-ports are placed one per slot, so we run out after 30 devices (27 if you have a video device, USB controller, and memory balloon controller). The plan is to combine 8 pcie-root-ports on each slot, allowing us 240 devices (216 if the aforementioned devices each take up a full slot).

Comment 34 Laine Stump 2016-10-14 18:37:41 UTC
Several patches (currently 24) are needed to resolve this BZ. I will record each here as it's pushed upstream, so that none are left out from any potential backport. Here is the first:

commit 538220c3c42cad0adbd818b6a931c69492a572de
Author: Laine Stump <laine>
Date:   Sun Aug 7 17:13:58 2016 -0400

    conf: restrict what type of buses will accept a pci-bridge

Comment 35 Laine Stump 2016-10-23 16:42:28 UTC
Six more patches have been pushed upstream, with 3x that many more still to go:

commit 9ca53303f84e05386a3abdb425e331a08a090450
Author: Laine Stump <laine>
Date:   Tue Sep 6 14:35:26 2016 -0400

    qemu: replace a lot of "def->controllers[i]" with equivalent "cont"

commit ac47e4a62251f953a2ea84740b2be159661c5471
Author: Laine Stump <laine>
Date:   Wed Oct 12 15:01:32 2016 -0400

    qemu: replace "def->nets[i]" with "net" and "def->sounds[i]" with "sound"
    
commit 9b4122bf2e7831f4a59f9d9f7650774bed69abab
Author: Laine Stump <laine>
Date:   Fri Oct 7 15:51:30 2016 -0400

    conf: add typedef for anonymous enum used for memballoon device model
    
commit d4afd34110ae9b39f27765d4b7f33f1b91b791ae
Author: Laine Stump <laine>
Date:   Thu Oct 13 14:49:12 2016 -0400

    qemu: remove superfluous setting of addrs->nbuses
    
commit 116564e3b0a76b8bdd601972e34029f59a49699d
Author: Laine Stump <laine>
Date:   Thu Oct 13 14:50:23 2016 -0400

    qemu: make error message in qemuDomainPCIAddressSetCreate more clear.
    
commit dbe481a14a4c2837352b4a2179167644379fc99c
Author: Laine Stump <laine>
Date:   Wed Oct 12 15:24:57 2016 -0400

    qemu: change first arg of qemuDomainAttachChrDeviceAssignAddr()

Comment 36 Laine Stump 2016-10-24 18:00:20 UTC
Four more patches pushed upstream:

commit 848e7ff2b3f0e756aee4c20e452d59470998fe8a
Author: Laine Stump <laine>
Date:   Thu Sep 1 09:29:01 2016 -0400

    conf: new function virDomainPCIAddressReserveNextAddr()
    
commit a0bb224cf5684b9dc6ffe66769883c8080e7a46b
Author: Laine Stump <laine>
Date:   Thu Sep 1 11:23:06 2016 -0400

    qemu: use virDomainPCIAddressReserveNextAddr in qemuDomainAssignDevicePCISlots
    
commit 696929e67f8b7b2ea82f3a0aa1e65d477f620053
Author: Laine Stump <laine>
Date:   Thu Sep 1 11:38:02 2016 -0400

    conf: make virDomainPCIAddressGetNextSlot() a local static function
    
commit ab9202e431e6ccca44cf6cd3ef12492a010f988e
Author: Laine Stump <laine>
Date:   Fri Sep 2 16:39:18 2016 -0400

    qemu: replace calls to virDomainPCIAddressReserveNext*() with static function

Comment 37 Laine Stump 2016-10-27 17:13:02 UTC
Just to keep the news here current, the following have been posted upstream:

* Repost of the patches from the original series that haven't yet been ACKed/pushed:

  https://www.redhat.com/archives/libvir-list/2016-October/msg01102.html

* Patches to implement auto-assign of pcie-root-ports onto multiple functions of a single pcie-root slot:

  https://www.redhat.com/archives/libvir-list/2016-October/msg01048.html

* Cleanup patches (not necessary for new functionality, but useful in case there are further bugfixes on top of the above patches):

 https://www.redhat.com/archives/libvir-list/2016-October/msg01057.html

Comment 38 Laine Stump 2016-11-14 19:29:35 UTC
And even more patches pushed upstream (still not done):

commit 50adb8a660c97662ed96aed40f5b04959229976b
Author: Laine Stump <laine>
Date:   Fri Sep 2 21:41:43 2016 -0400

commit bd776c2b091610b0cecb17dbde1a2198d89079b6
Author: Laine Stump <laine>
Date:   Sun Sep 4 16:03:57 2016 -0400

    qemu: new functions to calculate/set device pciConnectFlags
    
commit 7f784f576b96eee68849dff1759eefe86c534df7
Author: Laine Stump <laine>
Date:   Sun Sep 4 22:14:40 2016 -0400

    qemu: set/use info->pciConnectFlags when validating/assigning PCI addresses
    
commit abb7a4bd6bb416c55c8bf664c2cb9a5cec411503
Author: Laine Stump <laine>
Date:   Wed Sep 7 12:29:30 2016 -0400

    qemu: set/use proper pciConnectFlags during hotplug
    
commit b27375a9b86f010a9a9fc0323be3f204b62195f4
Author: Laine Stump <laine>
Date:   Fri Oct 7 18:08:13 2016 -0400

    qemu: set pciConnectFlags to 0 instead of PCI|HOTPLUGGABLE if device isn't PCI
    
commit c7fc151eec74c7bbb380f78268a45d0d1e559d52
Author: Laine Stump <laine>
Date:   Sat Aug 13 18:10:41 2016 -0400

    qemu: assign virtio devices to PCIe slot when appropriate
    
commit 9dfe733e99f47b605ef02639276efc514825012d
Author: Laine Stump <laine>
Date:   Mon Aug 8 05:23:57 2016 -0400

    qemu: assign e1000e network devices to PCIe slots when appropriate

commit 5266426b211c23c80aae272e858e9500d7f4332c
Author: Laine Stump <laine>
Date:   Sun Aug 14 01:58:11 2016 -0400

    qemu: assign nec-xhci (USB3) controller to a PCIe address when appropriate
    
commit b2c887844f278f7122106b6362994a9072884ab6
Author: Laine Stump <laine>
Date:   Mon Sep 19 14:17:59 2016 -0400

    qemu: only force an available legacy-PCI slot on domains with pci-root
    
commit 0702f48ef485d6b893a1633886a1b92851a2101f
Author: Laine Stump <laine>
Date:   Mon Sep 19 14:38:47 2016 -0400

    qemu: auto-add pcie-root-port/dmi-to-pci-bridge controllers as needed
    
commit 815b51d97ab6e9512b2cf1819d57e3cc88c413e1
Author: Laine Stump <laine>
Date:   Fri Nov 11 12:14:02 2016 -0500

    qemu: update tests to not assume dmi-to-pci-bridge is always added
    
commit 807232203a197451a02d254dfe45f12e2173405f
Author: Laine Stump <laine>
Date:   Mon Sep 19 17:11:20 2016 -0400

    qemu: don't force-add a dmi-to-pci-bridge just on principle
    
commit d8bd8376698a79079f8f594cad7efed1ffa373ef
Author: Laine Stump <laine>
Date:   Mon Sep 19 18:46:41 2016 -0400

    qemu: add a USB3 controller to Q35 domains by default
    
commit 8d873a5a471179fc7e9ccc178759611a51af8e0f
Author: Laine Stump <laine>
Date:   Tue Sep 20 13:12:55 2016 -0400

    qemu: try to put ich9 sound device at 00:1B.0
    
commit 70d15c9ac65cd5b90f40deea4700e60139b7fe33
Author: Laine Stump <laine>
Date:   Tue Sep 27 20:37:30 2016 -0400

    qemu: initially reserve one open pcie-root-port for hotplug

Comment 39 Laine Stump 2016-11-15 01:10:41 UTC
Aside from placing virtio devices on PCIe ports when appropriate, the above patches accomplish a few other useful things that are 1) simple extensions, 2) useful in the same context, and 3) would not be possible without the same prerequisite patches:

* place e1000e network devices on PCIe (also covered in Bug 1343094)

* place USB3 (nec-xhci) controllers on PCIe

* when no USB controller is given in a Q35 config, add a USB3 controller rather than a set of USB2 controllers.

* place ich9 sound devices at 00:1B.0 (which is on pcie-root).

All of these serve the same end: eliminating the need for legacy PCI controllers in a Q35 domain.

(Note that none of these patches changes any behavior for i440fx machinetypes).

Comment 40 Laine Stump 2017-01-26 17:36:09 UTC
Here are the remaining patches that resolve this BZ completely. They are all upstream in libvirt-3.0.0.

The first 4 are to get VFIO assigned devices on PCIe ports:


commit e026563f0136a04b177e3627cec3afab233da690
Author: Laine Stump <laine>
Date:   Thu Nov 17 12:18:27 2016 -0500

    util: new function virFileLength()
    
commit bfdc14515366894348e943ff33671f19037bf4d6
Author: Laine Stump <laine>
Date:   Sat Nov 19 14:30:03 2016 -0500

    util: new function virPCIDeviceGetConfigPath()
    
commit 9b0848d52349a432244da49b64e99a211f85a26b
Author: Laine Stump <laine>
Date:   Thu Nov 3 16:33:32 2016 -0400

    qemu: propagate virQEMUDriver object to qemuDomainDeviceCalculatePCIConnectFlags
    
commit 70249927b7c2d40220a40c96571b26734666c253
Author: Laine Stump <laine>
Date:   Tue Nov 1 20:40:27 2016 -0400

    qemu: assign VFIO devices to PCIe addresses when appropriate
    

===================

The next 8 aggregate pcie-root-ports onto as few slots of pcie-root as possible (by assigning them to multiple functions):


commit 9838cad9cd9b7969162e4fe1546689e26af33902
Author: Laine Stump <laine>
Date:   Sun Oct 16 17:14:25 2016 -0400

    conf: use struct instead of int for each slot in virDomainPCIAddressBus
    
commit 9ff9d9f5a905dee7aabbeeae932efda0df1960f1
Author: Laine Stump <laine>
Date:   Wed Oct 19 12:43:04 2016 -0400

    conf: eliminate concept of "reserveEntireSlot"
    
commit 99bf66f3fa38884294a51f68bf24ea93eaf0dbb5
Author: Laine Stump <laine>
Date:   Wed Oct 19 13:58:42 2016 -0400

    conf: eliminate repetitive code in virDomainPCIAddressGetNextSlot()
    
commit 66e0b08d3477e43240f3713856a6ef1621fb1266
Author: Laine Stump <laine>
Date:   Fri Oct 21 13:05:33 2016 -0400

    conf: start search for next unused PCI address at same slot as previous find
    
commit 3c1a0fc27d271590818c672fab4dc935c6d2f21d
Author: Laine Stump <laine>
Date:   Tue Jan 10 00:02:40 2017 -0500

    conf: new function virDomainPCIAddressSetAllMulti()
    
commit 8f4008713a49873b4bf354321f94b3fdeee14231
Author: Laine Stump <laine>
Date:   Tue Jan 10 00:20:11 2017 -0500

    qemu: use virDomainPCIAddressSetAllMulti() to set multi when needed
    
commit 48d39cf96d9e16b33efd7a1a9eeffb04fee4e945
Author: Laine Stump <laine>
Date:   Wed Oct 19 14:15:01 2016 -0400

    conf: aggregate multiple devices on a slot when assigning PCI addresses
    
commit 147ebe6ddf313b8e003aef5ad87f40497fa89cb1
Author: Laine Stump <laine>
Date:   Thu Oct 20 15:46:01 2016 -0400

    conf: aggregate multiple pcie-root-ports onto a single slot

Comment 41 Martin Tessun 2017-02-10 14:39:04 UTC
*** Bug 1414734 has been marked as a duplicate of this bug. ***

Comment 43 lijuan men 2017-06-14 08:49:48 UTC
Verified the bug.

version:
libvirt-3.2.0-9.el7.x86_64
qemu-kvm-rhev-2.9.0-9.el7.x86_64

1.scenario for comment0:

1)coldplug a virtio disk without address
start a guest with a virtio disk:
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
[root@lmen1 ~]# virsh destroy test;virsh start test
Domain test destroyed

Domain test started

[root@lmen1 ~]# virsh dumpxml test
...
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>

...
   <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x10'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x11'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x12'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>

...

disk can be read/written

2)hotplug a virtio disk without address

attach a disk with the xml:
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>

[root@lmen1 ~]# virsh attach-device uefi disk.xml 
Device attached successfully

[root@lmen1 ~]# virsh dumpxml uefi
...
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
...

disk can be read/written

3)hotplug a virtio disk with address
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>    
</disk>

[root@lmen1 ~]# virsh attach-device uefi disk.xml 
Device attached successfully

[root@lmen1 ~]# virsh dumpxml uefi
...
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
...

disk can be read/written

4)coldplug a virtio disk with pci address

delete the sound card, and start a guest with the xml:
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
</disk>

[root@lmen1 ~]# virsh start uefi
Domain uefi started


[root@lmen1 ~]# virsh dumpxml uefi
  <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </disk>
disk can be read/written
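
The same attach/dumpxml check can also be driven from the libvirt Python bindings instead of virsh; a minimal sketch, with the domain name and image path as placeholders taken from the scenarios above:

    #!/usr/bin/env python3
    import libvirt

    DISK_XML = """
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    """

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("test")
    # Attach to the running domain only; libvirt picks the PCI(e) address itself.
    dom.attachDeviceFlags(DISK_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)
    print(dom.XMLDesc())   # the new <address type='pci' .../> shows the chosen bus/slot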




2.scenario for patch 'assign nec-xhci (USB3) controller to a PCIe address'

a guest already has a virtio disk,with xml:
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>

Edit the guest xml, delete all the USB-related configuration,
then save the XML. A USB3 controller will be added to the guest xml automatically.
[root@lmen1 ~]# virsh dumpxml uefi
   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
...
  <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x13'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
...

3.scenario for patch 'auto-add pcie-root-port/dmi-to-pci-bridge controllers as needed'
add 2 virtio disks without address
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vdc' bus='virtio'/>
    </disk>

[root@lmen1 ~]# virsh start uefi
Domain uefi started

[root@lmen1 ~]# virsh dumpxml uefi

   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vdc' bus='virtio'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>
...
 <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x12'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
<controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x13'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
...

the 2 disks can be listed in the guest

4.scenario for patch ' don't force-add a dmi-to-pci-bridge just on principle'

define a guest without a sound card or network interface, start the guest, and check the guest xml:
there is no dmi-to-pci-bridge or pci-bridge

5.scenario for patch 'add a USB3 controller to Q35 domains by default'

Edit the guest xml, delete all the USB-related configuration,
then save the XML. A USB3 controller will be added to the guest xml automatically.
[root@lmen1 ~]# virsh dumpxml uefi
...
 <controller type='usb' index='0' model='nec-xhci'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </controller>
...

6. scenario for patch 'try to put ich9 sound device at 00:1B.0'

delete the sound card in the guest, and add the following xml into the guest
 <sound model='ich9'>
</sound>

check the guest again,
[root@lmen1 ~]# virsh dumpxml uefi
...
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </sound>
...


7.scenario for patch ' initially reserve one open pcie-root-port for hotplug'

1)define a guest with a non-virtio disk and check the xml:
[root@lmen1 ~]# virsh dumpxml uefi
...
  <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
...

the first and second pcie-root-ports are used by virtio-serial and memballoon; the last one is not used.

attach a virtio disk into the guest
[root@lmen1 ~]# cat disk.xml
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
</disk>

[root@lmen1 ~]# virsh attach-device uefi disk.xml 
Device attached successfully

[root@lmen1 ~]# virsh dumpxml uefi
...
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
...

disk can be read/written in the guest


2)define a guest with a virtio disk and check the xml:

 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/uefi.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
 <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
 </controller>
 <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
 </memballoon>


  <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>

the last pcie-root-port is not used
attach a virtio-disk into the guest
[root@lmen1 ~]# cat disk.xml 
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vdb' bus='virtio'/>
</disk>
[root@lmen1 ~]# virsh attach-device uefi disk.xml 
Device attached successfully

[root@lmen1 ~]# virsh dumpxml uefi
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>

disk can be read/written in the guest

8.scenario for patch 'aggregate multiple pcie-root-ports onto a single slot'

check the addresses in scenario 7: all the pcie-root-ports are on a single slot


9.scenario for patch 'VFIO assigned devices on PCIe ports'

start a guest with the xml:
   <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>

[root@lmen1 ~]# virsh dumpxml uefi
<hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
 <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>

Comment 44 lijuan men 2017-06-14 09:03:46 UTC
there is a problem in the following scenario:

scenario: **hotplug** a virtio disk with a **pci** address, but I can't see the virtio disk in the guest


1) there is a pci-bridge in the guest, and no device is using the pci-bridge:
 <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>

2)[root@lmen1 ~]# cat disk.xml 
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
</disk>



3)[root@lmen1 ~]# virsh attach-device uefi disk.xml 
Device attached successfully

[root@lmen1 ~]# virsh dumpxml uefi
...
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/a.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </disk>

...

[root@lmen1 ~]# virsh domblklist uefi
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/uefi.qcow2
vda        /var/lib/libvirt/images/a.qcow2

but the virtio disk does not show up inside the guest OS

But as described in comment 43 (scenario 1), it is OK to **coldplug** a virtio disk with a **pci** address

Comment 45 Laine Stump 2017-06-14 17:05:02 UTC
> 1) there is a pci-bridge in the guest,and there is not any device using
>    the pci-bridge:

This *might* be your problem. In some versions of qemu, the implicit shpc controller was removed from the pci-bridge, meaning that the bridge was *truly* empty at boot time. My understanding is that when a guest has a pci-bridge with no devices plugged into it, there is no iospace allocated to that controller, so devices hotplugged into it will not work correctly if they require IO space.

Marcel could tell you for certain if the version of qemu-kvm-rhev you used for the test had this problem.

There are two ways to work around the problem:

1) use a version of qemu-kvm-rhev that doesn't disable the shpc controller in pci-bridges.

2) put some other device in a slot of the pci-bridge prior to starting the domain, then try a hot plug of the device.


I think this error can be ignored wrt verifying this BZ. If updating to a qemu that doesn't disable SHPC doesn't solve the problem, there may be an existing qemu BZ tracking the problem; otherwise Marcel will be able to tell you if a new BZ needs to be filed.
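
If it helps to see what the qemu build actually offers, one rough, exploratory check (not something libvirt does itself) is to list the properties of the pci-bridge device model over QMP and look for the "shpc" knob; note that the mere presence of the property says nothing about its default for this machine type. The domain name "uefi" is taken from the test above:

    #!/usr/bin/env python3
    import json
    import subprocess

    qmp = json.dumps({"execute": "device-list-properties",
                      "arguments": {"typename": "pci-bridge"}})
    out = subprocess.check_output(["virsh", "qemu-monitor-command", "uefi", qmp])
    names = sorted(p["name"] for p in json.loads(out)["return"])
    print("shpc" in names, names)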

Comment 46 lijuan men 2017-06-15 03:34:25 UTC
(In reply to Laine Stump from comment #45)
> > 1) there is a pci-bridge in the guest,and there is not any device using
> >    the pci-bridge:
> 
> This *might* be your problem. In some versions of qemu, the implicit shpc
> controller was removed from the pci-bridge, meaning that the bridge was
> *truly* empty at boot time. My understanding is that when a guest has a
> pci-bridge with no devices plugged into it, there is no iospace allocated to
> that controller, so devices hotplugged into it will not work correctly if
> they require IO space.
> 
> Marcel could tell you for certain if the version of qemu-kvm-rhev you used
> for the test had this problem.
> 
> There are two ways to work around the problem:
> 
> 1) use a version of qemu-kvm-rhev that doesn't disable the shpc controller
> in pci-bridges.
> 
> 2) put some other device in a slot of the pci-bridge prior to starting the
> domain, then try a hot plug of the device.
> 
> 
> I think this error can be ignored wrt verifying this BZ. If updating to a
> qemu that doesn't disable SHPC doesn't solve the problem, there may be an
> existing qemu BZ tracking the problem; otherwise Marcel will be able to tell
> you if a new BZ needs to be filed.

thanks,Laine

@Marcel, the version of qemu I used is qemu-kvm-rhev-2.9.0-9.el7.x86_64.

Comment 47 Marcel Apfelbaum 2017-06-18 16:45:41 UTC
(In reply to Laine Stump from comment #45)
> > 1) there is a pci-bridge in the guest,and there is not any device using
> >    the pci-bridge:
> 
> This *might* be your problem. In some versions of qemu, the implicit shpc
> controller was removed from the pci-bridge, meaning that the bridge was
> *truly* empty at boot time. My understanding is that when a guest has a
> pci-bridge with no devices plugged into it, there is no iospace allocated to
> that controller, so devices hotplugged into it will not work correctly if
> they require IO space.

Hi Laine,
I have a few answers, I'll try to be concise.

1. We do not support the pci-pci bridge and the dmi-pci bridge on the current
   RHEL Q35 machines.
2. We do not support hot plug for the above devices in Q35 even for the
   upstream QEMU.
3. Q35 hot-plug support is only for PCIe Root Ports. (and switch downstream ports)

> 
> Marcel could tell you for certain if the version of qemu-kvm-rhev you used
> for the test had this problem.
> 
> There are two ways to work around the problem:
> 
> 1) use a version of qemu-kvm-rhev that doesn't disable the shpc controller
> in pci-bridges.
> 

RHEL 7.3 does not have this issue and the latest qemu-kvm-rhev is also
OK - for PC machines! (Q35 is out of scope.)

> 2) put some other device in a slot of the pci-bridge prior to starting the
> domain, then try a hot plug of the device.
> 

No need, since we don't have a RHEL release with this issue,
and for Q35 it would not help either since the feature is not implemented.

> 
> I think this error can be ignored wrt verifying this BZ. If updating to a
> qemu that doesn't disable SHPC doesn't solve the problem, there may be an
> existing qemu BZ tracking the problem; otherwise Marcel will be able to tell
> you if a new BZ needs to be filed.

Thanks,
Marcel

Comment 48 lijuan men 2017-06-19 02:59:07 UTC
Based on comments 43-47,
marking the bug as verified.

Comment 49 errata-xmlrpc 2017-08-01 17:09:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846


