Bug 1410589

Summary: PCI: Reserve MMIO space over 4G for PCI hotplug
Product: Red Hat OpenStack
Component: openstack-nova
Status: CLOSED INSUFFICIENT_DATA
Reporter: Marcel Apfelbaum <marcel>
Assignee: OSP DFG:Compute <osp-dfg-compute>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: low
Priority: low
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: FutureFeature, Triaged
CC: ailan, berrange, chayang, dasmith, dyuan, eglynn, jinzhao, juzhang, kchamart, lhuang, libvirt-maint, lyarwood, marcel, mtessun, sbauza, sferdjao, sgordon, srevivo, stephenfin, virt-maint, vromanso, xuzhang
Doc Type: Enhancement
Type: Bug
Clone Of: 1408813
Bug Depends On: 1390346, 1408813
Last Closed: 2018-08-28 09:49:12 UTC

Description Marcel Apfelbaum 2017-01-05 20:07:40 UTC
+++ This bug was initially created as a clone of Bug #1408813 +++

+++ This bug was initially created as a clone of Bug #1390346 +++

QEMU reserves only a 32-bit memory range for PCI hotplug. That range can be rather limited and is not always enough to hot-plug devices with large BARs.

Add a parameter to QEMU to reserve a 64-bit range in the ACPI _CRS, starting after the memory range reserved for memory hotplug.

Check that the whole range is addressable by the VM's CPU.

Add libvirt support for the new command-line parameters.
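
For illustration, QEMU already exposes a 64-bit PCI hole size on its PC host bridges, which is the closest existing knob to the reservation requested here (the machine type and size below are arbitrary examples, not a recommended configuration):

    # Enlarge the 64-bit PCI hole on a q35 machine; on i440FX the
    # equivalent property is i440FX-pcihost.pci-hole64-size.
    qemu-system-x86_64 -machine q35 -m 4G \
        -global q35-pcihost.pci-hole64-size=64G \
        ...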

Comment 1 Marcel Apfelbaum 2017-01-05 20:10:48 UTC
This should work much the same as memory hotplug: where memory hotplug requires specifying the number of slots and the maximum size on the command line, for PCI hotplug over 4G we only need to specify the size.
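
For comparison, this is how the memory hotplug reservation is specified today (the sizes and slot count are illustrative):

    # Memory hotplug: initial RAM, hot-pluggable slots, maximum total size.
    qemu-system-x86_64 -m 4G,slots=8,maxmem=64G ...

The analogous PCI knob would take only a size.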

By default the reserved size will be 0, because reserving the space adds a constraint on the VM's CPU addressable bits. Reserving a bigger chunk may create problems when we want to migrate the VM to a host with fewer addressable bits.

Because of the above, libvirt can't decide by itself; the upper layers need to handle the trade-off: more hot-pluggable space vs. potential migration limitations.

Comment 2 Kashyap Chamarthy 2017-01-13 14:56:20 UTC
Marcel,

I take it this bug is dependent on the relevant QEMU bug that you're assigned to:

    https://bugzilla.redhat.com/show_bug.cgi?id=1390346

If you get time, I'd also appreciate it if you could share some examples of this bug-fix / behavior with a practical QEMU command line.

Also, a small clarification: I'm not clear on what you mean by "over 4G for PCI hotplug". What precisely do you mean?

Dave Gilbert was guessing on IRC: I suspect it's PCI bus addressing; every device on a PCI bus has a few address ranges, and something somewhere has to pick them.

Comment 3 Stephen Gordon 2017-01-18 19:39:39 UTC
Leaving un-flagged; there is no clear feature definition to present for Pike.

Comment 4 Marcel Apfelbaum 2017-01-25 13:26:57 UTC
(In reply to Kashyap Chamarthy from comment #2)
> Marcel,
> 
> I take it this bug is dependent on the relevant QEMU bug that you're
> assigned to:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=1390346
> 
> If you get time, I'd also appreciate it if you could share some examples
> of this bug-fix / behavior with a practical QEMU command line.
> 
> Also, a small clarification: I'm not clear on what you mean by "over 4G
> for PCI hotplug". What precisely do you mean?
> 
> Dave Gilbert was guessing on IRC: I suspect it's PCI bus addressing; every
> device on a PCI bus has a few address ranges, and something somewhere has to
> pick them.

Hi,

In order to be able to hot-plug PCI devices, QEMU needs to reserve some address space to be mapped to the PCI devices' registers (their BARs).

Today QEMU reserves space only in the 32-bit (< 4G) area. The problem is that the reserved "window" is not always enough, especially if the VM has a lot of devices (the devices already present draw from the same pool).
However, the (> 4G) memory space is huge and mostly unused; the only limit is the CPU's addressable bits.
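
As an illustration of what such a hint can look like: QEMU's generic pcie-root-port device later gained per-port resource reservation properties, including one for a 64-bit prefetchable window (the size here is an arbitrary example):

    # Ask the firmware to reserve a 16G 64-bit prefetchable window
    # behind a hot-pluggable root port.
    -device pcie-root-port,id=rp1,chassis=1,pref64-reserve=16G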

The question is how much space to reserve. Reserving too much can restrict migration to hosts that support at least the same number of addressable bits.

Who can "guess" how much reservation we "need"? I suppose libvirt/nova: they can query the physical PCI devices that may be attached in the future, the host's CPU limitations, and so on.
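
For reference, libvirt already exposes a coarse knob in this area, the 64-bit PCI hole size on the root controller (the value below, 64 GiB, is an arbitrary example):

    <controller type='pci' index='0' model='pcie-root'>
      <pcihole64 unit='KiB'>67108864</pcihole64>
    </controller>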

However, this is too low-level, and we want to come up with a solution that does not involve libvirt/nova; at least we will try.

Thanks,
Marcel

Comment 5 Stephen Finucane 2018-08-28 09:49:12 UTC
There's no clear explanation of which use cases this feature would address. Until such an explanation is provided, I'm going to mark this as closed.