Description of feature:
Intel RSD platforms starting with v2.4 will support pooling of FPGA devices connected through PCIe interfaces. This request is for a tenant to be able to request FPGA resources from the pool to be attached to a node and made available to a VM on that node when a workload is instantiated with that request.

Version-Release number of selected component (if applicable):
OpenStack Nova version in the OpenStack Stein release

2. Business Justification:
a) Why is this feature needed?
As more and more applications are deployed to the cloud, performance becomes a more critical issue. FPGA-based acceleration is now seen as a cost-effective way to give specific workloads the additional computing resources they need to deliver on SLAs, whether in terms of throughput or reduced latency.
b) What hardware does this enable?
New FPGA hardware on Intel RSD platforms
c) Is this hardware on-board in a system (eg, LOM) or an add-on card?
FPGA accelerators will be available on PCIe add-on boards.
d) Business impact?
CSPs and Communication Service Providers (CoSPs) can deploy demanding workloads more cost-effectively.
e) Other business drivers:
N/A

3. Primary contact at Partner, email, phone (chat)
sundar.nadathur, lin.a.yang

4. Expected results:
- Pooled devices should be discovered and tracked by the Nova Resource Tracker via the Placement API.
- The RSD Pod Manager should be the source of device information.
- The Pod Manager will expose the topology (PCIe zones) of nodes and resources that can be composed together.
- Those zones will be marked as Nova host aggregates by an operator (with the help of Ansible scripts to automate the work).
- A scheduler filter will return a list of machines that are capable of providing the requested FPGA function (PF or VF) to the VM.
- A conductor entity will monitor VMs and resource attachment, and detach resources that are not in use.

Additional info:
- Links to blueprints and specs will be added as soon as they're done.
- Will need close interaction with the Cyborg community to ensure the Cyborg agent has the ability to act on the Nova request to attach an FPGA device.
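To make the expected results above more concrete, here is a minimal sketch of the zone-aware host selection and the "detach unused resources" check. It is only an illustration under stated assumptions: `PooledFpga`, `capable_hosts`, and `devices_to_detach` are hypothetical names, the data is held in plain in-memory structures, and none of this uses the real Nova, Placement, Cyborg, or Pod Manager APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PooledFpga:
    """One pooled FPGA as the Pod Manager might report it (hypothetical model)."""
    device_id: str
    zone: str                          # PCIe zone the device can be composed into
    function_type: str                 # "PF" or "VF"
    attached_to: Optional[str] = None  # host name, or None while still in the pool
    in_use_by: Optional[str] = None    # VM id, or None if no VM is using it


def capable_hosts(hosts_by_zone: Dict[str, List[str]],
                  devices: List[PooledFpga],
                  function_type: str) -> List[str]:
    """Return hosts that share a PCIe zone with a free FPGA of the requested type.

    hosts_by_zone stands in for the host aggregates an operator would
    create per PCIe zone (an assumption of this sketch).
    """
    free_zones = {
        d.zone for d in devices
        if d.function_type == function_type and d.attached_to is None
    }
    return sorted(
        host
        for zone, hosts in hosts_by_zone.items()
        if zone in free_zones
        for host in hosts
    )


def devices_to_detach(devices: List[PooledFpga]) -> List[str]:
    """Return IDs of devices attached to a host but not used by any VM."""
    return [
        d.device_id for d in devices
        if d.attached_to is not None and d.in_use_by is None
    ]


# Example data (made up for illustration only).
hosts_by_zone = {"zone-a": ["node1", "node2"], "zone-b": ["node3"]}
devices = [
    PooledFpga("fpga-0", "zone-a", "PF"),
    PooledFpga("fpga-1", "zone-b", "VF"),
    PooledFpga("fpga-2", "zone-b", "PF", attached_to="node3"),
]

print(capable_hosts(hosts_by_zone, devices, "PF"))
print(devices_to_detach(devices))
```

In this sketch a device that is already attached to a host is not offered to other hosts in its zone, and a device is a detach candidate once it is attached but no VM references it. The real feature would need to source this state from the Pod Manager and Placement rather than from local lists.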
Hi Krish.

This feature request has several unresolved dependencies.

First, Cyborg is not currently a supported project in OSP and is not currently targeted to be added in OSP 15. Can you open a separate Bugzilla to track that request and add it as a dependency for this request?

Adding Cyborg as a supported project is not trivial, as it would require packaging the project as an RPM, adding a set of Cyborg containers to Kolla, and then integrating the deployment of those containers with TripleO/director.

In addition to the generic Cyborg support above, OSP director would have to be enhanced to configure the Cyborg agent with the credentials for the PODM to enable this feature.

With that in mind, this is likely a candidate for OSP-next-next, not OSP 15.

As you indicated, this feature depends on upstream changes to Nova and Cyborg that are yet to be implemented. When you have that info available, please update this ticket with the relevant blueprints/reviews.

Finally, from my reading of the request, we would require a specific hardware configuration to develop and validate this feature request. In particular, a minimum of the following:
- 1 network switch for management/provisioning.
- 1 RSD 2.4 compatible PODM (could be deployed in a VM if reference code is used; otherwise this is an appliance).
- 1 RSD 2.4 compatible PCIe switch with PSME.
- 1+ RSD 2.4 compatible compute drawer with external PCIe backplane support.
- 1+ RSD 2.4 compatible FPGA drawer with external PCIe backplane support.
- 1+ FPGAs that are compatible with both RSD 2.4 and the Cyborg agent.
- 1+ standard servers for the OSP control plane and standard compute nodes.

Can you provide a detailed description of the hardware and topology required to deploy and test this feature, and indicate whether Intel would be able to provide a minimal RSD system as described above, or access to one in a lab, for the development and validation of this feature request?
(In reply to smooney from comment #1)

Sean, the basic Cyborg request is already at https://bugzilla.redhat.com/show_bug.cgi?id=1562173

Lin Yang will add links to the BPs or specs as they are submitted.

I will have to discuss the hardware availability with the team and get back to you. I believe Red Hat has had access to an RSD rack in one of our labs in the past and probably still does. I'll investigate.
Thanks Krish. I did not see that in my Bugzilla query. I have added it as a dependency.
As discussed in the Sep 27th engineering meeting: we understand Red Hat has moved this to RH OSP16 (enhancement work of Cyborg). FYI, Intel continues to work on this. Revisit based on upstream status/customer use case.
We are de-prioritizing this in favor of FPGA pooling over Ethernet fabric. We will open a separate BZ for the latter.
Closed based on the feedback from Intel.
BZ can be closed, as this project is being changed.