Bug 1659539

Summary: [Intel OSP16][SmartNIC] Support for VM-based workloads on SmartNICs
Product: Red Hat OpenStack Reporter: Krish Raghuram <krishnan.raghuram>
Component: openstack-nova Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED DEFERRED QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: unspecified    
Version: 16.0 (Train) CC: dasmith, eglynn, jhakimra, kchamart, krishnan.raghuram, mbooth, pchavva, pragyansri.pathi, sbauza, sgordon, stephenfin, sundar.nadathur, vromanso
Target Milestone: --- Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-02-26 17:45:59 UTC Type: Bug
Bug Blocks: 1595325, 1636090    

Description Krish Raghuram 2018-12-14 16:02:56 UTC
Description of feature:
Intel SmartNICs, starting with "Cascade Glacier", can offload network-related workloads from the main processor to improve throughput. The first use case, tracked in BZ #1629005 and BZ #1659531, is to offload ovs-vswitch onto a SmartNIC attached to a node and launch a bare-metal instance on that node. This request asks for equivalent support when launching VM instances.

Version-Release number of selected component (if applicable):
OpenStack Nova version in OpenStack Train release

2. Business Justification:
  a) Why is this feature needed?
     As more and more applications get deployed to the cloud, performance becomes a more critical issue, and networking bottlenecks become key inhibitors. SmartNIC-based acceleration is now seen as a cost-effective way to give specific workloads the additional computing resources (through offload onto the NIC) needed to deliver on SLAs, whether in terms of throughput or reduced latencies. 

     In this case, once the orchestrator receives a request to create a VM workload that needs network traffic acceleration of the kind a SmartNIC provides (with the corresponding hypervisor interface), the instance should be placed on a node that has the hardware. The VF assigned to the instance should then be presented to the VM through libvirt (open for discussion) and QEMU (as a starting point, with KVM later).

  b) What hardware does this enable?
     New Intel SmartNICs, starting with "Cascade Glacier".
  c) Is this hardware on-board in a system (e.g., LOM) or an add-on card?
     These are add-on cards.
  d) Business impact?
     Cloud Service Providers (CSPs) and Communication Service Providers (CoSPs) can deploy demanding workloads more cost-effectively.
  
  e) Other business drivers: N/A

3. Primary contact at Partner, email, phone (chat)
   sundar.nadathur, derek.a.chilcote.bacco

4. Expected results:

End-user workflow
1) User requests instance using nova API
2) Nova picks a host with the required SmartNIC features and creates a port
3) Nova spawns the instance on the host
4) Nova compute deploys the instance (details of connection setup to neutron ovs agent on SmartNIC to be provided later)
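
The host-selection part of this workflow can be illustrated with a toy model. This is only a sketch under stated assumptions, not Nova code: the Host class, the trait name CUSTOM_SMARTNIC_OVS_OFFLOAD, the free_vfs counter, and the pick_host/launch_instance helpers are all hypothetical, since this request does not say how SmartNIC capabilities would be advertised or how the port would be wired to the OVS agent on the SmartNIC.

```python
from dataclasses import dataclass, field

# Hypothetical trait name, for illustration only; the request does not
# define how SmartNIC features are exposed to the scheduler.
SMARTNIC_TRAIT = "CUSTOM_SMARTNIC_OVS_OFFLOAD"


@dataclass
class Host:
    """Toy stand-in for a compute node (not a Nova object)."""
    name: str
    traits: set = field(default_factory=set)
    free_vfs: int = 0


def pick_host(hosts, required_traits):
    """Return the first host with every required trait and a free VF."""
    for host in hosts:
        if required_traits <= host.traits and host.free_vfs > 0:
            return host
    return None


def launch_instance(hosts, required_traits):
    """Model steps 2-4 of the workflow for a single request."""
    host = pick_host(hosts, required_traits)  # step 2: choose a host
    if host is None:
        raise RuntimeError("no host offers the required SmartNIC features")
    host.free_vfs -= 1  # step 2: reserve one VF to back the port
    # Steps 3-4: the instance lands on the chosen host with that port.
    return {"host": host.name, "port": f"vf-on-{host.name}"}
```

With two hosts where only one carries the assumed trait and a free VF, launch_instance selects that host and decrements its VF count; a second request against the same pool fails because no VF is left.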

Additional info:
- Neutron blueprint - https://blueprints.launchpad.net/neutron/+spec/scalable-ovs-agent 
- Spec for Neutron remote ovs agent - https://review.openstack.org/#/c/595402/
- Link to Ironic code change submissions (from Mellanox) - https://review.openstack.org/#/c/582767/
- Links for Nova submissions will be added in comments when available

Comment 1 Krish Raghuram 2019-02-21 16:39:03 UTC
This can be de-prioritized for OSP16 and brought back once Intel raises the priority.

Comment 2 Pavan Chavva 2019-02-26 17:45:59 UTC
Closed based on the feedback from Intel.