Bug 1412053 - [Intel OSP13][RSD] Compose high-performance RSD nodes with NVMe drive pools over PCIe [NEEDINFO]
Summary: [Intel OSP13][RSD] Compose high-performance RSD nodes with NVMe drive pools o...
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rsdclient
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: z3
Target Release: 13.0 (Queens)
Assignee: James Slagle
QA Contact: Omri Hochman
URL: https://review.openstack.org/#/c/503841/
Whiteboard:
Depends On: 1466874
Blocks: epic-rsd 1419948 1422243
Reported: 2017-01-11 04:54 UTC by Krish Raghuram
Modified: 2020-04-27 01:34 UTC (History)

Fixed In Version: python-rsdclient-0.1.1-1.el7ost
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
pchavva: needinfo? (racedoro)


Description Krish Raghuram 2017-01-11 04:54:00 UTC
1. Description of feature:
RSD Controller can help TripleO compose high performance nodes with NVMe drive pooling over PCIe.
By composing such nodes, RH OSP end users can deploy high priority customer workloads on extremely efficient non-volatile storage backends used as ephemeral storage. Storage is available only for the life span of the node. NVMe storage is delivered back to the pool post-usage.

Version-Release number of selected component (if applicable):
TripleO version in OpenStack Pike release

2. Business Justification:
  a) Why is this feature needed?
     RSD is a new architecture that realizes an agile infrastructure where hardware resources can be pooled according to application needs. It also makes the infrastructure easier to scale: CPU, memory, network and storage resources can be added as needed, without replacing entire nodes.
  b) What hardware does this enable?
  
  c) Is this hardware on-board in a system (eg, LOM) or an add-on card? 
  
  d) Business impact? N/A
  
  e) Other business drivers: N/A

3. Primary contact at Partner, email, phone (chat)
   Priyank Durugkar
   priyank.durugkar@intel.com

4. Expected results:
- DC admin logs in to TripleO
- Based on QoS policies, TripleO composes a high-performance (Xeon) RSD logical node with an NVMe drive as ephemeral storage
- Later, the end user uses this RSD logical node to host/run high-performance workloads
- Once the end user finishes running the workload, TripleO deletes the node and the NVMe resource is back in the RSD pool as "available"
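
The lifecycle in the expected results above can be sketched as a small in-memory model. This is only an illustration of the intended pool behavior (compose a node with a drive, delete the node, drive returns as "available"). The class and method names (`DrivePool`, `compose_node`, `delete_node`) and the drive identifiers are hypothetical and are not part of rsd-lib, python-rsdclient, or TripleO.

```python
# Hypothetical in-memory model of the NVMe pool lifecycle.
# Not the rsd-lib or python-rsdclient API; names are made up for illustration.
from dataclasses import dataclass, field


@dataclass
class DrivePool:
    """Tracks which NVMe drives are available and which are attached to a node."""
    available: set = field(default_factory=set)
    allocated: dict = field(default_factory=dict)  # node name -> drive id

    def compose_node(self, node_name: str) -> str:
        """Attach one available drive to a new logical node and return its id."""
        if node_name in self.allocated:
            raise ValueError(f"node {node_name!r} already exists")
        if not self.available:
            raise RuntimeError("no NVMe drive available in the pool")
        drive = min(self.available)  # deterministic pick, for the example only
        self.available.remove(drive)
        self.allocated[node_name] = drive
        return drive

    def delete_node(self, node_name: str) -> None:
        """Delete the node and hand its drive back to the pool as available."""
        drive = self.allocated.pop(node_name)
        self.available.add(drive)


pool = DrivePool(available={"nvme-0", "nvme-1"})
pool.compose_node("hp-node-1")   # drive leaves the pool for the node's lifetime
pool.delete_node("hp-node-1")    # drive is available again
```

The point of the model is that the drive is ephemeral from the node's perspective: it is owned by exactly one node while that node exists and is never left in an intermediate state after deletion.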

Additional info:

Comment 2 Jaromir Coufal 2017-01-24 20:30:34 UTC
Can we be more specific about the output of this RFE (is it a role definition with default params?)? Isn't it only a documentation effort in the end, since the mechanism for such a use case is already available? Are some puppet modules / heat config missing? Who is expected to work on this feature? Who is expected to test it?

Comment 3 Joe Donohue 2017-01-24 21:07:35 UTC
Hi Krish,

Could you respond to the questions posed in comment #2?

Thanks,
Joe

Comment 4 Krish Raghuram 2017-01-24 21:11:41 UTC
(In reply to Joe Donohue from comment #3)
> Hi Krish,
> 
> Could you respond to the questions posed in comment #2?
> 
> Thanks,
> Joe

We had put this in as a placeholder while still in internal discussion about what exactly needs to be done in TripleO. Please bear with us; we should be able to get back to you in a couple of weeks.

Comment 5 Krish Raghuram 2017-09-18 15:51:30 UTC
(In reply to Jaromir Coufal from comment #2)
> Can we be more specific on what is the output of this RFE (is it role
> definition with default params?)? Isn't it only documentation effort in the
> end? Since the mechanism for such use-case is available. Is there missing
> some puppet modules / heat config? Who is expected to work on this feature?
> Who is expected to test it?

We have provided the capability to compose an RSD node with PCIe-attached NVMe storage in the RSDclient and RSDlib libraries, to be used by OpenStack Client plug-ins (https://github.com/openstack/python-rsdclient and https://github.com/openstack/rsd-client).

We're now working on an Ironic plug-in driver to recognize the presence of this PCIe-attached storage, and hope to get it into the Queens release.

Red Hat will need to test the ability of TripleO to provision an RSD node with this attached storage. A future enhancement will be to allow remote boot from this PCIe-attached storage.

Comment 6 Ramon Acedo 2017-10-20 10:11:14 UTC
Krish, since the ability to compose and show RSD nodes with NVMe storage devices is done in the python-rsdclient and TripleO/director will be able to use these nodes as Overcloud nodes as done with any other type of hardware, what type of functionality is described here?

In my view, it would be desirable for Ironic inspector to see these storage devices. Is that what's requested here? If that's the case, we could track these tests here.

Could you have the workflow tested and the results reported here? If there are any gaps, we can track them here.

This would be the workflow we envision:

 a. operator composes RSD logical nodes with NVMe storage (via the OSC python-rsdclient plug-in)

 b. operator registers nodes to TripleO/director using their Redfish interface

 c. operator uses root device hints to instruct TripleO/director to deploy Overcloud nodes' OS on the NVMe disks (provided there are more disks to install on other than the NVMe)

 d. operator deploys Overcloud on the composed RSD logical nodes

Please, confirm if our understanding is right. Many thanks.
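
Steps b and c of the workflow above can be sketched as follows. This is a minimal sketch under stated assumptions, not a tested procedure: the `pm_*` keys follow the instackenv.json conventions for the Redfish driver as commonly documented for TripleO, and the address, system ID, credentials, node name and device path are placeholders. The helper function names are hypothetical. Step a (composition via python-rsdclient) is not shown.

```python
import json
import shlex


def redfish_node_entry(name, address, system_id, user, password):
    """Build one instackenv.json-style node entry for a Redfish-managed node.

    Assumption: key names follow TripleO's Redfish registration conventions.
    """
    return {
        "name": name,
        "pm_type": "redfish",
        "pm_addr": address,
        "pm_system_id": system_id,
        "pm_user": user,
        "pm_password": password,
    }


def root_device_hint_command(node, device_name):
    """Return the CLI command that sets a root device hint on an Ironic node."""
    hint = json.dumps({"name": device_name})
    return "openstack baremetal node set {} --property root_device={}".format(
        shlex.quote(node), shlex.quote(hint)
    )


# Placeholder values; replace with the composed node's real Redfish details.
entry = redfish_node_entry(
    name="rsd-node-0",
    address="192.0.2.10",
    system_id="/redfish/v1/Systems/1",
    user="admin",
    password="changeme",
)
print(json.dumps({"nodes": [entry]}, indent=2))
print(root_device_hint_command("rsd-node-0", "/dev/nvme0n1"))
```

The hint matters only when the node exposes more than one disk, as noted in step c; with a single disk there is nothing to disambiguate.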

Comment 7 Joe Donohue 2017-10-30 18:56:46 UTC
Adding needinfo to Krish per comment #6.

Comment 8 Krish Raghuram 2017-10-31 19:15:27 UTC
(In reply to Ramon Acedo from comment #6)
> Krish, since the ability to compose and show RSD nodes with NVMe storage
> devices is done in the python-rsdclient and TripleO/director will be able to
> use these nodes as Overcloud nodes as done with any other type of hardware,
> what type of functionality is described here?
> 
> In my view, it would be desirable that Ironic inspector can see these
> storage devices, would that be what's requested here? If that's the case, we
> could track this tests here. 
> 
> Could you have the workflow tested and reported here. If there are any gaps,
> we can track them here.
> 
> This would be the workflow we envision:
> 
>  a. operator composes RSD logical nodes with NVMe storage (via the OSC
> python-rsdclient plug-in)
> 
>  b. operator registers nodes to TripleO/director using their Redfish
> interface
> 
>  c. operator uses root device hints to instruct TripleO/director to deploy
> Overcloud nodes' OS on the NVMe disks (provided there are more disks to
> install on other than the NVMe)
> 
>  d. operator deploys Overcloud on the composed RSD logical nodes
> 
> Please, confirm if our understanding is right. Many thanks.

Yes, this is correct, Ramon. The basic work needed to compose nodes with PCIe-attached NVMe drives is already in rsdlib/rsdclient. We just need to test as you've described.

Comment 9 Ramon Acedo 2017-11-01 15:51:17 UTC
Thanks. Adding BZ#1466874, which tracks the inclusion of rsd-lib and python-rsdclient, as a dependency.

When the packages are ready and you can test it, please, post here the test plan and results.

Comment 15 Lon Hohberger 2018-07-23 10:34:46 UTC
According to our records, this should be resolved by python-rsdclient-0.1.1-1.el7ost.  This build is available now.

Comment 17 Sean Merrow 2018-08-16 20:55:52 UTC
Krish, this is ready to be QA'd by Intel. It is in build python-rsdclient-0.1.1-1.el7ost or later. Please post results when complete.

Thanks,
Sean

Comment 18 Krish Raghuram 2018-08-24 16:55:02 UTC
(In reply to Sean Merrow from comment #17)
> Krish, this is ready to be QA'd by Intel. It is in build
> python-rsdclient-0.1.1-1.el7ost or later. Please post results when complete.
> 
> Thanks,
> Sean

Sean, we don't have a test setup for RH OSP, and thus cannot do QA on this. It is our expectation that you will confirm that RH Director is able to deploy nodes on the overcloud with NVMe storage attached, and initiate the boot sequence to bring them up. Our work is restricted to testing and verifying that the nodes can be composed with NVMe storage and registered with OpenStack Ironic in the undercloud.

Krish

