Bug 1214284

Summary: [RFE][spine-leaf] Support deploying RHEL OSP nodes on a L3 leaf-spine network topology
Product: Red Hat OpenStack
Reporter: Ramon Acedo <racedoro>
Component: openstack-tripleo-common
Assignee: James Slagle <jslagle>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: abdelhadi.chari, achernet, athomas, bfournie, brault, djuran, dsneddon, dtantsur, ealcaniz, ecv-redhat, erkki.peura, gcharot, gdrapeau, gkeegan, hbrock, hjensas, ibodunov, jcoufal, jschluet, kbasil, kimi.zhang, luca.miccini, lyarwood, mburns, mcornea, mrussell, pablo.iranzo, pneedle, psanchez, radoslaw.smigielski, rbinkhor, rhel-osp-director-maint, sasha, sclewis, shetze, slinaber, sputhenp, supadhya
Target Milestone: Upstream M3
Keywords: FutureFeature, Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
URL: https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-deployment
Whiteboard: upstream_milestone_pike-2 upstream_definition_review upstream_status_not-started
Fixed In Version: openstack-tripleo-common-8.4.1-0.20180224032817.d51ed49.el7ost puppet-neutron-12.3.1-0.20180222064632.07b93f1.el7ost
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:26:22 UTC
Type: Feature Request
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1266597, 1284124, 1337769, 1337770, 1403955, 1406102, 1420524, 1420525, 1441780, 1441788, 1477642, 1535593, 1560690, 1687381
Bug Blocks: 1347518, 1241596, 1265219, 1419948, 1442136, 1459148, 1594606

Description Ramon Acedo 2015-04-22 10:59:33 UTC
A number of network architects at sites where RHEL OSP is planned to be deployed with the Installer are requesting that we support an L3 leaf-spine network topology, in which the ToR (leaf) switches (usually) reach each other over L3 routed links via the spine switches.

In this topology, each rack will have hosts in different subnets. Currently the OSP Installer supports just one "Management" network, one "Storage" network, etc. With this topology, multiple "Management", "Storage", etc, networks should be supported.
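
A minimal sketch of the per-rack addressing this implies, assuming each isolated network is given a supernet that is split into one subnet per rack. The function name per_rack_subnets, the rack names, and the address ranges are hypothetical and chosen only for illustration; this is not how the Installer allocates networks.

```python
import ipaddress


def per_rack_subnets(supernets, racks, new_prefix=24):
    """Carve one subnet per rack out of each network's supernet.

    supernets: mapping of network name -> CIDR string (hypothetical values).
    racks: ordered list of rack names (hypothetical values).
    """
    plan = {}
    for name, cidr in supernets.items():
        # subnets() yields consecutive subnets of the requested prefix length.
        subnets = ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix)
        plan[name] = {rack: next(subnets) for rack in racks}
    return plan


# Assumed example ranges, not taken from any real deployment.
supernets = {"Management": "172.16.0.0/16", "Storage": "172.17.0.0/16"}
racks = ["rack1", "rack2", "rack3"]

plan = per_rack_subnets(supernets, racks)
for name, by_rack in plan.items():
    for rack, subnet in by_rack.items():
        print(f"{name:<10} {rack}: {subnet}")
```

Adding a rack only extends the list of racks, which is the property a fixed, pre-created set of networks lacks.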

In addition, we support only one "Provisioning" network for provisioning via PXE boot. This should also be addressed, for example by requiring that DHCP relays be set up in the network.

Comment 3 chris alfonso 2015-07-31 18:10:07 UTC
I believe your network isolation implementation addresses this. Can you confirm?

Comment 4 Dan Sneddon 2015-08-24 16:32:19 UTC
(In reply to chris alfonso from comment #3)
> I believe your network isolation implementation addresses this. Can you
> confirm?

No, network isolation configures multiple networks, but they span every rack. This request is to have isolated networks per rack.

This is a feature we really want to add. We were hoping that the segmentation logic would make it into Neutron, since there was a Blueprint from almost a year ago, but that blueprint never gained traction. There is another proposal for network segmentation in Neutron being discussed, but it's just getting off the ground.

We rely on Neutron to issue the IP addresses that we use for static IPs on the network interfaces. We create a set of isolated networks. Without segmentation logic in Neutron, we would need to create a separate set of isolated networks for each rack. Since we don't know in advance how many racks there will be, we don't know how many networks to create.

We also have the issue that Undercloud DHCP does not currently support DHCP relay. This is required because, when a system in a leaf rack tries to boot for discovery or deployment, it will request DHCP on its local network. The top-of-rack switch can proxy DHCP requests, but ironic-discovery DHCP doesn't support DHCP relay with multiple scopes for different L2 domains.
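
A minimal sketch of what relay-aware scope selection could look like, assuming the standard DHCP behaviour in which a relay agent fills in the giaddr field and the server answers from the pool whose subnet contains that address. The scope table, the addresses, and the select_scope helper are hypothetical; this is not the ironic-discovery or Undercloud implementation.

```python
import ipaddress

# Hypothetical scopes: one provisioning subnet per L2 domain (rack),
# each with its own address pool (first, last).
SCOPES = {
    "192.168.10.0/24": ("192.168.10.100", "192.168.10.200"),  # local to the DHCP server
    "192.168.11.0/24": ("192.168.11.100", "192.168.11.200"),  # leaf rack 1, via relay
    "192.168.12.0/24": ("192.168.12.100", "192.168.12.200"),  # leaf rack 2, via relay
}
LOCAL_SUBNET = "192.168.10.0/24"


def select_scope(giaddr, scopes=SCOPES, local_subnet=LOCAL_SUBNET):
    """Return the CIDR of the scope that should serve a DHCP request.

    giaddr is the relay agent address from the request; "0.0.0.0" means
    the request arrived directly on the server's own L2 segment.
    """
    if giaddr == "0.0.0.0":
        return local_subnet
    relay = ipaddress.ip_address(giaddr)
    for cidr in scopes:
        if relay in ipaddress.ip_network(cidr):
            return cidr
    return None  # no scope configured for this relay; the request is ignored


print(select_scope("0.0.0.0"))
print(select_scope("192.168.11.1"))
print(select_scope("10.0.0.1"))
```

A server with a single scope can only ever return the local subnet, which is why relayed requests from leaf racks cannot be served without multiple scopes.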

Comment 10 Dan Sneddon 2016-06-14 17:10:03 UTC
I submitted a blueprint upstream for the spine-and-leaf development:
https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-deployment

Comment 26 Bob Fournier 2018-01-17 18:49:42 UTC
FFE text:

This doc is to request a Feature Freeze exception for OSP-13 for the following BZs which are related to the Spine/Leaf functionality for OSP-13. 
https://bugzilla.redhat.com/show_bug.cgi?id=1214284
https://bugzilla.redhat.com/show_bug.cgi?id=1477642
In OSP-12, Spine/Leaf with composable networks was added, but it did not include support for a routed provisioning network. This routed provisioning network support is what is planned for OSP-13. The remaining patches are up for review and include changes in THT, heat, instack-undercloud, networking-baremetal (formerly under ironic), and neutron.
It is anticipated that the remaining patches will land in M3. 
Regarding the set of questions for these BZs:
What is the potential impact to other DFGs?
There will be very little impact to other DFGs; however, there are some services which still require a shared VLAN in order to perform clustering. We will need to identify which services can be spread across multiple subnets, and then potentially look at refactoring those services in a future version.
What are the new software dependencies, if any, required by the feature?
All associated patches are listed in the BZs.
Will the DFG be able to test it in time for the release? 
In Pike we had a joint collaboration between QE and development to test many features and we plan on doing this also for this feature.  The entire composable networks setup was developed between development and QE.
Will it block testing of anything else?
No.
Will documentation be able to do its work in time? (Have you checked with the doc team?)
This feature, as the related Spine/Leaf feature did in Pike, requires a lot of engineering assistance for documentation. As a DFG we anticipate helping to document this feature. We have already produced a Spine/Leaf document and we will be augmenting this document for these BZs.
What other work will not get done?
We have engineering resources dedicated to this feature so it will not affect any other features planned for OSP-13. Additionally, almost all of the coding is finished, and we are in the process of obtaining reviews and doing final cleanup.
What is the business impact of not doing this now?
We have had multiple customers requesting support for Spine/Leaf.  In OSP-12, a partial solution was implemented that did not include the provisioning network.  This feature completes the Spine/Leaf functionality and is desired by customers that do not want to implement the partial solution in OSP-12. In particular, NFV customers have been waiting for this feature to be available for a number of releases.
What is the best case/worst case scenario for this landing (timing/impact)?
Best case: All remaining patches associated with both BZs land in OSP-13 M3.
Good case: Enough patches land in OSP-13 M3 so that a “Spine/Leaf with routed provisioning network” solution can be implemented with some manual configuration steps documented.
Worst case: Remaining patches do not land and Spine/Leaf provisioning network support gets deferred to OSP-14.
What other Feature Freeze Exceptions are being requested?
https://docs.google.com/document/d/1fNgXIYq1-L8XzU5XQYYG9CsGSTws2Uudp2IUUn9qk1o/edit#heading=h.aqg8t797wddr
https://docs.google.com/document/d/10tCf3LmSoHoO9kPfbCY99dP2ElMqksjC_OzLig9psVw/edit
https://docs.google.com/document/d/1GBj-ignKGfcFdt3UPZtwdwRRuVw0Qe7aI8fBggfsykE/edit#

Comment 27 Bob Fournier 2018-02-12 15:03:40 UTC
*** Bug 1337769 has been marked as a duplicate of this bug. ***

Comment 30 Alexander Chuzhoy 2018-05-07 20:18:48 UTC
Verified:

Environment:
openstack-tripleo-common-8.6.1-6.el7ost.noarch
puppet-neutron-12.4.1-0.20180412211913.el7ost.noarch

Successfully introspected nodes on separated spine leafs and deployed OC.
Successfully populated the OC with entities too.

Comment 32 errata-xmlrpc 2018-06-27 13:26:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086