Bug 1621919

Summary: [Netvirt] ODL should take into account bridge/provider mappings when scheduling routing
Product: Red Hat OpenStack
Reporter: Tim Rozet <trozet>
Component: opendaylight
Assignee: Aswin Suryanarayanan <asuryana>
Status: CLOSED ERRATA
QA Contact: Noam Manos <nmanos>
Severity: high
Docs Contact:
Priority: medium
Version: 13.0 (Queens)
CC: aadam, dsneddon, dsorrent, harsh.kotak, jjoyce, ksavich, mariel, mkolesni, nyechiel, sgaddam, trozet
Target Milestone: z4
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard: Netvirt
Fixed In Version: opendaylight-8.3.0-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-16 17:56:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Tim Rozet 2018-08-23 21:34:51 UTC
ODL North/South routing works by dedicating a compute node to do SNAT/DNAT, while FIP routing is done locally on the compute node where the instance runs. The problem is the scheduling: any node may be chosen, regardless of whether that node actually has external network access. The same issue applies to FIP, which assumes that the node hosting the instance the FIP is associated with has external network access.

For example, take Computes A and B. Compute A has external access via an external network created for physnet datacentre, which maps to the br-ex bridge. Compute B has no br-ex bridge and no external network access.

In this example, when SNAT/DNAT is scheduled to a compute node, ODL should check whether that node is part of the external network being attached by examining the provider mappings in its hostconfig.

For FIP (static NAT), the same issue occurs. If an instance is scheduled on Compute B and a floating IP is associated with it, ODL will assume that Compute B has external network access. In this case the flows should instead be installed so that the FIP is handled by a node that actually does have external access (Compute A in this example).
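To make the intended check concrete, here is a minimal, hypothetical sketch in plain Java (not the actual Netvirt code): it parses a node's bridge/provider mappings string, as reported via hostconfig, and decides whether that node can serve external traffic for a given physnet. Class and method names are made up for illustration.

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not Netvirt code): decides whether a compute node can
// serve external traffic for a given physical network, based on the bridge
// mappings it reports in its hostconfig.
public final class ExternalAccessCheck {

    // Parses a mapping string such as "datacentre:br-ex;physnet_b:br-ex-b"
    // into {physnet -> bridge}.
    static Map<String, String> parseBridgeMappings(String mappings) {
        return Arrays.stream(mappings.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.split(":", 2))
                .filter(parts -> parts.length == 2)
                .collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));
    }

    // A node is a valid SNAT/DNAT or FIP candidate for an external network
    // only if its mappings contain that network's physnet.
    static boolean hasExternalAccess(String bridgeMappings, String physnet) {
        return parseBridgeMappings(bridgeMappings).containsKey(physnet);
    }

    public static void main(String[] args) {
        // Compute A maps physnet "datacentre" to br-ex; Compute B has no mapping.
        System.out.println(hasExternalAccess("datacentre:br-ex", "datacentre")); // true
        System.out.println(hasExternalAccess("", "datacentre"));                 // false
    }
}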

Comment 1 Dan Sneddon 2018-08-23 22:03:47 UTC
A use case for this is a multi-site deployment with different external and provider network(s) at each edge site. Edge site A uses br-ex-a bridge, while edge site B uses br-ex-b bridge. When scheduling a compute node to do SNAT/DNAT for a particular network, ODL should check to see that the physical network used by the network exists on the compute node.

For instance, suppose deployment-wide bridge mappings are "physnet_a:br-ex-a;physnet_b:br-ex-b", and there are two networks: net_a has physical_network=physnet_a, and net_b has physical_network=physnet_b. When scheduling a SNAT/DNAT worker for net_a, ODL should include only compute nodes with bridge br-ex-a, and similarly only compute nodes with br-ex-b as candidates for net_b.
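A minimal sketch of the candidate filtering described above, assuming each compute's bridge mappings are known as a "physnet:bridge" string; the helper names are hypothetical and not part of ODL.

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of the candidate filtering described above: only
// computes whose bridge mappings include the network's physical_network are
// eligible to host SNAT/DNAT for that network.
public final class NaptCandidateFilter {

    static List<String> candidatesFor(Map<String, String> nodeBridgeMappings,
                                      String physnet) {
        return nodeBridgeMappings.entrySet().stream()
                .filter(e -> Arrays.stream(e.getValue().split(";"))
                        .anyMatch(m -> m.trim().startsWith(physnet + ":")))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Deployment-wide mappings as in the example above.
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("compute-a", "physnet_a:br-ex-a");
        mappings.put("compute-b", "physnet_b:br-ex-b");

        System.out.println(candidatesFor(mappings, "physnet_a")); // [compute-a]
        System.out.println(candidatesFor(mappings, "physnet_b")); // [compute-b]
    }
}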

Comment 2 Aswin Suryanarayanan 2018-08-25 08:44:19 UTC
Currently ODL assumes all computes have external network access. To support the topology described above, we can consider the following changes.

1) For SNAT, the centralized switch is scheduled from a pool of available switches, and currently that pool contains all switches. We can instead maintain separate pools based on the provider mappings. When selecting a NAPT switch, the provider mapping of the external network is used to choose the pool, and a switch is then selected from that pool. This ensures the selected switch has the expected bridges (see the first sketch below).

2) For FIP, before programming the flows on the local compute node, we can check whether the compute has the necessary bridge mappings. The data structure from (1), where nodes are divided into pools, can be used for that. If the node has the appropriate bridge mappings, we continue to program the flows locally. If not, we can reuse the NAPT switch selected in (1) for the FIP translations. In that case no FIP flows are added to the local compute; all of them, including the ARP responder flows for the FIP, are added to the remote compute. The current pipeline is capable of carrying the packet from the local node (which would be a non-NAPT switch) to the NAPT switch, doing the FIP forward and reverse translations there, and delivering the return packet back to the local node (see the second sketch below).
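First, a rough sketch of the pool-per-provider-mapping idea in (1): switches are grouped by the physnets they can reach, and the NAPT switch is picked only from the pool matching the external network's physnet. This is illustrative Java, not the actual NAPT scheduler; the names and the round-robin policy are assumptions.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "pools per provider mapping": switches are grouped by
// the physnets they can reach, and the NAPT switch for a router is picked only
// from the pool of its external network's physnet.
public final class NaptSwitchPools {

    private final Map<String, List<String>> poolsByPhysnet = new HashMap<>();
    private final Map<String, Integer> nextIndex = new HashMap<>();

    // Register a switch (compute node) in the pool of every physnet it maps.
    void addSwitch(String nodeId, List<String> physnets) {
        for (String physnet : physnets) {
            poolsByPhysnet.computeIfAbsent(physnet, k -> new ArrayList<>()).add(nodeId);
        }
    }

    // Round-robin selection within the pool of the external network's physnet.
    Optional<String> selectNaptSwitch(String externalPhysnet) {
        List<String> pool = poolsByPhysnet.getOrDefault(externalPhysnet, List.of());
        if (pool.isEmpty()) {
            return Optional.empty();
        }
        int idx = nextIndex.merge(externalPhysnet, 1, Integer::sum) % pool.size();
        return Optional.of(pool.get(idx));
    }

    public static void main(String[] args) {
        NaptSwitchPools pools = new NaptSwitchPools();
        pools.addSwitch("compute-a", List.of("datacentre"));
        pools.addSwitch("compute-c", List.of("datacentre"));
        // compute-b has no external mapping and is never added to any pool.

        System.out.println(pools.selectNaptSwitch("datacentre")); // a node with br-ex
        System.out.println(pools.selectNaptSwitch("physnet_x"));  // Optional.empty
    }
}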
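Second, a sketch of the FIP placement decision in (2): all FIP flows, including the ARP responder, are programmed either on the local compute (when it has the external mapping) or on the NAPT switch selected in (1). The FlowProgrammer interface is imaginary and only stands in for whatever Netvirt uses to install flows.

// Hypothetical sketch of the FIP placement decision: if the compute hosting
// the VM has the external network's bridge mapping, FIP flows (and the ARP
// responder) stay local; otherwise they are all programmed on the NAPT switch.
public final class FipPlacement {

    interface FlowProgrammer {
        void installFipFlows(String nodeId, String fipAddress);
        void installArpResponder(String nodeId, String fipAddress);
    }

    static void placeFip(String localNode, boolean localHasExternalAccess,
                         String naptSwitch, String fipAddress,
                         FlowProgrammer programmer) {
        // All FIP-related flows go to one node: local when it has external
        // access, otherwise the centralized NAPT switch.
        String target = localHasExternalAccess ? localNode : naptSwitch;
        programmer.installFipFlows(target, fipAddress);
        programmer.installArpResponder(target, fipAddress);
    }

    public static void main(String[] args) {
        FlowProgrammer logOnly = new FlowProgrammer() {
            public void installFipFlows(String n, String fip) {
                System.out.println("FIP flows for " + fip + " on " + n);
            }
            public void installArpResponder(String n, String fip) {
                System.out.println("ARP responder for " + fip + " on " + n);
            }
        };
        // Compute B (no br-ex) falls back to the NAPT switch, Compute A.
        placeFip("compute-b", false, "compute-a", "10.0.0.50", logOnly);
    }
}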

Comment 3 Tim Rozet 2018-08-27 16:07:26 UTC
This sounds good to me, Aswin.

Comment 4 Mike Kolesnik 2018-08-28 13:33:23 UTC
(In reply to Aswin Suryanarayanan from comment #2)
> 
> 2) For FIP, before programming the flows on the local compute node, we can
> check whether the compute has the necessary bridge mappings. The data
> structure from (1), where nodes are divided into pools, can be used for
> that. If the node has the appropriate bridge mappings, we continue to
> program the flows locally. If not, we can reuse the NAPT switch selected in
> (1) for the FIP translations. In that case no FIP flows are added to the
> local compute; all of them, including the ARP responder flows for the FIP,
> are added to the remote compute. The current pipeline is capable of carrying
> the packet from the local node (which would be a non-NAPT switch) to the
> NAPT switch, doing the FIP forward and reverse translations there, and
> delivering the return packet back to the local node.

In this case you'll also need to consider that these FIP flows need to move should the NAPT switch migrate for any reason, or if that node goes down (which I'm not sure we have a good way to check). Also, what happens if the FIP flows get left over on the old node? How will that affect the network?

To me this sounds like a different bug which should be tracked on its own (even though the root cause is the same).

Comment 5 Aswin Suryanarayanan 2018-08-29 08:17:49 UTC
(In reply to Mike Kolesnik from comment #4)
> (In reply to Aswin Suryanarayanan from comment #2)
> > 
> > 2) For FIP, before programming the flows on the local compute node, we can
> > check whether the compute has the necessary bridge mappings. The data
> > structure from (1), where nodes are divided into pools, can be used for
> > that. If the node has the appropriate bridge mappings, we continue to
> > program the flows locally. If not, we can reuse the NAPT switch selected
> > in (1) for the FIP translations. In that case no FIP flows are added to
> > the local compute; all of them, including the ARP responder flows for the
> > FIP, are added to the remote compute. The current pipeline is capable of
> > carrying the packet from the local node (which would be a non-NAPT switch)
> > to the NAPT switch, doing the FIP forward and reverse translations there,
> > and delivering the return packet back to the local node.
> 
> In this case you'll also need to consider that these FIP flows need to move
> should the NAPT switch migrate for any reason, or if that node goes down
> (which I'm not sure we have a good way to check). Also, what happens if the
> FIP flows get left over on the old node? How will that affect the network?
> 
> To me this sounds like a different bug which should be tracked on its own
> (even though the root cause is the same).

With this bug we can ensure that the flows are configured on a node which has external connectivity. As you mentioned, the failover handling for the FIP flows on the NAPT switch can be tracked as a new bug.
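To illustrate the failover concern raised in comments 4 and 5, here is a hedged sketch of what moving centralized FIP flows on a NAPT switch change might look like. The FlowProgrammer interface, the bookkeeping map, and the change notification hook are placeholders, not real Netvirt APIs; the actual behavior would depend on how NAPT switch re-election is handled in ODL.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: when the NAPT switch changes, FIP flows that were
// centralized there are removed from the old switch and re-installed on the
// new one, so no stale flows are left behind.
public final class FipFailoverHandler {

    interface FlowProgrammer {
        void removeFipFlows(String nodeId, String fipAddress);
        void installFipFlows(String nodeId, String fipAddress);
    }

    // Tracks which node currently carries each centralized FIP's flows.
    private final Map<String, String> fipToNode = new ConcurrentHashMap<>();
    private final FlowProgrammer programmer;

    FipFailoverHandler(FlowProgrammer programmer) {
        this.programmer = programmer;
    }

    void onFipCentralized(String fipAddress, String naptSwitch) {
        fipToNode.put(fipAddress, naptSwitch);
        programmer.installFipFlows(naptSwitch, fipAddress);
    }

    // Called when the NAPT switch for the router is re-elected or goes down.
    void onNaptSwitchChanged(String oldSwitch, String newSwitch) {
        fipToNode.forEach((fip, node) -> {
            if (node.equals(oldSwitch)) {
                programmer.removeFipFlows(oldSwitch, fip); // avoid stale flows
                programmer.installFipFlows(newSwitch, fip);
                fipToNode.put(fip, newSwitch);
            }
        });
    }

    public static void main(String[] args) {
        FlowProgrammer logOnly = new FlowProgrammer() {
            public void removeFipFlows(String n, String fip) {
                System.out.println("remove " + fip + " from " + n);
            }
            public void installFipFlows(String n, String fip) {
                System.out.println("install " + fip + " on " + n);
            }
        };
        FipFailoverHandler handler = new FipFailoverHandler(logOnly);
        handler.onFipCentralized("10.0.0.50", "compute-a");
        // compute-a goes away; compute-c is elected as the new NAPT switch.
        handler.onNaptSwitchChanged("compute-a", "compute-c");
    }
}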

Comment 7 Aswin Suryanarayanan 2018-09-21 10:47:23 UTC
The patch is not merged upstream.

Comment 23 errata-xmlrpc 2019-01-16 17:56:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0093