Bug 1570843 - East/West traffic goes through controller node on DVR-VLAN deployment
Summary: East/West traffic goes through controller node on DVR-VLAN deployment
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: zstream
Target Release: 13.0 (Queens)
Assignee: Miguel Angel Ajo
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 1561880
Blocks:
 
Reported: 2018-04-23 14:11 UTC by Eran Kuris
Modified: 2019-09-09 16:07 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-18 11:02:31 UTC
Target Upstream Version:
Embargoed:



Description Eran Kuris 2018-04-23 14:11:36 UTC
Description of problem:
When sending East/West traffic (traffic between two instances on different tenant networks and on different compute nodes), we can see the traffic going through the controller node.
The issue was found in the networking-ovn-13_director-rhel-virthost-3cont_2comp-ipv4-vlan-dvr job.



Version-Release number of selected component (if applicable):
OSP13 2018-04-19.2
 rpm -qa | grep ovn 
openstack-nova-novncproxy-17.0.3-0.20180413225830.fda768b.el7ost.noarch
puppet-ovn-12.4.0-0.20180329043503.36ff219.el7ost.noarch
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
python-networking-ovn-metadata-agent-4.0.1-0.20180405185449.b9e550d.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-4.0.1-0.20180405185449.b9e550d.el7ost.noarch
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
[root@controller-0 ~]# rpm -qa | grep triple
puppet-tripleo-8.3.2-0.20180411174307.el7ost.noarch
How reproducible:
100%

Steps to Reproduce:
1.openstack network create net-64-2
2.openstack subnet create --subnet-range 10.0.2.0/24  --network net-64-2 --dhcp subnet_4_2
3.openstack subnet create --subnet-range 2002::/64 --network net-64-2  --ipv6-address-mode slaac  --ipv6-ra-mode slaac --ip-version 6 subnet_6_2
4.openstack network create net-64-1
5.openstack subnet create --subnet-range 10.0.1.0/24  --network net-64-1 --dhcp subnet_4_1
6.openstack subnet create --subnet-range 2001::/64 --network net-64-1  --ipv6-address-mode slaac  --ipv6-ra-mode slaac --ip-version 6 subnet_6_1
7.openstack router create Router_eNet
8.openstack router add subnet Router_eNet subnet_4_2
9.openstack router add subnet Router_eNet subnet_4_1
10.openstack router add subnet Router_eNet subnet_6_1
11.openstack router add subnet Router_eNet subnet_6_2
12.openstack router set --external-gateway nova Router_eNet
13.openstack security group rule create --protocol icmp --ingress --prefix 0.0.0.0/0 8416b209-7674-4a7d-ae38-76389b0584b5
14.openstack security group rule create --protocol tcp --ingress --prefix 0.0.0.0/0 8416b209-7674-4a7d-ae38-76389b0584b5
15.openstack security group rule create --protocol udp --ingress --prefix 0.0.0.0/0 8416b209-7674-4a7d-ae38-76389b0584b5

Send traffic from vm1 to vm2 and capture it with tcpdump on the controller nodes.
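A minimal sketch of that test, assuming hypothetical fixed IPs for the instances (vm1 on 10.0.1.10, vm2 on 10.0.2.10) and that the tenant VLANs arrive on the controllers' eth1 NIC (all placeholders, adjust to the actual environment):

# inside vm1: send traffic to vm2's fixed IP on the other tenant network
ping -c 10 10.0.2.10
# on each controller node: check whether the East/West traffic is visible there
tcpdump -ni eth1 host 10.0.2.10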

Comment 2 anil venkata 2018-04-25 07:21:19 UTC
When the packet leaves the compute node, routing has already been done on the source compute node itself.

I see these pipeline hits on the source compute node for the packet:
1) The packet enters the net-64-1 switch pipeline and is then directed to the Router_eNet router pipeline.
2) In the Router_eNet pipeline, as part of the routing decision, the source MAC is changed to that of the router port for 10.0.1.0, the destination MAC to vm2's MAC, and the packet is handed over to the destination switch net-64-2 pipeline.
3) In the net-64-2 pipeline, as the destination MAC is on a different node, the VLAN tag of net-64-2 is set on the packet and it is output through the localnet port.

In summary:
1) Routing is done on vm1's compute node.
2) The packet carries the destination network's (net-64-2) VLAN tag.
3) The packet's source MAC is that of the router port for 10.0.1.0.
4) The packet's destination MAC is vm2's MAC.
I dumped the packet content on the compute node and verified the same.

I guess the TripleO external switch (which forwards these tenant VLAN network packets) is forwarding them to all nodes instead of only to the destination compute node.

If the TripleO external switch is set up the same way with ml2/ovs, we could have the same issue there. Can we get an ml2/ovs setup to test this?
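A hedged sketch of how this pipeline walk could be reproduced with ovn-trace (run wherever the OVN southbound DB is reachable); the logical port name, MAC addresses, and IPs below are hypothetical placeholders for vm1, the Router_eNet port on net-64-1, and vm2:

# trace a routed packet from vm1 (net-64-1) to vm2 (net-64-2); the output shows the
# switch -> router -> switch pipelines and the final output via the localnet port
ovn-trace net-64-1 'inport == "vm1-port" && eth.src == fa:16:3e:00:01:10 && eth.dst == fa:16:3e:00:01:01 && ip4.src == 10.0.1.10 && ip4.dst == 10.0.2.10 && ip.ttl == 64'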

Comment 3 Eran Kuris 2018-04-25 07:25:08 UTC
(In reply to anil venkata from comment #2)
> When the packet leaves the compute node, routing has already been done on the
> source compute node itself.
> 
> I see these pipeline hits on the source compute node for the packet:
> 1) The packet enters the net-64-1 switch pipeline and is then directed to the
> Router_eNet router pipeline.
> 2) In the Router_eNet pipeline, as part of the routing decision, the source MAC
> is changed to that of the router port for 10.0.1.0, the destination MAC to vm2's
> MAC, and the packet is handed over to the destination switch net-64-2 pipeline.
> 3) In the net-64-2 pipeline, as the destination MAC is on a different node, the
> VLAN tag of net-64-2 is set on the packet and it is output through the localnet port.
> 
> In summary:
> 1) Routing is done on vm1's compute node.
> 2) The packet carries the destination network's (net-64-2) VLAN tag.
> 3) The packet's source MAC is that of the router port for 10.0.1.0.
> 4) The packet's destination MAC is vm2's MAC.
> I dumped the packet content on the compute node and verified the same.
> 
> I guess the TripleO external switch (which forwards these tenant VLAN network
> packets) is forwarding them to all nodes instead of only to the destination
> compute node.
> 
> If the TripleO external switch is set up the same way with ml2/ovs, we could
> have the same issue there. Can we get an ml2/ovs setup to test this?

Anil, when I checked the same scenario on a Geneve setup I did not see the issue.
I don't have an ml2/ovs setup to lend you, sorry.

Comment 4 Miguel Angel Ajo 2018-04-25 07:59:43 UTC
OK, but this is still problematic, because the MAC of the router will be flapping across several ports (the compute node ports where the routing pipeline runs, and the gateway node when it forwards N/S traffic, again with the router MAC address).

I believe this will be an issue and we need to bring this discussion upstream to find a solution (like not translating the MAC in the VLAN case, or making the traffic go through Geneve...?).
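One hedged way to watch for such MAC moves (a sketch, not something from this comment) is to check on which port the router MAC is learned on a node's external bridge; br-ex and the MAC below are assumptions:

# repeatedly print the FDB entry for the router MAC; the port changing over time
# between the patch port and the physical uplink would indicate MAC flapping
watch -n1 'ovs-appctl fdb/show br-ex | grep -i fa:16:3e:00:01:01'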

Comment 5 Miguel Angel Ajo 2018-04-25 08:41:02 UTC
(In reply to Miguel Angel Ajo from comment #4)
> OK, but this is still problematic, because the MAC of the router will be
> flapping across several ports (the compute node ports where the routing
> pipeline runs, and the gateway node when it forwards N/S traffic, again with
> the router MAC address).
> 
> I believe this will be an issue and we need to bring this discussion
> upstream to find a solution (like not translating the MAC in the VLAN case,
> or making the traffic go through Geneve...?).

And as for what Anil said, it makes sense: you would be seeing the traffic also on the network node because the bridge is propagating the traffic there too (hub mode?).

Comment 6 Miguel Angel Ajo 2018-05-30 10:28:54 UTC
As per talks with Anil, the traffic is not going through the network node; it is just seen on the network node because the switch between the VMs is propagating (flooding) the traffic.

Comment 7 Miguel Angel Ajo 2018-05-30 11:02:12 UTC
Reopening as per IRC conversation.

Can you provide tcpdump logs (especially with the -e flag to include the L2 headers)?

Or any working environment to check?
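A hedged sketch of the requested capture, assuming the tenant VLAN for net-64-2 is ID 201, the provider NIC is eth1, and vm2's address is 10.0.2.10 (all placeholders):

# -e prints the L2 headers, so the router/vm MACs and the VLAN tag are visible
tcpdump -eni eth1 vlan 201 and host 10.0.2.10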

Comment 8 Eran Kuris 2018-05-30 11:17:03 UTC
I can't deploy a VLAN environment because of this patch:
https://review.openstack.org/#/c/565053/

Comment 9 Miguel Angel Ajo 2018-06-10 22:18:40 UTC
Reassigning to myself for verification

Comment 10 Miguel Angel Ajo 2018-07-18 11:02:31 UTC
I have verified that this doesn't really happen. I'm still working with Anil on: https://bugzilla.redhat.com/show_bug.cgi?id=1561880

