Description of problem:
When sending East/West traffic (traffic between two instances on different tenant networks and different compute nodes), we can see that the traffic is going through the controller node.
The issue was found on the networking-ovn-13_director-rhel-virthost-3cont_2comp-ipv4-vlan-dvr job.

Version-Release number of selected component (if applicable):
OSP13 2018-04-19.2

rpm -qa | grep ovn
openstack-nova-novncproxy-17.0.3-0.20180413225830.fda768b.el7ost.noarch
puppet-ovn-12.4.0-0.20180329043503.36ff219.el7ost.noarch
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
python-networking-ovn-metadata-agent-4.0.1-0.20180405185449.b9e550d.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-4.0.1-0.20180405185449.b9e550d.el7ost.noarch
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64

[root@controller-0 ~]# rpm -qa | grep triple
puppet-tripleo-8.3.2-0.20180411174307.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack network create net-64-2
2. openstack subnet create --subnet-range 10.0.2.0/24 --network net-64-2 --dhcp subnet_4_2
3. openstack subnet create --subnet-range 2002::/64 --network net-64-2 --ipv6-address-mode slaac --ipv6-ra-mode slaac --ip-version 6 subnet_6_2
4. openstack network create net-64-1
5. openstack subnet create --subnet-range 10.0.1.0/24 --network net-64-1 --dhcp subnet_4_1
6. openstack subnet create --subnet-range 2001::/64 --network net-64-1 --ipv6-address-mode slaac --ipv6-ra-mode slaac --ip-version 6 subnet_6_1
7. openstack router create Router_eNet
8. openstack router add subnet Router_eNet subnet_4_2
9. openstack router add subnet Router_eNet subnet_4_1
10. openstack router add subnet Router_eNet subnet_6_1
11. openstack router add subnet Router_eNet subnet_6_2
12. openstack router set --external-gateway nova Router_eNet
13. openstack security group rule create --protocol icmp --ingress --prefix 0.0.0.0/0 8416b209-7674-4a7d-ae38-76389b0584b5
14. openstack security group rule create --protocol tcp --ingress --prefix 0.0.0.0/0 8416b209-7674-4a7d-ae38-76389b0584b5
15. openstack security group rule create --protocol udp --ingress --prefix 0.0.0.0/0 8416b209-7674-4a7d-ae38-76389b0584b5

Then send traffic from VM 1 to VM 2 and take a tcpdump on the controller nodes.
When the packet leaves the compute node, routing has already been done on the source compute node itself.

I see these pipeline hits on the source compute node for the packet:
1) The packet enters the net-64-1 switch pipeline and is then directed to the router Router_eNet pipeline.
2) In the Router_eNet pipeline, as part of the routing decision, the source mac is changed to 10.0.1.0 and the destination mac to vm2's mac, and the packet is handed over to the destination switch net-64-2 pipeline.
3) In the net-64-2 pipeline, as the destination mac is on a different node, the vlan tag of net-64-2 is set on the packet and it is output through the localnet port.

In summary:
1) routing is done on vm1's compute node
2) the packet carries the destination network's (net-64-2) vlan tag
3) the packet's source mac is 10.0.1.0
4) the packet's destination mac is vm2's mac
I dumped the packet content on the compute node and verified the same.

I guess the TripleO external switch (which forwards these tenant vlan network packets) is forwarding to all nodes instead of forwarding only to the destination compute node.

If the TripleO external switch is set up the same way for ml2/ovs, we can have the same issue there. Can we have an ml2/ovs setup to test this?
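
To make the header rewrite described in this comment concrete, here is a minimal Python sketch of the three pipeline stages. It is only a model, not OVN code: the MAC addresses, the VM IP addresses, and the VLAN ID 102 are hypothetical placeholders, and the source MAC after routing is assumed to be the MAC of the router port on net-64-2 (the comment writes this value as 10.0.1.0).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    vlan: Optional[int] = None  # None means untagged


# Hypothetical placeholder values, not taken from the report.
VM1_MAC = "fa:16:3e:00:00:11"
VM2_MAC = "fa:16:3e:00:00:22"
ROUTER_PORT_NET1_MAC = "fa:16:3e:00:01:01"
ROUTER_PORT_NET2_MAC = "fa:16:3e:00:02:01"
NET2_VLAN = 102


def route(frame: Frame) -> Frame:
    """Router stage: rewrite both MACs, leave the IP addresses alone."""
    return replace(frame, src_mac=ROUTER_PORT_NET2_MAC, dst_mac=VM2_MAC)


def egress_switch(frame: Frame, dst_is_local: bool) -> Frame:
    """net-64-2 stage: tag the frame only if it must leave this node."""
    if dst_is_local:
        return frame
    return replace(frame, vlan=NET2_VLAN)


# vm1 sends to its default gateway (the router port on net-64-1).
sent = Frame(VM1_MAC, ROUTER_PORT_NET1_MAC, "10.0.1.5", "10.0.2.7")
on_wire = egress_switch(route(sent), dst_is_local=False)
print(on_wire)
```

In this model the frame that leaves the source compute node already carries the destination network's tag and the router's source MAC, which is what the packet dump in the comment reports.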
(In reply to anil venkata from comment #2)
> When the packet is leaving compute node, routing is already done in source
> compute node itself.
>
> I see these pipelines hits in source compute node for the packet
> 1) packet entered in net-64-1 switch pipeline and then directed to router
> Router_eNet pipeline
> 2) In Router Router_eNet pipeline, as part of routing decision, source mac
> is changed to 10.0.1.0 and destination mac to vm2 mac and handed over to
> destination switch net-64-2 pipeline
> 3) In net-64-2 pipeline, as the destination mac is in different node, vlan
> tag of net-64-2 is set to packet and is output through localnet port.
>
> In summary,
> 1) routing is done in vm1's compute node
> 2) packet is having destination network's(net-64-2) vlan tag
> 3) packet's source mac is 10.0.1.0
> 4) packet's destination mac is vm2's mac
> I dumped the packet content on compute node and verified the same.
>
> I guess triple0 external switch(which forwarding these tenant vlan network
> packets) is forwarding to all nodes instead of forwarding to destination
> compute.
>
> If the triple0 external switch is setup in the same way in ml2/ovs, we can
> have the same issue there. Can we have ml2/ovs setup to test this?

Anil, when I checked the same scenario on a Geneve setup I did not see the issue.
I don't have an ml2/ovs setup to lend you, sorry.
OK, but this is still problematic, because the MAC of the router will be flipping across several ports (the compute node ports that run the routing pipeline, and the gateway node when it forwards N/S traffic, again with the router MAC address).

I believe this will be an issue, and we need to bring this discussion upstream to find a solution (like not translating the MAC in the VLAN case, or making it go through Geneve?).
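
The concern about the router MAC flipping between ports can be illustrated with a small Python model of a learning switch's MAC table. This is a sketch under assumptions: the port names compute-0 and compute-1, the MAC value, and the order of observations are hypothetical, and the function only counts how often a source MAC is relearned on a different port.

```python
def count_mac_moves(observations):
    """Replay (port, src_mac) observations through a MAC learning table.

    Returns the final table and, per MAC, how many times it moved ports.
    """
    table = {}
    moves = {}
    for port, src_mac in observations:
        previous = table.get(src_mac)
        if previous is not None and previous != port:
            moves[src_mac] = moves.get(src_mac, 0) + 1
        table[src_mac] = port  # the latest port always wins
    return table, moves


ROUTER_MAC = "fa:16:3e:00:02:01"  # hypothetical router port MAC

# Each node that runs the routing pipeline emits frames with the router MAC.
observations = [
    ("compute-0", ROUTER_MAC),     # E/W traffic routed on compute-0
    ("controller-0", ROUTER_MAC),  # N/S traffic from the gateway node
    ("compute-1", ROUTER_MAC),     # E/W traffic routed on compute-1
    ("controller-0", ROUTER_MAC),  # N/S traffic again
]

table, moves = count_mac_moves(observations)
print(moves[ROUTER_MAC], table[ROUTER_MAC])
```

Every change of sender relearns the same MAC on a new port, so frames addressed to the router MAC follow whichever node transmitted most recently.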
(In reply to Miguel Angel Ajo from comment #4)
> Ok, but this is still problematic, because, the MAC of the router will be
> flipping across several ports (the compute node ports that do the routing
> pipeline thing, and the gateway node when it forwards N/S traffic again with
> the router MAC address).
>
> I believe this will be an issue and we need to bring this discussion
> upstream to find a solution. (Like not translating the MAC in the case of
> VLAN, or making it go through geneve...? )

And what Anil said makes sense: you would also see the traffic on the network node because the bridge is propagating the traffic there too (hub mode?).
As per talks with Anil, traffic is not going through the network node, it's just seen on the network node because the switch between the VMs is propagating the traffic.
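
The difference between traffic being seen on a node and traffic going through it can be sketched in Python as a forwarding decision on the switch between the nodes. This is a simplified model, assuming a switch that floods frames whose destination MAC it has not learned (or that behaves like a hub); the port names compute-0 and compute-1 and the MAC value are hypothetical.

```python
def output_ports(mac_table, dst_mac, in_port, all_ports):
    """Return the ports a frame is sent out of.

    Unknown destination: flood to every port except the ingress port.
    Known destination: forward to the learned port only.
    """
    known = mac_table.get(dst_mac)
    if known is None:
        return [p for p in all_ports if p != in_port]
    return [known]


ALL_PORTS = ["compute-0", "compute-1", "controller-0"]
VM2_MAC = "fa:16:3e:00:00:22"  # hypothetical; vm2 is assumed to run on compute-1

# Destination not learned: controller-0 receives a copy it only observes.
flooded = output_ports({}, VM2_MAC, "compute-0", ALL_PORTS)

# Destination learned on compute-1: controller-0 sees nothing.
forwarded = output_ports({VM2_MAC: "compute-1"}, VM2_MAC, "compute-0", ALL_PORTS)

print(flooded, forwarded)
```

In the flooded case a tcpdump on the controller shows the frame even though the controller takes no part in delivering it to vm2.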
Reopening as per IRC conversation.

Can you provide tcpdump logs (especially with the -e flag, to include the L2 headers), or a working environment to check?
I can't deploy a VLAN environment because of this patch: https://review.openstack.org/#/c/565053/
Reassigning to myself for verification
I have verified that this doesn't really happen. I'm still working with Anil on: https://bugzilla.redhat.com/show_bug.cgi?id=1561880