We have two networker nodes in our OSP 13 deployment. We have an external network attached to br-ex that works fine on one node, but routers attached to that network on the second networker are unable to pass packets.

Looking at the OpenFlow rules on the working node we see the following, which looks approximately correct:

[root@neu-17-11-nc2 ~]# ovs-ofctl dump-flows br-ex
 cookie=0xb2514f67fe7d2fc1, duration=87362.294s, table=0, n_packets=2563737, n_bytes=251925611, priority=4,in_port="phy-br-ex",dl_vlan=12 actions=strip_vlan,NORMAL
 cookie=0x79c8c5bf077fd19e, duration=87362.244s, table=0, n_packets=47, n_bytes=2814, priority=4,in_port="phy-br-ex",dl_vlan=14 actions=strip_vlan,NORMAL
 cookie=0xb2514f67fe7d2fc1, duration=87378.996s, table=0, n_packets=221465, n_bytes=12102660, priority=2,in_port="phy-br-ex" actions=drop
 cookie=0xb2514f67fe7d2fc1, duration=87379.025s, table=0, n_packets=5451790, n_bytes=18038331531, priority=0 actions=NORMAL

This matches `ovs-vsctl show`, where the ports associated with that external network are tagged with VLAN 12:

        Port "qg-f921a854-20"
            tag: 12
            Interface "qg-f921a854-20"
                type: internal

On the other networker, the OpenFlow rules for br-ex look wrong:

[root@neu-19-11-nc1 ~]# ovs-ofctl dump-flows br-ex
 cookie=0x951ea463c71bfcae, duration=111477.400s, table=0, n_packets=816, n_bytes=83579, priority=2,in_port="phy-br-ex" actions=drop
 cookie=0x951ea463c71bfcae, duration=111477.407s, table=0, n_packets=0, n_bytes=0, priority=0 actions=NORMAL

As you might expect given the drop rule above, routers on this host that are attached to the external network are unable to pass packets. If we power off one of the networker nodes so that everything fails back to a single host, the problem goes away and all routers attached to the external network work correctly.
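(For comparing the two networkers quickly, a check like the following is enough to spot the broken node. This is only a sketch: the hostnames are the ones from this deployment and it assumes root ssh access between the hosts.)

    # A healthy br-ex should show the per-VLAN strip_vlan rules plus the
    # catch-all "priority=0 actions=NORMAL" rule; the broken node is missing them.
    for host in neu-17-11-nc2 neu-19-11-nc1; do
        echo "== $host =="
        ssh "$host" "ovs-ofctl dump-flows br-ex | grep -E 'dl_vlan=|priority=0 actions=NORMAL'"
    done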
After restarting the node, I've been able to reproduce the problem. Here's a router on nc1:

[root@neu-19-11-nc1 ~]# ip netns exec qrouter-4cb73b3c-d4d1-4add-9191-3d60e3abb0f7 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
72: ha-3f3d5c15-a8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:2a:ce:1e brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.7/18 brd 169.254.255.255 scope global ha-3f3d5c15-a8
       valid_lft forever preferred_lft forever
    inet 169.254.0.4/24 scope global ha-3f3d5c15-a8
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe2a:ce1e/64 scope link
       valid_lft forever preferred_lft forever
73: qg-824a6859-ae: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9050 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:d4:2e:b0 brd ff:ff:ff:ff:ff:ff
    inet 128.31.27.33/22 scope global qg-824a6859-ae
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fed4:2eb0/64 scope link nodad
       valid_lft forever preferred_lft forever

[root@neu-19-11-nc1 ~]# ip netns exec qrouter-4cb73b3c-d4d1-4add-9191-3d60e3abb0f7 ip route
default via 128.31.24.1 dev qg-824a6859-ae
128.31.24.0/22 dev qg-824a6859-ae proto kernel scope link src 128.31.27.33
169.254.0.0/24 dev ha-3f3d5c15-a8 proto kernel scope link src 169.254.0.4
169.254.192.0/18 dev ha-3f3d5c15-a8 proto kernel scope link src 169.254.192.7

That's plumbed into ovs:

[root@neu-19-11-nc1 ~]# ovs-vsctl show
0e32ae2f-6c78-49bd-8045-8fc4ac32f425
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-sahara
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-sahara
            Interface br-sahara
                type: internal
        Port "p3p1.207"
            Interface "p3p1.207"
        Port phy-br-sahara
            Interface phy-br-sahara
                type: patch
                options: {peer=int-br-sahara}
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "qr-0e9a51d1-49"
            tag: 1
            Interface "qr-0e9a51d1-49"
                type: internal
        Port "qg-f4ff7324-e2"
            tag: 11
            Interface "qg-f4ff7324-e2"
                type: internal
        Port "ha-a7a451bc-cc"
            tag: 5
            Interface "ha-a7a451bc-cc"
                type: internal
        Port "qr-225e951a-b4"
            tag: 9
            Interface "qr-225e951a-b4"
                type: internal
        Port "ha-3fda299f-48"
            tag: 10
            Interface "ha-3fda299f-48"
                type: internal
        Port "qr-ae359120-fd"
            tag: 8
            Interface "qr-ae359120-fd"
                type: internal
        Port "tapae6b4ee5-26"
            tag: 1
            Interface "tapae6b4ee5-26"
                type: internal
        Port "qg-b1c270f1-e3"
            tag: 11
            Interface "qg-b1c270f1-e3"
                type: internal
        Port "tap5f0515ca-50"
            tag: 3
            Interface "tap5f0515ca-50"
                type: internal
        Port "qg-26f86c24-c2"
            tag: 12
            Interface "qg-26f86c24-c2"
                type: internal
        Port "ha-590394df-3e"
            tag: 6
            Interface "ha-590394df-3e"
                type: internal
        Port "qg-f921a854-20"
            tag: 11
            Interface "qg-f921a854-20"
                type: internal
        Port "ha-72270a12-02"
            tag: 6
            Interface "ha-72270a12-02"
                type: internal
        Port "tapc99ad948-26"
            tag: 2
            Interface "tapc99ad948-26"
                type: internal
        Port "qr-c024d5f4-bd"
            tag: 8
            Interface "qr-c024d5f4-bd"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port int-br-sahara
            Interface int-br-sahara
                type: patch
                options: {peer=phy-br-sahara}
        Port "qg-824a6859-ae"
            tag: 11
            Interface "qg-824a6859-ae"
                type: internal
        Port "ha-06b73a2b-dd"
            tag: 6
            Interface "ha-06b73a2b-dd"
                type: internal
        Port "ha-3f3d5c15-a8"
            tag: 6
            Interface "ha-3f3d5c15-a8"
                type: internal
        Port "qg-438c3ad9-6c"
            tag: 11
            Interface "qg-438c3ad9-6c"
                type: internal
        Port "tap007e5914-de"
            tag: 9
            Interface "tap007e5914-de"
                type: internal
        Port "qg-265ebbb0-63"
            tag: 11
            Interface "qg-265ebbb0-63"
                type: internal
        Port "qr-0ef11b01-d8"
            tag: 2
            Interface "qr-0ef11b01-d8"
                type: internal
        Port "qr-860aa65b-20"
            tag: 4
            Interface "qr-860aa65b-20"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qr-34b1985c-e5"
            tag: 7
            Interface "qr-34b1985c-e5"
                type: internal
        Port "tapaf7d1ae1-37"
            tag: 4
            Interface "tapaf7d1ae1-37"
                type: internal
        Port "tap37377660-0d"
            tag: 8
            Interface "tap37377660-0d"
                type: internal
        Port "tapb219b9ce-aa"
            tag: 7
            Interface "tapb219b9ce-aa"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "ha-01711c4c-72"
            tag: 5
            Interface "ha-01711c4c-72"
                type: internal
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port "p3p1.3802"
            Interface "p3p1.3802"
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "vxlan-ac104011"
            Interface "vxlan-ac104011"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.17"}
        Port "vxlan-ac104010"
            Interface "vxlan-ac104010"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.16"}
        Port "vxlan-ac104026"
            Interface "vxlan-ac104026"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.38"}
        Port "vxlan-ac104017"
            Interface "vxlan-ac104017"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.23"}
        Port "vxlan-ac10401d"
            Interface "vxlan-ac10401d"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.29"}
        Port "vxlan-ac104022"
            Interface "vxlan-ac104022"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.34"}
        Port "vxlan-ac10400a"
            Interface "vxlan-ac10400a"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.10"}
        Port "vxlan-ac10400b"
            Interface "vxlan-ac10400b"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.11"}
        Port "vxlan-ac10400c"
            Interface "vxlan-ac10400c"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.12"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-ac104013"
            Interface "vxlan-ac104013"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.19"}
        Port "vxlan-ac104020"
            Interface "vxlan-ac104020"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.32"}
        Port "vxlan-ac10400d"
            Interface "vxlan-ac10400d"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.13"}
        Port "vxlan-ac104015"
            Interface "vxlan-ac104015"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.21"}
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-ac104014"
            Interface "vxlan-ac104014"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.20"}
        Port "vxlan-ac104016"
            Interface "vxlan-ac104016"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.22"}
        Port "vxlan-ac10400e"
            Interface "vxlan-ac10400e"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.14"}
        Port "vxlan-ac10401a"
            Interface "vxlan-ac10401a"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.26"}
        Port "vxlan-ac104019"
            Interface "vxlan-ac104019"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.25"}
        Port "vxlan-ac104012"
            Interface "vxlan-ac104012"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.64.31", out_key=flow, remote_ip="172.16.64.18"}
    ovs_version: "2.9.0"

But from inside that namespace I'm unable to ping the default gateway:

[root@neu-19-11-nc1 ~]# ip netns exec qrouter-4cb73b3c-d4d1-4add-9191-3d60e3abb0f7 ping 128.31.24.1
PING 128.31.24.1 (128.31.24.1) 56(84) bytes of data.
From 128.31.27.33 icmp_seq=1 Destination Host Unreachable
From 128.31.27.33 icmp_seq=2 Destination Host Unreachable
From 128.31.27.33 icmp_seq=3 Destination Host Unreachable

Tracing on the physical interface (p3p1.3802), I see:

[root@neu-19-11-nc1 ~]# tcpdump -n -i p3p1.3802 host 128.31.27.33
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on p3p1.3802, link-type EN10MB (Ethernet), capture size 262144 bytes
15:27:18.848713 ARP, Request who-has 128.31.24.1 tell 128.31.27.33, length 28
15:27:18.851510 ARP, Reply 128.31.24.1 is-at 54:1e:56:d9:6a:c0, length 46
15:27:19.851643 ARP, Request who-has 128.31.24.1 tell 128.31.27.33, length 28
15:27:19.852550 ARP, Reply 128.31.24.1 is-at 54:1e:56:d9:6a:c0, length 46
15:27:20.853643 ARP, Request who-has 128.31.24.1 tell 128.31.27.33, length 28
15:27:20.854105 ARP, Reply 128.31.24.1 is-at 54:1e:56:d9:6a:c0, length 46

It looks like the ARP replies aren't getting delivered to the namespace (and a tcpdump on qg-824a6859-ae inside the namespace confirms that).
At this point, `dump-flows br-ex` shows:

[root@neu-19-11-nc1 ~]# ovs-ofctl dump-flows br-ex
 cookie=0x387b6837449cb19f, duration=8004.916s, table=0, n_packets=70, n_bytes=3780, priority=4,in_port="phy-br-ex",dl_vlan=11 actions=strip_vlan,NORMAL
 cookie=0xe27a7150f56cfc50, duration=8004.869s, table=0, n_packets=0, n_bytes=0, priority=4,in_port="phy-br-ex",dl_vlan=12 actions=strip_vlan,NORMAL
It's because br-ex is missing the final rule:

 priority=0 actions=NORMAL

If I manually add that:

ovs-ofctl add-flow br-ex priority=0,actions=NORMAL

Then I am able to successfully ping out from the namespace:

[root@neu-19-11-nc1 ~]# ip netns exec qrouter-4cb73b3c-d4d1-4add-9191-3d60e3abb0f7 ping 128.31.24.1
PING 128.31.24.1 (128.31.24.1) 56(84) bytes of data.
64 bytes from 128.31.24.1: icmp_seq=1 ttl=64 time=26.6 ms
64 bytes from 128.31.24.1: icmp_seq=2 ttl=64 time=0.572 ms
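(As a stopgap on an affected node, a small check like the following can re-add the rule only when it's missing. This is only a sketch; it assumes the one and only priority=0 flow we ever want on br-ex is the NORMAL catch-all.)

    # Re-add the default NORMAL flow on br-ex if it has gone missing.
    BRIDGE=br-ex
    if ! ovs-ofctl dump-flows "$BRIDGE" | grep -q 'priority=0 actions=NORMAL'; then
        echo "priority=0 NORMAL flow missing on $BRIDGE, re-adding"
        ovs-ofctl add-flow "$BRIDGE" "priority=0,actions=NORMAL"
    fi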
haleyb asked if restarting the ovs agent would restore the rule.

I cleared the rules on br-ex like this:

ovs-vsctl del-flows br-ex

And then restarted the ovs agent:

docker restart neutron_ovs_agent

And after that the rules were not restored:

[root@neu-19-11-nc1 ~]# ovs-ofctl dump-flows br-ex
[root@neu-19-11-nc1 ~]#
(In that previous comment, it should read "ovs-ofctl del-flows br-ex")
Created attachment 1510177 [details]
openvswitch-agent.log after the restart

I've attached the contents of /var/log/containers/neutron/openvswitch-agent.log
Hello Lars:

The physical bridges are reconfigured when a new port is added. Every OVS agent should receive an event informing it of that change. Then, during the RPC loop, the following nested call is made:

- neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent:OVSNeutronAgent --> process_network_ports --> treat_vif_port --> port_bound --> provision_local_vlan --> (net type == FLAT [1]) phys_br.provision_local_vlan

That is where the new set of rules is applied. For the sake of clarity, here is the ovs-ofctl code:

    table_id = constants.LOCAL_VLAN_TRANSLATION if distributed else 0
    if segmentation_id is None:
        self.add_flow(table=table_id,
                      priority=4,
                      in_port=port,
                      dl_vlan=lvid,
                      actions="strip_vlan,normal")
    else:
        self.add_flow(table=table_id,
                      priority=4,
                      in_port=port,
                      dl_vlan=lvid,
                      actions="mod_vlan_vid:%s,normal" % segmentation_id)

So by default, br-ex will only have the default "drop" rule. Once you add a new port (to be precise, once a port is bound on the integration bridge), the agent will be informed and will create the rules in the physical bridges.

Can you please add a new port and see what the rules are?

[1] As far as I can see in the flows, the provider network is flat (segmentation_id is None).
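(To make that concrete: for a flat segment, where segmentation_id is None, with local VLAN id 11, the rule that call installs is equivalent to adding the following by hand. The VLAN id here is just an example value taken from this node's br-ex dump.)

    # Hypothetical illustration: the flow provision_local_vlan() installs on the
    # physical bridge for a flat network with local VLAN id 11.
    ovs-ofctl add-flow br-ex "table=0,priority=4,in_port=phy-br-ex,dl_vlan=11,actions=strip_vlan,NORMAL"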
Rodolfo, I've created a new vm, but I wouldn't expect that to update rules on br-ex, since the vm isn't connected directly to the external network. In any case, it didn't seem to have any impact.

Just for kicks, I also tried:

- Creating a new tenant network
- Connecting that to the router using 'openstack router add subnet'
- Creating a new floating ip and binding it to the new vm

I also tried resetting the external gateway on the router. Nothing had any effect; br-ex still has no rules since I cleared them manually:

[root@neu-19-11-nc1 ~]# ovs-ofctl dump-flows br-ex
[root@neu-19-11-nc1 ~]#
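(For reference, a rough sketch of that sequence; all names and addresses here are placeholders, not the ones actually used.)

    # Placeholder names throughout; the actual network/router/server names are not recorded.
    openstack network create test-net
    openstack subnet create --network test-net --subnet-range 192.0.2.0/24 test-subnet
    openstack router add subnet test-router test-subnet
    openstack floating ip create external-net
    openstack server add floating ip test-vm 198.51.100.10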
While Rodolfo was investigating, we somehow managed to reproduce the problem on nc2. I powered off nc1 again and restarted the ovs-agent container on nc2, and this seemed to restore network access for people.

This time, when nc1 came back up, the flow rules on br-ex seem correct:

[root@neu-19-11-nc1 ~]# ovs-ofctl dump-flows br-ex
 cookie=0xf2b17fdd4638f215, duration=220.728s, table=0, n_packets=9, n_bytes=546, priority=4,in_port="phy-br-ex",dl_vlan=12 actions=strip_vlan,NORMAL
 cookie=0xeb7d6f5281b073d5, duration=220.679s, table=0, n_packets=0, n_bytes=0, priority=4,in_port="phy-br-ex",dl_vlan=13 actions=strip_vlan,NORMAL
 cookie=0xf2b17fdd4638f215, duration=234.111s, table=0, n_packets=793, n_bytes=47678, priority=2,in_port="phy-br-ex" actions=drop
 cookie=0xf2b17fdd4638f215, duration=234.128s, table=0, n_packets=3538, n_bytes=213749, priority=0 actions=NORMAL
Hello Lars:

Can I have access to your system again?

I've been testing in a development environment. Every time I restart an OVS agent, during the first loop, all ports in the switch are processed: the polling_manager reports all the ports present in the switch, and during that processing the VLAN segments are provisioned in all bridges, br-int and the physical ones. I've tested this several times and I always see the flows created in the physical bridge.

Did you stop/restart OVS on any network controller? Do you have different provider networks on each network controller? You now have both nc1 and nc2 online again, is that correct?

If you:
- stop the agent on any nc
- delete the OF rules
- start the agent again
--> are the OF rules restored?

Regards.
Created attachment 1511862 [details]
ovs-agent log

Rodolfo,

Since these systems are now live, our ability to restart services is somewhat constrained. However, due to the failures last week, we have all routers running on nc2 right now, and a single test router live on nc1. That means I should be able to muck around on nc1 without causing an interruption.

If I stop the agent on nc1:

# docker stop neutron_ovs_agent

Delete the OF rules:

# ovs-ofctl del-flows br-ex
# ovs-ofctl dump-flows br-ex
#

Restart the agent:

# docker start neutron_ovs_agent

And wait a bit, I end up with no flows on br-ex:

# ovs-ofctl dump-flows br-ex
#

I've attached the resulting openvswitch-agent.log
Hello Lars:

I think I missed one step: restarting the openvswitch service. When the OVS agent detects that OVS has been restarted, it tries to provision the local VLANs again [1], and that is when the phys bridge OpenFlow rules are provisioned again [2]. Note that "provisioning_needed" in [1] will be True if OVS has been restarted.

I don't know why the OF rules in the phys bridge (br-ex) were deleted or were not set the first time, but at least you'll get those OF rules back by restarting the OVS service. Can you do this? Thank you in advance.

[1] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L822-L824
[2] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L652
If I start with no flows on br-ex:

# ovs-ofctl dump-flows br-ex
#

And restart openvswitch:

# systemctl restart openvswitch

Then the VLAN-specific rules appear to get re-created on the bridge:

# ovs-ofctl dump-flows br-ex
 cookie=0xfd5505371947e8d6, duration=22.335s, table=0, n_packets=0, n_bytes=0, priority=4,in_port="phy-br-ex",dl_vlan=17 actions=strip_vlan,NORMAL
 cookie=0xef1d5f842d3c95c4, duration=22.306s, table=0, n_packets=0, n_bytes=0, priority=4,in_port="phy-br-ex",dl_vlan=16 actions=strip_vlan,NORMAL

However, the bridge is still missing the final rule that would permit inbound traffic to work correctly; on a working bridge it looks like this:

 cookie=0xe1da5be8a5448d27, duration=596287.858s, table=0, n_packets=22959775, n_bytes=12405178755, priority=0 actions=NORMAL
After also restarting the neutron_ovs_agent container, the rules on br-ex still look the same.
Rodolfo found this:

https://ask.openstack.org/en/question/110544/ovs-connection-to-local-of-controller-failed-when-having-two-flat-provider-networks/

"This turned out to be an issue with Ryu (native mode) OF controller. When configured with multiple provider flat network, the controller seems rejecting connection from the two external br's. So I switch the of_interface mode back to ovs-ofctl, the br connections works and flows are what I would expect."
According to [1], the Ryu native controller does not work well with two physical bridges. We switched back to the ovs-ofctl controller and it's working well.

This problem is also related to the ovs-ofctl connection resets we were seeing: both bridges, br-sahara and br-ex, were resetting the connection every second.

Reviewing the ovs-agent logs, I can see there are four bridges:

- br-int
- br-tun
- br-ex
- br-sahara

br-ex and br-sahara have the same datapath_id. This was reported in [2].

[1] https://ask.openstack.org/en/question/110544/ovs-connection-to-local-of-controller-failed-when-having-two-flat-provider-networks/
[2] https://bugs.launchpad.net/neutron/+bug/1697243
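(A quick way to confirm the datapath_id collision on an affected node, assuming the same bridge names as above:)

    # Compare the datapath_id of each bridge; br-ex and br-sahara showing the
    # same value confirms the collision described above.
    for br in br-int br-tun br-ex br-sahara; do
        printf '%s: ' "$br"
        ovs-vsctl get Bridge "$br" datapath_id
    done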
*** Bug 1654836 has been marked as a duplicate of this bug. ***
The workaround for us was to include the following in our deployment configuration:

    NetworkerDeployedServerExtraConfig:
      neutron::agents::ml2::ovs::of_interface: ovs-ofctl

This causes the openvswitch-agent to use ovs-ofctl for controlling ovs, rather than the native interface via python-ryu.
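(To confirm the setting actually landed on a networker node, checking the rendered agent config should show it. The container name and config path below are the usual OSP 13 ones and may differ in other deployments.)

    # Check the of_interface setting as seen by the running agent container.
    docker exec neutron_ovs_agent \
        grep -E '^of_interface' /etc/neutron/plugins/ml2/openvswitch_agent.ini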
Patch merged upstream: https://review.openstack.org/#/c/587244/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0935