Description of problem:
On an environment installed by a downstream CI job with default settings it is not possible to ping internet IP addresses from a VM with a floating IP. When running tcpdump on the ens5 interface on all overcloud nodes, no ICMP packets are captured when the VM sends an ICMP packet to any internet address. Packets to external network (10.0.0.0/24) addresses are captured and replied to successfully, as are packets to other overcloud networks such as management (172.16.0.0/24) or ctlplane (192.168.24.0/24). With ML2/OVS there are no issues with pinging internet addresses from a VM with a FIP.

Version-Release number of selected component (if applicable):
16.1-RHEL-8/RHOS-16.1-RHEL-8-20200505.n.0
puppet-ovn-15.4.1-0.20200311045730.192ac4e.el8ost.noarch
python3-networking-ovn-7.1.1-0.20200505113427.071bd83.el8ost.noarch
ovn2.13-2.13.0-18.el8fdp.x86_64
openvswitch2.13-2.13.0-18.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an external network, an internal network, and a router; connect both networks to the router. Also create a keypair and a security group with rules allowing ICMP and SSH.
2. Launch an instance on the internal network using the created keypair and security group. Add a floating IP to the instance.
3. Log into the instance via SSH to the floating IP and ping 10.0.0.1, then 8.8.8.8.

Actual results:
Ping to 10.0.0.1 works - OK; ping to 8.8.8.8 does not work - not OK.

Expected results:
Ping to 10.0.0.1 works, ping to 8.8.8.8 also works, and packets go out through the compute node's external interface.

Additional info:
If the VM does not have a floating IP, internet addresses can be pinged and traffic goes out through the external interface of the controller node - OK.
The environment is available for investigation on request.
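For reference, the steps above can be sketched with the OpenStack CLI roughly as follows. This is a sketch only: the resource names, subnet ranges, image, and flavor are assumptions for illustration, not values from the affected environment.

```shell
# Hypothetical names throughout (ext-net, int-net, cirros, m1.tiny, ...).
openstack network create --external --provider-network-type flat \
    --provider-physical-network datacentre ext-net
openstack subnet create --network ext-net --subnet-range 10.0.0.0/24 \
    --no-dhcp ext-subnet
openstack network create int-net
openstack subnet create --network int-net --subnet-range 192.168.10.0/24 int-subnet
openstack router create router1
openstack router set --external-gateway ext-net router1
openstack router add subnet router1 int-subnet
openstack keypair create --public-key ~/.ssh/id_rsa.pub key1
openstack security group create sg1
openstack security group rule create --protocol icmp sg1
openstack security group rule create --protocol tcp --dst-port 22 sg1
openstack server create --image cirros --flavor m1.tiny --network int-net \
    --key-name key1 --security-group sg1 vm1
openstack floating ip create ext-net
openstack server add floating ip vm1 <FIP>
# From the VM (via the floating IP): ping 10.0.0.1 works, ping 8.8.8.8 fails.
```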
The reason is that the router sends out an ARP request for the destination IP itself instead of routing the packet to the external network gateway:

15:14:52.814735 fa:16:3e:4f:6a:b2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 8.8.8.8 tell 10.0.0.223, length 28

I'm looking at the configuration to determine whether this is a new bug in core OVN.
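One way to confirm where the routing decision goes wrong is to check the router's routes and trace a packet through the logical pipeline. This is a diagnostic sketch using names from a reproducer-style topology (router1, network1, vm1 and its MACs/IPs), not from the affected environment:

```shell
# List the router's static routes; a default/gateway route should cover 8.8.8.8.
ovn-nbctl lr-route-list router1

# Trace an ICMP-bound packet from vm1 to 8.8.8.8. With the bug, the trace
# ends with the router ARPing for 8.8.8.8 itself rather than for the
# next-hop gateway address.
ovn-trace network1 'inport == "vm1" &&
    eth.src == 40:44:00:00:00:01 && eth.dst == 40:44:00:00:00:04 &&
    ip4.src == 192.168.0.11 && ip4.dst == 8.8.8.8 && ip.ttl == 64'
```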
Just for the record: it looks like OSP 16 suffers from the same issue, so this is not a regression introduced in 16.1.
Thanks to Dumitru for finding the problem. There is a regression introduced in ovn2.11-2.11.1-35 by https://github.com/ovn-org/ovn/commit/c0bf32d72f8b893bbe3cb64912b0fd259d71555f: the OVN router sets the next hop to the destination IP instead of the gateway, which explains the ARPs for 8.8.8.8 in comment 1. The -34 version works fine.
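The effect of the regression should also be visible in the compiled logical flows. A hedged sketch (exact flow text varies by OVN version; the stage name lr_in_ip_routing and the reg0-based next-hop action are assumptions based on typical OVN pipelines):

```shell
# Inspect the IP routing stage of router1's pipeline. For a route with a
# gateway, the action is expected to load the gateway address as the next
# hop (e.g. reg0 = 172.24.4.2); with the broken builds it instead loads the
# packet's own destination (reg0 = ip4.dst), so the router ARPs for
# 8.8.8.8 directly.
ovn-sbctl dump-flows router1 | grep lr_in_ip_routing
```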
The topo is like this:

            vm2
             |
vm1---------ls-------router1-----public-------external network

Run the script as below:

ovn-nbctl ls-add network1
ovn-nbctl lsp-add network1 vm1
ovn-nbctl lsp-set-addresses vm1 "40:44:00:00:00:01 192.168.0.11"
ovn-nbctl lsp-add network1 vm2
ovn-nbctl lsp-set-addresses vm2 "40:44:00:00:00:02 192.168.0.12"
ovn-nbctl ls-add network2
ovn-nbctl lsp-add network2 vm3
ovn-nbctl lsp-set-addresses vm3 "40:44:00:00:00:03 192.168.1.13"
ovn-nbctl ls-add public
ovn-nbctl lsp-add public public-localnet
ovn-nbctl lsp-set-type public-localnet localnet
ovn-nbctl lsp-set-addresses public-localnet unknown
ovn-nbctl lsp-set-options public-localnet network_name=external
ovs-vsctl add-br br-labNet
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=external:br-labNet
ovs-vsctl add-port br-labNet ha_veth0
ip link set br-labNet up
ovn-nbctl lr-add router1
ovn-nbctl lrp-add router1 router1-net1 40:44:00:00:00:04 192.168.0.1/24
ovn-nbctl lsp-add network1 net1-router1
ovn-nbctl lsp-set-type net1-router1 router
ovn-nbctl lsp-set-addresses net1-router1 router
ovn-nbctl lsp-set-options net1-router1 router-port=router1-net1
ovn-nbctl lrp-add router1 router1-net2 40:44:00:00:00:05 192.168.1.1/24
ovn-nbctl lsp-add network2 net2-router1
ovn-nbctl lsp-set-type net2-router1 router
ovn-nbctl lsp-set-addresses net2-router1 router
ovn-nbctl lsp-set-options net2-router1 router-port=router1-net2
ovn-nbctl lrp-add router1 router1-public 40:44:00:00:00:06 172.24.4.1/24
ovn-nbctl lsp-add public public-router1
ovn-nbctl lsp-set-type public-router1 router
ovn-nbctl lsp-set-addresses public-router1 router
ovn-nbctl lsp-set-options public-router1 router-port=router1-public
ovn-nbctl --id=@gc0 create Gateway_Chassis name=public-gw1 chassis_name=hv1 priority=20 -- \
    --id=@gc1 create Gateway_Chassis name=public-gw2 chassis_name=hv0 priority=10 -- \
    set Logical_Router_Port router1-public 'gateway_chassis=[@gc0,@gc1]'
ovn-nbctl lr-nat-add router1 snat 172.24.4.1 192.168.0.0/24
ovn-nbctl lr-nat-add router1 snat 172.24.4.1 192.168.1.0/24
ovn-nbctl lr-nat-add router1 dnat_and_snat 172.24.4.100 192.168.0.11 vm1 40:44:00:00:00:07
ovn-nbctl lr-nat-add router1 dnat_and_snat 172.24.4.101 192.168.0.12 vm2 40:44:00:00:00:08
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip netns add vm1
ip link set vm1 netns vm1
ip netns exec vm1 ip link set lo up
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip link set vm1 address 40:44:00:00:00:01
ip netns exec vm1 ip addr add 192.168.0.11/24 dev vm1
ip netns exec vm1 ip route add default via 192.168.0.1 dev vm1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip netns add vm2
ip link set vm2 netns vm2
ip netns exec vm2 ip link set lo up
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip link set vm2 address 40:44:00:00:00:02
ip netns exec vm2 ip addr add 192.168.0.12/24 dev vm2
ip netns exec vm2 ip route add default via 192.168.0.1 dev vm2
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2
ip netns add external
ip link add ha_veth0 type veth peer name ha_veth0_p netns external
ip netns exec external ip link set lo up
ip netns exec external ip link set ha_veth0_p up
ip link set ha_veth0 up
ip netns exec external ip addr add 172.24.4.2/24 dev ha_veth0_p
ip link add veth0 type veth peer name veth0_peer
ip link set up dev veth0
ip link set veth0_peer netns external
ip netns exec external ip link set up dev veth0_peer
ip netns exec external ip addr add 192.168.100.1/24 dev veth0_peer
ip addr add 192.168.100.2/24 dev veth0
ip route add 172.24.4.0/24 via 192.168.100.1
ip netns exec external ip route add default via 172.24.4.1
ip netns exec external sysctl net.ipv4.ip_forward=1
ovn-nbctl lr-route-add router1 "192.168.100.0/24" 172.24.4.2

Reproduced on version 35:

# rpm -qa | grep ovn
ovn2.11-2.11.1-35.el7fdp.x86_64
ovn2.11-central-2.11.1-35.el7fdp.x86_64
ovn2.11-host-2.11.1-35.el7fdp.x86_64

ping 172.24.4.101 -c 3
PING 172.24.4.101 (172.24.4.101) 56(84) bytes of data.

--- 172.24.4.101 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss

Verified on version:

# rpm -qa | grep ovn
ovn2.11-2.11.1-47.el7fdp.x86_64
ovn2.11-central-2.11.1-47.el7fdp.x86_64
ovn2.11-host-2.11.1-47.el7fdp.x86_64

:: [ 23:36:25 ] :: [ BEGIN ] :: Running 'ping 172.24.4.101 -c 3'
PING 172.24.4.101 (172.24.4.101) 56(84) bytes of data.
64 bytes from 172.24.4.101: icmp_seq=1 ttl=62 time=0.431 ms
64 bytes from 172.24.4.101: icmp_seq=2 ttl=62 time=0.034 ms
64 bytes from 172.24.4.101: icmp_seq=3 ttl=62 time=0.029 ms

--- 172.24.4.101 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.029/0.164/0.431/0.188 ms
The build has been updated to a RHEL 8 build. It has been verified by the FD QE team and is ready to be deployed. The hotfix request has been approved. The neutron team (Daniel Alvarez Sanchez/Maciej Jozefczyk) will test the hotfix in their OSP 16.0.2 environment and then provide the complete package for GSS to build the updated containers, which they will then upload to this BZ. This should be completed by 6/23. David Vallee Delisle from GSS will work on getting the containers built.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2942