Description of problem:

In OpenShift we use a "shared" gateway mode, where OVN and the host both share the same MAC address and IP address. From a bridge perspective this would look something like:

  host (10.0.0.1)
        |
  eth0----br-ex----br-int

From a logical topology perspective:

  eth0-----br-ex----OVN GR---join sw---OVN DR---ovn node switch------pods

From an ovn-k8s perspective we can conntrack egress traffic from the node (host or OVN), so that reply traffic is directed only to the right place. However, when new ingress traffic comes in from eth0 we have to send the traffic to both the host and OVN, since we don't know which one is supposed to receive it.

The problem is that OVN is doing a PACKET_IN on every single packet that comes into it that is IP and destined to its IP+MAC. This causes packet_in overflow in OVS.

Flow and lflow:

  table=17(lr_in_arp_request ), priority=100  ,
    match=(eth.dst == 00:00:00:00:00:00 && ip4),
    action=(arp { eth.dst = ff:ff:ff:ff:ff:ff; arp.spa = reg1; arp.tpa = reg0; arp.op = 1; output; };)

  table=25 ip,metadata=0x1,dl_dst=00:00:00:00:00:00 actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.00.1c.00.18.00.20.00.40.00.00.00.00.00.01.de.10.80.00.2c.04.00.00.00.00.00.1c.00.18.00.20.00.60.00.00.00.00.00.01.de.10.80.00.2e.04.00.00.00.00.00.19.00.10.80.00.2a.02.00.01.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.20.00.00.00)

I'm guessing lr_in_arp_request should be sending ARP requests for an unknown neighbor, but since the destination here is the router itself, it should just drop the packet.

Full ofproto and ovs traces here:
https://gist.github.com/trozet/de1e1ffd0311fefc720f616e3f73fd6a

Additionally, the ovn-trace does not match the ofproto trace. Note that the ofproto trace shows a packet_in, but ovn-trace ends at lr_in_unsnat.
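For anyone hitting this, a rough way to confirm the packet-in storm on a node (the bridge name, port name, and the MAC/IP values below are placeholders taken from this report, not verified values):

```shell
# Count OpenFlow rules on br-int that punt packets to ovn-controller
# and have actually been hit (n_packets != 0):
ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0

# Trace an ingress IP packet destined to the shared IP/MAC and check
# whether it ends in a controller() (packet-in) action; replace the
# dl_dst placeholder with the real router MAC on your node:
ovs-appctl ofproto/trace br-int \
    'in_port=eth0,ip,dl_dst=0a:58:0a:00:00:01,nw_dst=10.0.0.1'
```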
I replicated the issue locally. ovn-northd adds flows in stage IN_IP_INPUT to drop IP packets destined to the router's own IP addresses, *except* if those IPs are used in SNAT rules or in options:lb_force_snat_ip. ovn-k8s uses options:lb_force_snat_ip=GW_RP_IP, so all traffic destined to GW_RP_IP advances past stage IN_IP_INPUT because it might need to be "unSNATed".

I'll investigate further to see how we can drop this kind of traffic later in the pipeline.
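A sketch of how to inspect those logical flow stages on a live system (the router name r1 is an assumption; the stage names are as printed by ovn-sbctl):

```shell
# Dump the logical router's flows and look at the ip_input stage,
# where the "drop traffic destined to my own IPs" flows live:
ovn-sbctl lflow-list r1 | grep lr_in_ip_input

# Compare with the unsnat stage; force-SNAT IPs are exempted from
# the drop so that traffic can reach this stage to be unSNATed:
ovn-sbctl lflow-list r1 | grep lr_in_unsnat
```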
Fix sent upstream for review: http://patchwork.ozlabs.org/project/ovn/patch/1599494618-27057-1-git-send-email-dceara@redhat.com/
Hi Dumitru,

should this bug be added into the errata for 20.I?
(In reply to Jianlin Shi from comment #4)
> Hi Dumitru,
>
> should this bug be added into errata for 20.I

Hi Jianlin,

Yes, this should be added to the 20.I errata.

Thanks,
Dumitru
Steps:

# setup ovn
systemctl start openvswitch
systemctl start ovn-northd
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set-connection ptcp:6641
ovs-vsctl set Open_vSwitch . external-ids:system-id=hv1
ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:127.0.0.1:6642
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-type=geneve
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=127.0.0.1
systemctl start ovn-controller

# create switch and router
ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1
ovn-nbctl ls-add s1

# Connect r1 to s1.
ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24
ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \
    options:router-port=lrp-r1-s1 addresses=router

# Create logical port p1 in s1
ovn-nbctl lsp-add s1 p1 \
    -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2"

# Add an OVS interface and bind it to "p1" by setting external_ids:iface-id=p1
ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address f0:00:00:00:01:02
ip netns exec vm1 ip addr add 10.0.1.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=p1

ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1
ovn-nbctl --wait=hv sync

# Send UDP traffic from p1 to dest IP 10.0.1.1
# Check that:
# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c

Reproduced on ovn2.13-20.06.2-11.el8fdp.x86_64:

# rpm -qa | grep ovn
ovn2.13-central-20.06.2-11.el8fdp.x86_64
ovn2.13-20.06.2-11.el8fdp.x86_64
ovn2.13-host-20.06.2-11.el8fdp.x86_6

# after sending udp traffic, check that
[root@dell-per740-11 ~]# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c
1

Verified on ovn2.13-20.09.0-12.el8fdp.x86_64:

[root@dell-per740-17 ~]# rpm -qa | grep ovn
ovn2.13-20.09.0-12.el8fdp.x86_64
ovn2.13-central-20.09.0-12.el8fdp.x86_64
ovn2.13-host-20.09.0-12.el8fdp.x86_64

# after sending udp traffic, check that
[root@dell-per740-17 ~]# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c
0
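The reproducer above leaves the "send UDP traffic" step implicit; one possible way to do it, assuming bash is available in the namespace (any UDP generator works, and the port number is arbitrary):

```shell
# Send a single UDP datagram from p1's netns to the router IP 10.0.1.1
# using bash's /dev/udp pseudo-device; port 5000 is an arbitrary choice.
ip netns exec vm1 bash -c 'echo test > /dev/udp/10.0.1.1/5000'
```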
Used the reproducer in comment 8 to verify on version ovn2.13-20.09.0-12.el7fdp:

[root@dell-per740-17 ~]# rpm -qa | grep ovn
ovn2.13-central-20.09.0-12.el7fdp.x86_64
ovn2.13-20.09.0-12.el7fdp.x86_64
ovn2.13-host-20.09.0-12.el7fdp.x86_64

# after sending udp traffic, check that
[root@dell-per740-17 ~]# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5308