Description of problem:

In OpenShift we use a "shared" gateway mode, where OVN and the host both share the same MAC address and IP address. From a bridge perspective this would look something like:

  host (10.0.0.1)
        |
  eth0----br-ex----br-int

From a logical topology perspective:

  eth0-----br-ex----OVN GR---join sw---OVN DR---ovn node switch------pods

From an ovn-k8s perspective we can conntrack egress traffic from the node (host or OVN), so that reply traffic is directed only to the right place. However, when new ingress traffic comes in from eth0 we have to send the traffic to both the host and OVN, since we don't know which one is supposed to receive it.

The problem is that OVN is doing a PACKET_IN on every single packet that comes into it that is IP and destined to its IP+MAC. This causes packet_in overflow in OVS.

Flow and lflow:

  table=17(lr_in_arp_request ), priority=100  ,
    match=(eth.dst == 00:00:00:00:00:00 && ip4),
    action=(arp { eth.dst = ff:ff:ff:ff:ff:ff; arp.spa = reg1; arp.tpa = reg0; arp.op = 1; output; };)

  table=25 ip,metadata=0x1,dl_dst=00:00:00:00:00:00 actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.00.1c.00.18.00.20.00.40.00.00.00.00.00.01.de.10.80.00.2c.04.00.00.00.00.00.1c.00.18.00.20.00.60.00.00.00.00.00.01.de.10.80.00.2e.04.00.00.00.00.00.19.00.10.80.00.2a.02.00.01.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.20.00.00.00)

I'm guessing lr_in_arp_request should be sending ARP requests for an unknown neighbor, but since the destination here is the router itself, it should just drop the packet.

Full ofproto and ovs traces here:
https://gist.github.com/trozet/de1e1ffd0311fefc720f616e3f73fd6a

Additionally, the ovn-trace does not match the ofproto trace. Note that the ofproto trace shows a packet_in, but ovn-trace ends at lr_in_unsnat.
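For anyone hitting this, a rough way to confirm the packet-in storm on a node (the bridge name, port name, and the MAC/IP values below are placeholders taken from this report, not verified values):

```shell
# Count OpenFlow rules on br-int that punt packets to ovn-controller
# and have actually been hit (n_packets != 0):
ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0

# Trace an ingress IP packet destined to the shared IP/MAC and check
# whether it ends in a controller() (packet-in) action; replace the
# dl_dst placeholder with the real router MAC on your node:
ovs-appctl ofproto/trace br-int \
    'in_port=eth0,ip,dl_dst=0a:58:0a:00:00:01,nw_dst=10.0.0.1'
```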
I replicated the issue locally. ovn-northd adds flows in stage IN_IP_INPUT to drop IP packets destined to the router's own IP addresses, *except* if those IPs are used in SNAT rules or in options:lb_force_snat_ip. ovn-k8s uses options:lb_force_snat_ip=GW_RP_IP, so all traffic destined to GW_RP_IP advances past stage IN_IP_INPUT because it might need to be "unSNATed".

I'll investigate further to see how we can drop this kind of traffic later in the pipeline.
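A sketch of how to inspect those logical flow stages on a live system (the router name r1 is an assumption; the stage names are as printed by ovn-sbctl):

```shell
# Dump the logical router's flows and look at the ip_input stage,
# where the "drop traffic destined to my own IPs" flows live:
ovn-sbctl lflow-list r1 | grep lr_in_ip_input

# Compare with the unsnat stage; force-SNAT IPs are exempted from
# the drop so that traffic can reach this stage to be unSNATed:
ovn-sbctl lflow-list r1 | grep lr_in_unsnat
```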
Fix sent upstream for review: http://patchwork.ozlabs.org/project/ovn/patch/1599494618-27057-1-git-send-email-dceara@redhat.com/
Hi Dumitru,

should this bug be added into the errata for 20.I?
(In reply to Jianlin Shi from comment #4)
> Hi Dumitru,
>
> should this bug be added into errata for 20.I

Hi Jianlin,

Yes, this should be added to the 20.I errata.

Thanks,
Dumitru
Steps:

# setup ovn
systemctl start openvswitch
systemctl start ovn-northd
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set-connection ptcp:6641
ovs-vsctl set Open_vSwitch . external-ids:system-id=hv1
ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=tcp:127.0.0.1:6642
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-type=geneve
ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=127.0.0.1
systemctl start ovn-controller

# create switch and router
ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1
ovn-nbctl ls-add s1

# Connect r1 to s1.
ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24
ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \
    options:router-port=lrp-r1-s1 addresses=router

# Create logical port p1 in s1
ovn-nbctl lsp-add s1 p1 \
    -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2"

# Add an OVS interface and bind it to "p1" by setting external_ids:iface-id=p1
ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address f0:00:00:00:01:02
ip netns exec vm1 ip addr add 10.0.1.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=p1

ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1
ovn-nbctl --wait=hv sync

# Send UDP traffic from p1 to dest IP 10.0.1.1
# Check that:
# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c

Reproduced on ovn2.13-20.06.2-11.el8fdp.x86_64:

# rpm -qa | grep ovn
ovn2.13-central-20.06.2-11.el8fdp.x86_64
ovn2.13-20.06.2-11.el8fdp.x86_64
ovn2.13-host-20.06.2-11.el8fdp.x86_6

# after sending udp traffic, check that
[root@dell-per740-11 ~]# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c
1

Verified on ovn2.13-20.09.0-12.el8fdp.x86_64:

[root@dell-per740-17 ~]# rpm -qa | grep ovn
ovn2.13-20.09.0-12.el8fdp.x86_64
ovn2.13-central-20.09.0-12.el8fdp.x86_64
ovn2.13-host-20.09.0-12.el8fdp.x86_64

# after sending udp traffic, check that
[root@dell-per740-17 ~]# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c
0
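The reproducer above leaves the "send UDP traffic" step implicit; one possible way to do it, assuming bash is available in the namespace (any UDP generator works, and the port number is arbitrary):

```shell
# Send a single UDP datagram from p1's netns to the router IP 10.0.1.1
# using bash's /dev/udp pseudo-device; port 5000 is an arbitrary choice.
ip netns exec vm1 bash -c 'echo test > /dev/udp/10.0.1.1/5000'
```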
Used the reproducer in comment 8 to verify on version ovn2.13-20.09.0-12.el7fdp:

[root@dell-per740-17 ~]# rpm -qa | grep ovn
ovn2.13-central-20.09.0-12.el7fdp.x86_64
ovn2.13-20.09.0-12.el7fdp.x86_64
ovn2.13-host-20.09.0-12.el7fdp.x86_64

# after sending udp traffic, check that
[root@dell-per740-17 ~]# ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5308