The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.
Bug 1894478 - [OVN2.13] arp flooding and reaching limit: ofproto_dpif_xlate|WARN|over 4096 resubmit
Summary: [OVN2.13] arp flooding and reaching limit: ofproto_dpif_xlate|WARN|over 4096 ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.H
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ---
: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On: 1894048
Blocks: 1894871
TreeView+ depends on / blocked
 
Reported: 2020-11-04 11:06 UTC by Dumitru Ceara
Modified: 2020-12-01 15:07 UTC (History)
10 users (show)

Fixed In Version: ovn2.13-20.09.0-10.el7fdp ovn2.13-20.09.0-10.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1894048
: 1894871 (view as bug list)
Environment:
Last Closed: 2020-12-01 15:07:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:5308 0 None None None 2020-12-01 15:07:50 UTC

Comment 4 Jianlin Shi 2020-11-11 09:58:04 UTC
reproduced on 20.09.0-9 with reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=1894048#c7:

[root@dell-per740-12 bz1776712_broadcast_limit]# rpm -qa | grep -E "ovn|openvswitch"                  
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-central-20.09.0-9.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-13.noarch                                         
openvswitch2.13-2.13.0-59.el7fdp.x86_64
ovn2.13-20.09.0-9.el7fdp.x86_64
ovn2.13-host-20.09.0-9.el7fdp.x86_64                                                                  
kernel-kernel-networking-openvswitch-ovn-regression-bz1776712_broadcast_limit-1.0-1.noarch            
python3-openvswitch2.13-2.13.0-59.el7fdp.x86_64

:: [ 04:47:53 ] :: [  BEGIN   ] :: Running 'ovn-nbctl --wait=hv sync'
:: [ 04:48:03 ] :: [   PASS   ] :: Command 'ovn-nbctl --wait=hv sync' (Expected 0, got 0)
:: [ 04:48:03 ] :: [  BEGIN   ] :: Running 'ip netns exec vm1 ping 172.16.0.102 -c 1'
PING 172.16.0.102 (172.16.0.102) 56(84) bytes of data.                                                

--- 172.16.0.102 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms                                         

:: [ 04:48:13 ] :: [   FAIL   ] :: Command 'ip netns exec vm1 ping 172.16.0.102 -c 1' (Expected 0, got 1)
:: [ 04:48:13 ] :: [  BEGIN   ] :: Running 'ip netns exec vm1 ping 172.16.0.100 -c 1'
PING 172.16.0.100 (172.16.0.100) 56(84) bytes of data.
                                                                                                      
--- 172.16.0.100 ping statistics ---                                                                  
1 packets transmitted, 0 received, 100% packet loss, time 0ms
                                                                                                      
:: [ 04:48:23 ] :: [   FAIL   ] :: Command 'ip netns exec vm1 ping 172.16.0.100 -c 1' (Expected 0, got 1)
<=== ping failed

[root@dell-per740-12 bz1776712_broadcast_limit]# grep 4096 /var/log/openvswitch/ovs-vswitchd.log
2020-11-11T09:47:59.829Z|00054|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:02:71,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.2.71,arp_tpa=172.16.2.71,arp_op=1,arp_sha=00:de:ad:ff:02:71,arp_tha=00:00:00:00:00:00 
2020-11-11T09:47:59.834Z|00056|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:00:43,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.0.43,arp_tpa=172.16.0.43,arp_op=1,arp_sha=00:de:ad:ff:00:43,arp_tha=00:00:00:00:00:00 
2020-11-11T09:47:59.838Z|00058|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:02:69,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.2.69,arp_tpa=172.16.2.69,arp_op=1,arp_sha=00:de:ad:ff:02:69,arp_tha=00:00:00:00:00:00 
2020-11-11T09:47:59.841Z|00060|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:01:39,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.1.39,arp_tpa=172.16.1.39,arp_op=1,arp_sha=00:de:ad:ff:01:39,arp_tha=00:00:00:00:00:00
2020-11-11T09:47:59.845Z|00062|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:02:74,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.2.74,arp_tpa=172.16.2.74,arp_op=1,arp_sha=00:de:ad:ff:02:74,arp_tha=00:00:00:00:00:00

<=== 4k resubmit WARN

result on 20.09.0-10:

[root@dell-per740-12 bz1776712_broadcast_limit]# rpm -qa | grep -E "ovn|openvswitch"
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-20.09.0-10.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-13.noarch
openvswitch2.13-2.13.0-59.el7fdp.x86_64
ovn2.13-central-20.09.0-10.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-regression-bz1776712_broadcast_limit-1.0-1.noarch
python3-openvswitch2.13-2.13.0-59.el7fdp.x86_64
ovn2.13-20.09.0-10.el7fdp.x86_6

:: [ 04:55:06 ] :: [  BEGIN   ] :: Running 'ovn-nbctl --wait=hv sync'
:: [ 04:56:06 ] :: [   PASS   ] :: Command 'ovn-nbctl --wait=hv sync' (Expected 0, got 0)
:: [ 04:56:06 ] :: [  BEGIN   ] :: Running 'ip netns exec vm1 ping 172.16.0.102 -c 1'
PING 172.16.0.102 (172.16.0.102) 56(84) bytes of data.                                                
64 bytes from 172.16.0.102: icmp_seq=1 ttl=62 time=56.3 ms                                            

--- 172.16.0.102 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 56.306/56.306/56.306/0.000 ms                                                  
:: [ 04:56:06 ] :: [   PASS   ] :: Command 'ip netns exec vm1 ping 172.16.0.102 -c 1' (Expected 0, got 0)
:: [ 04:56:06 ] :: [  BEGIN   ] :: Running 'ip netns exec vm1 ping 172.16.0.100 -c 1'
PING 172.16.0.100 (172.16.0.100) 56(84) bytes of data.
64 bytes from 172.16.0.100: icmp_seq=1 ttl=63 time=14.3 ms                                            

--- 172.16.0.100 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 14.312/14.312/14.312/0.000 ms
:: [ 04:56:06 ] :: [   PASS   ] :: Command 'ip netns exec vm1 ping 172.16.0.100 -c 1' (Expected 0, got 0)
<=== passed

[root@dell-per740-12 bz1776712_broadcast_limit]# grep 4096 /var/log/openvswitch/ovs-vswitchd.log      
2020-11-11T09:55:23.488Z|00055|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:03:03,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.3.3,arp_tpa=172.16.3.3,arp_op=1,arp_sha=00:de:ad:ff:03:03,arp_tha=00:00:00:00:00:00

<=== still get 4k resubmit WARN message

Comment 5 Jianlin Shi 2020-11-11 10:16:37 UTC
on rhel8 version:

[root@wsfd-advnetlab19 bz1776712_broadcast_limit]# rpm -qa | grep -E "openvswitch|ovn"                
kernel-kernel-networking-openvswitch-ovn-scenario-1.0-12.noarch
ovn2.13-central-20.09.0-10.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-13.noarch                                         
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
kernel-kernel-networking-openvswitch-ovn-regression-bz1775778_kernel_unknown_flow-1.0-2.noarch
openvswitch2.13-2.13.0-67.el8fdp.x86_64
ovn2.13-host-20.09.0-10.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-31.noarch                                          
openvswitch2.11-2.11.3-73.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-regression-bz1776712_broadcast_limit-1.0-1.noarch            
python3-openvswitch2.13-2.13.0-67.el8fdp.x86_64
ovn2.13-20.09.0-10.el8fdp.x86_64

:: [ 05:13:26 ] :: [  BEGIN   ] :: Running 'ovn-nbctl --wait=hv sync'
:: [ 05:14:20 ] :: [   PASS   ] :: Command 'ovn-nbctl --wait=hv sync' (Expected 0, got 0)
:: [ 05:14:20 ] :: [  BEGIN   ] :: Running 'ip netns exec vm1 ping 172.16.0.102 -c 1'
PING 172.16.0.102 (172.16.0.102) 56(84) bytes of data.
64 bytes from 172.16.0.102: icmp_seq=1 ttl=62 time=39.4 ms                                            

--- 172.16.0.102 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 39.378/39.378/39.378/0.000 ms
:: [ 05:14:20 ] :: [   PASS   ] :: Command 'ip netns exec vm1 ping 172.16.0.102 -c 1' (Expected 0, got 0)
:: [ 05:14:20 ] :: [  BEGIN   ] :: Running 'ip netns exec vm1 ping 172.16.0.100 -c 1'
PING 172.16.0.100 (172.16.0.100) 56(84) bytes of data.
64 bytes from 172.16.0.100: icmp_seq=1 ttl=63 time=17.2 ms
                                                                                                      
--- 172.16.0.100 ping statistics ---                                                                  
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 17.218/17.218/17.218/0.000 ms
:: [ 05:14:20 ] :: [   PASS   ] :: Command 'ip netns exec vm1 ping 172.16.0.100 -c 1' (Expected 0, got 0)

<=== ping passed

[root@wsfd-advnetlab19 ~]# grep 4096 /var/log/openvswitch/ovs-vswitchd.log                            
2020-11-11T10:13:39.047Z|00001|ofproto_dpif_xlate(handler70)|WARN|over 4096 resubmit actions on bridge br-int while processing icmp6,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::200:ff:fe00:1,ipv6_dst=ff02::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
2020-11-11T10:13:40.068Z|00001|ofproto_dpif_xlate(handler56)|WARN|over 4096 resubmit actions on bridge br-int while processing icmp6,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::c50:caff:feed:5840,ipv6_dst=ff02::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
2020-11-11T10:13:41.361Z|00049|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:03:03,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.3.3,arp_tpa=172.16.3.3,arp_op=1,arp_sha=00:de:ad:ff:03:03,arp_tha=00:00:00:00:00:00   
2020-11-11T10:13:53.893Z|00001|ofproto_dpif_xlate(handler53)|WARN|over 4096 resubmit actions on bridge br-int while processing icmp6,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::200:ff:fe00:1,ipv6_dst=ff02::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
2020-11-11T10:13:57.476Z|00001|ofproto_dpif_xlate(handler50)|WARN|over 4096 resubmit actions on bridge br-int while processing icmp6,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::c50:caff:feed:5840,ipv6_dst=ff02::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
2020-11-11T10:15:24.007Z|00003|ofproto_dpif_xlate(handler53)|WARN|over 4096 resubmit actions on bridge br-int while processing icmp6,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::200:ff:fe00:1,ipv6_dst=ff02::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0

<== also get 4k resubmit WARN message

Comment 6 Dumitru Ceara 2020-11-11 12:25:50 UTC
(In reply to Jianlin Shi from comment #5)
> on rhel8 version:
[...]
> 
> [root@wsfd-advnetlab19 ~]# grep 4096 /var/log/openvswitch/ovs-vswitchd.log  
> 
> 2020-11-11T10:13:39.047Z|00001|ofproto_dpif_xlate(handler70)|WARN|over 4096
> resubmit actions on bridge br-int while processing
> icmp6,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:
> 00:02,ipv6_src=fe80::200:ff:fe00:1,ipv6_dst=ff02::2,ipv6_label=0x00000,
> nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> 2020-11-11T10:13:40.068Z|00001|ofproto_dpif_xlate(handler56)|WARN|over 4096
> resubmit actions on bridge br-int while processing
> icmp6,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:
> 00:00:02,ipv6_src=fe80::c50:caff:feed:5840,ipv6_dst=ff02::2,
> ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> 2020-11-11T10:13:41.361Z|00049|ofproto_dpif_xlate|WARN|over 4096 resubmit
> actions on bridge br-int while processing
> arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=00:de:ad:ff:03:03,dl_dst=ff:ff:
> ff:ff:ff:ff,arp_spa=172.16.3.3,arp_tpa=172.16.3.3,arp_op=1,arp_sha=00:de:ad:
> ff:03:03,arp_tha=00:00:00:00:00:00   
> 2020-11-11T10:13:53.893Z|00001|ofproto_dpif_xlate(handler53)|WARN|over 4096
> resubmit actions on bridge br-int while processing
> icmp6,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:
> 00:02,ipv6_src=fe80::200:ff:fe00:1,ipv6_dst=ff02::2,ipv6_label=0x00000,
> nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> 2020-11-11T10:13:57.476Z|00001|ofproto_dpif_xlate(handler50)|WARN|over 4096
> resubmit actions on bridge br-int while processing
> icmp6,in_port=LOCAL,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:
> 00:00:02,ipv6_src=fe80::c50:caff:feed:5840,ipv6_dst=ff02::2,
> ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> 2020-11-11T10:15:24.007Z|00003|ofproto_dpif_xlate(handler53)|WARN|over 4096
> resubmit actions on bridge br-int while processing
> icmp6,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=33:33:00:00:
> 00:02,ipv6_src=fe80::200:ff:fe00:1,ipv6_dst=ff02::2,ipv6_label=0x00000,
> nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> 
> <== also get 4k resubmit WARN message

The IPv6 RS packets hit the 4k resubmit limit as discussed in https://bugzilla.redhat.com/show_bug.cgi?id=1894048#c9

The logs for ARPs correspond to GARPs originated by ovn-controller for logical router port owned IPs which happen to be sent before all logical flows and MC_FLOOD_L2 groups are installed in OVS.  This is a transient issue and shouldn't happen anymore once all flows get installed.  There's no easy way to delay ovn-controller from sending those GARPs until all flows are installed.  But this also shouldn't impact E-W, S-N traffic.

Once the flows are installed, running an ovn-trace simulating the GARPs we see that we definitely don't do 4k resubmits for GARPs:

# ovn-trace public 'inport=="public_r300" && eth.src==00:de:ad:ff:03:03 && eth.dst==ff:ff:ff:ff:ff:ff && arp && arp.op==1 &&  arp.spa == 172.16.3.3 && arp.tpa == 172.16.3.3 && arp.sha == 00:de:ad:ff:03:03'
# arp,reg14=0xfd,vlan_tci=0x0000,dl_src=00:de:ad:ff:03:03,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=172.16.3.3,arp_tpa=172.16.3.3,arp_op=1,arp_sha=00:de:ad:ff:03:03,arp_tha=00:00:00:00:00:00

ingress(dp="public", inport="public_r300")
------------------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:4874): inport == "public_r300", priority 50, uuid 3467239b
    next;
 6. ls_in_acl_hint (ovn-northd.c:5326): !ct.trk, priority 5, uuid c938fca3
    reg0[8] = 1;
    reg0[9] = 1;
    next;
14. ls_in_arp_rsp (ovn-northd.c:6978): arp.tpa == 172.16.3.3 && arp.op == 1 && inport == "public_r300", priority 100, uuid d0eb7607
    next;
20. ls_in_l2_lkup (ovn-northd.c:6462): flags[1] == 0 && arp.op == 1 && arp.tpa == { 172.16.3.103, 172.16.3.3}, priority 80, uuid 6c02da99
    clone { outport = "public_r300"; output; };

    clone
    -----
            outport = "public_r300";
            output;
            /* omitting output because inport == outport && !flags.loopback */
    outport = "_MC_flood_l2";
    output;

multicast(dp="public", mcgroup="_MC_flood_l2")
----------------------------------------------

    egress(dp="public", inport="public_r300", outport="ln_p1")
    ----------------------------------------------------------
         4. ls_out_acl_hint (ovn-northd.c:5326): !ct.trk, priority 5, uuid 6faad367
            reg0[8] = 1;
            reg0[9] = 1;
            next;
        10. ls_out_port_sec_l2 (ovn-northd.c:4963): eth.mcast, priority 100, uuid ab8e9434
            output;
            /* output to "ln_p1", type "localnet" */

Comment 7 Jianlin Shi 2020-11-12 01:34:23 UTC
set VERIFIED per comment 6

Comment 9 errata-xmlrpc 2020-12-01 15:07:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5308


Note You need to log in before you can comment on or make changes to this bug.