Bug 2018179
| Summary: | N/S incoming packets with length > 1500 dropped at br-int when MTU=9000 configured | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Eduardo Olivares <eolivare> |
| Component: | ovn-2021 | Assignee: | lorenzo bianconi <lorenzo.bianconi> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | FDP 21.I | CC: | ctrautma, ekuris, ihrachys, jiji, kfida, lorenzo.bianconi, mmichels, nusiddiq, rsafrono |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovn-2021-21.09.0-20.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-09 15:37:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: Eduardo Olivares, 2021-10-28 12:58:10 UTC
@Eduardo, are you configuring options:gateway_mtu on the logical router ports at all in this scenario? If so, are you setting it to 9000 or 1500? That option is what controls the check_pkt_larger actions programmed into OVS. Please check the OpenFlow dump on the chassis where the incoming ping is received and see whether there are check_pkt_larger actions in tables 8 or 23 that correspond to the MTU setting. If check_pkt_larger actions are present, are their arguments 9000 or 1500, and do they match the options:gateway_mtu on the logical router ports? This will be easier to debug with a copy of the northbound database used in this test. Is that available on any of the links in the first comment? Thanks.

Yes, the affected networks are provider networks in OSP and have gateway_mtu set to 1500. We have another bug in the same environment that also turned out to be related to gateway_mtu: https://bugzilla.redhat.com/show_bug.cgi?id=2017424. It looks like this mechanism in 21.09 exposed several issues. Also note Numan's suggestions in that bug about how Neutron uses gateway_mtu. Hope this helps somewhat.

OK, so this may or may not be the same issue as bug 2017424. I suggest setting gateway_mtu on the logical router port to 9000 instead of 1500 and seeing whether that helps. However, based on the findings in bug 2017424, this may not actually fix the problem if there is an underlying issue in check_pkt_larger, or you may find it works only some of the time. The other thing to try is removing gateway_mtu from the logical router port altogether and checking whether traffic then flows properly.
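A minimal sketch of those checks, assuming br-int is the integration bridge on the affected chassis and using rp-public (the logical router port name from the reproducer further down) as a stand-in for whichever port carries options:gateway_mtu:

# Look for the check_pkt_larger actions that ovn-northd programs from
# options:gateway_mtu (the comment above points at tables 8 and 23) and note
# the size argument they carry.
ovs-ofctl dump-flows br-int | grep check_pkt_larger

# Inspect, raise, or remove gateway_mtu on the logical router port.
ovn-nbctl get Logical_Router_Port rp-public options:gateway_mtu
ovn-nbctl set Logical_Router_Port rp-public options:gateway_mtu=9000
ovn-nbctl remove Logical_Router_Port rp-public options gateway_mtu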
This is the port that issues pings:

_uuid             : d9c5cfd5-f4d5-4155-b64a-3425f253a3ef
addresses         : ["fa:16:3e:0d:72:b1 10.100.0.4"]
dhcpv4_options    : 6b2bcbef-2e9a-4cd9-97c6-9c21de40c872
dhcpv6_options    : []
dynamic_addresses : []
enabled           : true
external_ids      : {"neutron:cidrs"="10.100.0.4/28", "neutron:device_id"="8f1ce9ab-c076-4a3a-9997-43057d1803d6", "neutron:device_owner"="compute:nova", "neutron:network_name"=neutron-01c328b6-8966-4e69-9396-b659fee5715b, "neutron:port_fip"="10.218.0.200", "neutron:port_name"="", "neutron:project_id"=c92bb2e2db754f09820cd78ad98526b2, "neutron:revision_number"="4", "neutron:security_group_ids"="726a387c-07ab-4064-a9e0-46b9f3e89ce1"}
ha_chassis_group  : []
name              : "16f538c5-3810-4a60-b1d6-34460bb41f54"
options           : {mcast_flood_reports="true", requested-chassis=compute-0.redhat.local}
parent_name       : []
port_security     : ["fa:16:3e:0d:72:b1 10.100.0.4"]
tag               : []
tag_request       : []
type              : ""
up                : true

@lorenzo.bianconi can you please provide a build/rpm for Neutron to test?

Tested with the following steps:

server setup:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.40.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.40.25
systemctl restart ovn-controller
ovn-nbctl lr-add R1
ovn-nbctl ls-add sw0
ovn-nbctl ls-add public
ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24
ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 1000::a/64 \
  -- lrp-set-gateway-chassis rp-public hv0
ovs-vsctl add-br br-ext
ovs-vsctl add-port br-ext ens4f1
ip link set ens4f1 up
ip link set ens4f1 mtu 1500
ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
  type=router options:router-port=rp-sw0 \
  -- lsp-set-addresses sw0-rp router
ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \
  type=router options:router-port=rp-public \
  -- lsp-set-addresses public-rp router
ovs-vsctl add-port br-int sw01 -- set interface sw01 type=internal external_ids:iface-id=sw01
ip netns add sw01
ip link set sw01 netns sw01
ip netns exec sw01 ip link set sw01 address f0:00:00:01:02:03
ip netns exec sw01 ip link set sw01 up
ip netns exec sw01 ip link set sw01 mtu 8942
ip netns exec sw01 ip addr add 192.168.1.2/24 dev sw01
ip netns exec sw01 ip route add default via 192.168.1.1 dev sw01
ovn-nbctl lsp-add sw0 sw01 \
  -- lsp-set-addresses sw01 "f0:00:00:01:02:03 192.168.1.2"
ovs-vsctl add-port br-ext server -- set interface server type=internal
ip netns add server
ip netns exec server ip link set lo up
ip link set server netns server
ip netns exec server ip link set server mtu 9000
ip netns exec server ip link set server up
ip netns exec server ip addr add 172.16.1.50/24 dev server
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext
ovn-nbctl lsp-add public public1 \
  -- lsp-set-addresses public1 unknown \
  -- lsp-set-type public1 localnet \
  -- lsp-set-options public1 network_name=phynet
ovn-nbctl lr-nat-add R1 dnat_and_snat 172.16.1.10 192.168.1.2 sw01 00:00:02:01:02:03
ovn-nbctl set logical_router_port rp-public options:gateway_mtu=1500

client setup:

systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.40.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.40.26
systemctl restart ovn-controller
ovs-vsctl add-br br-ext
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext
ovs-vsctl add-port br-ext ens3f1
ip link set ens3f1 up
ip link set ens3f1 mtu 1500

ping on server:

sleep 2
ovn-nbctl --wait=hv sync
ip netns exec sw01 ping 172.16.1.50 -c 1
ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1472
ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1476
ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
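For reference on the payload sizes: with 8 bytes of ICMP header and 20 bytes of IPv4 header, -s 1472 yields a 1500-byte IP packet, -s 1476 a 1504-byte one, and -s 1477 a 1505-byte one, so only the last four pings trip the gateway_mtu=1500 check in the runs below (the 1504-byte packet still passes, presumably because the threshold ovn-northd derives from gateway_mtu leaves room for the Ethernet header). As a hypothetical aid that is not part of the original reproducer, the oversized requests and replies can be watched on the 9000-MTU server interface while the ping script runs:

# Show ICMP packets whose IP total length exceeds 1500 bytes as they arrive
# at (and leave) the server namespace; ip[2:2] is the IPv4 total-length field.
ip netns exec server tcpdump -nni server 'icmp and ip[2:2] > 1500'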
Reproduced on ovn-2021-21.09.0-12.el8:

[root@dell-per740-12 bz2018179]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-central-21.09.0-12.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64
ovn-2021-host-21.09.0-12.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-51.el8fdp.x86_64
openvswitch2.15-2.15.0-51.el8fdp.x86_64
[root@dell-per740-12 bz2018179]# bash -x ping.sh
+ ovn-nbctl --wait=hv sync
+ ip netns exec sw01 ping 172.16.1.50 -c 1
PING 172.16.1.50 (172.16.1.50) 56(84) bytes of data.
64 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=4.61 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.606/4.606/4.606/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1472
PING 172.16.1.50 (172.16.1.50) 1472(1500) bytes of data.
1480 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=0.737 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.737/0.737/0.737/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1476
PING 172.16.1.50 (172.16.1.50) 1476(1504) bytes of data.
1484 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=0.129 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.129/0.129/0.129/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.
From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1500)

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
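As a purely hypothetical follow-up check that the original report did not run, the drop can also be looked for at the datapath level on the chassis while the failing 1505-byte pings are in flight; the default kernel datapath is assumed here:

# Dump the installed datapath flows and look at what the oversized ICMP
# traffic is hitting; an echo flow whose actions are plain "drop" (with a
# rising packet counter) would confirm where the packets die.
ovs-appctl dpctl/dump-flows | grep -E 'icmp|drop'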
Verified on ovn-2021-21.09.0-20.el8:

[root@dell-per740-12 bz2018179]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-host-21.09.0-20.el8fdp.x86_64
ovn-2021-central-21.09.0-20.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-51.el8fdp.x86_64
ovn-2021-21.09.0-20.el8fdp.x86_64
openvswitch2.15-2.15.0-51.el8fdp.x86_64
[root@dell-per740-12 bz2018179]# bash -x ping.sh
+ ovn-nbctl --wait=hv sync
+ ip netns exec sw01 ping 172.16.1.50 -c 1
PING 172.16.1.50 (172.16.1.50) 56(84) bytes of data.
64 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=4.31 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.305/4.305/4.305/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1472
PING 172.16.1.50 (172.16.1.50) 1472(1500) bytes of data.
1480 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=0.726 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.726/0.726/0.726/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1476
PING 172.16.1.50 (172.16.1.50) 1476(1504) bytes of data.
1484 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=0.123 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.123/0.123/0.123/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.
From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1500)

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.
1485 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=1.21 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.205/1.205/1.205/0.000 ms
+ ip netns exec sw01 ping 172.16.1.50 -c 1 -s 1477
PING 172.16.1.50 (172.16.1.50) 1477(1505) bytes of data.
1485 bytes from 172.16.1.50: icmp_seq=1 ttl=63 time=0.149 ms

--- 172.16.1.50 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.149/0.149/0.149/0.000 ms

Set VERIFIED per comment 13.

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.2_director-rhel-virthost-3cont_2comp_3net-ipv4-geneve-composable-vlan-provider-network/29/testReport/neutron_plugin.tests.scenario.test_multicast/MulticastTestIPv4Common/test_igmp_snooping_same_network_and_unsubscribe_id_9f6cd7af_ca52_4979_89e8_ab7436905712_/

It looks like the issue is fixed with:

openvswitch2.15.x86_64 2.15.0-42.el8fdp @download-node-02.eng.bos.redhat.com_rhel-8_nightly_updates_FDP_latest-FDP-8-RHEL-8_compose_Server_x86_64_os
ovn-2021.x86_64 21.09.0-20.el8fdp @download-node-02.eng.bos.redhat.com_rhel-8_nightly_updates_FDP_latest-FDP-8-RHEL-8_compose_Server_x86_64_os

There are a few other failures, but I think they are related to https://bugzilla.redhat.com/show_bug.cgi?id=2018365.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5059

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.