+++ This bug was initially created as a clone of Bug #2099811 +++

This is one issue found while working on https://bugzilla.redhat.com/show_bug.cgi?id=2087604#16

Note: The iperf3 test run between PODs on the same node.

Some analysis:

The iperf3 test started at:
Time: Fri, 13 May 2022 06:22:17 GMT
and ran for 600 seconds.

I can't see anything out of the ordinary in the OVS logs at this time to give a clue. For example correlating:

$ grep -e Fri -e ovs_dp_process_packet dropwatch2.out|grep ovs_dp_process_packet -B 1
Fri May 13 06:22:25 2022
2338 packets dropped at ovs_dp_process_packet

I don't see a large increase of flow_mods, (wouldn't we expect that if it was due to the classifier issue that AaronC fixed?):

ovs-vswitchd.log-20220414:
2022-04-13T06:22:10.318Z|98838|connmgr|INFO|br-int<->unix#231439: 1 flow_mods 10 s ago (1 adds)
2022-04-13T06:22:16.889Z|98839|connmgr|INFO|br-ex<->unix#243278: 2 flow_mods in the last 0 s (2 adds)
2022-04-13T06:22:31.922Z|98840|connmgr|INFO|br-ex<->unix#243281: 2 flow_mods in the last 0 s (2 adds)
2022-04-13T06:22:46.955Z|98841|connmgr|INFO|br-ex<->unix#243284: 2 flow_mods in the last 0 s (2 adds)

ovsdb-server.log-20220414:
2022-04-13T06:22:01.821Z|23014|jsonrpc|WARN|unix#263088: receive error: Connection reset by peer
2022-04-13T06:22:01.821Z|23015|reconnect|WARN|unix#263088: connection dropped (Connection reset by peer)
2022-04-13T06:23:01.961Z|23016|jsonrpc|WARN|unix#263098: receive error: Connection reset by peer
2022-04-13T06:23:01.961Z|23017|reconnect|WARN|unix#263098: connection dropped (Connection reset by peer)
2022-04-13T06:24:47.208Z|23018|jsonrpc|WARN|unix#263115: receive error: Connection reset by peer

But the dropwatch2.out file is full of upcall drops:

$ grep -e Fri -e ovs_dp_process_packet dropwatch2.out|grep ovs_dp_process_packet -B 1
Fri May 13 06:22:25 2022
2338 packets dropped at ovs_dp_process_packet
Fri May 13 06:22:30 2022
3230 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:24:15 2022
1847 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:25:20 2022
1821 packets dropped at ovs_dp_process_packet
Fri May 13 06:25:25 2022
1620 packets dropped at ovs_dp_process_packet
Fri May 13 06:25:30 2022
2142 packets dropped at ovs_dp_process_packet
Fri May 13 06:25:35 2022
259 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:26:25 2022
783 packets dropped at ovs_dp_process_packet
Fri May 13 06:26:30 2022
230 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:26:55 2022
5052 packets dropped at ovs_dp_process_packet
Fri May 13 06:27:00 2022
82 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:27:45 2022
1077 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:28:00 2022
86 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:28:20 2022
1760 packets dropped at ovs_dp_process_packet
Fri May 13 06:28:25 2022
611 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:28:45 2022
1306 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:31:00 2022
996 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:31:35 2022
3511 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:31:45 2022
417 packets dropped at ovs_dp_process_packet

Maybe running the test with OVS debug enabled may give better clues.

===========================

The upcall buffer was identified as too small in upstream and got bumped in the following commit:
https://github.com/openvswitch/ovs/commit/b4a9c9cd848b56d538f17f94cde78d5a139c7d90

Downstream fix is available in OVS 2.16:
https://gitlab.cee.redhat.com/nst/openvswitch/openvswitch2.16/-/commit/b4a9c9cd848b56d538f17f94cde78d5a139c7d90

--- Additional comment from Dan Williams on 2022-06-21 13:29:48 CDT ---

This fix has been part of OCP 4.11 since mid-February 2022 via https://github.com/openshift/os/pull/715
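
For anyone who wants to total the upcall drops in dropwatch2.out rather than eyeball the grep output, here is a rough sketch. It assumes the file has exactly the layout quoted in the description (a timestamp line followed by "N packets dropped at SYMBOL"); the function name parse_drops is made up for illustration and is not part of dropwatch.

```python
import re
from datetime import datetime

# Assumed line formats, taken from the excerpt quoted in this bug.
DROP_RE = re.compile(r"^(\d+) packets dropped at (\S+)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri May 13 06:22:25 2022"


def parse_drops(lines, symbol="ovs_dp_process_packet"):
    """Return (timestamp, count) pairs for drops reported at `symbol`.

    Each drop line is attributed to the most recent timestamp line seen.
    Lines that are neither (such as grep's "--" separators) are ignored.
    """
    current_ts = None
    result = []
    for raw in lines:
        line = raw.strip()
        match = DROP_RE.match(line)
        if match:
            if match.group(2) == symbol and current_ts is not None:
                result.append((current_ts, int(match.group(1))))
            continue
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            pass  # not a timestamp line, skip it
    return result


if __name__ == "__main__":
    sample = """Fri May 13 06:22:25 2022
2338 packets dropped at ovs_dp_process_packet
Fri May 13 06:22:30 2022
3230 packets dropped at ovs_dp_process_packet
--
Fri May 13 06:24:15 2022
1847 packets dropped at ovs_dp_process_packet
"""
    drops = parse_drops(sample.splitlines())
    for ts, count in drops:
        print(ts.isoformat(), count)
    print("total:", sum(count for _, count in drops))
```

Feeding it the whole file (for example parse_drops(open("dropwatch2.out"))) gives a per-interval list that can be lined up against the iperf3 start time.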
Tested and verified in 4.10.0-0.nightly-2022-07-13-131411

#### Test log from packet dualstack cluster using 4.10.0-0.nightly-2022-07-13-131411

$ iperf3 -V -f m -l 265 -b 200m -c fd01:0:0:5::14 -u -i l -t 600 -p 59554

[ 5] 0.00-600.00 sec 14.0 GBytes 200 Mbits/sec 0.002 ms 36334/56603715 (0.064%) receiver
[ 5] 0.00-600.00 sec 14.0 GBytes 200 Mbits/sec 0.002 ms 38169/56603715 (0.067%) receiver
[ 5] 0.00-600.00 sec 14.0 GBytes 200 Mbits/sec 0.002 ms 41949/56603718 (0.074%) receiver
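
As a sanity check on the receiver lines above, the loss percentage and the datagram count can be recomputed from the numbers in the log. This is only a sketch: the helper names are hypothetical, and it assumes that -b 200m means 200 * 10^6 bits per second and that -l 265 is the UDP payload size in bytes.

```python
import re

# Matches the "lost/total (percent%)" field of an iperf3 UDP summary line.
LOSS_RE = re.compile(r"(\d+)/(\d+) \(([\d.]+)%\)")


def parse_loss(line):
    """Return (lost, total, recomputed loss in percent), or None if no match."""
    match = LOSS_RE.search(line)
    if match is None:
        return None
    lost, total = int(match.group(1)), int(match.group(2))
    return lost, total, 100.0 * lost / total


def expected_datagrams(bits_per_sec, seconds, payload_bytes):
    """Datagrams a sender would emit at a fixed bitrate (payload bits only)."""
    return bits_per_sec * seconds / (payload_bytes * 8)


if __name__ == "__main__":
    lines = [
        "[ 5] 0.00-600.00 sec 14.0 GBytes 200 Mbits/sec 0.002 ms 36334/56603715 (0.064%) receiver",
        "[ 5] 0.00-600.00 sec 14.0 GBytes 200 Mbits/sec 0.002 ms 38169/56603715 (0.067%) receiver",
        "[ 5] 0.00-600.00 sec 14.0 GBytes 200 Mbits/sec 0.002 ms 41949/56603718 (0.074%) receiver",
    ]
    for line in lines:
        lost, total, pct = parse_loss(line)
        print(f"{lost}/{total} -> {pct:.3f}%")
    # Assumed inputs: 200 Mbit/s for 600 s with 265-byte payloads.
    print("expected datagrams:", round(expected_datagrams(200e6, 600, 265)))
```

The recomputed percentages agree with what iperf3 printed, and the expected datagram count lands within about a hundred packets of the reported totals, so the sender held the requested rate for the full run.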
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.23 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5568