Description of problem: When using SDN, there was a phenomenon that the connection between Pods was intermittently slow, and it was solved by applying the workaround[1] below. But customers changed to OVN-kubernetes and the same thing happened. [1]KCS https://access.redhat.com/solutions/6217361 iptables -t raw -A OUTPUT -p udp --dport 4789 -j NOTRACK iptables -t raw -A PREROUTING -p udp --dport 4789 -j NOTRACK Version-Release number of selected component (if applicable): OCP 4.8.13(OVN-kubernetes, only IPv6) How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: A delay occurs when connecting between Pods for the first time, and Pod connection and disconnection are repeated. Expected results: The connection between Pods works normally. Additional info: I tried in OVN-k environment because the setting below worked in SDN environment, but it didn't work. ip6tables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK ip6tables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK - /var/log/openvswitch/ovs-vswitchd.log (1 failed (Invalid argument) on packet udp6) 2021-11-01T04:47:18.273Z|27320|dpif(handler54)|WARN|Dropped 11 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2021-11-01T04:47:18.273Z|27321|dpif(handler54)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0 2021-11-01T04:47:20.528Z|31638|dpif(handler4)|WARN|Dropped 8 log messages in last 2 seconds (most recently, 2 seconds ago) due to excessive rate 2021-11-01T04:47:20.529Z|31639|dpif(handler4)|WARN|system@ovs-system: execute ct(commit,zone=185,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:99:31:30:8d,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::402,ipv6_dst=fd01:0:0:b::354,ipv6_label=0xd0b71,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:5896 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb9),ct_tuple6(src=fd01:0:0:9::402,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(22) mtu 0 2021-11-01T04:47:20.539Z|26593|dpif(handler52)|WARN|system@ovs-system: execute ct(commit,zone=186,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:04:d3:c8:99,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::403,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x767fe,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=1245,tp_dst=16592 udp_csum:1137 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xba),ct_tuple6(src=fd01:0:0:9::403,dst=fd01:0:0:b::354,proto=17,src_port=1245,dst_port=16592),in_port(14) mtu 0 2021-11-01T04:47:21.458Z|30183|dpif(handler55)|WARN|Dropped 10 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2021-11-01T04:47:21.458Z|30184|dpif(handler55)|WARN|system@ovs-system: execute ct(commit,zone=191,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:85:42:3f:fa,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::3fe,ipv6_dst=fd01:0:0:b::354,ipv6_label=0xef2b5,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:7c62 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xbf),ct_tuple6(src=fd01:0:0:9::3fe,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(17) mtu 0 2021-11-01T04:47:24.189Z|19984|dpif(handler20)|WARN|system@ovs-system: execute ct(commit,zone=181,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:37:fe:73:00,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::3ff,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x71507,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=1589,tp_dst=16592 udp_csum:8c8f with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb5),ct_tuple6(src=fd01:0:0:9::3ff,dst=fd01:0:0:b::354,proto=17,src_port=1589,dst_port=16592),in_port(13) mtu 0 2021-11-01T04:47:24.227Z|22406|dpif(handler48)|WARN|system@ovs-system: execute ct(commit,zone=176,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:30:16:27:ca,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::401,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x41506,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:f4d1 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb0),ct_tuple6(src=fd01:0:0:9::401,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(18) mtu 0 2021-11-01T04:47:24.273Z|10585|dpif(handler35)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0 2021-11-01T04:47:25.227Z|17807|dpif(handler23)|WARN|Dropped 10 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2021-11-01T04:47:25.227Z|17808|dpif(handler23)|WARN|system@ovs-system: execute ct(commit,zone=176,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:30:16:27:ca,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::401,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x41506,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:f4d1 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb0),ct_tuple6(src=fd01:0:0:9::401,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(18) mtu 0 2021-11-01T04:47:26.273Z|27369|dpif(handler15)|WARN|Dropped 15 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2021-11-01T04:47:26.273Z|27370|dpif(handler15)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0 2021-11-01T04:47:27.273Z|27371|dpif(handler15)|WARN|Dropped 11 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2021-11-01T04:47:27.273Z|27372|dpif(handler15)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0 2021-11-01T04:47:28.227Z|22407|dpif(handler48)|WARN|Dropped 12 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate 2021-11-01T04:47:28.227Z|22408|dpif(handler48)|WARN|system@ovs-system: execute ct(commit,zone=176,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:30:16:27:ca,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::401,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x41506,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:f4d1 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb0),ct_tuple6(src=fd01:0:0:9::401,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(18) mtu 0
On a worker node, could you grab the output of: iptables-save ovs-ofctl dump-flows br-ex
(In reply to Dan Williams from comment #1) > On a worker node, could you grab the output of: > > iptables-save > ovs-ofctl dump-flows br-ex Nevermind, I see they are contained in the sosreport attached to the customer case.
Is the pcap attached to comments 68 and 69 the same as the existing and provided pcap?
*** Bug 2054102 has been marked as a duplicate of this bug. ***
If there is still a delay of 200ms with the kernel that has the backport [1], it should be easy to produce pcaps which support this The backport of this fix is supposed to fix the 200ms delay (make the re-xmit faster, not 200ms later.) If the kernel isn't re-transmitting packets due to ~100 duplicate acks being received with [1] this still has to be fixed in Linux. [1] https://github.com/torvalds/linux/commit/a29cb6914681a55667436a9eb7a42e28da8cf387
(In reply to mcambria from comment #103) > > If there is still a delay of 200ms with the kernel that has the backport > [1], it should be easy to produce pcaps which support this > > The backport of this fix is supposed to fix the 200ms delay (make the > re-xmit faster, not 200ms later.) > There are still some instances of the 200ms retransmissions in the pcaps with the patched kernel containing: net: tcp better handling of reordering then loss cases There are some of varying times like 39ms where RACK calculated the optimal RTO interval based on heuristics as well. There are some regular retransmissions that had an interval of few ms and recovered very fast. > If the kernel isn't re-transmitting packets due to ~100 duplicate acks being > received with [1] this still has to be fixed in Linux. > > > [1] > https://github.com/torvalds/linux/commit/ > a29cb6914681a55667436a9eb7a42e28da8cf387 I was reading the RACK RFCs and looking at the code that Yuchung implemented. It's a lot more complex than the older pre-RACK tcp dupack recovery mechinism. AFAICT it's supposed to be a lot smarter than the basic 3 dupacks will trigger a fast retransmission. I don't think RACK will guarantee that there will never be 200ms timer based retransmissions. The commit improved some situations but there are other valid reasons that 200ms restransmit can occur.