Bug 2018930

Summary:	[OCP 4.8][OVN] packet delay in communication between Pods
Product:	OpenShift Container Platform	Reporter:	Aaron Park <aapark>
Component:	Networking	Assignee:	mcambria <mcambria>
Networking sub component:	ovn-kubernetes	QA Contact:	Anurag saxena <anusaxen>
Status:	CLOSED CURRENTRELEASE	Docs Contact:
Severity:	urgent
Priority:	urgent	CC:	aconole, akaiser, bpickard, bzvonar, cgoncalves, cverna, dcbw, ealcaniz, fleitner, fpaoline, jajeon, jcaamano, jinjli, jmaxwell, jseunghw, mcambria, mleitner, mwasher, nusiddiq, openshift-bugs-escalate, pweil, tidawson, vlaad, zzhao
Version:	4.8	Keywords:	TestCannotAutomate
Target Milestone:	---	Flags:	bpickard: needinfo- bpickard: needinfo- mcambria: needinfo? mcambria: needinfo?
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-10-31 16:50:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2074566
Bug Blocks:

Description Aaron Park 2021-11-01 09:57:09 UTC

Description of problem:

When using SDN, there was a phenomenon that the connection between Pods was intermittently slow, and it was solved by applying the workaround[1] below.
But customers changed to OVN-kubernetes and the same thing happened.

[1]KCS 
https://access.redhat.com/solutions/6217361
iptables -t raw -A OUTPUT -p udp --dport 4789 -j NOTRACK
iptables -t raw -A PREROUTING -p udp --dport 4789 -j NOTRACK

Version-Release number of selected component (if applicable):

OCP 4.8.13(OVN-kubernetes, only IPv6)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

A delay occurs when connecting between Pods for the first time, and Pod connection and disconnection are repeated.


Expected results:

The connection between Pods works normally.

Additional info:

I tried in OVN-k environment because the setting below worked in SDN environment, but it didn't work.
ip6tables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK
ip6tables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK


- /var/log/openvswitch/ovs-vswitchd.log (1 failed (Invalid argument) on packet udp6)
2021-11-01T04:47:18.273Z|27320|dpif(handler54)|WARN|Dropped 11 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2021-11-01T04:47:18.273Z|27321|dpif(handler54)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0
2021-11-01T04:47:20.528Z|31638|dpif(handler4)|WARN|Dropped 8 log messages in last 2 seconds (most recently, 2 seconds ago) due to excessive rate
2021-11-01T04:47:20.529Z|31639|dpif(handler4)|WARN|system@ovs-system: execute ct(commit,zone=185,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:99:31:30:8d,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::402,ipv6_dst=fd01:0:0:b::354,ipv6_label=0xd0b71,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:5896
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb9),ct_tuple6(src=fd01:0:0:9::402,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(22) mtu 0
2021-11-01T04:47:20.539Z|26593|dpif(handler52)|WARN|system@ovs-system: execute ct(commit,zone=186,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:04:d3:c8:99,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::403,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x767fe,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=1245,tp_dst=16592 udp_csum:1137
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xba),ct_tuple6(src=fd01:0:0:9::403,dst=fd01:0:0:b::354,proto=17,src_port=1245,dst_port=16592),in_port(14) mtu 0
2021-11-01T04:47:21.458Z|30183|dpif(handler55)|WARN|Dropped 10 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2021-11-01T04:47:21.458Z|30184|dpif(handler55)|WARN|system@ovs-system: execute ct(commit,zone=191,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:85:42:3f:fa,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::3fe,ipv6_dst=fd01:0:0:b::354,ipv6_label=0xef2b5,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:7c62
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xbf),ct_tuple6(src=fd01:0:0:9::3fe,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(17) mtu 0
2021-11-01T04:47:24.189Z|19984|dpif(handler20)|WARN|system@ovs-system: execute ct(commit,zone=181,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:37:fe:73:00,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::3ff,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x71507,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=1589,tp_dst=16592 udp_csum:8c8f
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb5),ct_tuple6(src=fd01:0:0:9::3ff,dst=fd01:0:0:b::354,proto=17,src_port=1589,dst_port=16592),in_port(13) mtu 0
2021-11-01T04:47:24.227Z|22406|dpif(handler48)|WARN|system@ovs-system: execute ct(commit,zone=176,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:30:16:27:ca,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::401,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x41506,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:f4d1
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb0),ct_tuple6(src=fd01:0:0:9::401,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(18) mtu 0
2021-11-01T04:47:24.273Z|10585|dpif(handler35)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0
2021-11-01T04:47:25.227Z|17807|dpif(handler23)|WARN|Dropped 10 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2021-11-01T04:47:25.227Z|17808|dpif(handler23)|WARN|system@ovs-system: execute ct(commit,zone=176,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:30:16:27:ca,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::401,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x41506,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:f4d1
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb0),ct_tuple6(src=fd01:0:0:9::401,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(18) mtu 0
2021-11-01T04:47:26.273Z|27369|dpif(handler15)|WARN|Dropped 15 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2021-11-01T04:47:26.273Z|27370|dpif(handler15)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0
2021-11-01T04:47:27.273Z|27371|dpif(handler15)|WARN|Dropped 11 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2021-11-01T04:47:27.273Z|27372|dpif(handler15)|WARN|system@ovs-system: execute ct(commit,zone=198,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x67,ipv6_dst=2001:dc1:a01:2320:172:23:22:76,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x100c1}),flags(df|csum|key))),set(eth(src=0a:58:10:73:20:58,dst=0a:58:54:52:38:fa)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:8e:33:6f:6e,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::40f,ipv6_dst=fd01:0:0:24::31e,ipv6_label=0xdc38c,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:1d80
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xc6),ct_tuple6(src=fd01:0:0:9::40f,dst=fd01:0:0:24::31e,proto=17,src_port=17839,dst_port=16592),in_port(30) mtu 0
2021-11-01T04:47:28.227Z|22407|dpif(handler48)|WARN|Dropped 12 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2021-11-01T04:47:28.227Z|22408|dpif(handler48)|WARN|system@ovs-system: execute ct(commit,zone=176,label=0/0x1),ct_clear,ct_clear,set(tunnel(tun_id=0x84,ipv6_dst=2001:dc1:a01:2320:172:23:22:74,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x200fb}),flags(df|csum|key))),set(eth(src=0a:58:8b:4b:a2:1d,dst=0a:58:e9:7e:cb:a1)),set(ipv6(hlimit=63)),1 failed (Invalid argument) on packet udp6,vlan_tci=0x0000,dl_src=0a:58:30:16:27:ca,dl_dst=0a:58:f0:79:ce:fe,ipv6_src=fd01:0:0:9::401,ipv6_dst=fd01:0:0:b::354,ipv6_label=0x41506,nw_tos=104,nw_ecn=0,nw_ttl=64,tp_src=17839,tp_dst=16592 udp_csum:f4d1
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xb0),ct_tuple6(src=fd01:0:0:9::401,dst=fd01:0:0:b::354,proto=17,src_port=17839,dst_port=16592),in_port(18) mtu 0

Comment 1 Dan Williams 2021-11-01 15:58:07 UTC

On a worker node, could you grab the output of:

iptables-save
ovs-ofctl dump-flows br-ex

Comment 2 Dan Williams 2021-11-01 16:20:01 UTC

(In reply to Dan Williams from comment #1)
> On a worker node, could you grab the output of:
> 
> iptables-save
> ovs-ofctl dump-flows br-ex

Nevermind, I see they are contained in the sosreport attached to the customer case.

Comment 73 Aaron Park 2022-02-02 12:50:52 UTC

Is the pcap attached to comments 68 and 69 the same as the existing and provided pcap?

Comment 93 Andreas Karis 2022-03-11 17:53:42 UTC

*** Bug 2054102 has been marked as a duplicate of this bug. ***

Comment 103 mcambria@redhat.com 2022-04-19 23:31:59 UTC


If there is still a delay of 200ms with the kernel that has the backport [1], it should be easy to produce pcaps which support this

The backport of this fix is supposed to fix the 200ms delay (make the re-xmit faster, not 200ms later.)

If the kernel isn't re-transmitting packets due to ~100 duplicate acks being received with [1] this still has to be fixed in Linux.


[1] https://github.com/torvalds/linux/commit/a29cb6914681a55667436a9eb7a42e28da8cf387

Comment 104 Jonathan Maxwell 2022-04-21 04:58:45 UTC

(In reply to mcambria from comment #103)
> 
> If there is still a delay of 200ms with the kernel that has the backport
> [1], it should be easy to produce pcaps which support this
> 
> The backport of this fix is supposed to fix the 200ms delay (make the
> re-xmit faster, not 200ms later.)
> 

There are still some instances of the 200ms retransmissions in the pcaps with the patched kernel containing:

net: tcp better handling of reordering then loss cases

There are some of varying times like 39ms where RACK calculated the optimal RTO interval based on heuristics as well. There are some regular retransmissions that had an interval of few ms and recovered very fast. 

> If the kernel isn't re-transmitting packets due to ~100 duplicate acks being
> received with [1] this still has to be fixed in Linux.
> 
> 
> [1]
> https://github.com/torvalds/linux/commit/
> a29cb6914681a55667436a9eb7a42e28da8cf387

I was reading the RACK RFCs and looking at the code that Yuchung implemented. It's a lot more complex than the older pre-RACK tcp dupack recovery mechinism. AFAICT it's supposed to be a lot smarter than the basic 3 dupacks will trigger a fast retransmission. 

I don't think RACK will guarantee that there will never be 200ms timer based retransmissions. The commit improved some situations but there are other valid reasons that 200ms restransmit can occur.