"[sig-network] pods should successfully create sandboxes by adding pod to network" has begun failing in CI on OVN, see: https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=%5Bsig-network%5D%20pods%20should%20successfully%20create%20sandboxes%20by%20adding%20pod%20to%20network

The failing pods report:

error adding container to network "ovn-kubernetes": CNI request failed with status 400: failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:80:02:06 [10.128.2.6/23]

Possibly from https://github.com/openshift/ovn-kubernetes/pull/1064#issuecomment-1114212169, possibly from RHCOS 8.6. We do not yet know why.
The cause of the "timed out waiting for OVS port binding" is that ovn-controller is not receiving packets from the OVN SBDB for a period of time after it starts. ovn-controller has a default inactivity timeout of 180 seconds, so for 3 minutes after the node starts (while kubelet tries to launch new containers) all of them fail.

We can see ovn-controller complaining here that it is getting nothing from the SBDB:

[trozet@fedora must-gather.local.7812632480284007845]$ omg logs ovnkube-node-h8vzv -c ovn-controller | grep 10.0.170.147
2022-05-09T22:22:57.520089247Z 2022-05-09T22:22:57.520Z|00006|reconnect|INFO|ssl:10.0.170.147:9642: connecting...
2022-05-09T22:22:57.527961633Z 2022-05-09T22:22:57.527Z|00008|reconnect|INFO|ssl:10.0.170.147:9642: connected
2022-05-09T22:25:16.528231692Z 2022-05-09T22:25:16.528Z|00031|reconnect|ERR|ssl:10.0.170.147:9642: no response to inactivity probe after 90 seconds, disconnecting
2022-05-09T22:25:16.528231692Z 2022-05-09T22:25:16.528Z|00032|reconnect|INFO|ssl:10.0.170.147:9642: connection dropped
^ 3 minutes of outage

During the same time period, the SBDB also thinks it's getting nothing from ovn-controller:

[trozet@fedora must-gather.local.7812632480284007845]$ omg logs ovnkube-master-l85dd -c sbdb | grep 10.0.247.77
2022-05-09T22:23:57.696598680Z 2022-05-09T22:23:57.696Z|00125|reconnect|ERR|ssl:10.0.247.77:47710: no response to inactivity probe after 30 seconds, disconnecting

We then reproduced this issue while running tcpdump on the br-ex kernel interfaces to see what traffic is actually being sent between the nodes. From the packet captures I can see that the SBDB is continuously retransmitting packets that make it to ovn-controller's node, but the node never sends back an ACK. The final packets we see from ovn-controller are a TCP window size increase and a few more packets, and then it stops responding altogether. The question is why the kernel stops ACKing packets that are arriving at the node. The only explanation I can come up with is that perhaps the receive queue is full and the kernel is refusing to ACK any new packets that come in. The increase in window size (receive buffer) seems to support this theory, but I really need help from a kernel network stack expert to debug further.

With all that being said, this problem was only seen after moving to RHCOS 8.6 and not in earlier versions. Reverting the bump to RHCOS 8.6 seems to have fixed the issue. This is either a bug in the 8.6 kernel or some compatibility problem between the same OVN version and the new kernel. Going to open a bugzilla on the kernel team to help investigate.
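For reference, a quick way to inspect the probe intervals actually in play is shown below (a rough sketch, not what was run here; it assumes shell access to the ovnkube-node and ovnkube-master pods and that the defaults were not overridden elsewhere):

# ovn-controller's probe interval toward the SBDB is configured via external_ids
# on the local Open_vSwitch table (unset means the built-in default is used)
ovs-vsctl list Open_vSwitch | grep external_ids

# the SBDB-side probe toward its clients lives in the SB Connection table
ovn-sbctl list Connection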
8.5 kernel: 4.18.0-348.23.1.el8_5.x86_64
8.6 kernel: 4.18.0-369.el8.x86_64

Florian is looking into this from the kernel side.
Linking to an example failure for future me: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade/1521037775749517312 . The CI search results will clear in another five days or so.
Florian used tcpreplay on the capture and didn't find any irregularities with the affected kernel. I think the next step is to reproduce with net.netfilter.nf_conntrack_log_invalid=6 set so we can see if conntrack is dropping the packet as invalid, and to enable some iptables tracing to see if the packet is making it into/through iptables.
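Roughly what that debugging setup would look like on the affected node (a sketch, not the exact commands used; the port and the log-watching step are assumptions):

# log packets that conntrack classifies as INVALID for TCP (protocol 6)
sysctl -w net.netfilter.nf_conntrack_log_invalid=6

# trace SBDB traffic (source port 9642) through the iptables chains
iptables -t raw -A PREROUTING -p tcp --sport 9642 -j TRACE

# both show up in the kernel log (assuming an nf_log backend is loaded)
dmesg -wT | grep -E 'TRACE:|nf_ct'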
After some further analysis I think we know what's happening here. Somehow, mid TCP conversation (following a TCP window update), the client starts sending TCP packets with a *different* TCP source port that are part of the *same* TCP conversation:

1337 2022-05-13 12:54:54.629149 10.0.201.4 → 10.0.172.7 TCP 66 54728 → 9642 [ACK] Seq=5592 Ack=1212160 Win=162176 Len=0 TSval=1001755650 TSecr=4035653492
1338 2022-05-13 12:54:54.629436 10.0.172.7 → 10.0.201.4 TLSv1.3 9015 Application Data [TCP segment of a reassembled PDU]
1339 2022-05-13 12:54:54.629805 10.0.172.7 → 10.0.201.4 TLSv1.3 26913 Application Data [TCP segment of a reassembled PDU]
1340 2022-05-13 12:54:54.629852 10.0.201.4 → 10.0.172.7 TCP 66 54728 → 9642 [ACK] Seq=5592 Ack=1247956 Win=174592 Len=0 TSval=1001755651 TSecr=4035653493
1341 2022-05-13 12:54:54.629882 10.0.172.7 → 10.0.201.4 TLSv1.3 62709 Application Data, Application Data, Application Data, Application Data
1342 2022-05-13 12:54:54.629892 10.0.172.7 → 10.0.201.4 TLSv1.3 26288 Application Data, Application Data, Application Data, Application Data
1343 2022-05-13 12:54:54.630144 10.0.201.4 → 10.0.172.7 TCP 66 54728 → 9642 [ACK] Seq=5592 Ack=1336821 Win=112768 Len=0 TSval=1001755651 TSecr=4035653493
1344 2022-05-13 12:54:54.633146 10.0.201.4 → 10.0.172.7 TCP 66 [TCP Window Update] 54728 → 9642 [ACK] Seq=5592 Ack=1336821 Win=248576 Len=0 TSval=1001755654 TSecr=4035653493
1345 2022-05-13 12:54:55.077203 10.0.172.7 → 10.0.201.4 TLSv1.3 229 Application Data
1346 2022-05-13 12:54:55.077285 10.0.172.7 → 10.0.201.4 TLSv1.3 1168 Application Data
1347 2022-05-13 12:54:55.083290 10.0.172.7 → 10.0.201.4 TCP 1168 [TCP Retransmission] 9642 → 54728 [PSH, ACK] Seq=1336984 Ack=5592 Win=54400 Len=1102 TSval=4035653947 TSecr=1001755654
1348 2022-05-13 12:54:55.083378 10.0.201.4 → 10.0.172.7 TCP 78 50464 → 9642 [ACK] Seq=1 Ack=1 Win=4144 Len=0 TSval=1001756104 TSecr=4035653940 SLE=4294966195 SRE=1
1349 2022-05-13 12:54:55.124728 10.0.172.7 → 10.0.201.4 TLSv1.3 229 Application Data
1350 2022-05-13 12:54:55.124766 10.0.201.4 → 10.0.172.7 TCP 66 50464 → 9642 [ACK] Seq=1 Ack=164 Win=4284 Len=0 TSval=1001756146 TSecr=4035653988
1351 2022-05-13 12:54:55.124780 10.0.172.7 → 10.0.201.4 TLSv1.3 1117 Application Data
1352 2022-05-13 12:54:55.124803 10.0.201.4 → 10.0.172.7 TCP 66 50464 → 9642 [ACK] Seq=1 Ack=1215 Win=4424 Len=0 TSval=1001756146 TSecr=4035653988
1353 2022-05-13 12:54:55.284294 10.0.172.7 → 10.0.201.4 TCP 2545 [TCP Retransmission] 9642 → 54728 [PSH, ACK] Seq=1336821 Ack=5592 Win=54400 Len=2479 TSval=4035654148 TSecr=1001755654
1354 2022-05-13 12:54:55.284328 10.0.201.4 → 10.0.172.7 TCP 78 [TCP Dup ACK 1352#1] 50464 → 9642 [ACK] Seq=1 Ack=1215 Win=4424 Len=0 TSval=1001756305 TSecr=4035654148 SLE=4294966032 SRE=1215
1355 2022-05-13 12:54:55.692288 10.0.172.7 → 10.0.201.4 TCP 2545 [TCP Retransmission] 9642 → 54728 [PSH, ACK] Seq=1336821 Ack=5592 Win=54400 Len=2479 TSval=4035654556 TSecr=1001755654
1356 2022-05-13 12:54:55.692326 10.0.201.4 → 10.0.172.7 TCP 78 [TCP Dup ACK 1352#2] 50464 → 9642 [ACK] Seq=1 Ack=1215 Win=4424 Len=0 TSval=1001756713 TSecr=4035654556 SLE=4294966032 SRE=1215
1357 2022-05-13 12:54:56.525308 10.0.172.7 → 10.0.201.4 TCP 2545 [TCP Retransmission] 9642 → 54728 [PSH, ACK] Seq=1336821 Ack=5592 Win=54400 Len=2479 TSval=4035655389 TSecr=1001755654
1358 2022-05-13 12:54:56.525365 10.0.201.4 → 10.0.172.7 TCP 78 [TCP Dup ACK 1352#3] 50464 → 9642 [ACK] Seq=1 Ack=1215 Win=4424 Len=0 TSval=1001757546 TSecr=4035655389 SLE=4294966032 SRE=1215

In the above output, 10.0.201.4 is the ovn-controller (client) and 10.0.172.7 is the SBDB (server). The original connection started with the client connecting from port 54728. After sending a window update in packet 1344, the client starts ACKing further packets using a different TCP source port (50464)! Examining the sequence/ack numbers in these packets, they match perfectly with the packets being sent on the original 54728 connection. This is definitely a bug in the kernel. It's hard to say whether it is rooted in conntrack somehow changing the source port or whether something gets mangled down in the TCP data structure referencing this socket. I'm going to try to reproduce again and capture conntrack events and/or try to bypass conntrack to rule that out, but ultimately we will need help from the kernel team to identify what is happening with this kernel. Will attach the pcap to this bz.
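To look at the two apparent streams side by side in the attached capture, a filter along these lines works (a sketch; the raw seq/ack field names assume a reasonably recent tshark):

tshark -r tcpdump_capture_br-ex-05-13-16-20-22-1652458822.pcap \
  -Y 'tcp.port==54728 || tcp.port==50464' \
  -T fields -e frame.number -e tcp.srcport -e tcp.dstport -e tcp.seq_raw -e tcp.ack_raw -e _ws.col.Info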
Created attachment 1879529: pcap
Just to clarify: the sequence numbers shown in the above output are relative. Since tshark thinks there are two different conversations happening, it starts the second stream at 1. However, if you look at the raw sequence numbers you can see they match:

[trozet@fedora ovnkube-node-q7pq5]$ tshark -r tcpdump_capture_br-ex-05-13-16-20-22-1652458822.pcap -Y "frame.number==1343 || frame.number==1348" -T fields -E separator=, -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e frame.len -e tcp.flags -e tcp.seq_raw -e _ws.col.Info -E header=y -E quote=d -E occurrence=f
ip.src,ip.dst,tcp.srcport,tcp.dstport,frame.len,tcp.flags,tcp.seq_raw,_ws.col.Info
"10.0.201.4","10.0.172.7","54728","9642","66","0x0010","3698430526","54728 → 9642 [ACK] Seq=5592 Ack=1336821 Win=112768 Len=0 TSval=1001755651 TSecr=4035653493"
"10.0.201.4","10.0.172.7","50464","9642","78","0x0010","3698430526","50464 → 9642 [ACK] Seq=1 Ack=1 Win=4144 Len=0 TSval=1001756104 TSecr=4035653940 SLE=4294966195 SRE=1"

^ the raw sequence number on both streams is 3698430526
Managed to reproduce while monitoring conntrack events. It looks like this is caused by conntrack. In this example the original connection is:

10.0.173.53:9642 (SBDB server) <----> 10.0.171.17:48804 (client ovn-controller)

When the problem happens, the client's port changes to 65349, and in conntrack I see:

[trozet@fedora 2nd]$ cat conntrack.log | egrep 'port=65349|port=48804'
May 13 22:04:25 ip-10-0-171-17 conntrack[3178]: [NEW] tcp 6 120 SYN_SENT src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804
May 13 22:04:25 ip-10-0-171-17 conntrack[3178]: [NEW] tcp 6 120 SYN_SENT src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 mark=2 zone=64000
May 13 22:04:25 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 60 SYN_RECV src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804
May 13 22:04:25 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 432000 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [ASSURED]
May 13 22:04:25 ip-10-0-171-17 conntrack[3178]: [DESTROY] tcp 6 431999 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [ASSURED]
May 13 22:04:51 ip-10-0-171-17 conntrack[3178]: [NEW] tcp 6 300 ESTABLISHED src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [UNREPLIED] src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=61029
May 13 22:04:51 ip-10-0-171-17 conntrack[3178]: [NEW] tcp 6 300 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=65349
^^^^^^^^^^^^^^^^^ (note the dport=65349)
May 13 22:04:51 ip-10-0-171-17 conntrack[3178]: [NEW] tcp 6 300 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=65349 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=65349 mark=2 zone=64000
May 13 22:05:25 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 120 FIN_WAIT src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [UNREPLIED] src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=61029
May 13 22:05:25 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 120 FIN_WAIT src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=65349
May 13 22:05:25 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 30 LAST_ACK src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=65349
May 13 22:05:43 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 30 LAST_ACK src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [UNREPLIED] src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=61029
May 13 22:05:43 ip-10-0-171-17 conntrack[3178]: [UPDATE] tcp 6 120 TIME_WAIT src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=65349
May 13 22:06:24 ip-10-0-171-17 conntrack[3178]: [DESTROY] tcp 6 LAST_ACK src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [ASSURED] mark=2 zone=64000
May 13 22:06:27 ip-10-0-171-17 conntrack[3178]: [DESTROY] tcp 6 LAST_ACK src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [UNREPLIED] src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=61029

For some reason conntrack thinks the connection is destroyed even though it isn't, and then creates 2 new entries, one of which changes the source port. Will attach this pcap as well.
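For the record, the conntrack event stream above can be captured with something like the following (a sketch; the exact invocation is an assumption, since the log above appears to have gone through the journal, and it assumes conntrack-tools is installed on the node):

# stream conntrack events to a file, then filter for the ports of interest
conntrack -E -o timestamp > conntrack.log 2>&1 &
egrep 'port=65349|port=48804' conntrack.log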
iptables output:

sh-4.4# iptables-save
# Generated by iptables-save v1.8.4 on Fri May 13 22:24:12 2022
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p tcp -m tcp --sport 9642 -j TRACE
-A PREROUTING -p udp -m udp --dport 6081 -j NOTRACK
-A OUTPUT -p udp -m udp --dport 6081 -j NOTRACK
COMMIT
# Completed on Fri May 13 22:24:12 2022
# Generated by iptables-save v1.8.4 on Fri May 13 22:24:12 2022
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:OVN-KUBE-ITP - [0:0]
-A PREROUTING -p tcp -m tcp --sport 9642 -j TRACE
-A OUTPUT -j OVN-KUBE-ITP
COMMIT
# Completed on Fri May 13 22:24:12 2022
# Generated by iptables-save v1.8.4 on Fri May 13 22:24:12 2022
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
-A INPUT -j KUBE-FIREWALL
-A INPUT -p tcp -m tcp --sport 9642 -j TRACE
-A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
COMMIT
# Completed on Fri May 13 22:24:12 2022
# Generated by iptables-save v1.8.4 on Fri May 13 22:24:12 2022
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:OVN-KUBE-SNAT-MGMTPORT - [0:0]
:OVN-KUBE-ITP - [0:0]
:OVN-KUBE-NODEPORT - [0:0]
:OVN-KUBE-EXTERNALIP - [0:0]
:OVN-KUBE-ETP - [0:0]
-A PREROUTING -j OVN-KUBE-ETP
-A PREROUTING -j OVN-KUBE-EXTERNALIP
-A PREROUTING -j OVN-KUBE-NODEPORT
-A POSTROUTING -o ovn-k8s-mp0 -j OVN-KUBE-SNAT-MGMTPORT
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A OUTPUT -j OVN-KUBE-EXTERNALIP
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-ITP
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
-A OVN-KUBE-SNAT-MGMTPORT -o ovn-k8s-mp0 -m comment --comment "OVN SNAT to Management Port" -j SNAT --to-source 10.131.0.2
-A OVN-KUBE-SNAT-MGMTPORT -p tcp -m tcp --dport 30858 -j RETURN
-A OVN-KUBE-SNAT-MGMTPORT -p tcp -m tcp --dport 31285 -j RETURN
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30858 -j DNAT --to-destination 172.30.37.190:443
-A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 31285 -j DNAT --to-destination 172.30.37.190:80
-A OVN-KUBE-ETP -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 30858 -j DNAT --to-destination 169.254.169.3:30858
-A OVN-KUBE-ETP -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 31285 -j DNAT --to-destination 169.254.169.3:31285
COMMIT
# Completed on Fri May 13 22:24:12 2022
Created attachment 1879578: 2nd pcap with ports 48804 and 65349
The problem no longer occurs if conntrack is skipped for these connections:

sh-4.4# iptables -t raw -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
CT         tcp  --  anywhere             anywhere             tcp spt:9642 NOTRACK
TRACE      tcp  --  anywhere             anywhere             tcp spt:9642
CT         udp  --  anywhere             anywhere             udp dpt:geneve NOTRACK

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
CT         tcp  --  anywhere             anywhere             tcp dpt:9642 NOTRACK
CT         udp  --  anywhere             anywhere             udp dpt:geneve NOTRACK
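The NOTRACK rules shown above can be added with something along these lines (a sketch of the rules in the listing, not necessarily the exact commands used; -j NOTRACK is shorthand for -j CT --notrack, which is why the listing shows a CT target):

# skip conntrack for SBDB traffic (port 9642) in both directions
iptables -t raw -I PREROUTING -p tcp --sport 9642 -j NOTRACK
iptables -t raw -I OUTPUT -p tcp --dport 9642 -j NOTRACK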
(In reply to Tim Rozet from comment #10)
> Managed to reproduce while monitoring conntrack events. It looks like this
> is caused by conntrack. In this example the original connection is:
> 10.0.173.53:9642 (SBDB server) <----> 10.0.171.17:48804 (client
> ovn-controller)
>
> When the problem happens, client changes its port to be 65349, and in
> conntrack I see:
> [trozet@fedora 2nd]$ cat conntrack.log | egrep 'port=65349|port=48804'

[NEW] tcp 120 SYN_SENT src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804
[NEW] tcp 120 SYN_SENT src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 mark=2 zone=64000
[UPDATE] tcp 60 SYN_RECV src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804
[UPDATE] tcp 432000 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [ASSURED]
[DESTROY] tcp 431999 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [ASSURED]
[NEW] tcp 300 ESTABLISHED src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=48804 [UNREPLIED] src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=61029
[NEW] tcp 300 ESTABLISHED src=10.0.171.17 dst=10.0.173.53 sport=48804 dport=9642 [UNREPLIED] src=10.0.173.53 dst=10.0.171.17 sport=9642 dport=65349

Ok, thanks, this is a great catch. So the culprit is bug #2006167. As a workaround you could disable the mid-stream pickup with net.netfilter.nf_conntrack_tcp_loose=0. NOTRACK works as well, of course.

What's happening here is that (for an as-yet-unknown reason) the conntrack entry is removed, and then the next packet is picked up as new in the *wrong* direction: sport is seen as 9642, dport as 48804. This then triggers the auto-port rewrite that was supposed to work around port shadowing. This will be reverted in the 8.6 z-stream via #2065266.
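For anyone needing that workaround before the z-stream fix lands, it is just a sysctl (a sketch; persisting it is environment-specific, e.g. a MachineConfig on OCP, so treat the file path below as an example only):

# disable conntrack mid-stream pickup (the workaround mentioned above)
sysctl -w net.netfilter.nf_conntrack_tcp_loose=0

# to make it persistent on a standalone host
echo 'net.netfilter.nf_conntrack_tcp_loose = 0' > /etc/sysctl.d/99-nf-conntrack-loose.conf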
Talking with Florian, it sounds like the cause of the conntrack deletion should still be root-caused, since fixing that would prevent the port rewrite in the first place.
So on my F35 with kernel 5.16.18-200.fc35.x86_64 I see similarly odd behavior:

[UPDATE] tcp 6 120 FIN_WAIT src=172.18.0.5 dst=172.18.0.3 sport=41002 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41002 [ASSURED]
[UPDATE] tcp 6 30 LAST_ACK src=172.18.0.5 dst=172.18.0.3 sport=41002 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41002 [ASSURED]
[UPDATE] tcp 6 120 TIME_WAIT src=172.18.0.5 dst=172.18.0.3 sport=41002 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41002 [ASSURED]
### ovn-controller container restarted; 41002 is the old port, 41004 is the new port
[NEW] tcp 6 120 SYN_SENT src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 [UNREPLIED] src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004
[NEW] tcp 6 120 SYN_SENT src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 [UNREPLIED] src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004 mark=2 zone=64000
[UPDATE] tcp 6 60 SYN_RECV src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004
[UPDATE] tcp 6 432000 ESTABLISHED src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004 [ASSURED]
[DESTROY] tcp 6 431999 ESTABLISHED src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004 [ASSURED]   <------------- Destroyed established session??
[DESTROY] tcp 6 107 TIME_WAIT src=172.18.0.5 dst=172.18.0.3 sport=41002 dport=6642 src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41002 [ASSURED]
[NEW] tcp 6 300 ESTABLISHED src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004 [UNREPLIED] src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=53254   <------------- Random new source port
[NEW] tcp 6 300 ESTABLISHED src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 [UNREPLIED] src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=31963   <------------- Random new source port
[NEW] tcp 6 300 ESTABLISHED src=172.18.0.5 dst=172.18.0.3 sport=31963 dport=6642 [UNREPLIED] src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=31963 mark=2 zone=64000
[DESTROY] tcp 6 300 CLOSE src=172.18.0.5 dst=172.18.0.3 sport=31963 dport=6642 [UNREPLIED] src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=31963 mark=2 zone=64000
[DESTROY] tcp 6 300 CLOSE src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=6642 [UNREPLIED] src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=31963
[DESTROY] tcp 6 300 CLOSE src=172.18.0.3 dst=172.18.0.5 sport=6642 dport=41004 [UNREPLIED] src=172.18.0.5 dst=172.18.0.3 sport=41004 dport=53254
[NEW] tcp 6 120 SYN_SENT src=172.18.0.5 dst=172.18.0.2 sport=49500 dport=6642 [UNREPLIED] src=172.18.0.2 dst=172.18.0.5 sport=6642 dport=49500
[NEW] tcp 6 120 SYN_SENT src=172.18.0.5 dst=172.18.0.2 sport=49500 dport=6642 [UNREPLIED] src=172.18.0.2 dst=172.18.0.5 sport=6642 dport=49500 mark=2 zone=64000
[UPDATE] tcp 6 60 SYN_RECV src=172.18.0.5 dst=172.18.0.2 sport=49500 dport=6642 src=172.18.0.2 dst=172.18.0.5 sport=6642 dport=49500
[UPDATE] tcp 6 432000 ESTABLISHED src=172.18.0.5 dst=172.18.0.2 sport=49500 dport=6642 src=172.18.0.2 dst=172.18.0.5 sport=6642 dport=49500 [ASSURED]

Pcap output:

15:27:55.592623 02:42:ac:12:00:05 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 41575, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.5.41004 > 172.18.0.3.6642: Flags [S], cksum 0x585b (incorrect -> 0x3c34), seq 1643667136, win 64240, options [mss 1460,sackOK,TS val 2757007746 ecr 0,nop,wscale 7], length 0
15:27:55.592695 02:42:ac:12:00:03 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 74: (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.3.6642 > 172.18.0.5.41004: Flags [S.], cksum 0x585b (incorrect -> 0x724e), seq 3988055409, ack 1643667137, win 65160, options [mss 1460,sackOK,TS val 2292607599 ecr 2757007746,nop,wscale 7], length 0
15:27:55.592715 02:42:ac:12:00:05 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 41576, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.0.5.41004 > 172.18.0.3.6642: Flags [.], cksum 0x5853 (incorrect -> 0x9dac), seq 1, ack 1, win 502, options [nop,nop,TS val 2757007747 ecr 2292607599], length 0

...rest of normal push/acks omitted for brevity

15:28:05.077279 02:42:ac:12:00:03 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 206: (tos 0xc0, ttl 64, id 55243, offset 0, flags [DF], proto TCP (6), length 192)
    172.18.0.3.6642 > 172.18.0.5.41004: Flags [P.], cksum 0x58df (incorrect -> 0x3239), seq 269440:269580, ack 3044, win 501, options [nop,nop,TS val 2292617083 ecr 2757007756], length 140
15:28:05.077338 02:42:ac:12:00:05 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 41596, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.0.5.31963 > 172.18.0.3.6642: Flags [.], cksum 0x5853 (incorrect -> 0x4d74), seq 1643670180, ack 3988324989, win 629, options [nop,nop,TS val 2757017231 ecr 2292617083], length 0
    ^^^ wrong port in ACK!!!
15:28:05.077398 02:42:ac:12:00:03 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.18.0.3.6642 > 172.18.0.5.31963: Flags [R], cksum 0xd0af (correct), seq 3988324989, win 0, length 0
15:28:05.077493 02:42:ac:12:00:03 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 292: (tos 0xc0, ttl 64, id 55244, offset 0, flags [DF], proto TCP (6), length 278)
    172.18.0.3.6642 > 172.18.0.5.41004: Flags [P.], cksum 0x5935 (incorrect -> 0x23e2), seq 269580:269806, ack 3044, win 501, options [nop,nop,TS val 2292617083 ecr 2757007756], length 226
15:28:05.077510 02:42:ac:12:00:05 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 54: (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.18.0.5.41004 > 172.18.0.3.6642: Flags [R], cksum 0xd0f8 (correct), seq 1643670180, win 0, length 0
15:28:05.077820 02:42:ac:12:00:05 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 44267, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.5.49500 > 172.18.0.2.6642: Flags [S], cksum 0x585a (incorrect -> 0x2718), seq 3594600488, win 64240, options [mss 1460,sackOK,TS val 508935613 ecr 0,nop,wscale 7], length 0
15:28:05.078033 02:42:ac:12:00:02 > 02:42:ac:12:00:05, ethertype IPv4 (0x0800), length 74: (tos 0xc0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    172.18.0.2.6642 > 172.18.0.5.49500: Flags [S.], cksum 0x585a (incorrect -> 0x3221), seq 4128814769, ack 3594600489, win 65160, options [mss 1460,sackOK,TS val 3827130469 ecr 508935613,nop,wscale 7], length 0
15:28:05.078068 02:42:ac:12:00:05 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 44268, offset 0, flags [DF], proto TCP (6), length 52)
    172.18.0.5.49500 > 172.18.0.2.6642: Flags [.], cksum 0x5852 (incorrect -> 0x5d80), seq 1, ack 1, win 502, options [nop,nop,TS val 508935613 ecr 3827130469], length 0

Florian, does my kernel version contain the fix? Anything else we can do to help debug this? Do we need to clone this to a kernel bz?
(In reply to Tim Rozet from comment #16)
> Florian, does my kernel version contain the fix? Anything else we can do to
> help debug this? Do we need to clone this to a kernel bz?

It's already cloned to a kernel BZ: bug 2065266

But it's my understanding that the fix won't stop whatever is deleting the conntrack entry from deleting it; so (I think?) we'll still maybe get a connection reset, but at least it won't wait for the full idle timeout before reconnecting. Could be wrong; Florian can correct me.
Tim, is it possible it's ovn-controller reconnecting to the OVS bridge when it restarts?

<kill ovn-controller>
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22533|stream_fd|DBG|recv: Connection reset by peer
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22534|rconn|WARN|br-int<->unix#119078: connection dropped (Connection reset by peer)
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22535|rconn|DBG|br-int<->unix#119078: entering DISCONNECTED
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22536|rconn|DBG|br-int<->unix#119079: connection closed by peer
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22537|rconn|DBG|br-int<->unix#119079: entering DISCONNECTED
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22538|rconn|DBG|br-int<->unix#119080: connection closed by peer
May 17 01:48:38 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22539|rconn|DBG|br-int<->unix#119080: entering DISCONNECTED

<ovn-controller reconnects>
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22819|poll_loop|DBG|wakeup due to [POLLIN] on fd 49 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:274 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22820|rconn|DBG|br-int<->unix#119114: entering CONNECTING
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22821|poll_loop|DBG|wakeup due to [POLLOUT] on fd 10 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:153 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22822|vconn|DBG|unix#119114: sent (Success): OFPT_HELLO (OF1.5) (xid=0x1a88e):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22823|poll_loop|DBG|wakeup due to [POLLIN] on fd 10 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22824|vconn|DBG|unix#119114: received: OFPT_HELLO (OF1.5) (xid=0x1):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: version bitmap: 0x06
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22825|vconn|DBG|unix#119114: negotiated OpenFlow version 0x06 (we support version 0x06 and earlier, peer supports version 0x06)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22826|rconn|DBG|br-int<->unix#119114: connected
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22827|rconn|DBG|br-int<->unix#119114: entering ACTIVE
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22828|vconn|DBG|unix#119114: received: OFPST_METER_FEATURES request (OF1.5) (xid=0x2):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22829|vconn|DBG|unix#119114: sent (Success): OFPST_METER_FEATURES reply (OF1.5) (xid=0x2):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: max_meter:200000 max_bands:1 max_color:0
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: band_types: drop
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: capabilities: kbps pktps burst stats
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22830|rconn|DBG|br-int<->unix#119115: entering CONNECTING
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22831|poll_loop|DBG|wakeup due to [POLLOUT] on fd 60 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:153 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22832|vconn|DBG|unix#119115: sent (Success): OFPT_HELLO (OF1.5) (xid=0x1a88f):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22833|vconn|DBG|unix#119115: received: OFPT_HELLO (OF1.5) (xid=0x3):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: version bitmap: 0x06
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22834|vconn|DBG|unix#119115: negotiated OpenFlow version 0x06 (we support version 0x06 and earlier, peer supports version 0x06)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22835|rconn|DBG|br-int<->unix#119115: connected
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22836|rconn|DBG|br-int<->unix#119115: entering ACTIVE
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22837|poll_loop|DBG|wakeup due to [POLLIN] on fd 10 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22838|vconn|DBG|unix#119114: received: OFPST_METER_FEATURES request (OF1.5) (xid=0x4):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22839|vconn|DBG|unix#119114: sent (Success): OFPST_METER_FEATURES reply (OF1.5) (xid=0x4):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: max_meter:200000 max_bands:1 max_color:0
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: band_types: drop
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: capabilities: kbps pktps burst stats
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22840|poll_loop|DBG|wakeup due to [POLLIN] on fd 60 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22841|vconn|DBG|unix#119115: received: NXT_TLV_TABLE_REQUEST (OF1.5) (xid=0x5):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22842|vconn|DBG|unix#119115: sent (Success): NXT_TLV_TABLE_REPLY (OF1.5) (xid=0x5):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: max option space=256 max fields=64
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: allocated option space=4
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: mapping table:
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: class type length match field
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ------ ---- ------ --------------
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: 0x102 0x80 4 tun_metadata0
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22843|poll_loop|DBG|wakeup due to [POLLIN] on fd 60 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22844|vconn|DBG|unix#119115: received: OFPT_FLOW_MOD (OF1.5) (xid=0x6): DEL table:255 priority=0 actions=drop
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22845|vconn|DBG|unix#119115: received: OFPT_GROUP_MOD (OF1.5) (xid=0x7):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: DEL group_id=4294967292,type=all
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22846|vconn|DBG|unix#119115: received: OFPT_METER_MOD (OF1.5) (xid=0x8): DEL meter=all bands=
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22847|poll_loop|DBG|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif.c:1963 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|01858|poll_loop(urcu5)|DBG|wakeup due to [POLLIN] on fd 40 (FIFO pipe:[18354]) at ../lib/ovs-rcu.c:363 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22848|hmap|DBG|../ofproto/ofproto-dpif-xlate.c:889: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (16 nodes total across 16 buckets)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|01859|poll_loop(urcu5)|DBG|wakeup due to [POLLIN] on fd 40 (FIFO pipe:[18354]) at ../lib/ovs-rcu.c:236 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|03015|poll_loop(revalidator55)|DBG|wakeup due to [POLLIN] on fd 46 (FIFO pipe:[42171]) at ../ofproto/ofproto-dpif-upcall.c:1021 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|01860|dpif(urcu5)|DBG|Dropped 37 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|03016|dpif(revalidator55)|DBG|system@ovs-system: get_stats success
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|01861|dpif(urcu5)|DBG|system@ovs-system: DPIF meter 1 deleted
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|00961|poll_loop(revalidator54)|DBG|wakeup due to [POLLIN] on fd 44 (FIFO pipe:[44105]) at ../lib/ovs-thread.c:341 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|00996|poll_loop(revalidator53)|DBG|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[41383]) at ../lib/ovs-thread.c:341 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|00962|poll_loop(revalidator54)|DBG|wakeup due to [POLLIN] on fd 44 (FIFO pipe:[44105]) at ../lib/ovs-thread.c:341 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|00997|poll_loop(revalidator53)|DBG|wakeup due to [POLLIN] on fd 42 (FIFO pipe:[41383]) at ../lib/ovs-thread.c:341 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|00998|poll_loop(revalidator53)|DBG|wakeup due to 0-ms timeout at ../lib/ovs-thread.c:341 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|03017|poll_loop(revalidator55)|DBG|wakeup due to [POLLIN] on fd 46 (FIFO pipe:[42171]) at ../lib/ovs-thread.c:341 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22849|poll_loop|DBG|wakeup due to [POLLIN] on fd 54 (FIFO pipe:[28824]) at ../vswitchd/bridge.c:423 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|01862|poll_loop(urcu5)|DBG|wakeup due to [POLLIN] on fd 40 (FIFO pipe:[18354]) at ../lib/ovs-rcu.c:236 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22850|poll_loop|DBG|wakeup due to [POLLIN] on fd 49 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:274 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22851|rconn|DBG|br-int<->unix#119116: entering CONNECTING
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22852|poll_loop|DBG|wakeup due to [POLLOUT] on fd 62 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:153 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22853|vconn|DBG|unix#119116: sent (Success): OFPT_HELLO (OF1.5) (xid=0x1a890):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: version bitmap: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22854|vconn|DBG|unix#119116: received: OFPT_HELLO (OF1.5) (xid=0xb):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: version bitmap: 0x06
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22855|vconn|DBG|unix#119116: negotiated OpenFlow version 0x06 (we support version 0x06 and earlier, peer supports version 0x06)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22856|rconn|DBG|br-int<->unix#119116: connected
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22857|rconn|DBG|br-int<->unix#119116: entering ACTIVE
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22858|vconn|DBG|unix#119116: received: OFPT_GET_CONFIG_REQUEST (OF1.5) (xid=0x36):
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22859|vconn|DBG|unix#119116: sent (Success): OFPT_GET_CONFIG_REPLY (OF1.5) (xid=0x36): frags=normal miss_send_len=0
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22860|vconn|DBG|unix#119116: received: NXT_SET_PACKET_IN_FORMAT (OF1.5) (xid=0x37): format=nxt_packet_in2
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22861|vconn|DBG|unix#119116: received: OFPT_PACKET_OUT (OF1.5) (xid=0x39): in_port=CONTROLLER actions=set_field:0xc->metadata,set_field:0x8000->reg15,set_field:0x10000/0x10000->reg10,resubmit(CONTROLLER,38) data_len=46
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: igmp,vlan_tci=0x0000,dl_src=0a:58:0a:83:00:01,dl_dst=01:00:5e:00:00:01,nw_src=10.131.0.1,nw_dst=224.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=1,igmp_type=17,igmp_code=10
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22862|vconn|DBG|unix#119116: received: OFPT_SET_CONFIG (OF1.5) (xid=0x43): frags=normal miss_send_len=65535
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22863|poll_loop|DBG|wakeup due to [POLLIN] on fd 60 (/var/run/openvswitch/br-int.mgmt<->) at ../lib/stream-fd.c:157 (0% CPU usage)
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22864|vconn|DBG|unix#119115: received: NXT_CT_FLUSH_ZONE (OF1.5) (xid=0x9): zone_id=0
May 17 01:49:03 ip-10-0-171-17 ovs-vswitchd[1361]: ovs|22865|ct_dpif|DBG|system@ovs-system: ct_flush: zone 0

Note the ct_flush for zone 0...
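Roughly how to catch that flush in the act while ovn-controller is restarted (a sketch; the socket path placeholder is taken from the verification comment below, and the journald unit name is an assumption):

# bump ovs-vswitchd logging to debug, then watch for conntrack zone flushes
ovs-appctl -t /var/run/openvswitch/ovs-vswitchd.<pid>.ctl vlog/set dbg
journalctl -u ovs-vswitchd -f | grep -E 'NXT_CT_FLUSH_ZONE|ct_flush'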
Yes, it seems to be the ovn-controller reconnect that causes this; confirmed on KIND:

[DESTROY] tcp 6 96 TIME_WAIT src=172.18.0.4 dst=172.18.0.3 sport=6642 dport=55346 src=172.18.0.3 dst=172.18.0.4 sport=55346 dport=6642 delta-time=48 [start=Tue May 17 11:50:50 2022] [stop=Tue May 17 11:51:38 2022]
[DESTROY] tcp 6 431999 ESTABLISHED src=172.18.0.3 dst=172.18.0.4 sport=55350 dport=6642 src=172.18.0.4 dst=172.18.0.3 sport=6642 dport=55350 [ASSURED] delta-time=0 [start=Tue May 17 11:51:38 2022] [stop=Tue May 17 11:51:38 2022]
[NEW] tcp 6 300 ESTABLISHED src=172.18.0.4 dst=172.18.0.3 sport=6642 dport=55350 [UNREPLIED] src=172.18.0.3 dst=172.18.0.4 sport=55350 dport=6642

(I seem to have the fix Florian mentioned, so my new entry gets the same port ^ and things work fine.) The point is that the secondary entry gets destroyed at 11:51:38, and indeed at the same time in the OVS logs we see:

2022-05-17T11:51:38.895Z|06747|vconn|DBG|unix#5018: received: NXT_CT_FLUSH_ZONE (OF1.5) (xid=0x8): zone_id=0
OVN bug: https://bugzilla.redhat.com/show_bug.cgi?id=2087194
Follow-up... ovn-controller flushes specific conntrack zones when it restarts to ensure a known state. It doesn't know whether it's been quickly restarted (like on an OCP upgrade) or whether it's a fresh start; only the thing doing the restarting really knows that. It will flush the conntrack zones chosen for each LSP and those configured by the controller. ovn-kube sets LSP.options:snat-ct-zone=0 for the GW router port (since ovnkube shares the host IP with OVN), which is why it's flushing zone 0 on restart. We're discussing options: letting the CMS/ovnkube choose to prevent flushing all zones, never flushing zone 0, or other approaches. This zone flush has always happened in ovn-controller, but the bad behavior only surfaced because of the conntrack issue in RHEL 8.6 that is solved by the revert in https://bugzilla.redhat.com/show_bug.cgi?id=2065266
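For context, one way to see which ports are pinned to conntrack zone 0 in the NB database might be the following (a sketch and an assumption on my part about the most convenient ovn-nbctl form; run wherever ovn-nbctl can reach the NB DB):

# list any logical ports whose options pin them to conntrack zone 0
ovn-nbctl find Logical_Switch_Port options:snat-ct-zone=0
ovn-nbctl find Logical_Router_Port options:snat-ct-zone=0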
Dan, I'm not sure this flush has always happened in ovn-controller. I am not able to reproduce in an older version of OVN. See https://bugzilla.redhat.com/show_bug.cgi?id=2087194#c3
(In reply to Tim Rozet from comment #22)
> Dan, I'm not sure this flush has always happened in ovn-controller. I am not
> able to reproduce in an older version of OVN. See
> https://bugzilla.redhat.com/show_bug.cgi?id=2087194#c3

It does look like it's been there for zone 0 forever. It's a misuse of atoi(): the code checks whether the result is 0, which of course is also the case for a legitimate zone 0.

http://patchwork.ozlabs.org/project/ovn/patch/20220518160952.9998-1-dceara@redhat.com/
Yeah, it does actually happen in the older version of OVN; it's just that our KIND deployment back then used local gateway mode by default instead of shared. In local gateway mode this isn't an issue. It affects shared gateway mode, which has been supported since 4.8.
As we get close to cutting 4.11 feature candidates, I'm wondering if there is a risk here for 4.10 -> 4.11 updates. So I'm asking the following questions to evaluate whether or not this bug warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the ImpactStatementRequested label has been added to this bug. When responding, please remove ImpactStatementRequested and set the ImpactStatementProposed label. The expectation is that the assignee answers these questions.

Which 4.y.z to 4.y'.z' updates increase vulnerability? Which types of clusters?
* reasoning: This allows us to populate from, to, and matchingRules in conditional update recommendations for "the $SOURCE_RELEASE to $TARGET_RELEASE update is not recommended for clusters like $THIS".
* example: Customers upgrading from 4.10 to 4.11 running OVN. Check your vulnerability with 'oc get -o jsonpath='{.status.networkType}{"\n"}' network cluster' or the following PromQL ...

Wondering about PromQL: is there an RFE somewhere about growing a metric that exposes the cluster Network's status.networkType? Because we're still dragging around [2] as a recording rule for ..., and that seems pretty awkward to use in-cluster, when the network type string seems like something that would be a great way to distinguish "is this cluster vulnerable to bug X?".

What is the impact? Is it serious enough to warrant removing update recommendations?
* reasoning: This allows us to populate name and message in conditional update recommendations for "...because if you update, $THESE_CONDITIONS may cause $THESE_UNFORTUNATE_SYMPTOMS".
* example: Around 2 minute disruption in edge routing for 10% of clusters. Check with oc ....
* example: Up to 90 seconds of API downtime. Check with curl ....
* example: etcd loses quorum and you have to restore from backup. Check with ssh ....

How involved is remediation?
* reasoning: This allows administrators who are already vulnerable, or who chose to waive conditional-update risks, to recover their cluster. And even moderately serious impacts might be acceptable if they are easy to mitigate.
* example: Issue resolves itself after five minutes.
* example: Admin can run a single: oc ....
* example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities.

Is this a regression?
* reasoning: Updating between two vulnerable releases may not increase exposure (unless rebooting during the update increases vulnerability, etc.). We only qualify update recommendations if the update increases exposure.
* example: No, it has always been like this; we just never noticed.
* example: Yes, from 4.y.z to 4.y+1.z, or 4.y.z to 4.y.z+1.

[1]: https://github.com/openshift/enhancements/tree/master/enhancements/update/update-blocker-lifecycle
[2]: https://github.com/openshift/telemeter/blob/987128a17c198270ff22cf36ff7b7df396f51826/jsonnet/telemeter/rules.libsonnet#L90-L107
Which 4.y.z to 4.y'.z' updates increase vulnerability? Which types of clusters?

There is no increase in vulnerability. The primary bug of removing conntrack entries during ovn-controller restart has existed in shared gateway mode since 4.8. It was just compounded by a kernel bug found in RHCOS 8.6, which is not currently present in any OCP version.

What is the impact? Is it serious enough to warrant removing update recommendations?

The impact of the primary bug is that during ovn-controller restart (which is typical during an upgrade, across all nodes) there could be temporary traffic outage/disruption. Since this has always existed, it doesn't warrant any new update recommendation other than moving to a version with the fix across previous z-streams once backported.

How involved is remediation?

The issue should resolve itself. Some TCP connections may become stale, but applications will typically reconnect. Other conntrack entries will be rebuilt. It mainly affects traffic that is NAT'ed, like service traffic or cluster egress traffic from pods.

Is this a regression?

No, it has existed since 4.8.
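If an admin does want to check for (or clear) stale entries after such an event rather than waiting for them to age out, conntrack-tools can do it; a rough sketch, with the port filter below being purely illustrative:

# list conntrack entries toward the OVN SBDB port to spot stale/UNREPLIED ones
conntrack -L -p tcp --dport 9642
# delete them so they are rebuilt on the next packet
conntrack -D -p tcp --dport 9642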
> This was just compounded by a kernel bug found in RHCOS 8.6 which is not currently present in any OCP version.

But we'll shortly have 4.11 FCs built off RHCOS 8.6, right? Anyhow, "temporary traffic outage/disruption" and "the issue should resolve itself" are sufficient for "not worth doing any graph-data work" for me, so I'm removing UpgradeBlocker. But if folks feel like I'm misunderstanding, feel free to explain why and add UpgradeBlocker back on, and we'll do a fresh round of assessment.
Step 2 should be:

ovs-appctl -t /var/run/openvswitch/ovs-vswitchd.<pid>.ctl vlog/set dbg

Reproduced the CT_FLUSH for zone_id=0 on 4.11.0-0.nightly-2022-05-20-213928. Next step is to test the fixed version.

May 26 14:05:43 ip-10-0-184-88 ovs-vswitchd[1238]: ovs|01069|vconn|DBG|unix#146: received: NXT_CT_FLUSH_ZONE (OF1.5) (xid=0x9): zone_id=0
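For the verification run, a quick way to confirm the flush is gone after killing/restarting ovn-controller (a sketch; assumes debug logging is already enabled as above and that journald is collecting the ovs-vswitchd logs under that unit name):

# after restarting ovn-controller, there should be no zone flush in the recent vswitchd log
journalctl -u ovs-vswitchd --since '10 minutes ago' | grep -c NXT_CT_FLUSH_ZONE
# expect 0 on a fixed build; a non-zero count means the flush is still being issued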
Verified on 4.11.0-0.nightly-2022-05-25-123329. After following the test procedure in comment 32 and comment 34, no CT_FLUSH entries were seen after killing/restarting ovn-controller.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069