Description of problem:

HAProxy performance is lower by a factor of ~7x with OVNKubernetes as the SDN, when compared with OpenShiftSDN. Following are the results:

             OVN439    OCPSDN439
1 cpt
  1ka           820         9328
  10ka         7750        38220
  100ka       23687        55724
40 cpt
  1ka          1970        10490
  10ka        12411        40393
  100ka       27332        69705
200 cpt
  1ka          1147        10333
  10ka        11372        37000
  100ka      16443        65535

ka:  HTTP keep-alive requests
cpt: number of connections (clients) per route/target

Liveness probe: http-get http://:1936/healthz delay=10s timeout=1s period=120s #success=1 #failure=3

Version-Release number of selected component (if applicable): 4.3.9
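For reference, the traffic pattern behind the "ka" and "cpt" numbers is each client holding one connection to the route and sending N keep-alive requests over it. The report does not say which load generator was used, so the sketch below is only an illustration of that pattern; the route hostname and request count are placeholders.

# Minimal sketch of the keep-alive traffic pattern (ka requests reused over one
# connection per client). ROUTE_HOST and KEEPALIVE_REQUESTS are hypothetical;
# this is not the benchmark tool that produced the numbers above.
import http.client

ROUTE_HOST = "example-route.apps.example.com"  # placeholder route hostname
KEEPALIVE_REQUESTS = 100                       # "ka": requests sent per connection

def run_client(host, n_requests):
    conn = http.client.HTTPConnection(host, 80, timeout=5)
    ok = 0
    for _ in range(n_requests):
        conn.request("GET", "/", headers={"Connection": "keep-alive"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection can be reused
        if resp.status == 200:
            ok += 1
    conn.close()
    return ok

if __name__ == "__main__":
    print(run_client(ROUTE_HOST, KEEPALIVE_REQUESTS), "successful requests")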
During the router stress tests I observed similar behavior in OCP 4.4-rc2 with OVN. Liveness and readiness probe failures during the workload:

Type     Reason     Age                   From                                                  Message
----     ------     ----                  ----                                                  -------
Warning  Unhealthy  12m (x4 over 6h42m)   kubelet, ip-10-0-167-160.us-west-2.compute.internal   Liveness probe failed: Get http://10.129.4.9:1936/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  13s (x108 over 7h40m) kubelet, ip-10-0-167-160.us-west-2.compute.internal   Readiness probe failed: Get http://10.129.4.9:1936/healthz/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I have also observed packet drops on several interfaces of the node where the router is running, especially on the router pod's logical port:

sh-4.4# ip -s l show f61b9e59248c73f
18: f61b9e59248c73f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue master ovs-system state UP mode DEFAULT group default
    link/ether d2:da:ee:61:eb:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    RX: bytes      packets   errors  dropped  overrun  mcast
    24800075635    52078562  0       642620   0        0
    TX: bytes      packets   errors  dropped  carrier  collsns
    21293577006    42229469  0       438562   0        0

sh-4.2# ovs-vsctl get interface f61b9e59248c73f external_ids statistics
{attached_mac="32:96:30:80:04:0a", iface-id="openshift-ingress_router-default-74688f77c4-ngdj8", ip_address="10.128.4.9/23", sandbox="f61b9e59248c73fe6da51f20b8b30561be5a5ba2406a64a264f5b26dabee78d2"}
{collisions=0, rx_bytes=24799022912, rx_crc_err=0, rx_dropped=1285240, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=52070924, tx_bytes=21292989399, tx_dropped=877124, tx_errors=0, tx_packets=42223720}
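A rough manual equivalent of the failing probe is a GET against the router's healthz endpoint with the same 1s timeout the kubelet applies. The sketch below is only a hand-run check (not how the kubelet implements probes); the IP is taken from the events above and it must be run from somewhere that can reach the pod network.

# Manual healthz check with a 1s timeout, mirroring the probe parameters above.
import urllib.request
import urllib.error

PROBE_URL = "http://10.129.4.9:1936/healthz"

def probe(url, timeout=1.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError) as exc:
        return f"probe failed: {exc}"

if __name__ == "__main__":
    print(probe(PROBE_URL))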
Additional information from more iterations of the test. The test was repeated 3 times and the best result of the 3 is used to compare OpenShiftSDN against OVNKubernetes. OVNKubernetes shows a performance degradation of ~35% to ~80%, and the degradation grows as the number of keep-alive requests per connection shrinks (1 keep-alive request vs 100 keep-alives). Comparison table (degradation with OVNKubernetes):

              1ka      10ka     100ka
Iteration1    70.00%   40.00%   35.00%
Iteration2    86.00%   62.00%   57.00%
Iteration3    80.00%   65.00%   60.00%

The detailed data is attached to the bug: ComparisonResults.ods
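The report does not spell out how the percentages were derived. Assuming they represent relative throughput loss of OVNKubernetes against OpenShiftSDN, the calculation would look like the snippet below; treat it as an illustration, not the benchmark's own tooling.

# Assumed derivation of the degradation percentages: relative throughput loss.
def degradation_pct(sdn_result, ovn_result):
    """Percentage by which the OVN result falls short of the SDN result."""
    return (sdn_result - ovn_result) / sdn_result * 100

# Example with the 200 cpt / 100ka numbers from the description:
print(round(degradation_pct(65535, 16443), 1))  # ~74.9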
Created attachment 1679069 [details] Comparison Results between the two tests
Digging into this revealed that the kernel's maximum conntrack connection limit is being hit. Connections are being left in conntrack in ESTABLISHED state after they are closed, and entries in ESTABLISHED state only time out after 5 days, so they stick around. This happens when a TCP RST is sent. In this particular setup HAProxy is sending the RST, but I managed to reproduce it in my env with a simple python script:

import argparse
import socket
import sys
import struct
import time

parser = argparse.ArgumentParser(description='A tutorial of argparse!')
parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
parser.add_argument("--dst-port", type=int, help="dst port to use")
parser.add_argument("--dst-ip", help="server ip to use")
args = parser.parse_args()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = (args.dst_ip, args.dst_port)
sock.bind(('0.0.0.0', args.src_port))
sock.connect(server_address)

l_onoff = 1
l_linger = 0
time.sleep(1)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
sock.close()

^ the above will force a TCP RST on close.

In my one-node KIND setup:

[trozet@trozet ovn-kubernetes]$ kubectl get pod -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
pod1         1/1     Running   0          9m49s   10.244.0.6   ovn-control-plane   <none>           <none>
webserver1   1/1     Running   0          9m58s   10.244.0.5   ovn-control-plane   <none>           <none>

[trozet@trozet ovn-kubernetes]$ kubectl exec -it pod1 /bin/bash
[root@pod1 /]# curl 10.244.0.5
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

root@ovn-control-plane:/# conntrack -L | grep 10.244.0.5
tcp      6 83 TIME_WAIT src=10.244.0.6 dst=10.244.0.5 sport=40056 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=40056 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=15 use=1
tcp      6 83 TIME_WAIT src=10.244.0.6 dst=10.244.0.5 sport=40056 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=40056 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=16 use=1

Now, when I run the script to force a RST:

[root@pod1 /]# python sock_rst.py --src-port 11337 --dst-port 80 --dst-ip 10.244.0.5
[root@pod1 /]#

root@ovn-control-plane:/# conntrack -L | grep 10.244.0.5
tcp      6 86395 ESTABLISHED src=10.244.0.6 dst=10.244.0.5 sport=11337 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=11337 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=15 use=1
tcp      6 86395 ESTABLISHED src=10.244.0.6 dst=10.244.0.5 sport=11337 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=11337 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=16 use=1

^ The connections are now stuck for the next 5 days. So the question is: is this behavior correct, or is it a bug in conntrack?
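The trick in the script is SO_LINGER with l_onoff=1 and l_linger=0, which makes close() abort the connection with a RST instead of performing the normal FIN handshake. While running the reproducer at scale it can be handy to watch how close the conntrack table is to its limit; the snippet below is only an observation aid (not part of the original reproduction) and assumes the standard nf_conntrack procfs entries are present on the node.

# Watch the conntrack entry count against the kernel maximum while the test runs.
import time

COUNT = "/proc/sys/net/netfilter/nf_conntrack_count"
MAX = "/proc/sys/net/netfilter/nf_conntrack_max"

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    limit = read_int(MAX)
    while True:
        count = read_int(COUNT)
        print(f"conntrack entries: {count}/{limit} ({100 * count / limit:.1f}% of max)")
        time.sleep(5)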
Created attachment 1680691 [details] pcap showing RST
Thanks to Marcelo for trying this test without OVS. He found that it works correctly with just netns and the kernel. That means the TCP RST traffic is somehow not being punted to CT, and something is wrong with the OVS flows. Will dig into that tomorrow.
Identified that the cause is RST packets getting leaked through OVN without going through CT:

table=0 (ls_out_pre_lb ), priority=110 , match=(nd || nd_rs || nd_ra || icmp4.type == 3 || icmp6.type == 1 || (tcp && tcp.flags == 20)), action=(next;)

table=40 tcp,metadata=0x2,tcp_flags=rst|ack actions=resubmit(,41)
table=40 tcp6,metadata=0x2,tcp_flags=rst|ack actions=resubmit(,41)

These flows were introduced by the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1805651.

Moving this bug to OVN.
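To spell out the flag arithmetic: "tcp.flags == 20" in the ls_out_pre_lb logical flow matches packets with both RST and ACK set, which is why these flows bypass conntrack for exactly the RST traffic in question. A tiny illustration:

# TCP flag bits: FIN=0x01, SYN=0x02, RST=0x04, PSH=0x08, ACK=0x10, URG=0x20.
RST = 0x04
ACK = 0x10

flags = RST | ACK
print(flags)       # 20  -> the value matched by the ls_out_pre_lb flow
print(hex(flags))  # 0x14, i.e. tcp_flags=rst|ack in the OpenFlow dump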
http://patchwork.ozlabs.org/project/openvswitch/patch/20200424075507.1811244-1-numans@ovn.org/ [PATCH] Fix conntrack entry leaks because of TCP RST packets not sent to conntrack.
The fix is available in ovn2.13-2.13.0-20 and later, as of April 27th.
Per comment 20, this issue was fixed in ovn2.13-2.13.0-20, and I tested the TCP RST behavior.

Before the fix, the conntrack entry stays in ESTABLISHED rather than moving to closed/closing when the TCP RST is received:

# rpm -qa|grep ovn
ovn2.13-central-2.13.0-11.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-24.noarch
ovn2.13-2.13.0-11.el7fdp.x86_64
ovn2.13-host-2.13.0-11.el7fdp.x86_64

tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=11337,dport=80),reply=(src=10.0.0.3,dst=10.0.0.4,sport=80,dport=11337),zone=6,protoinfo=(state=ESTABLISHED)

After the fix, the conntrack entry changes to closed/closing when the TCP RST is received:

# rpm -qa|grep ovn
ovn2.13-2.13.0-21.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-24.noarch
ovn2.13-central-2.13.0-21.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-host-2.13.0-21.el7fdp.x86_64

tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=11337,dport=80),reply=(src=10.0.0.3,dst=10.0.0.4,sport=80,dport=11337),zone=6,protoinfo=(state=CLOSING)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2317