Bug 1819785

Summary: HAProxy Router performance is off with OVNKubernetes as SDN on OCP 4.3.9 deployed on AWS
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: agopi
Component: ovn2.13
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: ying xu <yinxu>
Severity: urgent
Docs Contact:
Priority: urgent
Version: RHEL 8.0
CC: aos-bugs, bbennett, bperkins, ctrautma, dblack, dcbw, jiji, jishi, jtaleric, mleitner, mmichels, ralongi, rkhan, rsevilla
Target Milestone: ---
Keywords: Performance, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: aos-scalability-43
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1830370 (view as bug list)
Environment:
Last Closed: 2020-05-26 14:07:17 UTC
Type: Bug
Bug Blocks: 1830370    
Attachments:
  - Comparison Results between the two tests
  - pcap showing RST

Description agopi 2020-04-01 14:51:56 UTC
Description of problem:

HAProxy router performance is roughly 7x lower with OVNKubernetes as the SDN than with OpenShiftSDN.

Following are the results:

           OVN 4.3.9   OpenShiftSDN 4.3.9
1 cpt
  1ka          820          9328
  10ka        7750         38220
  100ka      23687         55724

40 cpt
  1ka         1970         10490
  10ka       12411         40393
  100ka      27332         69705

200 cpt
  1ka         1147         10333
  10ka       11372         37000
  100ka      16443         65535

ka: number of HTTP keep-alive requests per connection
cpt: number of connections (clients) per route/target
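To make the "ka" parameter concrete: each client opens one TCP connection to the route and issues that many HTTP requests over it before closing. A minimal illustrative sketch of such a keep-alive client (the hostname and counts are placeholders; this is not the actual test harness used for the results above):

# Hypothetical keep-alive client: one connection, KEEPALIVE_REQUESTS requests.
import http.client

ROUTE_HOST = "example-route.apps.cluster.example.com"  # placeholder route hostname
KEEPALIVE_REQUESTS = 100                               # the "100ka" case

conn = http.client.HTTPConnection(ROUTE_HOST, 80, timeout=5)
for _ in range(KEEPALIVE_REQUESTS):
    conn.request("GET", "/")   # reuses the same TCP connection (keep-alive)
    resp = conn.getresponse()
    resp.read()                # drain the body so the connection can be reused
conn.close()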

Liveness probe: http-get http://:1936/healthz delay=10s timeout=1s period=120s #success=1 #failure=3


Version-Release number of selected component (if applicable): 4.3.9

Comment 2 Raul Sevilla 2020-04-07 11:10:11 UTC
During the router stress tests, I observed similar behavior in OCP 4.4-rc2 with OVN.


Liveness and readiness probe failures during the workload:

  Type     Reason     Age                    From                                                 Message
  ----     ------     ----                   ----                                                 -------
  Warning  Unhealthy  12m (x4 over 6h42m)    kubelet, ip-10-0-167-160.us-west-2.compute.internal  Liveness probe failed: Get http://10.129.4.9:1936/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  13s (x108 over 7h40m)  kubelet, ip-10-0-167-160.us-west-2.compute.internal  Readiness probe failed: Get http://10.129.4.9:1936/healthz/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)


I've also observed packet drops on several interfaces of the node where the router is running, especially on the router pod's logical port:


sh-4.4# ip -s l show f61b9e59248c73f  
18: f61b9e59248c73f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether d2:da:ee:61:eb:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    RX: bytes  packets  errors  dropped overrun mcast   
    24800075635 52078562 0       642620  0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    21293577006 42229469 0       438562  0       0     


sh-4.2# ovs-vsctl get interface f61b9e59248c73f external_ids statistics
{attached_mac="32:96:30:80:04:0a", iface-id="openshift-ingress_router-default-74688f77c4-ngdj8", ip_address="10.128.4.9/23", sandbox="f61b9e59248c73fe6da51f20b8b30561be5a5ba2406a64a264f5b26dabee78d2"}
{collisions=0, rx_bytes=24799022912, rx_crc_err=0, rx_dropped=1285240, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=52070924, tx_bytes=21292989399, tx_dropped=877124, tx_errors=0, tx_packets=42223720}
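For scale, the dropped counters above work out to roughly 1% of packets in the ip -s statistics and 2-2.5% in the OVS statistics. A quick back-of-the-envelope check using the numbers reported above:

# Drop rates computed from the counters shown above (dropped, total packets).
counters = {
    "ip -s RX": (642620, 52078562),
    "ip -s TX": (438562, 42229469),
    "ovs RX":   (1285240, 52070924),
    "ovs TX":   (877124, 42223720),
}
for name, (dropped, packets) in counters.items():
    print(f"{name}: {100.0 * dropped / packets:.2f}% dropped")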

Comment 3 Siva Reddy 2020-04-15 13:54:25 UTC
Additional information from more iterations of the test: the test was repeated 3 times, and the best result of the 3 was used to compare the performance of OpenShiftSDN and OVNKubernetes.

There is a performance degradation of roughly 35% to 80% with OVNKubernetes, with more degradation when there are fewer keep-alive requests per connection (1 keep-alive vs. 100 keep-alives). Here is a comparison table (degradation of OVNKubernetes relative to OpenShiftSDN):

             1ka	10ka	100ka
Iteration1 70.00%	40.00%	35.00%
Iteration2 86.00%	62.00%	57.00%
Iteration3 80.00%	65.00%	60.00%

The detailed data is attached to the bug - ComparisonResults.ods
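For reference, the degradation figures above are presumably computed as 1 - (OVN result / OpenShiftSDN result). Applying that formula to one row of the table in the description (a different run than the attached spreadsheet) gives a number in the same ballpark:

# Presumed degradation formula (not spelled out in the bug):
#   degradation = 1 - (OVN result / OpenShiftSDN result)
ovn, ocpsdn = 16443, 65535   # 200 cpt / 100ka row from the description
print(f"degradation: {100 * (1 - ovn / ocpsdn):.0f}%")   # ~75%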

Comment 4 Siva Reddy 2020-04-15 13:56:21 UTC
Created attachment 1679069 [details]
Comparison Results between the two tests

Comment 5 Tim Rozet 2020-04-21 20:55:44 UTC
Digging into this revealed that the kernel's limit on the maximum number of conntrack connections is being hit. This is because connections are left in conntrack in the ESTABLISHED state after they are closed, and ESTABLISHED entries only time out after 5 days, so they stick around. This happens when the connection is closed with a TCP RST. In this particular setup, HAProxy is sending the RST, but I managed to reproduce it in my env with a simple Python script:

import argparse
import socket
import sys
import struct
import time

parser = argparse.ArgumentParser(description='Force a TCP RST on close using SO_LINGER')
parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
parser.add_argument("--dst-port", type=int, help="dst port to use")
parser.add_argument("--dst-ip", help="server ip to use")
args = parser.parse_args()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = (args.dst_ip, args.dst_port)
sock.bind(('0.0.0.0', args.src_port))  # fixed source port so the conntrack entry is easy to find
sock.connect(server_address)

# SO_LINGER with l_onoff=1 and l_linger=0 makes close() send a RST instead of a FIN
l_onoff = 1
l_linger = 0
time.sleep(1)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
sock.close()

^ The above forces a TCP RST on close: with SO_LINGER enabled and a zero linger time, close() sends a RST instead of a FIN.

In my one node KIND setup:
[trozet@trozet ovn-kubernetes]$ kubectl get pod -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
pod1         1/1     Running   0          9m49s   10.244.0.6   ovn-control-plane   <none>           <none>
webserver1   1/1     Running   0          9m58s   10.244.0.5   ovn-control-plane   <none>           <none>

[trozet@trozet ovn-kubernetes]$ kubectl exec -it pod1 /bin/bash
[root@pod1 /]# curl 10.244.0.5
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

root@ovn-control-plane:/# conntrack -L |grep 10.244.0.5
tcp      6 83 TIME_WAIT src=10.244.0.6 dst=10.244.0.5 sport=40056 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=40056 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=15 use=1
tcp      6 83 TIME_WAIT src=10.244.0.6 dst=10.244.0.5 sport=40056 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=40056 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=16 use=1

Now, when I run the script to force a RST:
[root@pod1 /]# python sock_rst.py --src-port 11337 --dst-port 80 --dst-ip 10.244.0.5
[root@pod1 /]#


root@ovn-control-plane:/# conntrack -L |grep 10.244.0.5
tcp      6 86395 ESTABLISHED src=10.244.0.6 dst=10.244.0.5 sport=11337 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=11337 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=15 use=1
tcp      6 86395 ESTABLISHED src=10.244.0.6 dst=10.244.0.5 sport=11337 dport=80 src=10.244.0.5 dst=10.244.0.6 sport=80 dport=11337 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=16 use=1

^ The connections are now stuck for the next 5 days. So the question is, is this behavior correct or a bug with conntrack?
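The 5-day figure is the kernel's default ESTABLISHED timeout (nf_conntrack_tcp_timeout_established = 432000 seconds). A minimal sketch for checking conntrack table pressure and that timeout on a node, assuming the standard procfs entries are exposed:

# Check conntrack usage against the table limit and report the ESTABLISHED timeout.
def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

count = read_int("/proc/sys/net/netfilter/nf_conntrack_count")
limit = read_int("/proc/sys/net/netfilter/nf_conntrack_max")
est_timeout = read_int("/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established")

print(f"conntrack entries: {count}/{limit} ({100.0 * count / limit:.1f}% of max)")
print(f"ESTABLISHED timeout: {est_timeout} s (~{est_timeout // 86400} days)")  # default 432000 s = 5 days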

Comment 6 Tim Rozet 2020-04-21 21:16:03 UTC
Created attachment 1680691 [details]
pcap showing RST

Comment 7 Tim Rozet 2020-04-21 21:43:50 UTC
Thanks to Marcelo for trying this test without OVS. He found that it works correctly with just network namespaces and the kernel. That means the TCP RST traffic is somehow not being punted to conntrack and something is wrong with the OVS flows. Will dig into that tomorrow.

Comment 8 Tim Rozet 2020-04-22 14:34:36 UTC
Identified the cause: RST packets are being leaked through OVN without going through conntrack:

  table=0 (ls_out_pre_lb      ), priority=110  , match=(nd || nd_rs || nd_ra || icmp4.type == 3 ||icmp6.type == 1 || (tcp && tcp.flags == 20)), action=(next;)
    table=40 tcp,metadata=0x2,tcp_flags=rst|ack actions=resubmit(,41)
    table=40 tcp6,metadata=0x2,tcp_flags=rst|ack actions=resubmit(,41)
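For context, the tcp.flags == 20 match in the priority-110 logical flow is what lets these packets bypass conntrack: 20 is the decimal value of the RST and ACK flag bits combined, which is exactly the tcp_flags=rst|ack seen in the OpenFlow flows above. A quick check of the arithmetic:

# TCP flag bit values: FIN=0x01, SYN=0x02, RST=0x04, PSH=0x08, ACK=0x10.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10
print(RST | ACK)        # 20  -> the "tcp.flags == 20" match above
print(hex(RST | ACK))   # 0x14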


This was due to the fix for:
https://bugzilla.redhat.com/show_bug.cgi?id=1805651

Moving this bug to OVN.

Comment 14 Dan Williams 2020-04-28 13:54:11 UTC
http://patchwork.ozlabs.org/project/openvswitch/patch/20200424075507.1811244-1-numans@ovn.org/
[PATCH] Fix conntrack entry leaks because of TCP RST packets not sent to conntrack.

Comment 19 Dan Williams 2020-05-01 18:23:19 UTC
The fix is available in ovn2.13-2.13.0-20 and later, as of April 27th.

Comment 21 ying xu 2020-05-07 09:02:31 UTC
Per comment 20, this issue was fixed in ovn2.13-2.13.0-20.

I also tested the TCP RST handling. Before the fix, the conntrack entry stays in the ESTABLISHED state (rather than CLOSED or CLOSING) when a TCP RST is received:

# rpm -qa|grep ovn
ovn2.13-central-2.13.0-11.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-24.noarch
ovn2.13-2.13.0-11.el7fdp.x86_64
ovn2.13-host-2.13.0-11.el7fdp.x86_64

tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=11337,dport=80),reply=(src=10.0.0.3,dst=10.0.0.4,sport=80,dport=11337),zone=6,protoinfo=(state=ESTABLISHED)


After the fix, the conntrack entry changes to CLOSED or CLOSING when a TCP RST is received:
# rpm -qa|grep ovn
ovn2.13-2.13.0-21.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-basic-1.0-24.noarch
ovn2.13-central-2.13.0-21.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-common-1.0-7.noarch
ovn2.13-host-2.13.0-21.el7fdp.x86_64

tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=11337,dport=80),reply=(src=10.0.0.3,dst=10.0.0.4,sport=80,dport=11337),zone=6,protoinfo=(state=CLOSING)
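A minimal sketch of how that check could be scripted, assuming conntrack dump lines in the format shown above (the parsing helper is hypothetical, not part of any test suite):

# Hypothetical check: extract the TCP state from a conntrack dump line and
# verify that a RST did not leave the entry in ESTABLISHED.
import re

line = ("tcp,orig=(src=10.0.0.4,dst=10.0.0.10,sport=11337,dport=80),"
        "reply=(src=10.0.0.3,dst=10.0.0.4,sport=80,dport=11337),"
        "zone=6,protoinfo=(state=CLOSING)")

m = re.search(r"state=([A-Z_]+)", line)
state = m.group(1) if m else None
assert state != "ESTABLISHED", "conntrack entry stuck in ESTABLISHED after RST"
print(f"conntrack state after RST: {state}")   # CLOSED or CLOSING with the fix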

Comment 23 errata-xmlrpc 2020-05-26 14:07:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2317