Bug 1915080 - Large number of tcp connections with shiftstack ocp cluster in about 24 hours
Summary: Large number of tcp connections with shiftstack ocp cluster in about 24 hours
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Yossi Boaron
QA Contact: Oleg Sher
URL:
Whiteboard:
Duplicates: 1906194
Depends On:
Blocks: 1926732
 
Reported: 2021-01-11 22:00 UTC by Alex Krzos
Modified: 2021-03-08 14:55 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1926730 1926732
Environment:
Last Closed: 2021-02-24 15:51:54 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Prometheus data for node_netstat_Tcp_CurrEstab (125.77 KB, image/png), 2021-01-11 22:00 UTC, Alex Krzos
Established connections (3.65 MB, text/plain), 2021-01-13 16:33 UTC, Alex Krzos
Updated screenshot showing tcp established connections growth (153.06 KB, image/png), 2021-01-13 16:37 UTC, Alex Krzos


Links
GitHub openshift/baremetal-runtimecfg pull 117 (closed): Bug 1915080: add CloseIdleConnections for HTTP K8S API healthcheck (last updated 2021-02-19 06:22:34 UTC)
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:52:26 UTC)

Description Alex Krzos 2021-01-11 22:00:18 UTC
Created attachment 1746452 [details]
Prometheus data for node_netstat_Tcp_CurrEstab

Version:
$ openshift-install version
./openshift-install 4.6.7
built from commit ae223bc615b6f08afd679cf03bfecaf575dd47bb
release image quay.io/openshift-release-dev/ocp-release@sha256:494dd5029f03c7b5c8f4404c9281b5ad435403a6f7a29daecc82381d66cf5d89

Platform:
openstack 16.1

Please specify:
IPI

What happened?
There is a very large number of currently established TCP connections (58,000-60,000) compared to AWS. It is unclear at the moment whether this is causing us issues; however, we have seen scalability issues with RHACM on OCP on OSP compared to OCP on AWS. On AWS we have around 4,500-5,000 currently established connections.




What did you expect to happen?

The number of established tcp connections to roughly match between on-prem and on a cloud

How to reproduce it (as minimally and precisely as possible)?


Anything else we need to know?

See the attached screenshot. The number of established connections grows once a cluster is built and running until it tops out at around 58,000-60,000 connections in about 24 hours.

From a master node:

sh-4.4# netstat -s | grep conn
    3448928 active connection openings
    2394862 passive connection openings
    545316 failed connection attempts
    914000 connection resets received
    58410 connections established
    4573 packetes rejected in established connections because of timestamp
    1050182 connections reset due to unexpected data
    310479 connections reset due to early user close
    2 connections aborted due to timeout

Perhaps the check scripts for openshift-openstack-infra are not closing connections to the API and haproxy properly?
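
For context, the linked baremetal-runtimecfg pull request ("add CloseIdleConnections for HTTP K8S API healthcheck") points at exactly this kind of leak. A minimal Go sketch of the general technique follows; it is not the actual runtimecfg code, and the endpoint URL, probe interval, and TLS settings are illustrative assumptions only.

package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"time"
)

// A shared client reuses one Transport; its keep-alive connections accumulate
// if they are never explicitly closed between probes.
var client = &http.Client{
	Timeout: 2 * time.Second,
	Transport: &http.Transport{
		// Illustrative only: a real check would trust the cluster CA instead.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	},
}

// checkAPI probes the given URL and then drops idle keep-alive connections so
// that repeated probes do not leave established TCP connections behind.
func checkAPI(url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Drain the body so the connection goes back to the idle pool...
	io.Copy(io.Discard, resp.Body)
	// ...then close it instead of keeping it established until the next probe.
	client.CloseIdleConnections()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// The ~2 second cadence matches the health-check interval mentioned later
	// in this bug; the URL is an assumption for the sketch.
	for range time.Tick(2 * time.Second) {
		if err := checkAPI("https://localhost:6443/readyz"); err != nil {
			fmt.Println("health check failed:", err)
		}
	}
}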

Comment 1 Adolfo Duarte 2021-01-13 09:03:35 UTC
@Alex Krzos

Any chance we can get the port number the connections are established  to? 
and the full output of netstat -s may be useful as well. 
and also 
netstat -nau  to get an idea of the tcp ports.

or 
netstat -nau | grep ESTABLISHED > somefilebecauseislarge.txt

Comment 2 Pierre Prinetti 2021-01-13 15:14:28 UTC
Accepted as valid; to be investigated.

Tentatively setting the severity to "low"; to be revisited if some impact is detected on clusters.

Comment 3 Alex Krzos 2021-01-13 16:32:30 UTC
(In reply to Adolfo Duarte from comment #1)
> @Alex Krzos
> 
> Any chance we can get the port number the connections are established  to? 
> and the full output of netstat -s may be useful as well. 
> and also 
> netstat -nau  to get an idea of the tcp ports.
> 
> or 
> netstat -nau | grep ESTABLISHED > somefilebecauseislarge.txt

Off master-0

# netstat -s
Ip:
    Forwarding: 1
    677095483 total packets received
    24115304 forwarded
    0 incoming packets discarded
    652962145 incoming packets delivered
    579072763 requests sent out
    8232 outgoing packets dropped
    423 dropped because of missing route
Icmp:
    42594 ICMP messages received
    49 input ICMP message failed
    ICMP input histogram:
        destination unreachable: 42593
        timeout in transit: 1
    52816 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 44891
        redirect: 7925
IcmpMsg:
        InType3: 42593
        InType11: 1
        OutType3: 44891
        OutType5: 7925
Tcp:
    5458764 active connection openings
    3803949 passive connection openings
    855534 failed connection attempts
    1449090 connection resets received
    48205 connections established
    1075856680 segments received
    1233135213 segments sent out
    138885 segments retransmitted
    0 bad segments received
    4837830 resets sent
Udp:
    125620098 packets received
    44564 packets to unknown port received
    0 packet receive errors
    1236699 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    81 invalid SYN cookies received
    8 packets pruned from receive queue because of socket buffer overrun
    15 ICMP packets dropped because they were out-of-window
    1142247 TCP sockets finished time wait in fast timer
    19069 time wait sockets recycled by time stamp
    5852 packetes rejected in established connections because of timestamp
    14601950 delayed acks sent
    1841 delayed acks further delayed because of locked socket
    Quick ack mode was activated 90996 times
    74 times the listen queue of a socket overflowed
    74 SYNs to LISTEN sockets dropped
    208293095 packet headers predicted
    319649263 acknowledgments not containing data payload received
    194189954 predicted acknowledgments
    TCPSackRecovery: 7569
    Detected reordering 155401 times using SACK
    Detected reordering 715 times using time stamp
    3714 congestion windows fully recovered without slow start
    487 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 2348
    23 congestion windows recovered without slow start after partial ack
    TCPLostRetransmit: 4266
    TCPSackFailures: 16
    12778 fast retransmits
    TCPTimeouts: 6883
    TCPLossProbes: 126375
    TCPLossProbeRecovery: 87
    TCPSackRecoveryFail: 523
    TCPBacklogCoalesce: 1337411
    TCPDSACKOldSent: 91196
    TCPDSACKOfoSent: 71
    TCPDSACKRecv: 104750
    TCPDSACKOfoRecv: 34
    1661653 connections reset due to unexpected data
    495095 connections reset due to early user close
    76 connections aborted due to timeout
    2 times unable to send RST due to no memory
    TCPDSACKIgnoredOld: 132
    TCPDSACKIgnoredNoUndo: 88417
    TCPSackShifted: 14290
    TCPSackMerged: 37705
    TCPSackShiftFallback: 200618
    TCPRcvCoalesce: 22390816
    TCPOFOQueue: 104674
    TCPOFOMerge: 71
    TCPChallengeACK: 4185
    TCPSpuriousRtxHostQueues: 1786
    TCPAutoCorking: 20637
    TCPFromZeroWindowAdv: 7
    TCPToZeroWindowAdv: 7
    TCPWantZeroWindowAdv: 159
    TCPSynRetrans: 1120
    TCPOrigDataSent: 482111603
    TCPHystartTrainDetect: 6580
    TCPHystartTrainCwnd: 132741
    TCPHystartDelayDetect: 2
    TCPHystartDelayCwnd: 184
    TCPACKSkippedPAWS: 1190
    TCPACKSkippedSeq: 1685
    TCPACKSkippedChallenge: 18
    TCPKeepAlive: 280264131
    TCPDelivered: 482402249
    TCPAckCompressed: 3626
IpExt:
    InNoRoutes: 1
    InMcastPkts: 7098323
    OutMcastPkts: 1216946
    InOctets: 352544381659
    OutOctets: 355411710695
    InMcastOctets: 1136073832
    OutMcastOctets: 185810790
    InNoECTPkts: 680308975

Comment 4 Alex Krzos 2021-01-13 16:33:57 UTC
Created attachment 1747109 [details]
Established connections

Note: since yesterday there was a DNS outage in the lab, which "reset" all of the established connections.

Comment 5 Alex Krzos 2021-01-13 16:37:29 UTC
Created attachment 1747112 [details]
Updated screenshot showing tcp established connections growth.

Note how rebooting one master reduced the established connections on the other two master nodes proportionally and (as expected) reset the established connection count on the rebooted node. Also note that a lab DNS outage reset the established connection count as well.

Comment 6 egarcia 2021-01-20 20:29:32 UTC
By the way, I was thinking about your DNS outage and I have a theory. We host the LB and DNS on the master nodes. Each LB allows a maximum of 20,000 active TCP connections. That means that once you hit 60,000, no new connections will be accepted by the load balancers. This would mean that the health check, which occurs every 2 seconds or so, will not be able to query haproxy's healthz API, resulting in the VIP being moved around the master nodes until one of the health checks goes through. This absolutely has implications at scale, because as the number of services running on your cluster grows, it will rapidly approach that upper limit on the number of connections allowed per node. That may mean that you need more master nodes in order to maintain larger clusters, for all platforms using this networking architecture...

Comment 7 Alex Krzos 2021-01-25 14:02:26 UTC
(In reply to egarcia from comment #6)
> By the way, I was thinking about your dns outage and I have a theory. We
> host the lb and dns on the master nodes. Each lb allows for a maximum of
> 20,000 active tcp connections. That means that once you hit 60,000 no new
> connections will be accepted by the load balancer. This would mean that the
> healthcheck, that occurs every 2 seconds or so, will not be able to query
> haproxy's healthz api, resulting in the vip being moved around the master
> nodes until one of the health checks go through. This absolutely has
> implications at scale, because as the number of services that run on your
> cluster grows, it will rapidly approach that upper limit to the number of
> connections allowed per node. That may mean that you need more master nodes
> in order to maintain larger clusters for all platforms using this networking
> architecture...

Is there a hard limit of 20,000 connections for haproxy for this application?

Comment 8 egarcia 2021-01-28 14:49:58 UTC
Yes, per instance of haproxy. There is an instance on each master node.
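
For reference, a per-instance cap of this kind corresponds to HAProxy's maxconn setting. The snippet below is an illustration only, not the actual haproxy template shipped for on-prem OpenShift; the section layout, bind port, and backend addresses are assumptions for the sketch.

# Illustration only: where a 20,000-connection cap would live in an haproxy.cfg.
global
    maxconn 20000            # hard cap on concurrent connections for this haproxy instance

defaults
    mode tcp
    timeout connect 10s
    timeout client  1m
    timeout server  1m

frontend kube-apiserver
    bind :9445               # assumed local port for the sketch
    default_backend masters

backend masters
    server master-0 192.0.2.10:6443 check
    server master-1 192.0.2.11:6443 check
    server master-2 192.0.2.12:6443 check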

Comment 9 Oleg Sher 2021-02-03 09:56:50 UTC
Verified in:
(.venv) 17:05:15 ocp-edge-auto-sher(master) > oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-02-02-052812   True        False         144m    Cluster version is 4.7.0-0.nightly-2021-02-02-052812

Managed, 3 masters, 2 workers
baremetal ipv6

[core@master-0-0 ~]$ echo `uptime` > net.log 
[core@master-0-0 ~]$ for i in {1..10}; do echo `date` >> net.log; netstat -s | grep "connections established" >> net.log; sleep 60; done
[core@master-0-0 ~]$ cat net.log 
09:11:43 up 21:13, 1 user, load average: 1.27, 0.92, 0.90
Wed Feb 3 09:11:46 UTC 2021
    1038 connections established
Wed Feb 3 09:12:46 UTC 2021
    1040 connections established
Wed Feb 3 09:13:46 UTC 2021
    1029 connections established
Wed Feb 3 09:14:46 UTC 2021
    1034 connections established
Wed Feb 3 09:15:46 UTC 2021
    1021 connections established
Wed Feb 3 09:16:46 UTC 2021
    1037 connections established
Wed Feb 3 09:17:46 UTC 2021
    1037 connections established
Wed Feb 3 09:18:46 UTC 2021
    1032 connections established
Wed Feb 3 09:19:47 UTC 2021
    1036 connections established
Wed Feb 3 09:20:47 UTC 2021
    1035 connections established

Comment 12 egarcia 2021-02-17 17:01:39 UTC
*** Bug 1906194 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2021-02-24 15:51:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 16 Lucas López Montero 2021-03-08 14:55:14 UTC
A KCS article related to this problem has been written:

https://access.redhat.com/solutions/5865521

