Bug 1969979 - TCP Port exhaustion in events handling
Summary: TCP Port exhaustion in events handling
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Service Telemetry Framework
Classification: Red Hat
Component: sg-core-container
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 1.3 (STF)
Assignee: Chris Sibbitt
QA Contact: Leonid Natapov
Docs Contact: Joanne O'Flynn
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-09 15:01 UTC by Chris Sibbitt
Modified: 2021-10-04 18:21 UTC
CC: 3 users

Fixed In Version: sg-core-container-4.0.3-5
Doc Type: Bug Fix
Doc Text:
Before this update, event delivery stopped after 28231 events because each HTTP request to Elasticsearch consumed a new outbound TCP port until none remained. With this update, the response body is closed and consumed so that persistent TCP connections in the default HTTP transport are reused. As a result, events are delivered continuously without exhausting the available outbound TCP ports.
Clone Of:
Environment:
Last Closed: 2021-10-04 18:21:00 UTC
Target Upstream Version:
Embargoed:




Links
GitHub infrawatch/sg-core pull 50 (open): Fix TCP port exhaustion (last updated 2021-08-09 19:16:29 UTC)
Red Hat Product Errata RHBA-2021:3721 (last updated 2021-10-04 18:21:02 UTC)

Description Chris Sibbitt 2021-06-09 15:01:49 UTC
Description of problem:

Existing code delivers 28231 events and then stops working because it has
consumed all available outbound TCP ports.

"It is critical to both close the response body and to consume it,
in order to re-use persistent TCP connections in the default HTTP transport."

https://github.com/elastic/go-elasticsearch/blob/master/README.md#usage
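For illustration, here is a minimal Go sketch of that pattern using the go-elasticsearch client. The index name and payload are hypothetical, and the actual sg-core patch (PR 50) may differ in detail; the point is draining and closing the response body so the default transport can reuse the connection.

package main

import (
	"io"
	"log"
	"strings"

	"github.com/elastic/go-elasticsearch/v7"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatalf("error creating the client: %s", err)
	}

	// Hypothetical index name and document, for illustration only.
	res, err := es.Index("sglogs-example", strings.NewReader(`{"event":"test"}`))
	if err != nil {
		log.Fatalf("error indexing document: %s", err)
	}

	// Drain and close the body so the transport returns the connection
	// to its idle pool instead of opening a new TCP port per request.
	io.Copy(io.Discard, res.Body)
	res.Body.Close()
}

The same rule applies anywhere the default net/http transport is used: a connection only becomes reusable after the response body has been fully read and closed.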


How reproducible: Always


Steps to Reproduce:
1. Send >30,000 events (rsyslog, or presumably ceilometer or collectd)
2. Observe that only 28231 make it to Elasticsearch
3. Observe that there are 28232 TCP connections from the sg pod to the ES pod


Actual results:

$ ./esquery /sglogs-osp_cloudops_02.2021.06.07/_count
{"count":28231,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

$ oc rsh elasticsearch-es-default-0
sh-4.4$ ss -tnp | tail -2
ESTAB 0 0 [::ffff:10.129.3.21]:9200 [::ffff:10.128.3.36]:42332 users:(("java",pid=7,fd=1863)) 
ESTAB 0 0 [::ffff:10.129.3.21]:9200 [::ffff:10.128.3.36]:59351 users:(("java",pid=7,fd=24131))
sh-4.4$ ss -tnp | grep 10.128.3.36 | wc
 28232 169392 4545352


Expected results:
* One (or a few) TCP connections, and an uninterrupted flow of events


Additional info:

* This was found during testing of an unreleased logging feature, but the code appears to apply to all events
* A patch is actively being worked on and will be tested ASAP

Comment 8 Paul Leimer 2021-09-30 17:02:18 UTC
Tested the downstream container and I no longer see TCP port exhaustion.

Comment 13 errata-xmlrpc 2021-10-04 18:21:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Service Telemetry Framework 1.3.2 - Container Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3721

