Description of problem:

Existing code delivers 28231 events and then stops working because it has consumed all available outbound TCP ports.

"It is critical to both close the response body and to consume it, in order to re-use persistent TCP connections in the default HTTP transport."
https://github.com/elastic/go-elasticsearch/blob/master/README.md#usage

How reproducible:
Always

Steps to Reproduce:
1. Send >30,000 events (rsyslog, or presumably ceilometer or collectd)
2. Observe that only 28231 make it to Elasticsearch
3. Observe that there are 28232 TCP connections from the sg pod to the ES pod

Actual results:

$ ./esquery /sglogs-osp_cloudops_02.2021.06.07/_count
{"count":28231,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

$ oc rsh elasticsearch-es-default-0
sh-4.4$ ss -tnp | tail -2
ESTAB 0 0 [::ffff:10.129.3.21]:9200 [::ffff:10.128.3.36]:42332 users:(("java",pid=7,fd=1863))
ESTAB 0 0 [::ffff:10.129.3.21]:9200 [::ffff:10.128.3.36]:59351 users:(("java",pid=7,fd=24131))
sh-4.4$ ss -tnp | grep 10.128.3.36 | wc
  28232  169392 4545352

Expected results:
* One (or a few) TCP connections, and an uninterrupted flow of events

Additional info:
* This was found during testing of an unreleased logging feature, but the code appears to apply to all events
* A patch is actively being worked on and will be tested ASAP
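For reference, the quoted go-elasticsearch README points at the general fix: every HTTP response body must be drained and then closed so the default transport can return the TCP connection to its idle pool. A minimal stdlib-only sketch of the pattern (this is illustrative, not the actual patch; the httptest server is a hypothetical stand-in for the Elasticsearch endpoint):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// drainAndClose consumes any remaining body bytes and closes the body.
// Skipping either step prevents the default transport from reusing the
// underlying TCP connection, leaking one connection per request.
func drainAndClose(resp *http.Response) {
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
}

func main() {
	// Hypothetical stand-in for the Elasticsearch endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"result":"created"}`)
	}))
	defer srv.Close()

	for i := 0; i < 3; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		drainAndClose(resp) // without this, each request would hold a new connection
	}
	fmt.Println("done")
}
```

With this pattern the sequential requests can share a single persistent connection instead of exhausting the ephemeral port range, which is the behavior described under "Expected results" above.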
Tested the downstream container and I no longer see TCP port exhaustion.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Service Telemetry Framework 1.3.2 - Container Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3721