Bug 1616352

Summary: logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions
Product: OpenShift Container Platform Reporter: Rich Megginson <rmeggins>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Mike Fiedler <mifiedle>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.10.0 CC: anli, aos-bugs, bperkins, jcantril, mifiedle, pportant, rmeggins, tkatarki
Target Milestone: --- Keywords: Reopened
Target Release: 3.10.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openshift3/logging-fluentd:v3.10.34-1 Doc Type: Enhancement
Doc Text:
Feature: Fluentd now reconnects to Elasticsearch every 100 operations by default.
Reason: If one Elasticsearch node starts before the others in the cluster, the load balancer in the Elasticsearch service connects to that node only, and so does every Fluentd instance that connects to Elasticsearch.
Result: Because Fluentd reconnects periodically, the load balancer can spread the load evenly among all of the Elasticsearch nodes in the cluster.
Story Points: ---
Clone Of: 1489533
: 1616354 (view as bug list) Environment:
Last Closed: 2018-09-22 04:55:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1489533    
Bug Blocks: 1616354    

Comment 3 Anping Li 2018-09-11 13:48:35 UTC
@mifiedle, could you help test this bug and point out the steps? I have started 10 ocp-logtest pods with rate 600 on a two-node cluster, but I haven't watched the sessions on the scaled mux pods.

Comment 4 Mike Fiedler 2018-09-11 19:02:36 UTC
The easiest way to test is with the ss utility, but unfortunately our ES pod image does not have it. I copy the utility there and run it.

1. From a Linux system that has ss (from the iproute package):

oc cp /usr/sbin/ss <pod>:/tmp/ss -c elasticsearch

2. oc rsh into the ES pod

3. Run the following in the pod to show all client IPs with connections to ES:

/tmp/ss -tnp | grep 9200 | awk '{print $5}' | cut -f4 -d":" | sort -u

4. Send log traffic for a while and repeat step 3.   The list of connected clients should be different. 

5. Check on all ES servers and each should have roughly an equal number of clients
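
The per-pod check in steps 3 and 5 can be wrapped in a small helper so the output is easier to compare between ES pods. This is only a sketch, not part of the original test procedure: the function name es_clients is hypothetical, and it assumes the usual `ss -tn` column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) with ES listening on port 9200.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read `ss -tn` output on stdin and print the
# distinct client IPs that hold connections to local port 9200.
es_clients() {
    awk '
        # Keep only rows whose local address (column 4) ends in :9200,
        # so a peer that merely uses port 9200 is not counted.
        $4 ~ /:9200$/ {
            peer = $5
            sub(/:[0-9]+$/, "", peer)   # drop the peer port
            sub(/^\[/, "", peer)        # drop brackets used by newer ss
            sub(/\]$/, "", peer)
            sub(/^::ffff:/, "", peer)   # unwrap IPv4-mapped IPv6 addresses
            print peer
        }
    ' | sort -u
}
```

Inside the ES pod it could be used as `/tmp/ss -tn | es_clients`, and `/tmp/ss -tn | es_clients | wc -l` gives the number of distinct clients to compare across pods. Matching on the local address column rather than grepping for 9200 anywhere on the line is a deliberate choice to avoid false positives.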

Comment 5 Anping Li 2018-09-13 06:24:04 UTC
Verified with logging:v3.10.45. The ES pods had a roughly equal number of connections.

Comment 7 errata-xmlrpc 2018-09-22 04:55:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2660