1616352 – logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions

Bug 1616352 - logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.10.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.10.z
Assignee:	Jeff Cantrill
QA Contact:	Mike Fiedler
Docs Contact:
URL:
Whiteboard:
Depends On:	1489533
Blocks:	1616354

Reported:	2018-08-15 16:45 UTC by Rich Megginson
Modified:	2018-09-22 04:56 UTC
CC List:	8 users
Fixed In Version:	openshift3/logging-fluentd:v3.10.34-1
Doc Type:	Enhancement
Doc Text:	Feature: Fluentd now reconnects to Elasticsearch every 100 operations by default. Reason: If one Elasticsearch node starts before the others in the cluster, the load balancer in the Elasticsearch service connects to that node only, so every Fluentd instance ends up connected to that one node. Result: Because Fluentd reconnects periodically, the load balancer can spread the load evenly among all of the Elasticsearch nodes in the cluster.
Clone Of:	1489533
Clones:	1616354
Environment:
Last Closed:	2018-09-22 04:55:14 UTC
Target Upstream Version:
Embargoed:
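
The Doc Text above describes why periodic reconnection helps. The following is a minimal simulation sketch of that reasoning, not the plugin's implementation. It assumes every client starts out pinned to backend 0 (the node that started first) and that the load balancer picks a backend uniformly at random on each new connection; both are simplifying assumptions, and `simulate` and `reconnect_every` are hypothetical names used only for illustration.

```python
import random
from collections import Counter


def simulate(num_clients, num_backends, ops_per_client, reconnect_every=None, seed=0):
    """Return a Counter of operations handled per backend.

    All clients start connected to backend 0, modelling the case where one
    Elasticsearch node started before the others. If reconnect_every is set,
    a client reconnects after that many operations and the (assumed)
    load balancer picks a backend uniformly at random.
    """
    rng = random.Random(seed)
    current = [0] * num_clients            # backend each client is connected to
    ops_since_connect = [0] * num_clients  # operations sent on the current connection
    handled = Counter()

    for _ in range(ops_per_client):
        for client in range(num_clients):
            if reconnect_every is not None and ops_since_connect[client] >= reconnect_every:
                current[client] = rng.randrange(num_backends)
                ops_since_connect[client] = 0
            handled[current[client]] += 1
            ops_since_connect[client] += 1
    return handled


if __name__ == "__main__":
    print("no reconnect:       ", dict(simulate(20, 3, 1000)))
    print("reconnect every 100:", dict(simulate(20, 3, 1000, reconnect_every=100)))
```

Without reconnection, all operations stay on backend 0 for the whole run. With reconnection, the other backends start receiving traffic after the first 100 operations of each client.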

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	uken fluent-plugin-elasticsearch pull 459	0	None	None	None	2018-08-15 16:45:46 UTC
Red Hat Product Errata	RHBA-2018:2660	0	None	None	None	2018-09-22 04:56:04 UTC

Comment 3 Anping Li 2018-09-11 13:48:35 UTC

@mifiedle, could you help test this bug and point out the steps? I have started 10 ocp-logtest pods with rate 600 on a two-node cluster, but I haven't observed the sessions on the scaled mux pods.

Comment 4 Mike Fiedler 2018-09-11 19:02:36 UTC

The easiest way to test is with the ss utility, but unfortunately our ES pod image does not have it. I copy the utility into the pod and run it there.

1. From a Linux system that has ss (from the iproute package):

oc cp /usr/sbin/ss <pod>:/tmp/ss -c elasticsearch

2. oc rsh into the ES pod

3. List all client IPs with connections to ES:

/tmp/ss -tnp | grep 9200 | awk '{print $5}' | cut -f4 -d":" | sort -u

4. Send log traffic for a while and repeat step 3. The list of connected clients should be different.

5. Repeat the check on every ES pod. Each should have roughly an equal number of clients.
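
To compare pods in step 5 by connection count rather than by eye, the output of `ss -tn` can be tallied per client IP. The following is a rough sketch, assuming the usual `ss -tn` column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and that peers may appear as IPv4-mapped IPv6 addresses; `client_connection_counts` is a hypothetical helper name, not part of any OpenShift tooling.

```python
from collections import Counter


def client_connection_counts(ss_output, port=9200):
    """Count established connections per client IP for a local port.

    ss_output is the text printed by `ss -tn` (assumed column layout:
    State Recv-Q Send-Q Local:Port Peer:Port [Process]).
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not an established session.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        # Only count connections whose local side is the given port.
        if not local.endswith(":%d" % port):
            continue
        # Drop the peer port, then brackets and any IPv4-mapped IPv6 prefix.
        ip = peer.rsplit(":", 1)[0].strip("[]")
        if ip.startswith("::ffff:"):
            ip = ip[len("::ffff:"):]
        counts[ip] += 1
    return counts
```

One way to use it is to save the `ss -tn` output from each ES pod to a file and pass each file's contents to the function; roughly equal totals across pods suggest the sessions are balanced.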

Comment 5 Anping Li 2018-09-13 06:24:04 UTC

Verified with logging:v3.10.45. The ES pods had roughly equal numbers of connections.

Comment 7 errata-xmlrpc 2018-09-22 04:55:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2660
