Bug 1489533

Summary:	logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions
Product:	OpenShift Container Platform	Reporter:	Mike Fiedler <mifiedle>
Component:	Logging	Assignee:	Rich Megginson <rmeggins>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Mike Fiedler <mifiedle>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	3.6.0	CC:	aos-bugs, bperkins, jcantril, pportant, rmeggins, tkatarki
Target Milestone:	---	Keywords:	Reopened
Target Release:	3.11.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	openshift3/ose-logging-fluentd:v3.11.0-0.16.0	Doc Type:	Enhancement
Doc Text:	Feature: Fluentd will now reconnect to Elasticsearch every 100 operations by default. Reason: If one Elasticsearch node is started before the others in the cluster, the load balancer in the Elasticsearch service sends every new connection to that node, and because Fluentd never reconnects, all of the Fluentd instances stay connected to that one node only. Result: By having Fluentd reconnect periodically, the load balancer can spread the load evenly among all of the Elasticsearch nodes in the cluster.	Story Points:	---
Clone Of:
Clones:	1616352		Environment:
Last Closed:	2018-12-21 15:16:40 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1616352, 1616354

Description Mike Fiedler 2017-09-07 16:02:01 UTC

Description of problem:

After logging-fluentd pods establish a session to a particular logging-mux pod via the logging-mux service, they never drop it. When the logging-mux deploymentconfig is scaled up, the new logging-mux pod never gets any of the sessions from the logging-fluentd pods.

Logging might need to use the reconnect_interval parameter of the secure forward plugin to help spread sessions when logging-mux is scaled up. The plugin documentation claims a default of 5 seconds, but that is not what I am seeing.

https://docs.fluentd.org/v0.12/articles/out_secure_forward#reconnectinterval-time

Session spreading can currently be forced by restarting all of the logging-fluentd pods (label the nodes logging-infra-fluentd=false and then back to true).

Version-Release number of selected component (if applicable): logging v3.6.173.0.27

How reproducible: Always, when scaling logging-mux up while logging-fluentd sessions already exist

Steps to Reproduce:

1. Deploy logging with the logging-mux enabled (sample inventory below) in an environment with multiple compute nodes
2. Verify the current sessions going through logging-mux with oc exec <logging-mux-pod> -- ss -tnpi (pipe to wc -l if desired)
3. oc scale --replicas=2 dc/logging-mux
4. Repeat the oc exec command for each logging-mux and verify all sessions are still on the original logging-mux
5. Run some logging traffic and wait a while. Repeat step 4
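
Piping `ss -tnpi` to `wc -l` over-counts: the output has a header line and, because of `-i`, an indented detail line under every socket. The minimal Python sketch below counts only established sockets in captured output. The sample text is made up for illustration (not captured from a real pod), and the optional port filter assumes the mux listens on a known port (24284 here is an assumption), so that outbound mux-to-Elasticsearch connections are not counted as fluentd sessions.

```python
from typing import Optional

# Made-up `ss -tnpi` output for illustration only.
# Assumed: 24284 is the mux listen port; 9200 is Elasticsearch.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      10.128.2.5:24284    10.128.0.1:51234
         cubic wscale:7,7 rto:201 rtt:0.5/0.25
ESTAB  0      0      10.128.2.5:24284    10.129.0.1:40112
         cubic wscale:7,7 rto:201 rtt:0.4/0.2
ESTAB  0      0      10.128.2.5:43210    172.30.0.10:9200
         cubic wscale:7,7 rto:201 rtt:1.1/0.5
"""


def count_established(ss_output: str, local_port: Optional[int] = None) -> int:
    """Count ESTAB sockets, skipping the header and indented -i detail lines."""
    count = 0
    for line in ss_output.splitlines():
        if not line or line[0].isspace():
            continue  # blank line or an indented detail line from -i
        fields = line.split()
        if fields[0] != "ESTAB" or len(fields) < 5:
            continue  # header line or a socket in some other state
        if local_port is not None:
            # "Local Address:Port" is the 4th column; rsplit also handles "[::1]:port".
            port = fields[3].rsplit(":", 1)[1]
            if port != str(local_port):
                continue
        count += 1
    return count


print(count_established(SAMPLE))                    # all established sockets
print(count_established(SAMPLE, local_port=24284))  # only inbound mux sessions
```

For the sample, `wc -l` would report 7, while the function counts 3 established sockets, 2 of them on the assumed mux port.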

Actual results:

Sessions stay with the original logging-mux pod and do not spread to additional logging-mux pods when the dc is scaled up

Expected results:

Sessions balance between the pods over time

Comment 1 Peter Portante 2017-09-09 19:20:44 UTC

I think we might be disabling reconnecting by default, see https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/output-es-config.conf#L19

Comment 2 Mike Fiedler 2017-09-12 13:34:00 UTC

The non-mux case has the same issue. With a 3-node Elasticsearch cluster, if an ES deploymentconfig is scaled down and back up, the new ES pod never gets sessions from any fluentd clients.

Changing the summary of this bz - the core issue is fluentd never reconnects to help with session spreading.
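
A toy simulation makes the mechanism concrete. It assumes the service hands each *new* connection to the next ready backend in strict round-robin order (kube-proxy does not actually guarantee round-robin, so this is a simplification), and that an established TCP connection is never moved. The backend names `es-0` to `es-2` and the `simulate` helper are hypothetical.

```python
from collections import Counter
from typing import List, Optional


class RoundRobinService:
    """Toy stand-in for a Service VIP: each new connection goes to the next ready backend."""

    def __init__(self) -> None:
        self.backends: List[str] = []
        self._next = 0

    def connect(self) -> str:
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


def simulate(num_clients: int, reconnect_every: Optional[int] = None, ops: int = 1000) -> Counter:
    svc = RoundRobinService()
    svc.backends = ["es-0"]  # es-0 becomes ready before the others
    conns = [svc.connect() for _ in range(num_clients)]
    svc.backends = ["es-0", "es-1", "es-2"]  # the rest of the cluster comes up
    for op in range(1, ops + 1):
        # Simplification: every client reconnects on the same operation count.
        if reconnect_every and op % reconnect_every == 0:
            conns = [svc.connect() for _ in conns]
    return Counter(conns)


print(simulate(30))                       # never reconnects: all sessions stay on es-0
print(simulate(30, reconnect_every=100))  # periodic reconnects spread the sessions
```

Without reconnects, all 30 clients remain on `es-0`; with a reconnect every 100 operations, they end up 10 per backend.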

Comment 3 Jeff Cantrill 2017-10-06 14:59:34 UTC

Closing in favor of RFE trello card

Comment 4 Jeff Cantrill 2017-10-06 15:07:18 UTC

*** Bug 1448951 has been marked as a duplicate of this bug. ***

Comment 6 Peter Portante 2017-10-06 23:50:57 UTC

I agree with Mike. Tracking this as a Trello card is worthwhile, but the bug appears to be present in all versions of aggregated logging that use fluentd. It seems we need to keep this open and clone it to all the versions we support.

Comment 9 Rich Megginson 2018-08-13 22:24:58 UTC

https://github.com/uken/fluent-plugin-elasticsearch/pull/459 has merged

Comment 10 openshift-github-bot 2018-08-14 19:39:51 UTC

Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/f084fee4de8c32f83c53694058320e6dc3e5d170
Bug 1489533 - logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions

https://bugzilla.redhat.com/show_bug.cgi?id=1489533

https://github.com/uken/fluent-plugin-elasticsearch/pull/459
implements support for reloading connections when the
Elasticsearch is behind a proxy/load balancer, as in our case,
and allows specifying the reload interval in terms of the
number of operations.

This PR adds support for the following env. vars, which can be
set in the fluentd daemonset/mux deployment.  They are provided
primarily for experimentation; they should not ordinarily
require tuning in production.
`ES_RELOAD_CONNECTIONS` - boolean - default `true`
`ES_RELOAD_AFTER` - integer - default `100`
`ES_SNIFFER_CLASS_NAME` - string - default `Fluent::Plugin::ElasticsearchSimpleSniffer`
There are also `OPS_` named env. vars which will override the
corresponding `ES_` named env. var.

That is, by default, fluentd will reload connections to
Elasticsearch every 100 operations (NOTE: not every 100 records!).
These operations include internal `ping` operations, so reloads
will not correspond exactly to bulk index requests.
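
The variable precedence described in this commit message can be sketched as follows. This is a hypothetical Python model, not the actual run script or fluentd config; it assumes the `OPS_` variables apply to the separate operations Elasticsearch output and fall back to the `ES_` value, which in turn falls back to the documented default.

```python
import os
from typing import Mapping

# Defaults as listed in the commit message.
DEFAULTS = {
    "RELOAD_CONNECTIONS": "true",
    "RELOAD_AFTER": "100",
    "SNIFFER_CLASS_NAME": "Fluent::Plugin::ElasticsearchSimpleSniffer",
}


def resolve(setting: str, ops: bool = False, env: Mapping[str, str] = os.environ) -> str:
    """Effective value of one setting: OPS_ (ops output only) > ES_ > default."""
    es_value = env.get(f"ES_{setting}", DEFAULTS[setting])
    if ops:
        # Assumed: the ops output inherits the ES_ value unless OPS_ overrides it.
        return env.get(f"OPS_{setting}", es_value)
    return es_value


def reload_settings(ops: bool = False, env: Mapping[str, str] = os.environ) -> dict:
    """Convert the string env values into typed settings."""
    return {
        "reload_connections": resolve("RELOAD_CONNECTIONS", ops, env).lower() == "true",
        "reload_after": int(resolve("RELOAD_AFTER", ops, env)),
        "sniffer_class_name": resolve("SNIFFER_CLASS_NAME", ops, env),
    }


example_env = {"ES_RELOAD_AFTER": "200", "OPS_RELOAD_AFTER": "250"}
print(reload_settings(env=example_env))            # main output: reload_after=200
print(reload_settings(ops=True, env=example_env))  # ops output: reload_after=250
```

With `ES_RELOAD_AFTER=200` and `OPS_RELOAD_AFTER=250`, this model reloads the main output every 200 operations and the ops output every 250; with neither set, both use the default of 100.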

https://github.com/openshift/origin-aggregated-logging/commit/0ecf76a77627c2205f78da6c9ace4dbdc6b72197
Merge pull request #1284 from richm/bug-1489533

Bug 1489533 - logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions
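
The "operations, not records" note in the commit message can be illustrated with a small counter model. This is a hypothetical sketch of the idea, not the elasticsearch-ruby transport code that actually performs the reload; it only assumes, as the commit message states, that every request, including an internal ping, counts as one operation.

```python
from typing import List


class ReloadingClient:
    """Hypothetical model: re-establish the connection after `reload_after` operations."""

    def __init__(self, reload_after: int = 100) -> None:
        self.reload_after = reload_after
        self.ops_since_reload = 0
        self.reloads = 0
        self.records_sent = 0

    def _operation(self) -> None:
        self.ops_since_reload += 1
        if self.ops_since_reload >= self.reload_after:
            self.reloads += 1  # stand-in for closing and re-opening the connection
            self.ops_since_reload = 0

    def ping(self) -> None:
        self._operation()

    def bulk(self, records: List[str]) -> None:
        # One bulk request counts once, however many records it carries.
        self.records_sent += len(records)
        self._operation()


client = ReloadingClient(reload_after=100)
for _ in range(150):
    client.ping()
    client.bulk(["record"] * 500)

print(client.reloads, client.records_sent)  # 300 operations -> 3 reloads
```

Here 150 pings plus 150 bulk requests make 300 operations and therefore 3 reloads, even though 75,000 records were sent.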

Comment 16 Mike Fiedler 2018-08-23 17:01:36 UTC

Tested this with varying workloads from 50 to 700 messages/second/node from 100 pods per node, each in its own namespace.   Tested with RELOAD off, default (100 operations) and 250 operations.

For the highest workload (700 1 KB messages/second/node), fluentd CPU utilization:

RELOAD off:  48%
RELOAD 100 operations:  52%
RELOAD 250 operations:  49%

For a workload of 250 messages/second/node, fluentd CPU utilization:

RELOAD off:  19%
RELOAD 100 operations:  22%
RELOAD 250 operations:  21%

Different RELOAD levels had no impact on fluentd memory utilization.

Different RELOAD levels had no impact on Elasticsearch CPU or memory.

Leaving it at 100 operations seems reasonable, but defaulting to 200 or 250 might provide some marginal CPU utilization savings.
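
A quick calculation puts these measurements in relative terms. The numbers are the figures reported in this comment; no run-to-run variance is given, so one- or two-point gaps should be read cautiously.

```python
# fluentd CPU utilization (%) from this comment, keyed by workload
# (messages/second/node) and then by reload setting.
CPU = {
    700: {"off": 48, 100: 52, 250: 49},
    250: {"off": 19, 100: 22, 250: 21},
}


def overhead(rate: int, reload_after: int) -> tuple:
    """CPU cost of a reload setting vs. RELOAD off: (percentage points, relative %)."""
    base = CPU[rate]["off"]
    delta = CPU[rate][reload_after] - base
    return delta, round(100 * delta / base, 1)


for rate in CPU:
    for setting in (100, 250):
        points, rel = overhead(rate, setting)
        print(f"{rate} msg/s/node, reload every {setting}: +{points} points ({rel}% relative)")
```

By this arithmetic, reloading every 100 operations costs 3 to 4 CPU points (about 8% to 16% relative), while 250 operations costs 1 to 2 points (about 2% to 11% relative).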

Comment 17 Mike Fiedler 2018-09-06 11:39:15 UTC

@rmeggins, opinion on upping the default reload to 200 operations?

Comment 18 Rich Megginson 2018-09-06 16:40:57 UTC

(In reply to Mike Fiedler from comment #17)
> @rmeggins, opinion on upping the default reload to 200 operations?

Sure, sounds good.

Comment 19 Mike Fiedler 2018-09-11 11:29:58 UTC

Verified on 3.11.0-0.25.0.  Verified on a 500 node cluster that logging connections are spread evenly across ES systems and that re-connections occur.  Will leave it to dev to decide if the default of 100 should change based on the data in comment 16.

Comment 20 Rich Megginson 2018-09-12 15:40:16 UTC

(In reply to Mike Fiedler from comment #19)
> Verified on 3.11.0-0.25.0.  Verified on a 500 node cluster that logging
> connections are spread evenly across ES systems and that re-connections
> occur.  Will leave it to dev to decide if the default of 100 should change
> based on the data in comment 16.

openshift/origin-aggregated-logging/pull/1341

Comment 21 Luke Meyer 2018-12-21 15:16:40 UTC

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.