Description of problem:

After logging-fluentd pods establish a session to a particular logging-mux pod via the logging-mux service, they never let go. When the logging-mux deploymentconfig is scaled up, the new logging-mux pod never gets any of the sessions from the logging-fluentd pods.

Logging might need to use the reconnect_interval parameter for the secure forward plugin to assist with session spreading when logging-mux is scaled up (see the config sketch at the end of this report). It claims to have a default of 5 seconds, but that's not what I am seeing:
https://docs.fluentd.org/v0.12/articles/out_secure_forward#reconnectinterval-time

Session spreading can currently be forced by restarting all of the logging-fluentd pods (tag nodes logging-infra-fluent=false and then back to true).

Version-Release number of selected component (if applicable):

logging v3.6.173.0.27

How reproducible:

Always, when scaling logging-mux up while existing logging-fluentd sessions exist.

Steps to Reproduce:
1. Deploy logging with the logging-mux enabled (sample inventory below) in an environment with multiple compute nodes
2. Verify the current sessions going through logging-mux with: oc exec <logging-mux-pod> -- ss -tnpi (pipe to wc -l if desired)
3. oc scale --replicas=2 dc/logging-mux
4. Repeat the oc exec command for each logging-mux pod and verify all sessions are still on the original logging-mux
5. Run some logging traffic and wait a while. Repeat step 4

Actual results:

Sessions stay with the original logging-mux pod and do not spread to additional logging-mux pods when the dc is scaled up.

Expected results:

Sessions balance between the pods over time.
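For reference, a minimal sketch of where reconnect_interval would be set in the fluentd secure forward output. The match pattern, shared key, host, and port below are placeholders for illustration, not values taken from the actual logging-fluentd configuration:

    <match **>
      @type secure_forward
      self_hostname "#{ENV['HOSTNAME']}"
      shared_key my_shared_key           # placeholder
      secure yes
      # documented default is 5s, but sessions are not observed to reconnect
      reconnect_interval 30s
      <server>
        host logging-mux                 # placeholder for the logging-mux service
        port 24284
      </server>
    </match>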
I think we might be disabling reconnecting by default, see https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/output-es-config.conf#L19
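If so, the relevant knob in fluent-plugin-elasticsearch would be something like the following; this is an assumption for illustration, not a verbatim copy of the linked line:

    # in output-es-config.conf (assumed)
    reload_connections false    # never re-resolve/reload connections to ES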
The non-mux case has the same issue. With a 3 node elasticsearch cluster, if an ES deploymentconfig is scaled down and back up, the new ES pod will never get sessions from any fluentd clients. Changing the summary of this bz: the core issue is that fluentd never reconnects to help with session spreading.
Closing in favor of RFE trello card
*** Bug 1448951 has been marked as a duplicate of this bug. ***
I agree with Mike; tracking this as a trello card is worthwhile, but the bug appears to be present in all versions of aggregated logging which use fluentd. It seems like we need to keep this open and clone it to all the versions we support.
https://github.com/uken/fluent-plugin-elasticsearch/pull/459 has merged
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/f084fee4de8c32f83c53694058320e6dc3e5d170
Bug 1489533 - logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions
https://bugzilla.redhat.com/show_bug.cgi?id=1489533

https://github.com/uken/fluent-plugin-elasticsearch/pull/459 implements support for reloading connections when Elasticsearch is behind a proxy/load balancer, as in our case, and allows specifying the reload interval in terms of the number of operations.

This PR adds support for the following env. vars, which can be set in the fluentd daemonset/mux deployment. The ability to set these is provided primarily for experimentation, not something which will ordinarily require tuning in production.

`ES_RELOAD_CONNECTIONS` - boolean - default `true`
`ES_RELOAD_AFTER` - integer - default `100`
`ES_SNIFFER_CLASS_NAME` - string - default `Fluent::Plugin::ElasticsearchSimpleSniffer`

There are also `OPS_` named env. vars which will override the corresponding `ES_` named env. vars.

That is, by default, fluentd will reload connections to Elasticsearch every 100 operations (NOTE: not every 100 records!). These include internal `ping` operations, so the count will not exactly correspond to the number of bulk index requests.

https://github.com/openshift/origin-aggregated-logging/commit/0ecf76a77627c2205f78da6c9ace4dbdc6b72197
Merge pull request #1284 from richm/bug-1489533
Bug 1489533 - logging-fluentd needs to periodically reconnect to logging-mux or elasticsearch to help balance sessions
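For illustration, a rough sketch of the fluent-plugin-elasticsearch options these env. vars map to in the generated output config; the host/port/scheme values are placeholders, not copied from the shipped output-es-config.conf:

    <match **>
      @type elasticsearch
      host logging-es          # placeholder service name
      port 9200
      scheme https
      # ES_RELOAD_CONNECTIONS -> periodically reload connections to ES
      reload_connections true
      # ES_RELOAD_AFTER -> reload after this many operations (includes pings)
      reload_after 100
      # ES_SNIFFER_CLASS_NAME -> sniffer that keeps using the configured host
      # instead of sniffing cluster nodes behind the service/proxy
      sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
    </match>

To experiment with a different value, something like oc set env ds/logging-fluentd ES_RELOAD_AFTER=200 should work (object name assumed).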
Tested this with varying workloads from 50 to 700 messages/second/node from 100 pods per node, each in its own namespace. Tested with RELOAD off, default (100 operations), and 250 operations.

For the highest workload (700 1Kb messages/second/node), fluentd cpu utilization:
RELOAD off: 48%
RELOAD 100 operations: 52%
RELOAD 250 operations: 49%

For a workload of 250 messages/second/node:
RELOAD off: 19%
RELOAD 100 operations: 22%
RELOAD 250 operations: 21%

Different RELOAD levels had no impact on fluentd memory utilization.
Different RELOAD levels had no impact on elasticsearch cpu or memory.

Leaving it at 100 operations seems reasonable, but defaulting to 200 or 250 might provide some marginal cpu utilization savings.
@rmeggins, opinion on upping the default reload to 200 operations?
(In reply to Mike Fiedler from comment #17)
> @rmeggins, opinion on upping the default reload to 200 operations?

Sure, sounds good.
Verified on 3.11.0-0.25.0. Verified on a 500 node cluster that logging connections are spread evenly across ES systems and that re-connections occur. Will leave it to dev to decide if the default of 100 should change based on the data in comment 16.
(In reply to Mike Fiedler from comment #19)
> Verified on 3.11.0-0.25.0. Verified on a 500 node cluster that logging
> connections are spread evenly across ES systems and that re-connections
> occur. Will leave it to dev to decide if the default of 100 should change
> based on the data in comment 16.

openshift/origin-aggregated-logging/pull/1341
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.