Bug 1408633
| Field | Value |
|---|---|
| Summary | [RFE] fluentd temporarily failed to flush the buffer |
| Product | OpenShift Container Platform |
| Component | Logging |
| Version | 3.4.0 |
| Status | CLOSED NOTABUG |
| Severity | low |
| Priority | low |
| Reporter | Xia Zhao <xiazhao> |
| Assignee | Peter Portante <pportant> |
| QA Contact | Xia Zhao <xiazhao> |
| Docs Contact | |
| CC | aos-bugs, jcantril, juzhao, nhosoi, pportant, rmeggins, xiazhao |
| Keywords | FutureFeature, RFE |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Cloned to | 1479651 (view as bug list) |
| Environment | |
| Last Closed | 2017-10-09 21:25:17 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1479651, 1479679, 1539994 |
| Attachments | |
Is elasticsearch being restarted at these times? Or perhaps it has not been started yet? Is the problem here that you do not want to see a full stacktrace when this condition occurs?

(In reply to Rich Megginson from comment #1)
> Is elasticsearch being restarted at these times? Or perhaps it has not been
> started yet? Is the problem here that you do not want to see a full
> stacktrace when this condition occurs?

Hi Rich,

I encountered this while configuring the fluentd secure forward settings according to the steps in [1]. Fluentd was restarted during my operations, but the logging embedded ES was up and was not restarted. I reported it because I see similar error stacks in [2], and I expect not to see a full stacktrace that looks like it indicates a real problem when in fact it does not.

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/hack/testing/test-fluentd-forward.sh
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1395448

Thanks, Xia

Issue reproduced on the latest 3.5.0 env:

# openshift version
openshift v3.5.5
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Image tested with: openshift3/logging-fluentd 3.5.0 32a4ac0a3e18

# oc logs logging-fluentd-l5102
2017-03-27 08:08:47 +0000 [info]: reading config file path="/etc/fluent/fluent.conf"
2017-03-27 08:08:56 +0000 [warn]: /var/log/messages not found. Continuing without tailing it.
2017-03-27 08:09:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:09:02 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:58:in `client'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:194:in `rescue in send_bulk'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:192:in `send_bulk'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:185:in `block in write_objects'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:184:in `each'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:184:in `write_objects'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/output.rb:490:in `write'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/buffer.rb:354:in `write_chunk'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/buffer.rb:333:in `pop'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/output.rb:342:in `try_flush'
2017-03-27 08:09:02 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/output.rb:149:in `run'
2017-03-27 08:09:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:09:04 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"
2017-03-27 08:09:03 +0000 [warn]: suppressed same stacktrace
2017-03-27 08:09:04 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:09:08 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"
2017-03-27 08:09:04 +0000 [warn]: suppressed same stacktrace
2017-03-27 08:09:08 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:09:17 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"
2017-03-27 08:09:08 +0000 [warn]: suppressed same stacktrace
2017-03-27 08:09:17 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:09:35 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"
2017-03-27 08:09:17 +0000 [warn]: suppressed same stacktrace
2017-03-27 08:09:35 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:10:05 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"
2017-03-27 08:09:35 +0000 [warn]: suppressed same stacktrace
2017-03-27 08:10:08 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:11:08 +0000 error_class="Elasticsearch::Transport::Transport::Errors::ServiceUnavailable" error="[503] " plugin_id="object:aaaa00"
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:312:in `perform_request'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/client.rb:128:in `perform_request'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/elasticsearch-api-1.0.18/lib/elasticsearch/api/actions/ping.rb:20:in `ping'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:58:in `client'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:194:in `rescue in send_bulk'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:192:in `send_bulk'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:185:in `block in write_objects'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:184:in `each'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.2/lib/fluent/plugin/out_elasticsearch_dynamic.rb:184:in `write_objects'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/output.rb:490:in `write'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/buffer.rb:354:in `write_chunk'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/buffer.rb:333:in `pop'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/output.rb:342:in `try_flush'
2017-03-27 08:10:08 +0000 [warn]: /usr/share/gems/gems/fluentd-0.12.31/lib/fluent/output.rb:149:in `run'
2017-03-27 08:11:11 +0000 [warn]: retry succeeded. plugin_id="object:aaaa00"

At this time:

2017-03-27 08:09:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-03-27 08:09:02 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:aaaa00"

Was elasticsearch up and running?

Issue reproduced in 3.6.0:

# openshift version
openshift v3.6.16
kubernetes v1.5.2+43a9be4
etcd 3.1.0

images on ops registry: openshift3/logging-fluentd 3.6.0 a55545358479

logs attached in https://bugzilla.redhat.com/attachment.cgi?id=1269169

Can you provide the logs of the elasticsearch pods logging-es-* and logging-es-ops-* that show what elasticsearch was doing between

2017-04-05 23:22:07 -0400 [warn]: temporarily failed to flush the buffer. next_retry=2017-04-05 23:22:05 -0400 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1cf3d24"

and

2017-04-06 02:20:44 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ba92c0" elapsed_time=28.30003044 slow_flush_log_threshold=20.0

?

Tested with logging 3.6 and set DEBUG mode for the ES pod, but there was no useful info; see the attached log file. This error may be a transient one: the connection to ES is lost, but the retry finally succeeds.

Created attachment 1272478 [details]
es,fluentd pod log
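For reference, one way to check whether the logging Elasticsearch cluster is actually up and accepting requests at the time of such flush failures is to query its health endpoint from inside the ES pod. This is a minimal sketch assuming the default "logging" project, the component=es pod label, and the admin cert/key paths used by origin-aggregated-logging; all of these may differ in a given environment:

    # Pick one ES pod and ask it for cluster health using the mounted admin client cert.
    espod=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
    oc exec -n logging "$espod" -- curl -s -k \
      --cert /etc/elasticsearch/secret/admin-cert \
      --key /etc/elasticsearch/secret/admin-key \
      "https://localhost:9200/_cluster/health?pretty"

If the status is red or the request fails entirely during the same window as the fluentd warnings, that points to ES startup or resource problems rather than a fluentd defect.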
I also sometimes see the fluentd error.

2017-04-19 17:47:07 -0400 [warn]: temporarily failed to flush the buffer. next_retry=2017-04-19 17:47:13 -0400 error_class="Elasticsearch::Transport::Transport::Errors::ServiceUnavailable" error="[503] " plugin_id="object:1028a2c"

I'm certain it is transient.

2017-04-19 17:47:31 -0400 [warn]: retry succeeded. plugin_id="object:1028a2c"

At the moment [2017-04-19 17:47:07], elasticsearch had not started yet.

[2017-04-19 21:47:07,468][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
[2017-04-19 21:47:09,333][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)

It turned GREEN 10 seconds later in my test environment.

[2017-04-19 21:47:17,825][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
Clustername: logging-es
Clusterstate: GREEN

(Please note that the fluentd log and the ES log are configured with different timezones. :)

Is it ok to close this bug with NOTABUG?

(In reply to Noriko Hosoi from comment #11)
> I also sometimes see the fluentd error.
> 2017-04-19 17:47:07 -0400 [warn]: temporarily failed to flush the buffer.
> next_retry=2017-04-19 17:47:13 -0400
> error_class="Elasticsearch::Transport::Transport::Errors::ServiceUnavailable" error="[503] " plugin_id="object:1028a2c"
>
> I'm certain it is transient.
> 2017-04-19 17:47:31 -0400 [warn]: retry succeeded. plugin_id="object:1028a2c"
>
> At the moment [2017-04-19 17:47:07], elasticsearch has not started yet.
> [2017-04-19 21:47:07,468][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
> [2017-04-19 21:47:09,333][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
>
> It turned to be GREEN 10 seconds later in my test environment.
> [2017-04-19 21:47:17,825][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
> Clustername: logging-es
> Clusterstate: GREEN
>
> (Please note that the fluentd log and the ES log are configured with the
> different timezone. :)
>
> Is it ok to close this bug with NOTABUG?

In the case where fluentd reports "unable to flush buffer" because Elasticsearch is not running, then yes, this is not a bug. I think we might want to reduce the verbosity of the fluentd logs, though - seeing this particular error, and seeing it frequently at startup, is going to be distressing to users. We should perhaps change this to an RFE - "Suppress Fluentd connection errors" - we could have a threshold, e.g. if you get this error N times in M seconds, then log an error only every X seconds.

It seems this defect does not affect the overall logging functionality, and it did not happen prior to logging 3.4.0, but it is not user friendly when it happens, and customers have complained about this issue.

(In reply to Junqi Zhao from comment #13)
> It seems this defect does not affect the whole logging functions, it did not
> happen prior to logging 3.4.0, and it is not user friendly if this defect
> happens again and customers had been complained this issue.

I agree it gives unnecessary concern to customers. I'm not sure, but prior to 3.4.0, elasticsearch could have started much more quickly?

I'm trying to launch fluentd with the option --suppress-repeated-stacktrace, but it just suppresses repeated stacktraces, so it logs the error at least once...
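As a side note on the option mentioned above, here is a minimal sketch of enabling the same suppression through the fluentd configuration rather than the command line (fluentd 0.12 accepts this as a system setting as well as the --suppress-repeated-stacktrace flag). Whether and where to change fluent.conf inside the logging-fluentd image is environment-specific, so treat this as an illustration only:

    # Append a <system> section so repeated stacktraces are collapsed.
    # Note: this only suppresses repeats; the first occurrence is still logged in full.
    cat >> /etc/fluent/fluent.conf <<'EOF'
    <system>
      suppress_repeated_stacktrace true
    </system>
    EOF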
@Noriko - yes, with 3.4 we switched to ES 2.4.1 from ES 1.5, and changed the way it starts up too. Another problem is that there is no orchestration - that is, we don't have a way to prevent the other services that use ES from starting until ES is really up and running and ready to accept client operations.

I don't think there is anything in fluentd currently that can help. That's why I suggested an RFE, because we will likely need to change fluentd itself or perhaps the elasticsearch plugin.

Thank you, Rich. It makes sense. I was searching for a way to set a dependency among containers/pods, but I could not find any good doc... To set this as an RFE, is choosing RFE in the keywords box good enough?

(In reply to Rich Megginson from comment #15)
> @Noriko - yes, with 3.4 we switched to ES 2.4.1 from ES 1.5, and changed the
> way it starts up too. Another problem is that there is no orchestration -
> that is, we don't have a way to prevent the other services that use ES from
> starting until ES is really up and running and ready to accept client
> operations.
>
> I don't think there is anything in fluentd currently that can help. That's
> why I suggested an RFE because we will likely need to change fluentd itself
> or perhaps the elasticsearch plugin.

Tested on OCP v3.3.1.20: this error did not happen a few months ago, but appears now. ES version is 1.5.2, fluentd version is 0.12.23; see the attached file.

Created attachment 1273837 [details]
fluentd log, ocp 3.3.1.20
The related info was provided in #Comment 17.

Tested with logging-fluentd:3.3.1-21. Besides the "fluentd temporarily failed to flush the buffer" warning, we get the following warning:

**************************************************************************
2017-08-08 03:51:00 -0400 [warn]: /var/log/containers/logging-es-ops-a7gbxh1s-1-deploy_logging_POD-e7a36ba4c2316f04d045871bc630c74e2e1e17ab44dea856977df2caddb7a33d.log not found. Continuing without tailing it.
**************************************************************************

Checked on the node: these files do not exist. They are used for deployer pod logs; maybe they were deleted while fluentd was reading logs from them.

Created attachment 1310475 [details]
fluentd log, logging-fluentd:3.3.1-21
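For what it's worth, a quick way to confirm on the node whether those deployer-pod log files were removed (or are now dangling symlinks) before fluentd could tail them; the file name below is taken from the warning above and is node-specific:

    # /var/log/containers/*.log are symlinks; ls -L follows them, so a dangling link fails.
    ls -l /var/log/containers/ | grep -i deploy
    ls -L /var/log/containers/logging-es-ops-a7gbxh1s-1-deploy_logging_POD-*.log 2>/dev/null \
      || echo "log file (or its symlink target) is already gone"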
Hi @Junqi, since this bug was opened for 3.4, can we stick to just 3.4?
I.e., could you open a separate bug for the 3.3.1 issue? In the new bug for 3.3.1, could you put the steps you took to reproduce the problem?
In terms of the "fluentd temporarily failed to flush the buffer" warning issue, we often see these warnings when the elasticsearch pod does not have enough resources. To check whether what you saw is similar, could you try restarting elasticsearch with a higher CPU limit, for example (one way to apply this is sketched after the snippet below):
resources:
limits:
cpu: 1000m
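One possible way to apply that limit is sketched below with oc patch. The deploymentconfig name (shown here as a placeholder) is environment-specific, and this assumes a resources.limits section already exists on the ES container; otherwise edit the dc directly:

    # Replace the CPU limit on the first container of the ES deploymentconfig,
    # then roll out a new deployment so the pod restarts with the new limit.
    oc patch dc/logging-es-xxxxxxxx -n logging --type=json -p '[
      {"op": "replace",
       "path": "/spec/template/spec/containers/0/resources/limits/cpu",
       "value": "1000m"}
    ]'
    oc rollout latest dc/logging-es-xxxxxxxx -n logging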
In #c17, you put the following observation. Was the machine capacity (including CPU and memory size) identical between the 3.3.1.20 test case and the 3.4.x one? It would be helpful to record such information in the bug as well.
> Tested on OCP v3.3.1.20, this error did not happen a few months ago, but appeared now.
(In reply to Noriko Hosoi from comment #22)
> Hi @Junqi, since this bug was opened for 3.4, can we stick to just 3.4?
>
> I.e., could you open a separate bug for the 3.3.1 issue? In the new bug for
> 3.3.1, could you put the steps you took to reproduce the problem?

OK, opened defects for v3.5 and 3.3.1:
v3.5: https://bugzilla.redhat.com/show_bug.cgi?id=1479651
3.3.1: https://bugzilla.redhat.com/show_bug.cgi?id=1479679

> In terms of "fluentd temporarily failed to flush the buffer" warn issue, we
> often see the warnings when the elasticsearch pod does not have enough
> resource. To check if what you saw was similar to it or not, could you try
> restarting the elasticsearch with higher CPU limit?
> resources:
>   limits:
>     cpu: 1000m

There is no ES error in the 3.4, 3.3.1 and 3.5 fluentd pod logs now, and it is not caused by the elasticsearch CPU limit. When the fluentd pods start up, they throw the "fluentd temporarily failed to flush the buffer" warning; see the attached files for the defect.

(In reply to Junqi Zhao from comment #23)
> There is no es error in 3.4, 3.3.1 and 3.5 fluentd pod logs now, and it is
> not caused by elasticsearch CPU limit, when the fluented pods started up, it
> will throw out "fluentd temporarily failed to flush the buffer" warn info,
> see the attached files for the defect

Thanks for the clarification, @Junqi. So, the warning messages are issued just at startup and disappear once "retry succeeded." is logged. That is, elasticsearch was slower to become ready to accept the incoming data than fluentd was to send it.

The readiness probe for elasticsearch was introduced in 3.5. I guess you don't see the warnings on 3.5 and newer, do you?

For 3.4, since the severity of this bug is low, could we close this bug as WONTFIX?

(In reply to Noriko Hosoi from comment #24)
> The readiness probe for the elasticsearch was introduced since 3.5. I guess
> you don't see the warnings on 3.5 and newer, do you?

3.5 still has this issue; 3.6 does not. Please see the attachment in the v3.5 defect https://bugzilla.redhat.com/show_bug.cgi?id=1479651; "temporarily failed to flush the buffer" is still there.

> For 3.4, since the severity of this bug is low, we could close this bug as
> WONTFIX?

Although it's a low priority defect, if we know how to fix it, I think we should fix it, since the warning logs are noisy and customers have complained about these warning messages.

@Noriko, one more warning I'd like to raise: in the fluentd pod logs, there is an additional warning:

2017-08-17 21:37:02 -0400 [warn]: /var/log/containers/logging-curator-ops-1-deploy_logging_POD-af9d1fae33516e492cba52ffa97bc03f4a0cabf51a66720d6967b656d34fad53.log not found. Continuing without tailing it.

Checked on the node: these files do not exist. They are used for deployer pod logs; maybe they were deleted while fluentd was reading logs from them. For more info, please see my attached file.

Created attachment 1314985 [details]
additional warning info in fluentd pod log
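Related to the readiness probe discussion above: to see whether a given Elasticsearch deploymentconfig already defines a readiness probe (present in the 3.5+ logging stack, per the comments), a minimal check, assuming the component=es label and the logging project, is:

    # Print any readinessProbe stanza defined on the ES deploymentconfigs.
    oc get dc -n logging -l component=es -o yaml | grep -A6 readinessProbe \
      || echo "no readiness probe defined"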
I'm going to close this bug about the connection issue as NOTABUG because it is transient and expected. Please open a new bz about the following issue:

2017-08-17 21:37:02 -0400 [warn]: /var/log/containers/logging-curator-ops-1-deploy_logging_POD-af9d1fae33516e492cba52ffa97bc03f4a0cabf51a66720d6967b656d34fad53.log not found. Continuing without tailing it.
Created attachment 1235301 [details]
fluentd_log

Description of problem:
During the 3.4.0 testing on GCE we rarely see these error stacks in the fluentd logs, and we are still not sure how to reproduce them:

$ oc logs -f logging-fluentd-3sby6
2016-12-26 00:33:20 -0500 [info]: reading config file path="/etc/fluent/fluent.conf"
2016-12-26 00:34:00 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-12-26 00:33:42 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1ab41e4"
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:66:in `client'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:202:in `rescue in send_bulk'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:200:in `send_bulk'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:193:in `block in write_objects'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:192:in `each'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:192:in `write_objects'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:480:in `write'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/buffer.rb:354:in `write_chunk'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/buffer.rb:333:in `pop'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:338:in `try_flush'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:149:in `run'
2016-12-26 00:34:00 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-12-26 00:33:41 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1a6c72c"
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:66:in `client'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:202:in `rescue in send_bulk'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:200:in `send_bulk'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:193:in `block in write_objects'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:192:in `each'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.9.1/lib/fluent/plugin/out_elasticsearch_dynamic.rb:192:in `write_objects'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:480:in `write'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/buffer.rb:354:in `write_chunk'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/buffer.rb:333:in `pop'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:338:in `try_flush'
2016-12-26 00:34:00 -0500 [warn]: /usr/share/gems/gems/fluentd-0.12.29/lib/fluent/output.rb:149:in `run'
2016-12-26 00:34:01 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-12-26 00:33:44 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1ab41e4"
2016-12-26 00:34:01 -0500 [warn]: suppressed same stacktrace
2016-12-26 00:34:02 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-12-26 00:33:44 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1a6c72c"
2016-12-26 00:34:02 -0500 [warn]: suppressed same stacktrace
2016-12-26 00:34:03 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-12-26 00:33:47 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"logging-es\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})!" plugin_id="object:1ab41e4"
2016-12-26 00:34:03 -0500 [warn]: suppressed same stacktrace

At last, the retry succeeded:
2016-12-26 00:34:46 -0500 [warn]: retry succeeded. plugin_id="object:1ab41e4"
2016-12-26 00:34:51 -0500 [warn]: retry succeeded. plugin_id="object:1a6c72c"

Set severity to low since we didn't notice any impact on logging functionality.

Version-Release number of selected component (if applicable):
(ops registry):
openshift3/logging-auth-proxy e96b37a99960
openshift3/logging-kibana 27f978fc2946
openshift3/logging-fluentd c493f8b4553b
openshift3/logging-elasticsearch 3ca95a8a9433
openshift3/logging-curator e39988877cd9
openshift3/logging-deployer 1033ccb0557b

openshift version:
openshift v3.4.0.38
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
3-4 times on GCE platform

Steps to Reproduce:
1. Please refer to "Description of problem"

Actual results:
Sometimes, fluentd temporarily fails to flush the buffer.

Expected results:
There is no need for fluentd to throw out full error stacktraces when it temporarily fails to flush the buffer and later recovers.

Additional info:
Full fluentd log attached.
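As a practical check that these failures are indeed transient, one can pair each burst of flush warnings with the subsequent "retry succeeded." message in the same fluentd pod log. A minimal sketch, assuming the component=fluentd label and the logging project:

    # For every fluentd pod, show only the flush-failure and recovery lines so the
    # failure window and the eventual recovery are easy to see side by side.
    for pod in $(oc get pods -n logging -l component=fluentd -o jsonpath='{.items[*].metadata.name}'); do
      echo "== $pod =="
      oc logs -n logging "$pod" | grep -E 'temporarily failed to flush|retry succeeded'
    done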