Bug 1395448 - Continuous "No such file or directory - (Errno::ENOENT)" error when Aggregated Logging try to connect external elasticsearch
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.8.0
Assignee: Peter Portante
QA Contact: Xia Zhao
URL:
Whiteboard:
Duplicates: 1559435 (view as bug list)
Depends On:
Blocks:
 
Reported: 2016-11-15 23:47 UTC by Takayoshi Tanaka
Modified: 2020-12-14 07:52 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-09 21:22:11 UTC
Target Upstream Version:
Embargoed:



Description Takayoshi Tanaka 2016-11-15 23:47:46 UTC
Description of problem:
After setting up the EFK stack and configuring it to connect to an external Elasticsearch, fluent-plugin-elasticsearch continuously outputs the same error "No such file or directory -  (Errno::ENOENT)".

- The external Elasticsearch is AWS Elasticsearch
- It is configured to allow public access
- Since mutual TLS is not used, ES_CLIENT_CERT, ES_CLIENT_KEY, OPS_CLIENT_CERT, and OPS_CLIENT_KEY have empty values
- Recreated the logging-fluentd secret to hold only the CA cert for the cert configured on the AWS Elasticsearch endpoint (Verisign)
- Reinstalled the daemonset by issuing 'oc delete daemonset logging-fluentd' followed by 'oc new-app logging-fluentd-template' (see the sketch below)
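For reference, a hedged sketch of the secret recreation and daemonset redeploy described above (the CA file path is illustrative, not taken from this report):

```
# recreate the logging-fluentd secret so it holds only the external CA cert
oc delete secret logging-fluentd -n logging
oc secrets new logging-fluentd ca=/path/to/verisign-ca.crt -n logging

# redeploy the fluentd daemonset
oc delete daemonset logging-fluentd -n logging
oc new-app logging-fluentd-template -n logging
```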

Version-Release number of selected component (if applicable):


How reproducible:
Not at present. Trying to reproduce.

Steps to Reproduce:
1. Set up the EFK stack and configure it to connect to an external Elasticsearch [1]

[1] https://docs.openshift.com/container-platform/3.3/install_config/aggregate_logging.html#sending-logs-to-an-external-elasticsearch-instance

Actual results:
"oc logs logging-fluentd-xxxxx" shows continuous error below.

```
2016-11-13 12:43:17 +1100 [info]: reading config file path="/etc/fluent/fluent.conf"
2016-11-13 12:43:31 +1100 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-13 12:43:31 +1100 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"xxxxxxxx.ap-southeast-2.es.amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})! No such file or directory -  (Errno::ENOENT)" plugin_id="object:1ce0184"
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:61:in `rescue in client'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:58:in `client'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:180:in `rescue in send'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:178:in `send'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:171:in `block in write'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:170:in `each'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluent-plugin-elasticsearch-1.3.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb:170:in `write'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluentd-0.12.20/lib/fluent/buffer.rb:345:in `write_chunk'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluentd-0.12.20/lib/fluent/buffer.rb:324:in `pop'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluentd-0.12.20/lib/fluent/output.rb:329:in `try_flush'
  2016-11-13 12:43:31 +1100 [warn]: /usr/share/gems/gems/fluentd-0.12.20/lib/fluent/output.rb:140:in `run'
```

Expected results:
Sending logs without error.

Additional info:
This report is based on a customer case.

Comment 2 Takayoshi Tanaka 2016-11-16 07:41:13 UTC
I reproduced the same issue as the customer. The steps are simple.

1. Set up AWS Elasticsearch with the minimum instance size and count, and allow anonymous access.
2. Update logging-fluentd.  
$ oc edit -n logging template logging-fluentd-template
- name: ES_HOST
  value: search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com
- name: ES_PORT
  value: "443"
- name: ES_CLIENT_CERT
- name: ES_CLIENT_KEY

Then recreate logging-fluentd.
$ oc delete daemonset logging-fluentd
$ oc new-app logging-fluentd-template

After that, I got the same error messages.

```
2016-11-16 02:33:51 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-16 02:38:50 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})! No such file or directory -  (Errno::ENOENT)" plugin_id="object:10cbe70"
  2016-11-16 02:33:51 -0500 [warn]: suppressed same stacktrace
2016-11-16 02:38:51 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-16 02:43:50 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})! No such file or directory -  (Errno::ENOENT)" plugin_id="object:10cbe70"
  2016-11-16 02:38:51 -0500 [warn]: suppressed same stacktrac
```

I wonder whether this configuration is OK or not.

```
// log into the pod
$ oc rsh logging-fluentd-xxx
//on the pod
sh-4.2# cat /etc/fluent/configs.d/openshift/output-es-config.conf 
    <store>
      @type elasticsearch_dynamic
      host "#{ENV['ES_HOST']}"
      port "#{ENV['ES_PORT']}"
      scheme https
      index_name ${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.${Time.at(time).getutc.strftime(@logstash_dateformat)}
      user fluentd
      password changeme

      client_key "#{ENV['ES_CLIENT_KEY']}"
      client_cert "#{ENV['ES_CLIENT_CERT']}"
      ca_file "#{ENV['ES_CA']}"

      flush_interval 5s
      max_retry_wait 300
      disable_retry_limit
    </store>
```

client_key and client_cert will have empty values. Is that OK?
>      client_key "#{ENV['ES_CLIENT_KEY']}"
>      client_cert "#{ENV['ES_CLIENT_CERT']}"

Comment 3 Rich Megginson 2016-11-16 17:56:52 UTC
(In reply to Takayoshi Tanaka from comment #2)
> I reproduced the same issue as the customer. The steps are simple. 
> 
> 1. Set up AWS Elasticsearch with the minimum instance size and count, and
> allow anonymous access.
> 2. Update logging-fluentd.  
> $ oc edit -n logging template logging-fluentd-template
> - name: ES_HOST
>   value:
> search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com
> - name: ES_PORT
>   value: "443"
> - name: ES_CLIENT_CERT
> - name: ES_CLIENT_KEY

Using scheme "https" and port "443" means TLS.  In order to use TLS, you must have a CA cert for the CA that issued the TLS server cert for "search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com".  It is possible that fluentd is looking for a CA cert file, and when it is not found, you get the "File not Found" error.
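
One way to check which CA issued the endpoint's server cert (and therefore which CA cert fluentd's ca_file needs to contain) is to inspect the TLS chain from a host that can reach the endpoint; a minimal sketch:

```
# show the issuer and subject of the server certificate presented by the
# external Elasticsearch endpoint; the issuer identifies the CA whose cert
# must be provided to fluentd via ES_CA
openssl s_client -connect search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com:443 \
  -showcerts </dev/null 2>/dev/null | openssl x509 -noout -issuer -subject
```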

> 
> Then recreate logging-fluentd.
> $ oc delete daemonset logging-fluentd
> $ oc new-app logging-fluentd-template
> 
> After that, I got the same error messages.
> 
> ```
> 2016-11-16 02:33:51 -0500 [warn]: temporarily failed to flush the buffer.
> next_retry=2016-11-16 02:38:50 -0500
> error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not
> reach Elasticsearch cluster
> ({:host=>\"search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.
> amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"fluentd\",
> :password=>\"obfuscated\"})! No such file or directory -  (Errno::ENOENT)"
> plugin_id="object:10cbe70"
>   2016-11-16 02:33:51 -0500 [warn]: suppressed same stacktrace
> 2016-11-16 02:38:51 -0500 [warn]: temporarily failed to flush the buffer.
> next_retry=2016-11-16 02:43:50 -0500
> error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not
> reach Elasticsearch cluster
> ({:host=>\"search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.
> amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"fluentd\",
> :password=>\"obfuscated\"})! No such file or directory -  (Errno::ENOENT)"
> plugin_id="object:10cbe70"
>   2016-11-16 02:38:51 -0500 [warn]: suppressed same stacktrac
> ```
> 
> I wonder this configuration is OK or not.
> 
> ```
> // log into the pod
> $ oc rsh logging-fluentd-xxx
> //on the pod
> sh-4.2# cat /etc/fluent/configs.d/openshift/output-es-config.conf 
>     <store>
>       @type elasticsearch_dynamic
>       host "#{ENV['ES_HOST']}"
>       port "#{ENV['ES_PORT']}"
>       scheme https
>       index_name
> ${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.
> ${Time.at(time).getutc.strftime(@logstash_dateformat)}
>       user fluentd
>       password changeme
> 
>       client_key "#{ENV['ES_CLIENT_KEY']}"
>       client_cert "#{ENV['ES_CLIENT_CERT']}"
>       ca_file "#{ENV['ES_CA']}"
> 
>       flush_interval 5s
>       max_retry_wait 300
>       disable_retry_limit
>     </store>
> ```
> 
> client_key and client_cert will have empty values. Is that OK?
> >      client_key "#{ENV['ES_CLIENT_KEY']}"
> >      client_cert "#{ENV['ES_CLIENT_CERT']}"

Yes, assuming they are not using SearchGuard or something which requires authentication.

But they must use ES_CA.

Can we have a copy of the customer's configuration to confirm?

Comment 4 Rich Megginson 2016-11-16 18:03:21 UTC
Also, try accessing elasticsearch from the command line:

1) try without CA cert:
curl -s https://search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com:443/

this should give an error about "peer untrusted" or something like that

2) try ignoring the CA cert:
curl -s -k https://search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com:443/ | python -mjson.tool

This should give you some sort of information about Elasticsearch

Comment 5 Takayoshi Tanaka 2016-11-17 00:05:33 UTC
In my test environment, both commands succeeded.

sh-4.2# curl -s https://search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com:443/
{
  "name" : "Ashcan",
  "cluster_name" : "694280550618:tatanaka-es",
  "version" : {
    "number" : "2.3.2",
    "build_hash" : "0944b4bae2d0f7a126e92b6133caf1651ae316cc",
    "build_timestamp" : "2016-05-20T07:46:04Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
sh-4.2# curl -s -k https://search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com:443/ | python -mjson.tool
{
    "cluster_name": "694280550618:tatanaka-es",
    "name": "Ashcan",
    "tagline": "You Know, for Search",
    "version": {
        "build_hash": "0944b4bae2d0f7a126e92b6133caf1651ae316cc",
        "build_snapshot": false,
        "build_timestamp": "2016-05-20T07:46:04Z",
        "lucene_version": "5.5.0",
        "number": "2.3.2"
    }
}
sh-4.2# env | grep ^ES_
ES_COPY_SCHEME=https
ES_CA=/etc/fluent/keys/ca
ES_COPY_PORT=
ES_COPY_USERNAME=
ES_COPY_PASSWORD=
ES_HOST=search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com
ES_COPY_CA=
ES_COPY_CLIENT_CERT=
ES_COPY_HOST=
ES_CLIENT_CERT=
ES_CLIENT_KEY=
ES_PORT=443
ES_COPY=false
ES_COPY_CLIENT_KEY=

I'll send the customer's configuration in a private message.

Comment 7 Rich Megginson 2016-11-17 04:41:11 UTC
The customer is using /etc/fluent/keys/ca for ES_CA and OPS_CA.  This is the CA cert of the CA that issued the Elasticsearch SSL server cert.  In the customer's case, it appears they are not using the embedded Elasticsearch, but only an external Elasticsearch (NOTE: Is this really supported?  If so, where in our documentation do we say that the customer can skip the embedded Elasticsearch and use another one instead?).  The customer must provide a CA cert file for the CA that issued the Elasticsearch server SSL cert, and must set the contents of that file as the value of the 'ca' field in the logging-fluentd secret (oc edit secret logging-fluentd).
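
A hedged sketch of one way to put the CA cert file's contents into the 'ca' field of the secret without hand-editing the base64 value (the CA file path is illustrative):

```
# base64-encode the external CA cert and patch it into the 'ca' key of the
# logging-fluentd secret, then redeploy fluentd so the pods pick it up
oc patch secret logging-fluentd -n logging \
  -p "{\"data\":{\"ca\":\"$(base64 -w0 /path/to/external-es-ca.crt)\"}}"
```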

It looks as though you must specify a file for ES_CLIENT_CERT, ES_CLIENT_KEY, OPS_CLIENT_CERT, and OPS_CLIENT_KEY.  Unfortunately I don't have a setup with a plain Elasticsearch to test this against - all of the OpenShift ES instances require client cert auth, so we always specify these, and must specify these.

You could try specifying a 0 length file for these.

on the host:
# touch /var/log/dummyfile
# oc edit template logging-fluentd-template
...
          - name: ES_CLIENT_CERT
            value: /var/log/dummyfile
          - name: ES_CLIENT_KEY
            value: /var/log/dummyfile
          - name: OPS_CLIENT_CERT
            value: /var/log/dummyfile
          - name: OPS_CLIENT_KEY
            value: /var/log/dummyfile
...

Then redeploy the fluentd daemonset.

If that doesn't work, the next step would be to overwrite these files in /etc/fluent/configs.d/openshift in the pod:
output-es-config.conf
output-es-ops-config.conf

on the host:
# oc exec $the_fluentd_pod -- cat /etc/fluent/configs.d/openshift/output-es-config.conf > /var/log/output-es-config.conf
# edit the /var/log/output-es-config.conf - remove the client_cert and client_key lines
# oc exec $the_fluentd_pod -- cat /etc/fluent/configs.d/openshift/output-es-ops-config.conf > /var/log/output-es-ops-config.conf
# edit the /var/log/output-es-ops-config.conf - remove the client_cert and client_key lines

Next - edit the template to add volumes and mounts for these:
...
          volumeMounts:
          - mountPath: /etc/fluent/configs.d/openshift/output-es-config.conf
            name: esconf
            readOnly: true
          - mountPath: /etc/fluent/configs.d/openshift/output-es-ops-config.conf
            name: esopsconf
            readOnly: true
...
        volumes:
        - hostPath:
            path: /var/log/output-es-config.conf
          name: esconf
        - hostPath:
            path: /var/log/output-es-ops-config.conf
          name: esopsconf
...

Then redeploy the fluentd daemonset.

Comment 8 Takayoshi Tanaka 2016-11-17 05:03:25 UTC
> NOTE: Is this really supported?  If so, where in our documentation do we say that the customer can skip the embedded Elasticsearch and use another one instead?

Here is our documentation. It says the customer can send logs to an external Elasticsearch instead of the embedded one, and that no special configuration is required on the external Elasticsearch.
https://docs.openshift.com/container-platform/3.3/install_config/aggregate_logging.html#sending-logs-to-an-external-elasticsearch-instance

I will try what you suggested now.

Comment 9 Takayoshi Tanaka 2016-11-17 06:55:50 UTC
Based on your suggestion, I found a workaround. The steps are based on yours:

Next - edit the template to add volumes and mounts for these:
...
          volumeMounts:
          - mountPath: /etc/fluent/configs.d/openshift/output-es-config.conf
            name: esconf
            readOnly: true
          - mountPath: /etc/fluent/configs.d/openshift/output-es-ops-config.conf
            name: esopsconf
            readOnly: true
...
        volumes:
        - hostPath:
            path: /var/log/output-es-config.conf
          name: esconf
        - hostPath:
            path: /var/log/output-es-ops-config.conf
          name: esopsconf
...

And the edited /var/log/output-es-config.conf is as below:
```
# cat /var/log/output-es-config.conf
    <store>
      @type elasticsearch_dynamic
      host "#{ENV['ES_HOST']}"
      port "#{ENV['ES_PORT']}"
      scheme https
      index_name ${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.${Time.at(time).getutc.strftime(@logstash_dateformat)}

      flush_interval 5s
      max_retry_wait 300
      disable_retry_limit
    </store>
```

We should remove user, password, client_key, client_cert, and ca_file. We must not specify any of them.


Here are the results of what I tried.

1. Specify a 0-length file for client_key, client_cert, and ca_file.
Result: "header too long" error

2016-11-17 00:26:46 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-17 00:31:46 -0500 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"search-tatanaka-es-mbko5rxbyyskijgnncmoepou5m.ap-northeast-1.es.amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"})! header too long (OpenSSL::X509::CertificateError)" plugin_id="object:1511994"

2. Override output-es-config.conf, removing the client_key and client_cert lines.

2016-11-17 00:36:10 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-17 00:36:39 -0500 error_class="Faraday::SSLError" error="Unable to verify certificate, please set `Excon.defaults[:ssl_ca_path] = path_to_certs`, `ENV['SSL_CERT_DIR'] = path_to_certs`, `Excon.defaults[:ssl_ca_file] = path_to_file`, `ENV['SSL_CERT_FILE'] = path_to_file`, `Excon.defaults[:ssl_verify_callback] = callback` (see OpenSSL::SSL::SSLContext#verify_callback), or `Excon.defaults[:ssl_verify_peer] = false` (less secure)." plugin_id="object:165c704"

3. Override output-es-config.conf, removing all cert settings.

2016-11-17 01:02:20 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-17 01:02:19 -0500 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:15313ac"

As far as I tested, we have to remove user, password, client_key, client_cert, and ca_file to connect to an external Elasticsearch that uses neither mutual TLS nor HTTP basic auth.

Comment 10 Takayoshi Tanaka 2016-11-17 08:01:33 UTC
The customer confirms this workaround works well. Do you have a plan to fix this so that the customer does not need to attach an external volume to configure it?

Also, the customer has questions.

1) If the customer uses this workaround, will Red Hat support it?

2) Can Curator manage an external Elasticsearch? As far as our documentation goes, there is no parameter to configure it for an external Elasticsearch.

3) The customer will use an external Elasticsearch only and won't use our embedded Elasticsearch. In this case, does the customer only have to run the logging-fluentd daemonset?

Comment 11 Rich Megginson 2016-11-17 17:19:35 UTC
(In reply to Takayoshi Tanaka from comment #10)
> The customer confirms this workaround works well. Do you have a plan to
> fix this so that the customer does not need to attach an external volume to configure it?
> 
> Also, the customer has questions.
> 
> 1) If the customer uses this workaround, will Red Hat support it?
 
Yes.

Eric, how would the customer use configmaps to do this instead of volume mounts?

> 2) Can Curator manage an external Elasticsearch? As far as our
> documentation goes, there is no parameter to configure it for an external Elasticsearch.

No, it cannot.  It is designed to work with the internal ES only.  The idea is that if the customer is using an external ES, they will manage it with their own tools.

If you want curator (and kibana?) to work with an external ES, please file bugzilla RFEs for those features.
 
> 3) The customer will use an external Elasticsearch only and won't use our
> embedded Elasticsearch. In this case, does the customer only have to run
> the logging-fluentd daemonset?

correct.

This bug fix/feature will be targeted for OCP 3.5 at the earliest.  If you need it sooner, please escalate.

Comment 12 ewolinet 2016-11-17 17:42:08 UTC
In order to do this with configmaps instead:

  $ oc edit configmap/logging-fluentd

Then in the fluent.conf section you would update the "## matches" section to look like the following -- replacing the @include statements with the files they expand to and making the changes you mentioned above:

## matches
  @include configs.d/openshift/output-pre-*.conf
  <match journal.system** system.var.log** **_default_** **_openshift_** **_openshift-infra_**>
    @type elasticsearch_dynamic
    host "#{ENV['OPS_HOST']}"
    port "#{ENV['OPS_PORT']}"
    scheme https
    index_name .operations.${record['time'].nil? ? Time.at(time).getutc.strftime(@logstash_dateformat) : Time.parse(record['time']).getutc.strftime(@logstash_dateformat)}

    flush_interval 5s
    max_retry_wait 300
    disable_retry_limit
  </match>
  <match **>
    @type elasticsearch_dynamic
    host "#{ENV['ES_HOST']}"
    port "#{ENV['ES_PORT']}"
    scheme https
    index_name ${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.${Time.at(time).getutc.strftime(@logstash_dateformat)}

    flush_interval 5s
    max_retry_wait 300
    disable_retry_limit
  </match>
  # no post - applications.conf matches everything left
##

Comment 13 Takayoshi Tanaka 2016-11-28 00:11:53 UTC
Based on the above workaround, I can ship logs to AWS Elasticsearch at the beginning of running the logging-fluentd daemonset. However, fluentd shows the error below and fails to ship logs after the daemonset has run for a while.

2016-11-27 17:37:28 -0500 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-27 17:42:28 -0500 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:10d9200"

I found some discussions in the public forums for Elasticsearch and the fluentd plugins [1] [2] [3], and added "resurrect_after" and "reload_connections" to fluent.conf. However, the same errors occurred. The current configmap is below.

    ## matches
      @include configs.d/openshift/output-pre-*.conf
      <match journal.system** system.var.log** **_default_** **_openshift_** **_openshift-infra_**>
        @type elasticsearch_dynamic
        host "#{ENV['OPS_HOST']}"
        port "#{ENV['OPS_PORT']}"
        scheme https
        index_name .operations.${record['time'].nil? ? Time.at(time).getutc.strftime(@logstash_dateformat) : Time.parse(record['time']).getutc.strftime(@logstash_dateformat)}

        ca_file "#{ENV['OPS_CA']}"

        flush_interval 5s
        max_retry_wait 300
        disable_retry_limit
        resurrect_after 5s
        reload_connections false
      </match>
      <match **>
        @type elasticsearch_dynamic
        host "#{ENV['ES_HOST']}"
        port "#{ENV['ES_PORT']}"
        scheme https
        index_name ${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.${Time.at(time).getutc.strftime(@logstash_dateformat)}

        ca_file "#{ENV['ES_CA']}"

        flush_interval 5s
        max_retry_wait 300
        disable_retry_limit
        resurrect_after 5s
        reload_connections false
      </match>
    </label>

I think this issue is particular to AWS Elasticsearch. Should I keep discussing it in this thread, or create another Bugzilla?


[1] https://discuss.elastic.co/t/elasitcsearch-ruby-raises-cannot-get-new-connection-from-pool-error/36252/11
[2] https://github.com/uken/fluent-plugin-elasticsearch/issues/182
[3] https://github.com/atomita/fluent-plugin-aws-elasticsearch-service/issues/15#issuecomment-254868104

Comment 14 Rich Megginson 2016-11-28 15:17:41 UTC
(In reply to Takayoshi Tanaka from comment #13)
> Based on the above workaround, I can ship logs to AWS Elasticsearch at the
> beginning of running the logging-fluentd daemonset. However, fluentd shows
> the error below and fails to ship logs after the daemonset has run for a while.
> 
> 2016-11-27 17:37:28 -0500 [warn]: temporarily failed to flush the buffer.
> next_retry=2016-11-27 17:42:28 -0500
> error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get
> new connection from pool." plugin_id="object:10d9200"
> 
> I found some discussion in public forum of elasticsearch and fluentd plugins
> [1] [2] [3], then added "resurrect_after" and "reload_connections" in
> fluent.conf. However, the same errors occurred.

With the new configuration, do the errors occur at the same frequency as with the old configuration?

> The current configmap is
> below.
> 
>     ## matches
>       @include configs.d/openshift/output-pre-*.conf
>       <match journal.system** system.var.log** **_default_** **_openshift_**
> **_openshift-infra_**>
>         @type elasticsearch_dynamic
>         host "#{ENV['OPS_HOST']}"
>         port "#{ENV['OPS_PORT']}"
>         scheme https
>         index_name .operations.${record['time'].nil? ?
> Time.at(time).getutc.strftime(@logstash_dateformat) :
> Time.parse(record['time']).getutc.strftime(@logstash_dateformat)}
> 
>         ca_file "#{ENV['OPS_CA']}"
> 
>         flush_interval 5s
>         max_retry_wait 300
>         disable_retry_limit
>         resurrect_after 5s
>         reload_connections false
>       </match>
>       <match **>
>         @type elasticsearch_dynamic
>         host "#{ENV['ES_HOST']}"
>         port "#{ENV['ES_PORT']}"
>         scheme https
>         index_name
> ${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.
> ${Time.at(time).getutc.strftime(@logstash_dateformat)}
> 
>         ca_file "#{ENV['ES_CA']}"
> 
>         flush_interval 5s
>         max_retry_wait 300
>         disable_retry_limit
>         resurrect_after 5s
>         reload_connections false
>       </match>
>     </label>
> 
> I think this issue is particular for the AWS elasticsearch. Can I keep
> discussing in this thread, or create another Bugzilla?

Please open another bugzilla.  This is a completely different issue than the original issue for this bz.

> 
> 
> [1]
> https://discuss.elastic.co/t/elasitcsearch-ruby-raises-cannot-get-new-
> connection-from-pool-error/36252/11
> [2] https://github.com/uken/fluent-plugin-elasticsearch/issues/182
> [3]
> https://github.com/atomita/fluent-plugin-aws-elasticsearch-service/issues/
> 15#issuecomment-254868104

Comment 16 Jeff Cantrill 2017-04-04 15:40:52 UTC
Lowering the priority since the use case seems small.

Comment 18 Rich Megginson 2017-10-09 21:22:11 UTC
AFAICT, this bug only applies to the es-copy feature, which was removed in 3.7.

Comment 19 Jeff Cantrill 2018-05-11 20:32:28 UTC
*** Bug 1559435 has been marked as a duplicate of this bug. ***

