Bug 1666167 - Fluentd and curator can't connect to ES after secrets regenerated.
Summary: Fluentd and curator can't connect to ES after secrets regenerated.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Importance: medium low
Target Milestone: ---
Target Release: 4.2.z
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-15 05:17 UTC by Qiaoling Tang
Modified: 2019-10-30 04:45 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-30 04:44:54 UTC
Target Upstream Version:
Embargoed:


Attachments
elasticsearch pod log (1.95 MB, text/plain), 2019-07-24 03:21 UTC, Qiaoling Tang
Elasticsearch pod logs (102.59 KB, application/gzip), 2019-07-24 05:43 UTC, Qiaoling Tang


Links
GitHub openshift/elasticsearch-operator pull 157 (closed): Bug 1666167 - Updating to use the EO serviceaccount token instead of the admin cert (last updated 2021-02-17 13:45:41 UTC)
GitHub openshift/elasticsearch-operator pull 178 (closed): Bug 1666167: Do a full cluster restart in case of cert redeploy (last updated 2021-02-17 13:45:41 UTC)
GitHub openshift/origin-aggregated-logging pull 1662 (closed): Bug 1666167 - Updating to allow the EO to be authorized as sg_role_admin if presenting its SA token (last updated 2021-02-17 13:45:41 UTC)
Red Hat Product Errata RHBA-2019:3151 (2019-10-30 04:45:04 UTC)

Description Qiaoling Tang 2019-01-15 05:17:03 UTC
Description of problem:
Deploy the logging stack via the operators and make sure it works well. Then scale the CLO down to 0, delete the master-certs secret, and scale the CLO back up to 1. After doing this, the master-certs secret is regenerated and the CAs in the curator, fluentd, kibana, and elasticsearch secrets are changed, but fluentd and curator can't connect to ES.

$ oc get pod
NAME                                                  READY     STATUS    RESTARTS   AGE
cluster-logging-operator-65458bf7d7-hzv2d             1/1       Running   0          7m
curator-1547520600-cjn7m                              0/1       Error     0          4m
elasticsearch-clientdatamaster-0-1-84d764899d-ckx9l   1/1       Running   0          14m
elasticsearch-clientdatamaster-0-2-56984bb76c-lsslp   1/1       Running   0          14m
elasticsearch-clientdatamaster-0-3-7cd67f75dd-6dxbp   1/1       Running   0          14m
elasticsearch-operator-86cc7cb548-599hl               1/1       Running   0          15m
fluentd-46cpb                                         1/1       Running   0          14m
fluentd-6d59t                                         1/1       Running   0          14m
fluentd-hlgrp                                         1/1       Running   0          14m
fluentd-p229t                                         1/1       Running   0          14m
fluentd-pwcc7                                         1/1       Running   0          14m
fluentd-skmzd                                         1/1       Running   0          14m
kibana-675b587dfd-jjpgt                               2/2       Running   0          14m

Pod logs:

$ oc logs curator-1547520600-cjn7m
Was not able to connect to Elasticearch at elasticsearch:9200 within 60 attempts

$ oc exec fluentd-46cpb -- logs
2019-01-15 02:44:00 +0000 [warn]: Could not connect Elasticsearch or obtain version. Assuming Elasticsearch 5.
2019-01-15 02:44:00 +0000 [warn]: To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
2019-01-15 02:44:03 +0000 [warn]: [elasticsearch-apps] Could not connect Elasticsearch or obtain version. Assuming Elasticsearch 5.
2019-01-15 02:44:03 +0000 [warn]: [elasticsearch-apps] To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
2019-01-15 02:44:06 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-01-15 02:44:07 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:645:in `rescue in send_bulk'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:627:in `send_bulk'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:534:in `block in write'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `each'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `write'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:1123:in `try_flush'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:1423:in `flush_thread_run'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:452:in `block (2 levels) in start'
  2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-01-15 02:44:09 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-01-15 02:44:10 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:09 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:44:12 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-01-15 02:44:14 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:12 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:44:17 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-01-15 02:44:21 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:17 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:44:24 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-01-15 02:44:33 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:24 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:44:36 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-01-15 02:44:53 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:36 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:44:55 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-01-15 02:45:26 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:44:55 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:45:29 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-01-15 02:46:37 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:45:29 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:46:44 +0000 [warn]: [elasticsearch-apps] retry succeeded. chunk_id="57f7541be8fd43fa1e6fbfda7c53d328"
[core@ip-10-0-45-39 ~]$ oc exec fluentd-46cpb -- logs -f
  2019-01-15 02:44:55 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:45:29 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-01-15 02:46:37 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
  2019-01-15 02:45:29 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:46:44 +0000 [warn]: [elasticsearch-apps] retry succeeded. chunk_id="57f7541be8fd43fa1e6fbfda7c53d328"
2019-01-15 02:52:13 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-01-15 02:52:14 +0000 chunk="57f763f8d79c95f5e6848bdfebb363f6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n            `Excon.defaults[:ssl_ca_path] = path_to_certs`\n            `ENV['SSL_CERT_DIR'] = path_to_certs`\n            `Excon.defaults[:ssl_ca_file] = path_to_file`\n            `ENV['SSL_CERT_FILE'] = path_to_file`\n            `Excon.defaults[:ssl_verify_callback] = callback`\n                (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n            `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n"
  2019-01-15 02:52:13 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:52:14 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-01-15 02:52:15 +0000 chunk="57f763f8d79c95f5e6848bdfebb363f6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n            `Excon.defaults[:ssl_ca_path] = path_to_certs`\n            `ENV['SSL_CERT_DIR'] = path_to_certs`\n            `Excon.defaults[:ssl_ca_file] = path_to_file`\n            `ENV['SSL_CERT_FILE'] = path_to_file`\n            `Excon.defaults[:ssl_verify_callback] = callback`\n                (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n            `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n"
  2019-01-15 02:52:14 +0000 [warn]: suppressed same stacktrace
2019-01-15 02:52:15 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-01-15 02:52:17 +0000 chunk="57f763f8d79c95f5e6848bdfebb363f6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n            `Excon.defaults[:ssl_ca_path] = path_to_certs`\n            `ENV['SSL_CERT_DIR'] = path_to_certs`\n            `Excon.defaults[:ssl_ca_file] = path_to_file`\n            `ENV['SSL_CERT_FILE'] = path_to_file`\n            `Excon.defaults[:ssl_verify_callback] = callback`\n                (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n            `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n"

$ oc exec elasticsearch-clientdatamaster-0-1-84d764899d-ckx9l -- logs
[2019-01-15T02:45:01,339][INFO ][c.f.s.c.IndexBaseConfigurationRepository] .searchguard index does not exist yet, so no need to load config on node startup. Use sgadmin to initialize cluster
[2019-01-15T02:52:01,738][ERROR][c.f.s.h.SearchGuardHttpServerTransport] [elasticsearch-clientdatamaster-0-1] SSL Problem Received fatal alert: unknown_ca
javax.net.ssl.SSLException: Received fatal alert: unknown_ca
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1647) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1615) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1781) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1070) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:896) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766) ~[?:?]
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_191]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:255) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1162) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.13.Final.jar:4.1.13.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]


Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-14-202030   True        False         2h        Cluster version is 4.0.0-0.alpha-2019-01-14-202030

$ oc get pod cluster-logging-operator-65458bf7d7-hzv2d -o yaml |grep image
    image: quay.io/openshift/origin-cluster-logging-operator:latest
    imagePullPolicy: IfNotPresent
  imagePullSecrets:
    image: quay.io/openshift/origin-cluster-logging-operator:latest
    imageID: quay.io/openshift/origin-cluster-logging-operator@sha256:d86673069b90956945d70ba60754635706e9bb30fcc252c4c913f9805588807c

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging and make sure the logging stack works well.
2. Scale down the CLO: `oc scale deploy cluster-logging-operator --replicas=0`
3. Delete the master-certs secret: `oc delete secret master-certs`
4. Scale up the CLO: `oc scale deploy cluster-logging-operator --replicas=1`
5. Check the pod logs.

Actual results:
Fluentd and curator cannot connect to Elasticsearch after the master-certs secret is regenerated; fluentd reports "certificate verify failed" and Elasticsearch logs "Received fatal alert: unknown_ca".

Expected results:
All logging components reconnect to Elasticsearch successfully after the certificates are regenerated.

Additional info:
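One way to confirm the mismatch between the certificates the running Elasticsearch process loaded at startup and the regenerated ones is to compare CA fingerprints inside an ES pod. This is only a sketch: the container name, the admin-ca file name, the copy paths (based on how the Elasticsearch container copies its certs at startup), the availability of openssl in the image, and the <elasticsearch-pod> placeholder are all assumptions.

# CA currently mounted from the regenerated secret
$ oc exec -c elasticsearch <elasticsearch-pod> -- openssl x509 -noout -fingerprint -sha256 -in /etc/openshift/elasticsearch/secret/admin-ca
# CA copied into place when the container started, i.e. what the running JVM actually loaded
$ oc exec -c elasticsearch <elasticsearch-pod> -- openssl x509 -noout -fingerprint -sha256 -in /etc/elasticsearch/secret/admin-ca

If the two fingerprints differ, the node is still serving certificates signed by the old CA while the clients already trust only the new one, which matches the "certificate verify failed" and "unknown_ca" errors above.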

Comment 3 ewolinet 2019-04-02 18:09:43 UTC
This should be resolved by https://github.com/openshift/elasticsearch-operator/pull/80

Comment 4 Qiaoling Tang 2019-04-03 02:39:44 UTC
This issue can still be reproduced with the latest EO.

Same logs as in the description.

Comment 5 ewolinet 2019-04-04 15:31:34 UTC
Pushing out to 4.2.

Currently this only happens if someone is purposely forcing CA cert rotation. The workaround is to delete the elasticsearch pods so they are rescheduled and read in the newly mounted certs at startup (a sketch follows below).
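For reference, the workaround boils down to something like the following (a sketch only; the openshift-logging namespace and the component=elasticsearch label selector are assumptions based on a default deployment):

$ oc -n openshift-logging delete pod -l component=elasticsearch

The rescheduled pods mount the regenerated secret and rebuild their keystores at startup, so they come back up with certificates signed by the new CA.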

Comment 7 Qiaoling Tang 2019-07-24 03:20:24 UTC
The elasticsearch container couldn't start after the secret was refreshed. Testing images:

ose-elasticsearch-operator-v4.2.0-201907231419
ose-logging-elasticsearch5-v4.2.0-201907222219


$ oc get pod
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-7868cb99dc-k6jzl       1/1     Running   0          13m
curator-1563937800-kv7m5                        1/1     Running   0          9m44s
elasticsearch-cdm-do9awmuw-1-7f68678558-plggv   1/2     Running   0          12m
elasticsearch-cdm-do9awmuw-2-d4d4c5977-kpt76    1/2     Running   0          17m
kibana-5cbd5cc9c9-5rt5b                         2/2     Running   0          18m
rsyslog-2np4s                                   2/2     Running   0          18m
rsyslog-48tfn                                   2/2     Running   0          18m
rsyslog-78s8r                                   2/2     Running   0          18m
rsyslog-kfhtf                                   2/2     Running   0          18m
rsyslog-ncz4n                                   2/2     Running   0          18m
rsyslog-p7r7l                                   2/2     Running   0          18m

$ oc logs -n openshift-operators-redhat elasticsearch-operator-7b67d65659-fkwlv
{"level":"info","ts":1563937271.7598116,"logger":"cmd","msg":"Go Version: go1.11.6"}
{"level":"info","ts":1563937271.7598336,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1563937271.759837,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0"}
{"level":"info","ts":1563937271.760082,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1563937271.9012883,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1563937271.9076557,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1563937271.9942646,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1563937271.9945068,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1563937272.1010253,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1563937272.1010544,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1563937272.201281,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1563937272.3014593,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
time="2019-07-24T03:07:01Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-07-24T03:09:07Z" level=info msg="Timed out waiting for elasticsearch-cdm-do9awmuw-1 to rejoin cluster"
time="2019-07-24T03:09:37Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-cdm-do9awmuw-2:  / green"
time="2019-07-24T03:12:39Z" level=info msg="Timed out waiting for elasticsearch-cdm-do9awmuw-1 to rejoin cluster"

$ oc logs -c elasticsearch elasticsearch-cdm-do9awmuw-1-7f68678558-plggv
[2019-07-24 03:07:50,587][INFO ][container.run            ] Begin Elasticsearch startup script
[2019-07-24 03:07:50,590][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2019-07-24 03:07:50,591][INFO ][container.run            ] Inspecting the maximum RAM available...
[2019-07-24 03:07:50,593][INFO ][container.run            ] ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms2048m -Xmx2048m'
[2019-07-24 03:07:50,594][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2019-07-24 03:07:50,597][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2019-07-24 03:07:52,353][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2019-07-24 03:07:52,354][INFO ][container.run            ] ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'
[2019-07-24 03:07:52,354][INFO ][container.run            ] Checking if Elasticsearch is ready
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N

### LICENSE NOTICE Search Guard ###

If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)

* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging

In case of any doubt mail to <sales>
###################################

### LICENSE NOTICE Search Guard ###

If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)

* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging

In case of any doubt mail to <sales>
###################################
Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.
Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2019-07-24 03:13:38,335][ERROR][container.run            ] Timed out waiting for Elasticsearch to be ready
HTTP/1.1 503 Service Unavailable
content-type: application/json; charset=UTF-8
content-length: 331

The output of `oc exec elasticsearch-cdm-do9awmuw-1-7f68678558-plggv -- logs` is in the attachment.
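For what it's worth, the 503 that the readiness check keeps hitting can also be queried by hand from inside the container, for example (a sketch; the es_util helper being present in the image and the container name are assumptions):

$ oc exec -c elasticsearch elasticsearch-cdm-do9awmuw-1-7f68678558-plggv -- es_util --query=_cluster/health?pretty

While the node cannot reach its peers, this typically returns the same 503 or an unrecovered cluster status.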

Comment 8 Qiaoling Tang 2019-07-24 03:21:03 UTC
Created attachment 1593051 [details]
elasticsearch pod log

Comment 10 Qiaoling Tang 2019-07-24 05:43:28 UTC
Created attachment 1593055 [details]
Elasticsearch pod logs

Comment 11 ewolinet 2019-07-24 16:17:07 UTC
I'm able to reproduce this --

The EO attempts a rolling restart, and after one ES node is restarted it is no longer able to communicate with the rest of the cluster.
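One way to see the split is to list the nodes each pod currently knows about, e.g. (a sketch; the label selector, container name, and the es_util helper in the image are assumptions):

$ for p in $(oc -n openshift-logging get pods -l component=elasticsearch -o name); do
    echo "== $p"; oc -n openshift-logging exec -c elasticsearch "$p" -- es_util --query=_cat/nodes?v; done

The restarted node (new CA) and the not-yet-restarted nodes (old CA) reject each other's transport TLS connections, so each side reports an incomplete node list. The linked PR 178 ("Do a full cluster restart in case of cert redeploy") appears to address this by restarting the whole cluster when certificates are redeployed instead of rolling one node at a time.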

Comment 13 Anping Li 2019-10-11 09:50:45 UTC
Verified in 4.2.0-0.nightly-2019-10-10-225709. Cluster logging is blocked by an OLM issue in 4.3.

Comment 16 errata-xmlrpc 2019-10-30 04:44:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3151

