Description of problem:
Deploy the logging stack via the operator and make sure it works well. Then scale the CLO down to 0, delete the master-certs secret, and scale the CLO back up to 1. After this, the master-certs secret is regenerated and the CAs in curator, fluentd, kibana and elasticsearch are changed, but fluentd and curator cannot connect to ES.

$ oc get pod
NAME                                                  READY   STATUS    RESTARTS   AGE
cluster-logging-operator-65458bf7d7-hzv2d             1/1     Running   0          7m
curator-1547520600-cjn7m                              0/1     Error     0          4m
elasticsearch-clientdatamaster-0-1-84d764899d-ckx9l   1/1     Running   0          14m
elasticsearch-clientdatamaster-0-2-56984bb76c-lsslp   1/1     Running   0          14m
elasticsearch-clientdatamaster-0-3-7cd67f75dd-6dxbp   1/1     Running   0          14m
elasticsearch-operator-86cc7cb548-599hl               1/1     Running   0          15m
fluentd-46cpb                                         1/1     Running   0          14m
fluentd-6d59t                                         1/1     Running   0          14m
fluentd-hlgrp                                         1/1     Running   0          14m
fluentd-p229t                                         1/1     Running   0          14m
fluentd-pwcc7                                         1/1     Running   0          14m
fluentd-skmzd                                         1/1     Running   0          14m
kibana-675b587dfd-jjpgt                               2/2     Running   0          14m

Pod logs:
$ oc logs curator-1547520600-cjn7m
Was not able to connect to Elasticearch at elasticsearch:9200 within 60 attempts

$ oc exec fluentd-46cpb -- logs
2019-01-15 02:44:00 +0000 [warn]: Could not connect Elasticsearch or obtain version. Assuming Elasticsearch 5.
2019-01-15 02:44:00 +0000 [warn]: To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
2019-01-15 02:44:03 +0000 [warn]: [elasticsearch-apps] Could not connect Elasticsearch or obtain version. Assuming Elasticsearch 5.
2019-01-15 02:44:03 +0000 [warn]: [elasticsearch-apps] To prevent events traffic jam, you should specify 2 or more 'flush_thread_count'.
2019-01-15 02:44:06 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-01-15 02:44:07 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)"
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:645:in `rescue in send_bulk'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:627:in `send_bulk'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:534:in `block in write'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `each'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-3.0.1/lib/fluent/plugin/out_elasticsearch.rb:533:in `write'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:1123:in `try_flush'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:1423:in `flush_thread_run'
2019-01-15 02:44:06 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:452:in `block (2 levels) in start'
2019-01-15 02:44:06 +0000 [warn]:
/opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.3.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create' 2019-01-15 02:44:09 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-01-15 02:44:10 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:44:09 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:44:12 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-01-15 02:44:14 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:44:12 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:44:17 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-01-15 02:44:21 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:44:17 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:44:24 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-01-15 02:44:33 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:44:24 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:44:36 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-01-15 02:44:53 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:44:36 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:44:55 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-01-15 02:45:26 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:44:55 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:45:29 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. 
retry_time=7 next_retry_seconds=2019-01-15 02:46:37 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:45:29 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:46:44 +0000 [warn]: [elasticsearch-apps] retry succeeded. chunk_id="57f7541be8fd43fa1e6fbfda7c53d328" [core@ip-10-0-45-39 ~]$ oc exec fluentd-46cpb -- logs -f 2019-01-15 02:44:55 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:45:29 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-01-15 02:46:37 +0000 chunk="57f7541be8fd43fa1e6fbfda7c53d328" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): No route to host - connect(2) for 172.30.85.4:9200 (Errno::EHOSTUNREACH)" 2019-01-15 02:45:29 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:46:44 +0000 [warn]: [elasticsearch-apps] retry succeeded. chunk_id="57f7541be8fd43fa1e6fbfda7c53d328" 2019-01-15 02:52:13 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-01-15 02:52:14 +0000 chunk="57f763f8d79c95f5e6848bdfebb363f6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n `Excon.defaults[:ssl_ca_path] = path_to_certs`\n `ENV['SSL_CERT_DIR'] = path_to_certs`\n `Excon.defaults[:ssl_ca_file] = path_to_file`\n `ENV['SSL_CERT_FILE'] = path_to_file`\n `Excon.defaults[:ssl_verify_callback] = callback`\n (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n" 2019-01-15 02:52:13 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:52:14 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-01-15 02:52:15 +0000 chunk="57f763f8d79c95f5e6848bdfebb363f6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. 
Excon has certificates bundled, but these can be customized:\n\n `Excon.defaults[:ssl_ca_path] = path_to_certs`\n `ENV['SSL_CERT_DIR'] = path_to_certs`\n `Excon.defaults[:ssl_ca_file] = path_to_file`\n `ENV['SSL_CERT_FILE'] = path_to_file`\n `Excon.defaults[:ssl_verify_callback] = callback`\n (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n" 2019-01-15 02:52:14 +0000 [warn]: suppressed same stacktrace 2019-01-15 02:52:15 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-01-15 02:52:17 +0000 chunk="57f763f8d79c95f5e6848bdfebb363f6" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n `Excon.defaults[:ssl_ca_path] = path_to_certs`\n `ENV['SSL_CERT_DIR'] = path_to_certs`\n `Excon.defaults[:ssl_ca_file] = path_to_file`\n `ENV['SSL_CERT_FILE'] = path_to_file`\n `Excon.defaults[:ssl_verify_callback] = callback`\n (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n" $ oc exec elasticsearch-clientdatamaster-0-1-84d764899d-ckx9l -- logs [2019-01-15T02:45:01,339][INFO ][c.f.s.c.IndexBaseConfigurationRepository] .searchguard index does not exist yet, so no need to load config on node startup. Use sgadmin to initialize cluster [2019-01-15T02:52:01,738][ERROR][c.f.s.h.SearchGuardHttpServerTransport] [elasticsearch-clientdatamaster-0-1] SSL Problem Received fatal alert: unknown_ca javax.net.ssl.SSLException: Received fatal alert: unknown_ca at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1647) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1615) ~[?:?] at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1781) ~[?:?] at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1070) ~[?:?] at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:896) ~[?:?] at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766) ~[?:?] 
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_191] at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:255) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1162) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.13.Final.jar:4.1.13.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191] Version-Release number of selected component (if applicable): $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.0.0-0.alpha-2019-01-14-202030 True False 2h Cluster version is 4.0.0-0.alpha-2019-01-14-202030 $ oc get pod cluster-logging-operator-65458bf7d7-hzv2d -o yaml |grep image image: quay.io/openshift/origin-cluster-logging-operator:latest imagePullPolicy: IfNotPresent imagePullSecrets: image: quay.io/openshift/origin-cluster-logging-operator:latest imageID: quay.io/openshift/origin-cluster-logging-operator@sha256:d86673069b90956945d70ba60754635706e9bb30fcc252c4c913f9805588807c How reproducible: Always Steps to Reproduce: 1. Deploy logging, make sure logging stack works well 2. scale down CLO `oc scale deploy cluster-logging-operator --replicas=0` 3. 
delete master-certs secret: `oc delete secret master-certs`
4. scale up CLO: `oc scale deploy cluster-logging-operator --replicas=1`
5. check the pod logs (a consolidated shell sketch of these steps follows below)

Actual results:

Expected results:

Additional info:
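A minimal shell sketch of the reproduction steps above, for convenience. The openshift-logging namespace is an assumption (the report does not name the namespace), and the <curator-pod>/<fluentd-pod> placeholders are hypothetical; substitute the names from `oc get pods` on your cluster.

# 1. deploy logging and confirm the stack is healthy, then:
oc -n openshift-logging scale deploy cluster-logging-operator --replicas=0   # 2. scale down CLO
oc -n openshift-logging delete secret master-certs                           # 3. delete master-certs
oc -n openshift-logging scale deploy cluster-logging-operator --replicas=1   # 4. scale up CLO
# 5. wait for the operator to regenerate master-certs, then check the pod logs:
oc -n openshift-logging get pods
oc -n openshift-logging logs <curator-pod>
oc -n openshift-logging logs <fluentd-pod>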
This should be resolved by https://github.com/openshift/elasticsearch-operator/pull/80
This issue can still be reproduced with the latest EO. The logs are the same as those in the description.
Pushing out to 4.2. Currently this will only happen if someone is purposely trying to force CA cert rotation, and the workaround is to delete the elasticsearch pods so they are rescheduled and read in the newly mounted certs at startup (a sketch of the workaround follows).
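A sketch of that workaround, assuming the openshift-logging namespace and the cluster-name=elasticsearch label on the ES pods (verify the actual label with `oc get pods --show-labels` before deleting anything):

# Delete the Elasticsearch pods; their deployments recreate them, and the new pods
# read the regenerated certificates from the secret mounts at startup.
oc -n openshift-logging delete pod -l cluster-name=elasticsearch
# Watch until the cluster is back and fluentd/curator reconnect.
oc -n openshift-logging get pods -w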
The elasticsearch container couldn't start after the secret was refreshed.

Testing images:
ose-elasticsearch-operator-v4.2.0-201907231419
ose-logging-elasticsearch5-v4.2.0-201907222219

$ oc get pod
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-7868cb99dc-k6jzl       1/1     Running   0          13m
curator-1563937800-kv7m5                        1/1     Running   0          9m44s
elasticsearch-cdm-do9awmuw-1-7f68678558-plggv   1/2     Running   0          12m
elasticsearch-cdm-do9awmuw-2-d4d4c5977-kpt76    1/2     Running   0          17m
kibana-5cbd5cc9c9-5rt5b                         2/2     Running   0          18m
rsyslog-2np4s                                   2/2     Running   0          18m
rsyslog-48tfn                                   2/2     Running   0          18m
rsyslog-78s8r                                   2/2     Running   0          18m
rsyslog-kfhtf                                   2/2     Running   0          18m
rsyslog-ncz4n                                   2/2     Running   0          18m
rsyslog-p7r7l                                   2/2     Running   0          18m

$ oc logs -n openshift-operators-redhat elasticsearch-operator-7b67d65659-fkwlv
{"level":"info","ts":1563937271.7598116,"logger":"cmd","msg":"Go Version: go1.11.6"}
{"level":"info","ts":1563937271.7598336,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1563937271.759837,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0"}
{"level":"info","ts":1563937271.760082,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1563937271.9012883,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1563937271.9076557,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1563937271.9942646,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1563937271.9945068,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1563937272.1010253,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1563937272.1010544,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1563937272.201281,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1563937272.3014593,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
time="2019-07-24T03:07:01Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-07-24T03:09:07Z" level=info msg="Timed out waiting for elasticsearch-cdm-do9awmuw-1 to rejoin cluster"
time="2019-07-24T03:09:37Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-cdm-do9awmuw-2: / green"
time="2019-07-24T03:12:39Z" level=info msg="Timed out waiting for elasticsearch-cdm-do9awmuw-1 to rejoin cluster"

$ oc logs -c elasticsearch elasticsearch-cdm-do9awmuw-1-7f68678558-plggv
[2019-07-24 03:07:50,587][INFO ][container.run ] Begin Elasticsearch startup script
[2019-07-24 03:07:50,590][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2019-07-24 03:07:50,591][INFO ][container.run ] Inspecting the maximum RAM available...
[2019-07-24 03:07:50,593][INFO ][container.run ] ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms2048m -Xmx2048m' [2019-07-24 03:07:50,594][INFO ][container.run ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret [2019-07-24 03:07:50,597][INFO ][container.run ] Building required jks files and truststore Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12". Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12". Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. 
It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12". Certificate was added to keystore Certificate was added to keystore [2019-07-24 03:07:52,353][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof [2019-07-24 03:07:52,354][INFO ][container.run ] ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled' [2019-07-24 03:07:52,354][INFO ][container.run ] Checking if Elasticsearch is ready OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N ### LICENSE NOTICE Search Guard ### If you use one or more of the following features in production make sure you have a valid Search Guard license (See https://floragunn.com/searchguard-validate-license) * Kibana Multitenancy * LDAP authentication/authorization * Active Directory authentication/authorization * REST Management API * JSON Web Token (JWT) authentication/authorization * Kerberos authentication/authorization * Document- and Fieldlevel Security (DLS/FLS) * Auditlogging In case of any doubt mail to <sales> ################################### ### LICENSE NOTICE Search Guard ### If you use one or more of the following features in production make sure you have a valid Search Guard license (See https://floragunn.com/searchguard-validate-license) * Kibana Multitenancy * LDAP authentication/authorization * Active Directory authentication/authorization * REST Management API * JSON Web Token (JWT) authentication/authorization * Kerberos authentication/authorization * Document- and Fieldlevel Security (DLS/FLS) * Auditlogging In case of any doubt mail to <sales> ################################### Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation. Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [2019-07-24 03:13:38,335][ERROR][container.run ] Timed out waiting for Elasticsearch to be ready HTTP/1.1 503 Service Unavailable content-type: application/json; charset=UTF-8 content-length: 331 The output of `oc exec elasticsearch-cdm-do9awmuw-1-7f68678558-plggv -- logs` is in the attachment.
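One way to check whether the node that fails the readiness probe can still be reached with the regenerated admin certs is to query cluster health from inside the container. This is a sketch only: the /etc/elasticsearch/secret/ file names and the presence of curl in the image are assumptions based on the default logging Elasticsearch image.

oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-do9awmuw-1-7f68678558-plggv -- \
  curl -s --cacert /etc/elasticsearch/secret/admin-ca \
       --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       https://localhost:9200/_cluster/health?pretty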
Created attachment 1593051: elasticsearch pod log
Created attachment 1593055: Elasticsearch pod logs
I'm able to reproduce this: the EO attempts a rolling restart, and after one ES node is restarted it is no longer able to communicate with the rest of the cluster (a sketch for checking the mounted CAs follows).
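A sketch for checking that: compare the admin-ca currently stored in the elasticsearch secret with the copy each running pod has mounted. The secret name, the component=elasticsearch label, the mount path, and the availability of openssl inside the image are assumptions; adjust to your deployment.

# Fingerprint of the CA currently in the secret.
oc -n openshift-logging get secret elasticsearch -o jsonpath='{.data.admin-ca}' \
  | base64 -d | openssl x509 -noout -fingerprint -sha256

# Fingerprint of the CA each pod actually has on disk.
for p in $(oc -n openshift-logging get pods -l component=elasticsearch -o name); do
  echo "== ${p}"
  oc -n openshift-logging exec -c elasticsearch "${p#pod/}" -- \
    openssl x509 -noout -fingerprint -sha256 -in /etc/elasticsearch/secret/admin-ca
done

Pods whose mounted admin-ca fingerprint differs from the one in the secret are still running with the pre-rotation CA, which would explain why the restarted node cannot complete TLS handshakes with them.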
Verified in 4.2.0-0.nightly-2019-10-10-225709. Cluster logging verification in 4.3 is currently blocked by an OLM issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3151