Description of problem: Fluentd, Curator and Kibana are unable to connect to Elasticsearch, no other indices in ES pod except .searchguard, curator pod start failed. $ oc exec elasticsearch-clientdatamaster-0-1-84d764899d-wl4l5 -- indices Mon Dec 17 02:06:20 UTC 2018 health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open .searchguard rLx5fPSdTXCfIcX0vSQHrQ 1 0 5 0 0 0 $ oc logs curator-1545012660-cbrhd Was not able to connect to Elasticearch at elasticsearch:9200 within 60 attempts Fluentd pod logs: 2018-12-17 01:41:51 +0000 [warn]: [elasticsearch-apps] failed to flush the buffer. retry_time=8 next_retry_seconds=2018-12-17 01:43:53 +0000 chunk="57d2dd72fde8b7ccb8167bdbc2835b05" error_class=Faraday::SSLError error="SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) Unable to verify certificate. This may be an issue with the remote host or with Excon. Excon has certificates bundled, but these can be customized:\n\n `Excon.defaults[:ssl_ca_path] = path_to_certs`\n `ENV['SSL_CERT_DIR'] = path_to_certs`\n `Excon.defaults[:ssl_ca_file] = path_to_file`\n `ENV['SSL_CERT_FILE'] = path_to_file`\n `Excon.defaults[:ssl_verify_callback] = callback`\n (see OpenSSL::SSL::SSLContext#verify_callback)\nor:\n `Excon.defaults[:ssl_verify_peer] = false` (less secure).\n" 2018-12-17 01:41:51 +0000 [warn]: suppressed same stacktrace Kibana logs: $ oc logs -c kibana kibana-675b587dfd-prf28 {"type":"log","@timestamp":"2018-12-17T01:39:11Z","tags":["warning","elasticsearch","admin"],"pid":222,"message":"Unable to revive connection: https://elasticsearch:9200/"} {"type":"log","@timestamp":"2018-12-17T01:39:11Z","tags":["warning","elasticsearch","admin"],"pid":222,"message":"No living connections"} {"type":"log","@timestamp":"2018-12-17T01:39:11Z","tags":["status","plugin:elasticsearch.13","error"],"pid":222,"state":"red","message":"Status changed from red to red - Unable to connect to Elasticsearch at https://elasticsearch:9200.","prevState":"red","prevMsg":"Request Timeout after 3000ms"} {"type":"log","@timestamp":"2018-12-17T01:39:14Z","tags":["warning","elasticsearch","admin"],"pid":222,"message":"Unable to revive connection: https://elasticsearch:9200/"} {"type":"log","@timestamp":"2018-12-17T01:39:14Z","tags":["warning","elasticsearch","admin"],"pid":222,"message":"No living connections"} {"type":"response","@timestamp":"2018-12-17T01:39:15Z","tags":[],"pid":222,"method":"head","statusCode":200,"req":{"url":"/","method":"head","headers":{"user-agent":"curl/7.29.0","host":"localhost:5601","accept":"*/*"},"remoteAddress":"127.0.0.1","userAgent":"127.0.0.1"},"res":{"statusCode":200,"responseTime":6,"contentLength":9},"message":"HEAD / 200 6ms - 9.0B"} ES pod logs: [2018-12-17T01:57:49,792][ERROR][c.f.s.h.SearchGuardHttpServerTransport] [elasticsearch-clientdatamaster-0-1] SSL Problem General SSLEngine problem javax.net.ssl.SSLHandshakeException: General SSLEngine problem at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1521) ~[?:?] at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:528) ~[?:?] at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:802) ~[?:?] at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766) ~[?:?] at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_191] at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:255) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1162) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-codec-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-transport-4.1.13.Final.jar:4.1.13.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.13.Final.jar:4.1.13.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191] Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1709) ~[?:?] at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:318) ~[?:?] at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) ~[?:?] at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1983) ~[?:?] at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:232) ~[?:?] at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) ~[?:?] at sun.security.ssl.Handshaker$1.run(Handshaker.java:970) ~[?:?] at sun.security.ssl.Handshaker$1.run(Handshaker.java:967) ~[?:?] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191] at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1459) ~[?:?] at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1301) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1214) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] ... 18 more Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397) ~[?:?] at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302) ~[?:?] at sun.security.validator.Validator.validate(Validator.java:262) ~[?:?] at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:?] at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279) ~[?:?] at sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(X509TrustManagerImpl.java:130) ~[?:?] at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1970) ~[?:?] at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:232) ~[?:?] at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) ~[?:?] at sun.security.ssl.Handshaker$1.run(Handshaker.java:970) ~[?:?] at sun.security.ssl.Handshaker$1.run(Handshaker.java:967) ~[?:?] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191] at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1459) ~[?:?] at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1301) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1214) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] ... 18 more Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[?:?] at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[?:?] at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[?:1.8.0_191] at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392) ~[?:?] at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302) ~[?:?] at sun.security.validator.Validator.validate(Validator.java:262) ~[?:?] at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:?] at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279) ~[?:?] at sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(X509TrustManagerImpl.java:130) ~[?:?] at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1970) ~[?:?] at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:232) ~[?:?] at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) ~[?:?] at sun.security.ssl.Handshaker$1.run(Handshaker.java:970) ~[?:?] at sun.security.ssl.Handshaker$1.run(Handshaker.java:967) ~[?:?] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191] at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1459) ~[?:?] at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1301) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1214) ~[netty-handler-4.1.13.Final.jar:4.1.13.Final] ... 18 more Version-Release number of selected component (if applicable): # ./bin/openshift-install version ./bin/openshift-install v0.7.0-master-4-ga4e426ee762c20019bbb90fe35d33c9b26d23393 How reproducible: Always Steps to Reproduce: 1. Deploy logging EFK stack according to https://github.com/openshift/cluster-logging-operator#full-deploy 2. 3. Actual results: Expected results: Additional info: The test env is deployed by NextGen installer. $ oc exec -c kibana kibana-675b587dfd-prf28 -- curl -s -k --cert /etc/kibana/keys/cert --key /etc/kibana/keys/key https://elasticsearch:9200 -v * About to connect() to elasticsearch port 9200 (#0) * Trying 172.30.63.185... * Connected to elasticsearch (172.30.63.185) port 9200 (#0) * Initializing NSS with certpath: sql:/etc/pki/nssdb * skipping SSL peer certificate verification * NSS: client certificate from file * subject: CN=system.logging.kibana,OU=OpenShift,O=Logging * start date: Dec 17 01:37:40 2018 GMT * expire date: Dec 16 01:37:40 2020 GMT * common name: system.logging.kibana * issuer: CN=openshift-cluster-logging-signer * NSS error -12224 (SSL_ERROR_CERTIFICATE_UNKNOWN_ALERT) * SSL peer had some unspecified issue with the certificate it received. * Closing connection 0 command terminated with exit code 35
You may have been in between commits. Deployed a cluster today followed by logging using: Installer: $ ./bin/openshift-install version ./bin/openshift-install v0.7.0-master cluster-logging-operator: HEAD - 741444c81037b9a4ce26011ef6361dd9b28a9161 using the latest logging images as of 2018-dec-19
Tested with: # ./bin/openshift-install version ./bin/openshift-install v0.7.0-master-50-gded0735f92077f85de3d6a4dbc69b471b6683c3d $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.0.0-0.alpha-2018-12-20-000118 True False 34m Cluster version is 4.0.0-0.alpha-2018-12-20-000118 The logging EFK works.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758