Created attachment 1423905 [details]
"Application is not available" in kibana UI

Description of problem:
With the ops cluster enabled, deploy logging 3.4 first, then upgrade to logging 3.5. An SSL error in the ES pod makes the kibana service unavailable, and the kibana UI shows "Application is not available". The kibana-ops service is normal, and it is possible to log in to the kibana-ops UI. This issue only happens on a cluster with ops enabled; it does not happen on a cluster with ops disabled.

[2018-04-19 03:15:28,966][ERROR][com.floragunn.searchguard.http.SearchGuardHttpServerTransport] [logging-es-jg4ew0r5] SSL Problem General SSLEngine problem
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
        at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1529)
        at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
        at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1219)
        at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:330)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
        at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1979)
        at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:237)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
        at sun.security.ssl.Handshaker$1.run(Handshaker.java:992)
        at sun.security.ssl.Handshaker$1.run(Handshaker.java:989)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467)
        at org.jboss.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1393)
        at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1256)
        ... 18 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279)
        at sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(X509TrustManagerImpl.java:130)
        at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1966)
        ... 26 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
        at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
        ... 32 more

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.5.5.31.67
kubernetes v1.5.2+43a9be4
etcd 3.1.0

logging-deployer/images/v3.4.1.44.53-3
logging-kibana/images/v3.4.1.44.53-2
logging-elasticsearch/images/v3.4.1.44.53-2
logging-curator/images/v3.4.1.44.53-4
logging-fluentd/images/v3.4.1.44.53-2
logging-auth-proxy/images/v3.4.1.44.53-2

logging-elasticsearch/images/v3.5.5.31.67-2
logging-curator/images/v3.5.5.31.67-4
logging-fluentd/images/v3.5.5.31.67-2
logging-kibana/images/v3.5.5.31.67-2
logging-auth-proxy/images/v3.5.5.31.67-2

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.4 with the ops cluster enabled
2. Upgrade logging from 3.4 to 3.5 and log in to the kibana and kibana-ops UIs
3.

Actual results:
The kibana service is not available, but the kibana-ops service is normal

Expected results:
The kibana service is available

Additional info:
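For anyone triaging a similar trace: the stack above passes through X509TrustManagerImpl.checkClientTrusted, which in JSSE is the server-side check of a certificate presented by a client. That suggests the ES pod could not build a trust path for a client certificate; which client it was (kibana is a guess based on the symptom) is an assumption, not something the log states. The sketch below is a hypothetical helper, not part of any logging tooling; the name summarize_ssl_trace and the regular expression are illustrative only. It pulls the innermost "Caused by" entry out of a Java stack trace and reports which side rejected the certificate. It assumes exception messages do not themselves contain the word " at ".

```python
import re

# Hypothetical pattern: captures the exception class and message of each
# "Caused by:" entry. The message is assumed to end at the next " at "
# frame marker or at a line break.
CAUSED_BY = re.compile(r"Caused by: ([\w.$]+): (.*?)(?=\s+at\s|\n|$)")


def summarize_ssl_trace(trace):
    """Return the innermost cause and the side that rejected the certificate."""
    causes = CAUSED_BY.findall(trace)
    root_cause = causes[-1] if causes else None

    # checkClientTrusted runs on the TLS server, checkServerTrusted on the client.
    if "checkClientTrusted" in trace:
        failing_side = "server rejected the client certificate"
    elif "checkServerTrusted" in trace:
        failing_side = "client rejected the server certificate"
    else:
        failing_side = "unknown"

    return {"root_cause": root_cause, "failing_side": failing_side}
```

Calling summarize_ssl_trace on the text of the ES pod log gives a two-field summary that can be compared between the 3.4 and 3.5 environments.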
Created attachment 1423906 [details] kibana-ops UI is normal
Created attachment 1423908 [details] logging 3.4 environment dump
Created attachment 1423909 [details] logging 3.5 environment dump
Junqi, do you see this in the current stack (e.g. 3.9 or better)? I'm inclined to close as 'WONTFIX', given that there is no attached customer case and we are anticipating moving away from the current mechanism for deploying an 'ops' cluster.
(In reply to Jeff Cantrill from comment #4)
> Junqi, Do you see this in the current stack (e.g. 3.9 or better?). I'm
> inclined to close as 'WONTFIX' given there is no attached customer case we
> are anticipating moving away from the current mechanism to deploying an
> 'ops' cluster

Upgrading from 3.9 to 3.10 does not have this issue, so closing as 'WONTFIX' is fine.