Bug 1569338 - [Upgrade] Upgrade logging from 3.4 to 3.5 with enabled ops cluster, SSL error in ES pod caused kibana service is not available
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.5.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-19 05:31 UTC by Junqi Zhao
Modified: 2018-05-17 17:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-17 17:28:50 UTC
Target Upstream Version:


Attachments
"Application is not available" in kibana UI (52.49 KB, image/png)
2018-04-19 05:31 UTC, Junqi Zhao
kibana-ops UI is normal (260.03 KB, image/png)
2018-04-19 05:32 UTC, Junqi Zhao
logging 3.4 environment dump (32.14 KB, application/x-gzip)
2018-04-19 05:32 UTC, Junqi Zhao
logging 3.5 environment dump (55.94 KB, application/x-gzip)
2018-04-19 05:33 UTC, Junqi Zhao

Description Junqi Zhao 2018-04-19 05:31:10 UTC
Created attachment 1423905 [details]
"Application is not available" in kibana UI

Description of problem:
With the ops cluster enabled, deploy logging 3.4 first, then upgrade to logging 3.5.
After the upgrade, an SSL error in the ES pod makes the kibana service unavailable: the kibana UI shows "Application is not available". The kibana-ops service is unaffected, and logging in to the kibana-ops UI works.

This issue happens only when the ops cluster is enabled; it does not occur when the ops cluster is disabled.

[2018-04-19 03:15:28,966][ERROR][com.floragunn.searchguard.http.SearchGuardHttpServerTransport] [logging-es-jg4ew0r5] SSL Problem General SSLEngine problem
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1529)
	at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
	at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
	at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1219)
	at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:330)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
	at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1979)
	at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:237)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:992)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:989)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467)
	at org.jboss.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1393)
	at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1256)
	... 18 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
	at sun.security.validator.Validator.validate(Validator.java:260)
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279)
	at sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(X509TrustManagerImpl.java:130)
	at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1966)
	... 26 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
	... 32 more
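
Reading the trace above: the innermost "Caused by" is the PKIX path building failure, and it is raised from ServerHandshaker.clientCertificate via checkClientTrusted, so the ES side is rejecting a certificate presented by a client. Which client that is (presumably kibana) is an inference, not something the log states. As a minimal sketch for pulling the cause chain out of a long trace like this one, assuming the log text is available as a string, something along these lines could be used. The function names and the abbreviated SAMPLE are illustrative and are not part of any OpenShift tooling.

```python
import re
from typing import List, Optional, Tuple

# Matches the header line of each nested exception in a Java stack trace,
# e.g. "Caused by: some.pkg.SomeException: message".
CAUSE_RE = re.compile(r"^Caused by: ([\w.$]+): (.*)$")


def cause_chain(trace: str) -> List[Tuple[str, str]]:
    """Return (exception class, message) for every 'Caused by:' line, in order."""
    chain = []
    for line in trace.splitlines():
        match = CAUSE_RE.match(line.strip())
        if match:
            chain.append((match.group(1), match.group(2)))
    return chain


def root_cause(trace: str) -> Optional[Tuple[str, str]]:
    """Return the innermost cause, or None if the trace has no 'Caused by:' lines."""
    chain = cause_chain(trace)
    return chain[-1] if chain else None


# Abbreviated copy of the trace in this report (stack frame lines omitted).
SAMPLE = """\
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
"""

if __name__ == "__main__":
    print(root_cause(SAMPLE))
```

The sketch only inspects "Caused by:" header lines and ignores stack frames, so it does not identify which certificate or truststore is involved; that still has to be checked against the deployed secrets.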

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.5.5.31.67
kubernetes v1.5.2+43a9be4
etcd 3.1.0


logging-deployer/images/v3.4.1.44.53-3
logging-kibana/images/v3.4.1.44.53-2
logging-elasticsearch/images/v3.4.1.44.53-2
logging-curator/images/v3.4.1.44.53-4
logging-fluentd/images/v3.4.1.44.53-2
logging-auth-proxy/images/v3.4.1.44.53-2

logging-elasticsearch/images/v3.5.5.31.67-2
logging-curator/images/v3.5.5.31.67-4
logging-fluentd/images/v3.5.5.31.67-2
logging-kibana/images/v3.5.5.31.67-2
logging-auth-proxy/images/v3.5.5.31.67-2

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.4 with the ops cluster enabled.
2. Upgrade logging from 3.4 to 3.5, then log in to the kibana and kibana-ops UIs.

Actual results:
The kibana service is not available, but the kibana-ops service works normally.

Expected results:
The kibana service is available.

Additional info:

Comment 1 Junqi Zhao 2018-04-19 05:32:15 UTC
Created attachment 1423906 [details]
kibana-ops UI is normal

Comment 2 Junqi Zhao 2018-04-19 05:32:50 UTC
Created attachment 1423908 [details]
logging 3.4 environment dump

Comment 3 Junqi Zhao 2018-04-19 05:33:19 UTC
Created attachment 1423909 [details]
logging 3.5 environment dump

Comment 4 Jeff Cantrill 2018-05-14 19:22:45 UTC
Junqi, do you see this in the current stack (e.g. 3.9 or later)? I'm inclined to close this as 'WONTFIX', given that there is no attached customer case and we are anticipating moving away from the current mechanism for deploying an 'ops' cluster.

Comment 5 Junqi Zhao 2018-05-17 06:15:37 UTC
(In reply to Jeff Cantrill from comment #4)
> Junqi, do you see this in the current stack (e.g. 3.9 or later)? I'm
> inclined to close this as 'WONTFIX', given that there is no attached
> customer case and we are anticipating moving away from the current
> mechanism for deploying an 'ops' cluster.

Upgrading from 3.9 to 3.10 does not show this issue, so closing as 'WONTFIX' is fine.

