Bug 1369646

Summary: Encounter "503 Service Unavailable" while accessing Kibana OPS UI after upgrading from logging 3.2.0 to 3.3.0

| Product: | OpenShift Container Platform | Reporter: | Xia Zhao <xiazhao> |
|---|---|---|---|
| Component: | Logging | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED ERRATA | QA Contact: | chunchen <chunchen> |
| Severity: | medium | Priority: | medium |
| Version: | 3.3.0 | CC: | anli, aos-bugs, ewolinet, pweil, tdawson, wsun |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | No Doc Update | Type: | Bug |
| Last Closed: | 2016-09-27 09:45:39 UTC | Regression: | --- |
Description
Xia Zhao — 2016-08-24 05:31:38 UTC

Created attachment 1193472 [details]: Upgrade_pod_log

Created attachment 1193473 [details]: OPS UI Kibana screenshot, which is running fine

Created attachment 1193474 [details]: Non-OPS Kibana UI screenshot, which reproduces the bug
Luke Meyer (comment #14):

It is a cert problem, as Paul said; the problem seems to be that the logging-kibana-proxy and logging-kibana-ops-proxy secrets get different server certs (which is right) signed by different signers, which should not happen, as every cert in the deployment should have the same signer. So the routes should have been right (having the same CA for both), but the server cert on the kibana-ops instance is wrong. I need to figure out whether that's something new or we just never noticed it before we had an upgrade creating a reencrypt route.

Xia Zhao (in reply to Luke Meyer from comment #14):

Thanks for the info, Luke. I'll keep the test env in comment #12 until you're finished using it.

Luke Meyer:

The problem is that in OSE 3.2, the kibana and kibana-ops pods were created with separate secrets (though they had the same contents), while in 3.3 they are both created to use the same secret, logging-kibana-proxy. The logging-kibana-ops-proxy secret from the 3.2 installation is left unaltered by the upgrade, as is the kibana-ops DC secret volume mount, while all the other secrets are regenerated with a new signer. The routes are replaced with reencrypt routes looking for the new signer, so the kibana-ops cert isn't trusted. I need to fix the upgrade so that it deletes the old secret and patches the kibana-ops DC to look at the right one.
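The mismatched-signer symptom can be demonstrated locally without a cluster. A minimal sketch (the CA names and file names below are invented stand-ins for the 3.2-era and 3.3-era logging signers, not the real deployment artifacts):

```shell
# Minimal local sketch of the mismatched-signer problem (no cluster needed).
# "old-signer" and "new-signer" are invented stand-ins for the 3.2-era and
# 3.3-era logging signers.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout old-signer.key -out old-signer.crt \
  -subj "/CN=logging-signer-3.2" 2>/dev/null
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout new-signer.key -out new-signer.crt \
  -subj "/CN=logging-signer-3.3" 2>/dev/null

# The issuers differ, so a reencrypt route configured to trust only the new
# signer rejects a server cert still issued by the old one, which surfaces
# as a 503 from the router.
openssl x509 -noout -issuer -in old-signer.crt
openssl x509 -noout -issuer -in new-signer.crt
```

Against a live deployment, one could compare the issuers the same way by extracting the server cert from each proxy secret; the exact key name inside the secret varies by release, so treat `server-cert` here as an assumption: `oc get secret logging-kibana-ops-proxy -o 'jsonpath={.data.server-cert}' | base64 -d | openssl x509 -noout -issuer`.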
Xia Zhao:

Retested with the latest 3.3.0 logging images on brew. The kibana-ops pod did not start up successfully after the upgrade:

```
$ oc get po
NAME                              READY     STATUS              RESTARTS   AGE
logging-curator-1-2j802           1/1       Running             0          2h
logging-curator-ops-1-i219t       1/1       Running             0          2h
logging-deployer-zczlp            0/1       Error               0          2h
logging-es-60qdpasn-3-8grsp       1/1       Running             0          2h
logging-es-ops-pvesokep-3-cz3e1   1/1       Running             0          2h
logging-fluentd-seosv             1/1       Running             0          2h
logging-kibana-2-8068e            2/2       Running             0          2h
logging-kibana-ops-2-dapa2        0/2       ContainerCreating   0          2h
```

And the upgrade deployer pod failed with this error:

```
+++ oc get pod logging-kibana-ops-2-dapa2 -o 'jsonpath={.status.phase}'
++ [[ Running == \P\e\n\d\i\n\g ]]
+ sleep 1
+ (( i++ ))
+ (( i<=300 ))
+ eval '[[ "Running" == "$(oc get pod logging-kibana-ops-2-dapa2 -o jsonpath='\''{.status.phase}'\'')" ]]'
+++ oc get pod logging-kibana-ops-2-dapa2 -o 'jsonpath={.status.phase}'
++ [[ Running == \P\e\n\d\i\n\g ]]
+ sleep 1
logging-kibana-ops-2-dapa2 not started within 300 seconds
+ (( i++ ))
+ (( i<=300 ))
+ return 1
+ echo 'logging-kibana-ops-2-dapa2 not started within 300 seconds'
+ return 1
```

I will retry and update later.

Luke Meyer:

I think I need to redeploy the kibana-ops DC after modifying it; it's probably looking for a secret that no longer exists.

Built logging-deployer:3.3.0-9.

Commit pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/221beecd5920f3f76d3694623d46faa6f372366e

origin fix for bug 1369646: Make the upgrade set the correct secret volume on the logging-kibana-ops DC; in earlier versions, it got a separate-but-equal secret, but in the present versions both kibana DCs should use the same logging-kibana-proxy secret.
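The deployer trace above is a simple poll loop: check the pod phase once per second and give up after 300 iterations. A cluster-free sketch of that loop, with invented names; the real deployer evals `oc get pod <name> -o jsonpath='{.status.phase}'` and compares against `Running`:

```shell
# Simplified, cluster-free sketch of the deployer's wait loop. In the real
# script the polled command is `oc get pod <name> -o jsonpath='{.status.phase}'`
# and the wanted value is "Running"; the function name and arguments here
# are invented.
wait_for() {
  local want=$1 timeout=$2 i=0
  shift 2
  while [ "$i" -lt "$timeout" ]; do
    # Run the remaining arguments as the polled command.
    [ "$("$@")" = "$want" ] && return 0
    sleep 1
    i=$((i+1))
  done
  echo "condition not met within ${timeout} seconds" >&2
  return 1
}

# Succeeds immediately: the polled command already reports the wanted value.
wait_for Running 3 echo Running
```

When the polled command never reaches the wanted value, the function falls through after `timeout` seconds and returns 1, which is exactly the `not started within 300 seconds` failure in the trace above.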
Xia Zhao:

It's fixed with the latest images below:

```
brew-pulp-docker01...com:8888/openshift3/logging-deployer        3.3.0   de84ad1448af   11 hours ago   760.1 MB
brew-pulp-docker01...com:8888/openshift3/logging-kibana          3.3.0   ad2713df85a7   11 hours ago   266.9 MB
brew-pulp-docker01...com:8888/openshift3/logging-fluentd         3.3.0   74505c2dd791   12 hours ago   238.7 MB
brew-pulp-docker01...com:8888/openshift3/logging-elasticsearch   3.3.0   f204bea758eb   5 days ago     426 MB
brew-pulp-docker01...com:8888/openshift3/logging-auth-proxy      3.3.0   196ecb30fc93   3 weeks ago    229.2 MB
brew-pulp-docker01...com:8888/openshift3/logging-curator         3.3.0   2c88e1273c11
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933