Bug 1369646 - Encounter "503 Service Unavailable" while accessing Kibana OPS UI after upgrading from logging 3.2.0 to 3.3.0
Encounter "503 Service Unavailable" while accessing Kibana OPS UI after upgra...
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Luke Meyer
QA Contact: chunchen
Depends On:
Blocks:
Reported: 2016-08-24 01:31 EDT by Xia Zhao
Modified: 2017-03-08 13 EST
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-27 05:45:39 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Upgrade_pod_log (177.12 KB, text/plain), 2016-08-24 01:41 EDT, Xia Zhao
OPS UI Kibana screenshot which is running fine (79.19 KB, image/png), 2016-08-24 01:41 EDT, Xia Zhao
Non-OPS Kibana UI screenshot where the bug reproduces (206.37 KB, image/png), 2016-08-24 01:42 EDT, Xia Zhao


External Trackers
Red Hat Product Errata RHBA-2016:1933 (Priority: normal, Status: SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.3 Release Advisory, last updated 2016-09-27 09:24:36 EDT

Description Xia Zhao 2016-08-24 01:31:38 EDT
Description of problem:
Upgraded logging from 3.2.0 to 3.3.0 with ENABLE_OPS_CLUSTER=true. After the upgrade pod finished successfully, accessing the Kibana OPS UI returns "503 Service Unavailable". The non-OPS UI works fine.

Version-Release number of selected component (if applicable):
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd            3.3.0               4d87d421e950        5 days ago          238.7 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-auth-proxy         3.3.0               196ecb30fc93        2 weeks ago         229.2 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch      3.3.0               e71d2b04669c        4 weeks ago         426.9 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer           3.3.0               1c127f4f36a0        4 weeks ago         747.9 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-curator            3.3.0               2c88e1273c11        4 weeks ago         253.8 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-kibana             3.3.0               32d276bb46ae        8 weeks ago 

How reproducible:
Always

Steps to Reproduce:

0. Deploy the 3.2.0 logging stack (with OPS cluster enabled):
IMAGE_PREFIX = brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/
IMAGE_VERSION = 3.2.0
ENABLE_OPS_CLUSTER=true

Make sure the EFK pods are running fine, and the Kibana and Kibana OPS UIs are accessible and functional.

1. Add yourself to cluster-admin
$ oadm policy add-cluster-role-to-user cluster-admin xiazhao@redhat.com

2. Delete the existing templates if they exist
$ oc delete template logging-deployer-account-template logging-deployer-template
Error from server: templates "logging-deployer-account-template" not found
Error from server: templates "logging-deployer-template" not found

3. Create the missing templates according to the doc https://github.com/openshift/origin-aggregated-logging/tree/master/deployer#create-missing-templates:

$ oc create -f https://raw.githubusercontent.com/openshift/origin-aggregated-logging/master/deployer/deployer.yaml
template "logging-deployer-account-template" created
template "logging-deployer-template" created

Modify the deployer template to use the new image name for the 3.3.0 deployer:
$ oc edit template logging-deployer-template -o yaml
changed from
 image: ${IMAGE_PREFIX}logging-deployment:${IMAGE_VERSION}
to
 image: ${IMAGE_PREFIX}logging-deployer:${IMAGE_VERSION}
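The same change can also be made non-interactively; a minimal sketch of an equivalent edit (assuming the old image name appears only in that one field of the template):

$ oc get template logging-deployer-template -o yaml \
    | sed 's/logging-deployment:/logging-deployer:/' \
    | oc replace -f -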

4. Create the SA and permissions according to the doc https://github.com/openshift/origin-aggregated-logging/tree/master/deployer#create-supporting-serviceaccount-and-permissions:

$ oc new-app logging-deployer-account-template
--> Deploying template logging-deployer-account-template for "logging-deployer-account-template"
--> Creating resources ...
    error: serviceaccounts "logging-deployer" already exists
    error: serviceaccounts "aggregated-logging-kibana" already exists
    error: serviceaccounts "aggregated-logging-elasticsearch" already exists
    error: serviceaccounts "aggregated-logging-fluentd" already exists
    serviceaccount "aggregated-logging-curator" created
    clusterrole "oauth-editor" created
    clusterrole "daemonset-admin" created
    rolebinding "logging-deployer-edit-role" created
    rolebinding "logging-deployer-dsadmin-role" created

$ oc policy add-role-to-user edit --serviceaccount logging-deployer
$ oc policy add-role-to-user daemonset-admin --serviceaccount logging-deployer
$ oadm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer

5. Run the logging deployer with MODE=upgrade, IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ and IMAGE_VERSION=3.3.0:

$ oc process logging-deployer-template -v\
ENABLE_OPS_CLUSTER=true,\
IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,\
KIBANA_HOSTNAME=kibana.0822-is1.qe.rhcloud.com,\
KIBANA_OPS_HOSTNAME=kibana-ops.0822-is1.qe.rhcloud.com,\
PUBLIC_MASTER_URL=https://host-8-172-89.host.centralci.eng.rdu2.redhat.com:8443,\
ES_INSTANCE_RAM=1024M,\
ES_CLUSTER_SIZE=1,\
MODE=upgrade,\
IMAGE_VERSION=3.3.0,\
MASTER_URL=https://host-8-172-89.host.centralci.eng.rdu2.redhat.com:8443\
|oc create -f -

6. Check logging pods after upgrade:
# oc get po
NAME                              READY     STATUS             RESTARTS   AGE
logging-curator-1-9zs7s           1/1       Running            0          6m
logging-curator-ops-1-n6f8r       1/1       Running            0          6m
logging-deployer-tw2e2            0/1       Completed          0          8m
logging-es-be6nb8x3-3-0zh3g       1/1       Running            0          6m
logging-es-ops-ht5m08g3-3-vir1b   1/1       Running            0          6m
logging-fluentd-0grmx             1/1       Running            0          6m
logging-fluentd-krx4v             1/1       Running            0          6m
logging-kibana-2-eybgx            2/2       Running            0          5m
logging-kibana-ops-2-owgru        2/2       Running            0          5m

7. Visit Kibana and Kibana OPS UI
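The two routes can also be checked from the CLI; a minimal sketch, assuming the hostnames from step 5 and curl on the client (-k skips certificate verification):

$ curl -k -I https://kibana.0822-is1.qe.rhcloud.com/        # non-OPS UI, responds normally
$ curl -k -I https://kibana-ops.0822-is1.qe.rhcloud.com/    # OPS UI, returns the 503 described below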

Actual results:
Encounter "503 Service Unavailable" while accessing the Kibana OPS UI. The non-OPS UI worked fine.

Expected results:
The Kibana OPS UI should work fine after the upgrade.

Additional info: 
Screenshots attached
Upgrade pod log attached
Comment 1 Xia Zhao 2016-08-24 01:41 EDT
Created attachment 1193472 [details]
Upgrade_pod_log
Comment 2 Xia Zhao 2016-08-24 01:41 EDT
Created attachment 1193473 [details]
OPS UI Kibana screenshot which is running fine
Comment 3 Xia Zhao 2016-08-24 01:42 EDT
Created attachment 1193474 [details]
Non-OPS Kibana UI screenshot where the bug reproduces
Comment 14 Luke Meyer 2016-08-29 21:16:01 EDT
It is a cert problem as Paul said; the problem seems to be that logging-kibana-proxy and logging-kibana-ops-proxy secrets get different server certs (which is right) signed by different signers, which should not happen as every cert in the deployment should have the same signer. So, the routes should have been right (having the same CA for both), but the server cert on the kibana-ops instance is wrong. I need to figure out if that's something new or we just never noticed before we had an upgrade creating a reencrypt route.
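One way to compare the signers directly is to dump the issuer of each server cert; a minimal sketch, assuming both secrets keep the certificate under a server-cert key:

$ oc get secret logging-kibana-proxy -o 'jsonpath={.data.server-cert}' | base64 -d | openssl x509 -noout -issuer
$ oc get secret logging-kibana-ops-proxy -o 'jsonpath={.data.server-cert}' | base64 -d | openssl x509 -noout -issuer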
Comment 15 Xia Zhao 2016-08-29 21:25:57 EDT
(In reply to Luke Meyer from comment #14)
> It is a cert problem as Paul said; the problem seems to be that
> logging-kibana-proxy and logging-kibana-ops-proxy secrets get different
> server certs (which is right) signed by different signers, which should not
> happen as every cert in the deployment should have the same signer. So, the
> routes should have been right (having the same CA for both), but the server
> cert on the kibana-ops instance is wrong. I need to figure out if that's
> something new or we just never noticed before we had an upgrade creating a
> reencrypt route.

Thanks for the info, Luke. I'll keep the test env in comment #12 until you're finished using it.
Comment 16 Luke Meyer 2016-08-30 12:49:38 EDT
The problem is that in OSE 3.2, kibana and kibana-ops pods were created with separate secrets (though they had the same contents) and in 3.3 they are both created to use the same secret, logging-kibana-proxy. The logging-kibana-ops-proxy secret from the 3.2 installation is left unaltered by the upgrade, as is the kibana-ops DC secret volume mount, while all the other secrets are regenerated with a new signer. The routes are replaced with reencrypt routes looking for the new signer, so the kibana-ops cert isn't trusted.

I need to fix the upgrade so that it deletes the old secret and patches the kibana-ops DC to look at the right one.
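For an already-upgraded deployment, the manual equivalent of that fix would look roughly like the following sketch (the volume name kibana-proxy is an assumption about the kibana-ops DC):

$ # drop the stale 3.2-era secret
$ oc delete secret logging-kibana-ops-proxy
$ # point the kibana-ops proxy container at the shared secret
$ oc set volume dc/logging-kibana-ops --add --overwrite \
    --name=kibana-proxy --type=secret --secret-name=logging-kibana-proxy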
Comment 18 Xia Zhao 2016-08-31 08:56:40 EDT
Retested with the latest 3.3.0 logging images on brew. The kibana-ops pod did not start up successfully after the upgrade:

$ oc get po
NAME                              READY     STATUS              RESTARTS   AGE
logging-curator-1-2j802           1/1       Running             0          2h
logging-curator-ops-1-i219t       1/1       Running             0          2h
logging-deployer-zczlp            0/1       Error               0          2h
logging-es-60qdpasn-3-8grsp       1/1       Running             0          2h
logging-es-ops-pvesokep-3-cz3e1   1/1       Running             0          2h
logging-fluentd-seosv             1/1       Running             0          2h
logging-kibana-2-8068e            2/2       Running             0          2h
logging-kibana-ops-2-dapa2        0/2       ContainerCreating   0          2h


And the upgrade deployer pod failed with this error:

+++ oc get pod logging-kibana-ops-2-dapa2 -o 'jsonpath={.status.phase}'
++ [[ Running == \P\e\n\d\i\n\g ]]
+ sleep 1
+ (( i++  ))
+ (( i<=300 ))
+ eval '[[ "Running" == "$(oc get pod logging-kibana-ops-2-dapa2 -o jsonpath='\''{.status.phase}'\'')" ]]'
+++ oc get pod logging-kibana-ops-2-dapa2 -o 'jsonpath={.status.phase}'
++ [[ Running == \P\e\n\d\i\n\g ]]
+ sleep 1
logging-kibana-ops-2-dapa2 not started within 300 seconds
+ (( i++  ))
+ (( i<=300 ))
+ return 1
+ echo 'logging-kibana-ops-2-dapa2 not started within 300 seconds'
+ return 1


I will retry and update later.
Comment 19 Luke Meyer 2016-08-31 10:07:03 EDT
I think I need to redeploy the kibana-ops DC after modifying it; it's probably looking for a secret that no longer exists. Built logging-deployer:3.3.0-9.
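Manually, the DC can be rolled out again once it points at the right secret; a sketch with a 3.3-era client:

$ oc deploy logging-kibana-ops --latest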
Comment 20 openshift-github-bot 2016-08-31 17:35:10 EDT
Commit pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/221beecd5920f3f76d3694623d46faa6f372366e
origin fix for bug 1369646

Make the upgrade set the correct secret volume on the logging-kibana-ops
DC; in earlier versions, it got a separate-but-equal secret, but in the
present versions both kibana DCs should use the same
logging-kibana-proxy secret.
Comment 21 chunchen 2016-09-01 05:52:29 EDT
It's fixed with the latest images below:

brew-pulp-docker01...com:8888/openshift3/logging-deployer        3.3.0               de84ad1448af        11 hours ago        760.1 MB
brew-pulp-docker01...com:8888/openshift3/logging-kibana          3.3.0               ad2713df85a7        11 hours ago        266.9 MB
brew-pulp-docker01...com:8888/openshift3/logging-fluentd         3.3.0               74505c2dd791        12 hours ago        238.7 MB
brew-pulp-docker01...com:8888/openshift3/logging-elasticsearch   3.3.0               f204bea758eb        5 days ago          426 MB
brew-pulp-docker01...com:8888/openshift3/logging-auth-proxy      3.3.0               196ecb30fc93        3 weeks ago         229.2 MB
brew-pulp-docker01...com:8888/openshift3/logging-curator         3.3.0               2c88e1273c11
Comment 23 errata-xmlrpc 2016-09-27 05:45:39 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
