Bug 1648458

Summary: Upgrading from 3.10 to 3.11, Service Catalog does not check for 3.11 rollout success
Product: OpenShift Container Platform Reporter: Jay Boyd <jaboyd>
Component: Service Catalog Assignee: Jay Boyd <jaboyd>
Status: CLOSED ERRATA QA Contact: Jian Zhang <jiazha>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: chezhang, zitang
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Service Catalog installation does not wait for the rollout to complete. Consequence: While the rollout is in progress, the Service Catalog service may be unavailable. Sometimes this would happen during the installation of Ansible Service Broker and cause overall failure. Fix: After deploying Service Catalog, wait for the rollout to complete before continuing. Result:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-30 15:19:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jay Boyd 2018-11-09 18:51:32 UTC
I upgraded from 3.10 to 3.11. Correlating logs and pod status by timestamp, I see that Service Catalog installation succeeded, but in the task "Verify that the catalog api server is running" within openshift-ansible/roles/openshift_service_catalog/tasks/start.yml the check was actually done against the 3.10 pods. That is, the task checks the /healthz endpoint, but when it did, the OLD 3.10 pods were still running. We should ensure the DaemonSet rollout has completed before moving forward and checking health.


Perhaps add the following checks:
oc rollout status ds/apiserver -n kube-service-catalog

with an expected response of "daemon set "apiserver" successfully rolled out"

oc rollout status ds/controller-manager -n kube-service-catalog

with an expected response of "daemon set "controller-manager" successfully rolled out"
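
The two checks proposed above could be sketched as a small shell helper. This is only an illustration and not the change made in openshift-ansible: the function name `wait_for_rollout` and the overridable `OC` variable are hypothetical, while the `oc rollout status` invocation and the expected success line are the ones quoted above.

```shell
# OC is a hypothetical override so the helper can be exercised without a cluster.
OC="${OC:-oc}"

# wait_for_rollout NAME [NAMESPACE]
# Runs "oc rollout status" for the DaemonSet, which blocks until the rollout
# finishes, then succeeds only if the last output line is the expected
# success message.
wait_for_rollout() {
  local name ns out last_line
  name="$1"
  ns="${2:-kube-service-catalog}"
  # Assign separately from "local" so a failing oc call is not masked.
  out="$("$OC" rollout status "ds/${name}" -n "$ns")" || return 1
  last_line="$(printf '%s\n' "$out" | tail -n 1)"
  [ "$last_line" = "daemon set \"${name}\" successfully rolled out" ]
}
```

With a helper like this, the health check would only run after `wait_for_rollout apiserver && wait_for_rollout controller-manager` has returned success.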


I also advise checking the endpoints to be certain at least one pod is available.
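
A minimal sketch of such an endpoint check follows. It assumes the Endpoints object carries the same name as the DaemonSet (for example `apiserver`), which this report does not confirm; the function name `has_ready_endpoint` and the `OC` override are hypothetical.

```shell
# OC is a hypothetical override so the helper can be exercised without a cluster.
OC="${OC:-oc}"

# has_ready_endpoint NAME [NAMESPACE]
# Succeeds if the Endpoints object lists at least one ready address.
has_ready_endpoint() {
  local svc ns ips
  svc="$1"
  ns="${2:-kube-service-catalog}"
  # Ready pod IPs are listed under subsets[].addresses; pods that are not
  # ready appear under notReadyAddresses and are not matched here.
  ips="$("$OC" get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')" || return 1
  [ -n "$ips" ]
}
```

An empty result means no pod is currently backing the service, so a /healthz probe at that moment would not be meaningful.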

Comment 1 Jay Boyd 2018-11-14 14:18:11 UTC
pending fix in 3.11:  https://github.com/openshift/openshift-ansible/pull/10658

Comment 2 Zhang Cheng 2018-11-17 10:43:46 UTC
Set target release to 3.11.z

Comment 4 Jian Zhang 2019-01-17 12:56:44 UTC
LGTM, verified. Details below:

1. Install OCP 3.10, as below:
[root@ip-172-18-3-150 ~]# oc version
oc v3.10.101
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-3-150.ec2.internal:8443
openshift v3.10.101
kubernetes v1.10.0+b81c8f8

[root@ip-172-18-3-150 ~]# oc get pods -n kube-service-catalog
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-64r5m            1/1       Running   0          16m
controller-manager-49zwq   1/1       Running   0          16m

2. Upgrade it to OCP 3.11:
[root@ip-172-18-3-150 ~]# oc version
oc v3.11.69
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-3-150.ec2.internal:8443
openshift v3.11.69
kubernetes v1.11.0+d4cacc0

[root@ip-172-18-3-150 ~]# oc get pods -n kube-service-catalog 
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-wvh8q            1/1       Running   0          1h
controller-manager-q74tf   1/1       Running   2          1h
[root@ip-172-18-3-150 ~]# oc get pods -n kube-service-catalog apiserver-wvh8q -o yaml |grep image
    image: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.11

Correlating logs:
TASK [openshift_service_catalog : Wait for API Server rollout success] *********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:2
Thursday 17 January 2019  11:41:28 +0000 (0:00:00.155)       0:13:41.861 ****** 
ok: [ec2-54-81-218-203.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/apiserver"], "delta": "0:00:33.772827", "end": "2019-01-17 06:42:35.027440", "rc": 0, "start": "2019-01-17 06:42:01.254613", "stderr": "", "stderr_lines": [], "stdout": "Waiting for daemon set \"apiserver\" rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"apiserver\" successfully rolled out", "stdout_lines": ["Waiting for daemon set \"apiserver\" rollout to finish: 0 of 1 updated pods are available...", "daemon set \"apiserver\" successfully rolled out"]}

TASK [openshift_service_catalog : Wait for Controller Manager rollout success] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:14
Thursday 17 January 2019  11:42:02 +0000 (0:00:34.394)       0:14:16.256 ****** 
ok: [ec2-54-81-218-203.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/controller-manager"], "delta": "0:00:50.687188", "end": "2019-01-17 06:43:26.095907", "rc": 0, "start": "2019-01-17 06:42:35.408719", "stderr": "", "stderr_lines": [], "stdout": "Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"controller-manager\" successfully rolled out", "stdout_lines": ["Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 1 updated pods are available...", "daemon set \"controller-manager\" successfully rolled out"]}

Comment 6 errata-xmlrpc 2019-01-30 15:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0096