I upgraded from 3.10 to 3.11. Correlating logs and pod status by timestamp, I see that the Service Catalog installation succeeded, but the task "Verify that the catalog api server is running" in openshift-ansible/roles/openshift_service_catalog/tasks/start.yml actually ran its check against the 3.10 pods. That is, the task checks the /healthz endpoint, but when it did so, the OLD 3.10 pods were still running. We should ensure the DaemonSet rollout has completed before checking health. Perhaps add:

  oc rollout status ds/apiserver -n kube-service-catalog

with an expected response of 'daemon set "apiserver" successfully rolled out', and:

  oc rollout status ds/controller-manager -n kube-service-catalog

with an expected response of 'daemon set "controller-manager" successfully rolled out'.

I would also advise checking the endpoints to be certain at least one pod is available.
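A minimal sketch of what such wait tasks in start.yml could look like. This is illustrative only, not the actual PR content: the task names, retry counts, and the endpoint-check task are my assumptions.

```yaml
# Sketch for roles/openshift_service_catalog/tasks/start.yml (illustrative, not the merged fix).
- name: Wait for API Server rollout success
  command: >
    oc rollout status ds/apiserver
    -n kube-service-catalog
    --config=/etc/origin/master/admin.kubeconfig
  register: apiserver_rollout
  until: "'successfully rolled out' in apiserver_rollout.stdout"
  retries: 5
  delay: 30
  changed_when: false

- name: Wait for Controller Manager rollout success
  command: >
    oc rollout status ds/controller-manager
    -n kube-service-catalog
    --config=/etc/origin/master/admin.kubeconfig
  register: cm_rollout
  until: "'successfully rolled out' in cm_rollout.stdout"
  retries: 5
  delay: 30
  changed_when: false

# Assumed extra safety check: confirm at least one ready endpoint address exists.
- name: Verify at least one apiserver endpoint is available
  command: >
    oc get endpoints apiserver
    -n kube-service-catalog
    --config=/etc/origin/master/admin.kubeconfig
    -o jsonpath='{.subsets[*].addresses[*].ip}'
  register: apiserver_endpoints
  until: apiserver_endpoints.stdout | length > 0
  retries: 10
  delay: 10
  changed_when: false
```

Note that `oc rollout status` already blocks until the rollout finishes or fails, so the `until`/`retries` loop is mainly a guard against transient API errors.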
pending fix in 3.11: https://github.com/openshift/openshift-ansible/pull/10658
Set target release to 3.11.z
LGTM, verified. Details below:

1. Install OCP 3.10:

[root@ip-172-18-3-150 ~]# oc version
oc v3.10.101
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-3-150.ec2.internal:8443
openshift v3.10.101
kubernetes v1.10.0+b81c8f8

[root@ip-172-18-3-150 ~]# oc get pods -n kube-service-catalog
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-64r5m            1/1       Running   0          16m
controller-manager-49zwq   1/1       Running   0          16m

2. Upgrade it to OCP 3.11:

[root@ip-172-18-3-150 ~]# oc version
oc v3.11.69
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-3-150.ec2.internal:8443
openshift v3.11.69
kubernetes v1.11.0+d4cacc0

[root@ip-172-18-3-150 ~]# oc get pods -n kube-service-catalog
NAME                       READY     STATUS    RESTARTS   AGE
apiserver-wvh8q            1/1       Running   0          1h
controller-manager-q74tf   1/1       Running   2          1h

[root@ip-172-18-3-150 ~]# oc get pods -n kube-service-catalog apiserver-wvh8q -o yaml | grep image
    image: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.11

Correlating logs:

TASK [openshift_service_catalog : Wait for API Server rollout success] *********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:2
Thursday 17 January 2019  11:41:28 +0000 (0:00:00.155)       0:13:41.861 ******
ok: [ec2-54-81-218-203.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/apiserver"], "delta": "0:00:33.772827", "end": "2019-01-17 06:42:35.027440", "rc": 0, "start": "2019-01-17 06:42:01.254613", "stderr": "", "stderr_lines": [], "stdout": "Waiting for daemon set \"apiserver\" rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"apiserver\" successfully rolled out", "stdout_lines": ["Waiting for daemon set \"apiserver\" rollout to finish: 0 of 1 updated pods are available...", "daemon set \"apiserver\" successfully rolled out"]}
TASK [openshift_service_catalog : Wait for Controller Manager rollout success] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_service_catalog/tasks/start.yml:14
Thursday 17 January 2019  11:42:02 +0000 (0:00:34.394)       0:14:16.256 ******
ok: [ec2-54-81-218-203.compute-1.amazonaws.com] => {"attempts": 1, "changed": false, "cmd": ["oc", "rollout", "status", "--config=/etc/origin/master/admin.kubeconfig", "-n", "kube-service-catalog", "ds/controller-manager"], "delta": "0:00:50.687188", "end": "2019-01-17 06:43:26.095907", "rc": 0, "start": "2019-01-17 06:42:35.408719", "stderr": "", "stderr_lines": [], "stdout": "Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 1 updated pods are available...\ndaemon set \"controller-manager\" successfully rolled out", "stdout_lines": ["Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 1 updated pods are available...", "daemon set \"controller-manager\" successfully rolled out"]}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0096