1556739 – Fail to upgrade ocp with service catalog deployed using an external etcd

Summary: Fail to upgrade ocp with service catalog deployed using an external etcd

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.9.z
Assignee:	Michael Gugino
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-03-15 07:38 UTC by liujia
Modified:	2021-01-18 05:24 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-17 06:42:42 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:1566	0	None	None	None	2018-05-17 06:43:12 UTC

Description liujia 2018-03-15 07:38:23 UTC

Description of problem:
Upgrading OCP with an external etcd fails when the service catalog is deployed. The failure occurs at task [openshift_service_catalog : wait for api server to be ready].
fatal: [x.x.x.x]: FAILED! => {"attempts": 1, "changed": false, "connection": "close", "content": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed\n", "content_length": "180", "content_type": "text/plain; charset=utf-8", "date": "Thu, 15 Mar 2018 06:22:25 GMT", "msg": "Status code was not [200]: HTTP Error 500: Internal Server Error", "redirected": false, "status": 500, "url": "https://apiserver.kube-service-catalog.svc/healthz", "x_content_type_options": "nosniff"}

# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
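
To make the healthz output above easier to act on in a script, the failing checks can be extracted from the response body. This is a minimal sketch, assuming the body uses the `[+]` / `[-]` line prefixes shown above; the function name `failed_healthz_checks` is hypothetical and not part of openshift-ansible.

```python
from typing import List

# Body copied from the curl output in this report.
SAMPLE_BODY = """[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
"""


def failed_healthz_checks(body: str) -> List[str]:
    """Return the names of the checks marked as failed ("[-]") in a healthz body."""
    failed = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("[-]"):
            # "[-]etcd failed: reason withheld" -> "etcd"
            failed.append(line[3:].split(" ", 1)[0])
    return failed


if __name__ == "__main__":
    print(failed_healthz_checks(SAMPLE_BODY))  # ['etcd']
```

For the body in this report, only the `etcd` check is reported as failed, which points at the API server's etcd connection rather than at the API server process itself.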

# oc describe pod apiserver-flplx -n kube-service-catalog | grep etcd-servers -A 1
      --etcd-servers
      https://qe-jliu-t2-master-1:2379

The --etcd-servers value should be the etcd host name, not the master host name.
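
The mismatch described above can be checked mechanically by comparing the hosts in `--etcd-servers` against the hosts that are expected to run etcd. This is a rough sketch, assuming the flag value is a comma-separated list of URLs; the function name `unexpected_etcd_hosts` and the etcd host name `etcd-1.example.com` are hypothetical placeholders, since the report does not name the real etcd host.

```python
from typing import List, Set
from urllib.parse import urlparse


def unexpected_etcd_hosts(etcd_servers: str, etcd_hosts: Set[str]) -> List[str]:
    """Return hosts from an --etcd-servers value that are not known etcd hosts."""
    urls = [u.strip() for u in etcd_servers.split(",") if u.strip()]
    hosts = [urlparse(u).hostname for u in urls]
    return [h for h in hosts if h not in etcd_hosts]


if __name__ == "__main__":
    # Value taken from the `oc describe pod` output in this report.
    configured = "https://qe-jliu-t2-master-1:2379"
    # Hypothetical etcd group; replace with the hosts from the [etcd] inventory group.
    expected = {"etcd-1.example.com"}
    print(unexpected_etcd_hosts(configured, expected))  # ['qe-jliu-t2-master-1']
```

A non-empty result means the service catalog API server is pointed at a host outside the etcd group, which matches the symptom reported here.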

Version-Release number of the following components:
ansible-2.4.3.0-1.el7ae.noarch
openshift-ansible-3.9.9-1.git.0.1a1f7d8.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Perform a containerized install of OCP v3.7 with external etcd (dedicated etcd hosts, not on the master hosts).
2. Upgrade OCP v3.7 to v3.9.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Russell Teague 2018-03-16 12:28:42 UTC

The proposed fix is tracked in 1557036.

*** This bug has been marked as a duplicate of bug 1557036 ***

Comment 2 liujia 2018-03-16 12:36:39 UTC

I don't think this should be marked as a duplicate. The root cause may be the same, but in this scenario the service catalog was deployed before the upgrade, so the upgrade fails and cannot continue.
For bug 1557036, the upgrade was not blocked; what failed was deploying service_catalog on v3.9.

Comment 3 Russell Teague 2018-03-16 12:53:33 UTC

Mike, can you evaluate if your proposed fix in 1557036 would apply to this scenario as well?

Comment 4 Michael Gugino 2018-03-16 13:03:07 UTC

My update was not designed to fix this as I didn't believe this was broken.  However, I do believe my fix will also apply to this scenario.

Comment 5 Michael Gugino 2018-04-02 20:09:03 UTC

PR https://github.com/openshift/openshift-ansible/pull/7542 merged, which probably also addresses this bug.

Comment 7 liujia 2018-04-18 09:24:58 UTC

Blocked by bz1566238.

Comment 9 liujia 2018-04-20 08:59:38 UTC

Verified on openshift-ansible-3.9.24-1.git.0.d0289ea.el7.noarch

Comment 12 errata-xmlrpc 2018-05-17 06:42:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1566
