Bug 1534311

Summary: [3.8] apiserver pod of service catalog in CrashLoopBackOff status after upgrading to v3.8
Product: OpenShift Container Platform
Reporter: Weihua Meng <wmeng>
Component: Service Broker
Assignee: Jeff Peeler <jpeeler>
Status: CLOSED ERRATA
QA Contact: Weihua Meng <wmeng>
Severity: high
Priority: high
Docs Contact:
Version: 3.8.0
CC: aos-bugs, chezhang, dmoessne, jiazha, jmatthew, jokerman, mmccomas, pmorie, tsanders, wmeng
Target Milestone: ---
Target Release: 3.9.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1534275
Environment:
Last Closed: 2018-06-27 18:01:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1534275
Bug Blocks:

Description Weihua Meng 2018-01-15 01:34:44 UTC
Upgrading from v3.7 to v3.8 also exhibits the same issue.
+++ This bug was initially created as a clone of Bug #1534275 +++

Description of problem:
apiserver pod of service catalog in CrashLoopBackOff status after upgrading from v3.7.23 to v3.9.0-0.19.0
Version-Release number of the following components:
$ ansible --version
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:

Steps to Reproduce:
1. Upgrade OCP from v3.7.23 to v3.9.0-0.19.0 by running:
$ ansible-playbook -i ~/ansible-inventory /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml
2. $ oc get pods -n kube-service-catalog

Actual results:
NAME                          READY     STATUS             RESTARTS   AGE
po/apiserver-q2j2p            0/1       CrashLoopBackOff   20         1h
po/controller-manager-44rc2   1/1       Running            4          1h

$ oc logs po/apiserver-q2j2p -n kube-service-catalog
I0114 17:31:06.822182       1 feature_gate.go:156] feature gates: map[OriginatingIdentity:true]
I0114 17:31:06.824630       1 run_server.go:59] Preparing to run API server
I0114 17:31:06.912391       1 round_trippers.go:417] curl -k -v -XGET  -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXNlcnZpY2UtY2F0YWxvZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJzZXJ2aWNlLWNhdGFsb2ctYXBpc2VydmVyLXRva2VuLXI3MjU3Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InNlcnZpY2UtY2F0YWxvZy1hcGlzZXJ2ZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4NzhiMzNhOC1mOTQzLTExZTctODk5ZS1mYTE2M2U2NjliYzEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zZXJ2aWNlLWNhdGFsb2c6c2VydmljZS1jYXRhbG9nLWFwaXNlcnZlciJ9.nDTELoZQtCLRqbGWfO7Yoj0FesBqllGX6ae-Ckr1ehJ2ucGpOWXwtu2207z_o0ngmn0J2hFZPElmH_MqpkWLvk3awe7P0x0fXA-CFhKmGUXZOtpco-6YjO-zJseTxLJbkjWoYyInlf74yNTvHuOBq_I1DAk-cNaRNrtKj-swnor2qU47slGYKVjQY_X7ysjzUdAMzKj247SCJLntyQadZ6oiz-kHwCnmRVI1s4YmpCCq51EhXzOY5aP8zmbMHd4i03Y5gmzjDe-BsKIPQ_jjRkDH3VoiAVuag88COTwy-y_t0AHk9I8_JzYNlVY-vzL575k7YOkuOduVx5m-r6XHCg" -H "Accept: application/json, */*" -H "User-Agent: service-catalog/v3.7.23 (linux/amd64) kubernetes/8edc154"
I0114 17:31:09.918498       1 round_trippers.go:436] GET  in 3006 milliseconds
I0114 17:31:09.918536       1 round_trippers.go:442] Response Headers:
W0114 17:31:09.918582       1 authentication.go:231] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get dial tcp getsockopt: no route to host

Expected results:
All pods are running
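
The warning near the end of the apiserver log suggests its own remediation: granting the catalog apiserver's service account read access to the extension-apiserver-authentication configmap. A hedged sketch of that command (the role and service-account names are taken from the log above; the rolebinding name "service-catalog-auth-reader" is illustrative):

```shell
# Grant the service catalog apiserver's service account permission to read
# the extension-apiserver-authentication configmap in kube-system.
# The binding name below is illustrative, not from the original report.
oc create rolebinding service-catalog-auth-reader \
  -n kube-system \
  --role=extension-apiserver-authentication-reader \
  --serviceaccount=kube-service-catalog:service-catalog-apiserver
```

Note that in this bug the root cause turned out to be the aggregator configuration itself (see comment 6), so this rolebinding alone may not resolve the crash loop.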

Comment 1 Jeff Peeler 2018-01-18 21:14:08 UTC
How was the 3.7 based cluster installed?

I was able (with a few workarounds unrelated to catalog) to get a 3.9 upgrade with the service catalog pods functioning successfully at the end. From the error output, it looks like the API aggregator has not been configured properly, which was only recently fixed for 3.7 upgrades.

Comment 4 Weihua Meng 2018-01-19 07:18:59 UTC
Thanks for the info, Jeff.
May I know the PR which has the fix you mentioned?

Comment 5 Weihua Meng 2018-01-19 07:23:33 UTC
OCP 3.7 cluster is installed by openshift-ansible.
ansible-playbook -i inventory openshift-ansible/playbooks/byo/config.yml

Comment 6 Jeff Peeler 2018-01-19 19:22:14 UTC
OK, if you did a fresh install, the aggregator should have been turned on. I just want to double-confirm that 3.7 wasn't upgraded from a previous install.

For reference, doing "oc get configmap -n kube-system extension-apiserver-authentication -o jsonpath='{.data.requestheader-client-ca-file}'" should return data if the aggregator is enabled.

The upgrade fix was made in bug 1523298, but if you did a fresh install I don't know that it'll help.
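
The aggregator check from comment 6 can be wrapped in a small guard script, e.g. (a sketch; it assumes `oc` is logged in with cluster-admin rights):

```shell
# If the aggregation layer is configured, this configmap carries a
# request-header client CA. Empty output means it was never set up.
CA=$(oc get configmap -n kube-system extension-apiserver-authentication \
      -o jsonpath='{.data.requestheader-client-ca-file}')
if [ -n "$CA" ]; then
  echo "aggregator enabled: request-header client CA present"
else
  echo "aggregator NOT configured"
fi
```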

Comment 7 Weihua Meng 2018-01-20 04:17:53 UTC
It was a fresh install of OCP 3.7 before the upgrade, and service catalog is enabled by default and was working before the upgrade.

Comment 8 Jeff Peeler 2018-02-01 15:10:33 UTC
Now that bug 1534275 is verified, I believe this bug should be too since the fix was a 3.7 problem.

Comment 12 Scott Dodson 2018-02-01 16:45:05 UTC

It's not proper to explicitly run the 3.8 upgrade playbooks when upgrading from 3.7 to 3.9. Customers will never install openshift-ansible-3.8, nor will they run the 3.8 upgrade playbooks; we'll probably strip those out of the packaging just to make this clear.

The proper way to upgrade from 3.7 to 3.9 is to call the 3.9 upgrade playbooks with both 3.8 and 3.9 repos enabled on your hosts. This will automatically upgrade the control plane from 3.7 to 3.8 to 3.9. During this upgrade only the API and Controllers are updated to 3.8, all other components remain at 3.7 and then after the control plane has been updated they should be upgraded directly to 3.9.
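
Under that guidance, the 3.7 → 3.9 upgrade would be invoked roughly as follows (the repo IDs here are illustrative; use the ones matching your subscription, and the inventory path from the original report):

```shell
# Enable both the 3.8 and 3.9 repos on all hosts first.
# Repo IDs below are illustrative assumptions, not from the report.
subscription-manager repos \
  --enable=rhel-7-server-ose-3.8-rpms \
  --enable=rhel-7-server-ose-3.9-rpms

# Run only the 3.9 upgrade playbooks; they step the control plane
# through 3.8 automatically, then the remaining components go 3.7 -> 3.9.
ansible-playbook -i ~/ansible-inventory \
  /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml
```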

Comment 14 Weihua Meng 2018-02-01 23:12:42 UTC
Thanks, Scott
So the 3.8 upgrade is not supposed to be invoked explicitly, even for Online devops, and we will not test it anymore.

Comment 16 Weihua Meng 2018-02-02 00:59:45 UTC
Marking this fixed, as the cause of the original bug (the aggregator issue) was addressed.

Comment 18 errata-xmlrpc 2018-06-27 18:01:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.