Bug 1534275 - apiserver pod of service catalog in CrashLoopBackOff status after upgrading to v3.9
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jeff Peeler
QA Contact: Weihua Meng
Depends On: 1523298
Blocks: 1534311
Reported: 2018-01-14 17:43 UTC by Weihua Meng
Modified: 2018-06-18 18:28 UTC

Fixed In Version: openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned To: 1534311 (view as bug list)
Last Closed: 2018-06-18 17:35:28 UTC
Target Upstream Version:

Attachments (Terms of Use)

Description Weihua Meng 2018-01-14 17:43:10 UTC
Description of problem:
apiserver pod of service catalog in CrashLoopBackOff status after upgrading from v3.7.23 to v3.9.0-0.19.0
Version-Release number of the following components:
$ ansible --version
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:

Steps to Reproduce:
1. Upgrade OCP from v3.7.23 to v3.9.0-0.19.0 by 
$ ansible-playbook -i ~/ansible-inventory /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml
2. $ oc get pods -n kube-service-catalog

Actual results:
NAME                          READY     STATUS             RESTARTS   AGE
po/apiserver-q2j2p            0/1       CrashLoopBackOff   20         1h
po/controller-manager-44rc2   1/1       Running            4          1h

$ oc logs po/apiserver-q2j2p -n kube-service-catalog
I0114 17:31:06.822182       1 feature_gate.go:156] feature gates: map[OriginatingIdentity:true]
I0114 17:31:06.824630       1 run_server.go:59] Preparing to run API server
I0114 17:31:06.912391       1 round_trippers.go:417] curl -k -v -XGET  -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXNlcnZpY2UtY2F0YWxvZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJzZXJ2aWNlLWNhdGFsb2ctYXBpc2VydmVyLXRva2VuLXI3MjU3Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InNlcnZpY2UtY2F0YWxvZy1hcGlzZXJ2ZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4NzhiMzNhOC1mOTQzLTExZTctODk5ZS1mYTE2M2U2NjliYzEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zZXJ2aWNlLWNhdGFsb2c6c2VydmljZS1jYXRhbG9nLWFwaXNlcnZlciJ9.nDTELoZQtCLRqbGWfO7Yoj0FesBqllGX6ae-Ckr1ehJ2ucGpOWXwtu2207z_o0ngmn0J2hFZPElmH_MqpkWLvk3awe7P0x0fXA-CFhKmGUXZOtpco-6YjO-zJseTxLJbkjWoYyInlf74yNTvHuOBq_I1DAk-cNaRNrtKj-swnor2qU47slGYKVjQY_X7ysjzUdAMzKj247SCJLntyQadZ6oiz-kHwCnmRVI1s4YmpCCq51EhXzOY5aP8zmbMHd4i03Y5gmzjDe-BsKIPQ_jjRkDH3VoiAVuag88COTwy-y_t0AHk9I8_JzYNlVY-vzL575k7YOkuOduVx5m-r6XHCg" -H "Accept: application/json, */*" -H "User-Agent: service-catalog/v3.7.23 (linux/amd64) kubernetes/8edc154"
I0114 17:31:09.918498       1 round_trippers.go:436] GET  in 3006 milliseconds
I0114 17:31:09.918536       1 round_trippers.go:442] Response Headers:
W0114 17:31:09.918582       1 authentication.go:231] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get dial tcp getsockopt: no route to host

Expected results:
All pods are running
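The warning in the apiserver log above includes its own suggested fix: a rolebinding that grants the catalog's service account read access to the extension-apiserver-authentication configmap. A hedged sketch of that command follows; the rolebinding name is made up here, and the service account name is an assumption based on the namespace shown in the log:

```shell
# Sketch only, following the hint printed in the apiserver log.
# "service-catalog-auth-reader" and the service account name are
# illustrative assumptions, not values confirmed by this bug.
grant_auth_reader() {
  oc create rolebinding service-catalog-auth-reader \
    -n kube-system \
    --role=extension-apiserver-authentication-reader \
    --serviceaccount=kube-service-catalog:service-catalog-apiserver
}
```

Whether this alone resolves the crash depends on the aggregator being configured, as discussed in the comments below.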

Comment 1 Jeff Peeler 2018-01-19 18:18:38 UTC
This looks to be a problem with the aggregator not being enabled in 3.7 (now fixed by bug 1523298).

Comment 2 Zhang Cheng 2018-01-24 09:22:54 UTC
Changing QA contact to wmeng@redhat.com since he is the reporter.

Comment 3 Weihua Meng 2018-01-26 07:01:42 UTC
Though I have no idea about the root cause, recreating the pod can get it running.

NAME                          READY     STATUS             RESTARTS   AGE
po/apiserver-6kw8d            0/1       CrashLoopBackOff   26         22h

# oc delete po/apiserver-6kw8d -n kube-service-catalog
pod "apiserver-6kw8d" deleted

NAME                          READY     STATUS        RESTARTS   AGE
po/apiserver-6kw8d            0/1       Terminating   26         22h

NAME                          READY     STATUS    RESTARTS   AGE
po/apiserver-bxvtd            1/1       Running   0          8s
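The delete-and-respawn workaround shown above can be scripted. This is a sketch under the assumption that `oc` is logged in with access to the kube-service-catalog namespace; the function name is hypothetical:

```shell
# Sketch of the workaround from this comment: delete the crashing
# apiserver pod and let its controller schedule a fresh replacement.
recreate_catalog_apiserver() {
  ns=kube-service-catalog
  # Find the apiserver pod by name (output form: pod/apiserver-xxxxx).
  pod=$(oc get pods -n "$ns" -o name | grep 'apiserver')
  oc delete "$pod" -n "$ns"
}
```

As the next comment notes, this may only mask the problem if the aggregator is not actually set up.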

Comment 4 Jeff Peeler 2018-01-26 16:39:56 UTC
Are you sure the pod stays running? I believe it'll crash the same way as before if the aggregator is not set up. I think all of this will be fixed with 3.7.24 or later.

Comment 7 Zhang Cheng 2018-01-31 10:05:25 UTC

I think the root cause is that the cluster was not upgraded to the correct images. Do you want to update the bug title? Thanks.

Comment 8 Jeff Peeler 2018-01-31 16:06:46 UTC
I'm confused about the status of this bug. The original report said the installation started from version 3.7.23, but a later comment suggested 3.7.26 was used.

The original report also mentioned missing configmap data, which would not have been fixed by restarting a catalog container. To verify that the cluster is in the correct state with the aggregator, the following should return a certificate:

kubectl --namespace kube-system get configmap extension-apiserver-authentication -o jsonpath="{ $.data['requestheader-client-ca-file'] }"

If that doesn't work, that's the problem. If it does work, then the scenario from the original report has changed.
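The check above can be wrapped in a small helper that reports the result in one line. A sketch, assuming `kubectl` is authenticated with read access to kube-system; the function name is hypothetical:

```shell
# Sketch: report whether the aggregator CA is published in kube-system,
# using the same jsonpath query as the comment above.
check_aggregator_ca() {
  ca=$(kubectl --namespace kube-system get configmap extension-apiserver-authentication \
    -o jsonpath="{ $.data['requestheader-client-ca-file'] }" 2>/dev/null)
  if [ -n "$ca" ]; then
    echo "aggregator CA present"
  else
    echo "aggregator CA missing"
  fi
}
```

"missing" here corresponds to the broken state described in the original report; "present" means the aggregator was set up and the scenario has changed.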

Comment 9 Weihua Meng 2018-02-01 01:17:57 UTC
Sorry for the confusion.
I absolutely agree with your point.
Now all the pods are running after the upgrade, so the crash issue is fixed.

We usually try the latest version to keep up to date.
3.7.23 was the latest version at the time the bug was reported,
and 3.7.26 was the latest version two weeks later.

Comment 10 Weihua Meng 2018-02-01 01:20:32 UTC
For the image tag problem during upgrade, could you file another bug to track it?

Comment 11 Zhang Cheng 2018-02-01 06:25:08 UTC

Reported in another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1540840

Comment 12 Weihua Meng 2018-02-01 09:12:58 UTC
