Description of problem:
apiserver pod of service catalog is in CrashLoopBackOff status after upgrading from v3.7.23 to v3.9.0-0.19.0

Version-Release number of the following components:
openshift-ansible-3.9.0-0.19.0.git.0.de168fd.el7.noarch
ansible-2.4.1.0-1.el7.noarch

$ ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Always

Steps to Reproduce:
1. Upgrade OCP from v3.7.23 to v3.9.0-0.19.0:
$ ansible-playbook -i ~/ansible-inventory /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml
2. $ oc get pods -n kube-service-catalog

Actual results:
2.
NAME                          READY     STATUS             RESTARTS   AGE
po/apiserver-q2j2p            0/1       CrashLoopBackOff   20         1h
po/controller-manager-44rc2   1/1       Running            4          1h

$ oc logs po/apiserver-q2j2p -n kube-service-catalog
I0114 17:31:06.822182   1 feature_gate.go:156] feature gates: map[OriginatingIdentity:true]
I0114 17:31:06.824630   1 run_server.go:59] Preparing to run API server
I0114 17:31:06.912391   1 round_trippers.go:417] curl -k -v -XGET -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXNlcnZpY2UtY2F0YWxvZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJzZXJ2aWNlLWNhdGFsb2ctYXBpc2VydmVyLXRva2VuLXI3MjU3Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InNlcnZpY2UtY2F0YWxvZy1hcGlzZXJ2ZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4NzhiMzNhOC1mOTQzLTExZTctODk5ZS1mYTE2M2U2NjliYzEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zZXJ2aWNlLWNhdGFsb2c6c2VydmljZS1jYXRhbG9nLWFwaXNlcnZlciJ9.nDTELoZQtCLRqbGWfO7Yoj0FesBqllGX6ae-Ckr1ehJ2ucGpOWXwtu2207z_o0ngmn0J2hFZPElmH_MqpkWLvk3awe7P0x0fXA-CFhKmGUXZOtpco-6YjO-zJseTxLJbkjWoYyInlf74yNTvHuOBq_I1DAk-cNaRNrtKj-swnor2qU47slGYKVjQY_X7ysjzUdAMzKj247SCJLntyQadZ6oiz-kHwCnmRVI1s4YmpCCq51EhXzOY5aP8zmbMHd4i03Y5gmzjDe-BsKIPQ_jjRkDH3VoiAVuag88COTwy-y_t0AHk9I8_JzYNlVY-vzL575k7YOkuOduVx5m-r6XHCg" -H "Accept: application/json, */*" -H "User-Agent: service-catalog/v3.7.23 (linux/amd64) kubernetes/8edc154" https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication
I0114 17:31:09.918498   1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication in 3006 milliseconds
I0114 17:31:09.918536   1 round_trippers.go:442] Response Headers:
W0114 17:31:09.918582   1 authentication.go:231] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: getsockopt: no route to host

Expected results:
All pods are running
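For reference, the rolebinding suggested by that warning would be filled in roughly as follows for the catalog's service account (a sketch only: the binding name is arbitrary, the namespace and service account are read from the token in the log, and note the final error here is a network-level "no route to host", not an RBAC failure):

$ kubectl create rolebinding service-catalog-auth-reader \
    -n kube-system \
    --role=extension-apiserver-authentication-reader \
    --serviceaccount=kube-service-catalog:service-catalog-apiserver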
This looks like a problem with the aggregator not being enabled in 3.7 (now fixed in bug 1523298).
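One way to check whether the aggregator was actually configured on a 3.7 master is to look for the front-proxy stanza in the master config (a sketch: /etc/origin/master is the default config path, and the certificate file names shown are the ones openshift-ansible typically generates, so treat them as illustrative):

# grep -A3 aggregatorConfig /etc/origin/master/master-config.yaml
aggregatorConfig:
  proxyClientInfo:
    certFile: aggregator-front-proxy.crt
    keyFile: aggregator-front-proxy.key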
Changing QA contact to wmeng since he is the reporter.
Though I have no idea about the root cause, recreating the pod makes it run.

NAME                 READY     STATUS             RESTARTS   AGE
po/apiserver-6kw8d   0/1       CrashLoopBackOff   26         22h

# oc delete po/apiserver-6kw8d -n kube-service-catalog
pod "apiserver-6kw8d" deleted

NAME                 READY     STATUS        RESTARTS   AGE
po/apiserver-6kw8d   0/1       Terminating   26         22h

NAME                 READY     STATUS    RESTARTS   AGE
po/apiserver-bxvtd   1/1       Running   0          8s
Are you sure the pod stays running? I believe it'll crash the same way as before if the aggregator is not set up. I think all of this will be fixed with 3.7.24 or later.
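One way to tell whether the recreated pod genuinely stays up rather than just starting a fresh crash loop (a sketch; the namespace is taken from the report):

$ oc get pods -n kube-service-catalog -w
# or re-check after some time; RESTARTS should remain 0 for the new pod:
$ oc get pods -n kube-service-catalog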
@Weihua, I think the root cause is that the pods were not upgraded to the correct images. Do you need to update the title of the bug? Thanks.
I'm confused about the status of this bug. The original report said the upgrade was from version 3.7.23, but a later comment suggested 3.7.26 was used. The original report also mentioned missing configmap data, which would not have been fixed by restarting a catalog container. To verify that the cluster is in the correct state with the aggregator, the following should return a certificate:

kubectl --namespace kube-system get configmap extension-apiserver-authentication -o jsonpath="{ $.data['requestheader-client-ca-file'] }"

If that doesn't work, that's the problem. If it does work, then the scenario from the original report has changed.
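For reference, on a cluster where the aggregator is correctly set up, that query should print a PEM-encoded CA certificate, along these lines (certificate body elided here for illustration):

$ kubectl --namespace kube-system get configmap extension-apiserver-authentication -o jsonpath="{ $.data['requestheader-client-ca-file'] }"
-----BEGIN CERTIFICATE-----
MIIC...
-----END CERTIFICATE-----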
Sorry for the confusion. I absolutely agree with your point. Now all the pods are running after the upgrade, so the crash issue is fixed. Thanks. We usually try the latest version to keep up with the times: 3.7.23 was the latest version at the time the bug was reported, and 3.7.26 was the latest version two weeks later.
@Cheng, for the image tag problem during the upgrade, could you file another bug to track it? Thanks.
weihua, reported it in another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1540840
Fixed in openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.