Description of problem:
apiserver pod of service catalog is in CrashLoopBackOff status after upgrading from v3.7.23 to v3.9.0-0.19.0

Version-Release number of the following components:
openshift-ansible-3.9.0-0.19.0.git.0.de168fd.el7.noarch
ansible-2.4.1.0-1.el7.noarch

$ ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Always

Steps to Reproduce:
1. Upgrade OCP from v3.7.23 to v3.9.0-0.19.0:
$ ansible-playbook -i ~/ansible-inventory /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml
2. $ oc get pods -n kube-service-catalog

Actual results:
2.
NAME                          READY     STATUS             RESTARTS   AGE
po/apiserver-q2j2p            0/1       CrashLoopBackOff   20         1h
po/controller-manager-44rc2   1/1       Running            4          1h

$ oc logs po/apiserver-q2j2p -n kube-service-catalog
I0114 17:31:06.822182   1 feature_gate.go:156] feature gates: map[OriginatingIdentity:true]
I0114 17:31:06.824630   1 run_server.go:59] Preparing to run API server
I0114 17:31:06.912391   1 round_trippers.go:417] curl -k -v -XGET -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXNlcnZpY2UtY2F0YWxvZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJzZXJ2aWNlLWNhdGFsb2ctYXBpc2VydmVyLXRva2VuLXI3MjU3Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InNlcnZpY2UtY2F0YWxvZy1hcGlzZXJ2ZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI4NzhiMzNhOC1mOTQzLTExZTctODk5ZS1mYTE2M2U2NjliYzEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zZXJ2aWNlLWNhdGFsb2c6c2VydmljZS1jYXRhbG9nLWFwaXNlcnZlciJ9.nDTELoZQtCLRqbGWfO7Yoj0FesBqllGX6ae-Ckr1ehJ2ucGpOWXwtu2207z_o0ngmn0J2hFZPElmH_MqpkWLvk3awe7P0x0fXA-CFhKmGUXZOtpco-6YjO-zJseTxLJbkjWoYyInlf74yNTvHuOBq_I1DAk-cNaRNrtKj-swnor2qU47slGYKVjQY_X7ysjzUdAMzKj247SCJLntyQadZ6oiz-kHwCnmRVI1s4YmpCCq51EhXzOY5aP8zmbMHd4i03Y5gmzjDe-BsKIPQ_jjRkDH3VoiAVuag88COTwy-y_t0AHk9I8_JzYNlVY-vzL575k7YOkuOduVx5m-r6XHCg" -H "Accept: application/json, */*" -H "User-Agent: service-catalog/v3.7.23 (linux/amd64) kubernetes/8edc154" https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication
I0114 17:31:09.918498   1 round_trippers.go:436] GET https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication in 3006 milliseconds
I0114 17:31:09.918536   1 round_trippers.go:442] Response Headers:
W0114 17:31:09.918582   1 authentication.go:231] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.30.0.1:443: getsockopt: no route to host

Expected results:
All pods are running
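For reference, the rolebinding suggested by that warning would be filled in roughly as follows for the catalog's service account (a sketch only: the binding name is arbitrary, the namespace and service account are read from the token in the log, and note the final error here is a network-level "no route to host", not an RBAC failure):

$ kubectl create rolebinding service-catalog-auth-reader \
    -n kube-system \
    --role=extension-apiserver-authentication-reader \
    --serviceaccount=kube-service-catalog:service-catalog-apiserver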
This looks like a problem with the aggregator not being enabled in 3.7 (now fixed in bug 1523298).
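One way to check whether the aggregator was actually configured on a 3.7 master is to look for the front-proxy stanza in the master config (a sketch: /etc/origin/master is the default config path, and the certificate file names shown are the ones openshift-ansible typically generates, so treat them as illustrative):

# grep -A3 aggregatorConfig /etc/origin/master/master-config.yaml
aggregatorConfig:
  proxyClientInfo:
    certFile: aggregator-front-proxy.crt
    keyFile: aggregator-front-proxy.key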
Changing QA contact to wmeng since he is the reporter.
Though I have no idea about the root cause, recreating the pod makes it run.

NAME                 READY     STATUS             RESTARTS   AGE
po/apiserver-6kw8d   0/1       CrashLoopBackOff   26         22h

# oc delete po/apiserver-6kw8d -n kube-service-catalog
pod "apiserver-6kw8d" deleted

NAME                 READY     STATUS        RESTARTS   AGE
po/apiserver-6kw8d   0/1       Terminating   26         22h

NAME                 READY     STATUS    RESTARTS   AGE
po/apiserver-bxvtd   1/1       Running   0          8s
Are you sure the pod stays running? I believe it'll crash the same way as before if the aggregator is not set up. I think all of this will be fixed with 3.7.24 or later.
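One way to tell whether the recreated pod genuinely stays up rather than just starting a fresh crash loop (a sketch; the namespace is taken from the report):

$ oc get pods -n kube-service-catalog -w
# or re-check after some time; RESTARTS should remain 0 for the new pod:
$ oc get pods -n kube-service-catalog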
@Weihua, I think the root cause is that the pods were not upgraded to the correct images. Do you need to update the title of the bug? Thanks.
I'm confused about the status of this bug. The original report said the upgrade was from version 3.7.23, but a later comment suggested 3.7.26 was used. The original report also mentioned missing configmap data, which would not have been fixed by restarting a catalog container. To verify that the cluster is in the correct state with the aggregator, the following should return a certificate:

kubectl --namespace kube-system get configmap extension-apiserver-authentication -o jsonpath="{ $.data['requestheader-client-ca-file'] }"

If that doesn't work, that's the problem. If it does work, then the scenario from the original report has changed.
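For reference, on a cluster where the aggregator is correctly set up, that query should print a PEM-encoded CA certificate, along these lines (certificate body elided here for illustration):

$ kubectl --namespace kube-system get configmap extension-apiserver-authentication -o jsonpath="{ $.data['requestheader-client-ca-file'] }"
-----BEGIN CERTIFICATE-----
MIIC...
-----END CERTIFICATE-----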
Sorry for the confusion. I absolutely agree with your point. Now all the pods are running after the upgrade, so the crash issue is fixed. Thanks. We usually try the latest version to keep up with the times: 3.7.23 was the latest version at the time the bug was reported, and 3.7.26 was the latest version two weeks later.
@Cheng, for the image tag problem during the upgrade, could you file another bug to track it? Thanks.
weihua, reported it in another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1540840
Fixed in openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.