Description of problem:
The issue is found in https://issues.redhat.com/browse/OCPQE-551 .

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-05-23-224107

How reproducible:
Reproduced 2 of 3 times.

Steps to Reproduce:
1. Launch a 4_5/upi-on-azure/versioned-installer-http_proxy-fips-ci env
2. After installation completes, check the COs

Actual results:
2.
$ oc get co | grep -v "True.*False.*False"; oc get no
monitoring            4.5.0-0.nightly-2020-05-23-224107   False   True    True    17m
openshift-apiserver   4.5.0-0.nightly-2020-05-23-224107   False   False   False   25m

Expected results:
2. All COs are normal.

Additional info:
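For reference, a minimal sketch of follow-up checks for a cluster in this state (nothing below is specific to this bug; the operator and namespace names are the ones already shown above):

~~~
# Show why the openshift-apiserver CO reports Available=False (condition messages):
$ oc get clusteroperator openshift-apiserver -o yaml

# See which aggregated API services are failing the availability check:
$ oc get apiservice | grep -i false

# And the openshift-apiserver pods that back them:
$ oc get pods -n openshift-apiserver -o wide
~~~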
Intended to attach must-gather stuff, but failed:

$ oc adm must-gather --dest-dir ocpqe-551-oas-false
[must-gather ] OUT the server was unable to return a response in the time allotted, but may still be processing the request (get imagestreams.image.openshift.io must-gather)
[must-gather ] OUT
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather ] OUT namespace/openshift-must-gather-5svv9 created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-c2l6q created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-c2l6q deleted
[must-gather ] OUT namespace/openshift-must-gather-5svv9 deleted
Error from server (Forbidden): pods "must-gather-" is forbidden: error looking up service account openshift-must-gather-5svv9/default: serviceaccount "default" not found

$ oc get ns openshift-must-gather-5svv9
NAME                          STATUS        AGE
openshift-must-gather-5svv9   Terminating   86s

$ oc get serviceaccount -n openshift-must-gather-5svv9
NAME       SECRETS   AGE
builder    0         2m27s
deployer   0         2m27s

$ oc create namespace xxia-test
namespace/xxia-test created

$ oc get sa -n xxia-test
NAME       SECRETS   AGE
builder    0         6s
deployer   0         6s

$ oc get po -n openshift-apiserver-operator
NAME                                            READY   STATUS    RESTARTS   AGE
openshift-apiserver-operator-74557659fc-f2bqj   1/1     Running   0          91m

$ oc logs -n openshift-apiserver-operator openshift-apiserver-operator-74557659fc-f2bqj
...
I0526 11:29:02.705944 1 status_controller.go:172] clusteroperator/openshift-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2020-05-26T10:01:34Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-05-26T09:28:44Z","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2020-05-26T11:25:55Z","message":"APIServicesAvailable: \"template.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)","reason":"APIServices_Error","status":"False","type":"Available"},{"lastTransitionTime":"2020-05-26T09:24:07Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
E0526 11:29:02.706068 1 base_controller.go:180] "APIServiceController_openshift-apiserver" controller failed to sync "key", err: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
I0526 11:29:02.722147 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"7ea12c68-f2e2-4eeb-ba24-6c98a4673518", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available message changed from "APIServicesAvailable: \"authorization.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"image.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"oauth.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"quota.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"route.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)" to "APIServicesAvailable: \"template.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)"
I0526 11:29:02.912739 1 request.go:621] Throttling request took 1.595125864s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver/pods?labelSelector=apiserver%3Dtrue
I0526 11:29:02.973079 1 status_controller.go:172] clusteroperator/openshift-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2020-05-26T10:01:34Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-05-26T09:28:44Z","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2020-05-26T11:29:02Z","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2020-05-26T09:24:07Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0526 11:29:02.986210 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"7ea12c68-f2e2-4eeb-ba24-6c98a4673518", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from False to True ("")
I0526 11:29:03.912950 1 request.go:621] Throttling request took 1.595346066s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:05.112723 1 request.go:621] Throttling request took 1.595419367s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver
I0526 11:29:06.112747 1 request.go:621] Throttling request took 1.595551867s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:07.312732 1 request.go:621] Throttling request took 1.596010072s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:08.512894 1 request.go:621] Throttling request took 1.597202383s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver/configmaps/image-import-ca
I0526 11:29:09.512929 1 request.go:621] Throttling request took 1.595750033s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:10.117343 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"7ea12c68-f2e2-4eeb-ba24-6c98a4673518", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OpenShiftAPICheckFailed' "authorization.openshift.io.v1" failed with HTTP status code 503 (the server is currently unable to handle the request)
I0526 11:29:10.712803 1 request.go:621] Throttling request took 1.396820581s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:11.712848 1 request.go:621] Throttling request took 1.394234356s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:12.712870 1 request.go:621] Throttling request took 1.394905463s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-image-registry/configmaps/image-registry-certificates
...

$ oc cp /bin/oc -n openshift-kube-apiserver kube-apiserver-xxia-ocpqe-551-05260854-master-0:/tmp/oc
Defaulting container name to kube-apiserver.

$ oc rsh -n openshift-kube-apiserver kube-apiserver-xxia-ocpqe-551-05260854-master-0
sh-4.2# /tmp/oc get --insecure-skip-tls-verify --raw "/" --server https://10.128.0.9:8443
Error from server (Forbidden): forbidden: User "system:anonymous" cannot get path "/"    # different from bug 1825219
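Since must-gather could not run here (the default service account never appeared in the new namespace), here is a rough sketch of lighter-weight data that could still be collected for the failing aggregated API. The APIService name comes from the operator log above; the --dest-dir value is only an example:

~~~
# Inspect the APIService the operator reports as not ready:
$ oc get apiservice v1.template.openshift.io -o yaml

# The aggregated APIs are backed by the "api" service in openshift-apiserver;
# verify it actually has endpoints behind it:
$ oc get svc,endpoints api -n openshift-apiserver

# Scoped inspect of just this operator, instead of a full must-gather:
$ oc adm inspect clusteroperator/openshift-apiserver --dest-dir=inspect-openshift-apiserver
~~~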
Seeing openshift-apiserver failing to authorize requests when talking to the kube-apiserver. Also seeing the kube-apiserver full of network timeout errors:

E0526 12:29:24.410022 1 controller.go:114] loading OpenAPI spec for "v1.project.openshift.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error trying to reach service: 'read tcp 10.130.0.1:46868->10.129.0.9:8443: read: connection timed out', Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]

This is a UPI environment with a proxy. Both are usually highly suspect for configuration issues of some kind when networking is unstable.
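A hedged sketch of how the timeout path could be confirmed: the 10.129.0.9:8443 address is taken from the error above, and <master-node> is a placeholder for the master that logged it. Any HTTP response (even 403) means the pod network path works; a hang or timeout points at the SDN:

~~~
# Which node hosts the openshift-apiserver pod at 10.129.0.9?
$ oc get pods -n openshift-apiserver -o wide

# Probe that pod IP directly from the master's host network:
$ oc debug node/<master-node> -- chroot /host curl -k -m 5 https://10.129.0.9:8443/healthz
~~~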
To make progress here, moving this to the SDN team. Please investigate why we see all the timeouts on the service network.
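For the SDN triage, a rough starting point (this assumes the cluster uses OpenShiftSDN; the pod name is a placeholder):

~~~
# SDN/OVS pods should be Running on every node:
$ oc get pods -n openshift-sdn -o wide

# Errors on the node that logged the timeouts:
$ oc logs -n openshift-sdn <sdn-pod-on-that-node> -c sdn | grep -iE 'error|timeout'
~~~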
Hi Xingxing, I cannot reproduce the problem in the aos-4_5/upi-on-azure/versioned-installer-http_proxy-fips-ci env using the latest 4.5.0-0.nightly-2020-05-27-111700, so I am closing the bug now. You can reopen it if you see it again.

[weliang@weliang ~]$ oc get clusterversions
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-27-111700   True        False         22m     Cluster version is 4.5.0-0.nightly-2020-05-27-111700

[weliang@weliang ~]$ oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
weliang-273-05271317-master-0             Ready    master   47m   v1.18.2+a8e5c63
weliang-273-05271317-master-1             Ready    master   47m   v1.18.2+a8e5c63
weliang-273-05271317-master-2             Ready    master   48m   v1.18.2+a8e5c63
weliang-273-05271317-worker-centralus-1   Ready    worker   30m   v1.18.2+a8e5c63
weliang-273-05271317-worker-centralus-2   Ready    worker   30m   v1.18.2+a8e5c63
weliang-273-05271317-worker-centralus-3   Ready    worker   31m   v1.18.2+a8e5c63

[weliang@weliang ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      22m
cloud-credential                           4.5.0-0.nightly-2020-05-27-111700   True        False         False      52m
cluster-autoscaler                         4.5.0-0.nightly-2020-05-27-111700   True        False         False      40m
config-operator                            4.5.0-0.nightly-2020-05-27-111700   True        False         False      40m
console                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      19m
csi-snapshot-controller                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      12m
dns                                        4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
etcd                                       4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
image-registry                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      12m
ingress                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      29m
insights                                   4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
kube-apiserver                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      44m
kube-controller-manager                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      44m
kube-scheduler                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      44m
kube-storage-version-migrator              4.5.0-0.nightly-2020-05-27-111700   True        False         False      7m25s
machine-api                                4.5.0-0.nightly-2020-05-27-111700   True        False         False      37m
machine-approver                           4.5.0-0.nightly-2020-05-27-111700   True        False         False      43m
machine-config                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      39m
marketplace                                4.5.0-0.nightly-2020-05-27-111700   True        False         False      15m
monitoring                                 4.5.0-0.nightly-2020-05-27-111700   True        False         False      28m
network                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      46m
node-tuning                                4.5.0-0.nightly-2020-05-27-111700   True        False         False      46m
openshift-apiserver                        4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
openshift-controller-manager               4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
openshift-samples                          4.5.0-0.nightly-2020-05-27-111700   True        False         False      40m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-05-27-111700   True        False         False      15m
service-ca                                 4.5.0-0.nightly-2020-05-27-111700   True        False         False      46m
storage                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
[weliang@weliang ~]$
You cannot close it after trying only once. E.g. bug 1802481 was not always reproducible but is indeed a bug. As shown in comment 0 under "How reproducible" and in https://issues.redhat.com/browse/OCPQE-551 , it does not reproduce every time.
Rebuilt 3 jobs with the latest 4.5.0-0.nightly-2020-05-28-023530. Didn't see the issue reproduced again. Will keep monitoring; if it is hit again, I will revisit.
Hello,

I have found a similar issue in an OCP 4.5 cluster. Can you please reopen the bug for further investigation?

~~~
APIServicesAvailable: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
~~~

Thanks,
Avinash
message: 'APIServicesAvailable: "apps.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
  APIServicesAvailable: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
  APIServicesAvailable: "user.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)'
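A small sketch for checking the three aggregated APIs named in that message (the group names are taken from the condition text above):

~~~
# Availability of each APIService listed in the message:
$ for g in apps template user; do oc get apiservice v1.$g.openshift.io; done

# A not-ready APIService usually means the "api" service in openshift-apiserver has unreachable endpoints:
$ oc get endpoints api -n openshift-apiserver
~~~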
Is it hit immediately when installation finishes? Or is it fine right after installation but hit after running for some time (the bug 1825219 series of clones)?
The customer is facing this issue some time after installation finishes.
Bug 1825219#c51 has a working workaround for the bug 1825219 issue.
Created attachment 1714906 [details] openshift-apiserver -o wide
*** This bug has been marked as a duplicate of bug 1825219 ***
ychoukse, https://bugzilla.redhat.com/show_bug.cgi?id=1935591
ychoukse, it is hard to know from the limited info. You can file a bug to request Dev to triage. If you hit the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1935591#c2 , open the bug against the Networking component.