Description of problem:
The issue is found in https://issues.redhat.com/browse/OCPQE-551 .

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-05-23-224107

How reproducible:
Reproduced 2 of 3 times.

Steps to Reproduce:
1. Launch a 4_5/upi-on-azure/versioned-installer-http_proxy-fips-ci env
2. After installation completes, check the COs

Actual results:
2.
$ oc get co | grep -v "True.*False.*False"; oc get no
monitoring            4.5.0-0.nightly-2020-05-23-224107   False   True    True    17m
openshift-apiserver   4.5.0-0.nightly-2020-05-23-224107   False   False   False   25m

Expected results:
2. All COs are normal.

Additional info:
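For reference, a minimal sketch of follow-up checks for a cluster in this state (nothing below is specific to this bug; the operator and namespace names are the ones already shown above):

~~~
# Show why the openshift-apiserver CO reports Available=False (condition messages):
$ oc get clusteroperator openshift-apiserver -o yaml

# See which aggregated API services are failing the availability check:
$ oc get apiservice | grep -i false

# And the openshift-apiserver pods that back them:
$ oc get pods -n openshift-apiserver -o wide
~~~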
Intended to attach must-gather stuff, but failed:

$ oc adm must-gather --dest-dir ocpqe-551-oas-false
[must-gather ] OUT the server was unable to return a response in the time allotted, but may still be processing the request (get imagestreams.image.openshift.io must-gather)
[must-gather ] OUT
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather ] OUT namespace/openshift-must-gather-5svv9 created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-c2l6q created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-c2l6q deleted
[must-gather ] OUT namespace/openshift-must-gather-5svv9 deleted
Error from server (Forbidden): pods "must-gather-" is forbidden: error looking up service account openshift-must-gather-5svv9/default: serviceaccount "default" not found

$ oc get ns openshift-must-gather-5svv9
NAME                          STATUS        AGE
openshift-must-gather-5svv9   Terminating   86s

$ oc get serviceaccount -n openshift-must-gather-5svv9
NAME       SECRETS   AGE
builder    0         2m27s
deployer   0         2m27s

$ oc create namespace xxia-test
namespace/xxia-test created

$ oc get sa -n xxia-test
NAME       SECRETS   AGE
builder    0         6s
deployer   0         6s

$ oc get po -n openshift-apiserver-operator
NAME                                            READY   STATUS    RESTARTS   AGE
openshift-apiserver-operator-74557659fc-f2bqj   1/1     Running   0          91m

$ oc logs -n openshift-apiserver-operator openshift-apiserver-operator-74557659fc-f2bqj
...
I0526 11:29:02.705944 1 status_controller.go:172] clusteroperator/openshift-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2020-05-26T10:01:34Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-05-26T09:28:44Z","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2020-05-26T11:25:55Z","message":"APIServicesAvailable: \"template.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)","reason":"APIServices_Error","status":"False","type":"Available"},{"lastTransitionTime":"2020-05-26T09:24:07Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
E0526 11:29:02.706068 1 base_controller.go:180] "APIServiceController_openshift-apiserver" controller failed to sync "key", err: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
I0526 11:29:02.722147 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"7ea12c68-f2e2-4eeb-ba24-6c98a4673518", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available message changed from "APIServicesAvailable: \"authorization.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"image.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"oauth.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"quota.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)\nAPIServicesAvailable: \"route.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)" to "APIServicesAvailable: \"template.openshift.io.v1\" is not ready: 503 (the server is currently unable to handle the request)"
I0526 11:29:02.912739 1 request.go:621] Throttling request took 1.595125864s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver/pods?labelSelector=apiserver%3Dtrue
I0526 11:29:02.973079 1 status_controller.go:172] clusteroperator/openshift-apiserver diff {"status":{"conditions":[{"lastTransitionTime":"2020-05-26T10:01:34Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-05-26T09:28:44Z","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2020-05-26T11:29:02Z","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2020-05-26T09:24:07Z","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0526 11:29:02.986210 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"7ea12c68-f2e2-4eeb-ba24-6c98a4673518", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-apiserver changed: Available changed from False to True ("")
I0526 11:29:03.912950 1 request.go:621] Throttling request took 1.595346066s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:05.112723 1 request.go:621] Throttling request took 1.595419367s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver
I0526 11:29:06.112747 1 request.go:621] Throttling request took 1.595551867s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:07.312732 1 request.go:621] Throttling request took 1.596010072s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:08.512894 1 request.go:621] Throttling request took 1.597202383s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver/configmaps/image-import-ca
I0526 11:29:09.512929 1 request.go:621] Throttling request took 1.595750033s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:10.117343 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-apiserver-operator", Name:"openshift-apiserver-operator", UID:"7ea12c68-f2e2-4eeb-ba24-6c98a4673518", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OpenShiftAPICheckFailed' "authorization.openshift.io.v1" failed with HTTP status code 503 (the server is currently unable to handle the request)
I0526 11:29:10.712803 1 request.go:621] Throttling request took 1.396820581s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:11.712848 1 request.go:621] Throttling request took 1.394234356s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?labelSelector=encryption.apiserver.operator.openshift.io%2Fcomponent%3Dopenshift-apiserver
I0526 11:29:12.712870 1 request.go:621] Throttling request took 1.394905463s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-image-registry/configmaps/image-registry-certificates
...

$ oc cp /bin/oc -n openshift-kube-apiserver kube-apiserver-xxia-ocpqe-551-05260854-master-0:/tmp/oc
Defaulting container name to kube-apiserver.

$ oc rsh -n openshift-kube-apiserver kube-apiserver-xxia-ocpqe-551-05260854-master-0
sh-4.2# /tmp/oc get --insecure-skip-tls-verify --raw "/" --server https://10.128.0.9:8443
Error from server (Forbidden): forbidden: User "system:anonymous" cannot get path "/"    # different from bug 1825219
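Since must-gather could not run here (the default service account never appeared in the new namespace), here is a rough sketch of lighter-weight data that could still be collected for the failing aggregated API. The APIService name comes from the operator log above; the --dest-dir value is only an example:

~~~
# Inspect the APIService the operator reports as not ready:
$ oc get apiservice v1.template.openshift.io -o yaml

# The aggregated APIs are backed by the "api" service in openshift-apiserver;
# verify it actually has endpoints behind it:
$ oc get svc,endpoints api -n openshift-apiserver

# Scoped inspect of just this operator, instead of a full must-gather:
$ oc adm inspect clusteroperator/openshift-apiserver --dest-dir=inspect-openshift-apiserver
~~~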
Seeing openshift-apiserver failing to authorize requests when talking to the kube-apiserver. Also seeing the kube-apiserver full of network timeout errors:

E0526 12:29:24.410022 1 controller.go:114] loading OpenAPI spec for "v1.project.openshift.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error trying to reach service: 'read tcp 10.130.0.1:46868->10.129.0.9:8443: read: connection timed out', Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]

This is a UPI environment with a proxy. Both are usually highly suspect for configuration issues of some kind when networking is unstable.
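A hedged sketch of how the timeout path could be confirmed: the 10.129.0.9:8443 address is taken from the error above, and <master-node> is a placeholder for the master that logged it. Any HTTP response (even 403) means the pod network path works; a hang or timeout points at the SDN:

~~~
# Which node hosts the openshift-apiserver pod at 10.129.0.9?
$ oc get pods -n openshift-apiserver -o wide

# Probe that pod IP directly from the master's host network:
$ oc debug node/<master-node> -- chroot /host curl -k -m 5 https://10.129.0.9:8443/healthz
~~~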
To make progress here, moving this to the SDN team. Please investigate why we see all the timeouts on the service network.
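For the SDN triage, a rough starting point (this assumes the cluster uses OpenShiftSDN; the pod name is a placeholder):

~~~
# SDN/OVS pods should be Running on every node:
$ oc get pods -n openshift-sdn -o wide

# Errors on the node that logged the timeouts:
$ oc logs -n openshift-sdn <sdn-pod-on-that-node> -c sdn | grep -iE 'error|timeout'
~~~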
Hi Xingxing, I cannot reproduce the problem in the aos-4_5/upi-on-azure/versioned-installer-http_proxy-fips-ci env using the latest 4.5.0-0.nightly-2020-05-27-111700, so I am closing the bug now. You can reopen it if you see it again.

[weliang@weliang ~]$ oc get clusterversions
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-27-111700   True        False         22m     Cluster version is 4.5.0-0.nightly-2020-05-27-111700

[weliang@weliang ~]$ oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
weliang-273-05271317-master-0             Ready    master   47m   v1.18.2+a8e5c63
weliang-273-05271317-master-1             Ready    master   47m   v1.18.2+a8e5c63
weliang-273-05271317-master-2             Ready    master   48m   v1.18.2+a8e5c63
weliang-273-05271317-worker-centralus-1   Ready    worker   30m   v1.18.2+a8e5c63
weliang-273-05271317-worker-centralus-2   Ready    worker   30m   v1.18.2+a8e5c63
weliang-273-05271317-worker-centralus-3   Ready    worker   31m   v1.18.2+a8e5c63

[weliang@weliang ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      22m
cloud-credential                           4.5.0-0.nightly-2020-05-27-111700   True        False         False      52m
cluster-autoscaler                         4.5.0-0.nightly-2020-05-27-111700   True        False         False      40m
config-operator                            4.5.0-0.nightly-2020-05-27-111700   True        False         False      40m
console                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      19m
csi-snapshot-controller                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      12m
dns                                        4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
etcd                                       4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
image-registry                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      12m
ingress                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      29m
insights                                   4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
kube-apiserver                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      44m
kube-controller-manager                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      44m
kube-scheduler                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      44m
kube-storage-version-migrator              4.5.0-0.nightly-2020-05-27-111700   True        False         False      7m25s
machine-api                                4.5.0-0.nightly-2020-05-27-111700   True        False         False      37m
machine-approver                           4.5.0-0.nightly-2020-05-27-111700   True        False         False      43m
machine-config                             4.5.0-0.nightly-2020-05-27-111700   True        False         False      39m
marketplace                                4.5.0-0.nightly-2020-05-27-111700   True        False         False      15m
monitoring                                 4.5.0-0.nightly-2020-05-27-111700   True        False         False      28m
network                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      46m
node-tuning                                4.5.0-0.nightly-2020-05-27-111700   True        False         False      46m
openshift-apiserver                        4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
openshift-controller-manager               4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
openshift-samples                          4.5.0-0.nightly-2020-05-27-111700   True        False         False      40m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-05-27-111700   True        False         False      45m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-05-27-111700   True        False         False      15m
service-ca                                 4.5.0-0.nightly-2020-05-27-111700   True        False         False      46m
storage                                    4.5.0-0.nightly-2020-05-27-111700   True        False         False      41m
[weliang@weliang ~]$
You cannot close it after trying only once. E.g. bug 1802481 was not always reproducible but is indeed a bug. As shown in comment 0 under "How reproducible" and in https://issues.redhat.com/browse/OCPQE-551 , it does not reproduce every time.
Rebuilt 3 jobs with the latest 4.5.0-0.nightly-2020-05-28-023530. Didn't see the issue reproduced again. Will keep monitoring; if it is hit again, I will revisit.
Hello,

I have found a similar issue in an OCP 4.5 cluster. Can you please reopen the bug for further investigation?

~~~
APIServicesAvailable: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
~~~

Thanks,
Avinash
message: 'APIServicesAvailable: "apps.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
  APIServicesAvailable: "template.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
  APIServicesAvailable: "user.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)'
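A small sketch for checking the three aggregated APIs named in that message (the group names are taken from the condition text above):

~~~
# Availability of each APIService listed in the message:
$ for g in apps template user; do oc get apiservice v1.$g.openshift.io; done

# A not-ready APIService usually means the "api" service in openshift-apiserver has unreachable endpoints:
$ oc get endpoints api -n openshift-apiserver
~~~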
Is it hit immediately when installation finishes? Or is it fine right after installation but hit after running for some time (the bug 1825219 series of clones)?
The customer is facing this issue some time after installation finishes.
Bug 1825219#c51 has a working workaround for the bug 1825219 issue.
Created attachment 1714906 [details] openshift-apiserver -o wide
*** This bug has been marked as a duplicate of bug 1825219 ***
ychoukse, https://bugzilla.redhat.com/show_bug.cgi?id=1935591
ychoukse, it is hard to know from the limited info. You can file a bug to request Dev to triage. If you hit the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1935591#c2 , open the bug against the Networking component.