Description of problem:
After running the cluster for about one day, the following error appears when running `oc project openshift-apiserver`:
"the server is currently unable to handle the request (get projects.project.openshift.io openshift-apiserver)"

Checking the logs of the apiserver pods shows:
E0123 03:04:58.782273       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0123 03:05:01.903420       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
...

Version-Release number of selected component (if applicable):
Payload version: registry.svc.ci.openshift.org/openshift/origin-release:4.0.0-0.alpha-2019-01-22-015156
apiserver image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-23-022946@sha256:c89ba6c22943ba6793e067ac0287ddd7775020ce51e9ee6da71a738f9216ddfd

How reproducible:
Often

Steps to Reproduce:
1. Create a 4.0 cluster using the payload above.
2. Let it run for about one day.
3. Switch namespaces, for example `oc project openshift-apiserver`.

Actual results:
[core@ip-10-0-14-64 ~]$ oc project openshift-cluster-version
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get projects.project.openshift.io openshift-cluster-version)
[core@ip-10-0-14-64 ~]$ oc whoami
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get users.user.openshift.io ~)

Expected results:
Namespaces can be switched successfully.

Additional info:
The errors persist even after restarting the apiserver pods.
[core@ip-10-0-14-64 ~]$ oc get pods -n openshift-apiserver
NAME              READY   STATUS    RESTARTS   AGE
apiserver-2rftt   1/1     Running   0          2h
apiserver-gwrgb   1/1     Running   0          2h
apiserver-zn8c8   1/1     Running   0          2h

Check the logs of the apiserver pods:
[core@ip-10-0-14-64 ~]$ oc logs ds/apiserver -n openshift-apiserver
Found 3 pods, using pod/apiserver-zn8c8
...
E0123 03:04:58.782273       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0123 03:05:01.903420       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
E0123 03:05:01.903713       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
...
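For anyone triaging a report like this, the x509 errors can be confirmed by decoding the serving certificate and comparing its validity window against the current time. This is only a sketch; the `serving-cert` secret name is an assumption and may differ per release:

# Assumed secret name; adjust to whatever secret the apiserver actually mounts.
oc -n openshift-apiserver get secret serving-cert -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -startdate -enddate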
[core@ip-10-0-14-64 ~]$ oc project openshift-apiserver --loglevel=8
I0123 03:12:48.581686   23706 loader.go:359] Config loaded from file /home/core/.kube/config
I0123 03:12:48.582367   23706 loader.go:359] Config loaded from file /home/core/.kube/config
I0123 03:12:48.583185   23706 round_trippers.go:383] GET https://preserve-jian-api.qe.devcluster.openshift.com:6443/apis/project.openshift.io/v1/projects/openshift-apiserver
I0123 03:12:48.583218   23706 round_trippers.go:390] Request Headers:
I0123 03:12:48.583234   23706 round_trippers.go:393]     User-Agent: oc/v1.11.0+406fc897d8 (linux/amd64) kubernetes/406fc89
I0123 03:12:48.583253   23706 round_trippers.go:393]     Accept: application/json, */*
I0123 03:12:48.606075   23706 round_trippers.go:408] Response Status: 503 Service Unavailable in 22 milliseconds
I0123 03:12:48.606104   23706 round_trippers.go:411] Response Headers:
I0123 03:12:48.606120   23706 round_trippers.go:414]     Audit-Id: 1d8e3a6e-6a81-4f8d-b71d-a6193fc3b933
I0123 03:12:48.606138   23706 round_trippers.go:414]     Cache-Control: no-store
I0123 03:12:48.606152   23706 round_trippers.go:414]     Content-Type: text/plain; charset=utf-8
I0123 03:12:48.606166   23706 round_trippers.go:414]     X-Content-Type-Options: nosniff
I0123 03:12:48.606179   23706 round_trippers.go:414]     Content-Length: 20
I0123 03:12:48.606193   23706 round_trippers.go:414]     Date: Wed, 23 Jan 2019 03:12:48 GMT
I0123 03:12:48.606246   23706 request.go:897] Response Body: service unavailable
I0123 03:12:48.606425   23706 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request (get projects.project.openshift.io openshift-apiserver)",
  "reason": "ServiceUnavailable",
  "details": {
    "name": "openshift-apiserver",
    "group": "project.openshift.io",
    "kind": "projects",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "service unavailable"
      }
    ]
  },
  "code": 503
}]
F0123 03:12:48.606558   23706 helpers.go:119] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get projects.project.openshift.io openshift-apiserver)
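The trace shows the 503 coming back from the kube-apiserver itself, which aggregates `project.openshift.io` to the openshift-apiserver. One way to check that layer directly (a diagnostic sketch, not from the original report) is to look at the APIService status that the aggregator maintains:

# If the aggregated backend is unreachable or failing authentication,
# the Available condition should say so.
oc get apiservice v1.project.openshift.io
oc get apiservice v1.project.openshift.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'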
This is a known problem in the beta that was fixed over the course of several pull requests. To unstick yourself, try `oc -n openshift-apiserver delete pods --all`. Hopefully I found the right closure reason.
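A sketch of that workaround with a follow-up check that the replacement pods stop logging x509 errors; the grep is just an illustration, not part of the suggested fix:

oc -n openshift-apiserver delete pods --all
oc -n openshift-apiserver get pods -w        # wait for the new pods to become Ready
oc -n openshift-apiserver logs ds/apiserver | grep -c x509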
David,

No, I don't think so. As I described in the "Additional info" section, the issue still exists after `oc -n openshift-apiserver delete pods --all`; the errors persist even after the apiserver pods are restarted.

[core@ip-10-0-14-64 ~]$ oc get pods -n openshift-apiserver
NAME              READY   STATUS    RESTARTS   AGE
apiserver-2rftt   1/1     Running   0          2h
apiserver-gwrgb   1/1     Running   0          2h
apiserver-zn8c8   1/1     Running   0          2h

Also, the errors "x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid" appear to be different from what we reported in bug 1665842 for Beta-1, so I am reopening this. Please set the status to ON_QA once it is fixed.
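If restarting the pods does not clear the errors, the expired certificate is presumably one the new pods re-read from the cluster rather than a stale in-memory copy. A hypothetical sweep over the namespace's secrets, printing the expiry of each embedded certificate (the loop and field names are assumptions, not from this report):

for s in $(oc -n openshift-apiserver get secrets -o name); do
  crt=$(oc -n openshift-apiserver get "$s" -o jsonpath='{.data.tls\.crt}' 2>/dev/null)
  # Only secrets that actually carry a tls.crt entry are decoded.
  [ -n "$crt" ] && echo "$s: $(echo "$crt" | base64 -d | openssl x509 -noout -enddate)"
done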
I don't see this in master
I didn't hit the issue again; it now works as expected, so I am verifying it. Details below:

[jzhang@dhcp-140-18 ocp-14]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-13-204401   True        False         26h     Cluster version is 4.0.0-0.nightly-2019-02-13-204401
[jzhang@dhcp-140-18 ocp-14]$ oc whoami
system:admin
[jzhang@dhcp-140-18 ocp-14]$ oc project openshift-apiserver
Now using project "openshift-apiserver" on server "https://jian-14-api.qe.devcluster.openshift.com:6443".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758