Description of problem:
Often encountered envs that showed "metrics.k8s.io/v1beta1: the server is currently unable to handle the request". Debugged it with:
  `oc get apiservices -o=custom-columns="name:.metadata.name,namespace:.spec.service.namespace,status:.status.conditions[0].status"`
Found v1beta1.metrics.k8s.io has status False (all the others have status True). Then checked its backend pod log and found:
  "Unable to authenticate the request due to an error ... x509: certificate signed by unknown authority"
BTW, this log was found in the env for https://bugzilla.redhat.com/show_bug.cgi?id=1625194#c9 , in the env for https://bugzilla.redhat.com/show_bug.cgi?id=1667030 , and in my env today.
Noticed https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 , thus I tried `oc delete pod prometheus-adapter-... -n openshift-monitoring`, and the problem then went away.
Several days after that fix, the error is still found in the metrics pod, thus opening this bug.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-01-18-115403   True        False         1d      Cluster version is 4.0.0-0.nightly-2019-01-18-115403

How reproducible:
Often, though the exact reproducing conditions are not clear.

Steps to Reproduce:
1. Create a nextgen env.
2. Check `oc api-resources`, or check `oc logs ds/apiserver -n openshift-apiserver`. When this issue occurs, `oc api-resources` shows:
   "unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request"
   and `oc logs ds/apiserver -n openshift-apiserver` shows:
   E0123 06:08:29.395219 1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
3. Then check the backend pods of v1beta1.metrics.k8s.io:
oc get apiservices v1beta1.metrics.k8s.io -o yaml
...
  service:
    name: prometheus-adapter
    namespace: openshift-monitoring

oc logs deployment/prometheus-adapter -n openshift-monitoring
...
E0123 04:17:43.075225 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0123 04:17:59.170792 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
...

Actual results:
3. The metrics backend pod log shows the error "Unable to authenticate the request due to an error", which causes other issues:
https://bugzilla.redhat.com/show_bug.cgi?id=1625194#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1667030

Expected results:
apiservice/v1beta1.metrics.k8s.io and its backend pod are in good condition, without the error.

Additional info:
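For completeness, here is a minimal shell sketch of the triage flow above (my own summary, not an official procedure). It assumes the backing Deployment shares the Service's name, which holds for prometheus-adapter but may not for every apiservice:

# List apiservices whose first status condition is not True, then grep the
# backing workload's recent logs for x509 errors.
for apisvc in $(oc get apiservices -o name); do
  status=$(oc get "$apisvc" -o jsonpath='{.status.conditions[0].status}')
  [ "$status" = "True" ] && continue
  ns=$(oc get "$apisvc" -o jsonpath='{.spec.service.namespace}')
  svc=$(oc get "$apisvc" -o jsonpath='{.spec.service.name}')
  [ -z "$ns" ] && continue   # locally served groups have no backing service
  echo "$apisvc: status=$status, backend service=$ns/$svc"
  # assumes a Deployment named after the service (true for prometheus-adapter)
  oc logs "deployment/$svc" -n "$ns" --tail=20 2>/dev/null | grep -i x509
done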
Service catalog has the same issue; details in bug 1668534.

[core@ip-10-0-8-244 ~]$ oc get pods
NAME                                 READY     STATUS             RESTARTS   AGE
apiserver-849f76f4b6-n7dnr           2/2       Running            3          17h
caddy-docker                         1/1       Running            0          17h
centos-pod                           1/1       Running            0          17h
controller-manager-64b8dd67d-59msf   0/1       CrashLoopBackOff   17         1h

[core@ip-10-0-8-244 ~]$ oc logs apiserver-849f76f4b6-n7dnr -c apiserver
...
E0125 02:59:46.196362 1 authentication.go:62] Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
I0125 02:59:46.196411 1 wrap.go:42] GET /: (79.437µs) 401 [Go-http-client/2.0 10.130.0.1:44610]
I0125 02:59:50.689977 1 run_server.go:127] etcd checker called
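One extra check that may help when chasing these x509 failures (an assumption on my side, not something taken from the report): aggregated apiservers such as prometheus-adapter and the service catalog apiserver authenticate incoming requests against the client CAs published in the kube-system/extension-apiserver-authentication configmap, so looking at what is currently published there can show whether a pod started before the CA was rotated is holding a stale copy:

# Show the subject and expiry of the first requestheader client CA currently published
oc get configmap extension-apiserver-authentication -n kube-system \
  -o jsonpath='{.data.requestheader-client-ca-file}' | openssl x509 -noout -subject -enddate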
(In reply to Xingxing Xia from comment #0)
> Noticed https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 , thus I
> tried `oc delete pod prometheus-adapter-... -n openshift-monitoring`, the
> problem then is gone.

Hit this again in the latest payload 4.0.0-0.nightly-2019-01-25-205123. Although that workaround clears the symptom, the issue itself is serious enough on its own, so adding beta2blocker.
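For anyone hitting this before a fix lands, a sketch of the workaround from comment 0 (it only clears the symptom, not the root cause; the grep on the pod name is my shortcut, adjust if the naming differs):

# Restart the prometheus-adapter pods so they re-read the client CA,
# then confirm the apiservice reports Available again.
oc get pods -n openshift-monitoring -o name | grep prometheus-adapter \
  | xargs oc delete -n openshift-monitoring
oc get apiservice v1beta1.metrics.k8s.io \
  -o jsonpath='{.status.conditions[0].type}={.status.conditions[0].status}{"\n"}'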
Verified in 4.0.0-0.nightly-2019-02-17-024922, which contains the above PR, following the comment 0 steps.
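A quick re-check along the lines of comment 0 (a sketch of what I would run, not the formal verification steps):

oc api-resources 2>&1 | grep -i "unable to retrieve" || echo "no aggregated API discovery errors"
oc get apiservice v1beta1.metrics.k8s.io -o jsonpath='{.status.conditions[0].status}{"\n"}'
oc logs deployment/prometheus-adapter -n openshift-monitoring --tail=50 | grep x509 || echo "no x509 errors in recent adapter logs"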
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758