Bug 1668632 - [Nextgen] "Unable to authenticate the request due to an error ... x509: certificate signed by unknown authority"
Summary: [Nextgen] "Unable to authenticate the request due to an error ... x509: certi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-23 08:38 UTC by Xingxing Xia
Modified: 2019-06-04 10:42 UTC (History)
11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:42:08 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:42:23 UTC)

Internal Links: 1625194

Description Xingxing Xia 2019-01-23 08:38:42 UTC
Description of problem:
We often encounter environments that hit "metrics.k8s.io/v1beta1: the server is currently unable to handle the request".
Debugging with `oc get apiservices -o=custom-columns="name:.metadata.name,namespace:.spec.service.namespace,status:.status.conditions[0].status"` shows that v1beta1.metrics.k8s.io has status False (all the others have status True).
Checking the log of its backend pod then shows:
"Unable to authenticate the request due to an error ... x509: certificate signed by unknown authority". This log was found in the environment for https://bugzilla.redhat.com/show_bug.cgi?id=1625194#c9, the environment for https://bugzilla.redhat.com/show_bug.cgi?id=1667030, and my environment today.

After noticing https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25, I tried `oc delete pod prometheus-adapter-... -n openshift-monitoring`, and the problem went away.
Several days have passed since that fix, but the error still shows up in the metrics pod, hence this bug.
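
My guess at the mechanism (an assumption on my side, not confirmed from the adapter sources): if the authenticating server reads its client-CA bundle only once at startup, a CA rotated afterwards is never picked up, so only a pod restart clears the error. A sketch of that pattern:

// Hypothetical sketch (not the actual prometheus-adapter code): a server
// that loads its client-CA bundle once at startup keeps verifying requests
// against the old pool even after the CA file is rotated on disk, so client
// certs signed by the new CA fail with "x509: certificate signed by unknown
// authority" until the process (pod) is restarted.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Illustrative mount path for the client-CA bundle (e.g. projected from
	// the kube-system/extension-apiserver-authentication ConfigMap).
	const caPath = "/etc/tls/client-ca/ca.crt"

	pem, err := os.ReadFile(caPath) // read exactly once, at startup
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(pem)

	srv := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			// The pool is captured here; a later rotation of caPath on disk is
			// never re-read, which would explain why deleting the pod "fixes"
			// the error. A reload-aware server would use
			// tls.Config.GetConfigForClient (or a file watcher) to pick up the
			// rotated bundle without a restart.
			ClientAuth: tls.VerifyClientCertIfGiven,
			ClientCAs:  pool,
		},
	}
	log.Fatal(srv.ListenAndServeTLS("/etc/tls/serving/tls.crt", "/etc/tls/serving/tls.key"))
}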

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-01-18-115403   True        False         1d        Cluster version is 4.0.0-0.nightly-2019-01-18-115403

How reproducible:
Seems frequent; the exact conditions to reproduce it are not clear.

Steps to Reproduce:
1. Create a nextgen env

2. Check `oc api-resources`,
or check `oc logs ds/apiserver -n openshift-apiserver`.
When this issue occurs, `oc api-resources` will show:
"unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request".

`oc logs ds/apiserver -n openshift-apiserver` will show:
"E0123 06:08:29.395219       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request"

3. Then check the backend pods of v1beta1.metrics.k8s.io (a programmatic version of this check is sketched after these steps):
oc get apiservices v1beta1.metrics.k8s.io -o yaml
...
  service:
    name: prometheus-adapter
    namespace: openshift-monitoring

oc logs deployment/prometheus-adapter -n openshift-monitoring
...
E0123 04:17:43.075225       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0123 04:17:59.170792       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
...
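
For completeness, the same availability check as the custom-columns query in the description can be done programmatically with the upstream kube-aggregator clientset; the sketch below uses the standard upstream package paths and is only illustrative, not code from this bug.

// Illustrative sketch: list every APIService and report the ones whose
// Available condition is not True (e.g. v1beta1.metrics.k8s.io in this bug),
// together with the backing service, reason, and message.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	apiregv1 "k8s.io/kube-aggregator/pkg/apis/apiregistration/v1"
	aggregatorclient "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := aggregatorclient.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	list, err := client.ApiregistrationV1().APIServices().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, svc := range list.Items {
		for _, cond := range svc.Status.Conditions {
			if cond.Type == apiregv1.Available && cond.Status != apiregv1.ConditionTrue {
				ns, name := "", ""
				if svc.Spec.Service != nil { // nil for locally served groups
					ns, name = svc.Spec.Service.Namespace, svc.Spec.Service.Name
				}
				fmt.Printf("%s backed by %s/%s: %s (%s)\n",
					svc.Name, ns, name, cond.Reason, cond.Message)
			}
		}
	}
}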

Actual results:
3. The metrics backend pod log shows the error "Unable to authenticate the request due to an error", which causes other issues: https://bugzilla.redhat.com/show_bug.cgi?id=1625194#c9 and https://bugzilla.redhat.com/show_bug.cgi?id=1667030

Expected results:
Apiservice v1beta1.metrics.k8s.io and its backend pod are in good condition, without the error.

Additional info:

Comment 6 Jian Zhang 2019-01-25 03:18:01 UTC
Service Catalog has the same issue; details are in bug 1668534.

[core@ip-10-0-8-244 ~]$ oc get pods
NAME                                 READY     STATUS             RESTARTS   AGE
apiserver-849f76f4b6-n7dnr           2/2       Running            3          17h
caddy-docker                         1/1       Running            0          17h
centos-pod                           1/1       Running            0          17h
controller-manager-64b8dd67d-59msf   0/1       CrashLoopBackOff   17         1h
[core@ip-10-0-8-244 ~]$ oc logs apiserver-849f76f4b6-n7dnr -c apiserver
...
E0125 02:59:46.196362       1 authentication.go:62] Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
I0125 02:59:46.196411       1 wrap.go:42] GET /: (79.437µs) 401 [Go-http-client/2.0 10.130.0.1:44610]
I0125 02:59:50.689977       1 run_server.go:127] etcd checker called

Comment 9 Xingxing Xia 2019-01-29 02:52:58 UTC
(In reply to Xingxing Xia from comment #0)
> Noticed https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 , thus I
> tried `oc delete pod prometheus-adapter-... -n openshift-monitoring`, the
> problem then is gone.
Hit this again in the latest payload 4.0.0-0.nightly-2019-01-25-205123. Although the workaround can resolve it, the underlying issue is serious enough, so adding beta2blocker.

Comment 18 Xingxing Xia 2019-02-19 02:17:36 UTC
Verified in 4.0.0-0.nightly-2019-02-17-024922, which contains the above PR, following the comment 0 steps.

Comment 21 errata-xmlrpc 2019-06-04 10:42:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

