Bug 1680594

Summary: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Product: OpenShift Container Platform
Reporter: Mario Vázquez <mavazque>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: unspecified
Version: 4.1.0
CC: mloibl, sponnaga, surbania
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:44:27 UTC
Type: Bug

Description Mario Vázquez 2019-02-25 11:43:40 UTC

Comment 1 Mario Vázquez 2019-02-25 11:49:46 UTC
Sorry, I hit Enter by mistake.

Description of problem:

When trying to query all resources using the ketall[1] tool, I hit the error in $SUBJECT.

$ oc get apiservices v1beta1.metrics.k8s.io -o yaml

Using the command above, I identified the service that backs this API service: prometheus-adapter in this case.
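
For reference, a minimal sketch of how the backing service and its availability could be read from the same object when it is fetched with `-o json` instead of `-o yaml`. The field names (`spec.service`, `status.conditions`) are those of the Kubernetes APIService type; the function name `backing_service` and the `SAMPLE` object are hypothetical and only illustrate the shape, they are not output captured from this cluster.

```python
import json


def backing_service(apiservice_json: str) -> dict:
    """Return the service behind an APIService and its Available condition."""
    obj = json.loads(apiservice_json)
    # spec.service is absent for APIs served locally by kube-apiserver.
    svc = obj.get("spec", {}).get("service") or {}
    conditions = obj.get("status", {}).get("conditions", [])
    available = next((c for c in conditions if c.get("type") == "Available"), {})
    return {
        "namespace": svc.get("namespace"),
        "name": svc.get("name"),
        "available": available.get("status"),
        "reason": available.get("reason"),
        "message": available.get("message"),
    }


# Illustrative input only (assumed values, not taken from the affected cluster).
SAMPLE = json.dumps({
    "metadata": {"name": "v1beta1.metrics.k8s.io"},
    "spec": {"service": {"namespace": "openshift-monitoring",
                         "name": "prometheus-adapter"}},
    "status": {"conditions": [{"type": "Available", "status": "False"}]},
})

if __name__ == "__main__":
    print(backing_service(SAMPLE))
```

An `Available` status other than `"True"` would be consistent with the "unable to handle the request" error seen during API discovery.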

Looking at the prometheus-adapter pod logs, I see these errors:

E0225 11:46:10.479962       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0225 11:46:14.639511       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0225 11:46:14.639553       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0225 11:46:14.639511       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0225 11:46:14.639514       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
E0225 11:46:14.639520       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
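
To gauge how often these errors recur (for example, before and after restarting the pod), the log lines can be tallied per second. The following is a rough sketch that assumes the klog-style line layout shown above; the function name `x509_failures_per_second` and the regular expression are illustrative, not part of any tool mentioned in this report.

```python
import re
from collections import Counter

# klog-style header: severity, MMDD, HH:MM:SS.micros, thread id, file:line]
KLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\.\d+\s+\d+ (?P<src>[^\]]+)\] (?P<msg>.*)$"
)

NEEDLE = "x509: certificate signed by unknown authority"


def x509_failures_per_second(lines):
    """Count error-level lines mentioning the x509 failure, keyed by HH:MM:SS."""
    counts = Counter()
    for line in lines:
        m = KLOG.match(line.strip())
        if m and m["sev"] == "E" and NEEDLE in m["msg"]:
            counts[m["time"]] += 1
    return counts
```

The function accepts any iterable of lines, so it can be fed an open file containing the saved pod log.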

The cluster was installed using UHC.


[1] https://github.com/corneliusweig/ketall

Version-Release number of selected component (if applicable):


How reproducible:
Deploy cluster using UHC
Run the ketall[1] tool, e.g.: ketall --only-scope=namespace --namespace=openshift-monitoring


Steps to Reproduce:
1. Deploy cluster using UHC
2. Download ketall[1]
3. Run ketall: ketall --only-scope=namespace --namespace=openshift-monitoring

Actual results:
unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Expected results:
Object list is returned.

Additional info:

Comment 2 Frederic Branczyk 2019-02-25 12:26:23 UTC
Which version of the payload is this? https://bugzilla.redhat.com/show_bug.cgi?id=1670994 was verified as fixed by QE in the latest payload versions. I suspect this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1670994#c22.

Comment 3 Mario Vázquez 2019-02-25 14:33:57 UTC
Name:         version
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterVersion
Metadata:
  Creation Timestamp:  2019-02-21T17:47:42Z
  Generation:          1
  Resource Version:    1257055
  Self Link:           /apis/config.openshift.io/v1/clusterversions/version
  UID:                 c9879eda-3600-11e9-ba3e-0ed526041df0
Spec:
  Channel:     fast
  Cluster ID:  9897d1b5-c5c8-49b1-91f5-4c0bdbaf49c2
  Upstream:    http://localhost:8080/graph
Status:
  Available Updates:  <nil>
  Conditions:
    Last Transition Time:  2019-02-21T17:58:02Z
    Message:               Done applying 4.0.0-0.2
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-02-22T16:04:32Z
    Status:                False
    Type:                  Failing
    Last Transition Time:  2019-02-21T17:58:02Z
    Message:               Cluster version is 4.0.0-0.2
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-02-21T17:47:58Z
    Message:               Unable to retrieve available updates: Get http://localhost:8080/graph: dial tcp [::1]:8080: connect: connection refused
    Reason:                RemoteFailed
    Status:                False
    Type:                  RetrievedUpdates
  Desired:
    Image:     quay.io/openshift-release-dev/ocp-release@sha256:8580a118ce951dd241e4a4b73a0e5f4cda3b56088b6c1ab56ccadbf8e270fb1d
    Version:   4.0.0-0.2
  Generation:  1
  History:
    Completion Time:  2019-02-21T17:58:02Z
    Image:            quay.io/openshift-release-dev/ocp-release@sha256:8580a118ce951dd241e4a4b73a0e5f4cda3b56088b6c1ab56ccadbf8e270fb1d
    Started Time:     2019-02-21T17:47:58Z
    State:            Completed
    Version:          4.0.0-0.2
  Version Hash:       Q52DgDhafr0=
Events:               <none>
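
The payload version asked about above is the `Desired` entry under `Status`. As a small sketch, assuming the same object is fetched as JSON (for example with `oc get clusterversion version -o json`), it could be extracted as follows. The helper name `payload_version` is hypothetical; the sample values are copied from the output above.

```python
import json


def payload_version(clusterversion_json: str):
    """Return (version, image) from status.desired of a ClusterVersion object."""
    obj = json.loads(clusterversion_json)
    desired = obj.get("status", {}).get("desired", {})
    return desired.get("version"), desired.get("image")


# Trimmed-down object carrying only the fields used here.
SAMPLE = json.dumps({
    "kind": "ClusterVersion",
    "status": {
        "desired": {
            "version": "4.0.0-0.2",
            "image": "quay.io/openshift-release-dev/ocp-release@sha256:"
                     "8580a118ce951dd241e4a4b73a0e5f4cda3b56088b6c1ab56ccadbf8e270fb1d",
        }
    },
})

if __name__ == "__main__":
    version, image = payload_version(SAMPLE)
    print(version, image)
```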


Deleting the prometheus-adapter pod fixes the issue temporarily; after a while, the certificate errors start flooding the pod's log again.

Comment 4 Frederic Branczyk 2019-02-25 14:44:56 UTC
Which version of OpenShift 4 did you install here? This looks identical to https://bugzilla.redhat.com/show_bug.cgi?id=1670994, whose fix merged only a few days before the timestamps I'm seeing in the above object. Could you test with the latest payload, please? QE did that today and verified that everything works.

Comment 10 Mario Vázquez 2019-03-07 07:55:24 UTC
Hi team, 

We've tested with the latest version deployed by UHC, and we confirm this is fixed.

Thanks.

Comment 13 errata-xmlrpc 2019-06-04 10:44:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758