Bug 1678929 - failed scraping metrics from {kube|openshift}-controller-manager-operator with "x509: certificate is valid for localhost, not metrics.*"
Summary: failed scraping metrics from {kube|openshift}-controller-manager-operator wi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: David Eads
QA Contact: Xingxing Xia
URL:
Whiteboard:
Duplicates: 1679922
Depends On:
Blocks:
Reported: 2019-02-19 22:43 UTC by Seth Jennings
Modified: 2019-06-04 10:44 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:14 UTC
Target Upstream Version:


Attachments
must-gather-kube-controller-manager-operator-20190219.tar.gz (777.30 KB, application/gzip)
2019-02-19 22:43 UTC, Seth Jennings


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:0758 | None | None | None | 2019-06-04 10:44:20 UTC

Description Seth Jennings 2019-02-19 22:43:43 UTC
Created attachment 1536541
must-gather-kube-controller-manager-operator-20190219.tar.gz

openshift-controller-manager-operator/openshift-controller-manager-operator scrape fails with:
Get https://10.130.0.14:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-controller-manager-operator.svc

openshift-kube-controller-manager-operator/kube-controller-manager-operator scrape fails with:
Get https://10.129.0.7:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-kube-controller-manager-operator.svc


See attached must-gather.
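
Both errors say the same thing: the serving certificate presented on port 8443 lists only localhost, while the scraper validates it against the service DNS name. A minimal sketch for checking this by hand is below. `cert_lists_name` is a hypothetical helper, not part of any OpenShift tooling. It does exact matching of DNS entries only (no wildcard handling), and the commented usage assumes the pod IP from the error is reachable from wherever the command runs.

```shell
# Hypothetical helper: succeed if the certificate text on stdin lists the
# given DNS name as a Subject Alternative Name.
# Input is expected to be the output of `openssl x509 -noout -text`.
cert_lists_name() {
  name="$1"
  # Put each comma-separated SAN entry on its own line, trim surrounding
  # whitespace, then look for an exact "DNS:<name>" line.
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fqx "DNS:${name}"
}

# Usage (assumption: run from a host that can reach the pod IP):
#   openssl s_client -connect 10.130.0.14:8443 \
#       -servername metrics.openshift-controller-manager-operator.svc \
#       </dev/null 2>/dev/null \
#     | openssl x509 -noout -text \
#     | cert_lists_name metrics.openshift-controller-manager-operator.svc \
#     && echo "SAN present" || echo "SAN missing"
```

If the helper reports the name as missing while `localhost` is present, the endpoint is most likely still serving a fallback certificate rather than the one issued for the service.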

Comment 1 David Eads 2019-02-20 00:32:37 UTC
Michal: it looks like the reactor may not be working properly, which is interesting because CI shows this working on every run. For example: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/268/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws/1229/artifacts/e2e-aws/pods/

Seth: anything unusual about the kubelet configuration there?  Is there a way for us to check to see if the kubelet is properly providing the new files on disk?

Comment 2 David Eads 2019-02-20 20:56:25 UTC
Got a candidate fix: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/169

Comment 3 Junqi Zhao 2019-02-25 00:54:41 UTC
*** Bug 1679922 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2019-02-26 17:30:19 UTC
I am no longer seeing this on a cluster that has been up for 2.5 hours (post rotation).

Comment 6 Michal Fojtik 2019-03-26 10:34:53 UTC
There were numerous cert rotation fixes since this was tested and we also extended the rotation period.

Comment 7 Xingxing Xia 2019-04-01 10:22:10 UTC
Verified in 4.0.0-0.nightly-2019-03-28-030453 per comment 5:
$ prometheus_route=$(oc -n openshift-monitoring  get route | grep prometheus-k8s | awk '{print $2}')
$ curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://${prometheus_route}/targets | grep -i x509
Didn't see the error, and the environment has been up for 7+ hours.
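
The check above can be wrapped so that it prints each distinct x509 scrape error and returns a usable exit status. This is a sketch under assumptions: `x509_scrape_errors` is a hypothetical helper, the route is assumed to be named `prometheus-k8s` (as the grep above implies), and the Prometheus `/api/v1/targets` endpoint is assumed to be reachable through that route with the same bearer token.

```shell
# Hypothetical helper: print each distinct x509 scrape error found on stdin.
# Works on the HTML /targets page or the JSON from /api/v1/targets, because
# the match stops at the next double quote or '<'.
# Returns non-zero when no x509 error is present.
x509_scrape_errors() {
  # `|| true` keeps the helper usable under `set -e` or pipefail
  # when grep finds nothing.
  errors=$(grep -io 'x509: [^"<]*' | sort -u) || true
  if [ -n "$errors" ]; then
    printf '%s\n' "$errors"
  else
    return 1
  fi
}

# Usage (assumptions: logged in with oc, route and endpoint as described above):
#   host=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')
#   token=$(oc sa get-token prometheus-k8s -n openshift-monitoring)
#   curl -sk -H "Authorization: Bearer ${token}" "https://${host}/api/v1/targets" \
#     | x509_scrape_errors || echo "no x509 scrape errors"
```

Reading the route host with `-o jsonpath` avoids depending on the column layout of `oc get route`, which the `grep | awk` form relies on.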

Comment 9 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

