Bug 1678929

Summary: failed scraping metrics from {kube|openshift}-controller-manager-operator with "x509: certificate is valid for localhost, not metrics.*"
Product: OpenShift Container Platform
Reporter: Seth Jennings <sjenning>
Component: Master
Assignee: David Eads <deads>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, deads, jokerman, juzhao, mfojtik, mmccomas
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
must-gather-kube-controller-manager-operator-20190219.tar.gz (flags: none)

Description Seth Jennings 2019-02-19 22:43:43 UTC
Created attachment 1536541: must-gather-kube-controller-manager-operator-20190219.tar.gz

openshift-controller-manager-operator/openshift-controller-manager-operator scrape fails with:
Get https://10.130.0.14:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-controller-manager-operator.svc

openshift-kube-controller-manager-operator/kube-controller-manager-operator scrape fails with:
Get https://10.129.0.7:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-kube-controller-manager-operator.svc


See attached must-gather.
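One way to confirm the mismatch (a sketch; substitute the actual operator pod name, and the 8443 port is taken from the scrape errors above) is to port-forward to the operator pod and dump the SANs on the certificate it actually presents:

$ oc -n openshift-kube-controller-manager-operator port-forward <operator-pod> 8443:8443 &
$ openssl s_client -connect localhost:8443 </dev/null 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

If the output shows only DNS:localhost rather than metrics.openshift-kube-controller-manager-operator.svc, the pod is still serving its initial localhost certificate, which matches the scrape failures.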

Comment 1 David Eads 2019-02-20 00:32:37 UTC
Michal: it looks like the reactor may not be working properly, which is interesting because CI shows this working on every run. See, for example: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/268/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws/1229/artifacts/e2e-aws/pods/

Seth: is there anything unusual about the kubelet configuration there? Is there a way for us to check whether the kubelet is properly providing the new files on disk?
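One way to check that (a sketch; /var/run/secrets/serving-cert is an assumed mount path based on the usual operator serving-cert convention, so adjust it to whatever the deployment actually mounts) is to copy the certificate off the pod's disk and inspect its dates and SANs locally:

$ oc -n openshift-kube-controller-manager-operator exec <operator-pod> -- \
    cat /var/run/secrets/serving-cert/tls.crt > /tmp/serving.crt
$ openssl x509 -in /tmp/serving.crt -noout -dates
$ openssl x509 -in /tmp/serving.crt -noout -text | grep -A1 "Subject Alternative Name"

If the on-disk certificate already carries the rotated metrics.*.svc SANs while the served one does not, the operator never reloaded it; if the on-disk copy is also localhost-only, the kubelet has not synced the updated secret into the volume.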

Comment 2 David Eads 2019-02-20 20:56:25 UTC
Got a candidate fix: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/169

Comment 3 Junqi Zhao 2019-02-25 00:54:41 UTC
*** Bug 1679922 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2019-02-26 17:30:19 UTC
I am no longer seeing this on a cluster that has been up for 2.5 hours (post-rotation).

Comment 6 Michal Fojtik 2019-03-26 10:34:53 UTC
There were numerous cert rotation fixes since this was tested and we also extended the rotation period.

Comment 7 Xingxing Xia 2019-04-01 10:22:10 UTC
Verified in 4.0.0-0.nightly-2019-03-28-030453 per comment 5:
$ prometheus_route=$(oc -n openshift-monitoring get route | grep prometheus-k8s | awk '{print $2}')
$ curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://${prometheus_route}/targets | grep -i x509
Didn't see the error, even though the env had been up for 7+ hours.
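The same check can also be run against the Prometheus API instead of grepping the /targets HTML (a sketch, not part of the original verification; it assumes jq is installed):

$ curl -sk -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
    "https://${prometheus_route}/api/v1/targets" \
    | jq -r '.data.activeTargets[] | select(.health != "up") | .lastError' \
    | grep -i x509 || echo "no x509 scrape failures"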

Comment 9 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758