1678929 – failed scraping metrics from {kube|openshift}-controller-manager-operator with "x509: certificate is valid for localhost, not metrics.*"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.1.0
Assignee:	David Eads
QA Contact:	Xingxing Xia
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1679922
Depends On:
Blocks:

Reported:	2019-02-19 22:43 UTC by Seth Jennings
Modified:	2019-06-04 10:44 UTC
CC List:	6 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:44:14 UTC
Target Upstream Version:
Embargoed:

Attachments
must-gather-kube-controller-manager-operator-20190219.tar.gz (777.30 KB, application/gzip) 2019-02-19 22:43 UTC, Seth Jennings

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:44:20 UTC

Description Seth Jennings 2019-02-19 22:43:43 UTC

Created attachment 1536541
must-gather-kube-controller-manager-operator-20190219.tar.gz

openshift-controller-manager-operator/openshift-controller-manager-operator scrape fails with:
Get https://10.130.0.14:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-controller-manager-operator.svc

openshift-kube-controller-manager-operator/kube-controller-manager-operator scrape fails with:
Get https://10.129.0.7:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-kube-controller-manager-operator.svc


See attached must-gather.
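
The hostname mismatch itself can be reproduced locally without a cluster. The following is a minimal sketch, not how the operators generate their serving certificates: it assumes OpenSSL 1.1.1 or newer (for `-addext`), and the helper names `make_localhost_cert` and `cert_matches_host` are hypothetical. It creates a throwaway certificate whose only SAN is `localhost` and checks it against the service name from the scrape error.

```shell
#!/bin/sh
# Hypothetical helper: create a throwaway self-signed certificate whose only
# SAN is "localhost". Writes $1/cert.pem and $1/key.pem.
make_localhost_cert() {
    openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=localhost" \
        -addext "subjectAltName=DNS:localhost" \
        -keyout "$1/key.pem" -out "$1/cert.pem" 2>/dev/null
}

# Hypothetical helper: succeed only if certificate file $1 is valid for host $2.
# `openssl x509 -checkhost` prints "does match" or "does NOT match".
cert_matches_host() {
    openssl x509 -in "$1" -noout -checkhost "$2" | grep -q 'does match certificate'
}

workdir=$(mktemp -d)
make_localhost_cert "$workdir"
for host in localhost metrics.openshift-controller-manager-operator.svc; do
    if cert_matches_host "$workdir/cert.pem" "$host"; then
        echo "$host: certificate matches"
    else
        echo "$host: certificate does NOT match"
    fi
done
rm -rf "$workdir"
```

A client that verifies the server name, as the scraper does here, rejects such a certificate for any name other than `localhost`.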

Comment 1 David Eads 2019-02-20 00:32:37 UTC

Michal: it looks like the reactor may not be working properly, which is interesting because CI shows this working on every run; see, for example, https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/268/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws/1229/artifacts/e2e-aws/pods/

Seth: is there anything unusual about the kubelet configuration there? Is there a way for us to check whether the kubelet is properly providing the new files on disk?

Comment 2 David Eads 2019-02-20 20:56:25 UTC

Got a candidate fix: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/169

Comment 3 Junqi Zhao 2019-02-25 00:54:41 UTC

*** Bug 1679922 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2019-02-26 17:30:19 UTC

I am no longer seeing this on a cluster that has been up for 2.5 hours (post rotation).

Comment 6 Michal Fojtik 2019-03-26 10:34:53 UTC

There have been numerous cert rotation fixes since this was tested, and we also extended the rotation period.

Comment 7 Xingxing Xia 2019-04-01 10:22:10 UTC

Verified in 4.0.0-0.nightly-2019-03-28-030453 per comment 5:
$ prometheus_route=$(oc -n openshift-monitoring  get route | grep prometheus-k8s | awk '{print $2}')
$ curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://${prometheus_route}/targets | grep -i x509
No x509 errors appeared, even though the environment had been up for 7+ hours.
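
The manual grep above can be wrapped into a pass/fail check for repeated verification. This is a minimal sketch: `check_x509_targets` is a hypothetical helper name, and it only assumes that the targets page content is piped in on stdin.

```shell
# Hypothetical helper: read the Prometheus targets page on stdin, print any
# lines that mention x509, and return non-zero if at least one was found.
check_x509_targets() {
    if grep -i 'x509'; then
        echo "FAIL: x509 scrape errors found" >&2
        return 1
    fi
    echo "OK: no x509 scrape errors"
}

# Usage against a live cluster, reusing the commands from this comment:
#   curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
#       "https://${prometheus_route}/targets" | check_x509_targets
```

The non-zero return status makes the check usable in a script or a retry loop while waiting for the next certificate rotation.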

Comment 9 errata-xmlrpc 2019-06-04 10:44:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
