Bug 1678929

Summary: failed scraping metrics from {kube|openshift}-controller-manager-operator with "x509: certificate is valid for localhost, not metrics.*"
Product: OpenShift Container Platform
Reporter: Seth Jennings <sjenning>
Component: Master
Assignee: David Eads <deads>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, deads, jokerman, juzhao, mfojtik, mmccomas
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
must-gather-kube-controller-manager-operator-20190219.tar.gz (flags: none)

Description Seth Jennings 2019-02-19 22:43:43 UTC
Created attachment 1536541: must-gather-kube-controller-manager-operator-20190219.tar.gz

openshift-controller-manager-operator/openshift-controller-manager-operator scrape fails with:
Get https://10.130.0.14:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-controller-manager-operator.svc

openshift-kube-controller-manager-operator/kube-controller-manager-operator scrape fails with:
Get https://10.129.0.7:8443/metrics: x509: certificate is valid for localhost, not metrics.openshift-kube-controller-manager-operator.svc


See attached must-gather.
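One way to confirm the mismatch (a sketch; substitute the actual operator pod name, and the 8443 port is taken from the scrape errors above) is to port-forward to the operator pod and dump the SANs on the certificate it actually presents:

$ oc -n openshift-kube-controller-manager-operator port-forward <operator-pod> 8443:8443 &
$ openssl s_client -connect localhost:8443 </dev/null 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

If the output shows only DNS:localhost rather than metrics.openshift-kube-controller-manager-operator.svc, the pod is still serving its initial localhost certificate, which matches the scrape failures.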

Comment 1 David Eads 2019-02-20 00:32:37 UTC
Michal: it looks like the reactor may not be working properly, which is interesting because CI shows this working on every run. See, for example: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/268/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws/1229/artifacts/e2e-aws/pods/

Seth: is there anything unusual about the kubelet configuration there? Is there a way for us to check whether the kubelet is properly providing the new files on disk?
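One way to check that (a sketch; /var/run/secrets/serving-cert is an assumed mount path based on the usual operator serving-cert convention, so adjust it to whatever the deployment actually mounts) is to copy the certificate off the pod's disk and inspect its dates and SANs locally:

$ oc -n openshift-kube-controller-manager-operator exec <operator-pod> -- \
    cat /var/run/secrets/serving-cert/tls.crt > /tmp/serving.crt
$ openssl x509 -in /tmp/serving.crt -noout -dates
$ openssl x509 -in /tmp/serving.crt -noout -text | grep -A1 "Subject Alternative Name"

If the on-disk certificate already carries the rotated metrics.*.svc SANs while the served one does not, the operator never reloaded it; if the on-disk copy is also localhost-only, the kubelet has not synced the updated secret into the volume.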

Comment 2 David Eads 2019-02-20 20:56:25 UTC
Got a candidate fix: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/169

Comment 3 Junqi Zhao 2019-02-25 00:54:41 UTC
*** Bug 1679922 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2019-02-26 17:30:19 UTC
I am no longer seeing this on a cluster that has been up for 2.5 hours (post-rotation).

Comment 6 Michal Fojtik 2019-03-26 10:34:53 UTC
There were numerous cert rotation fixes since this was tested and we also extended the rotation period.

Comment 7 Xingxing Xia 2019-04-01 10:22:10 UTC
Verified in 4.0.0-0.nightly-2019-03-28-030453 per comment 5:
$ prometheus_route=$(oc -n openshift-monitoring get route | grep prometheus-k8s | awk '{print $2}')
$ curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://${prometheus_route}/targets | grep -i x509
Didn't see the error, even though the env had been up for 7+ hours.
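The same check can also be run against the Prometheus API instead of grepping the /targets HTML (a sketch, not part of the original verification; it assumes jq is installed):

$ curl -sk -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
    "https://${prometheus_route}/api/v1/targets" \
    | jq -r '.data.activeTargets[] | select(.health != "up") | .lastError' \
    | grep -i x509 || echo "no x509 scrape failures"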

Comment 9 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758