Description of problem:
When deploying to OSE 3.2, Hawkular Metrics Deployer fails with the following error:

Unable to connect to the server: x509: certificate signed by unknown authority

Version-Release number of selected component (if applicable):
OSE 3.2
Hawkular Metrics Images:
IMAGE_PREFIX="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/"
IMAGE_VERSION="3.2.0"

How reproducible:

Steps to Reproduce:
1. Attempt to deploy above Hawkular Metrics images to OSE 3.2

Actual results:

Expected results:

Additional info:
Do you know exactly where this issue is coming from? Is this an error when connecting to the Kubernetes master or one of its nodes, or an error when connecting to Hawkular Metrics itself? What deployment options were used? If I were to guess, I would say that the certificates of your Kubernetes master or nodes are not signed by the system CA (e.g. /var/run/secrets/kubernetes.io/serviceaccount/ca.crt). The metrics components use the CA available inside the container to verify that they should trust connections to OpenShift components and services running in the cluster. If your master or node certificates are not signed by this CA, the metrics components will not trust connections to them.
Deployed metrics on OSE 3.2 with this image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-deployer:3.2.0. The deployment succeeded and all metrics pods are running.
This is more of a setup problem with their cluster than anything else. The metrics tests are still expected to pass. For some reason the CA certificate given to containers is not the same as the CA certificate used to sign the certificates of the OpenShift components (e.g. the master). We should be able to take the service account token in a container (/var/run/secrets/kubernetes.io/serviceaccount/token) along with the system CA certificate (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) to connect to OpenShift components (e.g. the master endpoint, the kubelet endpoint, etc.) and validate the connection. With this setup that is not the case. We need to figure out exactly why this is happening. If it is because of an error in the docs on how to set this up, or a problem with one of our install tools, then we need to fix it; otherwise metrics will fail with these types of installs. I am just waiting to find out exactly how this cluster was installed so that we can assign this to the proper team. This is not a metrics issue.
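To narrow down which side is misconfigured, the trust check described above can be reproduced by hand from inside any pod. The following is only a minimal sketch, not what the metrics components actually run: the helper names sa_paths and check_trust are hypothetical, and it assumes the pod has the usual KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables set. It attempts a TLS handshake with the master while trusting only the service account ca.crt.

```python
import os
import socket
import ssl

# Default service account mount point inside a container (from this report).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def sa_paths(sa_dir=SA_DIR):
    """Return (ca_path, token_path) for a service account directory."""
    return os.path.join(sa_dir, "ca.crt"), os.path.join(sa_dir, "token")


def check_trust(host, port, ca_path, timeout=5.0):
    """Try a TLS handshake trusting only the CA in ca_path.

    Returns (True, "") if the server certificate verifies against that CA,
    otherwise (False, reason).
    """
    try:
        # With cafile given, the system trust store is not loaded, so only
        # the CA handed to the container is trusted.
        ctx = ssl.create_default_context(cafile=ca_path)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True, ""
    except ssl.SSLCertVerificationError as exc:
        # Covers an unknown signing authority as well as a host name mismatch.
        return False, "certificate not trusted: %s" % exc.verify_message
    except OSError as exc:
        # Missing CA file, connection refused, timeout, other TLS errors.
        return False, "could not complete TLS handshake: %s" % exc


if __name__ == "__main__":
    ca_path, _token_path = sa_paths()
    # Assumption: the standard in-pod environment variables are present.
    host = os.environ.get("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc")
    port = int(os.environ.get("KUBERNETES_SERVICE_PORT", "443"))
    ok, reason = check_trust(host, port, ca_path)
    print("trusted" if ok else reason)
```

If this reports that the certificate is not trusted, the CA mounted into the container most likely differs from the one that signed the master certificate, which would match the x509 error in the description.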
This looks to be an issue with a customized script that is used to configure and set up the cluster. I am lowering the priority until we get more feedback on whether this is caused by anything in our docs or install tools.