Description of problem:
During a new deploy of an environment behind a proxy, a /query to the hawkular endpoint returns the error 'Status Code:500 Kubernetes client request failure'. During the deploy, we ran into bug 1466783 and performed the workaround to get the pods to scale.

Also, when using the default value for MASTER_PUBLIC_URL:

  https://kubernetes.default.svc.cluster.local

the Hawkular metrics pod fails with this error:

  # oc logs hawkular-metrics-3ch7c
  2017-07-03 15:55:25 Starting Hawkular Metrics
  Error: the service account for Hawkular Metrics does not have permission to view resources in this namespace. View permissions are required for Hawkular Metrics to function properly.
  Usually this can be resolved by running: oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra

Although the service account has the mentioned role:

  RoleBinding[hawkular-view]:
    Role: view
    Users: <none>
    Groups: <none>
    ServiceAccounts: hawkular
    Subjects: <none>

However, setting MASTER_PUBLIC_URL:

  https://kubernetes.default.svc

allows the hawkular-metrics pod to start successfully, but in the OpenShift console it is not possible to see the metrics. The following error is shown for /query requests:

  Request URL:https://hawkular-metrics.<TLDN>/hawkular/metrics/metrics/stats/query
  Request Method:POST
  Status Code:500 Kubernetes client request failure
  Remote Address:10.XX.48.61:443

Version-Release number of selected component (if applicable):
3.5.5

How reproducible:
Have not reproduced this yet.

Steps to Reproduce:
1.
2.
3.

Actual results:
Metrics endpoint returns '500 Kubernetes client request failure'

Expected results:
Query to the metrics endpoint returns metrics data.

Additional info:
Can you please edit your Hawkular Metrics RC and see what the 'KUBERNETES_MASTER_URL' value is? There should be something in there that looks like this under the command section for the pod:

  -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc.cluster.local

If you do not see that in the RC, then you may need to add the version that seems to be working for you (e.g. kubernetes.default.svc).

Can you also check how many certificates are listed in the ca.crt for the hawkular metrics pod? (e.g. oc exec -it $HAWKULAR_METRICS_POD_NAME cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)
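For anyone repeating the certificate check, counting by eye is error-prone on a long bundle. A minimal sketch, assuming the ca.crt content has been saved to a local file; count_pem_certs and the /tmp path are hypothetical names used only for illustration:

```shell
#!/bin/sh
# count_pem_certs is a hypothetical helper (not part of oc): it counts the
# PEM certificates in a bundle by counting the BEGIN marker lines.
count_pem_certs() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps callers that use "set -e" running.
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Usage sketch (assumes oc is logged in and $HAWKULAR_METRICS_POD_NAME is set):
#   oc exec "$HAWKULAR_METRICS_POD_NAME" -n openshift-infra -- \
#       cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt > /tmp/hawkular-ca.crt
#   count_pem_certs /tmp/hawkular-ca.crt
```

The printed number is the count of certificates in the bundle.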
Hey Matt,

The customer has confirmed that they previously modified the -DKUBERNETES_MASTER_URL value from https://kubernetes.default.svc.cluster.local to https://kubernetes.default.svc because the pod will not start without that change.

They also provided the output of `oc exec -it $HAWKULAR_METRICS_POD_NAME` and then `cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt`, and it appears to contain only one certificate.
Can you please provide the output of 'oc get pods -n openshift-infra -o yaml'? I would like to double check what KUBERNETES_MASTER_URL is set to. From https://bugzilla.redhat.com/show_bug.cgi?id=1467423#c0 it sounded like they set the MASTER_PUBLIC_URL value and not necessarily the KUBERNETES_MASTER_URL value.
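The YAML dump is long, so it can help to pull out just the value in question. A rough sketch, assuming the flag appears in the output in the -DKUBERNETES_MASTER_URL=<url> form mentioned earlier; extract_master_url is a hypothetical helper name:

```shell
#!/bin/sh
# extract_master_url is a hypothetical helper: it reads pod or RC YAML on
# stdin and prints each distinct -DKUBERNETES_MASTER_URL value it finds.
extract_master_url() {
    # grep -o keeps only the matching flag, sed strips everything up to and
    # including the first "=", and sort -u removes duplicates across pods.
    grep -o -- '-DKUBERNETES_MASTER_URL=[^[:space:]"]*' |
        sed 's/^[^=]*=//' |
        sort -u
}

# Usage sketch (assumes oc is logged in):
#   oc get pods -n openshift-infra -o yaml | extract_master_url
```

No output means the flag was not found in the dump at all.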
It looks like this has already been fixed when we moved over to ansible for OCP 3.5 and greater.
@Matt,
I changed -DKUBERNETES_MASTER_URL and the env parameter MASTER_URL from the default value "https://kubernetes.default.svc.cluster.local" to "https://kubernetes.default.svc". After that the hawkular-metrics pod started up, and the metrics diagram was also shown on the web console.

I tried "curl https://kubernetes.default.svc.cluster.local" and it still returned a result, but "curl https://kubernetes.default.svc" returned:

  curl: (6) Could not resolve host: kubernetes.default.svc; Name or service not known

This does not seem to be expected. Could you help to confirm?
Created attachment 1297491
hawkular metrics rc, pod info
(In reply to Junqi Zhao from comment #10)
> @Matt,
> I changed -DKUBERNETES_MASTER_URL and env parameter MASTER_URL value from
> default value "https://kubernetes.default.svc.cluster.local" to
> "https://kubernetes.default.svc", and hawkular-metrics pod could be started
> up,
> metrics diagram also could be shown on web console.
>
> tried "curl https://kubernetes.default.svc.cluster.local", it still return
> result, but "curl https://kubernetes.default.svc" returned
> curl: (6) Could not resolve host: kubernetes.default.svc; Name or service
> not known
>
>
> It seems this is not expected, could you help to confirm?

Are you curling that from within the pod or on master? I think you need to do that within the pod itself.
Yes, curled within the pod, and it returned results:

sh-4.2$ curl -k https://kubernetes.default.svc
{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/apps",
    "/apis/apps/v1beta1",
    "/apis/authentication.k8s.io",
    "/apis/authentication.k8s.io/v1beta1",
    "/apis/autoscaling",
    "/apis/autoscaling/v1",
    "/apis/batch",
    "/apis/batch/v1",
    "/apis/batch/v2alpha1",
    "/apis/certificates.k8s.io",
    "/apis/certificates.k8s.io/v1alpha1",
    "/apis/extensions",
    "/apis/extensions/v1beta1",
    "/apis/policy",
    "/apis/policy/v1beta1",
    "/apis/storage.k8s.io",
    "/apis/storage.k8s.io/v1beta1",
    "/controllers",
    "/healthz",
    "/healthz/ping",
    "/healthz/poststarthook/bootstrap-controller",
    "/healthz/poststarthook/extensions/third-party-resources",
    "/healthz/ready",
    "/metrics",
    "/oapi",
    "/oapi/v1",
    "/osapi",
    "/swaggerapi/",
    "/version",
    "/version/openshift"
  ]
}

Will set this defect to VERIFIED, thanks
Verify steps:
1. Scale down rc hawkular-metrics and change -DKUBERNETES_MASTER_URL and the env parameter MASTER_URL from the default value "https://kubernetes.default.svc.cluster.local" to "https://kubernetes.default.svc".
2. Scale up rc hawkular-metrics and wait for the hawkular-metrics pod to start up.
3. oc rsh ${hawkular-metrics-pod}, run command: curl -k https://kubernetes.default.svc
   The result should be similar to Comment 13.
4. Make sure the metrics diagram is also shown on the web console.

Testing env:
# openshift version
openshift v3.5.5.31.4
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Images from ops registry:
metrics-hawkular-metrics   v3.5   bba7b194fec5   7 days ago    1.27 GB
metrics-heapster           v3.5   4e29df6bda85   2 weeks ago   318.5 MB
metrics-cassandra          v3.5   15a64aac8593   2 weeks ago   540.5 MB
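The check in step 3 can be scripted rather than compared by eye. A minimal sketch, assuming the API root listing always contains a "paths" key and an "/api" entry as in the output shown earlier in this bug; looks_like_api_root and $POD are hypothetical names:

```shell
#!/bin/sh
# looks_like_api_root is a hypothetical helper: it succeeds when stdin looks
# like the API root listing, i.e. it has a "paths" key and an "/api" entry.
looks_like_api_root() {
    body=$(cat)
    printf '%s\n' "$body" | grep -q '"paths"' &&
        printf '%s\n' "$body" | grep -q '"/api"'
}

# Usage sketch (assumes $POD holds the hawkular-metrics pod name):
#   oc rsh -n openshift-infra "$POD" curl -sk https://kubernetes.default.svc \
#       | looks_like_api_root && echo "API root reachable from pod"
```

A curl error such as "Could not resolve host" produces no matching output, so the helper returns non-zero in that case.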
Anyone wanting to avoid this issue during installation should do:

# sed -i -e 's,kubernetes.default.svc.cluster.local,kubernetes.default.svc,' /usr/share/ansible/openshift-ansible/roles/openshift_metrics/defaults/main.yaml
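Since sed -i edits the file in place, it may be worth previewing the change first. A small sketch of a dry run; preview_master_url_change is a hypothetical helper name, and the dots are escaped here so they match literally (a slight tightening of the one-liner above, not a change in intent):

```shell
#!/bin/sh
# preview_master_url_change is a hypothetical helper: it prints the lines of
# the given file as they would read after the substitution, without
# modifying the file itself.
preview_master_url_change() {
    # -n suppresses normal output; the trailing p flag prints only the lines
    # on which a substitution actually happened.
    sed -n 's,kubernetes\.default\.svc\.cluster\.local,kubernetes.default.svc,p' "$1"
}

# Usage sketch:
#   preview_master_url_change /usr/share/ansible/openshift-ansible/roles/openshift_metrics/defaults/main.yaml
```

If the preview prints nothing, the file does not contain the long hostname and the sed command would be a no-op.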
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3049