Bug 1467423
| Summary: | A query to the Hawkular Metrics pod returns 'Status Code:500 Kubernetes client request failure' | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | emahoney | ||||
| Component: | Hawkular | Assignee: | Matt Wringe <mwringe> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.5.1 | CC: | aos-bugs, emahoney, erich, erjones, jkaur, juzhao, misalunk, mwringe, myllynen, rromerom, snegrea, wsun | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.5.z | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | 3.5.0 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-10-25 13:02:19 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
emahoney
2017-07-03 19:37:25 UTC
Can you please edit your Hawkular Metrics RC and see what the 'KUBERNETES_MASTER_URL' value is? There should be something that looks like this under the command section for the pod:

    -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc.cluster.local

If you do not see that in the RC, then you may need to add the version that seems to be working for you (e.g. kubernetes.default.svc).

Can you also check how many certificates are listed in the ca.crt for the Hawkular Metrics pod? For example:

    oc exec -it $HAWKULAR_METRICS_POD_NAME cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

Hey Matt,

The customer has confirmed that they previously modified the -DKUBERNETES_MASTER_URL value from https://kubernetes.default.svc.cluster.local to https://kubernetes.default.svc because the pod will not start without that change. They also provided the output of `oc exec -it $HAWKULAR_METRICS_POD_NAME` and then `cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt`, and it appears to contain only one certificate.

Can you please provide the output of 'oc get pods -n openshift-infra -o yaml'? I would like to double-check the value of KUBERNETES_MASTER_URL. From https://bugzilla.redhat.com/show_bug.cgi?id=1467423#c0 it sounded like they set the MASTER_PUBLIC_URL value and not necessarily the KUBERNETES_MASTER_URL value.

It looks like this has already been fixed when we moved over to ansible for OCP 3.5 and greater.

@Matt,
I changed the -DKUBERNETES_MASTER_URL and env parameter MASTER_URL values from the default "https://kubernetes.default.svc.cluster.local" to "https://kubernetes.default.svc". The hawkular-metrics pod started up, and the metrics diagram was shown on the web console.

I tried "curl https://kubernetes.default.svc.cluster.local" and it still returned a result, but "curl https://kubernetes.default.svc" returned:

    curl: (6) Could not resolve host: kubernetes.default.svc; Name or service not known

This does not seem to be expected. Could you help to confirm?
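The two checks requested earlier in this thread (the -DKUBERNETES_MASTER_URL argument in the RC and the number of certificates in ca.crt) can be sketched as small shell helpers. This is only an illustration: the function names master_url_args and count_certs are hypothetical and are not part of oc or Hawkular, and both operate on files you have already saved locally rather than talking to the cluster.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of oc or Hawkular.
# Both read local files saved beforehand, for example (assumed invocations):
#   oc get rc hawkular-metrics -n openshift-infra -o yaml > rc.yaml
#   oc exec $HAWKULAR_METRICS_POD_NAME -- cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt > ca.crt

# Print every -DKUBERNETES_MASTER_URL=... argument found in a saved RC definition.
# The value is cut at the first whitespace, quote, or comma.
master_url_args() {
    grep -o -e "-DKUBERNETES_MASTER_URL=[^[:space:]\"',]*" "$1"
}

# Count the PEM certificates in a CA bundle by counting BEGIN markers.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

Running master_url_args rc.yaml shows which hostname the pod is started with, and count_certs ca.crt gives the certificate count asked for above. Counting BEGIN markers assumes a well-formed PEM bundle.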
Created attachment 1297491 [details]
hawkular metrics rc, pod info
(In reply to Junqi Zhao from comment #10)
> @Matt,
> I changed -DKUBERNETES_MASTER_URL and env parameter MASTER_URL value from
> default value "https://kubernetes.default.svc.cluster.local" to
> "https://kubernetes.default.svc", and hawkular-metrics pod could be started
> up,
> metrics diagram also could be shown on web console.
>
> tried "curl https://kubernetes.default.svc.cluster.local", it still return
> result, but "curl https://kubernetes.default.svc" returned
> curl: (6) Could not resolve host: kubernetes.default.svc; Name or service
> not known
>
>
> It seems this is not expected, could you help to confirm?

Are you curling that from within the pod or on the master? I think you need to do that within the pod itself.

Yes, I curled from within the pod, and it returned results:

    sh-4.2$ curl -k https://kubernetes.default.svc
    {
      "paths": [
        "/api",
        "/api/v1",
        "/apis",
        "/apis/apps",
        "/apis/apps/v1beta1",
        "/apis/authentication.k8s.io",
        "/apis/authentication.k8s.io/v1beta1",
        "/apis/autoscaling",
        "/apis/autoscaling/v1",
        "/apis/batch",
        "/apis/batch/v1",
        "/apis/batch/v2alpha1",
        "/apis/certificates.k8s.io",
        "/apis/certificates.k8s.io/v1alpha1",
        "/apis/extensions",
        "/apis/extensions/v1beta1",
        "/apis/policy",
        "/apis/policy/v1beta1",
        "/apis/storage.k8s.io",
        "/apis/storage.k8s.io/v1beta1",
        "/controllers",
        "/healthz",
        "/healthz/ping",
        "/healthz/poststarthook/bootstrap-controller",
        "/healthz/poststarthook/extensions/third-party-resources",
        "/healthz/ready",
        "/metrics",
        "/oapi",
        "/oapi/v1",
        "/osapi",
        "/swaggerapi/",
        "/version",
        "/version/openshift"
      ]
    }

I will set this defect to VERIFIED, thanks.

Verify steps:
1. Scale down rc hawkular-metrics and change the -DKUBERNETES_MASTER_URL and env parameter MASTER_URL values from the default "https://kubernetes.default.svc.cluster.local" to "https://kubernetes.default.svc".
2. Scale up rc hawkular-metrics and wait for the hawkular-metrics pod to start.
3. Run oc rsh ${hawkular-metrics-pod}, then run the command: curl -k https://kubernetes.default.svc. The result should be similar to Comment 13.
4. Make sure the metrics diagram is also shown on the web console.

Testing env:

    # openshift version
    openshift v3.5.5.31.4
    kubernetes v1.5.2+43a9be4
    etcd 3.1.0

Images from ops registry:

    metrics-hawkular-metrics   v3.5   bba7b194fec5   7 days ago    1.27 GB
    metrics-heapster           v3.5   4e29df6bda85   2 weeks ago   318.5 MB
    metrics-cassandra          v3.5   15a64aac8593   2 weeks ago   540.5 MB

Anyone wanting to avoid this issue during installation should run:

    # sed -i -e 's,kubernetes.default.svc.cluster.local,kubernetes.default.svc,' /usr/share/ansible/openshift-ansible/roles/openshift_metrics/defaults/main.yaml

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049
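The installation-time workaround quoted in this report edits a file in place with sed -i. Before doing that, it may help to preview the change. The sketch below is a non-destructive variant under stated assumptions: the function name preview_short_master_host is hypothetical, and unlike the command in the report it escapes the dots in the pattern so they match only literal dots.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative and not part of openshift-ansible.
# Prints the file with the long service hostname replaced by the short one,
# leaving the file itself untouched (no -i), so the result can be reviewed first.
preview_short_master_host() {
    sed -e 's,kubernetes\.default\.svc\.cluster\.local,kubernetes.default.svc,' "$1"
}
```

Piping the output of preview_short_master_host for the defaults file named in the report into diff against the original file would show exactly which lines the in-place sed would change.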