Bug 1467423 - A query to the Hawkular Metrics pod returns 'Status Code:500 Kubernetes client request failure'
Status: VERIFIED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Metrics
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.z
Assigned To: Matt Wringe
QA Contact: Junqi Zhao
Docs Contact:
Depends On:
Blocks:
Reported: 2017-07-03 15:37 EDT by emahoney
Modified: 2017-09-08 09:02 EDT (History)

See Also:
Fixed In Version: 3.5.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
hawkular metrics rc, pod info (17.99 KB, text/plain)
2017-07-13 06:04 EDT, Junqi Zhao
Description emahoney 2017-07-03 15:37:25 EDT
Description of problem: During a new deployment of an environment behind a proxy, a /query request to the Hawkular endpoint returns the error 'Status Code:500 Kubernetes client request failure'. During the deployment, we ran into bug 1466783 and applied the workaround to get the pods to scale.

Also, when using the default value for MASTER_PUBLIC_URL (https://kubernetes.default.svc.cluster.local), the Hawkular Metrics pod fails with this error:

# oc logs hawkular-metrics-3ch7c
2017-07-03 15:55:25 Starting Hawkular Metrics
Error: the service account for Hawkular Metrics does not have permission to view resources in this namespace. View permissions are required for Hawkular Metrics to function properly.
Usually this can be resolved by running: oc adm policy add-role-to-user view system:serviceaccount:openshift-infra:hawkular -n openshift-infra

However, the service account already has the role mentioned:
RoleBinding[hawkular-view]:
                                        Role:                   view
                                        Users:                  <none>
                                        Groups:                 <none>
                                        ServiceAccounts:        hawkular
                                        Subjects:               <none>

Setting MASTER_PUBLIC_URL to https://kubernetes.default.svc allows the hawkular-metrics pod to start successfully, but the metrics are not visible in the OpenShift console. The following error is shown for /query requests:
Request URL:https://hawkular-metrics.<TLDN>/hawkular/metrics/metrics/stats/query
Request Method:POST
Status Code:500 Kubernetes client request failure
Remote Address:10.XX.48.61:443

Version-Release number of selected component (if applicable):
3.5.5

How reproducible:
Have not reproduced this yet. 

Steps to Reproduce:
1.
2.
3.

Actual results:
Metrics endpoint returns '500 Kubernetes client request failure'

Expected results:
Query to the metrics endpoint returns metrics data. 

Additional info:
Comment 4 Matt Wringe 2017-07-06 16:04:31 EDT
Can you please edit your Hawkular Metrics RC and see what the 'KUBERNETES_MASTER_URL' value is?

There should be something in there that looks like this under the command section for the pod:

-DKUBERNETES_MASTER_URL=https://kubernetes.default.svc.cluster.local

If you do not see that in the RC, then you may need to add the version that seems to be working for you (eg kubernetes.default.svc)

Can you also check how many certificates are listed in the ca.crt for the hawkular metrics pod? (eg oc exec -it $HAWKULAR_METRICS_POD_NAME cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)
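
One way to get that count without reading the file by eye is to count the PEM header lines. This is only a minimal sketch: the function name count_certs is made up for illustration, and it assumes the bundle is PEM-encoded with one "BEGIN CERTIFICATE" line per certificate.

```shell
# Count PEM certificates in a CA bundle.
# Reads the named file, or stdin when no file is given.
# (count_certs is a hypothetical helper name, not part of oc.)
count_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$@"
}
```

For example, the output of the oc exec command above could be piped into it: oc exec $HAWKULAR_METRICS_POD_NAME -- cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt | count_certs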
Comment 5 Eric Jones 2017-07-10 11:51:32 EDT
Hey Matt,

The customer has confirmed that they previously modified the -DKUBERNETES_MASTER_URL value from https://kubernetes.default.svc.cluster.local to https://kubernetes.default.svc because the pod will not start without that change.

They also provided the output of `oc exec -it $HAWKULAR_METRICS_POD_NAME` followed by `cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt`, and it appears to contain only one certificate.
Comment 6 Matt Wringe 2017-07-10 14:40:32 EDT
Can you please provide the output of 'oc get pods -n openshift-infra -o yaml'? 

I would like to double-check the value of KUBERNETES_MASTER_URL. From https://bugzilla.redhat.com/show_bug.cgi?id=1467423#c0 it sounded like they set the MASTER_PUBLIC_URL value and not necessarily the KUBERNETES_MASTER_URL value.
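
To pull just that setting out of the YAML, a small filter along these lines may help. It is a sketch under assumptions: the function name master_url_args is hypothetical, and it assumes the setting appears in the pod spec as a -DKUBERNETES_MASTER_URL=<value> argument, as described in comment 4.

```shell
# Print each distinct -DKUBERNETES_MASTER_URL=... argument found on stdin.
# (master_url_args is a hypothetical helper name.)
master_url_args() {
    # -o prints only the matching part; the value ends at whitespace or a quote.
    grep -oE -e '-DKUBERNETES_MASTER_URL=[^[:space:]"]+' | sort -u
}
```

Usage would be: oc get pods -n openshift-infra -o yaml | master_url_args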
Comment 9 Matt Wringe 2017-07-11 09:35:58 EDT
It looks like this was already fixed when we moved over to Ansible for OCP 3.5 and later.
Comment 10 Junqi Zhao 2017-07-13 06:04:17 EDT
@Matt,
I changed -DKUBERNETES_MASTER_URL and the env parameter MASTER_URL from the default value "https://kubernetes.default.svc.cluster.local" to "https://kubernetes.default.svc". The hawkular-metrics pod started up, and the metrics diagrams were shown on the web console.

I tried "curl https://kubernetes.default.svc.cluster.local" and it still returned a result, but "curl https://kubernetes.default.svc" returned:
curl: (6) Could not resolve host: kubernetes.default.svc; Name or service not known

This does not seem to be expected. Could you help confirm?
Comment 11 Junqi Zhao 2017-07-13 06:04 EDT
Created attachment 1297491 [details]
hawkular metrics rc, pod info
Comment 12 Matt Wringe 2017-07-13 08:52:16 EDT
(In reply to Junqi Zhao from comment #10)
> @Matt,
> I changed -DKUBERNETES_MASTER_URL and the env parameter MASTER_URL from the
> default value "https://kubernetes.default.svc.cluster.local" to
> "https://kubernetes.default.svc". The hawkular-metrics pod started up, and
> the metrics diagrams were shown on the web console.
> 
> I tried "curl https://kubernetes.default.svc.cluster.local" and it still
> returned a result, but "curl https://kubernetes.default.svc" returned:
> curl: (6) Could not resolve host: kubernetes.default.svc; Name or service
> not known
> 
> This does not seem to be expected. Could you help confirm?

Are you curling that from within the pod or on the master? I think you need to do that within the pod itself.
Comment 13 Junqi Zhao 2017-07-14 02:16:13 EDT
Yes. When curled from within the pod, it returned results:
sh-4.2$ curl -k https://kubernetes.default.svc
{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/apps",
    "/apis/apps/v1beta1",
    "/apis/authentication.k8s.io",
    "/apis/authentication.k8s.io/v1beta1",
    "/apis/autoscaling",
    "/apis/autoscaling/v1",
    "/apis/batch",
    "/apis/batch/v1",
    "/apis/batch/v2alpha1",
    "/apis/certificates.k8s.io",
    "/apis/certificates.k8s.io/v1alpha1",
    "/apis/extensions",
    "/apis/extensions/v1beta1",
    "/apis/policy",
    "/apis/policy/v1beta1",
    "/apis/storage.k8s.io",
    "/apis/storage.k8s.io/v1beta1",
    "/controllers",
    "/healthz",
    "/healthz/ping",
    "/healthz/poststarthook/bootstrap-controller",
    "/healthz/poststarthook/extensions/third-party-resources",
    "/healthz/ready",
    "/metrics",
    "/oapi",
    "/oapi/v1",
    "/osapi",
    "/swaggerapi/",
    "/version",
    "/version/openshift"
  ]
}

Will set this defect to VERIFIED, thanks
Comment 14 Junqi Zhao 2017-07-14 02:25:15 EDT
Verify steps:
1. Scale down rc hawkular-metrics and change -DKUBERNETES_MASTER_URL and the env parameter MASTER_URL from the default value "https://kubernetes.default.svc.cluster.local" to "https://kubernetes.default.svc".

2. Scale up rc hawkular-metrics and wait for the hawkular-metrics pod to start up.

3. oc rsh $HAWKULAR_METRICS_POD_NAME, then run the command:
curl -k https://kubernetes.default.svc

The result should be similar to Comment 13.

4. Make sure the metrics diagrams are also shown on the web console.

Testing env:

# openshift version
openshift v3.5.5.31.4
kubernetes v1.5.2+43a9be4
etcd 3.1.0

images from ops registry
metrics-hawkular-metrics   v3.5                bba7b194fec5        7 days ago          1.27 GB
metrics-heapster           v3.5                4e29df6bda85        2 weeks ago         318.5 MB
metrics-cassandra          v3.5                15a64aac8593        2 weeks ago         540.5 MB
Comment 17 Marko Myllynen 2017-08-10 03:18:44 EDT
Anyone wanting to avoid this issue during installation should run:

# sed -i -e 's,kubernetes.default.svc.cluster.local,kubernetes.default.svc,' /usr/share/ansible/openshift-ansible/roles/openshift_metrics/defaults/main.yaml
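
A slightly more cautious variant of the same substitution is sketched below. It is not from the original comment: the function name use_short_master_host is hypothetical, the dots are escaped so they match literally, and it assumes a sed that accepts a backup suffix with -i (GNU sed does), so a .bak copy of the file is kept.

```shell
# Rewrite the long service hostname to the short one in the given file,
# keeping a .bak copy of the original next to it.
# (use_short_master_host is a hypothetical helper name.)
use_short_master_host() {
    sed -i.bak -e 's,kubernetes\.default\.svc\.cluster\.local,kubernetes.default.svc,' "$1"
}
```

It would be called with the path from the command above, and the change can be reviewed afterwards by comparing main.yaml.bak with main.yaml using diff.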
