Description of problem:
HPA can't get metrics info from heapster; the HPA CURRENT column always shows "<waiting>".

Version-Release number of selected component (if applicable):
openshift v1.0.7-203-geebdecd
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics in the kube-system project, following https://github.com/openshift/origin-metrics
2. Check the metrics status:
[fedora@ip-172-18-15-150 sample-app]$ oc get pod -n kube-system
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-ndtc9   1/1       Running     0          1h
hawkular-metrics-dayum       1/1       Running     0          1h
heapster-7ekaj               1/1       Running     2          1h
metrics-deployer-qpfun       0/1       Completed   0          1h
3. Create an HPA:
$ cat hpa.yaml
apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleRef:
    kind: ReplicationController
    name: php-apache-1
    namespace: dma1
  cpuUtilization:
    targetPercentage: 10
$ oc create -f hpa.yaml
4. Check the HPA status:
[fedora@ip-172-18-15-150 sample-app]$ oc get hpa -n dma1
NAME         REFERENCE                                  TARGET    CURRENT     MINPODS        MAXPODS   AGE
php-apache   ReplicationController/dma1/php-apache-1/   10%       <waiting>   859533317008   10        25m
[fedora@ip-172-18-15-150 sample-app]$ oc get pod -n dma1
NAME                 READY     STATUS    RESTARTS   AGE
php-apache-1-ldqxr   1/1       Running   0          31m
[fedora@ip-172-18-15-150 sample-app]$ oc get rc -n dma1
CONTROLLER     CONTAINER(S)   IMAGE(S)                               SELECTOR                                                             REPLICAS   AGE
php-apache-1   php-apache     gcr.io/google_containers/hpa-example   deployment=php-apache-1,deploymentconfig=php-apache,run=php-apache   1          33m

Actual results:
4. The HPA CURRENT column is always "<waiting>".

Expected results:
4. It should show the pods' current CPU utilization.

Additional info:
The metrics themselves work well: in the web console we can see the pod CPU/memory info correctly.
Please try deploying origin-metrics to the openshift-infra project.
(In reply to Andy Goldstein from comment #1)
> Please try deploying origin-metrics to the openshift-infra project.

Even after deploying origin-metrics to the openshift-infra project, the HPA still can't get metrics.

[fedora@ip-172-18-11-59 sample-app]$ cat hpa.yaml
apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleRef:
    kind: DeploymentConfig
    name: php-apache
    namespace: dma
    subresource: scale
  minReplicas: 1
  maxReplicas: 10
  cpuUtilization:
    targetPercentage: 50
[fedora@ip-172-18-11-59 sample-app]$ cat dc.yaml
apiVersion: v1
kind: DeploymentConfig
metadata:
  labels:
    run: php-apache
  name: php-apache
spec:
  replicas: 1
  selector:
    run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - image: gcr.io/google_containers/hpa-example
        imagePullPolicy: IfNotPresent
        name: php-apache
        resources:
          requests:
            cpu: 200m
        securityContext:
          privileged: true
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
  triggers:
  - type: ConfigChange
[fedora@ip-172-18-11-59 sample-app]$ oc get pod -n openshift-infra
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-x61ln   1/1       Running     0          3h
hawkular-metrics-s9ev0       1/1       Running     0          3h
heapster-phcog               1/1       Running     2          3h
metrics-deployer-zqvhv       0/1       Completed   0          3h
[fedora@ip-172-18-11-59 sample-app]$ oc get dc -n dma
NAME         TRIGGERS       LATEST
php-apache   ConfigChange   1
[fedora@ip-172-18-11-59 sample-app]$ oc get rc -n dma
CONTROLLER     CONTAINER(S)   IMAGE(S)                               SELECTOR                                                             REPLICAS   AGE
php-apache-1   php-apache     gcr.io/google_containers/hpa-example   deployment=php-apache-1,deploymentconfig=php-apache,run=php-apache   1          14m
[fedora@ip-172-18-11-59 sample-app]$ oc get pod -n dma
NAME                 READY     STATUS    RESTARTS   AGE
php-apache-1-c3h6r   1/1       Running   0          14m
[fedora@ip-172-18-11-59 sample-app]$ oc get hpa -n dma
NAME         REFERENCE                           TARGET    CURRENT     MINPODS   MAXPODS   AGE
php-apache   DeploymentConfig/php-apache/scale   50%       <waiting>   1         10        2m
[fedora@ip-172-18-11-59 sample-app]$ oc get event -n dma
FIRSTSEEN   LASTSEEN   COUNT   NAME   KIND   SUBOBJECT   REASON   SOURCE   MESSAGE
15m   15m   1   php-apache-1-c3h6r    Pod   Scheduled   {scheduler }   Successfully assigned php-apache-1-c3h6r to ip-172-18-11-59
15m   15m   1   php-apache-1-c3h6r    Pod   implicitly required container POD   Pulled    {kubelet ip-172-18-11-59}   Container image "openshift/origin-pod:v1.0.7" already present on machine
15m   15m   1   php-apache-1-c3h6r    Pod   implicitly required container POD   Created   {kubelet ip-172-18-11-59}   Created with docker id 761ea4c117d7
15m   15m   1   php-apache-1-c3h6r    Pod   implicitly required container POD   Started   {kubelet ip-172-18-11-59}   Started with docker id 761ea4c117d7
15m   15m   1   php-apache-1-c3h6r    Pod   spec.containers{php-apache}   Pulled    {kubelet ip-172-18-11-59}   Container image "gcr.io/google_containers/hpa-example" already present on machine
15m   15m   1   php-apache-1-c3h6r    Pod   spec.containers{php-apache}   Created   {kubelet ip-172-18-11-59}   Created with docker id f60bfdb1d3b7
15m   15m   1   php-apache-1-c3h6r    Pod   spec.containers{php-apache}   Started   {kubelet ip-172-18-11-59}   Started with docker id f60bfdb1d3b7
15m   15m   1   php-apache-1-deploy   Pod   Scheduled   {scheduler }   Successfully assigned php-apache-1-deploy to ip-172-18-11-59
15m   15m   1   php-apache-1-deploy   Pod   implicitly required container POD   Pulled    {kubelet ip-172-18-11-59}   Container image "openshift/origin-pod:v1.0.7" already present on machine
15m   15m   1   php-apache-1-deploy   Pod   implicitly required container POD   Created   {kubelet ip-172-18-11-59}   Created with docker id 0f8dc5ade998
15m   15m   1   php-apache-1-deploy   Pod   implicitly required container POD   Started   {kubelet ip-172-18-11-59}   Started with docker id 0f8dc5ade998
15m   15m   1   php-apache-1-deploy   Pod   spec.containers{deployment}   Pulled    {kubelet ip-172-18-11-59}   Container image "openshift/origin-deployer:v1.0.7" already present on machine
15m   15m   1   php-apache-1-deploy   Pod   spec.containers{deployment}   Created   {kubelet ip-172-18-11-59}   Created with docker id d8ceb4c1c65d
15m   15m   1   php-apache-1-deploy   Pod   spec.containers{deployment}   Started   {kubelet ip-172-18-11-59}   Started with docker id d8ceb4c1c65d
14m   14m   1   php-apache-1-deploy   Pod   implicitly required container POD   Killing   {kubelet ip-172-18-11-59}   Killing with docker id 0f8dc5ade998
15m   15m   1   php-apache-1   ReplicationController   failedUpdate       {deployer }   Error updating deployment dma/php-apache-1 status to Pending
15m   15m   1   php-apache-1   ReplicationController   SuccessfulCreate   {replication-controller }   Created pod: php-apache-1-c3h6r
14m   2m    24  php-apache     HorizontalPodAutoscaler   FailedGetMetrics        {horizontal-pod-autoscaler }   failed to get CPU consumption and request: metrics obtained for 0/1 of pods
14m   2m    24  php-apache     HorizontalPodAutoscaler   FailedComputeReplicas   {horizontal-pod-autoscaler }   failed to get cpu utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
2m    6s    5   php-apache     HorizontalPodAutoscaler   FailedGetMetrics        {horizontal-pod-autoscaler }   failed to get CPU consumption and request: metrics obtained for 0/1 of pods
2m    6s    5   php-apache     HorizontalPodAutoscaler   FailedComputeReplicas   {horizontal-pod-autoscaler }   failed to get cpu utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
heapster logs:
[fedora@ip-172-18-11-59 sample-app]$ oc logs heapster-phcog -n openshift-infra
Starting Heapster with the following arguments: --source=kubernetes:https://172.18.11.59:8443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=SH1cvYyCmhTjfZh&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=
I1105 03:14:45.203968 1 heapster.go:60] /heapster --source=kubernetes:https://172.18.11.59:8443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250 --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=SH1cvYyCmhTjfZh&filter=label(container_name:^/system.slice.*|^/user.slice) --logtostderr=true --tls_cert=/secrets/heapster.cert --tls_key=/secrets/heapster.key --tls_client_ca=/secrets/heapster.client-ca --allowed_users=
I1105 03:14:45.204103 1 heapster.go:61] Heapster version 0.18.2
I1105 03:14:45.204791 1 kube_factory.go:169] Using Kubernetes client with master "https://172.18.11.59:8443" and version "v1"
I1105 03:14:45.204806 1 kube_factory.go:170] Using kubelet port 10250
I1105 03:14:45.205128 1 driver.go:492] Initialised Hawkular Sink with parameters {_system https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=hawkular&pass=SH1cvYyCmhTjfZh&filter=label(container_name:^/system.slice.*|^/user.slice) 0xc20811cd80 }
I1105 03:14:45.401380 1 heapster.go:71] Starting heapster on port 8082
W1105 04:14:45.308246 1 reflector.go:224] /home/michael/projects/go/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: very short watch
W1105 05:14:45.306260 1 reflector.go:224] /home/michael/projects/go/src/k8s.io/heapster/sources/nodes/kube.go:156: watch of *api.Node ended with: very short watch
Can we get API server logs? Did you give the heapster service account the correct permissions?
The upstream Heapster images do not have any security enabled, which is why they may work for you. Accessing Heapster directly is deemed a security risk, so direct access to Heapster is disabled by default in the Heapster images provided by the Origin Metrics containers.

If you want to be able to access Heapster using the Origin Metrics containers, you need to use certificate-based authentication. As part of the deployment, you will need to specify a few secrets. Please see https://github.com/openshift/origin-metrics/blob/master/docs/advanced_configuration.md for more information. Specifically, the secrets you need to provide are:
heapster_client_ca.cert: the CA used to sign the client's certificate
heapster_allowed_users: a comma-separated list of CNs to accept from certificates signed by that CA

Please also note that the HPA has not been tested with the Origin Metrics components.
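To make the allowed-users check concrete, here is a minimal Go sketch of matching a client certificate's CN against a comma-separated list such as heapster_allowed_users. It assumes the TLS layer has already verified the certificate chain against heapster_client_ca.cert. The helper names parseAllowedUsers and cnAllowed are hypothetical, this is not Heapster's actual code, and treating an empty list as "admit nobody" is a choice made by the sketch, not documented Heapster behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedUsers splits a comma-separated list of CNs (the format
// described for heapster_allowed_users) into a set, trimming whitespace
// and skipping empty entries.
func parseAllowedUsers(list string) map[string]bool {
	allowed := make(map[string]bool)
	for _, cn := range strings.Split(list, ",") {
		cn = strings.TrimSpace(cn)
		if cn != "" {
			allowed[cn] = true
		}
	}
	return allowed
}

// cnAllowed reports whether a (already chain-verified) client certificate
// CN is in the allow list. Assumption of this sketch: an empty list
// admits nobody.
func cnAllowed(cn string, allowed map[string]bool) bool {
	return allowed[cn]
}

func main() {
	allowed := parseAllowedUsers("system:master-proxy, admin")
	fmt.Println(cnAllowed("system:master-proxy", allowed)) // true
	fmt.Println(cnAllowed("system:anonymous", allowed))    // false
}
```

Parsing the list once into a set keeps the per-request check to a single map lookup.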
I tried to reproduce this and see very similar behaviour to what was reported.

[root@localhost ~]# oc get hpa --config=/root/origin/openshift.local.config/master/admin.kubeconfig
NAME         REFERENCE                                  TARGET    CURRENT     MINPODS   MAXPODS   AGE
php-apache   ReplicationController/php-apache-1/scale   10%       <waiting>   1         10        17m

I built openshift from the latest origin code and started the master and node on the same F22 VM:
#openshift start master --config=./openshift.local.config/master/master-config.yaml
#openshift start node --config=./openshift.local.config/node-192.168.122.253/node-config.yaml

I followed https://github.com/openshift/origin-metrics to deploy metrics.
[root@localhost ~]# oc get pods --config=/root/origin/openshift.local.config/master/admin.kubeconfig
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-3p5j8   1/1       Running     0          21m
hawkular-metrics-smis0       1/1       Running     0          21m
heapster-a9c2y               1/1       Running     3          21m
metrics-deployer-20ctn       0/1       Completed   0          21m
#oc get project --config=/root/origin/openshift.local.config/master/admin.kubeconfig
NAME              DISPLAY NAME   STATUS
openshift                        Active
openshift-infra                  Active
metrics                          Active
default                          Active

Please note that I had to add "subresource"; otherwise it failed:
[root@localhost ~]# cat hpa.yaml
apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleRef:
    kind: ReplicationController
    name: php-apache-1
    namespace: dma1
    subresource: scale
  cpuUtilization:
    targetPercentage: 10

Secrets and docker ps -a: http://fpaste.org/287456/62852144/
Events: http://fpaste.org/287454/76232014/
(In reply to Solly Ross from comment #4)
> Can we get API server logs? Did you give the heapster service account the
> correct permissions?

1. I deployed metrics with the following commands:
$ oc create -f https://raw.githubusercontent.com/openshift/origin-metrics/master/metrics-deployer-setup.yaml -n openshift-infra
$ oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer -n openshift-infra
$ oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster -n openshift-infra
$ oc secrets new metrics-deployer nothing=/dev/null -n openshift-infra
$ oc process -f https://raw.githubusercontent.com/openshift/origin-metrics/master/metrics.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,IMAGE_PREFIX=openshift/origin-,IMAGE_VERSION=devel,USE_PERSISTENT_STORAGE=false,MASTER_URL=https://172.18.11.59:8443 | oc create -f - -n openshift-infra

"$ oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster -n openshift-infra" gave the heapster service account the cluster-reader role.
Please see my comment above (https://bugzilla.redhat.com/show_bug.cgi?id=1277914#c5) for why the HPA will not be able to connect using the commands you have listed.
The HPA is using the API proxy to reach the heapster service, and assumes the http scheme when doing so. To reach heapster securely through the API proxy, we need to do the following:
1. Switch (optionally, via configuration?) the metrics client to specify the https scheme for proxying to the backend service.
2. When deploying heapster, include the master CA as the client CA, and "system:master-proxy" as an allowed user.
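The proxy URL that point 1 implies can be sketched in Go. This assumes the service-proxy form used in the curl examples later in this thread (.../services/<scheme>:<service>:<port>/<path>, with the port left empty as in those examples). serviceProxyURL, the host master:8443, and the heapster path are illustrative placeholders, not origin's metrics-client code.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceProxyURL builds an API-server service-proxy URL of the form
//   https://<master>/api/v1/proxy/namespaces/<ns>/services/<scheme>:<service>:<port>/<path>
// The outer scheme is https because the master API is served over TLS;
// the inner scheme is what the API server should use for the hop to the
// backend service ("https" for a secured heapster). port may be empty.
func serviceProxyURL(master, namespace, scheme, service, port, apiPath string) string {
	return fmt.Sprintf("https://%s/api/v1/proxy/namespaces/%s/services/%s:%s:%s/%s",
		master, namespace, scheme, service, port, strings.TrimPrefix(apiPath, "/"))
}

func main() {
	// Illustrative values only.
	fmt.Println(serviceProxyURL("master:8443", "openshift-infra", "https", "heapster", "", "/api/v1/model/namespaces/dma"))
	// https://master:8443/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/dma
}
```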
Matt, the master CA is already injected into pods. Is it possible to tell heapster to use that existing path for the client CA?
Work to allow HPA to use https through the API proxy in https://github.com/openshift/origin/pull/5763
Hey Jordan,

Yeah, I can definitely set it up so that by default Heapster uses the master CA and trusts any certificate signed by it that has "system:master-proxy" as its CN. If an admin needs something else, they can override the CA and the allowed users list.

If this will fix the issue, how does one quickly test that the HPA is working and has access? I'd like to verify before pushing out a commit.
@mwringe: take a look at https://github.com/openshift/openshift-docs/pull/1147 for detailed instructions. The short story is: create an HPA pointing to an RC or DC with a CPU request, wait a couple of minutes, and see if the target replica count has changed. You should see "Successfully scaled {hpa-name}" in the log (assuming you create the HPA such that a scale will occur).
The HPA uses the API service proxy to access heapster. If you can access heapster via the API service proxy, the HPA will be able to as well, so it might be simpler to just try accessing it that way. After starting openshift and deploying heapster (and setting up a service to point to it), try using the cluster admin credentials to access the heapster API. Assuming it was deployed in the openshift-infra namespace and the service is named 'heapster', it would look like this:
curl -k --cert ./admin.crt --key ./admin.key 'https://master:8443/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/<heapster api path...>'
https://github.com/openshift/origin/pull/5763 merged to master. It makes origin use https://<master>/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/... to contact the heapster service.

Note that the origin HPA now uses https INSTEAD OF http to contact heapster, which means that test and setup scenarios using the unsecured upstream images will no longer work. https://github.com/openshift/origin/issues/5781 is open to surface options for those cases, but for 3.1, https will need to be used.

If heapster is configured correctly (to serve over https, to use the master's CA to verify client certs, and to accept the "system:master-proxy" client CN), this will now work with a secured heapster. I'm not sure whether to mark this ON_QA now or wait for the heapster image defaults to get updated.
Fixed with changes in https://github.com/openshift/origin/pull/5813 https://github.com/openshift/origin-metrics/pull/29
Created attachment 1092579: openshift.log
Hi Jordan, now it always fails with "failed to unmarshall heapster response: invalid character 'h' in literal true (expecting 'r')":
6s   6s   1   php-apache   HorizontalPodAutoscaler   FailedGetMetrics        {horizontal-pod-autoscaler }   failed to get CPU consumption and request: failed to unmarshall heapster response: invalid character 'h' in literal true (expecting 'r')
6s   6s   1   php-apache   HorizontalPodAutoscaler   FailedComputeReplicas   {horizontal-pod-autoscaler }   failed to get cpu utilization: failed to get CPU consumption and request: failed to unmarshall heapster response: invalid character 'h' in literal true (expecting 'r')
I have attached the openshift.log file.
What does the heapster pod log show? What do you see if you curl the URL directly (using the cluster admin credentials)?
curl --cacert path/to/ca.crt --cert path/to/admin.crt --key path/to/admin.key "https://$MASTER:$PORT/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/namespaces/..."
It seems the certificate is not configured correctly: http://fpaste.org/289086/18976144/

The doc describes how to configure metrics with your own certificates: https://github.com/openshift/origin-metrics/blob/master/docs/advanced_configuration.md#configuring-the-deployer
The command to create the secret is:
oc secrets new metrics-deployer hawkular-metrics.pem=/my/dir/hm.pem hawkular-metrics-ca.cert=/my/dir/hm-ca.cert

After starting openshift we have some certificate files (ca.crt, admin.crt, admin.key, ...). My question is: how do I use those certificate files to configure origin-metrics? How do I use them to create the "metrics-deployer" secret?
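curl is not using the cert/key. You have to use ./admin.crt and ./admin.key when the cert/key files are in the current directory (curl builds that use NSS treat a bare file name as a certificate nickname rather than a file path).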
After debugging, it looks like heapster returns malformed responses to API requests until metrics data is available, resulting in these errors:
FailedGetMetrics {horizontal-pod-autoscaler } failed to get CPU consumption and request: failed to unmarshall heapster response: invalid character 'h' in literal true (expecting 'r')
After waiting for ~2 minutes, the HPA was able to request data about pods successfully.
The HPA can get metrics info now, so I am verifying this bug; the response error will be tracked in another bug.

[fedora@ip-172-18-15-226 sample-app]$ openshift version
openshift v1.0.8-16-gd81eca7-dirty
kubernetes v1.1.0-origin-1107-g4c8e6f4
[fedora@ip-172-18-15-226 sample-app]$ oc get hpa -n dma
NAME         REFERENCE                           TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   DeploymentConfig/php-apache/scale   50%       471%      1         10        36m
[fedora@ip-172-18-15-226 sample-app]$ oc get rc -n dma
CONTROLLER     CONTAINER(S)   IMAGE(S)                               SELECTOR                                                             REPLICAS   AGE
php-apache-1   php-apache     gcr.io/google_containers/hpa-example   deployment=php-apache-1,deploymentconfig=php-apache,run=php-apache   10         1h
[fedora@ip-172-18-15-226 sample-app]$ oc get pod -n dma
NAME                 READY     STATUS    RESTARTS   AGE
php-apache-1-47s2z   1/1       Running   0          1m
php-apache-1-5ewnw   1/1       Running   0          1m
php-apache-1-6sgaw   1/1       Running   0          1m
php-apache-1-ebo2r   1/1       Running   0          1m
php-apache-1-f4asq   1/1       Running   0          1m
php-apache-1-gnsd8   1/1       Running   0          1m
php-apache-1-lvwb2   1/1       Running   0          1m
php-apache-1-mdb0p   1/1       Running   0          1m
php-apache-1-t3e1f   1/1       Running   0          1m
php-apache-1-yx970   1/1       Running   0          1h
[fedora@ip-172-18-15-226 sample-app]$ oc get pod -n dma
NAME                 READY     STATUS    RESTARTS   AGE
php-apache-1-47s2z   1/1       Running   0          2m
php-apache-1-5ewnw   1/1       Running   0          2m
php-apache-1-6sgaw   1/1       Running   0          2m
php-apache-1-ebo2r   1/1       Running   0          2m
php-apache-1-f4asq   1/1       Running   0          2m
php-apache-1-gnsd8   1/1       Running   0          2m
php-apache-1-lvwb2   1/1       Running   0          2m
php-apache-1-mdb0p   1/1       Running   0          2m
php-apache-1-t3e1f   1/1       Running   0          2m
php-apache-1-yx970   1/1       Running   0          1h
[fedora@ip-172-18-15-226 sample-app]$ oc get rc -n dma
CONTROLLER     CONTAINER(S)   IMAGE(S)                               SELECTOR                                                             REPLICAS   AGE
php-apache-1   php-apache     gcr.io/google_containers/hpa-example   deployment=php-apache-1,deploymentconfig=php-apache,run=php-apache   10         1h
[fedora@ip-172-18-15-226 sample-app]$ oc get hpa -n dma
NAME         REFERENCE                           TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   DeploymentConfig/php-apache/scale   50%       45%       1         10        37m
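The numbers above are consistent with the proportional scaling rule that, as far as I understand it, the upstream Kubernetes 1.1 HPA controller applies: desired = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when that ratio is not more than 10% away from 1, then clamped to [minReplicas, maxReplicas]. The Go sketch below is a hypothetical desiredReplicas helper; the 0.1 tolerance is an assumption, and the controller's scale-up/scale-down delay windows are not modelled. It gives ceil(1 × 471 / 50) = ceil(9.42) = 10 replicas, matching MAXPODS, and leaves 10 pods at 45% alone because 45/50 = 0.9 is within the tolerance band.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is the assumed dead band around the target utilization (10%).
const tolerance = 0.1

// desiredReplicas scales proportionally to observed/target CPU
// utilization, makes no change inside the tolerance band, and clamps
// the result to [minReplicas, maxReplicas].
func desiredReplicas(current, utilization, target, minReplicas, maxReplicas int) int {
	ratio := float64(utilization) / float64(target)
	desired := current
	if math.Abs(1.0-ratio) > tolerance {
		desired = int(math.Ceil(ratio * float64(current)))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	fmt.Println(desiredReplicas(1, 471, 50, 1, 10)) // 10: ceil(9.42)
	fmt.Println(desiredReplicas(10, 45, 50, 1, 10)) // 10: within tolerance, no change
}
```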