Description of problem:
The hpa can't get metrics info from heapster; the hpa CURRENT column always shows "<waiting>".

Version-Release number of selected component (if applicable):
openshift v1.1-368-ge35815d
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics in the kube-system project, following https://github.com/openshift/origin-metrics

2. Check the metrics status:

    [fedora@ip-172-18-15-150 sample-app]$ oc get pod -n kube-system
    NAME                         READY     STATUS      RESTARTS   AGE
    hawkular-cassandra-1-ndtc9   1/1       Running     0          1h
    hawkular-metrics-dayum       1/1       Running     0          1h
    heapster-7ekaj               1/1       Running     2          1h
    metrics-deployer-qpfun       0/1       Completed   0          1h

3. Create an hpa:

    $ cat hpa.yaml
    apiVersion: extensions/v1beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: php-apache
    spec:
      scaleRef:
        kind: DeploymentConfig
        name: php-apache
        subresource: scale
      minReplicas: 1
      maxReplicas: 10
      cpuUtilization:
        targetPercentage: 50
    $ oc create -f hpa.yaml

4. Check the hpa status:

    [fedora@ip-172-18-15-150 sample-app]$ oc get hpa -n dma1
    NAME         REFERENCE                                     TARGET    CURRENT     MINPODS        MAXPODS   AGE
    php-apache   ReplicationController/dma1/php-apache-1/      10%       <waiting>   859533317008   10        25m

    [fedora@ip-172-18-15-150 sample-app]$ oc get pod -n dma1
    NAME                 READY     STATUS    RESTARTS   AGE
    php-apache-1-ldqxr   1/1       Running   0          31m

    [fedora@ip-172-18-15-150 sample-app]$ oc get rc -n dma1
    CONTROLLER     CONTAINER(S)   IMAGE(S)                               SELECTOR                                                              REPLICAS   AGE
    php-apache-1   php-apache     gcr.io/google_containers/hpa-example   deployment=php-apache-1,deploymentconfig=php-apache,run=php-apache   1          33m

Actual results:
4. The hpa CURRENT value is always "<waiting>".

Expected results:
4. The hpa should show the current pod CPU info.

Additional info:
Metrics themselves work well: the web console shows the pod CPU/memory info correctly.
Reassigning to the Node team.
Could you please post your controller-manager logs and the logs for your heapster container? They are both generally very helpful in debugging issues with the HPA. In addition, it is useful to see the deployment config definition -- you need to set a CPU request in your pod templates.
(In reply to Solly Ross from comment #5)
> Could you please post your controller-manager logs and the logs for your
> heapster container? They are both generally very helpful in debugging
> issues with the HPA. In addition, it is useful to see the deployment config
> definition -- you need to set a CPU request in your pod templates

Here are some logs that may help. There is always a Forbidden error.

//-------------------------------------------------------------
I1218 04:21:45.072975 2601 authorizer.go:30] allowed=true, reason=allowed by cluster rule
I1218 04:21:45.073683 2601 server.go:1082] GET /stats/openshift-infra/hawkular-cassandra-1-37fcb/ddeed198-a549-11e5-893d-0ef030a6b03f/hawkular-cassandra-1: (868.525µs) 0 [[Go 1.1 package http] 172.17.0.11:57551]
I1218 04:21:46.681823 2601 metrics_client.go:137] Sum of cpu requested: {0.200 DecimalSI}
I1218 04:21:46.684055 2601 metrics_client.go:174] Metrics available: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:serviceaccount:openshift-infra:hpa-controller\" cannot \"proxy\" \"services\" with name \"https:heapster:\" in project \"openshift-infra\"",
  "reason": "Forbidden",
  "details": {
    "name": "https:heapster:",
    "kind": "services"
  },
  "code": 403
}
W1218 04:21:46.684124 2601 horizontal.go:189] Failed to reconcile php-apache: failed to compute desired number of replicas based on CPU utilization for DeploymentConfig/dma/php-apache: failed to get cpu utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
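For readers following the log: a rough sketch of why the reconcile fails even though a CPU request (0.200 cores) was found. It assumes, as a simplification, that utilization is total CPU usage expressed as a percentage of total CPU request, and that the computation is abandoned when usage is missing for any pod. The function name and data shapes below are hypothetical and are not the controller's actual code.

```python
from typing import Dict


def cpu_utilization_percent(requests_milli: Dict[str, int],
                            usage_milli: Dict[str, int]) -> int:
    """Return CPU usage as a percentage of the summed CPU requests.

    Both arguments map pod name -> millicores. Hypothetical helper,
    written only to illustrate the failure mode seen in the log.
    """
    total_request = sum(requests_milli.values())
    if total_request == 0:
        # Without a CPU request the percentage has no denominator.
        raise ValueError("no CPU request set on the pods")

    # Keep only usage samples for pods we actually asked about.
    reported = {pod: used for pod, used in usage_milli.items()
                if pod in requests_milli}
    if len(reported) != len(requests_milli):
        # Mirrors the wording of the warning in the log above.
        raise ValueError(
            f"metrics obtained for {len(reported)}/{len(requests_milli)} of pods")

    return sum(reported.values()) * 100 // total_request


# One pod requesting 200 millicores (0.200 cores, as in the log).
requests = {"pod-a": 200}

# Heapster reachable: 100 millicores used is 50% of the request.
print(cpu_utilization_percent(requests, {"pod-a": 100}))

# Heapster call rejected with 403: no samples come back at all.
try:
    cpu_utilization_percent(requests, {})
except ValueError as err:
    print(err)
```

Under these assumptions the 403 from the heapster proxy call is enough on its own to leave CURRENT at "<waiting>", regardless of the request settings.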
Please refer to comment 6.
You need to reconcile your cluster roles: https://docs.openshift.org/latest/install_config/upgrades.html#updating-policy-definitions
(In reply to Andy Goldstein from comment #8)
> You need to reconcile your cluster roles:
> https://docs.openshift.org/latest/install_config/upgrades.html#updating-policy-definitions

All policies are at their defaults; we haven't changed any policy, so why would a reconcile be needed? The environment is an all-in-one build from the latest code.

    [fedora@ip-172-18-9-28 sample-app]$ openshift version
    openshift v1.1-586-gb504019
    kubernetes v1.1.0-origin-1107-g4c8e6f4
    etcd 2.1.2

Detailed info: http://pastebin.test.redhat.com/336594
When I run "oadm policy reconcile-cluster-roles", there is no output.
If you do "oc describe clusterPolicy default", does the hpa-controller list access for proxying the heapster service?

    system:hpa-controller
    ...
    [proxy] [services] [https:heapster:] []

Is this just a default install? Or have you modified any of the security settings?
Did you reconcile cluster role bindings as well?

    oadm policy reconcile-cluster-role-bindings --confirm

If you upgrade from a previous build with old data in etcd, default roles and rolebindings need updating.
(In reply to Jordan Liggitt from comment #12)
> did you reconcile cluster role bindings as well?
>
> oadm policy reconcile-cluster-role-bindings --confirm
>
> If you upgrade from a previous build with old data in etcd, default roles
> and rolebindings need updating.

The environment is a new one. I didn't upgrade from a previous build.
(In reply to Matt Wringe from comment #11)
> If you do "oc describe clusterPolicy default" does the hpa-controller list
> access for proxying the heapster service?
>
> system:hpa-controller
> ...
> [proxy] [services] [https:heapster:] []
>
> Is this just a default install? Or have you modified any of the security
> settings?

The full "clusterPolicy default" output is at http://pastebin.test.redhat.com/336837

system:hpa-controller has the policy [proxy] [services] [https:heapster:]
I'm unable to reproduce this on a fresh OSE 3.1 install. I noticed that your pastebin link has "[proxy] [services] [https:heapster]" and not "[proxy] [services] [https:heapster:]" (note the extra colon on the end). ATM, I'm not certain whether or not that's just a display issue, however. Can you generate a new set of policy using oadm, and then post the generated policy YAML?
(In reply to Solly Ross from comment #15)
> I'm unable to reproduce this on a fresh OSE 3.1 install. I noticed that
> your pastebin link has "[proxy] [services] [https:heapster]" and not
> "[proxy] [services] [https:heapster:]" (note the extra colon on the end).
> ATM, I'm not certain whether or not that's just a display issue, however.
> Can you generate a new set of policy using oadm, and then post the generated
> policy YAML?

After editing the system:hpa-controller clusterRole so that the resource name is 'https:heapster:', the hpa works well.

    $ oc edit clusterRoles system:hpa-controller -o yaml

By default, the resourceName in the code is "https:heapster", not "https:heapster:":
https://github.com/openshift/origin/blob/master/pkg/cmd/server/bootstrappolicy/infra_sa_policy.go#L286

After changing the code, rebuilding, and restarting openshift, it works well too. Thanks for your comment.
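To make the one-character mismatch concrete, here is a small sketch. It assumes two things that are not stated in this thread: that the name in a service proxy request is built as "scheme:name:port" (so an empty port still leaves a trailing colon), and that a rule's resource names are compared to the requested name by exact string equality. Both function names are hypothetical and are not the origin code.

```python
from typing import List


def join_scheme_name_port(scheme: str, name: str, port: str) -> str:
    """Build the proxy resource name (assumed "scheme:name:port" convention)."""
    if scheme:
        # With a scheme, both separators are kept even if port is empty.
        return f"{scheme}:{name}:{port}"
    if port:
        return f"{name}:{port}"
    return name


def rule_allows(resource_names: List[str], requested: str) -> bool:
    """Exact-match check; an empty list is assumed to mean "any name"."""
    return not resource_names or requested in resource_names


# The hpa-controller proxies to heapster over https with no port name,
# which matches the name in the Forbidden message from comment 6.
requested = join_scheme_name_port("https", "heapster", "")
print(requested)

# Bootstrap policy as shipped (no trailing colon): denied.
print(rule_allows(["https:heapster"], requested))

# After editing the clusterRole to add the colon: allowed.
print(rule_allows(["https:heapster:"], requested))
```

Under these assumptions the missing colon is not a display issue: the rule and the request name simply never compare equal, which is consistent with the edit above fixing the problem.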
This is an accidental regression in the policy. Jordan will fix.
Fixed in https://github.com/openshift/origin/pull/6554
In merge queue, marking ON_QA
Verified this bug with the following version:

    [fedora@ip-172-18-6-98 sample-app]$ openshift version
    openshift v1.1-764-g7e08c95
    kubernetes v1.1.0-origin-1107-g4c8e6f4
    etcd 2.1.2

    [root@dma amd64]# ./oc get hpa
    NAME         REFERENCE                           TARGET    CURRENT   MINPODS   MAXPODS   AGE
    php-apache   DeploymentConfig/php-apache/scale   50%       0%        1         10        4m
Please refer to comment 20.