Bug 1289503
| Summary: | The hpa can't get metrics info | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhou ying <yinzhou> |
| Component: | apiserver-auth | Assignee: | Jordan Liggitt <jliggitt> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | DeShuai Ma <dma> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | unspecified | CC: | agoldste, aos-bugs, dmace, dma, jliggitt, mmccomas, mwringe, sdodson, yinzhou |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1309938 (view as bug list) | Environment: | |
| Last Closed: | 2016-05-12 17:13:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1309938 | | |
Description
zhou ying
2015-12-08 10:49:41 UTC
Reassigning to the Node team.

Solly Ross (comment #5):

Could you please post your controller-manager logs and the logs for your heapster container? They are both generally very helpful in debugging issues with the HPA. In addition, it is useful to see the deployment config definition -- you need a CPU request in your pod templates.

zhou ying:

(In reply to Solly Ross from comment #5)

Some logs that may help. There is always a Forbidden error:

```
I1218 04:21:45.072975 2601 authorizer.go:30] allowed=true, reason=allowed by cluster rule
I1218 04:21:45.073683 2601 server.go:1082] GET /stats/openshift-infra/hawkular-cassandra-1-37fcb/ddeed198-a549-11e5-893d-0ef030a6b03f/hawkular-cassandra-1: (868.525µs) 0 [[Go 1.1 package http] 172.17.0.11:57551]
I1218 04:21:46.681823 2601 metrics_client.go:137] Sum of cpu requested: {0.200 DecimalSI}
I1218 04:21:46.684055 2601 metrics_client.go:174] Metrics available: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:serviceaccount:openshift-infra:hpa-controller\" cannot \"proxy\" \"services\" with name \"https:heapster:\" in project \"openshift-infra\"",
  "reason": "Forbidden",
  "details": {
    "name": "https:heapster:",
    "kind": "services"
  },
  "code": 403
}
W1218 04:21:46.684124 2601 horizontal.go:189] Failed to reconcile php-apache: failed to compute desired number of replicas based on CPU utilization for DeploymentConfig/dma/php-apache: failed to get cpu utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
```

Andy Goldstein (comment #8):

You need to reconcile your cluster roles: https://docs.openshift.org/latest/install_config/upgrades.html#updating-policy-definitions

zhou ying:

(In reply to Andy Goldstein from comment #8)

All the policy is the default set; we have not changed any policy, so why does it need reconciling? The env is an all-in-one build from the latest code.

```
[fedora@ip-172-18-9-28 sample-app]$ openshift version
openshift v1.1-586-gb504019
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2
```

Detailed info: http://pastebin.test.redhat.com/336594

When running "oadm policy reconcile-cluster-roles", nothing is output.

Matt Wringe (comment #11):

If you do "oc describe clusterPolicy default", does the hpa-controller list access for proxying the heapster service?

```
system:hpa-controller
...
[proxy] [services] [https:heapster:] []
```

Is this just a default install? Or have you modified any of the security settings?

Jordan Liggitt (comment #12):

Did you reconcile the cluster role bindings as well?

```
oadm policy reconcile-cluster-role-bindings --confirm
```

If you upgraded from a previous build with old data in etcd, the default roles and rolebindings need updating.

zhou ying:

(In reply to Jordan Liggitt from comment #12)

The env is a new one; I did not upgrade from a previous build.

(In reply to Matt Wringe from comment #11)

All of the default clusterPolicy is at http://pastebin.test.redhat.com/336837. system:hpa-controller has the policy [proxy] [services] [https:heapster:].

Solly Ross (comment #15):

I'm unable to reproduce this on a fresh OSE 3.1 install.
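As background for the "failed to compute desired number of replicas" warning in the logs above: once the metrics request is rejected with 403, the autoscaler never reaches its scaling arithmetic. A minimal sketch of that arithmetic, assuming the standard HPA approach of scaling the replica count by the ratio of observed to target CPU utilization (the function name and signature here are illustrative, not the controller's actual code):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas sketches the HPA's core calculation: scale the
// current replica count by observed/target CPU utilization and
// round up. When the metrics lookup fails (as in the 403 above),
// this step can never run.
func desiredReplicas(current int, observedUtil, targetUtil float64) int {
	return int(math.Ceil(float64(current) * observedUtil / targetUtil))
}

func main() {
	// 4 pods at 90% CPU against a 50% target would scale to 8.
	fmt.Println(desiredReplicas(4, 0.90, 0.50)) // prints 8
}
```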
I noticed that your pastebin link has "[proxy] [services] [https:heapster]" and not "[proxy] [services] [https:heapster:]" (note the extra colon on the end). At the moment I'm not certain whether or not that's just a display issue, however. Can you generate a new set of policy using oadm, and then post the generated policy YAML?

zhou ying:

(In reply to Solly Ross from comment #15)

When I edit clusterRoles system:hpa-controller to use 'https:heapster:', it works well:

```
$ oc edit clusterRoles system:hpa-controller -o yaml
```

By default, the resourceName in the code is "https:heapster", not "https:heapster:":
https://github.com/openshift/origin/blob/master/pkg/cmd/server/bootstrappolicy/infra_sa_policy.go#L286

When I change the code, rebuild, and restart openshift, it works well too.

Thanks for your comment. This is an accidental regression in the policy. Jordan will fix.

In merge queue; marking ON_QA.

```
[fedora@ip-172-18-6-98 sample-app]$ openshift version
openshift v1.1-764-g7e08c95
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2
```

Verified this bug:

```
[root@dma amd64]# ./oc get hpa
NAME         REFERENCE                           TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   DeploymentConfig/php-apache/scale   50%       0%        1         10        4m
```

Please refer to comment 20.
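The root cause above is a one-character mismatch: resource names in policy rules are matched by exact string comparison, so a rule granting proxy access to "https:heapster" does not authorize a request for the service proxy name "https:heapster:". A minimal sketch of that matching behavior (my own simplified function, not the actual authorizer code), assuming the usual semantics where an empty resourceNames list means "all names":

```go
package main

import "fmt"

// allowed mimics how resourceNames in a cluster role rule are
// matched: an exact string comparison against the requested name.
// An empty list places no restriction on names.
func allowed(resourceNames []string, requested string) bool {
	if len(resourceNames) == 0 {
		return true
	}
	for _, n := range resourceNames {
		if n == requested {
			return true
		}
	}
	return false
}

func main() {
	// The buggy bootstrap policy rule: missing trailing colon.
	fmt.Println(allowed([]string{"https:heapster"}, "https:heapster:")) // prints false (Forbidden)
	// The corrected rule.
	fmt.Println(allowed([]string{"https:heapster:"}, "https:heapster:")) // prints true
}
```

This is why editing the clusterRole (or fixing the bootstrap policy in infra_sa_policy.go) and restarting resolves the 403.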