Bug 1289503

Summary:	The hpa can't get metrics info
Product:	OpenShift Container Platform	Reporter:	zhou ying <yinzhou>
Component:	apiserver-auth	Assignee:	Jordan Liggitt <jliggitt>
Status:	CLOSED CURRENTRELEASE	QA Contact:	DeShuai Ma <dma>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	CC:	agoldste, aos-bugs, dmace, dma, jliggitt, mmccomas, mwringe, sdodson, yinzhou
Target Milestone:	---	Keywords:	Regression
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1309938 (view as bug list)		Environment:
Last Closed:	2016-05-12 17:13:08 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1309938

Description zhou ying 2015-12-08 10:49:41 UTC

Description of problem:
The HPA can't get metrics info from Heapster; the HPA's CURRENT column always shows "<waiting>".

Version-Release number of selected component (if applicable):
openshift v1.1-368-ge35815d
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
always

Steps to Reproduce:
1. Deploy metrics in the kube-system project, following: https://github.com/openshift/origin-metrics

2. Check the metrics status
[fedora@ip-172-18-15-150 sample-app]$ oc get pod -n kube-system
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-ndtc9   1/1       Running     0          1h
hawkular-metrics-dayum       1/1       Running     0          1h
heapster-7ekaj               1/1       Running     2          1h
metrics-deployer-qpfun       0/1       Completed   0          1h

3. Create an HPA
$ cat hpa.yaml
apiVersion: extensions/v1beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleRef:
    kind: DeploymentConfig
    name: php-apache
    subresource: scale
  minReplicas: 1
  maxReplicas: 10
  cpuUtilization:
    targetPercentage: 50
$ oc create -f hpa.yaml
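For reference when reading hpa.yaml: once CURRENT is populated, the controller scales by the ratio of observed to target CPU utilization, the rule upstream Kubernetes documents as desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). The Go sketch below is a simplified illustration under that assumption, not the controller's code: `desiredReplicas` is a hypothetical helper, and the tolerance band the real controller applies is left out.

```go
package main

import "fmt"

// desiredReplicas is a simplified sketch of CPU-based scaling:
// ceil(currentReplicas * currentUtil / targetUtil), clamped to
// [minReplicas, maxReplicas]. Utilizations are integer percentages.
// The real controller's tolerance band is deliberately omitted.
func desiredReplicas(currentReplicas, currentUtil, targetUtil, minReplicas, maxReplicas int) int {
	// Integer ceiling division avoids floating-point rounding surprises.
	desired := (currentReplicas*currentUtil + targetUtil - 1) / targetUtil
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// Bounds and target from hpa.yaml: minReplicas 1, maxReplicas 10, targetPercentage 50.
	fmt.Println(desiredReplicas(1, 120, 50, 1, 10)) // 3: one pod at 120% needs ceil(2.4) pods
	fmt.Println(desiredReplicas(4, 10, 50, 1, 10))  // 1: four pods at 10% shrink to ceil(0.8)
	fmt.Println(desiredReplicas(5, 500, 50, 1, 10)) // 10: capped by maxReplicas
}
```

While CURRENT shows "<waiting>" there is no currentUtil to feed into this rule, so the controller cannot make any scaling decision.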

4. Check the HPA status
[fedora@ip-172-18-15-150 sample-app]$ oc get hpa -n dma1
NAME         REFERENCE                                  TARGET    CURRENT     MINPODS        MAXPODS   AGE
php-apache   ReplicationController/dma1/php-apache-1/   10%       <waiting>   859533317008   10        25m
[fedora@ip-172-18-15-150 sample-app]$ oc get pod -n dma1
NAME                 READY     STATUS    RESTARTS   AGE
php-apache-1-ldqxr   1/1       Running   0          31m
[fedora@ip-172-18-15-150 sample-app]$ oc get rc -n dma1
CONTROLLER     CONTAINER(S)   IMAGE(S)                               SELECTOR                                                             REPLICAS   AGE
php-apache-1   php-apache     gcr.io/google_containers/hpa-example   deployment=php-apache-1,deploymentconfig=php-apache,run=php-apache   1          33m

Actual results:
4. The HPA's CURRENT value is always "<waiting>".

Expected results:
4. CURRENT should show the pod's current CPU utilization.

Additional info:
Metrics work correctly: the web console shows the pod CPU/memory info.

Comment 3 Dan Mace 2015-12-08 14:04:24 UTC

Reassigning to the Node team.

Comment 5 Solly Ross 2015-12-17 15:25:42 UTC

Could you please post your controller-manager logs and the logs for your heapster container?  They are both generally very helpful in debugging issues with the HPA.  In addition, it is useful to see the deployment config definition -- you need to set a CPU request in your pod templates.

Comment 6 DeShuai Ma 2015-12-18 09:31:53 UTC

(In reply to Solly Ross from comment #5)
> Could you please post your controller-manager logs and the logs for your
> heapster container?  They are both generally very helpful in debugging
> issues with the HPA.  In addition, it is useful to see the deployment config
> definition -- you need to a CPU request in your pod templates

These logs may help. There is always a Forbidden error.

//-------------------------------------------------------------
I1218 04:21:45.072975    2601 authorizer.go:30] allowed=true, reason=allowed by cluster rule
I1218 04:21:45.073683    2601 server.go:1082] GET /stats/openshift-infra/hawkular-cassandra-1-37fcb/ddeed198-a549-11e5-893d-0ef030a6b03f/hawkular-cassandra-1: (868.525µs) 0 [[Go 1.1 package http] 172.17.0.11:57551]
I1218 04:21:46.681823    2601 metrics_client.go:137] Sum of cpu requested: {0.200 DecimalSI}
I1218 04:21:46.684055    2601 metrics_client.go:174] Metrics available: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:serviceaccount:openshift-infra:hpa-controller\" cannot \"proxy\" \"services\" with name \"https:heapster:\" in project \"openshift-infra\"",
  "reason": "Forbidden",
  "details": {
    "name": "https:heapster:",
    "kind": "services"
  },
  "code": 403
}
W1218 04:21:46.684124    2601 horizontal.go:189] Failed to reconcile php-apache: failed to compute desired number of replicas based on CPU utilization for DeploymentConfig/dma/php-apache: failed to get cpu utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
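The last log line is the key symptom: the controller knows the pod's CPU request (0.200 cores) but got usage for 0 of 1 pods, because the proxied heapster query returned 403. A minimal Go sketch of why that leaves CURRENT at "<waiting>", assuming utilization is total usage divided by total request and that any missing sample is fatal (which matches the 0/1 case here). `podCPU` and `cpuUtilization` are hypothetical names, not OpenShift code; the request check also illustrates comment 5's point that pod templates need a CPU request.

```go
package main

import (
	"errors"
	"fmt"
)

// podCPU is a hypothetical per-pod record, in millicores.
type podCPU struct {
	name         string
	requestMilli int64  // CPU request from the pod template
	usageMilli   *int64 // nil when no usage sample came back from heapster
}

// cpuUtilization returns total usage / total request as a percentage.
// It fails if the pod list is empty, if any pod lacks a CPU request,
// or if any pod lacks a usage sample.
func cpuUtilization(pods []podCPU) (int64, error) {
	if len(pods) == 0 {
		return 0, errors.New("no pods")
	}
	var requested, used int64
	withMetrics := 0
	for _, p := range pods {
		if p.requestMilli <= 0 {
			return 0, fmt.Errorf("pod %s has no CPU request", p.name)
		}
		requested += p.requestMilli
		if p.usageMilli != nil {
			used += *p.usageMilli
			withMetrics++
		}
	}
	if withMetrics < len(pods) {
		return 0, fmt.Errorf("metrics obtained for %d/%d of pods", withMetrics, len(pods))
	}
	return used * 100 / requested, nil
}

func main() {
	// Like the log: one pod requesting 200m, but the proxied heapster
	// query was rejected with 403, so no usage sample exists.
	_, err := cpuUtilization([]podCPU{{name: "pod-a", requestMilli: 200}})
	fmt.Println(err) // metrics obtained for 0/1 of pods

	// With a made-up 50m usage sample the same pod reports 25%.
	usage := int64(50)
	util, err := cpuUtilization([]podCPU{{name: "pod-a", requestMilli: 200, usageMilli: &usage}})
	fmt.Println(util, err) // 25 <nil>
}
```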

Comment 7 zhou ying 2015-12-18 10:07:08 UTC

Please refer to comment 6.

Comment 8 Andy Goldstein 2015-12-18 16:54:05 UTC

You need to reconcile your cluster roles: https://docs.openshift.org/latest/install_config/upgrades.html#updating-policy-definitions

Comment 9 DeShuai Ma 2015-12-21 04:37:16 UTC

(In reply to Andy Goldstein from comment #8)
> You need to reconcile your cluster roles:
> https://docs.openshift.org/latest/install_config/upgrades.html#updating-
> policy-definitions

All policies are at their defaults; we haven't changed any policy, so why would we need to reconcile? The environment is an all-in-one build from the latest code.

[fedora@ip-172-18-9-28 sample-app]$ openshift version
openshift v1.1-586-gb504019
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

Details: http://pastebin.test.redhat.com/336594

Comment 10 DeShuai Ma 2015-12-21 04:38:58 UTC

Running "oadm policy reconcile-cluster-roles" produces no output.

Comment 11 Matt Wringe 2015-12-21 16:36:18 UTC

If you run "oc describe clusterPolicy default", does the hpa-controller role list access for proxying the heapster service?

system:hpa-controller
...
[proxy]    [services]    [https:heapster:]    []

Is this just a default install? Or have you modified any of the security settings?

Comment 12 Jordan Liggitt 2015-12-21 18:57:03 UTC

Did you reconcile cluster role bindings as well?

oadm policy reconcile-cluster-role-bindings --confirm

If you upgraded from a previous build with old data in etcd, the default roles and role bindings need updating.

Comment 13 DeShuai Ma 2015-12-22 01:18:37 UTC

(In reply to Jordan Liggitt from comment #12)
> did you reconcile cluster role bindings as well?
> 
> oadm policy reconcile-cluster-role-bindings --confirm
> 
> If you upgrade from a previous build with old data in etcd, default roles
> and rolebindings need updating.

The environment is a new one; I didn't upgrade from a previous build.

Comment 14 DeShuai Ma 2015-12-22 01:43:09 UTC

(In reply to Matt Wringe from comment #11)
> If you do "oc describe clusterPolicy default" does the hpa-controller list
> access for proxying the heapster service?
> 
> system:hpa-controller
> ...
> [proxy]    [services]    [https:heapster:]    []
> 
> Is this just a default install? Or have you modified any of the security
> settings?

The full default clusterPolicy is at http://pastebin.test.redhat.com/336837
system:hpa-controller has the rule
[proxy]    [services]    [https:heapster:]

Comment 15 Solly Ross 2016-01-05 19:10:48 UTC

I'm unable to reproduce this on a fresh OSE 3.1 install.  I noticed that your pastebin link has "[proxy] [services] [https:heapster]" and not "[proxy] [services] [https:heapster:]" (note the extra colon on the end).  ATM, I'm not certain whether or not that's just a display issue, however.  Can you generate a new set of policy using oadm, and then post the generated policy YAML?
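The trailing colon matters because, as I understand it, policy rules compare resourceNames as literal strings. A simplified Go sketch with hypothetical `ruleAllows` and `splitSchemeNamePort` helpers, assuming the Kubernetes service-proxy naming convention `[scheme:]name[:port]`: under that convention "https:heapster:" is service "heapster" over https with no named port, while "https:heapster" would parse as service "https" on port "heapster". The scheme validation the real parser performs is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleAllows is a hypothetical stand-in for a policy rule's resourceNames
// check: the requested name must appear verbatim.
func ruleAllows(resourceNames []string, requested string) bool {
	for _, n := range resourceNames {
		if n == requested {
			return true
		}
	}
	return false
}

// splitSchemeNamePort parses a proxy name of the form [scheme:]name[:port].
func splitSchemeNamePort(id string) (scheme, name, port string, ok bool) {
	parts := strings.Split(id, ":")
	switch len(parts) {
	case 1:
		name = parts[0]
	case 2:
		name, port = parts[0], parts[1]
	case 3:
		scheme, name, port = parts[0], parts[1], parts[2]
	default:
		return "", "", "", false
	}
	return scheme, name, port, name != ""
}

func main() {
	oldRule := []string{"https:heapster"}    // default resourceName before the fix
	fixedRule := []string{"https:heapster:"} // value that works (comment 16)
	requested := "https:heapster:"           // name in the 403 message

	fmt.Println(ruleAllows(oldRule, requested))   // false -> Forbidden
	fmt.Println(ruleAllows(fixedRule, requested)) // true

	s, n, p, _ := splitSchemeNamePort(requested)
	fmt.Printf("%q %q %q\n", s, n, p) // "https" "heapster" ""
	s, n, p, _ = splitSchemeNamePort(oldRule[0])
	fmt.Printf("%q %q %q\n", s, n, p) // "" "https" "heapster"
}
```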

Comment 16 DeShuai Ma 2016-01-06 10:07:32 UTC

(In reply to Solly Ross from comment #15)
> I'm unable to reproduce this on a fresh OSE 3.1 install.  I noticed that
> your pastebin link has "[proxy] [services] [https:heapster]" and not
> "[proxy] [services] [https:heapster:]" (note the extra colon on the end). 
> ATM, I'm not certain whether or not that's just a display issue, however. 
> Can you generate a new set of policy using oadm, and then post the generated
> policy YAML?

After editing clusterRole system:hpa-controller to use the resourceName 'https:heapster:', the HPA works:
$ oc edit clusterRoles system:hpa-controller -o yaml
By default, the code sets the resourceName to "https:heapster", not "https:heapster:":
https://github.com/openshift/origin/blob/master/pkg/cmd/server/bootstrappolicy/infra_sa_policy.go#L286

Changing that value in the code, rebuilding, and restarting OpenShift also works.

Thanks for your comment.

Comment 17 Andy Goldstein 2016-01-06 14:04:30 UTC

This is an accidental regression in the policy. Jordan will fix.

Comment 18 Jordan Liggitt 2016-01-12 15:42:35 UTC

Fixed in https://github.com/openshift/origin/pull/6554

Comment 19 Jordan Liggitt 2016-01-12 15:42:53 UTC

In merge queue, marking ON_QA

Comment 20 DeShuai Ma 2016-01-13 03:19:39 UTC

[fedora@ip-172-18-6-98 sample-app]$ openshift version
openshift v1.1-764-g7e08c95
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

Verified this bug.

[root@dma amd64]# ./oc get hpa
NAME         REFERENCE                           TARGET    CURRENT   MINPODS   MAXPODS   AGE
php-apache   DeploymentConfig/php-apache/scale   50%       0%        1         10        4m

Comment 21 zhou ying 2016-07-26 01:43:15 UTC

Please refer to comment 20.