Bug 1707785

Summary: Autoscaling for Memory Utilization is not working
Product: OpenShift Container Platform
Component: Node
Version: 4.1.0
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Status: VERIFIED
Reporter: Weinan Liu <weinliu>
Assignee: Joel Smith <joelsmith>
QA Contact: Weinan Liu <weinliu>
CC: aos-bugs, erich, gblomqui, joelsmith, jokerman, mmccomas, schoudha, sjenning, suchaudh, wsun
Keywords: Reopened
Type: Bug
Last Closed: 2019-05-08 13:22:33 UTC

Description Weinan Liu 2019-05-08 10:55:28 UTC
Autoscaling for Memory Utilization is not working as expected. Creating an HPA for memory-based autoscaling fails while looking up the target resource.

Version-Release number of selected component (if applicable):
$ oc version
oc v4.0.0-0.101.0
kubernetes v1.11.0+0e214b3455
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.1.10.122:6443
kubernetes v1.13.4+a469c28
[nathan@dhcp-140-7 0508]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.1.0-0.nightly-2019-05-07-233329   True        False         6h52m     Cluster version is 4.1.0-0.nightly-2019-05-07-233329


How reproducible:
Always

Steps to Reproduce:

Make sure autoscaling/v2beta1 is available:
$ oc get --raw /apis/autoscaling/v2beta1
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"autoscaling/v2beta1","resources":[{"name":"horizontalpodautoscalers","singularName":"","namespaced":true,"kind":"HorizontalPodAutoscaler","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["hpa"],"categories":["all"]},{"name":"horizontalpodautoscalers/status","singularName":"","namespaced":true,"kind":"HorizontalPodAutoscaler","verbs":["get","patch","update"]}]}


1. Deploy an app:
~~~
# oc get dc,pods
NAME                                         REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/ruby-ex   1          1         1         config,image(ruby-ex:latest)

NAME                  READY     STATUS      RESTARTS   AGE
pod/ruby-ex-1-6kt4v   1/1       Running     0          2d
pod/ruby-ex-1-build   0/1       Completed   0          2d
~~~

2. Create the HPA config file:
~~~
# cat hpa.yaml 
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-resource-metrics-memory 
spec:
  scaleTargetRef:
    apiVersion: apps/v1 
    kind: DepoymentConfig 
    name: ruby-ex
  minReplicas: 1 
  maxReplicas: 10 
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 50 
~~~

3. Create the autoscaler:
~~~
[nathan@dhcp-140-7 0508]$ oc create -f hpa.yaml 
horizontalpodautoscaler.autoscaling/hpa-resource-metrics-memory created
[nathan@dhcp-140-7 0508]$ oc describe hpa 
Name:                                                     hpa-resource-metrics-memory
Namespace:                                                weinliu
Labels:                                                   <none>
Annotations:                                              <none>
CreationTimestamp:                                        Wed, 08 May 2019 18:45:13 +0800
Reference:                                                DepoymentConfig/ruby-ex
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  <unknown> / 50%
Min replicas:                                             1
Max replicas:                                             10
DepoymentConfig pods:                                     0 current / 0 desired
Conditions:
  Type         Status  Reason          Message
  ----         ------  ------          -------
  AbleToScale  False   FailedGetScale  the HPA controller was unable to get the target's current scale: no matches for kind "DepoymentConfig" in group "apps"
Events:
  Type     Reason          Age   From                       Message
  ----     ------          ----  ----                       -------
  Warning  FailedGetScale  1s    horizontal-pod-autoscaler  no matches for kind "DepoymentConfig" in group "apps"


Name:                        resource-memory
Namespace:                   weinliu
Labels:                      <none>
Annotations:                 <none>
CreationTimestamp:           Wed, 08 May 2019 18:16:33 +0800
Reference:                   ReplicationController/hello-openshift
Metrics:                     ( current / target )
  resource memory on pods:   <unknown> / 500Mi
Min replicas:                1
Max replicas:                10
ReplicationController pods:  0 current / 1 desired
Conditions:
  Type         Status  Reason            Message
  ----         ------  ------            -------
  AbleToScale  True    SucceededRescale  the HPA controller was able to update the target scale to 1
Events:
  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  3m (x101 over 28m)  horizontal-pod-autoscaler  New size: 1; reason: Current number of replicas below Spec.MinReplicas
~~~

4. Check which apiVersion the HPA was created with:
~~~
[nathan@dhcp-140-7 0508]$ oc get hpa -o yaml hpa-resource-metrics-memory
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"False","lastTransitionTime":"2019-05-08T10:45:28Z","reason":"FailedGetScale","message":"the
      HPA controller was unable to get the target''s current scale: no matches for
      kind \"DepoymentConfig\" in group \"apps\""}]'
    autoscaling.alpha.kubernetes.io/metrics: '[{"type":"Resource","resource":{"name":"memory","targetAverageUtilization":50}}]'
  creationTimestamp: 2019-05-08T10:45:13Z
  name: hpa-resource-metrics-memory
  namespace: weinliu
  resourceVersion: "162589"
  selfLink: /apis/autoscaling/v1/namespaces/weinliu/horizontalpodautoscalers/hpa-resource-metrics-memory
  uid: 5c3bddf4-717e-11e9-aa80-801844ef10ac
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: DepoymentConfig
    name: ruby-ex
status:
  currentReplicas: 0
  desiredReplicas: 0
~~~

From steps 2 and 4 we can see that the apiVersion of the resource gets changed: the HPA was created as autoscaling/v2beta1, but it is stored and served back as autoscaling/v1 (with the v2beta1 fields preserved in annotations).
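As a side note, the stored object can still be retrieved in its v2beta1 representation by fully qualifying the resource name, since oc/kubectl accept resource.version.group names (a sketch; assumes a live cluster and the HPA name from step 2):

~~~
# Ask the API server for the v2beta1 view of the same HPA object
oc get horizontalpodautoscalers.v2beta1.autoscaling hpa-resource-metrics-memory -o yaml
~~~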


Actual results:
The HPA is failing.

Expected results:
The pods should be able to autoscale based on the Memory utilization.

Additional info:
The same issue on 3.11 is still in NEW status:
https://bugzilla.redhat.com/show_bug.cgi?id=1701469

Comment 1 Seth Jennings 2019-05-08 13:22:33 UTC
"DeploymentConfig" is misspelled
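For the record, a corrected hpa.yaml would look like the following (a sketch based on the manifest in the description; note that DeploymentConfig lives in the apps.openshift.io API group rather than apps, which is what the FailedGetScale message points at):

~~~
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-resource-metrics-memory
spec:
  scaleTargetRef:
    apiVersion: apps.openshift.io/v1  # API group for DeploymentConfig
    kind: DeploymentConfig            # spelled correctly
    name: ruby-ex
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 50
~~~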

Comment 3 Seth Jennings 2019-05-09 14:08:02 UTC
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#api-object

This is working fine AFAICT

$ oc new-project demo
Now using project "demo" on server "https://api.lab.variantweb.net:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git

to build a new example application in Ruby.

$ ls
memory.yaml  rc.yaml

$ cat rc.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
 labels:
   run: hello-openshift
 name: hello-openshift
spec:
 replicas: 1
 selector:
   run: hello-openshift
 template:
   metadata:
     creationTimestamp: null
     labels:
       run: hello-openshift
   spec:
       containers:
       - image: openshift/hello-openshift
         imagePullPolicy: Always
         name: hello-openshift
         ports:
         - containerPort: 8080
           protocol: TCP
         resources:
           limits:
             cpu: 500m
             memory: 512Mi
           requests:
             cpu: 100m
             memory: 256Mi
         terminationMessagePath: /dev/termination-log
       dnsPolicy: ClusterFirst
       restartPolicy: Always
       securityContext: {}
       terminationGracePeriodSeconds: 30

$ oc create -f rc.yaml 
replicationcontroller/hello-openshift created

$ cat memory.yaml 
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
 name: resource-memory
spec:
 scaleTargetRef:
   kind: ReplicationController
   name: hello-openshift
 minReplicas: 1
 maxReplicas: 3
 metrics:
 - type: Resource
   resource:
     name: memory
     targetAverageValue: 500Mi

$ oc create -f memory.yaml 
horizontalpodautoscaler.autoscaling/resource-memory created

$ oc describe hpa resource-memory 
Name:                        resource-memory
Namespace:                   demo
Labels:                      <none>
Annotations:                 <none>
CreationTimestamp:           Thu, 09 May 2019 09:01:48 -0500
Reference:                   ReplicationController/hello-openshift
Metrics:                     ( current / target )
  resource memory on pods:   <unknown> / 500Mi
Min replicas:                1
Max replicas:                3
ReplicationController pods:  1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Warning  FailedGetResourceMetric       43s (x3 over 73s)  horizontal-pod-autoscaler  unable to get metrics for resource memory: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  43s (x3 over 73s)  horizontal-pod-autoscaler  failed to get memory utilization: unable to get metrics for resource memory: no metrics returned from resource metrics API

The events are transient while the pod metrics start to be reported by cadvisor.  If you delay the creation of the hpa, the events won't appear.

The hpa is `ReadyForNewScale=True` and `ValidMetricFound=True` and calculated desired count properly `ReplicationController pods:  1 current / 1 desired`

What is the problem exactly?

Comment 5 Weinan Liu 2019-05-10 07:39:53 UTC
[nathan@localhost 0510]$ oc get hpa
NAME              REFERENCE                               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
resource-memory   ReplicationController/hello-openshift   <unknown>/500Mi   1         10        1          24h

Hi Seth,
We cannot get the current memory usage, so memory autoscaling is not working.

Comment 6 Seth Jennings 2019-05-10 13:34:30 UTC
Joel, can you look into why we are getting <unknown> even though ScalingActive=True, which suggests we can get the metric?

Comment 8 Seth Jennings 2019-05-16 13:43:43 UTC
*** Bug 1701469 has been marked as a duplicate of this bug. ***

Comment 9 Seth Jennings 2019-06-18 19:50:12 UTC
Just tried this on a 4.2 CI build and it works fine:

$ ogcv
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.ci-2019-06-18-161924   True        False         39m     Cluster version is 4.2.0-0.ci-2019-06-18-161924

$ oc describe hpa
Name:                        resource-memory
Namespace:                   demo
Labels:                      <none>
Annotations:                 <none>
CreationTimestamp:           Tue, 18 Jun 2019 14:46:47 -0500
Reference:                   ReplicationController/hello-openshift
Metrics:                     ( current / target )
  resource memory on pods:   1355776 / 512Mi <----
Min replicas:                1
Max replicas:                3
ReplicationController pods:  1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>