Bug 1780742
| Summary: | HPA not working as expected when initContainers are defined | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anand Paladugu <apaladug> |
| Component: | Node | Assignee: | Joel Smith <joelsmith> |
| Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.z | CC: | aos-bugs, joelsmith, jokerman, mburke, nagrawal |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-27 16:43:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Anand Paladugu
2019-12-06 19:07:57 UTC
In my quicklab OCP 4.1 setup and in an OCP 4.2 AWS IPI setup, I see autoscaling/v1 as the apiVersion in the HPA definition, but the customer sees autoscaling/v2beta2. I am not sure how the customer is getting that API version. That may be what is causing the HPA not only to fail with the StatefulSet but also to fail with the deployment config. The customer attached a must-gather to the case.

Neelesh, I can check with the customer to see whether they want to make use of StatefulSet support with HPA in 4.4. Can you help with fixing the deployment config issue? There is an upstream issue [1] and a potential PR [2].

1. https://github.com/kubernetes/kubernetes/issues/79365
2. https://github.com/kubernetes/kubernetes/pull/86044

@Ryan
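For reference, a minimal sketch of an HPA expressed against the autoscaling/v2beta2 API (the version the customer reports seeing). The target and thresholds below mirror the reproduction steps later in this bug; they are illustrative and not taken from the customer's must-gather.

```yaml
# Illustrative only: a CPU-based HPA in its autoscaling/v2beta2 form.
# The same intent in autoscaling/v1 is a single targetCPUUtilizationPercentage field.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: pod-autoscale
  namespace: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps.openshift.io/v1
    kind: DeploymentConfig
    name: pod-autoscale
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```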
A simple deployment config (steps below) worked, so the customer is going to retry their case based on this example. The question we now have to answer is whether HPA supports StatefulSets.
1. oc new-project my-hpa
2. echo '---
kind: LimitRange
apiVersion: v1
metadata:
  name: limits
spec:
  limits:
  - type: Pod
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
  - type: Container
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
    default:
      cpu: 50m
      memory: 100Mi
' | oc create -f - -n my-hpa
3. oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa
4. oc expose svc pod-autoscale
5. oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60
6. oc describe hpa pod-autoscale -n my-hpa
7. oc rsh -n my-hpa $(oc get ep pod-autoscale -n my-hpa -o jsonpath='{ .subsets[].addresses[0].targetRef.name }')
$ while true;do true;done
8. Step 7 drives up CPU utilization in the pod, and the HPA should scale up to add another pod (a sketch of the HPA object created in step 5 follows this list).
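For clarity, here is a sketch of roughly the object that the oc autoscale command in step 5 creates. The field values follow the command's flags; defaulted and status fields are omitted.

```yaml
# Sketch of the autoscaling/v1 object produced by:
#   oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: pod-autoscale
  namespace: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps.openshift.io/v1
    kind: DeploymentConfig
    name: pod-autoscale
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 60
```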
HPA does support StatefulSets. @Ryan As discussed, I have updated this bug's title, and here is how I tested HPA not working when initContainers are defined.

1. Working scenario:

   a. Created limit ranges as below:

      > oc describe limitranges limits
      Name:       limits
      Namespace:  my-hpa
      Type       Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
      ----       --------  ---  ---    ---------------  -------------  -----------------------
      Pod        cpu       10m  100m   -                -              -
      Pod        memory    5Mi  750Mi  -                -              -
      Container  cpu       10m  100m   50m              50m            -
      Container  memory    5Mi  750Mi  100Mi            100Mi          -

   b. Created a new app as below:

      oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa

   c. Created the HPA as below:

      oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60

   d. Created load on the pod: rsh into the pod and run "while true; do true;done".

   e. oc adm top pod shows the load:

      > oc adm top pod
      NAME                    CPU(cores)   MEMORY(bytes)
      pod-autoscale-4-d4k79   33m          9Mi

   f. Autoscaling kicks in:

      > oc get pods
      NAME                    READY   STATUS              RESTARTS   AGE
      pod-autoscale-4-2p9tv   0/1     ContainerCreating   0          7s
      pod-autoscale-4-d4k79   1/1     Running             0          102s

   g. Events show:

      5m44s   Normal   ReplicationControllerScaled   deploymentconfig/pod-autoscale   Scaled replication controller "pod-autoscale-4" from 0 to 1
      4m9s    Normal   ReplicationControllerScaled   deploymentconfig/pod-autoscale   Scaled replication controller "pod-autoscale-4" from 1 to 2

2. Broken scenario:

   a. Scaled the dc down and modified it to add initContainers as below:

      spec:
        containers:
        - image: quay.io/gpte-devops-automation/pod-autoscale-lab@sha256:33149f578844f045876e6050c9ba4dd43de29f668c4efe64f200b42c94f66b48
          imagePullPolicy: IfNotPresent
          name: pod-autoscale
          ports:
          - containerPort: 8080
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        initContainers:
        - command:
          - sh
          - -c
          - echo The app is running! && sleep 120
          image: busybox:1.28
          imagePullPolicy: IfNotPresent
          name: init-service
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30

   b. Scaled the dc up to 1.

   c. Created load on the pod: rsh into the pod and run "while true; do true;done".

   d. The pod is using 50m of CPU, but no autoscaling happens:

      > oc adm top pod
      NAME                    CPU(cores)   MEMORY(bytes)
      pod-autoscale-3-prbfq   50m          15Mi

      > oc adm top pod
      NAME                    CPU(cores)   MEMORY(bytes)
      pod-autoscale-3-prbfq   65m          15Mi

   e. The events show:

      4m44s   Warning   FailedGetResourceMetric   horizontalpodautoscaler/pod-autoscale   did not receive metrics for any ready pods

Hi @Joel Any update? Thanks Anand

Hi @Joel Any update? Thanks Anand

I believe that I have found the source of this bug and I will be working on potential solutions. The problem stems from podmetrics treating containers and initContainers the same, so once an initContainer has finished running, it has no CPU metrics. When HPA sees that the CPU metrics are missing from the podmetrics, it returns no metrics at all. See https://github.com/kubernetes/kubernetes/blob/3aa59f7f3077642592dc8a864fcef8ba98699894/pkg/controller/podautoscaler/metrics/rest_metrics_client.go#L84-L86

I'll update this bug once we have a solution proposed.

*** This bug has been marked as a duplicate of bug 1749468 ***
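To make the failure mode described above concrete, here is a sketch of roughly what the PodMetrics object for the broken pod could look like under that explanation. The usage figures are illustrative and were not captured from the cluster.

```yaml
# Illustrative only: once the init container has exited it has no CPU usage to
# report, and per the explanation above the HPA's resource metrics client then
# treats the pod's CPU metrics as missing and drops the pod from its calculation.
apiVersion: metrics.k8s.io/v1beta1
kind: PodMetrics
metadata:
  name: pod-autoscale-3-prbfq
  namespace: my-hpa
containers:
- name: pod-autoscale   # running app container, reporting normally
  usage:
    cpu: 50m
    memory: 15Mi
- name: init-service    # completed init container, no CPU usage reported
  usage: {}
```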