Bug 1780742 - HPA not working as expected when initContainers are defined
Summary: HPA not working as expected when initContainers are defined
Keywords:
Status: CLOSED DUPLICATE of bug 1749468
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Joel Smith
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-06 19:07 UTC by Anand Paladugu
Modified: 2023-10-06 18:52 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-27 16:43:32 UTC
Target Upstream Version:
Embargoed:



Description Anand Paladugu 2019-12-06 19:07:57 UTC
Description of problem:

HPA is not working with a StatefulSet, and the docs don't mention StatefulSet support with HPA.

Version-Release number of selected component (if applicable):

OCP 4.1


How reproducible:

Always


Steps to Reproduce:
1. Create a StatefulSet with resource requests (CPU, memory)
2. Create an HPA and provide the StatefulSet as the target ref
3. Check the status of the HPA (a minimal repro sketch follows below)
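A minimal repro sketch, following the same pattern as the working example later in this bug; the namespace, StatefulSet name, and resource values are illustrative assumptions, and the image is the same lab image used later in this bug:

oc new-project my-hpa

echo '---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  clusterIP: None
  selector:
    app: web
  ports:
  - port: 8080
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: quay.io/gpte-devops-automation/pod-autoscale-lab:rc0
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: web
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 60
' | oc create -f - -n my-hpa

oc describe hpa web -n my-hpa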

Actual results:

The HPA does not work with the StatefulSet.

Expected results:

The customer is coming from a native Kubernetes environment, where they say this worked, and they expect the same behavior in OCP.


Additional info:

The doc only references deployment configs, replication controllers, and replica sets as being supported with HPA:

"As a developer, you can use a horizontal pod autoscaler (HPA) to specify how OpenShift Container Platform should automatically increase or decrease the scale of a replication controller or deployment configuration"

The doc link is: https://docs.openshift.com/container-platform/4.1/nodes/pods/nodes-pods-autoscaling.html

Comment 1 Anand Paladugu 2019-12-09 02:41:52 UTC
In my quicklab OCP 4.1 setup and my OCP 4.2 AWS IPI setup, I see autoscaling/v1 as the apiVersion in the HPA definition, but the customer sees autoscaling/v2beta2. I am not sure how the customer is getting that API version. That may be why HPA is not working for them, not only with the StatefulSet but also with the deployment config.
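To see which autoscaling API versions a cluster serves, and what an existing HPA looks like, something like the following should work (a generic check with a placeholder namespace; note that oc get may render the object under the client's preferred API version rather than the one it was created with):

oc api-versions | grep autoscaling
oc get hpa -n <namespace> -o yaml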

The customer has attached a must-gather to the case.

Comment 3 Anand Paladugu 2019-12-09 17:21:32 UTC
Neelesh

I can check with the customer to see whether they want to take advantage of StatefulSet support with HPA in 4.4. Can you help with fixing the deployment config issue?

Comment 4 Ryan Phillips 2019-12-09 19:38:46 UTC
There is an upstream issue [1], and a potential PR [2].

1. https://github.com/kubernetes/kubernetes/issues/79365
2. https://github.com/kubernetes/kubernetes/pull/86044

Comment 5 Anand Paladugu 2019-12-10 13:48:32 UTC
@Ryan

A simple deployment config (below) worked, so the customer is going to retry their case based on this example. The question we now have to answer is whether HPA supports StatefulSets.





1.  oc new-project my-hpa

2.  echo '---
kind: LimitRange
apiVersion: v1
metadata:
  name: limits
spec:
  limits:
  - type: Pod
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
  - type: Container
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
    default:
      cpu: 50m
      memory: 100Mi
' | oc create -f - -n my-hpa

3.  oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa

4.  oc expose svc pod-autoscale

5. oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60

6. oc describe hpa pod-autoscale -n my-hpa

7. oc rsh -n my-hpa $(oc get ep pod-autoscale -n my-hpa -o jsonpath='{ .subsets[].addresses[0].targetRef.name }')

    $ while true;do true;done

8. Step 7 above should drive up CPU utilization in the pod, and the HPA should scale up and add another pod (the watch commands below show this as it happens).
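To watch the scale-up from step 8 as it happens (run each in its own terminal):

oc get hpa pod-autoscale -n my-hpa -w
oc get pods -n my-hpa -w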

Comment 6 Ryan Phillips 2019-12-10 15:29:26 UTC
HPA does support statefulsets.
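For reference, the StatefulSet case can be exercised the same way as the dc example above. Assuming a StatefulSet named web with CPU requests set (recent oc/kubectl clients accept a statefulset target; otherwise an HPA with a scaleTargetRef of kind StatefulSet can be created directly, as sketched in the description):

oc autoscale statefulset/web --min 1 --max 5 --cpu-percent=60 -n my-hpa
oc get hpa web -n my-hpa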

Comment 7 Anand Paladugu 2019-12-16 18:42:51 UTC
@Ryan  As discussed, I have updated this bug's title, and here is how I reproduced HPA not working when initContainers are defined.


1. Working scenario:

a. Created limitranges as below:

> oc describe limitranges limits
Name:       limits
Namespace:  my-hpa
Type        Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---  ---    ---------------  -------------  -----------------------
Pod         cpu       10m  100m   -                -              -
Pod         memory    5Mi  750Mi  -                -              -
Container   cpu       10m  100m   50m              50m            -
Container   memory    5Mi  750Mi  100Mi            100Mi          -

b. Created a new app as below:

oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa


c. Created the HPA as below:

oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60

d. Created load on the pod: rsh to the pod and run "while true; do true; done"

e. oc adm top pod shows the following:

> oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)   
pod-autoscale-4-d4k79   33m          9Mi

f. Autoscaling kicks in:

> oc get pods
NAME                    READY   STATUS              RESTARTS   AGE
pod-autoscale-4-2p9tv   0/1     ContainerCreating   0          7s
pod-autoscale-4-d4k79   1/1     Running             0          102s

g. Events show these:

5m44s       Normal    ReplicationControllerScaled    deploymentconfig/pod-autoscale          Scaled replication controller "pod-autoscale-4" from 0 to 1
4m9s        Normal    ReplicationControllerScaled    deploymentconfig/pod-autoscale          Scaled replication controller "pod-autoscale-4" from 1 to 2




2. Broken scenario:

a. Scaled down the dc, then modified it to add initContainers as below (a patch sketch follows after the spec):

spec:
      containers:
      - image: quay.io/gpte-devops-automation/pod-autoscale-lab@sha256:33149f578844f045876e6050c9ba4dd43de29f668c4efe64f200b42c94f66b48
        imagePullPolicy: IfNotPresent
        name: pod-autoscale
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - echo The app is running! && sleep 120
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        name: init-service
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
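For anyone reproducing this, roughly the same change can be applied with a patch along these lines (a sketch; the original modification was made directly to the dc):

oc patch dc/pod-autoscale -n my-hpa --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/initContainers", "value": [
    {"name": "init-service",
     "image": "busybox:1.28",
     "command": ["sh", "-c", "echo The app is running! && sleep 120"]}
  ]}
]'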

b. Scaled the dc back up to 1.

c. Created load on the pod: rsh to the pod and run "while true; do true; done"

d. The pod is using 50m or more CPU, but no autoscaling happens.

> oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)   
pod-autoscale-3-prbfq   50m          15Mi  

> oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)   
pod-autoscale-3-prbfq   65m          15Mi


e.  See this in the events:

4m44s       Warning   FailedGetResourceMetric        horizontalpodautoscaler/pod-autoscale   did not receive metrics for any ready pods
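One way to see what the HPA is working from is to query the metrics API for the pod directly and check whether the init-service container is listed without a cpu sample (namespace and pod name as above):

oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/my-hpa/pods/pod-autoscale-3-prbfq"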

Comment 8 Anand Paladugu 2020-01-02 19:33:45 UTC
Hi @Joel

Any update?

Thanks

Anand

Comment 9 Anand Paladugu 2020-01-28 14:04:07 UTC
Hi @Joel

Any update?

Thanks

Anand

Comment 10 Joel Smith 2020-02-24 17:40:22 UTC
I believe that I have found the source of this bug and I will be working on potential solutions. The problem stems from podmetrics treating containers and initContainers the same, so once an initContainer has finished running, it has no CPU metrics. When HPA sees that the CPU metrics are missing from the podmetrics, it returns no metrics at all.

See https://github.com/kubernetes/kubernetes/blob/3aa59f7f3077642592dc8a864fcef8ba98699894/pkg/controller/podautoscaler/metrics/rest_metrics_client.go#L84-L86

I'll update this bug once we have a solution proposed.

Comment 11 Joel Smith 2020-02-27 16:43:32 UTC

*** This bug has been marked as a duplicate of bug 1749468 ***

