Description of problem:
HPA not working with a stateful set; the docs don't mention stateful set support with HPA.

Version-Release number of selected component (if applicable):
OCP 4.1

How reproducible:
Always

Steps to Reproduce:
1. Create a stateful set with resource requests (memory, CPU)
2. Create an HPA and provide the stateful set as the target ref
3. Check the status of the HPA

Actual results:
HPA does not work with the stateful set.

Expected results:
The customer is coming from a native Kubernetes environment, where he says this worked, and he expects the same behavior in OCP.

Additional info:
The docs reference only deployment configs, replication controllers, and replica sets as supported with HPA: "As a developer, you can use a horizontal pod autoscaler (HPA) to specify how OpenShift Container Platform should automatically increase or decrease the scale of a replication controller or deployment configuration"
The doc link is: https://docs.openshift.com/container-platform/4.1/nodes/pods/nodes-pods-autoscaling.html
In my Quicklab OCP 4.1 setup and my OCP 4.2 AWS IPI setup, I see autoscaling/v1 as the apiVersion in the HPA definition, but the customer sees autoscaling/v2beta2. I am not sure how the customer is getting that API version. That may be why HPA is failing not only with the stateful set but also with the deployment config. The customer attached a must-gather to the case.
Neelesh, I can check with the customer to see whether they want stateful set support with HPA in 4.4. Can you help with fixing the deployment config issue?
There is an upstream issue [1] and a potential PR [2].
[1] https://github.com/kubernetes/kubernetes/issues/79365
[2] https://github.com/kubernetes/kubernetes/pull/86044
@Ryan A simple deployment config (below) worked, so the customer is going to retry his case based on this example. The question we now have to answer is whether HPA supports stateful sets.

1. oc new-project my-hpa
2. echo '---
kind: LimitRange
apiVersion: v1
metadata:
  name: limits
spec:
  limits:
  - type: Pod
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
  - type: Container
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
    default:
      cpu: 50m
      memory: 100Mi
' | oc create -f - -n my-hpa
3. oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa
4. oc expose svc pod-autoscale
5. oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60
6. oc describe hpa pod-autoscale -n my-hpa
7. oc rsh -n my-hpa $(oc get ep pod-autoscale -n my-hpa -o jsonpath='{ .subsets[].addresses[0].targetRef.name }')
   $ while true; do true; done
8. Step 7 should drive up CPU utilization in the pod, and the HPA should scale up by adding another pod.
HPA does support statefulsets.
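For reference, the HPA can target any resource that implements the scale subresource, which includes stateful sets. A minimal autoscaling/v2beta2 HPA targeting a stateful set might look like the following sketch (the target name and thresholds are illustrative, reusing the 60% CPU target from the example above):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: pod-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: pod-autoscale   # illustrative; use your stateful set's name
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```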
@Ryan As discussed, I have updated the bug title. Here is how I tested HPA not working when initContainers are defined.

1. Working scenario:
a. Created limit ranges as below:
> oc describe limitranges limits
Name:       limits
Namespace:  my-hpa
Type       Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
----       --------  ---  ---    ---------------  -------------  -----------------------
Pod        cpu       10m  100m   -                -              -
Pod        memory    5Mi  750Mi  -                -              -
Container  cpu       10m  100m   50m              50m            -
Container  memory    5Mi  750Mi  100Mi            100Mi          -
b. Created a new app as below:
oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa
c. Created the HPA as below:
oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60
d. Created load on the pod: rsh to the pod and run "while true; do true; done"
e. oc adm top pod shows:
> oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)
pod-autoscale-4-d4k79   33m          9Mi
f. Autoscaling kicks in:
> oc get pods
NAME                    READY   STATUS              RESTARTS   AGE
pod-autoscale-4-2p9tv   0/1     ContainerCreating   0          7s
pod-autoscale-4-d4k79   1/1     Running             0          102s
g. Events show:
5m44s  Normal  ReplicationControllerScaled  deploymentconfig/pod-autoscale  Scaled replication controller "pod-autoscale-4" from 0 to 1
4m9s   Normal  ReplicationControllerScaled  deploymentconfig/pod-autoscale  Scaled replication controller "pod-autoscale-4" from 1 to 2

2. Broken scenario:
a. Scaled down the dc, then modified it to add initContainers as below:
spec:
  containers:
  - image: quay.io/gpte-devops-automation/pod-autoscale-lab@sha256:33149f578844f045876e6050c9ba4dd43de29f668c4efe64f200b42c94f66b48
    imagePullPolicy: IfNotPresent
    name: pod-autoscale
    ports:
    - containerPort: 8080
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  initContainers:
  - command:
    - sh
    - -c
    - echo The app is running! && sleep 120
    image: busybox:1.28
    imagePullPolicy: IfNotPresent
    name: init-service
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
b. Scaled the dc up to 1.
c. Created load on the pod: rsh to the pod and run "while true; do true; done"
d. The pod is using 50m of CPU, but no autoscaling happens:
> oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)
pod-autoscale-3-prbfq   50m          15Mi
> oc adm top pod
NAME                    CPU(cores)   MEMORY(bytes)
pod-autoscale-3-prbfq   65m          15Mi
e. The events show:
4m44s  Warning  FailedGetResourceMetric  horizontalpodautoscaler/pod-autoscale  did not receive metrics for any ready pods
Hi @Joel Any update? Thanks Anand
I believe that I have found the source of this bug, and I will be working on potential solutions. The problem stems from the pod metrics treating containers and initContainers the same, so once an initContainer has finished running, it has no CPU metrics. When the HPA sees that CPU metrics are missing from the pod metrics, it returns no metrics at all. See https://github.com/kubernetes/kubernetes/blob/3aa59f7f3077642592dc8a864fcef8ba98699894/pkg/controller/podautoscaler/metrics/rest_metrics_client.go#L84-L86 I'll update this bug once we have a solution proposed.
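To make the failure mode concrete, here is a minimal sketch (not the actual Kubernetes code; the type names and the -1 "no sample" convention are simplified stand-ins for the real metrics API types) of the behavior described above: pod CPU usage is the sum over container metrics, but one container without a sample, such as a finished initContainer, causes the whole pod to be dropped, which is what produces the "did not receive metrics for any ready pods" event.

```go
package main

import "fmt"

// ContainerMetrics is an illustrative stand-in for a container's metrics
// entry; CPUMilli of -1 models a container with no reported sample
// (e.g. an initContainer that has already terminated).
type ContainerMetrics struct {
	Name     string
	CPUMilli int64
}

// PodMetrics is an illustrative stand-in for a pod's metrics document.
type PodMetrics struct {
	Name       string
	Containers []ContainerMetrics
}

// podCPU mimics the aggregation behavior described in the bug: it sums CPU
// across containers, but if any container is missing a sample it reports the
// whole pod as having no metrics (ok == false).
func podCPU(p PodMetrics) (int64, bool) {
	var total int64
	for _, c := range p.Containers {
		if c.CPUMilli < 0 {
			// One missing container sample drops the entire pod.
			return 0, false
		}
		total += c.CPUMilli
	}
	return total, true
}

func main() {
	plain := PodMetrics{Name: "pod-autoscale-4-d4k79", Containers: []ContainerMetrics{
		{Name: "pod-autoscale", CPUMilli: 50},
	}}
	withInit := PodMetrics{Name: "pod-autoscale-3-prbfq", Containers: []ContainerMetrics{
		{Name: "pod-autoscale", CPUMilli: 50},
		{Name: "init-service", CPUMilli: -1}, // initContainer finished, no sample
	}}

	if cpu, ok := podCPU(plain); ok {
		fmt.Printf("%s: %dm\n", plain.Name, cpu)
	}
	if _, ok := podCPU(withInit); !ok {
		fmt.Printf("%s: no metrics, pod dropped\n", withInit.Name)
	}
}
```

In the working scenario above, every container has a sample and the pod's usage is reported; in the broken scenario, the terminated initContainer's missing sample causes the pod to be excluded entirely, leaving the HPA with no ready pods to evaluate.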
*** This bug has been marked as a duplicate of bug 1749468 ***