Bug 1780742
| Summary: | HPA not working as expected when initContainers are defined | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anand Paladugu <apaladug> |
| Component: | Node | Assignee: | Joel Smith <joelsmith> |
| Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.z | CC: | aos-bugs, joelsmith, jokerman, mburke, nagrawal |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-27 16:43:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Anand Paladugu
2019-12-06 19:07:57 UTC
In my quicklab OCP 4.1 setup and in an OCP 4.2 AWS IPI setup, I see autoscaling/v1 as the apiVersion in the HPA definition, but the customer sees autoscaling/v2beta2. I am not sure how the customer is getting that API version. That may be what is causing the HPA not only to fail with the StatefulSet but also to fail with the deployment config. The customer attached a must-gather to the case.

Neelesh, I can check with the customer to see whether they want to make use of StatefulSet support with HPA in 4.4. Can you help with fixing the deployment config issue? There is an upstream issue [1] and a potential PR [2].

1. https://github.com/kubernetes/kubernetes/issues/79365
2. https://github.com/kubernetes/kubernetes/pull/86044

@Ryan
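For reference, a minimal sketch of an HPA expressed against the autoscaling/v2beta2 API (the version the customer reports seeing). The target and thresholds below mirror the reproduction steps later in this bug; they are illustrative and not taken from the customer's must-gather.

```yaml
# Illustrative only: a CPU-based HPA in its autoscaling/v2beta2 form.
# The same intent in autoscaling/v1 is a single targetCPUUtilizationPercentage field.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: pod-autoscale
  namespace: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps.openshift.io/v1
    kind: DeploymentConfig
    name: pod-autoscale
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```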
A simple deployment config (steps below) worked, so the customer is going to retry their case based on this example. The question we now have to answer is whether HPA supports StatefulSets.
1. oc new-project my-hpa
2. echo '---
kind: LimitRange
apiVersion: v1
metadata:
  name: limits
spec:
  limits:
  - type: Pod
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
  - type: Container
    max:
      cpu: 100m
      memory: 750Mi
    min:
      cpu: 10m
      memory: 5Mi
    default:
      cpu: 50m
      memory: 100Mi
' | oc create -f - -n my-hpa
3. oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa
4. oc expose svc pod-autoscale
5. oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60
6. oc describe hpa pod-autoscale -n my-hpa
7. oc rsh -n my-hpa $(oc get ep pod-autoscale -n my-hpa -o jsonpath='{ .subsets[].addresses[0].targetRef.name }')
$ while true;do true;done
8. Step 7 drives up CPU utilization in the pod, and the HPA should scale up to add another pod (a sketch of the HPA object created in step 5 follows this list).
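For clarity, here is a sketch of roughly the object that the oc autoscale command in step 5 creates. The field values follow the command's flags; defaulted and status fields are omitted.

```yaml
# Sketch of the autoscaling/v1 object produced by:
#   oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: pod-autoscale
  namespace: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps.openshift.io/v1
    kind: DeploymentConfig
    name: pod-autoscale
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 60
```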
HPA does support StatefulSets. @Ryan As discussed, I have updated this bug's title, and here is how I tested HPA not working when initContainers are defined.

1. Working scenario:

   a. Created limit ranges as below:

      > oc describe limitranges limits
      Name:       limits
      Namespace:  my-hpa
      Type       Resource  Min  Max    Default Request  Default Limit  Max Limit/Request Ratio
      ----       --------  ---  ---    ---------------  -------------  -----------------------
      Pod        cpu       10m  100m   -                -              -
      Pod        memory    5Mi  750Mi  -                -              -
      Container  cpu       10m  100m   50m              50m            -
      Container  memory    5Mi  750Mi  100Mi            100Mi          -

   b. Created a new app as below:

      oc new-app quay.io/gpte-devops-automation/pod-autoscale-lab:rc0 --name=pod-autoscale -n my-hpa

   c. Created the HPA as below:

      oc autoscale dc/pod-autoscale --min 1 --max 5 --cpu-percent=60

   d. Created load on the pod: rsh into the pod and run "while true; do true;done".

   e. oc adm top pod shows the load:

      > oc adm top pod
      NAME                    CPU(cores)   MEMORY(bytes)
      pod-autoscale-4-d4k79   33m          9Mi

   f. Autoscaling kicks in:

      > oc get pods
      NAME                    READY   STATUS              RESTARTS   AGE
      pod-autoscale-4-2p9tv   0/1     ContainerCreating   0          7s
      pod-autoscale-4-d4k79   1/1     Running             0          102s

   g. Events show:

      5m44s   Normal   ReplicationControllerScaled   deploymentconfig/pod-autoscale   Scaled replication controller "pod-autoscale-4" from 0 to 1
      4m9s    Normal   ReplicationControllerScaled   deploymentconfig/pod-autoscale   Scaled replication controller "pod-autoscale-4" from 1 to 2

2. Broken scenario:

   a. Scaled the dc down and modified it to add initContainers as below:

      spec:
        containers:
        - image: quay.io/gpte-devops-automation/pod-autoscale-lab@sha256:33149f578844f045876e6050c9ba4dd43de29f668c4efe64f200b42c94f66b48
          imagePullPolicy: IfNotPresent
          name: pod-autoscale
          ports:
          - containerPort: 8080
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        initContainers:
        - command:
          - sh
          - -c
          - echo The app is running! && sleep 120
          image: busybox:1.28
          imagePullPolicy: IfNotPresent
          name: init-service
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30

   b. Scaled the dc up to 1.

   c. Created load on the pod: rsh into the pod and run "while true; do true;done".

   d. The pod is using 50m of CPU, but no autoscaling happens:

      > oc adm top pod
      NAME                    CPU(cores)   MEMORY(bytes)
      pod-autoscale-3-prbfq   50m          15Mi

      > oc adm top pod
      NAME                    CPU(cores)   MEMORY(bytes)
      pod-autoscale-3-prbfq   65m          15Mi

   e. The events show:

      4m44s   Warning   FailedGetResourceMetric   horizontalpodautoscaler/pod-autoscale   did not receive metrics for any ready pods

Hi @Joel Any update? Thanks Anand

Hi @Joel Any update? Thanks Anand

I believe that I have found the source of this bug and I will be working on potential solutions. The problem stems from podmetrics treating containers and initContainers the same, so once an initContainer has finished running, it has no CPU metrics. When HPA sees that the CPU metrics are missing from the podmetrics, it returns no metrics at all. See https://github.com/kubernetes/kubernetes/blob/3aa59f7f3077642592dc8a864fcef8ba98699894/pkg/controller/podautoscaler/metrics/rest_metrics_client.go#L84-L86

I'll update this bug once we have a solution proposed.

*** This bug has been marked as a duplicate of bug 1749468 ***
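To make the failure mode described above concrete, here is a sketch of roughly what the PodMetrics object for the broken pod could look like under that explanation. The usage figures are illustrative and were not captured from the cluster.

```yaml
# Illustrative only: once the init container has exited it has no CPU usage to
# report, and per the explanation above the HPA's resource metrics client then
# treats the pod's CPU metrics as missing and drops the pod from its calculation.
apiVersion: metrics.k8s.io/v1beta1
kind: PodMetrics
metadata:
  name: pod-autoscale-3-prbfq
  namespace: my-hpa
containers:
- name: pod-autoscale   # running app container, reporting normally
  usage:
    cpu: 50m
    memory: 15Mi
- name: init-service    # completed init container, no CPU usage reported
  usage: {}
```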