Bug 1382855
| Summary: | HPA fails to collect accurate information when there are failed pods in certain situations | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Eric Jones <erjones> |
| Component: | Node | Assignee: | Solly Ross <sross> |
| Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.1.0 | CC: | aos-bugs, decarr, eparis, erich, jokerman, mmccomas, pweil, sjenning, tdawson |
| Target Milestone: | --- | Keywords: | Performance, Unconfirmed |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Horizontal Pod Autoscalers would fail to scale when they could not retrieve metrics for pods matching their target selector; as a result, dead pods and newly created pods would cause the autoscaler to skip scaling. The Horizontal Pod Autoscaler controller now assumes conservative metric values (depending on the state of the pod and the direction of the scale) when metrics are missing or pods are marked as unready or not active, so newly created or dead pods no longer block scaling. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-12 19:07:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
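Below is a minimal, self-contained Go sketch of the conservative-substitution idea described in the Doc Text above. It is illustrative only: the types, names, and exact substitution rules are assumptions for the sketch, not the upstream controller code (see the linked pull request in the description for the real implementation).

package main

import (
    "fmt"
    "math"
)

// podSample is an illustrative stand-in for one pod's CPU sample.
type podSample struct {
    ready       bool    // pod is Ready and active (not Completed/Failed)
    hasMetrics  bool    // the metrics source returned a sample for this pod
    utilization float64 // fraction of requested CPU, e.g. 1.38 == 138%
}

// desiredReplicas applies the usual HPA ratio, ceil(current * usage/target),
// but first substitutes conservative values for pods without usable metrics:
// if the measured pods alone say "scale up", the missing pods are counted as
// using 0 (so dead/new pods cannot inflate the result); if they say "scale
// down", the missing pods are counted as exactly on target (so they cannot
// trigger a premature scale-down).
func desiredReplicas(pods []podSample, target float64, current int) int {
    sum, missing := 0.0, 0
    for _, p := range pods {
        if p.ready && p.hasMetrics {
            sum += p.utilization
        } else {
            missing++
        }
    }
    measured := len(pods) - missing
    if measured == 0 {
        return current // nothing usable to act on; keep the current scale
    }
    ratio := (sum / float64(measured)) / target
    if missing > 0 {
        if ratio > 1.0 { // scale-up direction: assume missing pods use 0
            ratio = sum / (float64(len(pods)) * target)
        } else { // scale-down direction: assume missing pods sit at target
            ratio = (sum + float64(missing)*target) / (float64(len(pods)) * target)
        }
    }
    return int(math.Ceil(ratio * float64(current)))
}

func main() {
    pods := []podSample{
        {ready: true, hasMetrics: true, utilization: 0.05}, // the live dc pod
        {ready: false, hasMetrics: false},                  // a Completed "fake" pod with no metrics
    }
    // With an 80% target and 1 current replica, the Completed pod no longer
    // blocks the decision; the result stays at 1 replica.
    fmt.Println(desiredReplicas(pods, 0.80, 1))
}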
Description
Eric Jones
2016-10-07 21:51:02 UTC
upstream issue: https://github.com/kubernetes/kubernetes/pull/33593

This has been merged upstream and picked up in the Origin 1.5 rebase.

Waiting for the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1419481; once that lands, this bug will be verified.

Verified on v3.5.0.19+199197c.
Version-Release number of selected component (if applicable):
openshift v3.5.0.19+199197c
kubernetes v1.5.2+43a9be4
etcd 3.1.0
Steps:
1. Create a resource that can be scaled
oc run resource-consumer --image=docker.io/ocpqe/resource_consumer:v1 --replicas=1 --expose --port 8080 --requests='cpu=100m,memory=256Mi' -n dma
2. Create hpa for it
oc autoscale dc/resource-consumer --min=1 --max=30 --cpu-percent=80 -n dma
3. Create some pods (with labels matching the dc's selector) that consume CPU for a while and then go to Completed (pod.yaml is shown under Additional info)
for i in `seq 1 1 3`; do oc create -f pod.yaml -n dma; done
4. Check pod status and watch how the HPA scales pods up and down
Actual results:
4. While the pods created in step 3 consume CPU, the HPA scales up and creates new pods; once those pods reach Completed and the load drops, the HPA scales back down to 1 pod after about 5 minutes.
Expected results:
4. While the pods created in step 3 consume CPU, the HPA scales up and creates new pods; once those pods reach Completed and the load drops, the HPA scales back down to 1 pod after about 5 minutes.
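For reference, the autoscaler's target count is computed roughly as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization); with the fix, pods that are Completed or have no metrics are filled in with conservative values instead of aborting this computation, which is why the scale-down to 1 pod now happens.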
Additional info:
Detailed verification steps and results:
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
resource-consumer-1-0223q 1/1 Running 0 5h
[root@host-8-174-253 dma]# oc get hpa -n dma
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
resource-consumer DeploymentConfig/resource-consumer 80% 0% 1 30 5h
[root@host-8-174-253 dma]# for i in `seq 1 1 3`; do oc create -f pod.yaml -n dma; done
pod "hpa-fake-lzpjm" created
pod "hpa-fake-xbw06" created
pod "hpa-fake-trkht" created
[root@host-8-174-253 dma]#
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 1/1 Running 0 9s
hpa-fake-trkht 1/1 Running 0 8s
hpa-fake-xbw06 1/1 Running 0 9s
resource-consumer-1-0223q 1/1 Running 0 5h
[root@host-8-174-253 dma]# date
Tue Feb 14 03:46:00 EST 2017
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 1/1 Running 0 17s
hpa-fake-trkht 1/1 Running 0 16s
hpa-fake-xbw06 1/1 Running 0 17s
resource-consumer-1-0223q 1/1 Running 0 5h
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 1/1 Running 0 31s
hpa-fake-trkht 1/1 Running 0 30s
hpa-fake-xbw06 1/1 Running 0 31s
resource-consumer-1-0223q 1/1 Running 0 5h
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 1m
hpa-fake-trkht 0/1 Completed 0 1m
hpa-fake-xbw06 0/1 Completed 0 1m
resource-consumer-1-0223q 1/1 Running 0 5h
resource-consumer-1-6t1kb 1/1 Running 0 28s
resource-consumer-1-fpbv0 1/1 Running 0 28s
resource-consumer-1-sn2mb 1/1 Running 0 28s
[root@host-8-174-253 dma]# oc get hpa -n dma
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
resource-consumer DeploymentConfig/resource-consumer 80% 0% 1 30 5h
[root@host-8-174-253 dma]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: resource-consumer
  generateName: hpa-fake-
spec:
  containers:
  - image: docker.io/ocpqe/resource_consumer:v1
    command:
    - /consume-cpu/consume-cpu
    args:
    - -duration-sec=60
    - -millicores=200
    imagePullPolicy: IfNotPresent
    name: hpa-fake
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
    securityContext:
      capabilities: {}
      privileged: false
    terminationMessagePath: /dev/termination-log
  dnsPolicy: ClusterFirst
  restartPolicy: OnFailure
  serviceAccount: ""
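(Note: the consume-cpu command above runs for the 60 seconds given by -duration-sec and then exits successfully; with restartPolicy: OnFailure a successful exit is not restarted, so the pod moves to Completed while still matching the dc's run=resource-consumer selector, which is exactly the "dead pod" condition this bug is about.)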
[root@host-8-174-253 dma]# oc get hpa -n dma
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
resource-consumer DeploymentConfig/resource-consumer 80% 0% 1 30 5h
[root@host-8-174-253 dma]# oc describe hpa resource-consumer -n dma
Name: resource-consumer
Namespace: dma
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 13 Feb 2017 22:10:56 -0500
Reference: DeploymentConfig/resource-consumer
Target CPU utilization: 80%
Current CPU utilization: 0%
Min replicas: 1
Max replicas: 30
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 1h 1 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 6 (avgCPUutil: 138, current replicas: 1)
57m 57m 1 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 8 (avgCPUutil: 154, current replicas: 1)
57m 57m 2 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 8 (avgCPUutil: 154, current replicas: 4)
54m 54m 1 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 3 (avgCPUutil: 0, current replicas: 4)
5h 52m 14 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 2 (avgCPUutil: 0, current replicas: 4)
3h 52m 5 {horizontal-pod-autoscaler } Normal SuccessfulRescale New size: 1; reason: All metrics below target
51m 51m 1 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 8 (avgCPUutil: 149, current replicas: 1)
49m 49m 1 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 8 (avgCPUutil: 151, current replicas: 1)
5h 49m 107 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed (events with common reason combined)
3h 4m 54 {horizontal-pod-autoscaler } Warning FailedGetMetrics unable to get metrics for resource cpu: no metrics returned from heapster
3h 2m 373 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 1)
5h 1m 8 {horizontal-pod-autoscaler } Normal SuccessfulRescale New size: 4; reason: CPU utilization above target
1h 1m 3 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 8 (avgCPUutil: 152, current replicas: 1)
1h 3s 38 {horizontal-pod-autoscaler } Normal DesiredReplicasComputed Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 4)
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 2m
hpa-fake-trkht 0/1 Completed 0 2m
hpa-fake-xbw06 0/1 Completed 0 2m
resource-consumer-1-0223q 1/1 Running 0 5h
resource-consumer-1-6t1kb 1/1 Running 0 2m
resource-consumer-1-fpbv0 1/1 Running 0 2m
resource-consumer-1-sn2mb 1/1 Running 0 2m
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 3m
hpa-fake-trkht 0/1 Completed 0 3m
hpa-fake-xbw06 0/1 Completed 0 3m
resource-consumer-1-0223q 1/1 Running 0 5h
resource-consumer-1-6t1kb 1/1 Running 0 2m
resource-consumer-1-fpbv0 1/1 Running 0 2m
resource-consumer-1-sn2mb 1/1 Running 0 2m
[root@host-8-174-253 dma]# date
Tue Feb 14 03:49:25 EST 2017
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 3m
hpa-fake-trkht 0/1 Completed 0 3m
hpa-fake-xbw06 0/1 Completed 0 3m
resource-consumer-1-0223q 1/1 Running 0 5h
resource-consumer-1-6t1kb 1/1 Running 0 3m
resource-consumer-1-fpbv0 1/1 Running 0 3m
resource-consumer-1-sn2mb 1/1 Running 0 3m
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 4m
hpa-fake-trkht 0/1 Completed 0 4m
hpa-fake-xbw06 0/1 Completed 0 4m
resource-consumer-1-0223q 1/1 Running 0 6h
resource-consumer-1-6t1kb 1/1 Running 0 3m
resource-consumer-1-fpbv0 1/1 Running 0 3m
resource-consumer-1-sn2mb 1/1 Running 0 3m
[root@host-8-174-253 dma]#
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 4m
hpa-fake-trkht 0/1 Completed 0 4m
hpa-fake-xbw06 0/1 Completed 0 4m
resource-consumer-1-0223q 1/1 Running 0 6h
resource-consumer-1-6t1kb 1/1 Running 0 4m
resource-consumer-1-fpbv0 1/1 Running 0 4m
resource-consumer-1-sn2mb 1/1 Running 0 4m
[root@host-8-174-253 dma]# oc get pods -n dma
NAME READY STATUS RESTARTS AGE
hpa-fake-lzpjm 0/1 Completed 0 7m
hpa-fake-trkht 0/1 Completed 0 7m
hpa-fake-xbw06 0/1 Completed 0 7m
resource-consumer-1-0223q 1/1 Running 0 6h
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:0884