Description of problem:
HPA claims "failed to get CPU consumption and request: failed to get metrics..."

The customer provided the following information: "The root of the problem was that the ab-service project had 4 dead pods in there that failed when the hosts had the cpu resource exhaustion. HPA was attempting to pull metrics on these dead pods. It's not very resilient or forgiving, any sort of error and it won't complete the update. I resolved this by deleting the 4 dead pods. Now hpa gets good status and completed the minimal scale out of 2"

Additional details included in attachment (coming soon).
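Until the fix is available, the workaround the customer used can be applied by hand. A minimal sketch, assuming the affected project is ab-service as in the report above; the pod name is a placeholder:

  # Terminated pods show up in the pod list alongside running ones:
  oc get pods -n ab-service
  # Delete each dead pod so the HPA can pull metrics for the remaining pods:
  oc delete pod <dead-pod-name> -n ab-service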
upstream issue: https://github.com/kubernetes/kubernetes/pull/33593
This has been merged upstream and picked up in the Origin 1.5 rebase.
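Once the rebase is in a build, it is worth confirming the cluster under test actually carries it before re-running the scenario; a quick check (the verification below was done against openshift v3.5.0.19+199197c / kubernetes v1.5.2+43a9be4):

  # Server version should report the 1.5 rebase:
  oc version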
Waiting for the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1419481, then this bug can be verified.
Verified on v3.5.0.19+199197c.

Version-Release number of selected component (if applicable):
openshift v3.5.0.19+199197c
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Steps:
1. Create a scalable resource:
   oc run resource-consumer --image=docker.io/ocpqe/resource_consumer:v1 --replicas=1 --expose --port 8080 --requests='cpu=100m,memory=256Mi' -n dma
2. Create an HPA for it:
   oc autoscale dc/resource-consumer --min=1 --max=30 --cpu-percent=80 -n dma
3. Create some pods (with labels matching the dc's selector) that consume CPU and then go to Completed:
   for i in `seq 1 1 3`; do oc create -f pod.yaml -n dma; done
4. Check pod status and watch how the HPA scales pods up and down.

Actual results:
4. As the pods created in step 3 consume CPU, the HPA scales up and creates new pods. Once those pods reach Completed, after about 5 minutes the HPA scales back down to 1 pod.

Expected results:
4. As the pods created in step 3 consume CPU, the HPA scales up and creates new pods. Once those pods reach Completed, after about 5 minutes the HPA scales back down to 1 pod, without being blocked by the Completed pods.

Additional info:
Detailed verification transcript:

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS    RESTARTS   AGE
resource-consumer-1-0223q   1/1     Running   0          5h

[root@host-8-174-253 dma]# oc get hpa -n dma
NAME                REFERENCE                             TARGET   CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer    80%      0%        1         30        5h

[root@host-8-174-253 dma]# for i in `seq 1 1 3`; do oc create -f pod.yaml -n dma; done
pod "hpa-fake-lzpjm" created
pod "hpa-fake-xbw06" created
pod "hpa-fake-trkht" created

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS    RESTARTS   AGE
hpa-fake-lzpjm              1/1     Running   0          9s
hpa-fake-trkht              1/1     Running   0          8s
hpa-fake-xbw06              1/1     Running   0          9s
resource-consumer-1-0223q   1/1     Running   0          5h

[root@host-8-174-253 dma]# date
Tue Feb 14 03:46:00 EST 2017

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS    RESTARTS   AGE
hpa-fake-lzpjm              1/1     Running   0          17s
hpa-fake-trkht              1/1     Running   0          16s
hpa-fake-xbw06              1/1     Running   0          17s
resource-consumer-1-0223q   1/1     Running   0          5h

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS    RESTARTS   AGE
hpa-fake-lzpjm              1/1     Running   0          31s
hpa-fake-trkht              1/1     Running   0          30s
hpa-fake-xbw06              1/1     Running   0          31s
resource-consumer-1-0223q   1/1     Running   0          5h

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          1m
hpa-fake-trkht              0/1     Completed   0          1m
hpa-fake-xbw06              0/1     Completed   0          1m
resource-consumer-1-0223q   1/1     Running     0          5h
resource-consumer-1-6t1kb   1/1     Running     0          28s
resource-consumer-1-fpbv0   1/1     Running     0          28s
resource-consumer-1-sn2mb   1/1     Running     0          28s

[root@host-8-174-253 dma]# oc get hpa -n dma
NAME                REFERENCE                             TARGET   CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer    80%      0%        1         30        5h

[root@host-8-174-253 dma]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: resource-consumer
  generateName: hpa-fake-
spec:
  containers:
  - image: docker.io/ocpqe/resource_consumer:v1
    command:
    - /consume-cpu/consume-cpu
    args:
    - -duration-sec=60
    - -millicores=200
    imagePullPolicy: IfNotPresent
    name: hpa-fake
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
    securityContext:
      capabilities: {}
      privileged: false
    terminationMessagePath: /dev/termination-log
  dnsPolicy: ClusterFirst
  restartPolicy: OnFailure
  serviceAccount: ""

[root@host-8-174-253 dma]# oc get hpa -n dma
NAME                REFERENCE                             TARGET   CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer    80%      0%        1         30        5h

[root@host-8-174-253 dma]# oc describe hpa resource-consumer -n dma
Name:                      resource-consumer
Namespace:                 dma
Labels:                    <none>
Annotations:               <none>
CreationTimestamp:         Mon, 13 Feb 2017 22:10:56 -0500
Reference:                 DeploymentConfig/resource-consumer
Target CPU utilization:    80%
Current CPU utilization:   0%
Min replicas:              1
Max replicas:              30
Events:
  FirstSeen  LastSeen  Count  From                          SubObjectPath  Type     Reason                   Message
  ---------  --------  -----  ----                          -------------  ----     ------                   -------
  1h         1h        1      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 6 (avgCPUutil: 138, current replicas: 1)
  57m        57m       1      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 8 (avgCPUutil: 154, current replicas: 1)
  57m        57m       2      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 8 (avgCPUutil: 154, current replicas: 4)
  54m        54m       1      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 3 (avgCPUutil: 0, current replicas: 4)
  5h         52m       14     {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 2 (avgCPUutil: 0, current replicas: 4)
  3h         52m       5      {horizontal-pod-autoscaler }                 Normal   SuccessfulRescale        New size: 1; reason: All metrics below target
  51m        51m       1      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 8 (avgCPUutil: 149, current replicas: 1)
  49m        49m       1      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 8 (avgCPUutil: 151, current replicas: 1)
  5h         49m       107    {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  (events with common reason combined)
  3h         4m        54     {horizontal-pod-autoscaler }                 Warning  FailedGetMetrics         unable to get metrics for resource cpu: no metrics returned from heapster
  3h         2m        373    {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 1)
  5h         1m        8      {horizontal-pod-autoscaler }                 Normal   SuccessfulRescale        New size: 4; reason: CPU utilization above target
  1h         1m        3      {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 8 (avgCPUutil: 152, current replicas: 1)
  1h         3s        38     {horizontal-pod-autoscaler }                 Normal   DesiredReplicasComputed  Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 4)

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          2m
hpa-fake-trkht              0/1     Completed   0          2m
hpa-fake-xbw06              0/1     Completed   0          2m
resource-consumer-1-0223q   1/1     Running     0          5h
resource-consumer-1-6t1kb   1/1     Running     0          2m
resource-consumer-1-fpbv0   1/1     Running     0          2m
resource-consumer-1-sn2mb   1/1     Running     0          2m

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          3m
hpa-fake-trkht              0/1     Completed   0          3m
hpa-fake-xbw06              0/1     Completed   0          3m
resource-consumer-1-0223q   1/1     Running     0          5h
resource-consumer-1-6t1kb   1/1     Running     0          2m
resource-consumer-1-fpbv0   1/1     Running     0          2m
resource-consumer-1-sn2mb   1/1     Running     0          2m

[root@host-8-174-253 dma]# date
Tue Feb 14 03:49:25 EST 2017

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          3m
hpa-fake-trkht              0/1     Completed   0          3m
hpa-fake-xbw06              0/1     Completed   0          3m
resource-consumer-1-0223q   1/1     Running     0          5h
resource-consumer-1-6t1kb   1/1     Running     0          3m
resource-consumer-1-fpbv0   1/1     Running     0          3m
resource-consumer-1-sn2mb   1/1     Running     0          3m

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          4m
hpa-fake-trkht              0/1     Completed   0          4m
hpa-fake-xbw06              0/1     Completed   0          4m
resource-consumer-1-0223q   1/1     Running     0          6h
resource-consumer-1-6t1kb   1/1     Running     0          3m
resource-consumer-1-fpbv0   1/1     Running     0          3m
resource-consumer-1-sn2mb   1/1     Running     0          3m

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          4m
hpa-fake-trkht              0/1     Completed   0          4m
hpa-fake-xbw06              0/1     Completed   0          4m
resource-consumer-1-0223q   1/1     Running     0          6h
resource-consumer-1-6t1kb   1/1     Running     0          4m
resource-consumer-1-fpbv0   1/1     Running     0          4m
resource-consumer-1-sn2mb   1/1     Running     0          4m

[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY   STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1     Completed   0          7m
hpa-fake-trkht              0/1     Completed   0          7m
hpa-fake-xbw06              0/1     Completed   0          7m
resource-consumer-1-0223q   1/1     Running     0          6h
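For reference, the DesiredReplicasComputed events above are consistent with the standard HPA calculation, desiredReplicas = ceil(podCount * avgUtilization / targetUtilization). A quick arithmetic check, assuming the average is taken over the 4 pods matching the selector at that moment (1 resource-consumer pod plus 3 fake pods):

  # ceil(4 * 154 / 80) = ceil(7.7) = 8, matching
  # "Computed the desired num of replicas: 8 (avgCPUutil: 154, ...)":
  echo $(( (4 * 154 + 79) / 80 ))    # integer ceiling division; prints 8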
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884