Bug 1382855 - HPA fails to collect accurate information when there are failed pods in certain situations
Summary: HPA fails to collect accurate information when there are failed pods in certain situations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Solly Ross
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-07 21:51 UTC by Eric Jones
Modified: 2019-12-16 07:02 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Horizontal Pod Autoscalers would fail to scale when they could not retrieve metrics for pods matching their target selector. As a result, dead pods and newly created pods would cause Horizontal Pod Autoscalers to skip scaling. The Horizontal Pod Autoscaler controller now assumes conservative metric values (depending on the state of the pod and the direction of the scale) when metrics are missing or pods are marked as unready or not active, so newly created or dead pods no longer block scaling.
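
For illustration, below is a minimal Go sketch of the conservative fallback described above. It is not the actual Kubernetes controller code; the function name, parameters, and the numbers in main are assumptions used only to show the idea: pods with missing metrics are assumed to use 0% of their request when the computation points toward a scale-up, and to sit at the target utilization when it points toward a scale-down, so they dampen the result instead of aborting the whole calculation.

package main

import "fmt"

// desiredReplicas is a simplified, illustrative version of the conservative
// fallback: utilization holds per-pod CPU usage as a percent of the pod's
// request, missing is the number of matched pods with no metrics (e.g. dead
// or just-created pods), and target is the HPA target utilization percent.
func desiredReplicas(current int, target float64, utilization []float64, missing int) int {
    sum := 0.0
    for _, u := range utilization {
        sum += u
    }
    avg := sum / float64(len(utilization))

    switch {
    case avg > target:
        // Looks like a scale-up: assume the missing pods use 0% of their
        // request, which dampens the scale-up instead of aborting it.
        avg = sum / float64(len(utilization)+missing)
    case avg < target:
        // Looks like a scale-down: assume the missing pods sit exactly at
        // the target, which dampens the scale-down instead of aborting it.
        avg = (sum + float64(missing)*target) / float64(len(utilization)+missing)
    }

    // Apply the usage ratio to the current replica count and round.
    return int(float64(current)*avg/target + 0.5)
}

func main() {
    // Hypothetical numbers: 4 pods with metrics at ~150% of request plus
    // 3 dead pods with no metrics, against an 80% target. The missing
    // metrics dampen the result (prints 4) instead of failing the update.
    fmt.Println(desiredReplicas(4, 80, []float64{150, 150, 150, 150}, 3))
}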
Clone Of:
Environment:
Last Closed: 2017-04-12 19:07:26 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0884 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.5 RPM Release Advisory 2017-04-12 22:50:07 UTC

Description Eric Jones 2016-10-07 21:51:02 UTC
Description of problem:
The HPA reports "failed to get CPU consumption and request: failed to get metrics...", but the customer provided the following information:

"The root of the problem was that the ab-service project had 4 dead pods in there that failed when the hosts had the cpu resource exhaustion.  HPA was attempting to pull metrics on these dead pods.  It's not very resilient or forgiving, any sort of error and it won't complete the update.

I resolved this by deleting the 4 dead pods.  Now hpa gets good status and completed the minimal scale out of 2"

Additional details included in attachment (coming soon)

Comment 2 DeShuai Ma 2016-10-27 02:35:58 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/33593

Comment 4 Seth Jennings 2017-01-27 03:29:50 UTC
This has been merged upstream and picked up in the Origin 1.5 rebase

Comment 5 DeShuai Ma 2017-02-07 01:56:58 UTC
Waiting for https://bugzilla.redhat.com/show_bug.cgi?id=1419481 to be fixed before verifying this bug.

Comment 6 DeShuai Ma 2017-02-14 09:03:18 UTC
Verified on v3.5.0.19+199197c

Version-Release number of selected component (if applicable):
openshift v3.5.0.19+199197c
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Steps:
1. Create a scalable resource
oc run resource-consumer --image=docker.io/ocpqe/resource_consumer:v1 --replicas=1 --expose --port 8080 --requests='cpu=100m,memory=256Mi' -n dma

2. Create an HPA for it
oc autoscale dc/resource-consumer --min=1 --max=30 --cpu-percent=80 -n dma

3. Create some pods (with labels matching the dc's selector) that consume CPU and then complete
for i in `seq 1 1 3`; do oc create -f pod.yaml -n dma; done

4. Check pod status and watch how the HPA scales pods up and down


Actual results:
4. While the pods created in step 3 consume CPU, the HPA scales up and creates new pods. After those pods complete, the HPA scales back down to 1 pod within about 5 minutes.

Expected results:
4. While the pods created in step 3 consume CPU, the HPA should scale up and create new pods. After those pods complete, the HPA should scale back down to 1 pod within about 5 minutes.

Additional info:
Detailed verification output:
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS    RESTARTS   AGE
resource-consumer-1-0223q   1/1       Running   0          5h
[root@host-8-174-253 dma]# oc get hpa -n dma
NAME                REFERENCE                            TARGET    CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer   80%       0%        1         30        5h
[root@host-8-174-253 dma]# for i in `seq 1 1 3`; do oc create -f pod.yaml -n dma; done
pod "hpa-fake-lzpjm" created
pod "hpa-fake-xbw06" created
pod "hpa-fake-trkht" created
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS    RESTARTS   AGE
hpa-fake-lzpjm              1/1       Running   0          9s
hpa-fake-trkht              1/1       Running   0          8s
hpa-fake-xbw06              1/1       Running   0          9s
resource-consumer-1-0223q   1/1       Running   0          5h
[root@host-8-174-253 dma]# date
Tue Feb 14 03:46:00 EST 2017
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS    RESTARTS   AGE
hpa-fake-lzpjm              1/1       Running   0          17s
hpa-fake-trkht              1/1       Running   0          16s
hpa-fake-xbw06              1/1       Running   0          17s
resource-consumer-1-0223q   1/1       Running   0          5h
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS    RESTARTS   AGE
hpa-fake-lzpjm              1/1       Running   0          31s
hpa-fake-trkht              1/1       Running   0          30s
hpa-fake-xbw06              1/1       Running   0          31s
resource-consumer-1-0223q   1/1       Running   0          5h
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          1m
hpa-fake-trkht              0/1       Completed   0          1m
hpa-fake-xbw06              0/1       Completed   0          1m
resource-consumer-1-0223q   1/1       Running     0          5h
resource-consumer-1-6t1kb   1/1       Running     0          28s
resource-consumer-1-fpbv0   1/1       Running     0          28s
resource-consumer-1-sn2mb   1/1       Running     0          28s
[root@host-8-174-253 dma]# oc get hpa -n dma
NAME                REFERENCE                            TARGET    CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer   80%       0%        1         30        5h
[root@host-8-174-253 dma]# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: resource-consumer
  generateName: hpa-fake-
spec:
  containers:
    - image: docker.io/ocpqe/resource_consumer:v1
      command:
       - /consume-cpu/consume-cpu
      args:
       - -duration-sec=60
       - -millicores=200
      imagePullPolicy: IfNotPresent
      name: hpa-fake
      ports:
        - containerPort: 8080
          protocol: TCP
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
      securityContext:
        capabilities: {}
        privileged: false
      terminationMessagePath: /dev/termination-log
  dnsPolicy: ClusterFirst
  restartPolicy: OnFailure
  serviceAccount: ""
[root@host-8-174-253 dma]# oc get hpa -n dma
NAME                REFERENCE                            TARGET    CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer   80%       0%        1         30        5h
[root@host-8-174-253 dma]# oc describe hpa resource-consumer -n dma
Name:				resource-consumer
Namespace:			dma
Labels:				<none>
Annotations:			<none>
CreationTimestamp:		Mon, 13 Feb 2017 22:10:56 -0500
Reference:			DeploymentConfig/resource-consumer
Target CPU utilization:		80%
Current CPU utilization:	0%
Min replicas:			1
Max replicas:			30
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  1h		1h		1	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 6 (avgCPUutil: 138, current replicas: 1)
  57m		57m		1	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 8 (avgCPUutil: 154, current replicas: 1)
  57m		57m		2	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 8 (avgCPUutil: 154, current replicas: 4)
  54m		54m		1	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 3 (avgCPUutil: 0, current replicas: 4)
  5h		52m		14	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 2 (avgCPUutil: 0, current replicas: 4)
  3h		52m		5	{horizontal-pod-autoscaler }			Normal		SuccessfulRescale	New size: 1; reason: All metrics below target
  51m		51m		1	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 8 (avgCPUutil: 149, current replicas: 1)
  49m		49m		1	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 8 (avgCPUutil: 151, current replicas: 1)
  5h		49m		107	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	(events with common reason combined)
  3h		4m		54	{horizontal-pod-autoscaler }			Warning		FailedGetMetrics	unable to get metrics for resource cpu: no metrics returned from heapster
  3h		2m		373	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 1)
  5h		1m		8	{horizontal-pod-autoscaler }			Normal		SuccessfulRescale	New size: 4; reason: CPU utilization above target
  1h		1m		3	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 8 (avgCPUutil: 152, current replicas: 1)
  1h		3s		38	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 4)
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          2m
hpa-fake-trkht              0/1       Completed   0          2m
hpa-fake-xbw06              0/1       Completed   0          2m
resource-consumer-1-0223q   1/1       Running     0          5h
resource-consumer-1-6t1kb   1/1       Running     0          2m
resource-consumer-1-fpbv0   1/1       Running     0          2m
resource-consumer-1-sn2mb   1/1       Running     0          2m
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          3m
hpa-fake-trkht              0/1       Completed   0          3m
hpa-fake-xbw06              0/1       Completed   0          3m
resource-consumer-1-0223q   1/1       Running     0          5h
resource-consumer-1-6t1kb   1/1       Running     0          2m
resource-consumer-1-fpbv0   1/1       Running     0          2m
resource-consumer-1-sn2mb   1/1       Running     0          2m
[root@host-8-174-253 dma]# date
Tue Feb 14 03:49:25 EST 2017
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          3m
hpa-fake-trkht              0/1       Completed   0          3m
hpa-fake-xbw06              0/1       Completed   0          3m
resource-consumer-1-0223q   1/1       Running     0          5h
resource-consumer-1-6t1kb   1/1       Running     0          3m
resource-consumer-1-fpbv0   1/1       Running     0          3m
resource-consumer-1-sn2mb   1/1       Running     0          3m
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          4m
hpa-fake-trkht              0/1       Completed   0          4m
hpa-fake-xbw06              0/1       Completed   0          4m
resource-consumer-1-0223q   1/1       Running     0          6h
resource-consumer-1-6t1kb   1/1       Running     0          3m
resource-consumer-1-fpbv0   1/1       Running     0          3m
resource-consumer-1-sn2mb   1/1       Running     0          3m
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          4m
hpa-fake-trkht              0/1       Completed   0          4m
hpa-fake-xbw06              0/1       Completed   0          4m
resource-consumer-1-0223q   1/1       Running     0          6h
resource-consumer-1-6t1kb   1/1       Running     0          4m
resource-consumer-1-fpbv0   1/1       Running     0          4m
resource-consumer-1-sn2mb   1/1       Running     0          4m
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# 
[root@host-8-174-253 dma]# oc get pods -n dma
NAME                        READY     STATUS      RESTARTS   AGE
hpa-fake-lzpjm              0/1       Completed   0          7m
hpa-fake-trkht              0/1       Completed   0          7m
hpa-fake-xbw06              0/1       Completed   0          7m
resource-consumer-1-0223q   1/1       Running     0          6h

Comment 8 errata-xmlrpc 2017-04-12 19:07:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

