Bug 1899423 - Vertical Pod Autoscaler (VPA) could not take any action based on the recommendation
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Joel Smith
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks: 1938260
 
Reported: 2020-11-19 07:54 UTC by Rutvik
Modified: 2024-03-25 17:09 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-19 19:44:47 UTC
Target Upstream Version:
Embargoed:




Links
System: Github | ID: openshift vertical-pod-autoscaler-operator pull 54 | Private: 0 | Priority: None | Status: open | Summary: Bug 1899423: Fix VPA Updater | Last Updated: 2021-03-12 05:02:52 UTC

Description Rutvik 2020-11-19 07:54:39 UTC
Description of problem:

The vpa-recommender correctly writes a recommendation into the VPA status, but the VPA never takes any action to actually apply it to the Pods.


* vpa-updater logs at the time we expect the pods to be evicted:
~~~
I1116 14:45:32.477853       1 main.go:73] Vertical Pod Autoscaler 0.8.0 Updater
I1116 14:45:32.586672       1 fetcher.go:100] Initial sync of CronJob completed
I1116 14:45:32.687371       1 fetcher.go:100] Initial sync of DaemonSet completed
I1116 14:45:32.787849       1 fetcher.go:100] Initial sync of Deployment completed
I1116 14:45:32.888267       1 fetcher.go:100] Initial sync of ReplicaSet completed
I1116 14:45:32.988791       1 fetcher.go:100] Initial sync of StatefulSet completed
I1116 14:45:33.089079       1 fetcher.go:100] Initial sync of ReplicationController completed
I1116 14:45:33.190080       1 fetcher.go:100] Initial sync of Job completed
I1116 14:45:33.394133       1 updater.go:241] Rate limit disabled
I1116 14:45:33.795215       1 api.go:94] Initial VPA synced successfully
E1116 14:46:33.809364       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
E1116 14:47:33.806088       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
E1116 14:48:33.803952       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
E1116 14:49:33.818845       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
~~~
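
For anyone hitting the same "Skipping eviction loop" message, a quick check is to confirm that the vpa-admission-controller pod is running and that it has created the lease the updater looks for. This is a minimal sketch that assumes the OpenShift VPA operator's default namespace, openshift-vertical-pod-autoscaler; adjust if the operator was installed elsewhere.
~~~
# Assumption: the operator runs in its default namespace.
# Confirm the admission controller pod is up at all.
oc get pods -n openshift-vertical-pod-autoscaler

# The updater skips its eviction loop while this lease is missing,
# so check whether the admission controller has created it yet.
oc get lease vpa-admission-controller -n openshift-vertical-pod-autoscaler
~~~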

# vpa status
~~~
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2020-11-16T14:47:33Z"
    status: "True"
    type: RecommendationProvided
~~~
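
For reference, a minimal VPA object with updateMode: Auto looks roughly like the sketch below. The StatefulSet name and namespace are placeholders, not the reporter's actual objects; autoscaling.k8s.io/v1 is the API served by VPA 0.8.x.
~~~
# A minimal sketch, assuming a StatefulSet named "example-app" in namespace
# "example-ns" (both placeholders). updateMode: Auto lets the updater evict
# pods so the admission controller can apply the recommendation on recreation.
oc apply -f - <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
  namespace: example-ns
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: example-app
  updatePolicy:
    updateMode: Auto
EOF
~~~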


Version-Release number of selected component (if applicable):


How reproducible:
Always 


Actual results:
VPA does not take any action based on the recommendation.

Expected results:
VPA should take action based on the recommendation.

Comment 3 Joel Smith 2020-11-20 18:12:05 UTC
Hi,
I see that your stateful set has only one replica. VPA as configured in OpenShift will never kill your pod if you have just one replica. The system does not want to cause downtime for your application and assumes that killing and replacing a lone pod (in order to update the resource requests) is a bad idea.

If you restart the pod manually, does the new pod start with the recommended limits?

Also, if the application can run with two replicas, you might retry your test with two or more replicas and see whether VPA kills and replaces the pods that need updated resource requests; see the sketch below.
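
A minimal way to exercise both suggestions, assuming a StatefulSet named "example-app" in namespace "example-ns" (both placeholders for the real workload):
~~~
# Scale to two replicas so the updater is willing to evict a pod
# that needs updated resource requests.
oc scale statefulset example-app -n example-ns --replicas=2

# Or recreate one pod by hand and inspect the new pod's resources to see
# whether the admission controller injected the recommended requests.
oc delete pod example-app-0 -n example-ns
oc get pod example-app-0 -n example-ns -o jsonpath='{.spec.containers[0].resources}'
~~~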

Comment 8 Joel Smith 2021-03-15 18:15:01 UTC
*** Bug 1938949 has been marked as a duplicate of this bug. ***

Comment 11 Red Hat Bugzilla 2023-09-15 00:51:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

