Bug 1899423

Summary: Vertical Pod Autoscaler (VPA) could not take any action based on the recommendation
Product: OpenShift Container Platform
Reporter: Rutvik <rkshirsa>
Component: Node
Assignee: Joel Smith <joelsmith>
Sub component: Autoscaler (HPA, VPA)
QA Contact: Weinan Liu <weinliu>
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
CC: aos-bugs, rkshirsa, rphillips, tsweeney, weinliu
Version: 4.6.z
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2021-03-19 19:44:47 UTC
Bug Blocks: 1938260

Description Rutvik 2020-11-19 07:54:39 UTC
Description of problem:


The "vpa-recommender" correctly adds a "recommendation" but never seems to take any action to actually scale the Pods.


* vpa-updater logs at the time we expect the pods to be evicted:
~~~
I1116 14:45:32.477853       1 main.go:73] Vertical Pod Autoscaler 0.8.0 Updater
I1116 14:45:32.586672       1 fetcher.go:100] Initial sync of CronJob completed
I1116 14:45:32.687371       1 fetcher.go:100] Initial sync of DaemonSet completed
I1116 14:45:32.787849       1 fetcher.go:100] Initial sync of Deployment completed
I1116 14:45:32.888267       1 fetcher.go:100] Initial sync of ReplicaSet completed
I1116 14:45:32.988791       1 fetcher.go:100] Initial sync of StatefulSet completed
I1116 14:45:33.089079       1 fetcher.go:100] Initial sync of ReplicationController completed
I1116 14:45:33.190080       1 fetcher.go:100] Initial sync of Job completed
I1116 14:45:33.394133       1 updater.go:241] Rate limit disabled
I1116 14:45:33.795215       1 api.go:94] Initial VPA synced successfully
E1116 14:46:33.809364       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
E1116 14:47:33.806088       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
E1116 14:48:33.803952       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
E1116 14:49:33.818845       1 updater.go:116] Error getting Admission Controller status: leases.coordination.k8s.io "vpa-admission-controller" not found. Skipping eviction loop
~~~
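
The repeated "Skipping eviction loop" errors appear to mean the updater could not confirm that the admission controller is running (it looks for the vpa-admission-controller Lease) and therefore skipped eviction for that cycle. As a quick sanity check, one could verify whether the Lease and the admission controller pod exist; the namespace below is an assumption based on the default VPA operator install and may need adjusting:
~~~
# Assumed namespace for the VPA operands; adjust if the operator is installed elsewhere
oc get lease vpa-admission-controller -n openshift-vertical-pod-autoscaler
oc get pods -n openshift-vertical-pod-autoscaler
~~~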

# vpa status
~~~
spec:
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2020-11-16T14:47:33Z"
    status: "True"
    type: RecommendationProvided
~~~
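
For context, the excerpt above is from the VerticalPodAutoscaler object; a minimal object with Auto update mode looks roughly like the sketch below (metadata and target names are illustrative, not taken from this report):
~~~
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa            # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet          # the workload in this report is a StatefulSet
    name: example-app          # illustrative; replace with the real workload name
  updatePolicy:
    updateMode: "Auto"         # recommender writes status, updater may evict pods
~~~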


Version-Release number of selected component (if applicable):


How reproducible:
Always 


Actual results:
The VPA does not take any action based on the recommendation.

Expected results:
VPA should take action as per the recommendation.

Comment 3 Joel Smith 2020-11-20 18:12:05 UTC
Hi,
I see that your stateful set has only one replica. VPA as configured in OpenShift will never kill your pod if you have just one replica. The system does not want to cause downtime for your application and assumes that killing and replacing a lone pod (in order to update the resource requests) is a bad idea.

If you restart the pod manually, does the new pod start with the recommended limits?

Also, if the application can run with two replicas, you might retry your test with two or more replicas and see whether VPA kills and replaces pods that need updated resource requests.
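
A rough way to follow both suggestions (the workload and namespace names below are placeholders, not from this report) is to scale the StatefulSet to two replicas, delete one pod, and check whether the replacement comes back with the recommended requests:
~~~
# Placeholders: <namespace> and <statefulset> are illustrative
oc scale statefulset <statefulset> -n <namespace> --replicas=2

# For a StatefulSet, the replacement pod keeps the same name
oc delete pod <statefulset>-0 -n <namespace>
oc get pod <statefulset>-0 -n <namespace> \
  -o jsonpath='{.spec.containers[0].resources.requests}'
~~~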

Comment 8 Joel Smith 2021-03-15 18:15:01 UTC
*** Bug 1938949 has been marked as a duplicate of this bug. ***

Comment 11 Red Hat Bugzilla 2023-09-15 00:51:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days