Bug 1956993 - Increase initialDelaySeconds for hco-operator and hco-webhook for upgrade scenarios
Keywords:
Status: MODIFIED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 2.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2.6.3
Assignee: Simone Tiraboschi
QA Contact: Inbar Rose
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-04 19:53 UTC by Kobig
Modified: 2021-05-11 07:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt hyperconverged-cluster-operator pull 1325 | 0 | None | closed | Increase the initialDelay for the liveness probe | 2021-05-10 16:40:38 UTC
Github | kubevirt hyperconverged-cluster-operator pull 1326 | 0 | None | closed | [release-1.4] Increase the initialDelay for the liveness probe | 2021-05-10 16:40:39 UTC
Github | kubevirt hyperconverged-cluster-operator pull 1327 | 0 | None | closed | [release-1.3] Increase the initialDelay for the liveness probe | 2021-05-10 16:40:36 UTC
Github | kubevirt hyperconverged-cluster-operator pull 1328 | 0 | None | closed | Fix initialDelay for the liveness probe | 2021-05-11 07:12:00 UTC

Description Kobig 2021-05-04 19:53:24 UTC
Description of problem:
Similar to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1924137, but for hco-operator and hco-webhook.

We manually updated the probe initialDelaySeconds to 45 seconds on both deployments to get the pods up and running, but after a while the operator restored the original values (which is expected, since the operator manages both deployments).
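
For illustration only, a manual override of this kind could be expressed as a JSON Patch (RFC 6902) against each deployment. The sketch below only builds and prints the patch documents; it does not contact a cluster. The helper name probe_delay_patch, the container index 0, and the choice to patch both the liveness and readiness probes are assumptions, not details taken from this report. Any such change is temporary, because the operator reconciles the deployments back to their original values.

```python
import json


def probe_delay_patch(delay_seconds, container_index=0):
    """Build a JSON Patch (RFC 6902) setting initialDelaySeconds on both probes.

    container_index=0 is an assumption about the pod template layout.
    """
    base = f"/spec/template/spec/containers/{container_index}"
    return [
        {
            # "add" replaces the member if it already exists, so it works
            # whether or not initialDelaySeconds is currently set.
            "op": "add",
            "path": f"{base}/{probe}/initialDelaySeconds",
            "value": delay_seconds,
        }
        for probe in ("livenessProbe", "readinessProbe")
    ]


if __name__ == "__main__":
    # Deployment names as given in this report.
    for deployment in ("hco-operator", "hco-webhook"):
        print(deployment, json.dumps(probe_delay_patch(45)))
```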

Version-Release number of selected component (if applicable): 2.6


How reproducible:


Steps to Reproduce:
1. Change the initialDelaySeconds of hco-operator and hco-webhook from 15 to 45.

Actual results:
The initialDelaySeconds of hco-operator and hco-webhook is 15, and when it is changed the operator resets it (which is the expected behavior for the operator).

Expected results:
The initialDelaySeconds of hco-operator and hco-webhook is changed to 45.


Additional info:

Comment 1 sgott 2021-05-05 12:07:22 UTC
Re-assigning this to the Install component. Please feel free to override this if you feel this is in error.

Comment 2 Simone Tiraboschi 2021-05-05 21:08:05 UTC
The current value of initialDelaySeconds is 5 seconds for both the readiness and liveness probes of the container, so the first checks are executed 5 seconds after the container has started.
failureThreshold is currently set to 1, so the first failure restarts the container, which can potentially cause an endless restart loop on heavily overloaded clusters.
I'm proposing to increase initialDelaySeconds to 10 seconds to maintain a certain responsiveness, while raising failureThreshold to 3 so that the pod will not be restarted in the first 30 seconds.
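
The 30-second figure can be checked with a small calculation. This is a rough sketch that assumes periodSeconds is 10 (the Kubernetes default; the value actually used by these deployments is not stated in this bug) and ignores probe timeouts and kubelet scheduling jitter. The function name earliest_restart_seconds is made up for the example.

```python
def earliest_restart_seconds(initial_delay, failure_threshold, period=10):
    """Earliest time (seconds after container start) at which a failing
    liveness probe can trigger a restart.

    The first probe runs at initial_delay and each later probe runs
    `period` seconds after the previous one; a restart requires
    failure_threshold consecutive failures. period=10 is an assumption.
    """
    return initial_delay + (failure_threshold - 1) * period


# Current settings described above: initialDelaySeconds=5, failureThreshold=1.
current = earliest_restart_seconds(initial_delay=5, failure_threshold=1)

# Proposed settings: initialDelaySeconds=10, failureThreshold=3.
proposed = earliest_restart_seconds(initial_delay=10, failure_threshold=3)

print(current, proposed)  # 5 30
```

Under these assumptions the current settings allow a restart as early as 5 seconds after start, while the proposed ones give the container about 30 seconds.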

