Bug 2096326 - Installer needs parameters to increase various timeout values
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.12.0
Assignee: OCP Installer
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-13 14:32 UTC by Wolfgang Kulhanek
Modified: 2022-11-08 07:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-25 14:07:45 UTC
Target Upstream Version:
Embargoed:



Description Wolfgang Kulhanek 2022-06-13 14:32:15 UTC
Version: 4.10.6

Platform: aws (but really any)

Please specify: IPI

What happened?

We are deploying clusters with c5.metal instance types on AWS (for OpenShift Virtualization). AWS sometimes takes too long to provision these instances, and the installer times out as a result.

We could of course add manual logic using "openshift-install wait-for install-complete", but I don't think there is a timeout parameter for that command; it'll just sit there forever.

It would be a quick fix to add a few timeout parameters to the openshift-install command.

e.g. --wait-for-bootstrap-timeout=20m, --wait-for-cluster-complete=40m

These would have sensible defaults (e.g. what is hardcoded right now) but could be overridden if necessary.
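
The proposal above could be sketched roughly as follows. This is an illustration only, not installer code: the flag names are the ones suggested in this report, the default values are simply the example values from this report (not the installer's hardcoded ones), and the "install-wrapper" program name is hypothetical.

```python
import argparse
import re

# Assumption: defaults are the example values from this report,
# not the values hardcoded in openshift-install.
DEFAULT_BOOTSTRAP_TIMEOUT = "20m"
DEFAULT_CLUSTER_TIMEOUT = "40m"

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '90s', '20m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smh])", text)
    # Reject empty input and input with characters the pattern skipped.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise argparse.ArgumentTypeError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two hypothetical timeout flags."""
    parser = argparse.ArgumentParser(prog="install-wrapper")
    parser.add_argument(
        "--wait-for-bootstrap-timeout",
        type=parse_duration,
        default=parse_duration(DEFAULT_BOOTSTRAP_TIMEOUT),
    )
    parser.add_argument(
        "--wait-for-cluster-complete",
        type=parse_duration,
        default=parse_duration(DEFAULT_CLUSTER_TIMEOUT),
    )
    return parser
```

Omitting a flag keeps its default, so existing invocations would behave as before; only users with slow-provisioning instances would need to pass a larger value.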

Note that while our use case is a Single Node Bare Metal install on AWS (which technically is not a supported environment), this problem surfaces even more when a whole control plane is provisioned on bare metal instances.

Also note that this may not just be AWS - bare metal providers have much longer times to create machines than VMs.
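
The "manual logic" workaround mentioned above could look something like the sketch below: the caller enforces its own deadline around a command. The helper name run_with_timeout is hypothetical, and the "cluster-dir" path in the usage comment is a placeholder.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or None if the deadline passed.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return None


# Intended usage (requires openshift-install on PATH; "cluster-dir" is a
# placeholder for the installation directory):
#   run_with_timeout(
#       ["openshift-install", "wait-for", "install-complete",
#        "--dir", "cluster-dir"],
#       60 * 60,
#   )

if __name__ == "__main__":
    # Stand-in command so the sketch runs without the installer present.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(sleeper, 0.5))
```

Because the deadline lives in the calling script, it can be set per environment without any change to the installer itself.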

Comment 1 Rafael Fonseca 2022-07-21 16:30:10 UTC
Can you try with a newer installer version, e.g. 4.10.24? We've increased the bootstrap timeout for bare metal installs [1]

[1] https://github.com/openshift/installer/pull/6017/files

Comment 2 Rafael Fonseca 2022-07-21 16:57:24 UTC
About configurable timeouts, see discussions at https://github.com/openshift/installer/pull/5979

Comment 3 Wolfgang Kulhanek 2022-07-25 13:32:02 UTC
I don't think the referenced pull request will help - how does it determine that it's bare metal?

My use case is just an IPI install on AWS using c5.metal instances.

Comment 4 Rafael Fonseca 2022-07-25 13:54:26 UTC
(In reply to Wolfgang Kulhanek from comment #3)
> I don't think the referenced pull request will help - how does it determine
> that it's bare metal?

It determines this from the platform name used in the install-config.


> My use case is just an IPI install on AWS using c5.metal instances.

Ah, you had mentioned "Note while our use case is a Single Node Bare Metal install on AWS (which technically is not a supported environment) this problem surfaces even more when a whole control plane would be provisioned on bare metal instances.", so I assumed you set the platform to "baremetal" in the install-config.
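
To illustrate the distinction discussed in this comment, here is a minimal sketch of timeout selection keyed on the install-config platform name. It is not the installer's implementation: the function names and both timeout values are hypothetical, and the install-config is assumed to have already been parsed from YAML into a dictionary.

```python
# Hypothetical timeout values in minutes, for illustration only.
DEFAULT_BOOTSTRAP_MINUTES = 30
BAREMETAL_BOOTSTRAP_MINUTES = 60


def platform_name(install_config: dict) -> str:
    """Return the single key under the top-level 'platform' section."""
    platforms = list(install_config.get("platform", {}))
    if len(platforms) != 1:
        raise ValueError("expected exactly one platform entry")
    return platforms[0]


def bootstrap_timeout_minutes(install_config: dict) -> int:
    """Pick a longer timeout only when the platform is 'baremetal'."""
    if platform_name(install_config) == "baremetal":
        return BAREMETAL_BOOTSTRAP_MINUTES
    return DEFAULT_BOOTSTRAP_MINUTES


# An AWS install that uses bare metal instance types still has
# platform "aws"; the instance type lives under controlPlane.
aws_metal = {
    "platform": {"aws": {"region": "us-east-2"}},
    "controlPlane": {
        "name": "master",
        "platform": {"aws": {"type": "c5.metal"}},
    },
}

print(bootstrap_timeout_minutes(aws_metal))
```

With a check of this shape, the AWS c5.metal case falls through to the default timeout, which is why a platform-based increase does not cover the reporter's setup.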

Comment 5 Patrick Dillon 2022-07-25 14:07:45 UTC
Thanks for bringing this up. I am closing this as NOTABUG, because we are managing the addition of this functionality in https://issues.redhat.com/browse/CORS-2087 

I have added a link to this BZ in that Jira card. You can track the work there.

