Bug 2096326

Summary: Installer needs parameters to increase various timeout values
Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-installer
Reporter: Wolfgang Kulhanek <wkulhane>
Assignee: OCP Installer <ocp-installer>
QA Contact: Gaoyun Pei <gpei>
CC: bbarbach, lwan, padillon, rdossant, yunjiang
Status: CLOSED NOTABUG
Severity: low
Priority: unspecified
Version: 4.10
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2022-07-25 14:07:45 UTC
Type: Bug

Description Wolfgang Kulhanek 2022-06-13 14:32:15 UTC
Version: 4.10.6

Platform: aws (but really any)

Please specify: IPI

What happened?

We are deploying clusters with c5.metal instance types on AWS (for OpenShift Virtualization). AWS sometimes takes too long to provision these instances, so the installer times out.

We could of course add manual logic around "openshift-install wait-for install-complete", but I don't think there is a timeout parameter for that command - it'll just sit there forever.
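
The manual logic mentioned above could be sketched as a small Python wrapper that enforces its own deadline around a command. This is only an illustration: the function name `run_with_deadline` and its parameters are hypothetical, and nothing here is part of the installer.

```python
import subprocess
import time


def run_with_deadline(cmd, overall_timeout_s, attempt_timeout_s, pause_s=0.0):
    """Re-run cmd until it exits 0 or the overall deadline passes.

    Returns True on success, False if the deadline was reached first.
    """
    deadline = time.monotonic() + overall_timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Never let a single attempt run past the overall deadline.
            result = subprocess.run(cmd, timeout=min(attempt_timeout_s, remaining))
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # the attempt hung; subprocess.run has already killed the child
        # Pause between attempts, but not beyond the deadline.
        time.sleep(min(pause_s, max(0.0, deadline - time.monotonic())))
```

It could be called with a command list such as `["openshift-install", "wait-for", "install-complete", "--dir", "<asset dir>"]`, where the asset directory is a placeholder, and the caller decides what to do when it returns False.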

It would be a quick fix to add a few timeout parameters to the openshift-install command.

e.g. --wait-for-bootstrap-timeout=20m, --wait-for-cluster-complete=40m

These would have sensible defaults (e.g. what is hardcoded right now) but could be overridden if necessary.
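
As a rough sketch of what "defaults that can be overridden" might look like, the following Python snippet parses the two proposed flags with argparse. The flag names come from the example in this report and do not exist in openshift-install. The defaults are the example values from this report (20m, 40m), used as placeholders rather than the installer's actual hardcoded timeouts. The duration syntax and the `install-wrapper` program name are assumptions.

```python
import argparse
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text):
    """Parse a value such as '20m', '90s' or '1h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def build_parser():
    """Build a parser for the proposed (hypothetical) timeout flags."""
    parser = argparse.ArgumentParser(prog="install-wrapper")
    # Placeholder defaults taken from the example values in this report.
    parser.add_argument("--wait-for-bootstrap-timeout", type=parse_duration,
                        default=timedelta(minutes=20))
    parser.add_argument("--wait-for-cluster-complete", type=parse_duration,
                        default=timedelta(minutes=40))
    return parser
```

With no arguments the defaults apply, and passing for example `--wait-for-bootstrap-timeout=60m` changes only that one value.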

Note that while our use case is a Single Node Bare Metal install on AWS (which technically is not a supported environment), this problem would surface even more if a whole control plane were provisioned on bare metal instances.

Also note that this may not be limited to AWS - bare metal providers take much longer to create machines than VM providers do.

Comment 1 Rafael Fonseca 2022-07-21 16:30:10 UTC
Can you try with a newer installer version, e.g. 4.10.24? We've increased the bootstrap timeout for bare metal installs [1].

[1] https://github.com/openshift/installer/pull/6017/files

Comment 2 Rafael Fonseca 2022-07-21 16:57:24 UTC
About configurable timeouts, see discussions at https://github.com/openshift/installer/pull/5979

Comment 3 Wolfgang Kulhanek 2022-07-25 13:32:02 UTC
I don't think the referenced pull request will help - how does it determine that it's bare metal?

My use case is just an IPI install on AWS using c5.metal instances.

Comment 4 Rafael Fonseca 2022-07-25 13:54:26 UTC
(In reply to Wolfgang Kulhanek from comment #3)
> I don't think the referenced pull request will help - how does it determine
> that it's bare metal?

It determines this from the platform name used in the install-config.


> My use case is just an IPI install on AWS using c5.metal instances.

Ah, you had mentioned "Note while our use case is a Single Node Bare Metal install on AWS (which technically is not a supported environment) this problem surfaces even more when a whole control plane would be provisioned on bare metal instances.", so I assumed you set the platform to "baremetal" in the install-config.
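
To illustrate the distinction discussed in this comment, here is a minimal Python sketch of selecting a timeout by platform name. It is not the installer's code: the helper names and both timeout values are hypothetical placeholders, and the install-config is shown as an already-parsed dict. It shows that an AWS install-config using c5.metal instances still has the platform "aws", so a check keyed on "baremetal" would not apply to it.

```python
from datetime import timedelta

# Placeholder values for illustration only; not the installer's real timeouts.
DEFAULT_BOOTSTRAP_TIMEOUT = timedelta(minutes=30)
PLATFORM_BOOTSTRAP_TIMEOUTS = {"baremetal": timedelta(minutes=60)}


def platform_name(install_config):
    """Return the single key under 'platform', e.g. 'aws' or 'baremetal'."""
    platforms = list(install_config.get("platform", {}))
    if len(platforms) != 1:
        raise ValueError(f"expected exactly one platform, got {platforms}")
    return platforms[0]


def bootstrap_timeout(install_config):
    """Pick a timeout by platform name, ignoring the instance type."""
    name = platform_name(install_config)
    return PLATFORM_BOOTSTRAP_TIMEOUTS.get(name, DEFAULT_BOOTSTRAP_TIMEOUT)


# An IPI install on AWS with bare metal instance types: the platform is "aws".
aws_metal = {
    "platform": {"aws": {"region": "us-east-1"}},
    "controlPlane": {"platform": {"aws": {"type": "c5.metal"}}},
}
```

Here `bootstrap_timeout(aws_metal)` falls back to the default, because the instance type under `controlPlane` plays no part in the lookup.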

Comment 5 Patrick Dillon 2022-07-25 14:07:45 UTC
Thanks for bringing this up. I am closing this as NOTABUG, because we are managing the addition of this functionality in https://issues.redhat.com/browse/CORS-2087 

I have added a link to this BZ in that Jira card. You can track the work there.