Bug 2039965
| Summary: | [IBMCLOUD] Poor network performance (mostly detected in NA-based regions) causes "wait-for install-complete" to fail, but the installation succeeds on its own after some minutes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pedro Amoedo <pamoedom> |
| Component: | Installer | Assignee: | aos-install |
| Installer sub component: | openshift-installer | QA Contact: | Pedro Amoedo <pamoedom> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | CC: | cschaefe, jnowicki |
| Version: | 4.10 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-28 10:23:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Pedro Amoedo
2022-01-12 19:35:31 UTC
This same kind of issue, where network connectivity problems cause deployment delays and require a follow-up "wait-for install-complete", is not limited to NA regions; it has also been seen in EU regions. IBM Cloud is investigating the issue and will hopefully improve the stability and reliability of the related resources to help prevent it in the future.

Thanks Christopher, I'm updating the summary to better reflect the situation. In my case I've only seen this behavior in NA-based regions; maybe those locations are more saturated.

After switching to a different instance type (specifically bx2-4x16), we observed a high installation success rate in local testing, and CI tests succeeded as well. Previously the bx2d-4x16 instance type was being used, and it was unreliable because of the limited availability of the storage it provisions. We are working to make bx2-4x16 the default instance type. Pedro, could you try your test again, ensuring bx2-4x16 is the instance type for the bootstrap, master and worker nodes? I suspect you will not see the described issue as often going forward (if at all).

Sure Jeff, I'll run some tests with that profile in US-based regions, which show the problem at a higher rate than the others. I'll keep you posted. Best Regards.

Hi Jeff, after switching back to the default "OpenShiftSDN" network type as discussed, and overriding the instance type to "bx2-4x16" in line with openshift/installer#5578[1], the tests have improved significantly. I have tested 6 different installations in supported North American regions ("us-south", "us-east" and "ca-tor"): 4 of the 6 ran flawlessly from beginning to end, and the other 2 presented some minor issues, not during the installation itself but in our post_action scripts (I used nightly builds, so this is not surprising).
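The overrides described above (bx2-4x16 for the bootstrap, master and worker nodes, plus the default OpenShiftSDN network type) can be sketched as a transformation of a parsed install-config.yaml mapping, for example one loaded with a YAML library. This is a minimal sketch under stated assumptions, not the installer's own logic: the field paths `controlPlane.platform.ibmcloud.type`, `compute[].platform.ibmcloud.type` and `networking.networkType` follow the install-config layout as we understand it for IBM Cloud, treating `platform.ibmcloud.defaultMachinePlatform` as also covering the bootstrap node is an assumption, and the function name `apply_ibmcloud_overrides` is hypothetical. Check the field names against the documentation for your installer version.

```python
import copy

# Values from this bug: the instance type used in openshift/installer#5578
# and the default network type the tests switched back to.
INSTANCE_TYPE = "bx2-4x16"
NETWORK_TYPE = "OpenShiftSDN"


def apply_ibmcloud_overrides(install_config: dict) -> dict:
    """Return a copy of an install-config mapping with the bx2-4x16
    instance type and OpenShiftSDN network type applied.

    Hypothetical helper; field paths are assumptions to verify against
    the installer documentation for your version.
    """
    # Work on a deep copy so the caller's mapping is left untouched.
    cfg = copy.deepcopy(install_config)

    # Control plane (master) machine pool.
    control_plane = cfg.setdefault("controlPlane", {})
    control_plane.setdefault("platform", {}).setdefault("ibmcloud", {})["type"] = INSTANCE_TYPE

    # Every compute (worker) pool; a single "worker" pool is created
    # only if no compute section exists at all.
    for pool in cfg.setdefault("compute", [{"name": "worker"}]):
        pool.setdefault("platform", {}).setdefault("ibmcloud", {})["type"] = INSTANCE_TYPE

    # Platform-wide default; ASSUMED here to also govern the bootstrap node.
    ibmcloud = cfg.setdefault("platform", {}).setdefault("ibmcloud", {})
    ibmcloud.setdefault("defaultMachinePlatform", {})["type"] = INSTANCE_TYPE

    # Network plugin used during the tests described in this bug.
    cfg.setdefault("networking", {})["networkType"] = NETWORK_TYPE
    return cfg
```

Returning a modified copy rather than mutating in place makes it easy to diff the original and overridden configs before writing the result back to install-config.yaml.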
In summary, the new profiles combined with the default SDN network type show a better success ratio, even in the US-based regions that initially showed the worst results. Thanks.

[1] - https://github.com/openshift/installer/pull/5578

NOTE: the PR is linked/tracked via BZ#2045916, therefore I'm closing this one as a duplicate. Best Regards.

*** This bug has been marked as a duplicate of bug 2045916 ***
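The workaround implied by the summary, re-running `openshift-install wait-for install-complete` when the first wait times out while the cluster is still converging, can be sketched as a small retry loop. This is a hedged sketch, not a supported tool: the helper name `wait_for_install_complete` and its `attempts`/`delay_seconds`/`base_cmd` parameters are hypothetical, and only the `wait-for install-complete` subcommand and the installer's standard `--dir` asset-directory flag come from openshift-install itself. Retrying only helps when the installation is still progressing, as in this bug; it does not rescue an installation that is actually failing.

```python
import subprocess
import time
from typing import Optional, Sequence


def wait_for_install_complete(
    install_dir: str,
    attempts: int = 3,
    delay_seconds: float = 60.0,
    base_cmd: Optional[Sequence[str]] = None,
) -> bool:
    """Re-run `openshift-install wait-for install-complete` until it exits 0.

    Hypothetical helper. Returns True on the first successful attempt,
    False if every attempt exits non-zero.
    """
    if base_cmd is None:
        base_cmd = ["openshift-install", "wait-for", "install-complete"]
    # --dir points the installer at the asset directory used for `create cluster`.
    cmd = [*base_cmd, "--dir", install_dir]

    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        # Pause before the next attempt, but not after the last one.
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

For example, `wait_for_install_complete("./ibmcloud-cluster", attempts=3, delay_seconds=300)` (hypothetical directory) would make up to three attempts, pausing five minutes after each failure. The `base_cmd` parameter exists mainly so the loop can be exercised without a real cluster.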