Bug 1999594
Summary: | IPI deployment fails when master POST time differ | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Lubov <lshilin> |
Component: | Installer | Assignee: | Beth White <beth.white> |
Installer sub component: | OpenShift on Bare Metal IPI | QA Contact: | Amit Ugol <augol> |
Status: | CLOSED DUPLICATE | | |
Severity: | high | | |
Priority: | high | CC: | aguclu, bfournie, ccrum, derekh, tsedovic |
Version: | 4.9 | | |
Target Release: | 4.9.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Last Closed: | 2021-10-18 10:42:52 UTC | Type: | Bug |
Description
Lubov
2021-08-31 11:47:23 UTC
I don't think this is specific to ibmcloud. The problem appears to happen when there is a mismatch in POST times between servers (not likely to happen in virt environments). I believe the API server on the bootstrap node is shutting down before the slowest server manages to get its ignition data, because one of the other master servers (with a quicker POST time) has signalled that it is ready. I've been able to reproduce this on virt by adding a 5 minute delay on one of the master VMs. This wasn't happening last week, before the RHCOS image version was bumped; I assume the longer time to pivot gave the slower master enough time to get its ignition data.

Moving over to installer, where somebody familiar with the installer can investigate.

Removing Triaged to mark for re-triaging by the team.

The urgent bug https://bugzilla.redhat.com/show_bug.cgi?id=1998643 and this bug were opened at nearly the same time (after the CoreOS bump). Although the root causes might be different (both suffered from bootstrap apiserver unhealthiness), bug 1998643 has since been fixed. Could you please retry to confirm whether this bug is still relevant?

@derekh do we still have the slow machine in our setup?

(In reply to Lubov from comment #6)
> @derekh do we still have the slow machine in our setup?

We don't, but if I remember correctly, at the time we did come to the conclusion that this problem happened as a result of the RHCOS version bump. So I'm happy to close this as a DUP.

Alternatively, if you want to try to reproduce in virt, you could pause a master for 5+ minutes after it gets provisioned and rebooted, while it is still in POST. Then unpause and see if everything comes up.

> (In reply to Lubov from comment #6)
> > @derekh do we still have the slow machine in our setup?
>
> We don't, but if I remember correctly, at the time we did come to the
> conclusion that this problem happened as a result of the RHCOS version bump.
> So I'm happy to close this as a DUP.

I'm happy with you closing the bz :)

(In reply to Derek Higgins from comment #7)
I'm closing this bug as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1998643. Since the parent bug is fixed, this bug should also be fixed.

*** This bug has been marked as a duplicate of bug 1998643 ***
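
For anyone who wants to retry the virt reproduction described in comment #7 (pause one master for 5+ minutes while it is still in POST, then unpause), here is a minimal sketch of how the pause step might be scripted. It assumes the masters are libvirt domains controlled with `virsh suspend` / `virsh resume`; the function name `pause_master` and the domain name `master-2` are hypothetical and not taken from this bug. The script does not detect POST, so it has to be started by hand at the right moment (after the node is provisioned and rebooted).

```python
import subprocess
import time
from typing import Callable, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Run a command and raise if it exits non-zero.
    subprocess.run(list(cmd), check=True)


def pause_master(
    domain: str,
    pause_seconds: float = 300.0,
    run: Callable[[Sequence[str]], None] = _run_checked,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Suspend a libvirt domain, wait, then resume it.

    `domain` is the libvirt name of the master VM (hypothetical here).
    `run` and `sleep` are injectable so the logic can be exercised
    without a hypervisor.
    """
    run(["virsh", "suspend", domain])
    try:
        sleep(pause_seconds)
    finally:
        # Always resume, even if the wait is interrupted.
        run(["virsh", "resume", domain])


# Example (assumed domain name), to be started while the VM is in POST:
# pause_master("master-2", pause_seconds=5 * 60)
```

If the race described in the first comment is still present, the paused master would be expected to miss its ignition data once the bootstrap API server has shut down; if the deployment completes, that is consistent with the fix for bug 1998643 covering this case.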