Bug 1916890

Summary: [OCP 4.7] api or api-int not available during installation
Product: OpenShift Container Platform Reporter: Mario Abajo <mabajodu>
Component: Networking Assignee: Ben Nemec <bnemec>
Networking sub component: runtime-cfg QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: bbreard, bnemec, bsmitley, dkaylor, fortinj66, gpei, imcleod, jligon, jmalde, malonso, m.andre, miabbott, mschwabe, mstaeble, nschuetz, nstielau, smilner, stbenjam, trees, wking
Version: 4.7 Keywords: Reopened, Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Incorrect keepalived setting.
Consequence: The VIP may end up on an incorrect system and be unable to move back.
Fix: Remove the incorrect setting.
Result: The VIP ends up on the correct system.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:36:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1971864    

Comment 4 Micah Abbott 2021-01-15 20:29:51 UTC
Is there a reason this was filed as an RHCOS BZ?  Once any of the nodes are booted into the OS and containers have started, RHCOS is mostly out of the picture.

> My guess is that the VIPs move from the bootstrap to the masters before the control plane is completely ready.

If this is the case, this is not something that RHCOS controls; I think it would be handled by the installer or maybe the API server itself?

Comment 5 Micah Abbott 2021-01-15 20:52:57 UTC
Higher priority work has prevented this issue from being solved; adding the UpcomingSprint keyword

Comment 6 Steve Milner 2021-01-18 18:42:12 UTC
Moving to the installer team as this does seem to be related to installation flow.

Comment 7 Matthew Staebler 2021-01-27 15:49:36 UTC
At first glance, this may be an issue with the keepalived logic. Moving to the kni team, as they maintain that.

Comment 8 Matthew Staebler 2021-02-25 15:17:15 UTC
*** Bug 1932464 has been marked as a duplicate of this bug. ***

Comment 9 John Fortin 2021-03-09 14:53:21 UTC
Just checking in...  We are still seeing this issue with OKD installs.  I'm not sure if there is any additional information I can provide.

Comment 10 John Fortin 2021-03-09 14:55:28 UTC
I found this while I was investigating.  It does look like there was some work done regarding API VIP failover fairly recently:

https://github.com/openshift/machine-config-operator/pull/2107

Comment 11 Matthew Staebler 2021-05-24 14:43:36 UTC
*** Bug 1963161 has been marked as a duplicate of this bug. ***

Comment 12 Nick Schuetz 2021-06-02 18:13:35 UTC
I've verified that when the fixes in the following are applied I am able to get a successful install on VMware via IPI:

https://github.com/openshift/machine-config-operator/pull/2586
https://github.com/openshift/installer/pull/4972

Comment 13 Ben Nemec 2021-06-10 17:33:02 UTC
I believe this problem was fixed by https://github.com/openshift/installer/pull/4973. Duplicating to that bug.

*** This bug has been marked as a duplicate of bug 1966862 ***

Comment 16 Brandon Smitley 2021-06-14 15:41:49 UTC

*** This bug has been marked as a duplicate of bug 1966862 ***

Comment 18 Martin André 2021-06-17 06:24:06 UTC
(In reply to Ben Nemec from comment #13)
> I believe this problem was fixed by
> https://github.com/openshift/installer/pull/4973. Duplicating to that bug.
> 
> *** This bug has been marked as a duplicate of bug 1966862 ***

In my understanding bug 1966862 is a different issue affecting only vsphere platform.

The issue reported in this bug affects all platforms and was fixed with https://github.com/openshift/machine-config-operator/pull/2586.

They should be treated as different issues. Marking them as duplicates prevents the backport of https://github.com/openshift/machine-config-operator/pull/2586 from merging into 4.7. Can we sort this out?

Comment 19 Ben Nemec 2021-06-17 14:47:21 UTC
Shoot, you're right. This didn't get closed properly because of the other patch attached to it, not because this fix didn't merge.

Comment 26 Victor Voronkov 2021-06-30 06:57:24 UTC
Verified on 4.8.0-0.nightly-2021-06-29-033219

Successful deployment of IPI vSphere

Comment 28 errata-xmlrpc 2021-07-27 22:36:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438