Bug 1866265

Summary: [baremetal] API-VIP sometimes not accessible when cluster deployed with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true
Product: OpenShift Container Platform Reporter: Yossi Boaron <yboaron>
Component: Machine Config Operator Assignee: Yossi Boaron <yboaron>
Status: CLOSED ERRATA QA Contact: Nataf Sharabi <nsharabi>
Severity: low Docs Contact:
Priority: low    
Version: 4.6 CC: asegurap, bperkins, jerzhang
Target Milestone: --- Keywords: Triaged
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:24:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yossi Boaron 2020-08-05 09:07:02 UTC
I deployed a 4.6 cluster with dev-scripts. Keepalived was set to work in unicast mode and the VIPs were set correctly.

Post deployment, a Keepalived mode change to multicast was triggered by creating /etc/keepalived/monitor-user.conf on all nodes.

Although Keepalived appeared to switch to multicast mode successfully on all nodes, I wasn't able to reach the API-VIP from the provisioning host.

The ARP table on the provisioning host suggests that the API-VIP was resolved to the wrong MAC address.
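
One way to confirm this kind of mismatch is to compare the MAC address that the provisioning host has cached for the VIP with the MAC of the node that is expected to hold it. The sketch below parses text in the format printed by `ip neigh show`. The helper name `mac_for_ip`, the interface name, and every address in the sample are made up for illustration and are not taken from this environment.

```python
from typing import Optional


def mac_for_ip(neigh_output: str, ip: str) -> Optional[str]:
    """Return the link-layer address cached for `ip`, or None if absent.

    `neigh_output` is expected to look like the output of `ip neigh show`,
    e.g. "192.168.111.5 dev baremetal lladdr 52:54:00:aa:bb:01 REACHABLE".
    """
    for line in neigh_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != ip:
            continue
        # Entries in FAILED/INCOMPLETE state carry no "lladdr" token.
        if "lladdr" in fields:
            idx = fields.index("lladdr")
            if idx + 1 < len(fields):
                return fields[idx + 1].lower()
    return None


# Hypothetical sample data, not captured from the affected cluster.
SAMPLE = """\
192.168.111.5 dev baremetal lladdr 52:54:00:aa:bb:01 REACHABLE
192.168.111.20 dev baremetal lladdr 52:54:00:aa:bb:20 STALE
192.168.111.30 dev baremetal FAILED
"""

API_VIP = "192.168.111.5"            # assumed VIP
EXPECTED_MAC = "52:54:00:aa:bb:20"   # assumed MAC of the master holding the VIP

cached = mac_for_ip(SAMPLE, API_VIP)
if cached != EXPECTED_MAC:
    print(f"API-VIP {API_VIP} resolves to {cached}, expected {EXPECTED_MAC}")
```

On a live host the same function could be fed the real neighbour table instead of `SAMPLE`.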

Comment 2 Yossi Boaron 2020-08-23 09:33:23 UTC
I think I understand the root cause of this issue: I deployed my dev-scripts environment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true.


So we have the following case:
1. After the bootstrap-complete phase, keepalived-monitor on the bootstrap node continues to read node details from the API and keep keepalived.conf up to date.
 The kubeconfig file on the bootstrap node points to https://localhost:6443

2. So, after the bootstrap-complete phase, keepalived-monitor fails to read node details (the localhost API is disabled), and as a result a keepalived.conf without the masters as peers is rendered.

3. After rebooting the masters or removing the keepalived-monitor container on the master nodes, they fail to retrieve the bootstrap IP address (in the current implementation), so keepalived.conf on the masters is rendered without the bootstrap IP.


At this point, we have two separate keepalived domains for the API VIP.
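
The split described above can be illustrated with a small model. This is only a sketch under a simplifying assumption: two nodes are treated as belonging to the same keepalived domain if at least one of them lists the other as a unicast peer. The node names and the function `vrrp_domains` are hypothetical and do not come from the machine-config-operator code.

```python
from typing import Dict, List, Set


def vrrp_domains(peers: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group nodes into sets that can exchange VRRP advertisements.

    `peers` maps each node to the peers rendered in its keepalived.conf.
    Simplifying assumption: an edge exists if either side lists the other.
    """
    adjacency: Dict[str, Set[str]] = {node: set() for node in peers}
    for node, targets in peers.items():
        for target in targets:
            if target in adjacency:
                adjacency[node].add(target)
                adjacency[target].add(node)

    seen: Set[str] = set()
    domains: List[Set[str]] = []
    for node in sorted(adjacency):
        if node in seen:
            continue
        # Depth-first walk to collect everything reachable from `node`.
        group: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        domains.append(group)
    return domains


# State described in this comment: the bootstrap rendered no masters as
# peers, and the masters rendered their configs without the bootstrap.
broken = {
    "bootstrap": set(),
    "master-0": {"master-1", "master-2"},
    "master-1": {"master-0", "master-2"},
    "master-2": {"master-0", "master-1"},
}

# A consistent peer list, for comparison.
consistent = {
    "bootstrap": {"master-0", "master-1", "master-2"},
    "master-0": {"bootstrap", "master-1", "master-2"},
    "master-1": {"bootstrap", "master-0", "master-2"},
    "master-2": {"bootstrap", "master-0", "master-1"},
}

print(len(vrrp_domains(broken)), len(vrrp_domains(consistent)))
```

With the `broken` peer lists the bootstrap node forms a domain of its own, so it and one of the masters can each claim the API VIP at the same time.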

Comment 3 Yu Qi Zhang 2020-09-24 14:22:10 UTC
Since https://github.com/openshift/machine-config-operator/pull/2107 has merged, I think this is good to move to the MODIFIED state. The PR doesn't explicitly reference this BZ.

Comment 6 Nataf Sharabi 2020-10-14 10:38:49 UTC
Verified on:

Client Version: 4.6.0-0.nightly-2020-10-03-051134
Server Version: 4.6.0-rc.3
Kubernetes Version: v1.19.0+d59ce34


To verify:

1. Run a deployment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true
2. Post-deployment: verify that keepalived.conf is configured properly on all nodes
3. Reboot the master node that holds the API-VIP
4. Verify that the API-VIP moved to another master and that the oc command still works
5. Wait 1-2 minutes
6. Repeat steps 3-5 three or four more times
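
For step 2, a rough way to check a rendered config is to extract its `unicast_peer` block and compare it with the peers you expect. The sketch below assumes the node is running in unicast mode. The sample config is hand-written with hypothetical addresses and an assumed interface name, and which peers count as "expected" is up to whoever runs the check.

```python
import re
from typing import Set


def unicast_peers(conf_text: str) -> Set[str]:
    """Collect the addresses listed in every unicast_peer { ... } block."""
    peers: Set[str] = set()
    for block in re.findall(r"unicast_peer\s*\{([^}]*)\}", conf_text):
        for line in block.splitlines():
            # Drop trailing comments; keepalived accepts '#' and '!'.
            entry = re.split(r"[#!]", line, maxsplit=1)[0].strip()
            if entry:
                peers.add(entry.split()[0])
    return peers


def missing_peers(conf_text: str, expected: Set[str]) -> Set[str]:
    """Return the expected peers that the config does not list."""
    return expected - unicast_peers(conf_text)


# Hand-written example; addresses and interface name are hypothetical.
SAMPLE_CONF = """\
vrrp_instance sample_API {
    state BACKUP
    interface ens3
    virtual_router_id 51
    priority 40
    unicast_src_ip 192.168.111.20
    unicast_peer {
        192.168.111.21
        192.168.111.22  # another master
    }
    virtual_ipaddress {
        192.168.111.5/32
    }
}
"""

expected = {"192.168.111.10", "192.168.111.21", "192.168.111.22"}
print(sorted(missing_peers(SAMPLE_CONF, expected)))
```

An empty result means every expected peer is present in the file. A non-empty result names the peers that the rendered config left out.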

Comment 8 errata-xmlrpc 2020-10-27 16:24:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196