Bug 1866265 - [baremetal] API-VIP sometimes not accessible when cluster deployed with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Yossi Boaron
QA Contact: Nataf Sharabi
Depends On:
Reported: 2020-08-05 09:07 UTC by Yossi Boaron
Modified: 2020-10-27 16:25 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-10-27 16:24:53 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
Github openshift baremetal-runtimecfg pull 92 0 None closed Bug 1866265: Stop Keepalived on bootstrap after bootstrap completed 2021-01-27 08:12:59 UTC
Github openshift machine-config-operator pull 2107 0 None closed Bug 1871769: [baremetal] keep API VIP in the bootstrap node until the bootstrap’s node API goes away 2021-01-27 08:12:59 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:25:07 UTC

Description Yossi Boaron 2020-08-05 09:07:02 UTC
I deployed a 4.6 cluster with dev-scripts. Keepalived was set to work in unicast mode and the VIPs were set correctly.

After deployment, I triggered a Keepalived mode change to multicast by creating /etc/keepalived/monitor-user.conf on all nodes.

Although Keepalived appeared to switch to multicast on all nodes successfully, I was not able to reach the API-VIP from the provisioning host.

According to the ARP table on the provisioning host, the API-VIP resolved to the wrong MAC address.
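
The ARP check described above can be sketched as a small helper. This is only an illustration, assuming the neighbour table is captured as text from `ip neigh show` on the provisioning host; the function names, IP addresses, and MAC addresses are hypothetical and do not come from the affected environment.

```python
def mac_for_ip(neigh_output, ip):
    """Return the MAC that `ip neigh show` output maps to `ip`, or None."""
    for line in neigh_output.splitlines():
        fields = line.split()
        # Typical line: "<ip> dev <iface> lladdr <mac> <state>".
        # Entries in FAILED/INCOMPLETE state carry no lladdr field.
        if fields and fields[0] == ip and "lladdr" in fields:
            idx = fields.index("lladdr") + 1
            if idx < len(fields):
                return fields[idx].lower()
    return None


def vip_owner(neigh_output, vip, node_macs):
    """Name the node whose MAC the VIP currently resolves to.

    node_macs is a hypothetical {node_name: mac} inventory.
    Returns None if the VIP has no usable neighbour entry, and
    "unknown" if the MAC does not belong to any listed node.
    """
    mac = mac_for_ip(neigh_output, vip)
    if mac is None:
        return None
    for name, node_mac in node_macs.items():
        if node_mac.lower() == mac:
            return name
    return "unknown"
```

If `vip_owner` names the bootstrap node after bootstrap has completed, the provisioning host is still sending API traffic to a node whose API is no longer served.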

Comment 2 Yossi Boaron 2020-08-23 09:33:23 UTC
I think I understand the root cause of this issue. I deployed my dev-scripts environment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true.

So we have the following case:
1. After the bootstrap-complete phase, keepalived-monitor on the bootstrap node continues to read node details from the API in order to keep keepalived.conf up to date.
 The kubeconfig file on the bootstrap node points to https://localhost:6443.

 2. Once bootstrap is complete, the localhost API is disabled, so keepalived-monitor fails to read the node details. As a result, a keepalived.conf without the masters as peers is rendered on the bootstrap node.

 3. After the masters are rebooted, or the keepalived-monitor container is removed on the master nodes, the masters fail to retrieve the bootstrap IP address (in the current implementation), so keepalived.conf on the masters is rendered without the bootstrap IP.

At this point, we have two separate Keepalived domains for the API VIP.
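
The split described above can be illustrated with a simplified model. The sketch below is not how keepalived-monitor works; it assumes each node's rendered keepalived.conf is reduced to a plain list of peers, and treats two nodes as belonging to the same domain if either one lists the other. The node names and the `keepalived_domains` function are hypothetical.

```python
def keepalived_domains(peers):
    """Group nodes into domains based on their configured peer lists.

    peers maps a node name to the list of peers in its rendered config.
    Two nodes share a domain if they are linked, directly or through
    other nodes, by at least one peer entry in either direction.
    """
    # Build an undirected adjacency map from the peer lists.
    adjacency = {node: set() for node in peers}
    for node, peer_list in peers.items():
        for peer in peer_list:
            adjacency[node].add(peer)
            adjacency.setdefault(peer, set()).add(node)

    # Collect connected components with a depth-first walk.
    seen = set()
    domains = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        domains.append(sorted(component))
    return domains


# State after steps 2 and 3: the bootstrap node lists no masters,
# and the masters do not list the bootstrap node.
broken = {
    "bootstrap": [],
    "master-0": ["master-1", "master-2"],
    "master-1": ["master-0", "master-2"],
    "master-2": ["master-0", "master-1"],
}
print(keepalived_domains(broken))
```

With more than one domain, each domain can elect its own holder of the API VIP, which is consistent with the VIP resolving to an unexpected MAC address.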

Comment 3 Yu Qi Zhang 2020-09-24 14:22:10 UTC
Since https://github.com/openshift/machine-config-operator/pull/2107 has merged, I think this is good to move to the MODIFIED state. The PR doesn't explicitly reference this BZ.

Comment 6 Nataf Sharabi 2020-10-14 10:38:49 UTC
Verified on:

Client Version: 4.6.0-0.nightly-2020-10-03-051134
Server Version: 4.6.0-rc.3
Kubernetes Version: v1.19.0+d59ce34

To verify:

2. Post-deployment: verify that keepalived.conf is configured properly on all nodes
3. Reboot the master node that holds the API-VIP
4. Verify that the API-VIP moved to another master and that oc commands still work
5. Wait 1-2 minutes
6. Repeat steps 3-5 three or four more times
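
The "which node holds the API-VIP" check in the steps above could be scripted along these lines. This is a rough sketch, assuming the output of `ip -o -4 addr show` has already been collected from each node by some other means; the function names, node names, and addresses are hypothetical.

```python
def ipv4_addresses(ip_addr_output):
    """Extract IPv4 addresses from `ip -o -4 addr show` output."""
    found = set()
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # The address follows the "inet" token as "<address>/<prefix>".
        for i, token in enumerate(fields[:-1]):
            if token == "inet":
                found.add(fields[i + 1].split("/")[0])
    return found


def vip_holders(addr_outputs, vip):
    """Return the sorted names of all nodes that have `vip` configured.

    addr_outputs maps a node name to that node's captured output.
    """
    return sorted(
        node for node, output in addr_outputs.items()
        if vip in ipv4_addresses(output)
    )


def vip_is_healthy(addr_outputs, vip):
    """True when exactly one node holds the VIP."""
    return len(vip_holders(addr_outputs, vip)) == 1
```

Running `vip_holders` before and after each reboot shows whether the VIP moved to a different master, and more than one holder at the same time would point to the split described in comment 2.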

Comment 8 errata-xmlrpc 2020-10-27 16:24:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

