I deployed 4.6 with dev-scripts, with Keepalived set to work in unicast mode, and the VIPs were set correctly. Post deployment, a Keepalived mode change to multicast was triggered by creating /etc/keepalived/monitor-user.conf on all nodes. Although Keepalived appeared to switch to multicast on all nodes successfully, I wasn't able to reach the API-VIP from the provisioning host. The ARP table on the provisioning host suggests that the API-VIP was resolved to the wrong MAC address.
I think I understand the root cause of this issue. I deployed my dev-scripts environment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true, so we have the following case:
1. After the bootstrap-complete phase, keepalived-monitor on the bootstrap node continues to read node details from the API and keep keepalived.conf up to date. The kubeconfig file on the bootstrap node points to https://localhost:6443.
2. So, after the bootstrap-complete phase, keepalived-monitor fails to read node details (the localhost API is disabled), and as a result keepalived.conf is rendered without the masters as peers.
3. After the masters are rebooted, or the keepalived-monitor container is removed on the master nodes, the masters fail to retrieve the bootstrap IP address in the current implementation, so keepalived.conf on the masters is rendered without the bootstrap IP.
At this point, we have two separate Keepalived domains for the API-VIP.
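To make the split easier to see, here is a minimal sketch that models the rendered peer lists as a graph and groups the nodes into Keepalived domains. It is only an illustration: the node names, the peer sets, and the function `keepalived_domains` are hypothetical, and it assumes (as a simplification) that two nodes share a domain whenever at least one of them lists the other as a peer.

```python
from typing import Dict, List, Set


def keepalived_domains(peers: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group nodes into domains (connected components of the peer graph)."""
    # Simplifying assumption: a link exists if either side lists the other.
    adjacency: Dict[str, Set[str]] = {node: set() for node in peers}
    for node, listed in peers.items():
        for peer in listed:
            if peer in adjacency:
                adjacency[node].add(peer)
                adjacency[peer].add(node)

    domains: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component: Set[str] = set()
        stack = [start]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        domains.append(component)
    return domains


# Hypothetical rendering matching the case described above:
# the bootstrap has no masters as peers, and the masters lack the bootstrap.
rendered = {
    "bootstrap": set(),
    "master-0": {"master-1", "master-2"},
    "master-1": {"master-0", "master-2"},
    "master-2": {"master-0", "master-1"},
}
domains = keepalived_domains(rendered)
print(len(domains), "domains:", domains)
```

With these assumed peer lists the sketch finds two domains, the bootstrap on its own and the three masters together. If each domain elects its own holder of the API-VIP, that would be consistent with the provisioning host resolving the VIP to an unexpected MAC address.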
Since https://github.com/openshift/machine-config-operator/pull/2107 has merged, I think this is good to move to the MODIFIED state. The PR doesn't explicitly reference this BZ.
Verified on:
Client Version: 4.6.0-0.nightly-2020-10-03-051134
Server Version: 4.6.0-rc.3
Kubernetes Version: v1.19.0+d59ce34

To verify:
1. Run the deployment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true.
2. Post-deployment, verify that keepalived.conf is configured properly on all nodes.
3. Reboot the master node that holds the API-VIP.
4. Verify that the API-VIP moved to another master and that the oc command still works.
5. Wait 1-2 minutes.
6. Repeat steps 3-5 three or four more times.
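For the keepalived.conf check in the steps above, a small helper along these lines could compare the peers in a node's file against the addresses you expect. This is a rough sketch, not the procedure used for this verification: it assumes that "configured properly" means the `unicast_peer` block lists every expected peer, and the sample config, IP addresses, and function name `unicast_peers` are made up for illustration.

```python
import re
from typing import Set


def unicast_peers(conf_text: str) -> Set[str]:
    """Return the addresses listed in all unicast_peer { ... } blocks."""
    peers: Set[str] = set()
    for block in re.findall(r"unicast_peer\s*\{([^}]*)\}", conf_text):
        for line in block.splitlines():
            # Drop trailing comments ('#' or '!') and surrounding whitespace.
            entry = re.split(r"[#!]", line, maxsplit=1)[0].strip()
            if entry:
                peers.add(entry.split()[0])
    return peers


# Hypothetical keepalived.conf content as it might appear on one master.
sample_conf = """
vrrp_instance api_vip {
    state BACKUP
    interface ens3
    virtual_router_id 51
    unicast_src_ip 192.0.2.11
    unicast_peer {
        192.0.2.12
        192.0.2.13
    }
    virtual_ipaddress {
        192.0.2.5/24
    }
}
"""

# Assumed expectation: the other two masters plus the preserved bootstrap.
expected = {"192.0.2.10", "192.0.2.12", "192.0.2.13"}
found = unicast_peers(sample_conf)
missing = expected - found
print("missing peers:", sorted(missing))
```

In this made-up sample the bootstrap address 192.0.2.10 is reported as missing, which is the kind of gap described in the root-cause comment. On a real node you would read the file contents from /etc/keepalived/keepalived.conf instead of the inline string.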
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196