Bug 1866265
Summary: | [baremetal] API-VIP sometimes not accessible when cluster deployed with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Yossi Boaron <yboaron> |
Component: | Machine Config Operator | Assignee: | Yossi Boaron <yboaron> |
Status: | CLOSED ERRATA | QA Contact: | Nataf Sharabi <nsharabi> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 4.6 | CC: | asegurap, bperkins, jerzhang |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:24:53 UTC | Type: | Bug |
Description
Yossi Boaron
2020-08-05 09:07:02 UTC
I think I understand the root cause of this issue. I deployed my dev-scripts environment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true, so we have the following case:

1. After the bootstrap-complete phase, keepalived-monitor on the bootstrap node continues to read node details from the API in order to keep keepalived.conf up to date. The kubeconfig file on the bootstrap node points to https://localhost:6443.
2. Once the bootstrap-complete phase has passed, the localhost API is disabled, so keepalived-monitor fails to read node details. As a result, a keepalived.conf without the masters as peers is rendered on the bootstrap node.
3. After the masters are rebooted, or the keepalived-monitor container on the master nodes is removed, the masters fail to retrieve the bootstrap IP address. In the current implementation, keepalived.conf on the masters is then rendered without the bootstrap IP.

At this point, we have two separate keepalived domains for the API-VIP.

Since https://github.com/openshift/machine-config-operator/pull/2107 has merged, I think this is good to move to the MODIFIED state. The PR doesn't explicitly reference this BZ.

Verified on:
Client Version: 4.6.0-0.nightly-2020-10-03-051134
Server Version: 4.6.0-rc.3
Kubernetes Version: v1.19.0+d59ce34

To verify:
1. Run a deployment with OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true.
2. Post-deployment, verify that keepalived.conf is configured properly on all nodes.
3. Reboot the master node that holds the API-VIP.
4. Verify that the API-VIP moved to another master and that oc commands still work.
5. Wait for 1-2 minutes.
6. Repeat steps 3-5 another 3-4 times.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196
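
As an illustration of the root cause described in this report, the following minimal Python sketch models how the rendered peer lists can split the nodes into separate keepalived domains. It is not the keepalived-monitor implementation: the function `keepalived_domains`, the node names, and the dictionary layout are all hypothetical, and the model simply assumes that two nodes belong to the same domain when at least one of them lists the other as a peer.

```python
from collections import defaultdict


def keepalived_domains(peer_lists):
    """Group nodes into domains based on their rendered peer lists.

    peer_lists maps a (hypothetical) node name to the peers that appear
    in its keepalived.conf. Nodes connected through any peer entry end
    up in the same domain.
    """
    # Build an undirected graph: an edge exists when either node lists the other.
    graph = defaultdict(set)
    for node, peers in peer_lists.items():
        graph[node]  # make sure nodes with an empty peer list still appear
        for peer in peers:
            graph[node].add(peer)
            graph[peer].add(node)

    # Collect connected components with a depth-first search.
    domains, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, domain = [start], set()
        while stack:
            current = stack.pop()
            if current in domain:
                continue
            domain.add(current)
            stack.extend(graph[current] - domain)
        seen |= domain
        domains.append(sorted(domain))
    return domains


# Expected state: every node lists every other node as a peer.
healthy = {
    "bootstrap": ["master-0", "master-1", "master-2"],
    "master-0": ["bootstrap", "master-1", "master-2"],
    "master-1": ["bootstrap", "master-0", "master-2"],
    "master-2": ["bootstrap", "master-0", "master-1"],
}

# State from this report: the bootstrap config has no masters as peers,
# and the masters' configs were re-rendered without the bootstrap IP.
broken = {
    "bootstrap": [],
    "master-0": ["master-1", "master-2"],
    "master-1": ["master-0", "master-2"],
    "master-2": ["master-0", "master-1"],
}

if __name__ == "__main__":
    print("healthy:", keepalived_domains(healthy))
    print("broken: ", keepalived_domains(broken))
```

In this model the healthy peer lists produce a single domain, while the broken ones produce two: the bootstrap node on its own and the three masters together. That corresponds to the situation in which more than one node can claim the API-VIP at the same time.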