Bug 1884420
| Summary: | Keepalived stops on bootstrap too early | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ben Nemec <bnemec> |
| Component: | Networking | Assignee: | Antoni Segura Puimedon <asegurap> |
| Networking sub component: | runtime-cfg | QA Contact: | Victor Voronkov <vvoronko> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | bperkins, yboaron |
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
To keep the API VIP on the bootstrap node until the API comes up on the masters, we increased the priority of the bootstrap node's keepalived membership for the API VIP. So that the VIP can still move to the masters when the bootstrap node is kept around after clustering (when its API server is already gone), we implemented a mechanism in the monitor that stops keepalived. The problem was that sometimes, during clustering, the API on the bootstrap node could go down for long enough that it looked as if it would not come back up.
Consequence:
If the bootstrap kube-apiserver goes down for long enough to trigger the keepalived-monitor to stop keepalived, the deployment breaks.
Fix:
Continue to check for the API server on the bootstrap node, and reload keepalived if it shows up again. If it is gone for good, the API VIP moves to one of the masters, but if it only went down for a while because of API pod restarts and resource issues, we reload keepalived and reclaim the API VIP.
Result:
Deployment succeeds.
|
| Story Points: | --- | ||
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:47:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
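The behaviour described under Fix in the Doc Text can be illustrated with a small state machine. This is only a sketch under stated assumptions, not the actual keepalived-monitor code: the class name `BootstrapMonitor`, the `down_threshold` parameter, and the `"stop"`/`"reload"` action strings are all hypothetical, and the real monitor's timing and health checks are not described in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootstrapMonitor:
    # Hypothetical threshold: consecutive failed API checks before keepalived is stopped.
    down_threshold: int = 3
    failed_checks: int = 0
    keepalived_running: bool = True

    def observe(self, api_up: bool) -> Optional[str]:
        """Record one API health-check result and return the action to take, if any."""
        if api_up:
            self.failed_checks = 0
            if not self.keepalived_running:
                # The API came back after we gave up on it: reload and reclaim the VIP.
                self.keepalived_running = True
                return "reload"
            return None
        self.failed_checks += 1
        if self.keepalived_running and self.failed_checks >= self.down_threshold:
            # The API looks gone: stop keepalived so the VIP can move to a master.
            self.keepalived_running = False
            return "stop"
        # Keep checking even after stopping; this is the part the fix adds.
        return None


# A temporary outage: the API drops for three checks, then returns.
monitor = BootstrapMonitor(down_threshold=2)
actions = [monitor.observe(up) for up in [True, False, False, False, True, True]]
print(actions)
```

In this sketch a temporary outage produces one `"stop"` followed by one `"reload"`, while an API server that never returns produces only the `"stop"`, which leaves the VIP free to move to a master.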
Description
Ben Nemec
2020-10-01 22:34:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196