Bug 1981465
Summary: | Assisted installer wait for ready nodes on bootstrap kube-apiserver though it moved to one of the other masters | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Fred Rolland <frolland> |
Component: | assisted-installer | Assignee: | Eran Cohen <ercohen> |
assisted-installer sub component: | Installer | QA Contact: | Udi Kalifon <ukalifon> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aos-bugs, ercohen, ohochman |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | OCP-Metal-v1.0.24.1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:38:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Fred Rolland
2021-07-12 15:48:55 UTC
Can this be reproduced? How should we test this? In which cases does kube-apiserver move to a new master? Maybe @ercohen can provide steps to reproduce.

This can be reproduced by delaying one of the master nodes from reaching Ready status (in Kubernetes) after it reboots. I think the best way to reproduce it is to kill the CNI pods on one of the master nodes until kube-apiserver moves from the bootstrap node to another master. This allows the installation to progress while keeping that node in NotReady status.

It might also reproduce with a simpler flow (stop kubelet, disconnect the node's network, stop the node, etc.), but I am unsure how that would affect the kube-apiserver transition to the other master.

Verified. We tested by stopping the kubelet service on one of the non-bootstrap masters right after it rebooted. The logs show that the kube-apiserver did not get disconnected, and if the problem is fixed within 1 hour, the installation succeeds.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
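The report does not describe how the fix is implemented. As a rough illustration of the check named in the summary (waiting for master nodes to become Ready), here is a minimal Python sketch. It assumes the Kubernetes Node JSON shape (`status.conditions` with a `Ready` entry) and the `node-role.kubernetes.io/master` label; every function name is hypothetical and none of it is assisted-installer code. `list_nodes` stands in for whatever client call queries the kube-apiserver that is currently serving. Presumably that is the crux of this bug: once the control plane moves off the bootstrap node, a wait loop pinned to the bootstrap endpoint can no longer observe the masters. The `timeout_s` value is also an assumption; the roughly one-hour window mentioned in the verification is used below only as an illustrative value.

```python
# Hypothetical sketch, not assisted-installer code: wait until enough
# master nodes report Ready, polling whichever API server list_nodes uses.
import time
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]

# Role label carried by control-plane nodes in OCP 4.x (assumed here).
MASTER_LABEL = "node-role.kubernetes.io/master"


def make_node(name: str, ready: str, master: bool = True) -> Node:
    """Build a minimal Node-shaped dict for illustration and testing."""
    labels = {MASTER_LABEL: ""} if master else {}
    return {
        "metadata": {"name": name, "labels": labels},
        "status": {"conditions": [{"type": "Ready", "status": ready}]},
    }


def is_master(node: Node) -> bool:
    return MASTER_LABEL in node.get("metadata", {}).get("labels", {})


def is_ready(node: Node) -> bool:
    # Only a Ready condition with status "True" counts. A stopped kubelet
    # usually leaves it "Unknown", which `oc get nodes` shows as NotReady.
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def ready_masters(nodes: List[Node]) -> List[str]:
    return [n["metadata"]["name"] for n in nodes if is_master(n) and is_ready(n)]


def wait_for_ready_masters(
    list_nodes: Callable[[], List[Node]],
    expected: int,
    timeout_s: float,
    poll_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll list_nodes() until `expected` masters are Ready or timeout_s passes.

    list_nodes is a hypothetical callable; the assumption is that it targets
    the kube-apiserver currently serving the cluster, not a fixed bootstrap
    endpoint that may stop answering once the control plane moves.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            nodes = list_nodes()
        except ConnectionError:
            nodes = []  # endpoint unreachable this round; retry after poll_s
        if len(ready_masters(nodes)) >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

For example, `ready_masters([make_node("master-0", "True"), make_node("master-1", "Unknown")])` returns `["master-0"]`, matching the reproduction where one master's kubelet is stopped. Injecting `clock` and `sleep` keeps the wait loop testable without a real cluster.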