Bug 2027744
Summary: | etcd operator QuorumGuardController reads stored install-config.yaml which doesn't match system's current state | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Neil Girard <ngirard> |
Component: | Etcd | Assignee: | melbeher |
Status: | CLOSED DUPLICATE | QA Contact: | ge liu <geliu> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.8 | CC: | alray, dwest, htariq, melbeher, tjungblu |
Target Milestone: | --- | Flags: | alray: needinfo-
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-04-27 10:02:34 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Neil Girard
2021-11-30 14:49:02 UTC
@ngirard I would need to know the command you used to scale down the master nodes. I have been through the must-gather and I saw that the quorum-guard controller has deleted `cvo-managed etcd quorum guard`. Looking at the QuorumGuardController here (https://github.com/openshift/cluster-etcd-operator/blob/6b8418244509206ebfd362a2eaf0a4dc91239135/pkg/operator/quorumguardcontroller/quorumguardcontroller.go#L53), it sets the PodDisruptionBudget to MaxUnavailable(1). The assumption is that we have 3 master nodes and 3 etcd pods; to keep quorum in that case the minimum is two etcd pods, which is why the PodDisruptionBudget is MaxUnavailable(1). Another way to put it is to change it to MinAvailable(2) to keep the quorum guard. I think this could fix this issue and future issues with clusters that have more than 3 master nodes. cc @dwest @htariq @alray @ngirard

Sorry, somehow this bz hasn't been in my feeds. The customer originally installed 4 masters. Support asked them to remove one (not sure of the actual process). After a while, the customer attempted an upgrade and hit this situation where the operator wanted 4 quorum-guard pods. I'll see if I can find the case where they scaled the master count down (I'm assuming they deleted the node and cleaned up the etcd member list manually).

Removing myself from "needinfo". Probably a wrong manipulation from @melbeher
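The quorum arithmetic behind both proposals can be sketched as follows. This is an illustrative calculation only, not code from cluster-etcd-operator, and both helper names are hypothetical; it shows what a member-count-aware PodDisruptionBudget would compute instead of hard-coding either constant:

```python
def etcd_quorum(members: int) -> int:
    """Smallest number of healthy members that still forms an etcd quorum."""
    return members // 2 + 1


def pdb_max_unavailable(members: int) -> int:
    """Hypothetical helper: how many etcd pods a PodDisruptionBudget could
    allow to be down without risking quorum loss."""
    return members - etcd_quorum(members)


# 3 masters: quorum is 2, so MaxUnavailable(1) -- the current hard-coded value
print(etcd_quorum(3), pdb_max_unavailable(3))  # 2 1
# 4 masters (the customer's original layout): quorum is 3, still only 1 may be down
print(etcd_quorum(4), pdb_max_unavailable(4))  # 3 1
# 5 masters: quorum is 3, so 2 may be down
print(etcd_quorum(5), pdb_max_unavailable(5))  # 3 2
```

Note that for a 4- or 5-member cluster, quorum is 3, so a fixed MinAvailable(2) would still permit quorum loss; deriving the budget from the live member count would avoid both pitfalls.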