Description of problem:
The etcd operator determines the replica count of the etcd-quorum-guard deployment by reading the install-config.yaml stored in kube-system at cluster creation time, which may not match the cluster's current state. The customer was upgrading the cluster (not for the first time) from 4.7 to 4.8, and during the upgrade the master nodes were unable to drain because of the etcd-quorum-guard PDB. Looking at the project, there was a 4th pod stuck in Pending. We attempted to scale the deployment down to 3 (which would have worked with the right timing), but the operator kept resetting the count to 4.

Looking at the source code here (https://github.com/openshift/cluster-etcd-operator/blob/35672edef2c867e135b0e9378a00764b363d8ba5/pkg/operator/quorumguardcontroller/quorumguardcontroller.go#L292), the quorum guard controller reads the install-config from the cluster-config-v1 ConfigMap in kube-system. At the time the cluster was created, for whatever reason the customer had created it with 4 masters. Since then, they removed the fourth master and are currently running 3 masters / etcd instances. The etcd operator never picked up that change because it reads only that ConfigMap.

To fix this for the customer, we had to hand-modify the install-config.yaml in that ConfigMap to set the control-plane replica count to 3. Once we did that AND restarted the operator pod (the operator does not seem to refresh its desired replica count dynamically), the customer was finally able to proceed with the update, and the deployment stayed at a replica count of 3.

I am not sure whether reading this ConfigMap is the best way to determine that count, but I am opening this BZ as a point of discussion: should a different ConfigMap hold this value for the etcd operator after the initial install, or could the operator use the current etcd endpoints as the source of truth for how many quorum guard replicas there should be?

Version-Release number of selected component (if applicable): 4.8

How reproducible: Always, based on this scenario

Steps to Reproduce: N/A

Actual results: The quorum guard pod count does not match the number of etcd instances.

Expected results: The quorum guard pod count matches the number of etcd instances.

Additional info: The attached case has a must-gather that can be used to see all configs, along with an inspect of the kube-system, openshift-etcd and openshift-etcd-operator namespaces.
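For reference, here is a minimal Go sketch of the pattern described above: reading the install-config out of the kube-system/cluster-config-v1 ConfigMap and taking controlPlane.replicas as the desired quorum-guard count. The function and struct names are hypothetical and this is not the operator's actual code; it is only meant to illustrate why the value is frozen at install time and can drift from the cluster's real master count.

```go
package quorumguard

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"sigs.k8s.io/yaml"
)

// installConfig models only the field we care about from install-config.yaml.
type installConfig struct {
	ControlPlane struct {
		Replicas int32 `json:"replicas"`
	} `json:"controlPlane"`
}

// getExpectedReplicas (hypothetical name) reads the install-config stored in
// the kube-system/cluster-config-v1 ConfigMap and returns controlPlane.replicas.
// The value is whatever was written at install time, so it does not reflect
// masters added or removed after installation.
func getExpectedReplicas(ctx context.Context, client kubernetes.Interface) (int32, error) {
	cm, err := client.CoreV1().ConfigMaps("kube-system").Get(ctx, "cluster-config-v1", metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	var ic installConfig
	if err := yaml.Unmarshal([]byte(cm.Data["install-config"]), &ic); err != nil {
		return 0, fmt.Errorf("parsing install-config: %w", err)
	}
	return ic.ControlPlane.Replicas, nil
}
```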
@ngirard I would need to know the command you used to scale down the master nodes. I have been through the must-gather and saw that the quorum guard controller deleted the `cvo-managed etcd quorum guard`.
Looking at the QuorumGuardController here (https://github.com/openshift/cluster-etcd-operator/blob/6b8418244509206ebfd362a2eaf0a4dc91239135/pkg/operator/quorumguardcontroller/quorumguardcontroller.go#L53), it sets the PodDisruptionBudget to MaxUnavailable(1). The assumption is that we have 3 master nodes and 3 etcd pods; to keep quorum in that case, at least two etcd pods must stay up, which is why the PodDisruptionBudget uses MaxUnavailable(1). Another way to express this is MinAvailable(2), which still protects quorum. I think this could fix this issue, and future issues with clusters that have more than 3 master nodes.
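To make the suggestion concrete, here is a minimal sketch of a PodDisruptionBudget built with MinAvailable(2) instead of MaxUnavailable(1), using the upstream policy/v1 types. The function name, namespace, and label selector are assumptions for illustration only, not the controller's actual values.

```go
package quorumguard

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newQuorumGuardPDB (hypothetical name) builds the PDB with MinAvailable(2),
// i.e. the quorum requirement of a 3-member etcd cluster, instead of
// MaxUnavailable(1). Namespace and labels here are illustrative assumptions.
func newQuorumGuardPDB() *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(2)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "etcd-quorum-guard",
			Namespace: "openshift-etcd",
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"name": "etcd-quorum-guard"},
			},
		},
	}
}
```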
cc @dwest @htariq @alray @ngirard
Sorry, somehow this BZ hasn't been in my feeds. The customer originally installed 4 masters. Support asked them to remove one (not sure of the actual process). After a while, the customer attempted an upgrade and hit this situation where the operator wanted 4 quorum-guard pods. I'll see if I can find the case where they scaled the master count down (I'm assuming they deleted the node and cleaned up the etcd member list manually).
Removing myself from "needinfo". It was probably set by mistake by @melbeher.