TRT recently added a test to monitor for this, and it exposed that etcd quorum pods are actually landing on the same node for periods of time:
https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=openshift-tests-upgrade.[sig-scheduling][Early]%20The%20openshift-etcd%20pods%20should%20be%20scheduled%20on%20different%20nodes%20[Suite:openshift/conformance/parallel]
Sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1503258288765014016
This seems to be happening alarmingly often:
https://search.ci.openshift.org/?search=The+openshift-etcd+pods+should+be+scheduled+on+different+nodes&maxAge=48h&context=0&type=junit&name=4.11&excludeName=quorum&maxMatches=5&maxBytes=20971520&groupBy=job
Marking severity high, as this has the potential to cause loss of quorum. Backporting to 4.10 should probably be discussed. Jan Chaloupka did some work to allow force-assigning PDB pods to nodes instead of relying on the scheduler; it may be a good idea to make use of this for etcd.
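For illustration, the kind of check the test name describes (pods "should be scheduled on different nodes") can be sketched as below. This is a minimal sketch, not the actual openshift-tests implementation: the function name, the pod names, and the node names are all hypothetical, and the input is assumed to be a simple pod-to-node mapping already collected from the cluster.

```python
from collections import defaultdict


def nodes_with_colocated_pods(pod_to_node):
    """Return {node: sorted pod names} for every node hosting more than one pod."""
    pods_by_node = defaultdict(list)
    for pod, node in pod_to_node.items():
        pods_by_node[node].append(pod)
    # Only nodes that host two or more of the pods violate the spread requirement.
    return {
        node: sorted(pods)
        for node, pods in pods_by_node.items()
        if len(pods) > 1
    }


# Hypothetical placement (not taken from the sample job): two pods share master-0.
placement = {
    "quorum-guard-a": "master-0",
    "quorum-guard-b": "master-0",
    "quorum-guard-c": "master-2",
}
print(nodes_with_colocated_pods(placement))
# prints {'master-0': ['quorum-guard-a', 'quorum-guard-b']}
```

An empty result means every pod is on its own node; any entry in the result is a node whose loss would take down more than one of the pods at once.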
TRT is double-checking the results to make absolutely sure the test is catching something real.
I confirmed that for both the HAProxy and etcd cases, the test is catching real problems. There is a bug with image-registry that is being fixed.
Working on an update to replace the etcd-operator's quorum guard controller with the staticpod quorum guard controller. This would also include a new readyz server sidecar on the etcd pods so that the guard controller can check pod readiness.
*** Bug 2065454 has been marked as a duplicate of this bug. ***
Verified with 4.11.0-0.nightly-2022-04-20-045714. The quorum guard controller has been updated, which I suppose should resolve this problem:
sh-4.4# crictl ps|grep etcd
cf3e865a0bcb2 d6eace900ed8aa9f2bb76d7f34981a34bf0cad1ee69ff3b05fd9b408d4645349 13 minutes ago Running etcd-readyz 0 6bead9a291bc8
Haseeb, would you expect this to NEVER happen now? It looks like it has improved somewhat, but it is still happening:
https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=openshift-tests-upgrade.[sig-scheduling][Early]%20The%20openshift-etcd%20pods%20should%20be%20scheduled%20on%20different%20nodes%20[Suite:openshift/conformance/parallel]
Current week pass rate: 97.83%
Prev week pass rate: 96.85%
Most of the hits seem to be coming from jobs that upgrade from 4.10 to 4.11:
https://sippy.ci.openshift.org/sippy-ng/jobs/4.11/runs?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22failed_test_names%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22openshift-tests-upgrade.%5Bsig-scheduling%5D%5BEarly%5D%20The%20openshift-etcd%20pods%20should%20be%20scheduled%20on%20different%20nodes%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D%22%7D%5D%7D&sortField=timestamp&sort=desc
*** Bug 2027744 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days