Bug 2063831
Summary: | etcd quorum pods landing on same node | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> | |
Component: | Etcd | Assignee: | Haseeb Tariq <htariq> | |
Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.11 | CC: | htariq, kenzhang, ngirard, wking | |
Target Milestone: | --- | |||
Target Release: | 4.11.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2070783 | Environment: |
openshift-tests-upgrade.[sig-scheduling][Early] The openshift-etcd pods should be scheduled on different nodes [Suite:openshift/conformance/parallel]
|
|
Last Closed: | 2022-08-10 10:54:06 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2070783 |
Description
Devan Goodwin
2022-03-14 13:36:26 UTC
TRT is double-checking the results to make absolutely sure the test is catching something real.

I confirmed that for both the HAProxy and etcd cases, the test is catching real problems. There is a bug with image-registry that is being fixed.

Working on an update to replace the etcd-operator's quorum guard controller with the staticpod quorum guard controller. This would also include a new readyz server sidecar on the etcd pods so that the guard controller can check for pod readiness.

*** Bug 2065454 has been marked as a duplicate of this bug. ***

Verified with 4.11.0-0.nightly-2022-04-20-045714. The quorum guard controller has been updated; I suppose it should resolve this problem:

sh-4.4# crictl ps|grep etcd
cf3e865a0bcb2 d6eace900ed8aa9f2bb76d7f34981a34bf0cad1ee69ff3b05fd9b408d4645349 13 minutes ago Running etcd-readyz 0 6bead9a291bc8

Haseeb, would you expect this to NEVER happen now? It looks like it has improved somewhat, but it is still happening:
https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=openshift-tests-upgrade.[sig-scheduling][Early]%20The%20openshift-etcd%20pods%20should%20be%20scheduled%20on%20different%20nodes%20[Suite:openshift/conformance/parallel]
Current week pass rate: 97.83%
Prev week pass rate: 96.85%

Most of the hits seem to be coming from 4.11 jobs upgraded from 4.10:
https://sippy.ci.openshift.org/sippy-ng/jobs/4.11/runs?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22failed_test_names%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22openshift-tests-upgrade.%5Bsig-scheduling%5D%5BEarly%5D%20The%20openshift-etcd%20pods%20should%20be%20scheduled%20on%20different%20nodes%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D%22%7D%5D%7D&sortField=timestamp&sort=desc

*** Bug 2027744 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.
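A minimal sketch, in Python, of the invariant that the "[sig-scheduling][Early] The openshift-etcd pods should be scheduled on different nodes" test guards: no node should host more than one of the pods that are meant to map one-to-one to etcd members. This is not the openshift-tests implementation. The function names and the input shape (a list of (pod, node) pairs, which could be collected from the NODE column of `oc get pods -n openshift-etcd -o wide`) are hypothetical. The second helper shows why co-location matters. With n members, etcd needs floor(n/2) + 1 of them for quorum. If three members are spread over only two nodes, losing the node that hosts two of them leaves one, which is below the quorum of two.

```python
from collections import defaultdict


def colocated_pods(assignments):
    """Return {node: sorted pod names} for every node hosting more than one pod.

    `assignments` is an iterable of (pod_name, node_name) pairs
    (hypothetical input shape, not an openshift-tests API).
    """
    by_node = defaultdict(list)
    for pod, node in assignments:
        by_node[node].append(pod)
    # Only nodes with two or more pods violate the "different nodes" invariant.
    return {node: sorted(pods) for node, pods in by_node.items() if len(pods) > 1}


def survives_single_node_loss(assignments):
    """True if losing any single node still leaves a majority (quorum) of pods.

    Uses the etcd majority rule: quorum = floor(n / 2) + 1 for n members.
    """
    pods_per_node = defaultdict(int)
    total = 0
    for _pod, node in assignments:
        pods_per_node[node] += 1
        total += 1
    quorum = total // 2 + 1
    # Worst case: the node hosting the most pods goes down (e.g. drained during an upgrade).
    worst_loss = max(pods_per_node.values(), default=0)
    return total - worst_loss >= quorum
```

For example, with the hypothetical placement bad = [("guard-a", "master-0"), ("guard-b", "master-0"), ("guard-c", "master-2")], colocated_pods(bad) returns {"master-0": ["guard-a", "guard-b"]} and survives_single_node_loss(bad) returns False, whereas one pod per master node passes both checks.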