Bug 1872802 - 5 of the last 5 promotion jobs failed on a node not draining
Summary: 5 of the last 5 promotion jobs failed on a node not draining
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Sam Batschelet
QA Contact: ge liu
Depends On:
Reported: 2020-08-26 16:17 UTC by David Eads
Modified: 2020-09-01 11:17 UTC



System ID Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 429 None closed bug 1872802: remove old MCO etcd quorum guard resources 2020-08-31 01:37:53 UTC

Description David Eads 2020-08-26 16:17:43 UTC
The two failures that I looked at appear related to the etcd quorum guard pod. The diagnosis looks like this:
1. Inspect co/machine-config. It shows that the pool didn't upgrade and reports an "error when evicting pod" problem that names a quorum guard pod.
2. That quorum guard pod is for etcd, and in both failures I looked at it was in ns/openshift-machine-config-operator.
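
The two diagnosis steps above could be scripted roughly as follows. This is only a sketch, assuming the co/machine-config object has been dumped as JSON (for example with `oc get co/machine-config -o json`) and that the eviction problem appears in the `message` of one of its `status.conditions` entries. The function names and the `quorum-guard` substring check are illustrative assumptions, not taken from the CI tooling.

```python
from typing import List

# Text quoted in the bug description; assumed to appear verbatim in a condition message.
EVICTION_MARKER = "error when evicting pod"


def find_eviction_errors(cluster_operator: dict) -> List[str]:
    """Return the condition messages that mention a pod eviction error."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    return [
        cond.get("message", "")
        for cond in conditions
        if EVICTION_MARKER in cond.get("message", "")
    ]


def mentions_quorum_guard(message: str) -> bool:
    """Heuristic (assumption): the pod name in the message contains 'quorum-guard'."""
    return "quorum-guard" in message
```

To use it, parse the dumped object with `json.load` and pass the resulting dict to `find_eviction_errors`. Any returned message for which `mentions_quorum_guard` is true points at the quorum guard pod described in step 2.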

I haven't dug further.

Comment 1 David Eads 2020-08-26 16:18:28 UTC
From https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.6.0-0.ci, find a job where the upgrade column is red. Click the job, then click the failed upgrade.

Comment 2 Sam Batschelet 2020-08-27 07:08:01 UTC
After https://github.com/openshift/cluster-etcd-operator/pull/429, the minor upgrade appears to be passing regularly.

Comment 3 ge liu 2020-09-01 11:11:31 UTC
Verified by checking the CI job results.
