Bug 1815179
Summary: | Upgrades from 4.3.5 failing since 2020-03-19: Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.4.0-0.ci-2020-03-19-140914: 83% complete | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> |
Component: | Etcd | Assignee: | Sam Batschelet <sbatsche> |
Status: | CLOSED DUPLICATE | QA Contact: | ge liu <geliu> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.4 | CC: | anarayan, sbatsche, sdodson, wking |
Target Milestone: | --- | Keywords: | Upgrades |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-03-20 23:29:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Devan Goodwin
2020-03-19 17:45:47 UTC
A similar error is showing up for upgrades from 4.4.0-rc2 to 4.5.0.

Prow link: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/22342

Error message: [Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] [Suite:openshift]

Not sure if they are related.

We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges.

Who is impacted?
  Customers upgrading from 4.2.99 to 4.3.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  All customers upgrading from 4.2.z to 4.3.z fail approximately 10% of the time

What is the impact?
  Up to 2 minute disruption in edge routing
  Up to 90 seconds of API downtime
  etcd loses quorum and you have to restore from backup

How involved is remediation?
  Issue resolves itself after five minutes
  Admin uses oc to fix things
  Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression?
  No, it’s always been like this, we just never noticed
  Yes, from 4.2.z and 4.3.1

*** This bug has been marked as a duplicate of bug 1815539 ***

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.