Bug 1824981
Summary: [4.3 upgrade][alert] Failed to install Operator packageserver version 0.13.0. Reason-ComponentUnhealthy

Field | Value
---|---
Product | OpenShift Container Platform
Component | OLM
OLM sub component | OLM
Status | CLOSED NOTABUG
Severity | medium
Priority | medium
Version | 4.3.0
Keywords | Upgrades
Target Release | 4.6.0
Hardware | Unspecified
OS | Unspecified
Reporter | Hongkai Liu <hongkliu>
Assignee | Evan Cordell <ecordell>
QA Contact | Jian Zhang <jiazha>
CC | agarcial, ccoleman, dsover, krizza, nhale, oarribas, wking
Doc Type | If docs needed, set a value
Last Closed | 2020-07-12 23:34:47 UTC
Type | Bug

Description
Hongkai Liu
2020-04-16 19:10:40 UTC
The version before upgrade: 4.3.0-0.nightly-2020-03-23-130439

In general, it would be much more comforting if:
1. no alerts would be fired if upgrade is considered successful.
2. the status of clusterversion shows a nicer message if upgrade is still in progress, instead of "Unable to apply 4.3.0-0.nightly-2020-04-13-190424: the cluster operator machine-config has not yet successfully rolled out"

In general, all components that can should not fire an alert on short-term disruption that is within safe bounds, especially during an upgrade. On 2, we should potentially tolerate that one longer.

(In reply to Hongkai Liu from comment #0)
> oc --context build01 adm upgrade --allow-explicit-upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-04-13-190424 --force=true

Largely unrelated, but just to get the word out in close proximity to anyone mentioning --force: you would be much safer using by-digest pullspecs for the reasons described in [1,2], both of which are in flight to land client-side guards/warnings around this.

[1]: https://github.com/openshift/oc/pull/390
[2]: https://github.com/openshift/oc/pull/238

(In reply to Hongkai Liu from comment #3)
> 1. no alerts would be fired if upgrade is considered successful.

[1] is in flight so we can enforce this, at least for update environments that we cover in CI.

[1]: https://github.com/openshift/origin/pull/24786

Thanks, Trevor. I will bug you before the next upgrade.
- the oc cli version
- how to get the sha for an upgrade and how to use it in the oc-adm-update command.

This is more than low severity. It caused a representative customer admin team to panic and assume our product was faulty. Updates should be zero-downtime.

If the root cause here is an API outage, then assign this bug to the API team or whoever is responsible for the API outage. Or use this bug to improve the condition's reason/message, because currently "Failed to install Operator packageserver version 0.13.0. Reason-ComponentUnhealthy" does not sound like "API outage" to me.

OLM should clearly explain why it's failing, so it's clear that another component is responsible for the degradation. It's not OLM's responsibility to raise timeouts to work around bugs in other components.
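
On the by-digest suggestion and the question above about how to get the sha for an upgrade, here is a minimal sketch of turning a by-tag pullspec into a by-digest one. `to_digest_pullspec` is a hypothetical helper written for this illustration and is not part of `oc`. The digest lookup shown in the trailing comments assumes that `oc adm release info -o json` reports a top-level `digest` field and that `jq` is installed; treat both as assumptions to check against your client version.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): build a by-digest pullspec.
#   $1: release pullspec, by tag or by digest
#   $2: digest, for example sha256:<hex>
to_digest_pullspec() {
  _repo="${1%@*}"                  # drop an existing @digest, if any
  case "${_repo##*/}" in           # inspect only the last path segment
    *:*) _repo="${_repo%:*}" ;;    # drop the :tag, keep any registry port
  esac
  printf '%s@%s\n' "$_repo" "$2"
}

# Assumed usage (needs registry and cluster access, so it is left commented out):
#   tag_pullspec=registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-04-13-190424
#   digest=$(oc adm release info "$tag_pullspec" -o json | jq -r .digest)
#   oc adm upgrade --allow-explicit-upgrade \
#     --to-image "$(to_digest_pullspec "$tag_pullspec" "$digest")"
```

The helper only rewrites the string. It does not verify that the digest belongs to the release the tag pointed at, so take the digest from a source you trust.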