Bug 1808143
Summary: | Nodes periodically going NotReady with multiple failed services | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | brad.williams |
Component: | Node | Assignee: | Ryan Phillips <rphillips> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aos-bugs, jeder, jokerman, lmohanty, mwoodson, nmalik, rphillips, sdodson, vrutkovs, wking |
Version: | 4.3.z | Keywords: | Upgrades |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-02 17:12:24 UTC | Type: | Bug |
Description
brad.williams
2020-02-27 22:54:58 UTC
To help with impact analysis we need to find answers to the following questions. It is fine if we do not answer some of these questions at this point of time, but we should try to get answers.

What symptoms (in Telemetry, Insights, etc.) does a cluster experiencing this bug exhibit?
What kind of clusters are impacted because of the bug?
What cluster functionality is degraded while hitting the bug?
Does the upgrade complete?
What is the expected rate of the failure (%) for vulnerable clusters which attempt the update?
What is the observed rate of failure we see in CI?
Can this bug cause data loss? Data loss = API server data loss or CRD state information loss etc.
Is it possible to recover the cluster from the bug?
Is recovery automatic without intervention? I.e. is the condition transient?
Is recovery possible with the only intervention being 'oc adm upgrade …' to a new release image with a fix?
Is there a manual workaround that exists to recover from the bug? What are manual steps?
How long before the bug is fixed?
Is this a regression? From which version does this regression exist?

This should be fixed with https://github.com/openshift/origin/pull/24611 and https://bugzilla.redhat.com/show_bug.cgi?id=1800319

*** This bug has been marked as a duplicate of bug 1808429 ***

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add keyword again.

[1]: https://github.com/openshift/enhancements/pull/475