Bug 1763295
| Field | Value |
|---|---|
| Summary | SyncingFailed and other waitForDeploymentRollout consumers often show only 'timed out waiting for the condition' |
| Product | OpenShift Container Platform |
| Component | Cloud Compute |
| Cloud Compute sub component | Other Providers |
| Reporter | W. Trevor King <wking> |
| Assignee | Alberto <agarcial> |
| QA Contact | Jianwei Hou <jhou> |
| CC | agarcial, jhou, lmohanty, mgugino, wking |
| Status | CLOSED WONTFIX |
| Severity | unspecified |
| Priority | unspecified |
| Version | 4.2.z |
| Target Release | 4.2.z |
| Keywords | Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Clone Of | 1763293 |
| Bug Depends On | 1763293, 1763772 |
| Last Closed | 2020-05-19 13:01:28 UTC |
Description
W. Trevor King
2019-10-18 17:15:15 UTC
Not sure why I was the assignee here.

Saw this in the upgrade tests below, which test upgrades to 4.2.30:
[1] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/669
[2] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/668
[3] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/667

The operator appeared to be working fine. The machine-controller container was in crash loop backoff in the replacement deployment. This is likely due to some misconfiguration in a build that got promoted at some point. Unfortunately, in none of the linked cases does the must-gather include any logs for the pods in crash loop backoff. I think this is a bug in must-gather. I have seen this before with another failed build (the semver was not parsable, which caused the controller to fail immediately on startup). I will open another bug for gathering failed container logs in must-gather.

> The operator appeared to be working fine.
It may be. The point of this bug is that "timed out waiting for the condition" is a garbage message. The operator should at least tell us what it was waiting for that timed out. And ideally, give some hints about the impact of the degraded condition and suggest some mitigation steps, although I don't think we need to be that good before we can close this bug.
(In reply to W. Trevor King from comment #4)
> > The operator appeared to be working fine.
>
> It may be. The point of this bug is that "timed out waiting for the condition" is a garbage message. The operator should at least tell us what it was waiting for that timed out. And ideally, give some hints about the impact of the degraded condition and suggest some mitigation steps, although I don't think we need to be that good before we can close this bug.

Okay, I think that is a good assessment. I changed the target release to 4.6; I think we should have a clearer message here as well.

This bug is a clone of https://bugzilla.redhat.com/show_bug.cgi?id=1763293, which was addressed by https://github.com/openshift/machine-api-operator/pull/417, which is included in >=4.3. Moving to 4.2 to backport.

Ok, if this is just a clone of something already fixed, we don't need to backport to 4.2 at this point. This is just cleaning up an error message for something that otherwise works.