Bug 1852916
| Summary: | [sig-apps][Feature:DeploymentConfig] deploymentconfigs adoption will orphan all RCs and adopt them back when recreated | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Corey Daley <cdaley> |
| Component: | openshift-controller-manager | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.5 | CC: | aos-bugs, jerzhang, jsafrane, mfojtik, pweil, wking |
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | workloads, LifecycleStale | ||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: |
[sig-apps][Feature:DeploymentConfig] deploymentconfigs adoption will orphan all RCs and adopt them back when recreated
[Feature:DeploymentConfig] deploymentconfigs adoption [Conformance] will orphan all RCs and adopt them back when recreated
|
|
| Last Closed: | 2020-10-27 16:11:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Corey Daley
2020-07-01 15:07:17 UTC
Linked 'launch' job was via cluster-bot, which you can see by clicking through from the job-detail page to the ProwJob YAML, which opens with:
metadata:
  annotations:
    ci-chat-bot.openshift.io/channel: ""
    ci-chat-bot.openshift.io/expires: "13500"
    ci-chat-bot.openshift.io/jobInputs: '[{"Image":"","Version":"4.5.0-0.latest","Refs":[{"org":"openshift","repo":"origin","base_ref":"master","base_sha":"5d8c7f115968f4e3cdc44e047be2585bec6f1e7e","pulls":[{"number":25217,"author":"system:serviceaccount:ci:ci-chat-bot","sha":"db81e9892f66b8c838f545e8051c53df5696d39e"}]},{"org":"openshift","repo":"oauth-server","base_ref":"master","base_sha":"9eee6c6eeaf48f5426dd501c0726f777eb850847","pulls":[{"number":50,"author":"system:serviceaccount:ci:ci-chat-bot","sha":"9de5c160e16fee7fddc29ea2423d083548be8d6b"}]}]}]'
    ci-chat-bot.openshift.io/jobParams: test=e2e
    ci-chat-bot.openshift.io/mode: test
    ci-chat-bot.openshift.io/ns: ci-ln-mbpkz5b
    ci-chat-bot.openshift.io/originalMessage: test e2e openshift/origin#25217,openshift/oauth-server#50
    ci-chat-bot.openshift.io/platform: aws
    ci-chat-bot.openshift.io/user: U9UTT7MAT
    prow.k8s.io/job: release-openshift-origin-installer-launch-aws
  creationTimestamp: "2020-07-01T11:29:18Z"
  generation: 3
  labels:
    ci-chat-bot.openshift.io/launch: "true"
    prow.k8s.io/build-id: ""
    prow.k8s.io/id: chat-bot-2020-07-01-112918.7695
    prow.k8s.io/job: release-openshift-origin-installer-launch-aws
    prow.k8s.io/type: periodic
  name: chat-bot-2020-07-01-112918.7695
  namespace: ci
  resourceVersion: "66292321"
  selfLink: /apis/prow.k8s.io/v1/namespaces/ci/prowjobs/chat-bot-2020-07-01-112918.7695
  uid: 5e016563-bd89-45a9-a2d1-231e7bf78005
I dunno if we care about 4.5-latest cluster-bot jobs, since it's possible someone was mucking about with the cluster as it ran. Also in the referenced job, you can see that this test-case was flaky, not fatal (it failed once, but passed on retest). When it failed, the failure message was:
timed out waiting for the condition
which is not very helpful. As I mentioned in the similar bug 1852995, I'd be in favor of work to improve that message to say what we timed out waiting for and how far along we were, to give folks some guidance when they try to figure out what got stuck and why.
Based on https://sippy-bparees.svc.ci.openshift.org/?release=4.6 this failed 6 times, I'm marking this low priority, accordingly.
*** Bug 1852995 has been marked as a duplicate of this bug. ***
GCP and Azure 4.6 runs flake on this occasionally still:
Example runs:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.6/1292796072975929344
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.6/1292780309439320064
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.6/1292733358316457984
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-4.6/1292785960559316992
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-4.6/1292821383360811008
Noticed this today when debugging unrelated Watch errors
fail [github.com/openshift/origin/test/extended/deployments/util.go:751]: watch closed unexpectedly
Expected
<bool>: false
to be equivalent to
<bool>: true
And (note a different test case!)
[sig-apps][Feature:DeploymentConfig] deploymentconfigs with minimum ready seconds set should not transition the deployment to Complete before satisfied [Suite:openshift/conformance/parallel]
fail [github.com/openshift/origin@/test/extended/deployments/deployments.go:1090]: Unexpected error:
<*errors.errorString | 0xc000540410>: {
s: "watch closed before UntilWithoutRetry timeout",
}
watch closed before UntilWithoutRetry timeout
occurred
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-compact-4.6/1293389102703448064
There may be some low hanging fruit here: don't use UntilWithoutRetry in https://github.com/openshift/origin/blob/705b5e6987ec79af02b54e749f52f4bc67c455d9/test/extended/deployments/util.go
From UntilWithoutRetry comments:
// Warning: Unless you have a very specific use case (probably a special Watcher) don't use this function!!!
// Warning: This will fail e.g. on API timeouts and/or 'too old resource version' error.
// Warning: You are most probably looking for a function *Until* or *UntilWithSync* below,
// Warning: solving such issues.
// TODO: Consider making this function private to prevent misuse when the other occurrences in our codebase are gone.
(API timeout or similar error is most probably the case here)
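For anyone following along, here is a simplified, stdlib-only Go model of why a non-retrying wait fails when the server closes the watch, and what a retrying wait does differently. It is a sketch of the idea only and does not use client-go; `event`, `watchFunc`, `untilWithRetry`, and `flakyServer` are hypothetical names invented for this illustration, and resource versions are modeled as plain integers.

```go
package main

import "fmt"

// event is a simplified stand-in for a watch event (hypothetical type).
type event struct {
	resourceVersion int
	ready           bool
}

// watchFunc opens a new watch that delivers events newer than afterVersion.
// The returned channel is closed when the server ends the watch.
type watchFunc func(afterVersion int) <-chan event

// untilWithRetry waits for cond to hold. If the watch closes first, it opens
// a new watch from the last seen version, up to maxRestarts additional times.
func untilWithRetry(start watchFunc, maxRestarts int, cond func(event) bool) (event, error) {
	lastVersion := 0
	for attempt := 0; attempt <= maxRestarts; attempt++ {
		for ev := range start(lastVersion) {
			lastVersion = ev.resourceVersion
			if cond(ev) {
				return ev, nil
			}
		}
		// The channel closed before cond was met; loop to re-open the watch.
	}
	return event{}, fmt.Errorf("watch closed %d times before condition was met (last resourceVersion %d)",
		maxRestarts+1, lastVersion)
}

// flakyServer returns a watchFunc that delivers at most perWatch events per
// watch and then closes the channel, imitating an API-side timeout.
func flakyServer(events []event, perWatch int) watchFunc {
	return func(afterVersion int) <-chan event {
		ch := make(chan event, perWatch)
		sent := 0
		for _, ev := range events {
			if ev.resourceVersion > afterVersion && sent < perWatch {
				ch <- ev
				sent++
			}
		}
		close(ch)
		return ch
	}
}

func main() {
	events := []event{{1, false}, {2, false}, {3, true}}
	ev, err := untilWithRetry(flakyServer(events, 1), 5, func(e event) bool { return e.ready })
	fmt.Println(ev.resourceVersion, err)
}
```

Calling `untilWithRetry` with `maxRestarts` set to 0 behaves like the non-retrying case: the first closed watch ends the wait with an error, which matches the "watch closed" failures quoted above.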
This potentially will be fixed in https://github.com/openshift/origin/pull/25408
Fix landed in https://github.com/openshift/origin/pull/25010
Checked the latest 4.6 test runs from GCP and Azure; can't reproduce this issue now. Will move to verified status.
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#release-openshift-ocp-installer-e2e-gcp-4.6
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#release-openshift-ocp-installer-e2e-azure-4.6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196