Bug 1852916

Summary: [sig-apps][Feature:DeploymentConfig] deploymentconfigs adoption will orphan all RCs and adopt them back when recreated
Product: OpenShift Container Platform Reporter: Corey Daley <cdaley>
Component: openshift-controller-manager Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA QA Contact: zhou ying <yinzhou>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.5 CC: aos-bugs, jerzhang, jsafrane, mfojtik, pweil, wking
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: workloads, LifecycleStale
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
[sig-apps][Feature:DeploymentConfig] deploymentconfigs adoption will orphan all RCs and adopt them back when recreated [Feature:DeploymentConfig] deploymentconfigs adoption [Conformance] will orphan all RCs and adopt them back when recreated
Last Closed: 2020-10-27 16:11:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 2 W. Trevor King 2020-07-02 04:38:51 UTC
Linked 'launch' job was via cluster-bot, which you can see by clicking through from the job-detail page to the ProwJob YAML, which opens with:

metadata:
  annotations:
    ci-chat-bot.openshift.io/channel: ""
    ci-chat-bot.openshift.io/expires: "13500"
    ci-chat-bot.openshift.io/jobInputs: '[{"Image":"","Version":"4.5.0-0.latest","Refs":[{"org":"openshift","repo":"origin","base_ref":"master","base_sha":"5d8c7f115968f4e3cdc44e047be2585bec6f1e7e","pulls":[{"number":25217,"author":"system:serviceaccount:ci:ci-chat-bot","sha":"db81e9892f66b8c838f545e8051c53df5696d39e"}]},{"org":"openshift","repo":"oauth-server","base_ref":"master","base_sha":"9eee6c6eeaf48f5426dd501c0726f777eb850847","pulls":[{"number":50,"author":"system:serviceaccount:ci:ci-chat-bot","sha":"9de5c160e16fee7fddc29ea2423d083548be8d6b"}]}]}]'
    ci-chat-bot.openshift.io/jobParams: test=e2e
    ci-chat-bot.openshift.io/mode: test
    ci-chat-bot.openshift.io/ns: ci-ln-mbpkz5b
    ci-chat-bot.openshift.io/originalMessage: test e2e openshift/origin#25217,openshift/oauth-server#50
    ci-chat-bot.openshift.io/platform: aws
    ci-chat-bot.openshift.io/user: U9UTT7MAT
    prow.k8s.io/job: release-openshift-origin-installer-launch-aws
  creationTimestamp: "2020-07-01T11:29:18Z"
  generation: 3
  labels:
    ci-chat-bot.openshift.io/launch: "true"
    prow.k8s.io/build-id: ""
    prow.k8s.io/id: chat-bot-2020-07-01-112918.7695
    prow.k8s.io/job: release-openshift-origin-installer-launch-aws
    prow.k8s.io/type: periodic
  name: chat-bot-2020-07-01-112918.7695
  namespace: ci
  resourceVersion: "66292321"
  selfLink: /apis/prow.k8s.io/v1/namespaces/ci/prowjobs/chat-bot-2020-07-01-112918.7695
  uid: 5e016563-bd89-45a9-a2d1-231e7bf78005

I dunno if we care about 4.5-latest cluster-bot jobs, since it's possible someone was mucking about with the cluster as it ran.  Also in the referenced job, you can see that this test-case was flaky, not fatal (it failed once, but passed on retest).  When it failed, the failure message was:

  timed out waiting for the condition

which is not very helpful.  As I mentioned in the similar bug 1852995, I'd be in favor of work to improve that message to say what we timed out waiting for and how far along we were, to give folks some guidance when they try to figure out what got stuck and why.

Comment 3 Maciej Szulik 2020-07-02 08:09:27 UTC
Based on https://sippy-bparees.svc.ci.openshift.org/?release=4.6, this failed 6 times, so I'm marking it low priority.

Comment 4 Maciej Szulik 2020-07-02 08:09:36 UTC
*** Bug 1852995 has been marked as a duplicate of this bug. ***

Comment 6 Jan Safranek 2020-08-12 09:06:11 UTC
Noticed this today when debugging unrelated Watch errors:

fail [github.com/openshift/origin/test/extended/deployments/util.go:751]: watch closed unexpectedly
Expected
    <bool>: false
to be equivalent to
    <bool>: true


And (note a different test case!)

[sig-apps][Feature:DeploymentConfig] deploymentconfigs with minimum ready seconds set should not transition the deployment to Complete before satisfied [Suite:openshift/conformance/parallel]

fail [github.com/openshift/origin@/test/extended/deployments/deployments.go:1090]: Unexpected error:
    <*errors.errorString | 0xc000540410>: {
        s: "watch closed before UntilWithoutRetry timeout",
    }
    watch closed before UntilWithoutRetry timeout
occurred

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-compact-4.6/1293389102703448064


There may be some low-hanging fruit here: don't use UntilWithoutRetry in https://github.com/openshift/origin/blob/705b5e6987ec79af02b54e749f52f4bc67c455d9/test/extended/deployments/util.go

From UntilWithoutRetry comments:
// Warning: Unless you have a very specific use case (probably a special Watcher) don't use this function!!!
// Warning: This will fail e.g. on API timeouts and/or 'too old resource version' error.
// Warning: You are most probably looking for a function *Until* or *UntilWithSync* below,
// Warning: solving such issues.
// TODO: Consider making this function private to prevent misuse when the other occurrences in our codebase are gone.

(API timeout or similar error is most probably the case here)

Comment 7 Maciej Szulik 2020-08-12 11:14:03 UTC
This potentially will be fixed in https://github.com/openshift/origin/pull/25408

Comment 8 Maciej Szulik 2020-09-10 18:47:30 UTC
Fix landed in https://github.com/openshift/origin/pull/25010

Comment 10 zhou ying 2020-09-15 02:26:11 UTC
Checked the latest 4.6 test runs on GCP and Azure; can't reproduce this issue now. Will move to verified status.

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#release-openshift-ocp-installer-e2e-gcp-4.6
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#release-openshift-ocp-installer-e2e-azure-4.6

Comment 13 errata-xmlrpc 2020-10-27 16:11:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196