Bug 1779282 - [4.3.0] Dockercfg secret is not cleaned up when token deleted
Summary: [4.3.0] Dockercfg secret is not cleaned up when token deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard: devex
Duplicates: 1814453
Depends On: 1806792
Blocks: 1752313
Reported: 2019-12-03 16:36 UTC by Adam Kaplan
Modified: 2020-04-17 17:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Pull secrets for the internal registry were sometimes not deleted when their associated token was deleted.
Consequence: Stale pull secrets for the internal registry remained associated with Kubernetes service accounts.
Fix: Owner references were established between the internal registry pull secret and its associated token secret.
Result: Pull secrets are always deleted if the associated token is deleted.
Clone Of: 1765294
Environment:
Last Closed: 2020-04-08 07:39:51 UTC
Target Upstream Version:
Embargoed:
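
The fix described in the Doc Text relies on owner references: once the pull secret names the token secret as its owner, deleting the token leaves the pull secret without an owner and it gets cleaned up. The Go sketch below is a simplified in-memory model of that behavior, not the openshift-controller-manager code or the Kubernetes API types. The `secret` and `store` types, the `deleteWithDependents` method, and the token name `sa1-token-abcde` are all hypothetical and exist only for illustration.

```go
package main

import "sort"

// secret is a simplified stand-in for a Kubernetes Secret. It carries only
// a name and the names of its owners (a hypothetical reduction of
// metadata.ownerReferences).
type secret struct {
	name   string
	owners []string
}

// store is a hypothetical in-memory set of secrets, keyed by name.
type store map[string]secret

// deleteWithDependents removes the named secret and then, in the manner of
// a garbage collector, removes every secret that declares owners but has
// none of them left in the store. It returns the names of the dependents
// it removed, sorted.
func (s store) deleteWithDependents(name string) []string {
	delete(s, name)
	var removed []string
	// Repeat until a full pass removes nothing, so chains of owners
	// are followed to the end.
	for changed := true; changed; {
		changed = false
		for n, sec := range s {
			if len(sec.owners) == 0 {
				continue // no owners declared: never collected
			}
			orphaned := true
			for _, o := range sec.owners {
				if _, ok := s[o]; ok {
					orphaned = false
					break
				}
			}
			if orphaned {
				delete(s, n) // deleting during range is safe in Go
				removed = append(removed, n)
				changed = true
			}
		}
	}
	sort.Strings(removed)
	return removed
}
```

In this model, a store holding the token `sa1-token-abcde` and a pull secret `sa1-dockercfg-zdx4x` that lists the token as its owner loses both entries when the token is deleted. Secrets without owners are left alone.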


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-controller-manager pull 72 0 None closed [release 4.3] Bug 1779282: Use OwnerRefs to clean up SA pull secrets 2020-10-28 18:49:03 UTC
Github openshift origin pull 24253 0 'None' closed Bug 1779282: Debug flaking token delete test 2020-10-28 18:49:04 UTC
Red Hat Product Errata RHBA-2020:1262 0 None None None 2020-04-08 07:40:03 UTC

Description Adam Kaplan 2019-12-03 16:36:39 UTC
+++ This bug was initially created as a clone of Bug #1765294 +++

Description of problem:
[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] 
fail [github.com/onsi/ginkgo/internal/leafnodes/runner.go:113]: timeout: sa1-dockercfg-zdx4x




Additional info:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/228
https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-ocp-installer-e2e-openstack-4.3

--- Additional comment from Anurag saxena on 2019-10-25 21:02:11 UTC ---



--- Additional comment from Sergiusz Urbaniak on 2019-10-28 10:46:55 UTC ---

Confirming the issue still persists in e2e tests.

--- Additional comment from errata-xmlrpc on 2019-10-29 20:27:32 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2019:46256-02
https://errata.devel.redhat.com/advisory/46256

--- Additional comment from wewang on 2019-10-31 08:37:58 UTC ---

It still exists in the e2e test:
4.3.0-0.nightly-2019-10-31-050543   https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/294

--- Additional comment from Adam Kaplan on 2019-11-01 20:58:35 UTC ---



--- Additional comment from Ben Parees on 2019-11-04 19:44:22 UTC ---

I wonder if this is a watch issue in the test. Can we replace the logic in waitForSecretDelete that looks for the deletion event with an explicit poll that simply checks for the secret in question to go missing?

--- Additional comment from Ricardo Maraschini on 2019-11-07 15:43:31 UTC ---

I have just sent a patch that migrates away from watch, let's see if it is an issue there.

--- Additional comment from Oleg Bulatov on 2019-11-08 11:45:03 UTC ---

The PR that Ricardo mentioned: https://github.com/openshift/origin/pull/24103

--- Additional comment from OpenShift Automated Release Tooling on 2019-11-13 19:28:01 UTC ---

Elliott changed bug status from ('MODIFIED',) to ON_QA.

--- Additional comment from wewang on 2019-11-15 07:14:38 UTC ---

[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] is verified in:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/801

--- Additional comment from Petr Muller on 2019-11-20 18:19:58 UTC ---

This test failure also occurred in a machine-os-content promotion job https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.3/3992

It looks like the fix is supposed to be in, can you please check if the above is the same thing?

--- Additional comment from Oleg Bulatov on 2019-11-21 00:15:33 UTC ---

Watch has been replaced by Poll, but the test still flakes [1].

[1] https://testgrid.k8s.io/redhat-openshift-ocp-release-4.3-blocking#release-openshift-origin-installer-e2e-gcp-4.3&include-filter-by-regex=TestDockercfgTokenDeletedController

--- Additional comment from Ricardo Maraschini on 2019-11-21 15:16:58 UTC ---

I have been running this test on its own here for more than an hour. It takes less than 10 seconds to complete, and I have not seen a single failure. I am starting to look into whether running tests in parallel causes the problem.

--- Additional comment from Oleg Bulatov on 2019-11-21 16:41:15 UTC ---

Adding debug information to the test: https://github.com/openshift/origin/pull/24187

--- Additional comment from Adam Kaplan on 2019-11-22 18:31:34 UTC ---

Moving to 4.4.0, we will likely need to backport to 4.3.0 once we determine the root cause.

--- Additional comment from Adam Kaplan on 2019-11-26 15:42:13 UTC ---



--- Additional comment from Adam Kaplan on 2019-11-26 15:42:51 UTC ---

Moving this to 4.3.0 given the impact of this bug.

--- Additional comment from Ed Santiago on 2019-11-27 15:16:00 UTC ---

Still seeing this in recent runs:

   https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/478

(plus many from last night)

--- Additional comment from Ben Parees on 2019-11-27 15:19:52 UTC ---

This test was (temporarily) disabled as of 15 hours ago:
https://github.com/openshift/origin/pull/24221

Maybe it hasn't made it through the ART cycle yet, though.

--- Additional comment from Oleg Bulatov on 2019-11-27 15:37:20 UTC ---

It's disabled only in master (4.4). Do we want to disable it in 4.3?

--- Additional comment from Ben Parees on 2019-11-27 15:55:19 UTC ---

ugh. yes.  thanks Oleg.

--- Additional comment from Adam Kaplan on 2019-12-02 14:32:03 UTC ---

Note too that once we uncover the root cause of the flake, we need a 4.3 backport anyway for the .0 release or a z-stream update.

Comment 2 wewang 2019-12-06 08:08:40 UTC
All attached PRs are only about gathering additional information. Switching back to ASSIGNED.

Comment 8 Adam Kaplan 2020-03-18 15:19:08 UTC
*** Bug 1814453 has been marked as a duplicate of this bug. ***

Comment 9 W. Trevor King 2020-03-18 20:56:53 UTC
From [1]:

> The backport to 4.3.z is on hold until 4.4.0 goes GA.

Also [2].  But the 4.4 bug 1806792 is VERIFIED, we run a lot of 4.4 CI, and we have 4.4 RCs out in the wild.  Can we declare "soaked enough" at some point before 4.4.0 and land this backport to address the most common cause of 4.3 CI failures (which is what this bug was yesterday, although today other failure modes have pulled ahead ;)?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1814453#c2
[2]: https://github.com/openshift/openshift-controller-manager/pull/72#pullrequestreview-371497080

Comment 12 wewang 2020-03-26 03:30:37 UTC
Checked in version:
4.3.0-0.ci-2020-03-26-003534

[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] 
passed in job: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/1670

Comment 13 Adam Kaplan 2020-03-30 15:11:24 UTC
Reopening. This likely caused the regression in https://bugzilla.redhat.com/show_bug.cgi?id=1785023.

Comment 14 Adam Kaplan 2020-04-01 17:25:48 UTC
Moving back to VERIFIED - fix for regression is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1785023 and its dependent BZs.

Comment 16 errata-xmlrpc 2020-04-08 07:39:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1262

