Bug 1765294 - Dockercfg secret is not cleaned up when token is deleted
Summary: Dockercfg secret is not cleaned up when token is deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard: devex
Duplicates: 1765739 1767655 1776504
Depends On:
Blocks: 1806792
Reported: 2019-10-24 18:29 UTC by Weibin Liang
Modified: 2020-07-13 17:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Pull secrets for the internal registry were sometimes not deleted when their associated token was deleted.
Consequence: Stale pull secrets for the internal registry remained associated with Kubernetes service accounts.
Fix: Owner references were established between the internal registry pull secret and its associated token secret.
Result: Pull secrets are always deleted if the associated token is deleted.
Clone Of:
Clones: 1779282 1806792
Environment:
Last Closed: 2020-07-13 17:11:31 UTC
Target Upstream Version:
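
Editor's illustration of the Doc Text fix above: a minimal sketch of recording the token secret as an owner of the pull secret, so that Kubernetes garbage collection can remove the pull secret once the token is gone. The `OwnerReference` and `Secret` types here are simplified local stand-ins, not the real Kubernetes API types; `setTokenOwner` and the token name `sa1-token-abcde` are hypothetical, and this is not the code from the linked pull requests.

```go
package main

import "fmt"

// OwnerReference and Secret are simplified stand-ins for the Kubernetes API
// types (assumption), kept local so the sketch compiles without external modules.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

type Secret struct {
	Name            string
	UID             string
	OwnerReferences []OwnerReference
}

// setTokenOwner (hypothetical helper) records the token secret as an owner of
// the pull secret. It skips the append if a reference with the same UID
// already exists, so repeated calls do not add duplicates.
func setTokenOwner(pullSecret *Secret, token Secret) {
	for _, ref := range pullSecret.OwnerReferences {
		if ref.UID == token.UID {
			return
		}
	}
	pullSecret.OwnerReferences = append(pullSecret.OwnerReferences, OwnerReference{
		APIVersion: "v1",
		Kind:       "Secret",
		Name:       token.Name,
		UID:        token.UID,
	})
}

func main() {
	// Both names and the UID are made up for illustration.
	token := Secret{Name: "sa1-token-abcde", UID: "1234"}
	pull := Secret{Name: "sa1-dockercfg-zdx4x"}
	setTokenOwner(&pull, token)
	fmt.Printf("%s is owned by %s\n", pull.Name, pull.OwnerReferences[0].Name)
}
```

With a reference like this in place, cleanup no longer depends on a controller observing the token's delete event.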


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-controller-manager pull 54 0 None closed Bug 1765294: Re-list pull secret controllers 2020-10-28 18:48:34 UTC
Github openshift openshift-controller-manager pull 61 0 None closed Bug 1765294: Use OwnerRefs to clean up SA pull secrets 2020-10-28 18:48:33 UTC
Github openshift openshift-controller-manager pull 80 0 None closed Revert "Bug 1765294: Use OwnerRefs to clean up SA pull secrets" 2020-10-28 18:48:33 UTC
Github openshift openshift-controller-manager pull 84 0 None closed Bug 1785023: Increase pull secrets controller QPS to 100 2020-10-28 18:48:33 UTC
Github openshift origin pull 24023 0 'None' closed Bug 1765294: Add additional debugging to to flaky pull secret test 2020-10-28 18:48:33 UTC
Github openshift origin pull 24103 0 'None' closed Bug 1765294: Changing from passive to active approach on secret deletion. 2020-10-28 18:48:49 UTC
Github openshift origin pull 24221 0 'None' closed Bug 1765294: Disable flaking token delete test 2020-10-28 18:48:49 UTC
Github openshift origin pull 24226 0 'None' closed Bug 1765294: Debug flaking token delete test 2020-10-28 18:48:34 UTC
Github openshift origin pull 24278 0 None closed Bug 1765294: Increase timeout for Dockercfg tests 2020-10-28 18:48:34 UTC
Github openshift origin pull 24776 0 None closed Test timing of SA pull secret creation 2020-10-28 18:48:50 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:12:02 UTC

Description Weibin Liang 2019-10-24 18:29:36 UTC
Description of problem:
[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] 
fail [github.com/onsi/ginkgo/internal/leafnodes/runner.go:113]: timeout: sa1-dockercfg-zdx4x




Additional info:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/228
https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-ocp-installer-e2e-openstack-4.3

Comment 1 Anurag saxena 2019-10-25 21:02:11 UTC
*** Bug 1765739 has been marked as a duplicate of this bug. ***

Comment 2 Sergiusz Urbaniak 2019-10-28 10:46:55 UTC
Confirming that the issue still persists in e2e tests.

Comment 4 wewang 2019-10-31 08:37:58 UTC
It still exists in the e2e test:
4.3.0-0.nightly-2019-10-31-050543   https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/294

Comment 5 Adam Kaplan 2019-11-01 20:58:35 UTC
*** Bug 1767655 has been marked as a duplicate of this bug. ***

Comment 6 Ben Parees 2019-11-04 19:44:22 UTC
I wonder if this is a watch issue in the test...can we replace the logic in waitForSecretDelete that looks for the deletion event with an explicit poll that simply looks for the secret in question to go missing?
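
Editor's illustration of the suggestion above: a minimal sketch of an explicit poll that waits for the secret to go missing instead of waiting for a deletion event. The `secretExists` callback and `waitForSecretGone` are hypothetical names; in a real test the callback would wrap a GET against the API server and report false on NotFound. This is not the code from the eventual patch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the secret is still present at the deadline.
var errTimeout = errors.New("timed out waiting for secret to be deleted")

// secretExists is a hypothetical lookup (assumption); it reports whether the
// named secret is still present.
type secretExists func(name string) (bool, error)

// waitForSecretGone polls until the lookup reports the secret missing, the
// lookup itself fails, or the timeout elapses. It never relies on watch events.
func waitForSecretGone(name string, exists secretExists, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		found, err := exists(name)
		if err != nil {
			return err
		}
		if !found {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%w: %s", errTimeout, name)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Fake lookup: the secret disappears on the third poll.
	calls := 0
	lookup := func(string) (bool, error) {
		calls++
		return calls < 3, nil
	}
	err := waitForSecretGone("sa1-dockercfg-zdx4x", lookup, 10*time.Millisecond, time.Second)
	fmt.Println("deleted:", err == nil, "polls:", calls)
}
```

A poll of this shape cannot miss a deletion the way a dropped or late watch event can, at the cost of extra lookups.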

Comment 7 Ricardo Maraschini 2019-11-07 15:43:31 UTC
I have just sent a patch that migrates away from watch, let's see if it is an issue there.

Comment 8 Oleg Bulatov 2019-11-08 11:45:03 UTC
The PR that Ricardo mentioned: https://github.com/openshift/origin/pull/24103

Comment 10 wewang 2019-11-15 07:14:38 UTC
[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] is verified in:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/801

Comment 11 Petr Muller 2019-11-20 18:19:58 UTC
This test failure also occurred in a machine-os-content promotion job https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.3/3992

It looks like the fix is supposed to be in, can you please check if the above is the same thing?

Comment 13 Ricardo Maraschini 2019-11-21 15:16:58 UTC
I have been running this test individually here for more than an hour. It takes less than 10 seconds to complete, and I have not had a single failure. I am starting to look at whether the problem may be caused by tests running in parallel.

Comment 15 Adam Kaplan 2019-11-22 18:31:34 UTC
Moving to 4.4.0, we will likely need to backport to 4.3.0 once we determine the root cause.

Comment 16 Adam Kaplan 2019-11-26 15:42:13 UTC
*** Bug 1776504 has been marked as a duplicate of this bug. ***

Comment 17 Adam Kaplan 2019-11-26 15:42:51 UTC
Moving this to 4.3.0 given the impact of this bug.

Comment 18 Ed Santiago 2019-11-27 15:16:00 UTC
Still seeing this in recent runs:

   https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/478

(plus many from last night)

Comment 19 Ben Parees 2019-11-27 15:19:52 UTC
This test was (temporarily) disabled as of 15 hours ago:
https://github.com/openshift/origin/pull/24221

Maybe it hasn't made it through the ART cycle yet, though.

Comment 20 Oleg Bulatov 2019-11-27 15:37:20 UTC
It's disabled only in master (4.4). Do we want to disable it in 4.3?

Comment 21 Ben Parees 2019-11-27 15:55:19 UTC
ugh. yes.  thanks Oleg.

Comment 22 Adam Kaplan 2019-12-02 14:32:03 UTC
Note too that once we uncover the root cause of the flake, we need a 4.3 backport anyway for the .0 release or a z-stream update.

Comment 36 Adam Kaplan 2020-03-30 15:06:07 UTC
Reopening. This is likely what is causing the regression in https://bugzilla.redhat.com/show_bug.cgi?id=1785023

Comment 37 Adam Kaplan 2020-04-01 17:23:42 UTC
Moving back to VERIFIED - fix for regression is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1785023

Comment 39 errata-xmlrpc 2020-07-13 17:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
