Bug 1765294

Summary: Dockercfg secret is not cleaned up when token is deleted
Product: OpenShift Container Platform
Reporter: Weibin Liang <weliang>
Component: openshift-controller-manager
Assignee: Adam Kaplan <adam.kaplan>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: adam.kaplan, anusaxen, aos-bugs, bparees, ccoleman, lsm5, maszulik, mfojtik, obulatov, pmuller, rmarasch, santiago, surbania
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: devex
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Pull secrets for the internal registry were sometimes not deleted when their associated token was deleted.
Consequence: Stale pull secrets for the internal registry remained associated with Kubernetes service accounts.
Fix: Owner references were established between the internal registry pull secret and its associated token secret.
Result: Pull secrets are always deleted when the associated token is deleted.
Story Points: ---
Clone Of:
: 1779282 1806792
Environment:
Last Closed: 2020-07-13 17:11:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1806792    
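
The fix summarized in the Doc Text relies on Kubernetes owner references: a dependent object lists its owner under `metadata.ownerReferences`, and the garbage collector removes the dependent once its owners are gone. The sketch below is only an illustration in Python, not the controller's actual Go code. The helper names `pull_secret_manifest` and `collect_orphans`, the token name, and the UID are hypothetical, and `collect_orphans` is a toy model of garbage collection rather than a real API call.

```python
from typing import Any, Dict, List


def pull_secret_manifest(
    name: str, namespace: str, token_name: str, token_uid: str
) -> Dict[str, Any]:
    """Build a dockercfg secret manifest that is owned by a token secret.

    The secret's data payload is omitted for brevity; only the ownership
    metadata is shown.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockercfg",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The owner reference ties this secret's lifetime to the token.
            "ownerReferences": [
                {
                    "apiVersion": "v1",
                    "kind": "Secret",
                    "name": token_name,
                    "uid": token_uid,
                }
            ],
        },
    }


def collect_orphans(
    objects: List[Dict[str, Any]], deleted_uid: str
) -> List[Dict[str, Any]]:
    """Toy garbage collector: drop objects whose only owners were deleted."""
    survivors = []
    for obj in objects:
        owners = obj["metadata"].get("ownerReferences", [])
        if owners and all(ref["uid"] == deleted_uid for ref in owners):
            continue  # every owner is gone, so the dependent is removed
        survivors.append(obj)
    return survivors
```

With this shape, deleting the token secret is enough to have the pull secret cleaned up, without the controller having to observe the deletion itself.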

Description Weibin Liang 2019-10-24 18:29:36 UTC
Description of problem:
[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] 
fail [github.com/onsi/ginkgo/internal/leafnodes/runner.go:113]: timeout: sa1-dockercfg-zdx4x




Additional info:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/228
https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-ocp-installer-e2e-openstack-4.3

Comment 1 Anurag saxena 2019-10-25 21:02:11 UTC
*** Bug 1765739 has been marked as a duplicate of this bug. ***

Comment 2 Sergiusz Urbaniak 2019-10-28 10:46:55 UTC
Confirming that the issue still persists in the e2e tests.

Comment 4 wewang 2019-10-31 08:37:58 UTC
It still exists in the e2e test:
4.3.0-0.nightly-2019-10-31-050543   https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/294

Comment 5 Adam Kaplan 2019-11-01 20:58:35 UTC
*** Bug 1767655 has been marked as a duplicate of this bug. ***

Comment 6 Ben Parees 2019-11-04 19:44:22 UTC
I wonder if this is a watch issue in the test. Can we replace the logic in waitForSecretDelete that looks for the deletion event with an explicit poll that simply waits for the secret in question to go missing?
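
As a rough illustration of the polling approach suggested here (in Python rather than the test's Go, and not the code from any actual patch), the loop below repeatedly checks whether the secret still exists and gives up after a timeout. The function name `wait_for_secret_delete`, the `secret_exists` callable, and the default timeout and interval are all assumptions made for the sketch.

```python
import time
from typing import Callable


def wait_for_secret_delete(
    secret_exists: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll until the secret is gone.

    Returns True once secret_exists() reports False, or False if the
    timeout expires first. No watch or deletion event is involved.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not secret_exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike a watch, a poll cannot miss the deletion: even if the secret disappears between checks, the next lookup simply reports it as absent.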

Comment 7 Ricardo Maraschini 2019-11-07 15:43:31 UTC
I have just sent a patch that migrates away from watch, let's see if it is an issue there.

Comment 8 Oleg Bulatov 2019-11-08 11:45:03 UTC
The PR that Ricardo mentioned: https://github.com/openshift/origin/pull/24103

Comment 10 wewang 2019-11-15 07:14:38 UTC
[Feature:OpenShiftControllerManager] TestDockercfgTokenDeletedController [Suite:openshift/conformance/parallel] is verified in:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.3/801

Comment 11 Petr Muller 2019-11-20 18:19:58 UTC
This test failure also occurred in a machine-os-content promotion job https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.3/3992

It looks like the fix is supposed to be in, can you please check if the above is the same thing?

Comment 13 Ricardo Maraschini 2019-11-21 15:16:58 UTC
I have had this test running individually here for more than an hour. It takes less than 10 seconds to complete, and I have not seen a single failure. I am starting to look at whether the problem is caused by tests running in parallel.

Comment 15 Adam Kaplan 2019-11-22 18:31:34 UTC
Moving to 4.4.0, we will likely need to backport to 4.3.0 once we determine the root cause.

Comment 16 Adam Kaplan 2019-11-26 15:42:13 UTC
*** Bug 1776504 has been marked as a duplicate of this bug. ***

Comment 17 Adam Kaplan 2019-11-26 15:42:51 UTC
Moving this to 4.3.0 given the impact of this bug.

Comment 18 Ed Santiago 2019-11-27 15:16:00 UTC
Still seeing this in recent runs:

   https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.3/478

(plus many from last night)

Comment 19 Ben Parees 2019-11-27 15:19:52 UTC
This test was (temporarily) disabled as of 15 hours ago:
https://github.com/openshift/origin/pull/24221

Maybe it hasn't made it through the ART cycle yet, though.

Comment 20 Oleg Bulatov 2019-11-27 15:37:20 UTC
It's disabled only in master (4.4). Do we want to disable it in 4.3?

Comment 21 Ben Parees 2019-11-27 15:55:19 UTC
ugh. yes.  thanks Oleg.

Comment 22 Adam Kaplan 2019-12-02 14:32:03 UTC
Note too that once we uncover the root cause of the flake, we need a 4.3 backport anyway for the .0 release or a z-stream update.

Comment 36 Adam Kaplan 2020-03-30 15:06:07 UTC
Reopening. This is likely what is causing the regression in https://bugzilla.redhat.com/show_bug.cgi?id=1785023

Comment 37 Adam Kaplan 2020-04-01 17:23:42 UTC
Moving back to VERIFIED - fix for regression is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1785023

Comment 39 errata-xmlrpc 2020-07-13 17:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409