Bug 1785023

Summary: CI: "ResourceQuota should create a ResourceQuota and capture the life of a secret" with "expected 6, actual 9"
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: openshift-controller-manager Assignee: Adam Kaplan <adam.kaplan>
Status: CLOSED ERRATA QA Contact: wewang <wewang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.3.z CC: adam.kaplan, aos-bugs, ccoleman, ikarpukh, mfojtik, pmuller, skumari, xxia
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The client used to create pull secrets for the internal registry had a low rate limit.
Consequence: If a large number of namespaces were created in a short time window, it could take a long time for image registry pull secrets to be created.
Fix: The rate limit for the client used to create pull secrets for the internal registry was increased.
Result: Internal registry pull secrets are created quickly, even under heavy load.
Story Points: ---
Clone Of:
Clones: 1819849 Environment:
Last Closed: 2020-07-13 17:12:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1819849    

Description W. Trevor King 2019-12-18 22:20:09 UTC
[1]:

STEP: Ensuring resource quota status is calculated
Dec 18 21:31:37.901: INFO: resource secrets, expected 6, actual 9
...
Dec 18 21:32:05.903: INFO: resource secrets, expected 6, actual 9
[AfterEach] [sig-api-machinery] ResourceQuota
...
fail [k8s.io/kubernetes/test/e2e/apimachinery/resource_quota.go:167]: Unexpected error:
    <*errors.errorString | 0xc0002c8250>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
...

failed: (50.4s) 2019-12-18T21:32:06 "[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_openshift-controller-manager/58/pull-ci-openshift-openshift-controller-manager-master-e2e-aws/163

Comment 2 Clayton Coleman 2020-03-10 03:37:03 UTC
This is now happening in almost 75% of 4.5 runs.  Bumping urgency.

Comment 3 Clayton Coleman 2020-03-10 03:37:38 UTC
I meant to say 50%, but either way it's flaking all over the place.

Comment 4 W. Trevor King 2020-03-17 22:49:57 UTC
Still all over the place today:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.4&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key' | sed 's|/[^/]*$||' | sort | uniq -c
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.4
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.4
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.4
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4
$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.5&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key' | sed 's|/[^/]*$||' | sort | uniq -c
      8 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5
      2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.5
      5 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.5
      6 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5
      2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.5
      4 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.5

Recent AWS jobs, if you want specific ones to dig into:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-e2e-aws-4.5&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key'
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/437
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/441
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/446
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/447
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/449
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/452
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/475
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/478

At least the last job there (478) also has bug 1812261 (iptables segfaulting).  Not sure if that's related or not.
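The tail of the search pipelines above (`sed 's|/[^/]*$||' | sort | uniq -c`) is what turns a list of run URLs into per-job counts. As a minimal sketch, feeding it the eight e2e-aws-4.5 run URLs listed in this comment collapses them to a single job line with a count of 8, consistent with the aws-4.5 row of the 4.5 query. The helper name `per_job_counts` is hypothetical, and the column padding of `uniq -c` differs between implementations, so only the count and URL fields should be relied on.

```shell
#!/bin/sh
# Run URLs copied from the e2e-aws-4.5 listing in this comment.
runs='https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/437
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/441
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/446
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/447
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/449
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/452
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/475
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/478'

# Hypothetical helper: read one run URL per line on stdin, drop the
# trailing "/<run-number>" component so only the job URL remains,
# then count how many runs each job contributed.
per_job_counts() {
    sed 's|/[^/]*$||' | sort | uniq -c
}

printf '%s\n' "$runs" | per_job_counts
```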

Comment 5 Clayton Coleman 2020-03-23 18:17:31 UTC
This is now failing at least once in 80% of 4.5 CI jobs:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5

Comment 6 Clayton Coleman 2020-03-23 21:38:53 UTC
*** Bug 1811648 has been marked as a duplicate of this bug. ***

Comment 7 Petr Muller 2020-03-24 17:24:59 UTC
Still happening; it is the top flake in:

release-openshift-origin-installer-e2e-gcp-4.4: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-blocking#release-openshift-origin-installer-e2e-gcp-4.4&sort-by-flakiness=

Comment 10 Clayton Coleman 2020-03-26 14:54:00 UTC
Now that the test has been disabled, moving this to high and assigning it.  This has to be investigated since it is a regression in behavior.  It may not be deferred from 4.5.

Comment 11 Clayton Coleman 2020-03-26 14:55:59 UTC
Note this is still failing in 4.4 and 4.3 and is likely a release blocker for those.  This needs investigation in case we have regressed the product.

Comment 12 Lukasz Szaszkiewicz 2020-03-27 11:48:02 UTC
I think the way the test counts the expected number of secrets is not deterministic, especially under moderate load. Please see https://github.com/openshift/origin/pull/24778#issuecomment-604947044
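The race described in comment 12 can be illustrated with a small hypothetical simulation. This is not the upstream e2e test's code: it assumes the expected value is a baseline snapshot of the namespace's secret count, and the actual value is what quota usage reflects later. If secrets land after the snapshot (as they would if the controller creating them lags), the two differ, which is the shape of the "expected 6, actual 9" failures. The function name `simulate_quota_check` is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical model of a "snapshot, then compare" quota check.
# NOT the upstream e2e test's code; it only illustrates the race.
# Arguments: secret counts in the test namespace at successive moments.
simulate_quota_check() {
    # Assumption: the expectation is fixed from the first snapshot.
    expected=$1
    # Quota usage reflects the namespace at a later time, so take the
    # last count, which includes any secrets that arrived after the snapshot.
    for actual in "$@"; do :; done
    if [ "$expected" -eq "$actual" ]; then
        echo "ok: expected $expected, actual $actual"
    else
        echo "mismatch: expected $expected, actual $actual"
    fi
}

# All secrets already exist when the snapshot is taken.
simulate_quota_check 9 9 9
# Three secrets show up only after the snapshot.
simulate_quota_check 6 6 9
```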

Comment 13 Adam Kaplan 2020-04-01 17:21:44 UTC
This is likely due to a regression introduced by the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1765294. The controller that creates the image registry's pull secrets can fall behind due to its client's low QPS.
I am increasing the QPS for this controller in my OCM PR [1].

The fix for this needs to be backported to 4.3.z; the offending code was added in the 4.3.8 release.

[1] https://github.com/openshift/openshift-controller-manager/pull/84
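To see why a low client QPS makes the pull-secret controller fall behind, here is a back-of-the-envelope sketch assuming a token-bucket limiter, which is how client-go rate-limits REST clients: the first Burst requests go out immediately and the rest drain at QPS per second. The request count (100 namespaces times 6 writes) is purely illustrative, and 5/10 are simply client-go's defaults when `rest.Config` leaves `QPS`/`Burst` unset; this bug does not state what values the controller used before or after PR 84. The `drain_seconds` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximate seconds needed to send REQUESTS writes
# through a token bucket with rate QPS and capacity BURST, assuming the
# bucket starts full. Integer arithmetic, rounded up.
drain_seconds() {
    requests=$1 qps=$2 burst=$3
    remaining=$((requests - burst))
    if [ "$remaining" -le 0 ]; then
        # Everything fits in the initial burst.
        echo 0
        return
    fi
    # Ceiling division: (a + b - 1) / b
    echo $(((remaining + qps - 1) / qps))
}

# Illustrative load: 100 new namespaces x 6 writes each.
drain_seconds 600 5 10     # client-go defaults (QPS 5, Burst 10)
drain_seconds 600 50 100   # an arbitrary 10x higher limit
```

With these made-up numbers the backlog takes about 118 seconds at the defaults versus about 10 seconds at ten times the limit, which is the kind of improvement the Doc Text describes.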

Comment 14 Maciej Szulik 2020-04-07 16:10:48 UTC
*** Bug 1821755 has been marked as a duplicate of this bug. ***

Comment 15 Ben Parees 2020-04-07 17:22:48 UTC
The relevant PRs from this bug that need to be backported to 4.4/4.3 appear to be:

https://github.com/openshift/openshift-controller-manager/pull/84
https://github.com/openshift/origin/pull/24776
https://github.com/openshift/origin/pull/24816

(the other linked PRs were either reverts of prior bad changes or debug PRs)

Comment 16 W. Trevor King 2020-04-07 17:54:36 UTC
> (the other linked PRs were either reverts of prior bad changes or debug PRs)

So is that "https://github.com/openshift/origin/pull/24778 should be closed and/or unlinked from this bug"?

Comment 17 Adam Kaplan 2020-04-09 13:38:00 UTC
Moving this to MODIFIED - all relevant PRs have merged.

Comment 19 W. Trevor King 2020-04-09 22:46:37 UTC
(In reply to Ben Parees from comment #15)
> The relevant PRs from this bug that need to be backported to 4.4/4.3 appear
> to be:
> 
> https://github.com/openshift/openshift-controller-manager/pull/84
> https://github.com/openshift/origin/pull/24776
> https://github.com/openshift/origin/pull/24816

But then Adam removed 24776 and left 24754?

Comment 20 W. Trevor King 2020-04-09 22:47:42 UTC
https://github.com/openshift/origin/pull/24776 is still open, and this bug is ON_QA, so it must be 24754 that needs backporting.

Comment 21 wewang 2020-04-10 08:19:31 UTC
Hi @adam, I have the same question as @W. Trevor King: PR 24776 is not related to the bug, right? And I think PR 24754 needs to be backported to 4.4 and 4.3.

Comment 22 wewang 2020-04-15 01:50:57 UTC
Since PR 24776 only adds a test for the bug, it does not affect verification. I checked a few days of results, and the test has already passed:

[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
jobs:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-ocp-installer-e2e-aws-4.5
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5

Comment 24 errata-xmlrpc 2020-07-13 17:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409