Bug 1785023 - CI: "ResourceQuota should create a ResourceQuota and capture the life of a secret" with "expected 6, actual 9"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard:
Duplicates: 1811648 1821755
Depends On:
Blocks: 1819849
 
Reported: 2019-12-18 22:20 UTC by W. Trevor King
Modified: 2020-07-13 17:13 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The client used to create pull secrets for the internal registry had a low rate limit.
Consequence: If a large number of namespaces were created in a short time window, it could take a long time for image registry pull secrets to be created.
Fix: Increased the rate limit for the client used to create pull secrets for the internal registry.
Result: Internal registry pull secrets are created quickly, even under heavy load.
Clone Of:
Clones: 1819849
Environment:
Last Closed: 2020-07-13 17:12:48 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift openshift-controller-manager pull 84 | 0 | None | closed | Bug 1785023: Increase pull secrets controller QPS to 100 | 2021-02-14 07:30:16 UTC
Github | openshift origin pull 24754 | 0 | None | closed | Bug 1785023: disable ResourceQuota e2e test for life of a secret | 2021-02-14 07:30:16 UTC
Github | openshift origin pull 24816 | 0 | None | closed | Bug 1785023: Restore ResourceQuota e2e test for life of a secret | 2021-02-14 07:30:16 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:13:19 UTC

Description W. Trevor King 2019-12-18 22:20:09 UTC
[1]:

STEP: Ensuring resource quota status is calculated
Dec 18 21:31:37.901: INFO: resource secrets, expected 6, actual 9
...
Dec 18 21:32:05.903: INFO: resource secrets, expected 6, actual 9
[AfterEach] [sig-api-machinery] ResourceQuota
...
fail [k8s.io/kubernetes/test/e2e/apimachinery/resource_quota.go:167]: Unexpected error:
    <*errors.errorString | 0xc0002c8250>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
...

failed: (50.4s) 2019-12-18T21:32:06 "[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_openshift-controller-manager/58/pull-ci-openshift-openshift-controller-manager-master-e2e-aws/163

Comment 2 Clayton Coleman 2020-03-10 03:37:03 UTC
This is now happening in almost 75% of 4.5 runs.  Bumping urgency.

Comment 3 Clayton Coleman 2020-03-10 03:37:38 UTC
I meant to say 50%, but either way it's flaking all over the place.

Comment 4 W. Trevor King 2020-03-17 22:49:57 UTC
Still all over the place today:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.4&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key' | sed 's|/[^/]*$||' | sort | uniq -c
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.4
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.4
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.4
      1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4
$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.5&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key' | sed 's|/[^/]*$||' | sort | uniq -c
      8 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5
      2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.5
      5 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.5
      6 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5
      2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.5
      4 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.5

Recent AWS jobs, if you want specific ones to dig into:

$ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-e2e-aws-4.5&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key'
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/437
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/441
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/446
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/447
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/449
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/452
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/475
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/478

At least the last job there (478) also has bug 1812261 (iptables segfaulting).  Not sure if that's related or not.
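
For readers who do not use jq, the filtering and counting done by the pipelines in this comment can be sketched in Python. This is a minimal sketch, assuming the search endpoint returns a JSON object keyed by run URL whose values hold one entry per matched search string (inferred from the jq filter, not from any documentation of the service). The function names and the sample data are hypothetical.

```python
from collections import Counter


def matching_runs(results, required_matches=2):
    """Keep run URLs whose entry has exactly `required_matches` matches.

    Mirrors: jq 'to_entries[] | select((.value | length) == 2).key'
    """
    return [url for url, matches in results.items() if len(matches) == required_matches]


def count_by_job(run_urls):
    """Strip the trailing run number and count runs per job.

    Mirrors: sed 's|/[^/]*$||' | sort | uniq -c
    """
    return Counter(url.rsplit("/", 1)[0] for url in run_urls)


# Hypothetical response shape and contents, for illustration only.
BASE = "https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs"
sample = {
    BASE + "/release-openshift-ocp-installer-e2e-aws-4.5/437": {"failed": ["..."], "secrets": ["..."]},
    BASE + "/release-openshift-ocp-installer-e2e-aws-4.5/441": {"failed": ["..."], "secrets": ["..."]},
    # Only one of the two search strings matched, so this run is dropped.
    BASE + "/release-openshift-ocp-installer-e2e-gcp-4.5/1": {"failed": ["..."]},
}

counts = count_by_job(matching_runs(sample))
```

Requiring a length of 2 keeps only runs where both search strings matched, that is, runs where the test failed and the log also contains the "expected 6, actual 9" line.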

Comment 5 Clayton Coleman 2020-03-23 18:17:31 UTC
This is now failing at least once in 80% of 4.5 CI jobs:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5

Comment 6 Clayton Coleman 2020-03-23 21:38:53 UTC
*** Bug 1811648 has been marked as a duplicate of this bug. ***

Comment 7 Petr Muller 2020-03-24 17:24:59 UTC
Still happening, top flake in

release-openshift-origin-installer-e2e-gcp-4.4: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-blocking#release-openshift-origin-installer-e2e-gcp-4.4&sort-by-flakiness=

Comment 10 Clayton Coleman 2020-03-26 14:54:00 UTC
Now that the test has been disabled, moving this to high and assigning.  This has to be investigated since it is a regression in behavior.  May not be deferred from 4.5.

Comment 11 Clayton Coleman 2020-03-26 14:55:59 UTC
Note this is still failing in 4.4 and 4.3 and is likely a release blocker for those.  This needs investigation in case we have regressed the product.

Comment 12 Lukasz Szaszkiewicz 2020-03-27 11:48:02 UTC
I think that the way the test counts the expected number of secrets is not deterministic, especially under moderate load. Please see https://github.com/openshift/origin/pull/24778#issuecomment-604947044
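
The kind of non-determinism suggested in this comment can be illustrated with a toy model. It is only a sketch, assuming the test snapshots the secret count once to compute its expected value and then polls the quota status: the arrival times, snapshot time, and poll schedule are invented, and only the numbers 6 and 9 come from the log in the description.

```python
def secrets_at(arrival_times, t):
    """Number of secrets that exist in the namespace at time t (seconds)."""
    return sum(1 for arrival in arrival_times if arrival <= t)


def run_check(arrival_times, snapshot_at, poll_times):
    """Snapshot the count once, then poll; return (expected, actual, passed)."""
    expected = secrets_at(arrival_times, snapshot_at)
    actual = expected
    for t in poll_times:
        actual = secrets_at(arrival_times, t)
        if actual == expected:
            return expected, actual, True
    # Every poll disagreed with the snapshot: the test times out.
    return expected, actual, False


polls = range(4, 34, 2)  # hypothetical: poll every 2s for about 30s

# Three secrets arrive after the snapshot at t=2, so the snapshot is stale.
slow = run_check([0] * 6 + [3] * 3, snapshot_at=2, poll_times=polls)

# All nine secrets exist before the snapshot, so the counts agree.
fast = run_check([0] * 6 + [1] * 3, snapshot_at=2, poll_times=polls)
```

In this model `slow` is `(6, 9, False)` and `fast` is `(9, 9, True)`: the same test passes or fails depending only on whether the late secrets land before or after the snapshot.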

Comment 13 Adam Kaplan 2020-04-01 17:21:44 UTC
This is likely due to a regression introduced by the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1765294. The controller that creates the image registry's pull secret can fall behind due to its client's (low) QPS.
I am increasing the QPS for this controller in my OCM PR [1].

Fix for this needs to be backported to 4.3.z - the offending code was added to the 4.3.8 release.

[1] https://github.com/openshift/openshift-controller-manager/pull/84
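
As a rough illustration of why a low client QPS lets the controller fall behind, the backlog time of a token-bucket rate limiter can be estimated as follows. This is a sketch under stated assumptions: the old limit of 5 QPS with a burst of 10 is assumed (those are the client-go defaults, but this bug does not state what the controller used), the burst for the new limit and the figure of 300 queued requests are hypothetical, and only the new QPS of 100 comes from the linked PR title.

```python
def seconds_to_drain(requests, qps, burst):
    """Seconds until the last of `requests` queued calls is admitted.

    Models a token bucket that starts full with `burst` tokens and
    refills at `qps` tokens per second, with all calls queued at t=0.
    """
    if requests <= burst:
        return 0.0
    return (requests - burst) / qps


QUEUED = 300  # hypothetical number of API calls queued during a burst of namespace creation

before = seconds_to_drain(QUEUED, qps=5, burst=10)    # assumed old limit
after = seconds_to_drain(QUEUED, qps=100, burst=10)   # QPS from the PR title, burst assumed
```

Under these assumptions the backlog takes 58 seconds to drain at 5 QPS and about 2.9 seconds at 100 QPS. The first figure is longer than the roughly 50 second test run shown in the description, which is consistent with secrets still appearing after the test has computed its expected count.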

Comment 14 Maciej Szulik 2020-04-07 16:10:48 UTC
*** Bug 1821755 has been marked as a duplicate of this bug. ***

Comment 15 Ben Parees 2020-04-07 17:22:48 UTC
The relevant PRs from this bug that need to be backported to 4.4/4.3 appear to be:

https://github.com/openshift/openshift-controller-manager/pull/84
https://github.com/openshift/origin/pull/24776
https://github.com/openshift/origin/pull/24816

(the other linked PRs were either reverts of prior bad changes or debug PRs)

Comment 16 W. Trevor King 2020-04-07 17:54:36 UTC
> (the other linked PRs were either reverts of prior bad changes or debug PRs)

So is that "https://github.com/openshift/origin/pull/24778 should be closed and/or unlinked from this bug"?

Comment 17 Adam Kaplan 2020-04-09 13:38:00 UTC
Moving this to MODIFIED - all relevant PRs have merged.

Comment 19 W. Trevor King 2020-04-09 22:46:37 UTC
(In reply to Ben Parees from comment #15)
> The relevant PRs from this bug that need to be backported to 4.4/4.3 appear
> to be:
> 
> https://github.com/openshift/openshift-controller-manager/pull/84
> https://github.com/openshift/origin/pull/24776
> https://github.com/openshift/origin/pull/24816

But then Adam removed 24776 but left 24754?

Comment 20 W. Trevor King 2020-04-09 22:47:42 UTC
https://github.com/openshift/origin/pull/24776 is still open, and this bug is ON_QA, so it must have been 24754 that needs backporting.

Comment 21 wewang 2020-04-10 08:19:31 UTC
Hi @adam, I have the same question as @W. Trevor King: pr24776 is not related to this bug, right? I also think pr24754 needs to be backported to 4.4 and 4.3.

Comment 22 wewang 2020-04-15 01:50:57 UTC
Since pr24776 only adds a test for the bug, it does not affect verification of the bug. I checked the results from the past few days and the test is already passing.

[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
jobs:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-ocp-installer-e2e-aws-4.5
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5

Comment 24 errata-xmlrpc 2020-07-13 17:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

