[1]:

  STEP: Ensuring resource quota status is calculated
  Dec 18 21:31:37.901: INFO: resource secrets, expected 6, actual 9
  ...
  Dec 18 21:32:05.903: INFO: resource secrets, expected 6, actual 9
  [AfterEach] [sig-api-machinery] ResourceQuota
  ...
  fail [k8s.io/kubernetes/test/e2e/apimachinery/resource_quota.go:167]: Unexpected error:
      <*errors.errorString | 0xc0002c8250>: {
          s: "timed out waiting for the condition",
      }
      timed out waiting for the condition
  occurred
  ...
  failed: (50.4s) 2019-12-18T21:32:06 "[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_openshift-controller-manager/58/pull-ci-openshift-openshift-controller-manager-master-e2e-aws/163
13 of those in the past 24h [1], so rare. Also shows up in 4.3, e.g. [2]. Goes back at least to Monday [3].

[1]: https://search.svc.ci.openshift.org/chart?search=resource%20secrets,%20expected%206,%20actual%209&search=failed:.*ResourceQuota%20should%20create%20a%20ResourceQuota%20and%20capture%20the%20life%20of%20a%20secret
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.3/924
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_sdn/86/pull-ci-openshift-sdn-master-e2e-aws/324
This is now happening in almost 75% of 4.5 runs. Bumping urgency.
I meant to say 50%, but either way it's flaking all over the place.
Still all over the place today:

  $ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.4&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key' | sed 's|/[^/]*$||' | sort | uniq -c
        1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.4
        1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.4
        1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.4
        1 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4
  $ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-.*-4.5&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key' | sed 's|/[^/]*$||' | sort | uniq -c
        8 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5
        2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.5
        5 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.5
        6 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5
        2 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.5
        4 https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.5

Recent AWS jobs, if you want specific ones to dig into:

  $ curl -s 'https://search.svc.ci.openshift.org/search?name=^release-openshift-ocp-installer-e2e-aws-4.5&search=failed:+.*ResourceQuota+should+create+a+ResourceQuota+and+capture+the+life+of+a+secret&search=resource+secrets,+expected+6,+actual+9&maxAge=24h' | jq -r '. | to_entries[] | select((.value | length) == 2).key'
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/437
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/441
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/446
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/447
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/449
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/452
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/475
  https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/478

At least the last job there (478) also has bug 1812261 (iptables segfaulting). Not sure if that's related or not.
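For anyone who would rather post-process the search results outside of jq, here is a rough Python sketch of what the filter in the queries above does: keep only the runs where both search patterns matched, strip the trailing build number, and count runs per job. The response shape (run URL mapped to one entry per matching pattern) is an assumption inferred from the jq filter, and the sample URLs and pattern names are made up.

```python
from collections import Counter


def runs_matching_all(results, n_patterns=2):
    """Keep run URLs whose entry holds a match for every search pattern.

    Mirrors: to_entries[] | select((.value | length) == 2).key
    """
    return [url for url, matches in results.items() if len(matches) == n_patterns]


def count_by_job(run_urls):
    """Drop the trailing build number and count runs per job.

    Mirrors: sed 's|/[^/]*$||' | sort | uniq -c
    """
    return Counter(url.rsplit("/", 1)[0] for url in run_urls)


# Hypothetical response fragment; the real payload may differ.
sample = {
    "https://prow.example/logs/job-a/437": {"pattern-1": ["..."], "pattern-2": ["..."]},
    "https://prow.example/logs/job-a/441": {"pattern-1": ["..."], "pattern-2": ["..."]},
    "https://prow.example/logs/job-b/12": {"pattern-1": ["..."]},  # only one pattern hit
}

counts = count_by_job(runs_matching_all(sample))
for job, n in sorted(counts.items()):
    print(n, job)
```

Requiring both patterns to match is what ties the failed test name to the specific "expected 6, actual 9" symptom, so unrelated failures of the same test are not counted.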
This is now failing at least once in 80% of 4.5 CI jobs: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5
*** Bug 1811648 has been marked as a duplicate of this bug. ***
Still happening, top flake in release-openshift-origin-installer-e2e-gcp-4.4: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-blocking#release-openshift-origin-installer-e2e-gcp-4.4&sort-by-flakiness=
Now that the test has been disabled, moving this to high and assigning it. This has to be investigated since it is a regression in behavior, and it may not be deferred from 4.5.
Note this is still failing in 4.4 and 4.3 and is likely a release blocker for those. This needs investigation in case we have regressed the product.
I think that the way the test counts the expected number of secrets is not deterministic, especially under moderate load. Please see https://github.com/openshift/origin/pull/24778#issuecomment-604947044
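To make the suspected race concrete, here is a small Python sketch. It assumes the test snapshots the number of secrets in the namespace, creates one secret of its own, and then expects usage to equal snapshot + 1. The timeline values are hypothetical and were picked only so that the result matches the "expected 6, actual 9" lines in the logs.

```python
def secrets_at(t, creation_times):
    """Number of secrets that exist at time t (seconds since namespace creation)."""
    return sum(1 for created in creation_times if created <= t)


# Hypothetical times at which controllers create secrets in the new namespace.
# Under load, the last three land after the test has already taken its snapshot.
controller_secrets = [0.5, 0.6, 0.7, 1.0, 1.2, 8.0, 8.5, 9.0]

snapshot_time = 2.0      # test counts the "default" secrets here
test_secret_time = 3.0   # test creates its own secret
check_time = 30.0        # test polls the quota status until timeout

baseline = secrets_at(snapshot_time, controller_secrets)
expected = baseline + 1
actual = secrets_at(check_time, controller_secrets + [test_secret_time])

print(f"resource secrets, expected {expected}, actual {actual}")
# prints: resource secrets, expected 6, actual 9
```

If every controller-created secret lands before the snapshot, expected and actual agree, which would explain why the test only flakes rather than failing consistently.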
This is likely due to a regression introduced by the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1765294. The controller that creates the image registry's pull secret can fall behind due to its client's (low) QPS. I am increasing the QPS for this controller in my OCM PR [1]. The fix for this needs to be backported to 4.3.z, since the offending code was added in the 4.3.8 release.

[1] https://github.com/openshift/openshift-controller-manager/pull/84
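As a rough illustration of why a low client QPS lets the controller fall behind, the Python sketch below models a work queue drained by a rate-limited client. All numbers are hypothetical, it ignores burst allowances, and it is not the controller's actual queueing code.

```python
def backlog_over_time(arrivals_per_sec, client_qps, seconds):
    """Simulate a queue drained by a client limited to client_qps requests/sec.

    Returns the number of pending items at the end of each second.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrivals_per_sec          # new work enqueued this second
        backlog -= min(backlog, client_qps)  # client can send at most client_qps
        history.append(backlog)
    return history


# Hypothetical load: 8 requests/sec needed while parallel e2e tests create namespaces.
print(backlog_over_time(arrivals_per_sec=8, client_qps=5, seconds=4))
# prints: [3, 6, 9, 12]
print(backlog_over_time(arrivals_per_sec=8, client_qps=20, seconds=4))
# prints: [0, 0, 0, 0]
```

While demand exceeds the limit the backlog grows linearly, so pull secrets for a given namespace show up later and later. Raising the QPS above the demand keeps the queue empty.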
*** Bug 1821755 has been marked as a duplicate of this bug. ***
The relevant PRs from this bug that need to be backported to 4.4/4.3 appear to be:

https://github.com/openshift/openshift-controller-manager/pull/84
https://github.com/openshift/origin/pull/24776
https://github.com/openshift/origin/pull/24816

(the other linked PRs were either reverts of prior bad changes or debug PRs)
> (the other linked PRs were either reverts of prior bad changes or debug PRs) So is that "https://github.com/openshift/origin/pull/24778 should be closed and/or unlinked from this bug"?
Moving this to MODIFIED - all relevant PRs have merged.
(In reply to Ben Parees from comment #15)
> The relevant PRs from this bug that need to be backported to 4.4/4.3 appear
> to be:
>
> https://github.com/openshift/openshift-controller-manager/pull/84
> https://github.com/openshift/origin/pull/24776
> https://github.com/openshift/origin/pull/24816

But then Adam removed 24776 but left 24754?
https://github.com/openshift/origin/pull/24776 is still open, and this bug is ON_QA, so must have been 24754 that needs backporting.
Hi @adam, I have the same question as @W. Trevor King: pr24776 is not related to the bug, right? I think pr24754 needs to be backported to 4.4 and 4.3.
Since pr24776 only adds a test for the bug, it does not affect verifying the bug. I checked the results from the last few days and the test already passes:

"[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

jobs:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-ocp-installer-e2e-aws-4.5
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409