openshift-tests-upgrade.[sig-arch] events should not repeat pathologically is failing frequently in some chained-update CI:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=RequiredInstallerResourcesMissing' | grep 'failures match' | sort
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-from-stable-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci (all) - 2 runs, 100% failed, 100% of failures match = 100% impact

Recent jobs:

  $ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&search=RequiredInstallerResourcesMissing' | jq -r 'keys[]'
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-from-stable-4.7-e2e-aws-upgrade/1469390397758246912
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1469699870255222784
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1470062262755528704
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469398450033397760
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224

Picking [1] to dig into:

  : [sig-arch] events should not repeat pathologically 0s
  1 events happened too frequently

  event happened 33 times, something is wrong: ns/openshift-etcd-operator deployment/etcd-operator - reason/RequiredInstallerResourcesMissing secrets: etcd-all-certs-3

Finding the events:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224/artifacts/e2e-aws-upgrade/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-etcd-operator" and .reason == "RequiredInstallerResourcesMissing" and .count > 10) | .metadata.creationTimestamp + " " + (.count | tostring) + " " + .reason + ": " + .message' | sort
  2021-12-11T20:21:03Z 19 RequiredInstallerResourcesMissing: configmaps: etcd-scripts,restore-etcd-pod, configmaps: etcd-metrics-proxy-client-ca-0,etcd-metrics-proxy-serving-ca-0,etcd-peer-client-ca-0,etcd-pod-0,etcd-serving-ca-0, secrets: etcd-all-peer-0,etcd-all-serving-0,etcd-all-serving-metrics-0
  2021-12-11T20:21:19Z 12 RequiredInstallerResourcesMissing: configmaps: etcd-scripts,restore-etcd-pod, configmaps: etcd-metrics-proxy-client-ca-1,etcd-metrics-proxy-serving-ca-1,etcd-peer-client-ca-1,etcd-pod-1,etcd-serving-ca-1, secrets: etcd-all-peer-1,etcd-all-serving-1,etcd-all-serving-metrics-1
  2021-12-11T20:44:35Z 33 RequiredInstallerResourcesMissing: secrets: etcd-all-certs-3
  2021-12-11T23:23:44Z 28 RequiredInstallerResourcesMissing: configmaps: etcd-endpoints-6

So that's pretty early.  Fitting 20:44:35Z into the updates:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .state + " " + .version'
  2021-12-11T23:23:08Z - Partial 4.10.0-0.ci-2021-12-11-061053
  2021-12-11T21:56:22Z 2021-12-11T23:22:59Z Completed 4.9.0-0.nightly-2021-12-09-104153
  2021-12-11T20:43:47Z 2021-12-11T21:56:14Z Completed 4.8.24
  2021-12-11T20:15:58Z 2021-12-11T20:42:44Z Completed 4.7.39

So the hot missing-secret event was from shortly after the 4.7.39 to 4.8.24 leg began.
I dunno if this is an etcd issue, or something more on the Kube-core side, or what.  Might also be something that's more widespread in 4.7/4.8 updates, because only 4.9+ origin test suites care about it, which may be why we only notice in these longer update chains:

  origin$ git --no-pager grep 'repeat pathologically' origin/release-4.9
  origin/release-4.9:pkg/synthetictests/duplicated_events.go: const testName = "[sig-arch] events should not repeat pathologically"
  origin$ git --no-pager grep 'repeat pathologically' origin/release-4.8
  ...no hits...

Assigning to Test Framework 4.9 about possibly relaxing the test coverage, but we could also assign to 4.7 or 4.9 components in charge of avoiding the RequiredInstallerResourcesMissing spew.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224
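
For anyone repeating the "fit the event timestamp into the update history" step on other runs, here is a minimal Python sketch. It assumes the ClusterVersion history rows are pasted in by hand from the clusterversion.json jq output in this report; HISTORY, parse, and leg_for are names made up for illustration and are not part of any OpenShift tooling.

```python
from datetime import datetime, timezone

# ClusterVersion history rows copied from the jq output in this report:
# (startedTime, completionTime or None if still in progress, state, version)
HISTORY = [
    ("2021-12-11T23:23:08Z", None, "Partial", "4.10.0-0.ci-2021-12-11-061053"),
    ("2021-12-11T21:56:22Z", "2021-12-11T23:22:59Z", "Completed", "4.9.0-0.nightly-2021-12-09-104153"),
    ("2021-12-11T20:43:47Z", "2021-12-11T21:56:14Z", "Completed", "4.8.24"),
    ("2021-12-11T20:15:58Z", "2021-12-11T20:42:44Z", "Completed", "4.7.39"),
]


def parse(ts):
    """Parse an RFC 3339 UTC timestamp of the form used in the artifacts."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def leg_for(event_ts, history=HISTORY):
    """Return (target version, seconds since that leg started) for the
    history entry in progress at event_ts, or None if the timestamp falls
    between entries or before the first one."""
    when = parse(event_ts)
    for started, completed, _state, version in history:
        start = parse(started)
        end = parse(completed) if completed else None
        if start <= when and (end is None or when <= end):
            return version, int((when - start).total_seconds())
    return None
```

With the rows above, leg_for("2021-12-11T20:44:35Z") returns ("4.8.24", 48), so the etcd-all-certs-3 event's creationTimestamp lands 48 seconds into the 4.7.39 to 4.8.24 leg. Note that this only places the event's creation time; the count of 33 accumulates afterwards.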
I have this PR to separate out this test: https://github.com/openshift/origin/pull/26936
This Sippy query shows the test is now separated out: https://sippy.ci.openshift.org/sippy-ng/tests/4.11?filters=%7B%22items%22%3A%5B%7B%22id%22%3A1%2C%22columnField%22%3A%22%22%2C%22operatorValue%22%3A%22%22%2C%22value%22%3A%22%22%7D%2C%7B%22id%22%3A99%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22%20should%20not%20see%20excessive%20RequiredInstallerResourcesMissing%20secrets%22%7D%5D%7D
I do not think this needs any doc for the release notes.