Bug 2031564 - events should not repeat pathologically: RequiredInstallerResourcesMissing secrets: etcd-all-certs...
Summary: events should not repeat pathologically: RequiredInstallerResourcesMissing se...
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Framework
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Dennis Periquet
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-12 22:54 UTC by W. Trevor King
Modified: 2022-07-20 21:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System: Github
ID: openshift origin pull 26936
Private: 0
Priority: None
Status: open
Summary: Bug 2031564: Separate out test for 'RequiredInstallerResourcesMissing secrets'
Last Updated: 2022-04-12 11:27:26 UTC

Description W. Trevor King 2021-12-12 22:54:48 UTC
openshift-tests-upgrade.[sig-arch] events should not repeat pathologically

is failing frequently in some chained-update CI:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=RequiredInstallerResourcesMissing' | grep 'failures match' | sort
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-from-stable-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci (all) - 2 runs, 100% failed, 100% of failures match = 100% impact

Recent jobs:

  $ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&search=RequiredInstallerResourcesMissing' | jq -r 'keys[]'
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-from-stable-4.7-e2e-aws-upgrade/1469390397758246912
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1469699870255222784
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1470062262755528704
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469398450033397760
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224
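
The artifact URLs used further down in this report appear to follow from these Prow URLs by swapping the host prefix and appending an artifacts path. A minimal Python sketch of that mapping follows. It assumes the pattern seen for the one run examined here holds for other runs, and the function name artifact_url and its step and filename parameters are hypothetical, not part of any CI tooling:

```python
PROW_PREFIX = "https://prow.ci.openshift.org/view/gs/"
GCSWEB_PREFIX = "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"


def artifact_url(prow_url, step, filename):
    """Map a Prow job URL to a gcsweb artifact URL.

    Assumption: the layout <run>/artifacts/<step>/<filename> is inferred
    from a single run in this report and may differ for other jobs.
    """
    if not prow_url.startswith(PROW_PREFIX):
        raise ValueError("not a Prow view URL: " + prow_url)
    # Keep the bucket/job/run-id part and drop any trailing slash.
    run_path = prow_url[len(PROW_PREFIX):].rstrip("/")
    return "{}{}/artifacts/{}/{}".format(GCSWEB_PREFIX, run_path, step, filename)
```

For the run picked below, the step would be e2e-aws-upgrade and the filename events.json or clusterversion.json.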

Picking [1] to dig into:

  : [sig-arch] events should not repeat pathologically	0s
    1 events happened too frequently

    event happened 33 times, something is wrong: ns/openshift-etcd-operator deployment/etcd-operator - reason/RequiredInstallerResourcesMissing secrets: etcd-all-certs-3

Finding the events:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224/artifacts/e2e-aws-upgrade/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-etcd-operator" and .reason == "RequiredInstallerResourcesMissing" and .count > 10) | .metadata.creationTimestamp + " " + (.count | tostring) + " " + .reason + ": " + .message' | sort
  2021-12-11T20:21:03Z 19 RequiredInstallerResourcesMissing: configmaps: etcd-scripts,restore-etcd-pod, configmaps: etcd-metrics-proxy-client-ca-0,etcd-metrics-proxy-serving-ca-0,etcd-peer-client-ca-0,etcd-pod-0,etcd-serving-ca-0, secrets: etcd-all-peer-0,etcd-all-serving-0,etcd-all-serving-metrics-0
  2021-12-11T20:21:19Z 12 RequiredInstallerResourcesMissing: configmaps: etcd-scripts,restore-etcd-pod, configmaps: etcd-metrics-proxy-client-ca-1,etcd-metrics-proxy-serving-ca-1,etcd-peer-client-ca-1,etcd-pod-1,etcd-serving-ca-1, secrets: etcd-all-peer-1,etcd-all-serving-1,etcd-all-serving-metrics-1
  2021-12-11T20:44:35Z 33 RequiredInstallerResourcesMissing: secrets: etcd-all-certs-3
  2021-12-11T23:23:44Z 28 RequiredInstallerResourcesMissing: configmaps: etcd-endpoints-6
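
For anyone without jq handy, the filter above can be approximated in Python. This is a sketch that mirrors the jq expression, not the test's own logic. The function name hot_events is hypothetical, and the sample data is a tiny stand-in for events.json in which the second item is made up to show the count filter:

```python
def hot_events(events, namespace, reason, min_count=10):
    """Return sorted summary lines for events repeated more than min_count times."""
    lines = []
    for item in events.get("items", []):
        meta = item.get("metadata", {})
        if (
            meta.get("namespace") == namespace
            and item.get("reason") == reason
            and (item.get("count") or 0) > min_count  # a missing count never matches
        ):
            lines.append(
                "{} {} {}: {}".format(
                    meta.get("creationTimestamp", ""),
                    item["count"],
                    item["reason"],
                    item.get("message", ""),
                )
            )
    return sorted(lines)


# Stand-in for a parsed events.json; the first item is from this run,
# the second is hypothetical and only there to be filtered out.
sample = {
    "items": [
        {
            "metadata": {
                "namespace": "openshift-etcd-operator",
                "creationTimestamp": "2021-12-11T20:44:35Z",
            },
            "reason": "RequiredInstallerResourcesMissing",
            "count": 33,
            "message": "secrets: etcd-all-certs-3",
        },
        {
            "metadata": {
                "namespace": "openshift-etcd-operator",
                "creationTimestamp": "2021-12-11T20:50:00Z",
            },
            "reason": "RequiredInstallerResourcesMissing",
            "count": 3,
            "message": "secrets: example-secret-0",
        },
    ]
}

for line in hot_events(sample, "openshift-etcd-operator", "RequiredInstallerResourcesMissing"):
    print(line)
```

With a real download, the events argument would be the result of json.load on the events.json file.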

So the 33-count etcd-all-certs-3 event, at 20:44:35Z, is pretty early in the run.  Fitting 20:44:35Z into the updates:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .state + " " + .version'
  2021-12-11T23:23:08Z - Partial 4.10.0-0.ci-2021-12-11-061053
  2021-12-11T21:56:22Z 2021-12-11T23:22:59Z Completed 4.9.0-0.nightly-2021-12-09-104153
  2021-12-11T20:43:47Z 2021-12-11T21:56:14Z Completed 4.8.24
  2021-12-11T20:15:58Z 2021-12-11T20:42:44Z Completed 4.7.39
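
Matching an event timestamp against the history above can also be done mechanically. A minimal Python sketch follows, with the history rows copied from the output above. The helper names parse_ts and find_leg are hypothetical, and treating an entry with no completion time as still in progress is an assumption:

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a UTC timestamp such as 2021-12-11T20:44:35Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def find_leg(history, ts):
    """Return the version whose update window contains ts, or None.

    history is a list of (startedTime, completionTime or None, version).
    Assumption: an entry without a completion time is still in progress.
    """
    when = parse_ts(ts)
    for started, completed, version in history:
        if parse_ts(started) <= when and (
            completed is None or when <= parse_ts(completed)
        ):
            return version
    return None


history = [
    ("2021-12-11T23:23:08Z", None, "4.10.0-0.ci-2021-12-11-061053"),
    ("2021-12-11T21:56:22Z", "2021-12-11T23:22:59Z", "4.9.0-0.nightly-2021-12-09-104153"),
    ("2021-12-11T20:43:47Z", "2021-12-11T21:56:14Z", "4.8.24"),
    ("2021-12-11T20:15:58Z", "2021-12-11T20:42:44Z", "4.7.39"),
]

print(find_leg(history, "2021-12-11T20:44:35Z"))  # prints 4.8.24
```

A timestamp that falls in the gap between two entries returns None.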

So the hot missing-secret event was from shortly after the 4.7.39 to 4.8.24 leg began.  I don't know whether this is an etcd issue, something more on the Kube-core side, or something else.  It might also be more widespread in 4.7/4.8 updates and only noticed in these longer update chains, because only the 4.9+ origin test suites check for it:

  origin$ git --no-pager grep 'repeat pathologically' origin/release-4.9
  origin/release-4.9:pkg/synthetictests/duplicated_events.go:     const testName = "[sig-arch] events should not repeat pathologically"
  origin$ git --no-pager grep 'repeat pathologically' origin/release-4.8
  ...no hits...

Assigning to Test Framework 4.9 to consider relaxing the test coverage, but we could also assign to the 4.7 or 4.9 components in charge of avoiding the RequiredInstallerResourcesMissing spew.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224

Comment 1 Dennis Periquet 2022-03-28 16:59:14 UTC
I have this PR to separate out this test: https://github.com/openshift/origin/pull/26936

Comment 5 Dennis Periquet 2022-07-20 21:10:54 UTC
I do not think this needs any doc for the release notes.
