Bug 2031564

Summary: events should not repeat pathologically: RequiredInstallerResourcesMissing secrets: etcd-all-certs...
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Test Framework
Assignee: Dennis Periquet <dperique>
Status: CLOSED WONTFIX
QA Contact:
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.9
CC: dperique, sippy
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-04-30 18:04:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description W. Trevor King 2021-12-12 22:54:48 UTC
openshift-tests-upgrade.[sig-arch] events should not repeat pathologically

is failing frequently in some chained-update CI:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=RequiredInstallerResourcesMissing' | grep 'failures match' | sort
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-from-stable-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
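
For tracking these impact numbers over time, the summary lines can be turned into structured records. This is a minimal sketch in Python, assuming the line layout stays exactly as printed above; the names `LINE_RE` and `parse_impact` are hypothetical and not part of any CI tooling.

```python
import re

# Matches lines such as:
#   <job> (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
# The layout is assumed from the search output quoted in this report.
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)

def parse_impact(line):
    """Parse one 'failures match' line into a dict, or return None if it does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "job": m.group("job"),
        "runs": int(m.group("runs")),
        "failed_pct": int(m.group("failed")),
        "match_pct": int(m.group("match")),
        "impact_pct": int(m.group("impact")),
    }

example = (
    "  release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci "
    "(all) - 2 runs, 100% failed, 100% of failures match = 100% impact"
)
record = parse_impact(example)
print(record)
```

Lines that do not follow this layout yield None rather than raising, so the function can be applied to unfiltered output.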

Recent jobs:

  $ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&search=RequiredInstallerResourcesMissing' | jq -r 'keys[]'
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-from-stable-4.7-e2e-aws-upgrade/1469390397758246912
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1469699870255222784
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1470062262755528704
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469398450033397760
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224

Picking [1] to dig into:

  : [sig-arch] events should not repeat pathologically	0s
    1 events happened too frequently

    event happened 33 times, something is wrong: ns/openshift-etcd-operator deployment/etcd-operator - reason/RequiredInstallerResourcesMissing secrets: etcd-all-certs-3

Finding the events:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224/artifacts/e2e-aws-upgrade/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-etcd-operator" and .reason == "RequiredInstallerResourcesMissing" and .count > 10) | .metadata.creationTimestamp + " " + (.count | tostring) + " " + .reason + ": " + .message' | sort
  2021-12-11T20:21:03Z 19 RequiredInstallerResourcesMissing: configmaps: etcd-scripts,restore-etcd-pod, configmaps: etcd-metrics-proxy-client-ca-0,etcd-metrics-proxy-serving-ca-0,etcd-peer-client-ca-0,etcd-pod-0,etcd-serving-ca-0, secrets: etcd-all-peer-0,etcd-all-serving-0,etcd-all-serving-metrics-0
  2021-12-11T20:21:19Z 12 RequiredInstallerResourcesMissing: configmaps: etcd-scripts,restore-etcd-pod, configmaps: etcd-metrics-proxy-client-ca-1,etcd-metrics-proxy-serving-ca-1,etcd-peer-client-ca-1,etcd-pod-1,etcd-serving-ca-1, secrets: etcd-all-peer-1,etcd-all-serving-1,etcd-all-serving-metrics-1
  2021-12-11T20:44:35Z 33 RequiredInstallerResourcesMissing: secrets: etcd-all-certs-3
  2021-12-11T23:23:44Z 28 RequiredInstallerResourcesMissing: configmaps: etcd-endpoints-6
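
The same selection can be sketched in Python for anyone who prefers to post-process a downloaded events.json. This is a minimal sketch, assuming the file has the shape implied by the jq filter above (a top-level `items` list whose entries carry `metadata.namespace`, `metadata.creationTimestamp`, `reason`, `count`, and `message`). The function name `hot_events` is hypothetical, the `min_count` default of 10 only mirrors the jq expression and is not the test's own threshold, and the sample data is abbreviated: two entries are copied from the output above and two are made up to exercise the filter.

```python
def hot_events(events, namespace, reason, min_count=10):
    """Return sorted 'timestamp count reason: message' lines for frequently repeated events."""
    lines = []
    for item in events.get("items", []):
        meta = item.get("metadata", {})
        if meta.get("namespace") != namespace:
            continue
        if item.get("reason") != reason:
            continue
        # Events without a count are treated as 0 here (an assumption of this sketch).
        count = item.get("count") or 0
        if count <= min_count:
            continue
        timestamp = meta.get("creationTimestamp", "")
        lines.append(f"{timestamp} {count} {reason}: {item.get('message', '')}")
    return sorted(lines)

sample = {
    "items": [
        # Copied from the jq output in this report.
        {"metadata": {"namespace": "openshift-etcd-operator",
                      "creationTimestamp": "2021-12-11T23:23:44Z"},
         "reason": "RequiredInstallerResourcesMissing", "count": 28,
         "message": "configmaps: etcd-endpoints-6"},
        {"metadata": {"namespace": "openshift-etcd-operator",
                      "creationTimestamp": "2021-12-11T20:44:35Z"},
         "reason": "RequiredInstallerResourcesMissing", "count": 33,
         "message": "secrets: etcd-all-certs-3"},
        # Made-up entries: one below the cutoff, one in another namespace.
        {"metadata": {"namespace": "openshift-etcd-operator",
                      "creationTimestamp": "2021-12-11T20:50:00Z"},
         "reason": "RequiredInstallerResourcesMissing", "count": 3,
         "message": "hypothetical low-count event"},
        {"metadata": {"namespace": "hypothetical-namespace",
                      "creationTimestamp": "2021-12-11T20:51:00Z"},
         "reason": "RequiredInstallerResourcesMissing", "count": 50,
         "message": "hypothetical event elsewhere"},
    ]
}

for line in hot_events(sample, "openshift-etcd-operator", "RequiredInstallerResourcesMissing"):
    print(line)
```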

So that's pretty early.  Fitting 20:44:35Z into the updates:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .state + " " + .version'
  2021-12-11T23:23:08Z - Partial 4.10.0-0.ci-2021-12-11-061053
  2021-12-11T21:56:22Z 2021-12-11T23:22:59Z Completed 4.9.0-0.nightly-2021-12-09-104153
  2021-12-11T20:43:47Z 2021-12-11T21:56:14Z Completed 4.8.24
  2021-12-11T20:15:58Z 2021-12-11T20:42:44Z Completed 4.7.39
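
Matching a timestamp against the update history by eye gets tedious with longer chains, so the lookup can be sketched in Python. This is a minimal sketch, assuming history entries carry `startedTime`, an optional `completionTime`, and `version` as in the jq expression above; the helper names `parse_ts` and `update_in_progress` are hypothetical, and the history list is transcribed from the output above.

```python
from datetime import datetime, timezone

def parse_ts(value):
    """Parse a UTC timestamp such as 2021-12-11T20:44:35Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def update_in_progress(history, when):
    """Return the target version of the update running at `when`, or None if between updates."""
    moment = parse_ts(when)
    for entry in history:
        started = parse_ts(entry["startedTime"])
        completion = entry.get("completionTime")
        # An entry without completionTime is treated as still running (open-ended window).
        if moment >= started and (completion is None or moment <= parse_ts(completion)):
            return entry["version"]
    return None

history = [
    {"startedTime": "2021-12-11T23:23:08Z", "completionTime": None,
     "version": "4.10.0-0.ci-2021-12-11-061053"},
    {"startedTime": "2021-12-11T21:56:22Z", "completionTime": "2021-12-11T23:22:59Z",
     "version": "4.9.0-0.nightly-2021-12-09-104153"},
    {"startedTime": "2021-12-11T20:43:47Z", "completionTime": "2021-12-11T21:56:14Z",
     "version": "4.8.24"},
    {"startedTime": "2021-12-11T20:15:58Z", "completionTime": "2021-12-11T20:42:44Z",
     "version": "4.7.39"},
]

print(update_in_progress(history, "2021-12-11T20:44:35Z"))
```

For the 20:44:35Z event this selects the entry whose target version is 4.8.24, and a timestamp that falls in the gap between two updates returns None.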

So the hot missing-secret event started shortly after the 4.7.39 to 4.8.24 leg began.  I don't know whether this is an etcd issue, something more on the Kube-core side, or something else.  It might also be more widespread in 4.7/4.8 updates in general: only the 4.9+ origin test suites include this test, which may be why we only notice it in these longer update chains:

  origin$ git --no-pager grep 'repeat pathologically' origin/release-4.9
  origin/release-4.9:pkg/synthetictests/duplicated_events.go:     const testName = "[sig-arch] events should not repeat pathologically"
  origin$ git --no-pager grep 'repeat pathologically' origin/release-4.8
  ...no hits...

Assigning to Test Framework 4.9 to consider relaxing the test coverage, but we could also assign to the 4.7 or 4.9 components in charge of avoiding the RequiredInstallerResourcesMissing spew.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1469761023668916224

Comment 1 Dennis Periquet 2022-03-28 16:59:14 UTC
I have this PR to separate out this test: https://github.com/openshift/origin/pull/26936

Comment 5 Dennis Periquet 2022-07-20 21:10:54 UTC
I do not think this needs any doc for the release notes.

Comment 6 Rory Thrasher 2024-04-30 18:04:53 UTC
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary