Bug 2033379 - Prometheus is not highly available
Summary: Prometheus is not highly available
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Framework
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.z
Assignee: W. Trevor King
QA Contact:
Depends On: 2033378
Reported: 2021-12-16 16:09 UTC by OpenShift BugZilla Robot
Modified: 2021-12-20 20:32 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-12-20 20:32:42 UTC
Target Upstream Version:

Links
System: Github
ID: openshift origin pull 26698
Private: 0
Priority: None
Status: Merged
Summary: bug 2033379: [release-4.8] remove perma-failing prometheus upgrade invariant
Last Updated: 2021-12-20 20:25:41 UTC

Comment 4 Devan Goodwin 2021-12-20 13:57:54 UTC
QE is looking for help verifying; I assume you are their best bet, @wking.

Comment 5 W. Trevor King 2021-12-20 20:32:42 UTC
Repeating the query from [1], but with a reduced maxAge because [2] landed in 4.8 four days ago:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=Watchdog+alert+had+missing+intervals&maxAge=72h&type=junit' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-single-node (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade-single-node (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-uwm (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 9 runs, 78% failed, 14% of failures match = 11% impact

So that's... better...  Poking at one of the single-node hits [3]:

  INFO[2021-12-19T22:40:01Z] Resolved release initial to registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-12-10-211525 
  INFO[2021-12-19T22:40:01Z] Resolved release latest to registry.ci.openshift.org/ocp/release:4.8.0-0.ci-2021-12-11-001048 

No idea why they're still running jobs between those older nightlies, but it makes sense to me that jobs whose target release doesn't contain the fix will still be impacted. I'll optimistically close this as CURRENTRELEASE based on the reduction in hit volume, and we can open a new series or come back to this run if we are bothered by this test case going forward.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2030539#c0
[2]: https://github.com/openshift/origin/pull/26698#event-5781227260
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-single-node/1472698236211826688#1:build-log.txt%3A4
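
For anyone re-checking the hit volume later, the 'failures match' summary lines quoted above follow a regular pattern and can be parsed mechanically. The following is a minimal sketch, not part of the CI search tooling: the names parse_summary and estimated_impact are hypothetical, and the assumption that impact is the failure rate multiplied by the match rate is inferred only from the five lines in this comment. Because the displayed percentages are already rounded, a recomputed value may differ from the displayed one by a point.

```python
import re

# Pattern inferred from the summary lines quoted in this comment; the real
# search output may include variants this sketch does not handle.
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_summary(line):
    """Parse one 'failures match' line into a dict, or return None if it does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "job": m["job"],
        "runs": int(m["runs"]),
        "failed_pct": int(m["failed"]),
        "match_pct": int(m["match"]),
        "impact_pct": int(m["impact"]),
    }


def estimated_impact(failed_pct, match_pct):
    """Assumed relationship: impact = failure rate x match rate, rounded to a whole percent."""
    return round(failed_pct * match_pct / 100)


if __name__ == "__main__":
    sample = (
        "periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade "
        "(all) - 9 runs, 78% failed, 14% of failures match = 11% impact"
    )
    row = parse_summary(sample)
    print(row["job"], row["impact_pct"], estimated_impact(row["failed_pct"], row["match_pct"]))
```

Piping the grep output through parse_summary and sorting by impact_pct would make it easier to compare two query windows, for example before and after [2] landed.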
