Bug 2052058 - OSUpdateStaged CI tests failing unnecessarily on known CI infra defect
Summary: OSUpdateStaged CI tests failing unnecessarily on known CI infra defect
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Framework
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Devan Goodwin
QA Contact:
URL:
Whiteboard:
Depends On: 2052497
Blocks:
Reported: 2022-02-08 15:37 UTC by Devan Goodwin
Modified: 2022-08-26 15:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2052497
Environment:
Last Closed: 2022-08-26 15:21:07 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 26830 | 0 | None | open | Bug 2052058: Break out test for OSUpdateStaged event with no OSUpdateStarted | 2022-02-09 12:06:52 UTC

Description Devan Goodwin 2022-02-08 15:37:49 UTC
Description of problem:

The test "[bz-Machine Config Operator] Nodes should reach OSUpdateStaged in a timely fashion" is failing because OSUpdateStarted events that the openshift-tests watcher should have recorded are missing. We believe this is a defect somewhere in the disruption monitoring framework, but we are unsure where. The events do exist when queried at the end of the CI run in gather-extra/must-gather.


How reproducible:

Rare for any single run, but it happens several times a day across CI.

https://search.ci.openshift.org/?search=but+no+OSUpdateStarted+event+was+recorded&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job



To fix this, we will break up the test: the existing test will stop failing when these events are missing, and a new test will flake when this happens so we can track it clearly and separately.

Comment 2 Devan Goodwin 2022-02-10 18:10:54 UTC
For the purposes of this bug, the new test is live and the old test no longer fails on this problem. The next step is to actually fix the missing events, but as far as this bug is concerned we're good.

