Bug 2052058

Summary: OSUpdateStaged CI tests failing unnecessarily on known CI infra defect
Product: OpenShift Container Platform Reporter: Devan Goodwin <dgoodwin>
Component: Test Framework Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.10 CC: vlaad
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2052497 Environment:
Last Closed: 2022-08-26 15:21:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2052497    
Bug Blocks:    

Description Devan Goodwin 2022-02-08 15:37:49 UTC
Description of problem:

The test "[bz-Machine Config Operator] Nodes should reach OSUpdateStaged in a timely fashion" is failing because OSUpdateStarted events that the openshift-tests watcher should have recorded are missing. We believe this is a defect somewhere in the disruption monitoring framework, but we are unsure where. The events do exist when queried at the end of the CI run in gather-extra/must-gather.


How reproducible:

Rare per run, but it happens several times a day.

https://search.ci.openshift.org/?search=but+no+OSUpdateStarted+event+was+recorded&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job



To fix this, we will break up the test: the existing test will stop failing when these events are missing, and a new test will flake when that happens so we can track the problem clearly and separately.
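
The split described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual openshift-tests code. The new test's name, the `NodeUpdate` and `Result` types, the `evaluate` helper, and the time limit are all hypothetical. It also assumes that a flake is reported as a failure plus a pass under the same test name, which this bug does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

TIMELY_TEST = "[bz-Machine Config Operator] Nodes should reach OSUpdateStaged in a timely fashion"
# Hypothetical name: the bug does not say what the new test is called.
MISSING_EVENT_TEST = "[hypothetical] OSUpdateStarted event should be recorded before OSUpdateStaged"


@dataclass
class NodeUpdate:
    node: str
    started: Optional[datetime]  # time of OSUpdateStarted, None if the watcher missed it
    staged: Optional[datetime]   # time of OSUpdateStaged, None if never observed


@dataclass
class Result:
    name: str
    passed: bool
    message: str = ""


def evaluate(updates: List[NodeUpdate], limit: timedelta) -> List[Result]:
    """Judge timing only where both events exist; report missing events separately."""
    slow, missing = [], []
    for u in updates:
        if u.staged is None:
            continue  # node never staged; out of scope for this sketch
        if u.started is None:
            missing.append(u.node)  # no longer counts against the timing test
        elif u.staged - u.started > limit:
            slow.append(u.node)

    results = [Result(TIMELY_TEST, not slow, ", ".join(slow))]
    if missing:
        # Assumed flake convention: a failure followed by a pass with the same name.
        msg = "reached OSUpdateStaged but no OSUpdateStarted event was recorded: " + ", ".join(missing)
        results.append(Result(MISSING_EVENT_TEST, False, msg))
    results.append(Result(MISSING_EVENT_TEST, True))
    return results
```

With this shape, a node whose OSUpdateStarted event was never recorded leaves the timing test passing and shows up only as a flake of the separate test, which is what makes the infra defect trackable on its own.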

Comment 2 Devan Goodwin 2022-02-10 18:10:54 UTC
For the purposes of this fix, the new test is live and the old one no longer fails on this problem. The next step will be to actually fix the missing events, but for the purposes of this bug we're good.