Bug 2092961

Summary: Disruption tests fire when CI cluster itself experiences network disruption
Product: OpenShift Container Platform Reporter: Devan Goodwin <dgoodwin>
Component: Test Framework Assignee: OpenShift Release Oversight <openshift-release-oversight>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.11 CC: openshift-release-oversight
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-11-21 19:36:05 UTC Type: Bug

Description Devan Goodwin 2022-06-02 15:45:08 UTC
TRT has identified scenarios where the CI cluster running the tests (as opposed to the cluster under test) sometimes loses networking, causing disruption tests to fire. We noticed this when several aggregated job runs across clouds and platforms all logged disruption at exactly the same time.

To help, we should add a new disruption backend that hits an external service. If we see that service go down at the same time as the cluster itself, we know it's not real disruption.
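
A rough sketch of the comparison this would enable, assuming disruption is recorded as start/end intervals per backend. The names `Interval` and `split_disruption` are hypothetical and are not taken from the actual test framework; the overlap rule is one plausible way to do the comparison, not necessarily what the fix implements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Interval:
    """A single window of observed disruption (hypothetical type)."""
    start: datetime
    end: datetime

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open windows overlap when each starts before the other ends.
        return self.start < other.end and other.start < self.end


def split_disruption(cluster, external):
    """Separate cluster disruption into (real, suspect) lists.

    An interval is 'suspect' when the external-service backend was also
    disrupted during it, which points at the CI cluster's own network
    rather than the cluster under test.
    """
    real, suspect = [], []
    for window in cluster:
        if any(window.overlaps(ext) for ext in external):
            suspect.append(window)
        else:
            real.append(window)
    return real, suspect


if __name__ == "__main__":
    t0 = datetime(2022, 6, 2, 15, 0, 0)
    at = lambda a, b: Interval(t0 + timedelta(seconds=a), t0 + timedelta(seconds=b))
    real, suspect = split_disruption([at(0, 5), at(60, 70)], [at(62, 65)])
    print("real:", real)
    print("suspect:", suspect)
```

Only the intervals in `real` would then count against the cluster under test; the `suspect` ones could be reported separately for investigation.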

Comment 1 Devan Goodwin 2022-06-03 14:19:26 UTC
This is already failing too often, and the failures do not even correspond to observed disruption in the cluster under test. We're not sure what's going on, so we're going to allow up to 10 minutes before failing the test. It will flake if we see any, so we can gather data and compare.
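
The relaxed threshold could look roughly like the sketch below. It assumes the 10 minutes refers to cumulative disruption seen by the external-service backend and that exactly 10 minutes still counts as a flake. Both are guesses, and `evaluate` and `ALLOWED_DISRUPTION` are hypothetical names rather than identifiers from the test framework.

```python
from datetime import timedelta

# Assumption: the allowance applies to total disruption across the whole run.
ALLOWED_DISRUPTION = timedelta(minutes=10)


def evaluate(total_disruption: timedelta,
             allowed: timedelta = ALLOWED_DISRUPTION) -> str:
    """Map total observed disruption to a test outcome."""
    if total_disruption <= timedelta(0):
        return "pass"   # no disruption observed
    if total_disruption <= allowed:
        return "flake"  # some disruption: record it, but do not fail the job
    return "fail"       # more than the allowance


if __name__ == "__main__":
    for seconds in (0, 30, 601):
        print(seconds, evaluate(timedelta(seconds=seconds)))
```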