2033615 – [sig-node] Managed cluster should report ready nodes the entire duration of the test run [Late] [Suite:openshift/conformance/parallel]

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Test Framework
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Devan Goodwin
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2034370

Reported:	2021-12-17 12:19 UTC by Thomas Jungblut
Modified:	2021-12-21 21:04 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-20 19:07:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 25934	0	None	Merged	Bug 2033615: test: Nodes that are deleted should not fire the unready alert	2021-12-20 19:07:51 UTC

Description Thomas Jungblut 2021-12-17 12:19:38 UTC

[sig-node] Managed cluster should report ready nodes the entire duration of the test run [Late] [Suite:openshift/conformance/parallel]

is failing frequently in CI, see:
https://sippy.ci.openshift.org/sippy-ng/tests/4.7/analysis?test=%5Bsig-node%5D%20Managed%20cluster%20should%20report%20ready%20nodes%20the%20entire%20duration%20of%20the%20test%20run%20%5BLate%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D


This is affecting the 4.7 release quite a lot. Here are some recent failures:
* https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-aws-serial/1471691294148399104
* https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-aws-serial/1471547869029732352
* https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-aws-serial/1471462336106598400
* https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-aws-serial/1471394344035422208

According to Vadim, this is caused by "its apiserver racing and doesn't apply RBs for SCCs fast enough". 

Here's a Slack thread for more discussion: https://coreos.slack.com/archives/CJARLA942/p1639737113130300

Comment 2 W. Trevor King 2021-12-20 19:06:02 UTC

Even if we aren't going to address this on the product side, we don't want 4.7 release-blocking CI jobs failing consistently, so we need to address this somehow. David found [1], which improves the PromQL in 4.8, so we'll backport that to 4.7's origin suite.

[1]: https://github.com/openshift/origin/pull/25934
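
The title of the linked PR ("Nodes that are deleted should not fire the unready alert") suggests the check should skip nodes that no longer exist. Below is a minimal Python sketch of that filtering idea only. It is not the PromQL from the PR, and the names `UnreadySample` and `unready_nodes_to_report`, as well as the sample data, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class UnreadySample:
    """One observation of a node reporting NotReady (hypothetical shape)."""
    node: str
    timestamp: int  # seconds since the start of the test run


def unready_nodes_to_report(
    samples: Iterable[UnreadySample],
    existing_nodes: Set[str],
) -> List[str]:
    """Return the sorted names of nodes that were unready and still exist.

    Nodes that have been deleted (absent from existing_nodes) are ignored,
    so a node removed during the run does not fail the check.
    """
    return sorted({s.node for s in samples if s.node in existing_nodes})


if __name__ == "__main__":
    samples = [
        UnreadySample(node="node-a", timestamp=120),  # deleted during the run
        UnreadySample(node="node-b", timestamp=300),  # still part of the cluster
    ]
    # Only node-b still exists, so only node-b is reported.
    print(unready_nodes_to_report(samples, existing_nodes={"node-b", "node-c"}))
```

In this sketch a deleted node drops out of the result because membership in `existing_nodes` is checked before reporting; how the real test determines which nodes still exist is not described in this bug.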
