1923737 – openshift-tests should better support use of --count with early/late tests and support --fail-fast

Bug 1923737 - openshift-tests should better support use of --count with early/late tests and support --fail-fast

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Test Framework
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Devan Goodwin
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-02-01 18:23 UTC by Clayton Coleman
Modified:	2023-09-15 01:00 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-02-02 18:15:35 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 25837	0	None	closed	Bug 1923737: Add `--fail-fast` to the tests command	2021-02-17 11:33:17 UTC

Description Clayton Coleman 2021-02-01 18:23:08 UTC

In general we are leaning more heavily into iteration. Perform a number of cleanups on the test code (for porting to 4.6 as well) that unify some temporary hacks into a more principled structure.

--fail-fast should terminate cleanly when a test fails (and highlight the test that failed)

[Early] and [Late] should not be run multiple times with --count

[Early] and [Late] differ between jobs and should not overlap

SCC pre-test checks were extremely slow (polling every 15s) and should take no more than 30s in total, with much shorter check intervals

Cloud provider initialization should not occur unless needed

Disruptive tests should skip [Late] but not [Early]
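
The ordering rules in the list above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Go code in openshift/origin. The helper name `plan_runs`, the matching on name substrings, and the choice to keep a test tagged with both markers in the early bucket only are all assumptions made for illustration. It also assumes `count` is at least 1.

```python
def plan_runs(tests, count=1, disruptive=False):
    """Return the ordered list of test names for one invocation.

    Hypothetical helper, not part of openshift-tests:
    - names containing "[Early]" run once, before everything else
    - names containing "[Late]" run once, after everything else
    - all other tests are repeated `count` times (count >= 1 assumed)
    - a disruptive suite keeps "[Early]" but drops "[Late]"
    """
    early = [t for t in tests if "[Early]" in t]
    # Keep the buckets disjoint: a test already in `early` is not also late.
    late = [t for t in tests if "[Late]" in t and "[Early]" not in t]
    normal = [t for t in tests if "[Early]" not in t and "[Late]" not in t]
    if disruptive:
        late = []
    return early + normal * count + late


if __name__ == "__main__":
    names = ["[Early] setup check", "test a", "test b", "[Late] invariant check"]
    for name in plan_runs(names, count=2):
        print(name)
```

With `count=2`, the early and late tests each appear once in the plan while `test a` and `test b` appear twice.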

Comment 5 Jian Zhang 2021-03-04 10:06:42 UTC

1. `--count` works as expected: if it is `-1`, it runs forever.
[root@preserve-olm-env origin]# ./openshift-tests run all --dry-run|grep "OLM Report Upgradeable"|./openshift-tests run --count -1 -f -
openshift-tests version: v4.1.0-3659-g7c51c89
started: (0/1/1) "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"

passed: (3.8s) 2021-03-04T09:45:44 "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"

started: (0/2/2) "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"

passed: (2.8s) 2021-03-04T09:45:47 "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"

started: (0/3/3) "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"
...

But if `--count` is `-5`, it runs only once. It should run forever for any `--count` < 0.

[root@preserve-olm-env origin]# ./openshift-tests run all --dry-run|grep "OLM Report Upgradeable"|./openshift-tests run --count -5 -f -
openshift-tests version: v4.1.0-3659-g7c51c89
started: (0/1/1) "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"

passed: (2.8s) 2021-03-04T09:48:33 "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"


Timeline:

Mar 04 09:48:31.559 I ns/e2e-test-olm-23440-ksd65 namespace/e2e-test-olm-23440-ksd65 reason/CreatedSCCRanges created SCC ranges
Mar 04 09:48:31.699 - 1s    W ns/openshift-marketplace pod/qe-app-registry-r2ztw node/ip-10-0-179-41.us-east-2.compute.internal pod has been pending longer than a minute
Mar 04 09:48:31.699 - 1s    W ns/e2e-runtimeclass-7115 pod/test-runtimeclass-e2e-runtimeclass-7115-non-conflict-runti882v2 node/ip-10-0-140-208.us-east-2.compute.internal pod has been pending longer than a minute
Mar 04 09:48:31.699 - 1s    W ns/openshift-marketplace pod/qe-app-registry-2h64f node/ip-10-0-140-208.us-east-2.compute.internal pod has been pending longer than a minute

1 pass, 0 skip (2.8s)

If `--count` is `0`, it runs once. It should not run at all.

[root@preserve-olm-env origin]# ./openshift-tests run all --dry-run|grep "OLM Report Upgradeable"|./openshift-tests run --count 0 -f -
openshift-tests version: v4.1.0-3659-g7c51c89
started: (0/1/1) "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"

passed: (2.9s) 2021-03-04T09:47:51 "[sig-operator] an end user can use OLM Report Upgradeable in OLM ClusterOperators status [Suite:openshift/conformance/parallel]"


Timeline:

Mar 04 09:47:49.026 I ns/e2e-test-olm-23440-hwk9h namespace/e2e-test-olm-23440-hwk9h reason/CreatedSCCRanges created SCC ranges
Mar 04 09:47:49.134 - 999ms W ns/openshift-marketplace pod/qe-app-registry-2h64f node/ip-10-0-140-208.us-east-2.compute.internal pod has been pending longer than a minute
Mar 04 09:47:49.134 - 999ms W ns/e2e-runtimeclass-7115 pod/test-runtimeclass-e2e-runtimeclass-7115-non-conflict-runti882v2 node/ip-10-0-140-208.us-east-2.compute.internal pod has been pending longer than a minute
Mar 04 09:47:49.134 - 999ms W ns/openshift-marketplace pod/qe-app-registry-r2ztw node/ip-10-0-179-41.us-east-2.compute.internal pod has been pending longer than a minute

1 pass, 0 skip (2.9s)
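
The `--count` semantics requested in this comment (negative means forever, zero means no runs, positive means that many runs) can be summarized in a short sketch. This describes the commenter's expectation, not the observed behavior of the tool, and the function name `iterations` is hypothetical.

```python
import itertools


def iterations(count):
    """Return an iterator of 1-based run numbers for a given count.

    Hypothetical model of the expected semantics:
    - count < 0  -> unbounded (run forever)
    - count == 0 -> no runs
    - count > 0  -> exactly `count` runs
    """
    if count < 0:
        return itertools.count(1)
    return iter(range(1, count + 1))


if __name__ == "__main__":
    # Take only the first few values from the unbounded case.
    print(list(itertools.islice(iterations(-5), 3)))
    print(list(iterations(0)))
    print(list(iterations(2)))
```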


2. `--fail-fast` works as expected: it exits the suite as soon as any test fails.
[root@preserve-olm-env origin]# ./openshift-tests run all --dry-run|grep "\[sig-node\] RuntimeClass"
"[sig-node] RuntimeClass  should support RuntimeClasses API operations [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"
"[sig-node] RuntimeClass should reject a Pod requesting a RuntimeClass with an unconfigured handler [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"
"[sig-node] RuntimeClass should reject a Pod requesting a RuntimeClass with conflicting node selector [Disabled:Broken] [Suite:k8s]"
"[sig-node] RuntimeClass should reject a Pod requesting a deleted RuntimeClass [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"
"[sig-node] RuntimeClass should reject a Pod requesting a non-existent RuntimeClass [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"
"[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"
"[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with scheduling with taints [Serial]  [Disabled:Broken] [Suite:k8s]"
"[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with scheduling without taints  [Disabled:Broken] [Suite:k8s]"

[root@preserve-olm-env origin]# ./openshift-tests run all --dry-run|grep "\[sig-node\] RuntimeClass"|./openshift-tests run --fail-fast -f -
openshift-tests version: v4.1.0-3659-g7c51c89
started: (0/1/8) "[sig-node] RuntimeClass should reject a Pod requesting a RuntimeClass with conflicting node selector [Disabled:Broken] [Suite:k8s]"

started: (0/2/8) "[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with scheduling without taints  [Disabled:Broken] [Suite:k8s]"

started: (0/3/8) "[sig-node] RuntimeClass should reject a Pod requesting a non-existent RuntimeClass [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"

started: (0/4/8) "[sig-node] RuntimeClass  should support RuntimeClasses API operations [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

started: (0/5/8) "[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"

started: (0/6/8) "[sig-node] RuntimeClass should reject a Pod requesting a deleted RuntimeClass [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"

started: (0/7/8) "[sig-node] RuntimeClass should reject a Pod requesting a RuntimeClass with an unconfigured handler [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"

passed: (1.4s) 2021-03-04T09:20:59 "[sig-node] RuntimeClass should reject a Pod requesting a RuntimeClass with conflicting node selector [Disabled:Broken] [Suite:k8s]"
...
failed: (5m4s) 2021-03-04T09:26:01 "[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]"
skipped: (5m4s) 2021-03-04T09:26:01 "[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with scheduling without taints  [Disabled:Broken] [Suite:k8s]"

Timeline:
...
Failing tests:

[sig-node] RuntimeClass should run a Pod requesting a RuntimeClass with a configured handler [NodeFeature:RuntimeHandler] [Disabled:Broken] [Suite:k8s]

error: 1 fail, 5 pass, 1 skip (5m4s)

However, it should report 2 skipped test cases, so the results summary is incorrect.
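
One possible reading of the mismatch: 8 tests were selected, but the summary accounts for only 7 (1 fail, 5 pass, 1 skip), so a test that never started after the failure may have been left out of the totals. The sketch below shows an accounting that counts never-started tests as skipped. The function `summarize` and the result labels are hypothetical, and the assumption that the missing test never started is an inference from the log, which is partly elided.

```python
from collections import Counter


def summarize(selected, results):
    """Return (fail, pass, skip) totals for a fail-fast run.

    `selected` is the list of test names chosen for the run.
    `results` maps a test name to "pass", "fail", or "skip" for every
    test that was actually started. Tests that never started because
    the run stopped early are counted as skipped (an assumption).
    """
    counts = Counter(results.values())
    never_started = [t for t in selected if t not in results]
    counts["skip"] += len(never_started)
    return counts["fail"], counts["pass"], counts["skip"]


if __name__ == "__main__":
    selected = [f"test-{i}" for i in range(8)]
    results = {f"test-{i}": "pass" for i in range(5)}
    results["test-5"] = "fail"
    results["test-6"] = "skip"  # interrupted by fail-fast
    # test-7 never started
    print(summarize(selected, results))  # (1, 5, 2)
```

Under this accounting the three totals always add up to the number of selected tests.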

Comment 7 Steve Kuznetsov 2021-05-17 19:24:17 UTC

@clayton what is left here?

Comment 9 Red Hat Bugzilla 2023-09-15 01:00:10 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
