Bug 1997509 - flake: [sig-cli] oc builds new-build [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Filip Krepinsky
QA Contact: zhou ying
URL:
Whiteboard: tag-ci
Depends On:
Blocks:
Reported: 2021-08-25 12:26 UTC by Dan Winship
Modified: 2022-03-11 18:15 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-11 18:15:11 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 26442 | 0 | None | None | None | 2021-09-02 10:45:09 UTC

Description Dan Winship 2021-08-25 12:26:32 UTC
For example, in https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-ovn-dualstack/1430337026187595776

logs show:

Aug 25 02:31:43.742: INFO: Running 'oc --namespace=e2e-test-oc-builds-v66z8 --kubeconfig=/root/dev-scripts/ocp/ostest/auth/kubeconfig delete all --all'
Aug 25 02:31:44.029: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-oc-builds-v66z8 --kubeconfig=/root/dev-scripts/ocp/ostest/auth/kubeconfig delete all --all:
StdOut>
error: the server doesn't have a resource type "noxus"
StdErr>
error: the server doesn't have a resource type "noxus"

error: the server doesn't have a resource type "noxus"


It seems that the "oc delete all" is racing with the tests in k8s.io/kubernetes/test/e2e/apimachinery/, which temporarily add a CRD named "noxus" and then later remove it. So the type exists when "oc" enumerates the set of all types, but not a few moments later when it tries to enumerate the objects of that type in the current namespace.
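The race described above can be reproduced in miniature. The following Go sketch is only an illustration: `fakeServer`, `discover`, `list`, and `deleteAll` are hypothetical names, not part of oc or client-go, and the real command talks to the API server rather than to an in-memory map.

```go
package main

import (
	"fmt"
	"sort"
)

// fakeServer is a hypothetical in-memory stand-in for the API server.
type fakeServer struct {
	types map[string]bool
}

// discover returns a sorted snapshot of the resource types registered right now.
func (s *fakeServer) discover() []string {
	names := make([]string, 0, len(s.types))
	for name := range s.types {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// list fails if the resource type is no longer registered.
func (s *fakeServer) list(resource string) error {
	if !s.types[resource] {
		return fmt.Errorf("the server doesn't have a resource type %q", resource)
	}
	return nil
}

// deleteAll mimics a two-phase flow: snapshot the types first, then visit
// each one. between runs after the snapshot to simulate a concurrent test.
func deleteAll(s *fakeServer, between func()) error {
	snapshot := s.discover()
	between()
	for _, resource := range snapshot {
		if err := s.list(resource); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	s := &fakeServer{types: map[string]bool{"builds": true, "noxus": true}}
	// Another test removes its temporary CRD between the two phases.
	err := deleteAll(s, func() { delete(s.types, "noxus") })
	fmt.Println("error:", err)
}
```

The snapshot still contains "noxus" when the loop reaches it, so the call fails even though nothing in the test's own namespace is wrong.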

It looks like a handful of tests in test/extended/cli/builds.go, test/extended/cli/admin.go, test/extended/builds/pipeline_jenkins_e2e.go, and test/extended/builds/pipeline_origin_bld.go use "oc delete all". In most of those cases it deletes everything immediately before the end of the test case, which is unnecessary anyway. The "new-build" test in test/extended/cli/builds.go should probably be split into multiple test cases, and then you could lose the "oc delete all"s there too.

(I'm not sure if it makes sense to consider this a bug in "oc delete all"?)

Comment 1 Maciej Szulik 2021-09-01 10:28:30 UTC
Filip, the main issue here is `error: the server doesn't have a resource type "noxus"`
I'm not sure what added that type to the all alias, but that's clearly wrong; nothing like that should ever exist
in openshift. My suggestion for a fix is to explicitly call out what we want to remove instead of using all.

Comment 2 Filip Krepinsky 2021-09-02 10:53:10 UTC
I have posted a fix which deletes only the necessary resources instead of all as suggested.

I have not touched the other *.go tests since all of them use a label selector and thus shouldn't cause these problems.

There are multiple *.sh tests that use oc delete all, but there are too many usages for a simple fix. @Maciej, should I focus on these as well, or leave them for the rewrite?

Comment 3 Filip Krepinsky 2021-09-02 11:00:37 UTC
> test/extended/cli/builds.go should probably be split into multiple test cases anyway

We have decided against that, to limit the number of test cases.

> (I'm not sure if it makes sense to consider this a bug in "oc delete all"?)

To me this seems like acceptable behaviour: you get an error because a resource went missing between the start and the end of the execution. "all" is simply a category that gets expanded to all of its resources, so there probably shouldn't be a difference between specifying a resource explicitly and using the all category.
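The category expansion mentioned here can be sketched as follows. This is a simplified model with hypothetical names (`resource`, `expandCategory`), not the actual oc implementation. It relies on the fact that a CRD can declare categories in its spec, which would explain how a temporary type such as "noxus" ends up in the expanded set; that the "noxus" CRD does so is an assumption based on the error above.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a hypothetical, simplified view of a discovered API resource.
type resource struct {
	name       string
	categories []string
}

// expandCategory returns the sorted names of all discovered resources
// that list the given category.
func expandCategory(category string, discovered []resource) []string {
	var names []string
	for _, r := range discovered {
		for _, c := range r.categories {
			if c == category {
				names = append(names, r.name)
				break
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	discovered := []resource{
		{name: "builds", categories: []string{"all"}},
		{name: "secrets"}, // declares no category, so "all" skips it
		{name: "noxus", categories: []string{"all"}}, // assumed: temporary CRD opting into "all"
	}
	fmt.Println(expandCategory("all", discovered)) // prints [builds noxus]
}
```

In this model the result depends entirely on what discovery returns at that moment, which is why the expansion can include a type that disappears shortly afterwards.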

Comment 4 Filip Krepinsky 2021-09-02 11:09:26 UTC
> I have not touched the other *.go tests since all of them use a label selector and thus shouldn't cause these problems.

I take this back. The issue affects these tests as well; I will update the PR.

Comment 5 Maciej Szulik 2021-09-02 14:03:03 UTC
(In reply to Filip Krepinsky from comment #2)
> There are multiple *.sh tests that use oc delete all, but there is a lot of
> usages to do a simple fix. @Maciej should I focus on these as well, or leave
> them for the rewrite?

I think it's reasonable to be explicit.

Comment 6 Filip Krepinsky 2021-09-03 11:23:52 UTC
I agree, although this bug doesn't affect these *.sh tests since they are run in serial mode. I would prefer to merge only the broken parts and postpone the refactoring for later, to focus on higher priority tasks and because it could introduce breaking changes.

Comment 7 Maciej Szulik 2021-09-06 15:49:49 UTC
(In reply to Filip Krepinsky from comment #6)
> I agree, although this bug doesn't affect these *.sh tests since they are
> run in serial mode. I would prefer to merge only the broken parts and
> postpone the refactoring for later, to focus on higher priority tasks and
> because it could introduce breaking changes.

SGTM

