2059299 – [IBMCloud] CRB creation failures continue during pre-test step

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Test Framework
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Christopher J Schaefer
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2059717

Reported:	2022-02-28 16:55 UTC by Christopher J Schaefer
Modified:	2022-03-21 17:39 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-14 17:08:49 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 26864	0	None	open	Bug 2059299: IBMCloud: Ignore CRB already exists	2022-02-28 21:47:20 UTC

Description Christopher J Schaefer 2022-02-28 16:55:04 UTC

Description of problem:
Prior to every test case in OCP Conformance, on IBM Cloud, the existence of a ClusterRoleBinding (CRB), e2e

Version-Release number of selected component (if applicable):
openshift-tests version: v4.1.0+834adcb-4899-dirty


How reproducible:
About 80%

Steps to Reproduce:
1. Create a new IPI cluster on IBM Cloud
2. Run OCP Conformance on new cluster


Actual results:

openshift-tests version: v4.1.0+834adcb-4899-dirty
started: (0/1/2808) "[sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"

started: (0/2/2808) "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"

started: (0/3/2808) "[sig-cluster-lifecycle][Feature:Machines][Early] Managed cluster should have same number of Machines and Nodes [Suite:openshift/conformance/parallel]"

started: (0/4/2808) "[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]"

Feb 25 21:54:01.752: INFO: disruption/kube-api connection/new started responding to GET requests over new connections
Feb 25 21:54:02.125: INFO: disruption/openshift-api connection/new started responding to GET requests over new connections
Feb 25 21:54:02.125: INFO: disruption/oauth-api connection/new started responding to GET requests over new connections
Feb 25 21:54:02.125: INFO: disruption/kube-api connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.125: INFO: disruption/openshift-api connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.125: INFO: disruption/oauth-api connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.134: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.153: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/new started responding to GET requests over new connections
Feb 25 21:54:02.322: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.329: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new started responding to GET requests over new connections
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1489
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1489
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:61
[BeforeEach] [sig-auth][Feature:SCC][Early]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:140
STEP: Creating a kubernetes client
[AfterEach] [sig-auth][Feature:SCC][Early]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:138
[AfterEach] [sig-auth][Feature:SCC][Early]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:139
fail [github.com/openshift/origin/test/extended/util/ibmcloud/provider.go:60]: Unexpected error:
    <*errors.StatusError | 0xc00317c3c0>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "clusterrolebindings.rbac.authorization.k8s.io \"e2e-node-attacher\" already exists",
            Reason: "AlreadyExists",
            Details: {
                Name: "e2e-node-attacher",
                Group: "rbac.authorization.k8s.io",
                Kind: "clusterrolebindings",
                UID: "",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 409,
        },
    }
    clusterrolebindings.rbac.authorization.k8s.io "e2e-node-attacher" already exists
occurred

failed: (1.3s) 2022-02-25T21:54:02 "[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]"



Expected results:
No test errors caused by trying to create an existing CRB


Additional info:
IBM Cloud is testing a fix that ignores the creation error when it is caused by the CRB already existing.

Comment 1 Christopher J Schaefer 2022-02-28 16:59:22 UTC

A mutex was recently added to handle this kind of error (several were seen during testing, for instance).
Initial testing suggested that the fix was working as intended.
https://github.com/openshift/origin/commit/74b277cc8bda12ae575262e3249050984c2a5fd7

However, recent testing shows that even with the mutex, the lookup may report the CRB as not found, yet the subsequent attempt to create it fails because it already exists.

The plan is to check the creation error and, if it is caused by the CRB already existing, ignore it.

Comment 3 Devan Goodwin 2022-03-14 17:08:49 UTC

Looks like this should be resolved, please reopen if not fixed.