Bug 2059299

Summary: [IBMCloud] CRB creation failures continue during pre-test step
Product: OpenShift Container Platform
Reporter: Christopher J Schaefer <cschaefe>
Component: Test Framework
Assignee: Christopher J Schaefer <cschaefe>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.10
CC: rvanderp
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-14 17:08:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2059717

Description Christopher J Schaefer 2022-02-28 16:55:04 UTC
Description of problem:
Prior to every test case in OCP Conformance on IBM Cloud, the test framework checks for the existence of a ClusterRoleBinding (CRB), e2e-node-attacher, and creates it if it is not found. The create intermittently fails with an AlreadyExists error because the CRB was already created by a concurrent test.

Version-Release number of selected component (if applicable):
openshift-tests version: v4.1.0+834adcb-4899-dirty


How reproducible:
About 80%

Steps to Reproduce:
1. Create a new IPI cluster on IBM Cloud
2. Run OCP Conformance on new cluster


Actual results:

openshift-tests version: v4.1.0+834adcb-4899-dirty
started: (0/1/2808) "[sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"

started: (0/2/2808) "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"

started: (0/3/2808) "[sig-cluster-lifecycle][Feature:Machines][Early] Managed cluster should have same number of Machines and Nodes [Suite:openshift/conformance/parallel]"

started: (0/4/2808) "[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]"

Feb 25 21:54:01.752: INFO: disruption/kube-api connection/new started responding to GET requests over new connections
Feb 25 21:54:02.125: INFO: disruption/openshift-api connection/new started responding to GET requests over new connections
Feb 25 21:54:02.125: INFO: disruption/oauth-api connection/new started responding to GET requests over new connections
Feb 25 21:54:02.125: INFO: disruption/kube-api connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.125: INFO: disruption/openshift-api connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.125: INFO: disruption/oauth-api connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.134: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.153: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/new started responding to GET requests over new connections
Feb 25 21:54:02.322: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/reused started responding to GET requests over reused connections
Feb 25 21:54:02.329: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new started responding to GET requests over new connections
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1489
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1489
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:61
[BeforeEach] [sig-auth][Feature:SCC][Early]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:140
STEP: Creating a kubernetes client
[AfterEach] [sig-auth][Feature:SCC][Early]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:138
[AfterEach] [sig-auth][Feature:SCC][Early]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/client.go:139
fail [github.com/openshift/origin/test/extended/util/ibmcloud/provider.go:60]: Unexpected error:
    <*errors.StatusError | 0xc00317c3c0>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "clusterrolebindings.rbac.authorization.k8s.io \"e2e-node-attacher\" already exists",
            Reason: "AlreadyExists",
            Details: {
                Name: "e2e-node-attacher",
                Group: "rbac.authorization.k8s.io",
                Kind: "clusterrolebindings",
                UID: "",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 409,
        },
    }
    clusterrolebindings.rbac.authorization.k8s.io "e2e-node-attacher" already exists
occurred

failed: (1.3s) 2022-02-25T21:54:02 "[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]"



Expected results:
No test failures caused by attempting to create a CRB that already exists


Additional info:
IBM Cloud is testing a fix that ignores the creation error when it is caused by the CRB already existing.

Comment 1 Christopher J Schaefer 2022-02-28 16:59:22 UTC
A mutex was recently added to attempt to handle this kind of error (multiple failures were seen during testing, for instance), and initial testing suggested the fix was working as intended.
https://github.com/openshift/origin/commit/74b277cc8bda12ae575262e3249050984c2a5fd7

However, recent testing shows that even with the mutex, the CRB may not be found by the existence check, yet the subsequent create fails because the CRB does already exist.

The plan is to inspect the creation error and, if it was caused by the CRB already existing, ignore it.

Comment 3 Devan Goodwin 2022-03-14 17:08:49 UTC
Looks like this should be resolved; please reopen if it is not fixed.