Bug 2022797

Summary: e2e-metal-ipi-ovn-ipv6 failing TestAllowedSCCViaRBAC
Product: OpenShift Container Platform
Component: Test Infrastructure
Reporter: Derek Higgins <derekh>
Assignee: Derek Higgins <derekh>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Version: 4.10
CC: wking
Keywords: Triaged
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 23:27:06 UTC
Type: Bug
Bug Blocks: 2024659

Description Derek Higgins 2021-11-12 15:32:53 UTC
We're seeing occasional failures of the test
"[sig-auth][Feature:SecurityContextConstraints]  TestAllowedSCCViaRBAC [Suite:openshift/conformance/parallel]"

in the job 4.10-e2e-metal-ipi-ovn-ipv6. The test fails with the following error:

[BeforeEach] [sig-auth][Feature:SecurityContextConstraints]
  github.com/openshift/origin/test/extended/util/client.go:116
Nov 11 02:19:52.233: INFO: configPath is now "/tmp/configfile3736488686"
Nov 11 02:19:52.233: INFO: The user is now "e2e-test-scc-w6h76-user"
Nov 11 02:19:52.233: INFO: Creating project "e2e-test-scc-w6h76"
Nov 11 02:19:52.638: INFO: Waiting on permissions in project "e2e-test-scc-w6h76" ...
Nov 11 02:19:52.720: INFO: Waiting for ServiceAccount "default" to be provisioned...
Nov 11 02:19:52.902: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
Nov 11 02:19:53.086: INFO: Waiting for ServiceAccount "builder" to be provisioned...
Nov 11 02:19:53.270: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
Nov 11 02:19:53.433: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
Nov 11 02:19:53.597: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
Nov 11 02:19:54.362: INFO: Project "e2e-test-scc-w6h76" has been fully provisioned.
[It] TestAllowedSCCViaRBAC [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/security/scc.go:91
Nov 11 02:19:54.369: INFO: Creating project "e2e-test-scc-8fx79"
Nov 11 02:19:54.861: INFO: Waiting on permissions in project "e2e-test-scc-8fx79" ...
W1111 02:20:12.803936   57093 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostPID=true)
[AfterEach] [sig-auth][Feature:SecurityContextConstraints]
  github.com/openshift/origin/test/extended/util/client.go:140
STEP: Collecting events from namespace "e2e-test-scc-w6h76".
STEP: Found 1 events.
Nov 11 02:20:13.234: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for test3: { } FailedScheduling: 0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Nov 11 02:20:13.319: INFO: POD    NODE  PHASE    GRACE  CONDITIONS
Nov 11 02:20:13.319: INFO: test3        Pending         [{PodScheduled False 0001-01-01 00:00:00 +0000 UTC 2021-11-11 02:20:12 +0000 UTC Unschedulable 0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.}]
Nov 11 02:20:13.319: INFO:
Nov 11 02:20:13.563: INFO: skipping dumping cluster info - cluster too large
Nov 11 02:20:13.903: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-scc-w6h76-user}, err: <nil>

Comment 1 Derek Higgins 2021-11-12 15:35:13 UTC
In tests that fail, the following line is common:
W1112 12:20:13.512073 1825198 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostPID=true)



The problem with the job appears to come from the test "[sig-cli] oc adm cluster-role-reapers [Suite:openshift/conformance/parallel]".
This prune command https://github.com/openshift/origin/blob/master/test/extended/cli/admin.go#L355
        o.Expect(oc.Run("adm", "prune", "auth").Args("clusterrole/edit").Execute()).To(o.Succeed())
prunes out the rolebinding for the other test that is running at the same time. This could be causing more than just the TestAllowedSCCViaRBAC test to fail.
I'm thinking we move it to be a serial test.
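To illustrate the suspected interaction, here is a minimal Go sketch of a cluster-wide prune removing a binding that a concurrently running test depends on. It is a toy in-memory model, not the origin test code or the real `oc adm prune auth` implementation: the `RoleBinding` type, `pruneAuth`, `hasBinding`, and the binding names are hypothetical, and it assumes the SCC test relies on a binding to clusterrole/edit in its project.

```go
package main

import "fmt"

// RoleBinding is a simplified, hypothetical stand-in for an RBAC RoleBinding.
type RoleBinding struct {
	Namespace string
	Name      string
	RoleRef   string // e.g. "clusterrole/edit"
}

// pruneAuth models a cluster-wide prune: it drops every binding that
// references roleRef, regardless of namespace, and returns the survivors.
func pruneAuth(bindings []RoleBinding, roleRef string) []RoleBinding {
	kept := make([]RoleBinding, 0, len(bindings))
	for _, b := range bindings {
		if b.RoleRef != roleRef {
			kept = append(kept, b)
		}
	}
	return kept
}

// hasBinding reports whether namespace still has a binding to roleRef.
func hasBinding(bindings []RoleBinding, namespace, roleRef string) bool {
	for _, b := range bindings {
		if b.Namespace == namespace && b.RoleRef == roleRef {
			return true
		}
	}
	return false
}

func main() {
	// Assumed state created by the SCC test in its own project
	// (binding names are illustrative only).
	bindings := []RoleBinding{
		{Namespace: "e2e-test-scc-w6h76", Name: "edit", RoleRef: "clusterrole/edit"},
		{Namespace: "e2e-test-scc-w6h76", Name: "image-pullers", RoleRef: "clusterrole/image-puller"},
	}

	// The cluster-role-reapers test runs in parallel and prunes clusterrole/edit.
	bindings = pruneAuth(bindings, "clusterrole/edit")

	// The SCC test's binding is gone even though it never touched it.
	fmt.Println(hasBinding(bindings, "e2e-test-scc-w6h76", "clusterrole/edit")) // prints "false"
}
```

Because the prune is not scoped to the pruning test's own namespace, running it alongside other tests can invalidate their setup, which is why running it serially looks like the safer option.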

Comment 2 Derek Higgins 2021-11-18 16:59:11 UTC
Based on anecdotal evidence (recent CI results) this appears to have helped reliability, marking as verified.

Comment 3 W. Trevor King 2022-03-10 23:27:06 UTC
targets 4.10, not attached to errata, seems happy based on comment 2 -> moving to CURRENTRELEASE.