Description of problem:

I've broken this off from https://bugzilla.redhat.com/show_bug.cgi?id=1803956 because my triage to date has turned up enough discrepancies with the debug there that I think this is a separate issue.

Intermittently, some image-ecosystem e2e tests fail to create pods with an SCC-related error.

Latest incarnation: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/242/pull-ci-openshift-cluster-samples-operator-release-4.4-e2e-aws-image-ecosystem/5

One of the failures:

[image_ecosystem][Slow] openshift images should be SCL enabled returning s2i usage when running the image "centos/ruby-23-centos7" should print the usage [Suite:openshift]

fail [github.com/openshift/origin/test/extended/image_ecosystem/scl.go:38]: Unexpected error:
    <*errors.StatusError | 0xc0018caa00>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "pods \"test-pod-bae89b70-329c-48b1-b945-696d33657490\" is forbidden: unable to validate against any security context constraint: []",
            Reason: "Forbidden",
            Details: {
                Name: "test-pod-bae89b70-329c-48b1-b945-696d33657490",
                Group: "",
                Kind: "pods",
                UID: "",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 403,
        },
    }
    pods "test-pod-bae89b70-329c-48b1-b945-696d33657490" is forbidden: unable to validate against any security context constraint: []
occurred

Version-Release number of selected component (if applicable):
To date, I have only seen/noticed this on 4.4 e2e runs.

How reproducible:
Intermittent

Steps to Reproduce:
1.
2.
3.

Actual results:
The test pod is rejected with "unable to validate against any security context constraint: []" and the e2e test fails.

Expected results:
The test pod is admitted against an SCC and the e2e test passes.

Additional info:

In making another attempt to cross-reference this bug with the 4.4 e2e flakes and with Standa's analysis at https://bugzilla.redhat.com/show_bug.cgi?id=1803956#c10, I discovered that namespace_scc_allocation_controller.go was moved to github.com/openshift/cluster-policy-controller in 4.3.

When I look in that pod's logs, I see:

I0313 16:42:43.021643 1 cert_rotation.go:137] Starting client certificate rotation controller
I0313 16:42:43.023554 1 policy_controller.go:41] Starting controllers on 0.0.0.0:10357 (v0.0.0-unknown)
I0313 16:42:43.028080 1 standalone_apiserver.go:103] Started health checks at 0.0.0.0:10357
I0313 16:42:43.028252 1 leaderelection.go:242] attempting to acquire leader lease openshift-kube-controller-manager/cluster-policy-controller...
E0313 16:42:45.999535 1 leaderelection.go:331] error retrieving resource lock openshift-kube-controller-manager/cluster-policy-controller: configmaps "cluster-policy-controller" is forbidden: User "system:kube-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "openshift-kube-controller-manager"

So presumably it did not get far enough along to even hit the problem Standa saw.

Taking a stab at the "openshift-kube-controller-manager" pods, with the disclaimer that I've never looked at those before, it would seem there is some host-level pain.
I see variants in the different logs, but as an example from openshift-kube-controller-manager_kube-controller-manager-ip-10-0-143-82.us-west-2.compute.internal_kube-controller-manager.log:

W0313 16:43:32.713660 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="ip-10-0-146-44.us-west-2.compute.internal" does not exist
W0313 16:43:32.713676 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="ip-10-0-158-56.us-west-2.compute.internal" does not exist
W0313 16:43:32.713686 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="ip-10-0-135-140.us-west-2.compute.internal" does not exist
W0313 16:43:32.713695 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="ip-10-0-136-197.us-west-2.compute.internal" does not exist
W0313 16:43:32.713704 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="ip-10-0-140-104.us-west-2.compute.internal" does not exist
W0313 16:43:32.713717 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="ip-10-0-143-82.us-west-2.compute.internal" does not exist

Lastly, I asked in various Slack channels how to debug "unable to validate against any security context constraint" errors, and David Eads said that adding a dump of the SCCs on test failure would help. I have PR https://github.com/openshift/origin/pull/24703 up, but it is awaiting another round of review/approval.
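For reference, a minimal sketch of the kind of SCC dump being discussed, not the actual code in PR 24703. It assumes a recent client-go style List signature and that a rest.Config is already in hand (here built from the default kubeconfig purely for illustration), and it just lists every SecurityContextConstraints object so a failed run shows which SCCs existed when the pod was rejected:

package main

import (
    "context"
    "fmt"

    securityclient "github.com/openshift/client-go/security/clientset/versioned"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

// dumpSCCs lists all SecurityContextConstraints in the cluster and prints
// their names plus the users/groups they are granted to -- the context that
// is missing when a test only reports
// "unable to validate against any security context constraint: []".
func dumpSCCs(cfg *rest.Config) error {
    client, err := securityclient.NewForConfig(cfg)
    if err != nil {
        return err
    }
    sccs, err := client.SecurityV1().SecurityContextConstraints().List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        return err
    }
    fmt.Printf("found %d SCCs\n", len(sccs.Items))
    for _, scc := range sccs.Items {
        fmt.Printf("scc %q: users=%v groups=%v\n", scc.Name, scc.Users, scc.Groups)
    }
    return nil
}

func main() {
    // Illustration only: an e2e helper would reuse the rest.Config the test
    // framework already carries instead of loading the default kubeconfig.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    if err := dumpSCCs(cfg); err != nil {
        panic(err)
    }
}

Running something like this (or simply "oc get scc -o yaml") at the point of failure would at least confirm whether the expected SCCs had been created by the time the test pod was submitted.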
*** This bug has been marked as a duplicate of bug 1820687 ***