Bug 1961204

Summary: kube-controller-manager operator is slow to apply sa.scc.uid-range annotation
Product: OpenShift Container Platform Reporter: Vadim Rutkovsky <vrutkovs>
Component: kube-controller-manager Assignee: Filip Krepinsky <fkrepins>
Status: CLOSED DUPLICATE QA Contact: zhou ying <yinzhou>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.7 CC: aos-bugs, bleanhar, bparees, deads, dhellmann, ercohen, hekumar, itsoiref, lmohanty, mfojtik, pmali, sttts, wking, yliu1
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1955299 Environment:
job=release-openshift-origin-installer-e2e-aws-compact-4.7=all [sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel] job=release-openshift-origin-installer-e2e-aws-upgrade-4.4-to-4.5-to-4.6-to-4.7-ci=all
Last Closed: 2022-04-04 21:50:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1955299    
Bug Blocks:    

Description Vadim Rutkovsky 2021-05-17 13:45:36 UTC
+++ This bug was initially created as a clone of Bug #1955299 +++

job:
release-openshift-origin-installer-e2e-aws-compact-4.7 

has just started failing consistently in CI, see testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-e2e-aws-compact-4.7


sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-compact-4.7/1387145769915518976

specifically it looks like this test just started permfailing:
[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]


fail [github.com/openshift/origin/test/extended/authorization/scc.go:57]: 6 pods failed on SCC errors
Error creating: pods "cloud-credential-operator-744f659b74-" is forbidden: unable to validate against any security context constraint: [] for ReplicaSet.apps/v1/cloud-credential-operator-744f659b74 -n openshift-cloud-credential-operator happened 9 times
Error creating: pods "aws-ebs-csi-driver-controller-59b98974df-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[3].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[3].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[4].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[4].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[5].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[5].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used] for ReplicaSet.apps/v1/aws-ebs-csi-driver-controller-59b98974df -n openshift-cluster-csi-drivers happened 4 times

--- Additional comment from Stefan Schimanski on 2021-04-30 11:20:44 UTC ---

We are known to see some failures (compare https://github.com/openshift/origin/blob/master/test/extended/authorization/scc.go#L23). Now there are more:

fail [github.com/openshift/origin/test/extended/authorization/scc.go:57]: 6 pods failed on SCC errors
Error creating: pods "cloud-credential-operator-744f659b74-" is forbidden: unable to validate against any security context constraint: [] for ReplicaSet.apps/v1/cloud-credential-operator-744f659b74 -n openshift-cloud-credential-operator happened 9 times
Error creating: pods "aws-ebs-csi-driver-controller-59b98974df-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[3].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[3].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[4].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[4].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used spec.containers[5].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[5].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used] for ReplicaSet.apps/v1/aws-ebs-csi-driver-controller-59b98974df -n openshift-cluster-csi-drivers happened 4 times
Error creating: pods "aws-ebs-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used] for DaemonSet.apps/v1/aws-ebs-csi-driver-node -n openshift-cluster-csi-drivers happened 1 times
Error creating: pods "console-operator-84b9dcd98b-" is forbidden: unable to validate against any security context constraint: [] for ReplicaSet.apps/v1/console-operator-84b9dcd98b -n openshift-console-operator happened 12 times
Error creating: pods "downloads-9f49bf5fd-" is forbidden: unable to validate against any security context constraint: [] for ReplicaSet.apps/v1/downloads-9f49bf5fd -n openshift-console happened 12 times
Error creating: pods "marketplace-operator-54f7786b4c-" is forbidden: unable to validate against any security context constraint: [] for ReplicaSet.apps/v1/marketplace-operator-54f7786b4c -n openshift-marketplace happened 11 times

(from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-compact-4.7/1387145769915518976)
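For triage, the failure lines quoted above can be grouped by namespace. The following is only a rough sketch, assuming the line layout seen in this report ("Error creating: pods ... [details] for <owner> -n <namespace> happened N times"); the function name and the regular expression are illustrative and are not part of the origin test suite. It also records which owners were rejected with an empty provider list ("[]"), since those lines look different from the ones rejected by the restricted provider.

```python
import re
from collections import Counter

# Layout of one failure line as quoted in this bug (an assumption, not a stable format).
LINE_RE = re.compile(
    r'Error creating: pods "(?P<pod>[^"]+)" is forbidden: '
    r'unable to validate against any security context constraint: '
    r'\[(?P<detail>.*)\] for (?P<owner>\S+) -n (?P<ns>\S+) '
    r'happened (?P<count>\d+) times'
)


def summarize_scc_failures(text):
    """Return (occurrences per namespace, owners whose provider list was empty)."""
    per_namespace = Counter()
    empty_provider_owners = set()
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a failure line, or a truncated one
        per_namespace[m["ns"]] += int(m["count"])
        if m["detail"].strip() == "":
            empty_provider_owners.add(m["owner"])
    return per_namespace, empty_provider_owners
```

Truncated lines (such as log excerpts that end mid-message) simply do not match and are skipped, so the counts are a lower bound.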

Comment 1 Sergiusz Urbaniak 2021-06-03 11:16:51 UTC
*** Bug 1955299 has been marked as a duplicate of this bug. ***

Comment 5 Eran Cohen 2021-08-30 11:49:16 UTC
We see this issue in https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node as well.
e.g. 
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node/1432284279919874048
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node/1431740745693270016
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node/1431493395477434368


Error creating: pods "console-operator-6b677db698-" is forbidden: unable to validate against any security context constraint: [provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for ReplicaSet.apps/v1/console-operator-6b677db698 -n openshift-console-operator happened 15 times
Error creating: pods "ingress-canary-" is forbidden: unable to validate against any security context constraint: [provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/ingress-canary -n openshift-ingress-canary happened 13 times

Comment 6 Vadim Rutkovsky 2021-09-06 16:32:08 UTC
It appears this early test runs before the upgrade has started. As a result, this test would flake on both install jobs and upgrade jobs.

Comment 8 Vadim Rutkovsky 2022-01-26 11:23:53 UTC
*** Bug 2046094 has been marked as a duplicate of this bug. ***

Comment 9 yliu1 2022-01-27 20:34:29 UTC
*** Bug 2047397 has been marked as a duplicate of this bug. ***

Comment 10 Filip Krepinsky 2022-01-28 16:02:36 UTC
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/594 should help this issue, but let's wait for some time to see how it affects the CI

Comment 12 Filip Krepinsky 2022-04-04 21:50:41 UTC
I have found just a few occurrences of this error, and it fails only for azure-file-csi-driver, e.g. in

- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1510767376654667776
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-techpreview-serial/1510700534229635072

Error creating: pods "azure-file-csi-driver-controller-94fcc6984-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.initContainers[0].securityContext.containers[0].hostPort: Invalid value: 10303: Host ports are not allowed to be used, spec.initContainers[0].securityContext.containers[1].hostPort: Invalid value: 9211: Host ports are not allowed to be used, spec.initContainers[0].securityContext.containers[3].hostPort: Invalid value: 9212: Host ports are not



It does not seem to affect the creation of other resources in other namespaces, which suggests that the cluster-policy-controller managed to start and that the problem was fixed by PR #594.
Also, the reported errors can be seen in CI for other releases (such as 4.7) to which the fix was not backported.
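One way to check whether a namespace has received its UID range is to look at the annotation named in this bug's summary. The sketch below works on a plain dict of namespace annotations (for example, taken from the namespace's metadata in a must-gather). It assumes the full annotation key is openshift.io/sa.scc.uid-range and that the value has the form "<start>/<size>"; the helper name is hypothetical.

```python
# Assumed full key of the annotation referred to in the bug summary.
UID_RANGE_ANNOTATION = "openshift.io/sa.scc.uid-range"


def parse_uid_range(annotations):
    """Return (start, size) from namespace annotations, or None if no range is set yet."""
    value = annotations.get(UID_RANGE_ANNOTATION)
    if not value:
        return None  # annotation not applied (yet)
    start, sep, size = value.partition("/")
    if not sep:
        # Assumed "<start>/<size>" layout did not hold for this value.
        raise ValueError(f"unexpected uid-range value: {value!r}")
    return int(start), int(size)
```

A None result for a namespace that is already scheduling pods would be consistent with the annotation being applied late, but it does not prove it on its own.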

Closing

*** This bug has been marked as a duplicate of bug 2045872 ***

Comment 14 Filip Krepinsky 2022-05-17 23:14:23 UTC
There is something else going wrong here. We can see from the events that the SCC ranges for this namespace were correctly initialized before the namespace was used, so we can rule out this bug.

picked events:

18:26:06	kube-system	cluster-policy-controller-namespace-security-allocation-controller	bootstrap-kube-controller-manager-ip-192-168-85-48.us-west-2.compute.internal	CreatedSCCRanges created SCC ranges for openshift-cluster-csi-drivers namespace
18:33:12 (x15) openshift-cluster-csi-drivers daemonset-controller vmware-vsphere-csi-driver-node FailedCreate Error creating: pods "vmware-vsphere-csi-driver-node-" is forbidden: unable to validate against any security context constraint:
18:33:12 (x17)	openshift-cluster-csi-drivers replicaset-controller	vmware-vsphere-csi-driver-controller-69b69d7f6f	FailedCreate Error creating: pods "vmware-vsphere-csi-driver-controller-69b69d7f6f-" is forbidden: unable to validate against any security context constraint
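The ordering argument above (ranges created first, pod creation failures later) can be checked mechanically on an event dump. This is a minimal sketch, assuming whitespace-separated event lines in the layout pasted above and timestamps that all fall on the same day; the function name and patterns are illustrative only.

```python
import re
from datetime import time

# "HH:MM:SS ... CreatedSCCRanges created SCC ranges for <ns> namespace"
CREATED_RE = re.compile(
    r'^(\d{2}):(\d{2}):(\d{2})\b.*\bCreatedSCCRanges\s+'
    r'created SCC ranges for (\S+) namespace'
)
# "HH:MM:SS [(xN)] <ns> <source> <object> FailedCreate ..."
FAILED_RE = re.compile(
    r'^(\d{2}):(\d{2}):(\d{2})(?:\s+\(x\d+\))?\s+(\S+)\s.*\bFailedCreate\b'
)


def ranges_created_before_failures(event_lines, namespace):
    """True if SCC ranges for `namespace` were created before its first FailedCreate.

    Returns None when either kind of event is missing for the namespace.
    """
    created_at = None
    first_failure = None
    for raw in event_lines:
        line = raw.strip()
        m = CREATED_RE.match(line)
        if m and m.group(4) == namespace:
            t = time(*map(int, m.group(1, 2, 3)))
            created_at = t if created_at is None else min(created_at, t)
            continue
        m = FAILED_RE.match(line)
        if m and m.group(4) == namespace:
            t = time(*map(int, m.group(1, 2, 3)))
            first_failure = t if first_failure is None else min(first_failure, t)
    if created_at is None or first_failure is None:
        return None
    return created_at < first_failure
```

Because only the time of day is compared, the check is unreliable for event dumps that span midnight.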