Bug 1806915

Summary: openshift-service-ca: Some core components are in openshift.io/run-level 1 and are bypassing SCC, but should not be
Product: OpenShift Container Platform
Reporter: Stefan Schimanski <sttts>
Component: apiserver-auth
Assignee: Standa Laznicka <slaznick>
Status: CLOSED ERRATA
QA Contact: scheng
Severity: medium
Priority: medium
Version: 4.4
CC: aos-bugs, ccoleman, eparis, jialiu, jokerman, mfojtik, nhale, nstielau, sfowler, wsun, xiyuan, xtian, xxia
Target Milestone: ---
Keywords: Reopened
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The namespace openshift-service-ca was labelled with "openshift.io/run-level: 1".
Consequence: The pods inside this namespace would run with extra privileges.
Fix: Since the label is no longer necessary to avoid a circular dependency between components, it was removed.
Result: The service-ca pods had their privileges scoped down.
Clone Of: 1805488
Last Closed: 2021-02-24 15:10:53 UTC
Bug Blocks: 1805488, 1966621
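
For illustration, the run-level label described in the Doc Text sits on the namespace object itself. A minimal sketch of the pre-fix namespace metadata (abridged, not the full manifest as shipped):

```yaml
# Sketch of the openshift-service-ca namespace before the fix.
# The run-level label below is what caused SCC admission to be
# bypassed for pods in this namespace; the fix removes it.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-service-ca
  labels:
    openshift.io/run-level: "1"   # removed by the fix
```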

Comment 1 Standa Laznicka 2020-03-06 09:01:42 UTC
When trying to fix this issue for service-ca operator and controller, some dependency loops were identified that prevent the removal of the run-level label from the operator's and operand's namespaces.

Originally, it was observed that the DNS operator has a compulsory mount of the serving certificate provided by the service-ca controller. This prevented etcd from running, which in turn caused failures of the kube-apiserver deployment after bootstrap; that, in turn, caused the cluster-policy-controller (which is not part of the bootstrap control plane) to fail to connect to the API (it connects to localhost and thus will not use the bootstrap-control-plane kube-apiserver). This was fixed by removing the etcd dependency on DNS in https://github.com/openshift/cluster-etcd-operator/pull/233.

The cluster-policy-controller is unfortunately still dependent on the openshift-apiserver, which provides the rangeallocations.security.openshift.io resources needed by the namespace-security-allocation-controller (part of cluster-policy-controller). Without the namespace-security-allocation-controller running and annotating the namespaces with the annotations needed for SCC admission, the service-ca operator and controller cannot run with any SCC other than privileged, which would be a poor fix for the issue. Note that the openshift-apiserver cannot run without service-ca already running, which creates yet another dependency loop.
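
For context, the annotations the namespace-security-allocation-controller writes are the `openshift.io/sa.scc.*` keys that SCC admission reads when assigning a non-privileged SCC. A hedged sketch of an annotated namespace (the values shown are illustrative, not taken from any particular cluster):

```yaml
# Annotations written by the namespace-security-allocation-controller;
# SCC admission needs these to assign UID/MCS ranges for non-privileged
# SCCs such as "restricted". Values are illustrative examples.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-service-ca
  annotations:
    openshift.io/sa.scc.uid-range: 1000280000/10000
    openshift.io/sa.scc.supplemental-groups: 1000280000/10000
    openshift.io/sa.scc.mcs: s0:c16,c10
```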

A solution to the problem would be to move the rangeallocations.security.openshift.io resource to a CRD so that the controller can work even before openshift-apiserver starts, allowing any payload to use a proper SCC. I don't think a move to CRDs would be wise at this point in the 4.4 development cycle.
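
To make the proposal concrete: serving rangeallocations.security.openshift.io as a CRD would mean the kube-apiserver handles the resource directly, with no dependency on openshift-apiserver. A hypothetical sketch of such a CRD (field names follow the existing security.openshift.io/v1 RangeAllocation type; the schema here is abridged and not an actual proposed manifest):

```yaml
# Hypothetical CRD sketch for moving RangeAllocation out of
# openshift-apiserver. The "range" and "data" fields mirror the
# existing RangeAllocation type.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: rangeallocations.security.openshift.io
spec:
  group: security.openshift.io
  scope: Cluster
  names:
    kind: RangeAllocation
    plural: rangeallocations
    singular: rangeallocation
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            range:
              type: string   # e.g. "1000000000-1999999999/10000"
            data:
              type: string
              format: byte   # allocation bitmap
```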

Comment 3 Stefan Schimanski 2020-03-12 15:30:12 UTC
Reopened and moved to 4.5.

Comment 4 Stefan Schimanski 2020-03-12 15:30:31 UTC
Reopened and moved to 4.5.

Comment 5 Standa Laznicka 2020-05-19 15:13:46 UTC
No progress on this in 4.5 (mirroring changes to the operator bug: https://bugzilla.redhat.com/show_bug.cgi?id=1806917#c3)

Comment 19 errata-xmlrpc 2021-02-24 15:10:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633