Bug 2097186

Summary: PSa autolabeling in 4.11 env upgraded from 4.10 does not work due to missing RBAC objects
Product: OpenShift Container Platform Reporter: Xingxing Xia <xxia>
Component: apiserver-auth    Assignee: Standa Laznicka <slaznick>
Status: CLOSED ERRATA QA Contact: Yash Tripathi <ytripath>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.11    CC: mfojtik, surbania, ytripath
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Last Closed: 2022-08-10 11:17:59 UTC Type: Bug

Description Xingxing Xia 2022-06-15 05:28:47 UTC
Description of problem:
PSa autolabeling does not work in a 4.11 env upgraded from 4.10, due to missing RBAC objects.
It works well in a fresh 4.11 env.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-15-004003 env upgraded from 4.10.0-0.nightly-2022-06-14-140230

How reproducible:
Always

Steps to Reproduce:
1. Launch 4.10 fresh env
2. Upgrade to 4.11.0-0.nightly-2022-06-15-004003 successfully
3. Then check if PSa autolabeling works by:
oc login -u xxx
oc new-project xxia-proj
oc get project xxia-proj -o yaml
...
  labels:
    kubernetes.io/metadata.name: xxia-proj
  name: xxia-proj
...

The namespace is not autolabeled!
Then opt in explicitly:
oc label ns xxia-proj security.openshift.io/scc.podSecurityLabelSync="true" --context admin
oc get project xxia-proj -o yaml
...
  labels:
    kubernetes.io/metadata.name: xxia-proj
    security.openshift.io/scc.podSecurityLabelSync: "true"
  name: xxia-proj
...

Actual results:
3. PSa autolabeling does not work in a 4.11 env that was upgraded from 4.10

Expected results:
3. Should work as a freshly installed 4.11 env shows:
oc get project xxia-proj -o yaml
...
  labels:
...
    pod-security.kubernetes.io/audit: restricted
...
    pod-security.kubernetes.io/warn: restricted
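
(A quick way to compare, using the standard -L/--label-columns flag of oc get: print the PSa labels as columns. Based on the expected results above, a fresh 4.11 env should show "restricted" in both columns, while the upgraded env leaves them empty.)
oc get ns xxia-proj -L pod-security.kubernetes.io/audit,pod-security.kubernetes.io/warn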

Additional info:
Checked a fresh 4.11 env; it has the RBAC objects below:
oc get ClusterRoleBinding system:openshift:controller:podsecurity-admission-label-syncer-controller
oc get ClusterRole system:openshift:controller:podsecurity-admission-label-syncer-controller
Checked a 4.11 env upgraded from 4.10; it does not have these objects.

Comment 1 Xingxing Xia 2022-06-15 05:54:56 UTC
Logs in "cluster-policy-controller":
oc logs -c cluster-policy-controller kube-controller-manager-ip-10-0-153-20.ap-northeast-2.compute.internal -n openshift-kube-controller-manager
2022-06-15T05:10:28.657538595Z I0615 05:10:28.657493       1 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-controller-manager", Name:"kube-controller-manager-ip-10-0-153-20.ap-northeast-2.compute.internal", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CreatedSCCRanges' created SCC ranges for xxia-proj namespace
2022-06-15T05:10:28.700441701Z E0615 05:10:28.700407       1 base_controller.go:270] "pod-security-admission-label-synchronization-controller" controller failed to sync "xxia-proj", err: failed to synchronize namespace "xxia-proj": failed to update the namespace: namespaces "xxia-proj" is forbidden: User "system:serviceaccount:openshift-infra:podsecurity-admission-label-syncer-controller" cannot update resource "namespaces" in API group "" in the namespace "xxia-proj"
2022-06-15T05:10:28.704836909Z E0615 05:10:28.703200       1 base_controller.go:270] "pod-security-admission-label-synchronization-controller" controller failed to sync "xxia-proj", err: failed to synchronize namespace "xxia-proj": failed to update the namespace: namespaces "xxia-proj" is forbidden: User "system:serviceaccount:openshift-infra:podsecurity-admission-label-syncer-controller" cannot update resource "namespaces" in API group "" in the namespace "xxia-proj"
...
2022-06-15T05:10:48.027120631Z E0615 05:10:48.027085       1 base_controller.go:270] "pod-security-admission-label-synchronization-controller" controller failed to sync "xxia-proj", err: failed to synchronize namespace "xxia-proj": failed to update the namespace: namespaces "xxia-proj" is forbidden: User "system:serviceaccount:openshift-infra:podsecurity-admission-label-syncer-controller" cannot update resource "namespaces" in API group "" in the namespace "xxia-proj"

There are many frequent log lines like the above.
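
For reference, the forbidden errors above mean the controller's service account is missing permission to update namespaces. Below is a minimal sketch of the kind of RBAC the missing objects from the description would provide; the names are taken from the description and the error messages, the verbs are inferred from the errors, and the actual manifests shipped in 4.11 may grant more:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:openshift:controller:podsecurity-admission-label-syncer-controller
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "watch", "update"]  # "update" is the verb the errors above report as forbidden
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:openshift:controller:podsecurity-admission-label-syncer-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:controller:podsecurity-admission-label-syncer-controller
subjects:
- kind: ServiceAccount
  name: podsecurity-admission-label-syncer-controller
  namespace: openshift-infra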

Comment 5 Xingxing Xia 2022-06-22 02:06:56 UTC
Yash, a minor reminder: I marked the above comment private because it exposed a cluster URL publicly.

Moving to VERIFIED; the above result is expected. Yash, when 4.10 clusters are upgraded to 4.11, their SCCs differ from those in fresh 4.11 clusters. You can compare:
mkdir "$DIR"
cd "$DIR"
# dump every SCC in the cluster to its own YAML file
for i in $(oc get scc --no-headers | grep -o '^[^ ]*')
do
  oc get scc "$i" -o yaml > "$i.yaml"
done

First, check the fresh 4.11 cluster: set DIR=4.11-fresh-cluster, then run the script above.
Then check a 4.11 cluster upgraded from 4.10: set DIR=4.11-upgraded-cluster, then run the script again.
Then run: diff -u 4.11-fresh-cluster/restricted.yaml 4.11-upgraded-cluster/restricted.yaml
You will see the difference and understand why your result above is expected by design.
Could you check and give feedback here?
This was also mentioned in https://github.com/openshift/openshift-docs/issues/43249#issuecomment-1157420339, which mentions you; did you notice?
Thanks!

Comment 7 Yash Tripathi 2022-07-01 12:54:29 UTC
On checking the diff between the restricted SCCs, it is clear that the upgraded cluster keeps "system:authenticated" in the groups list, hence the difference between the clusters (a quick check is sketched after the diff below).

$ diff -u 4.11-fresh-cluster/restricted.yaml 4.11-upgraded-cluster/restricted.yaml
--- 4.11-fresh-cluster/restricted.yaml  2022-07-01 18:04:07.545127735 +0530
+++ 4.11-upgraded-cluster/restricted.yaml       2022-07-01 14:20:36.622140855 +0530
@@ -10,7 +10,8 @@
 defaultAddCapabilities: null
 fsGroup:
   type: MustRunAs
-groups: []
+groups:
+- system:authenticated
 kind: SecurityContextConstraints
 metadata:
   annotations:
@@ -18,13 +19,14 @@
     include.release.openshift.io/self-managed-high-availability: "true"
     include.release.openshift.io/single-node-developer: "true"
     kubernetes.io/description: restricted denies access to all host features and requires
-      pods to be run with a UID, and SELinux context that are allocated to the namespace.
+      pods to be run with a UID, and SELinux context that are allocated to the namespace.  This
+      is the most restrictive SCC and it is used by default for authenticated users.
     release.openshift.io/create-only: "true"
-  creationTimestamp: "2022-07-01T04:40:53Z"
+  creationTimestamp: "2022-07-01T04:43:26Z"
   generation: 1
   name: restricted
-  resourceVersion: "403"
-  uid: f6eb5ec3-c46c-4f56-89e3-14d36739ab47
+  resourceVersion: "475"
+  uid: 84b40601-997f-4fc1-bb80-93ae9dcd8732
 priority: null
 readOnlyRootFilesystem: false
 requiredDropCapabilities:
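
(If helpful, the groups field can also be checked directly. A rough sketch, assuming the restricted-v2 SCC that fresh 4.11 installs grant to authenticated users instead: on the upgraded cluster the first command prints "system:authenticated", while on a fresh 4.11 cluster it prints nothing.)

oc get scc restricted -o jsonpath='{.groups[*]}{"\n"}'
oc get scc restricted-v2 -o jsonpath='{.groups[*]}{"\n"}'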

Comment 8 errata-xmlrpc 2022-08-10 11:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069