Description of problem:
Currently, the basic-user ClusterRole is not managed/restored by the CVO. If a user edits the ClusterRole and leaves an empty list of rules, different cluster components stop working. The basic-user ClusterRole is relied on by the image registry, Prometheus, and others, so if a user manages to erase its permissions, various errors start occurring in the cluster:

1) Failure to pull images

2) Pod creation blocked due to image pull failures:
Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/cli:latest": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized: unable to validate token: Forbidden

3) Errors from different ServiceAccounts, like prometheus-k8s:
time="2020-05-05T01:28:38.007246679Z" level=error msg="Get user failed with error: users.user.openshift.io \"~\" is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot get resource \"users\" in API group \"user.openshift.io\" at the cluster scope"

Version-Release number of the following components:
OpenShift 4.3.12

How reproducible:
Every time you perform the ClusterRole update described below

Steps to Reproduce:
1. As a cluster user, edit the basic-user ClusterRole and clear the rules array:
   $ oc edit clusterrole basic-user --as-group=dedicated-admins --as=myded-admin

Actual results:
The errors listed above start occurring in the cluster

Expected results:
The CVO should reconcile the basic-user ClusterRole back to its default/initial configuration
You can duplicate this with a user that is *not* cluster-admin, but *has* been given access to patch ClusterRoles:

  oc patch clusterrole/basic-user --type=json -p '[{"op": "remove", "path": "/rules"}]'

This will cause the impacts described above on the registry and Prometheus.
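A quick way to confirm the damage (my own hedged sketch, not from the original report; the ServiceAccount name is taken from the Prometheus error log above):

  # Confirm the rules array on the ClusterRole is now empty/absent
  oc get clusterrole basic-user -o jsonpath='{.rules}'

  # Check whether the prometheus-k8s ServiceAccount can still read user objects
  oc auth can-i get users.user.openshift.io \
    --as=system:serviceaccount:openshift-monitoring:prometheus-k8s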
The stock ClusterRoles are maintained during kube-apiserver startup, not by an operator or a controller. If this happens, you can force a redeployment of the kube-apiserver to trigger a restart, and the role will be fixed. A better solution, having the kube-apiserver run a controller that continuously maintains these roles, is technically possible but currently expensive in terms of QPS. It could be done, but it won't be something quickly actionable.
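As a hedged sketch of that recovery step (the exact field is an assumption based on the documented kubeapiserver operator config; verify against the docs for your version), forcing a redeployment usually means bumping forceRedeploymentReason:

  # Assumed approach: change forceRedeploymentReason so the operator rolls out new
  # kube-apiserver static pods; the bootstrap RBAC (including basic-user) is
  # re-applied at startup
  oc patch kubeapiserver cluster --type=merge \
    -p '{"spec":{"forceRedeploymentReason":"restore-default-rbac-'"$(date +%s)"'"}}'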
There is a workaround. This has been the behaviour since RBAC was created. As we don't promise to repair every possible modification in the cluster, this is not a bug or a regression. If we want to repair RBAC roles in real time, that's a feature. Please move it to Jira.
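One illustrative sketch of such a workaround (my addition, not from the comment above): export the stock role from a healthy cluster of the same version and reconcile it into the affected one. oc auth reconcile only adds missing permissions, so it is less disruptive than replacing the object outright:

  # On a healthy cluster of the same OpenShift version, capture the stock definition
  # (you may want to strip cluster-specific metadata such as uid/resourceVersion first)
  oc get clusterrole basic-user -o yaml > basic-user.yaml

  # On the affected cluster, add back any missing rules
  oc auth reconcile -f basic-user.yaml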
Created https://issues.redhat.com/browse/RFE-970 for this