Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1831802

Summary: CVO does not reconcile/restore basic-user ClusterRole
Product: OpenShift Container Platform
Reporter: Rogerio Bastos <rbastos>
Component: kube-apiserver
Assignee: Stefan Schimanski <sttts>
Status: CLOSED NOTABUG
QA Contact: Xingxing Xia <xxia>
Severity: low
Priority: low
Version: 4.3.z
CC: aos-bugs, cblecker, deads, jeder, jokerman, mfojtik, nmalik
Target Milestone: ---
Target Release: ---
Keywords: ServiceDeliveryImpact
Hardware: All
OS: All
Last Closed: 2020-05-19 09:26:15 UTC
Type: Bug

Description Rogerio Bastos 2020-05-05 16:39:55 UTC
Description of problem:
Currently, the basic-user ClusterRole is not managed/restored by CVO. If a user edits the ClusterRole and leaves an empty list of rules, different cluster components stop working.

The basic-user ClusterRole is used by the image registry, Prometheus, and other components. If a user manages to erase its permissions, different errors start occurring in the cluster (a way to confirm the emptied role is sketched after this list):

1) Failure to pull images

2) Blocked pod creation due to image pull failures:
Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/cli:latest": 
rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized: unable to validate token: Forbidden

3) Errors from different ServiceAccounts, like prometheus:
time="2020-05-05T01:28:38.007246679Z" level=error msg="Get user failed with error: users.user.openshift.io \"~\" is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot get resource \"users\" in API group \"user.openshift.io\" at the cluster scope"
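A quick way to confirm the role has actually lost its permissions (a hedged sketch; with the rules cleared, the output should show no rules entries):

$ oc get clusterrole basic-user -o yaml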



Version-Release number of the following components:
OpenShift 4.3.12

How reproducible:
Every time you perform the ClusterRole update described below

Steps to Reproduce:
As a cluster user, edit the basic-user ClusterRole and clear the rules array.
Example:
$ oc edit clusterrole basic-user --as-group=dedicated-admins --as=myded-admin
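
For illustration, the edited object would then look roughly like this (a sketch showing only the relevant fields, assuming the rules array was cleared rather than the role deleted):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: basic-user
rules: []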



Actual results:
Different errors (listed above) start happening in the cluster

Expected results:
CVO should reconcile the basic-user ClusterRole back to the default/initial config

Comment 1 Christoph Blecker 2020-05-05 17:10:18 UTC
You can duplicate this with a user that is *not* cluster-admin, but *has* been given access to patch clusterroles:
oc patch clusterrole/basic-user --type=json -p '[{"op": "remove", "path": "/rules" }]'

This will cause the above impacts to the registry and Prometheus.

Comment 2 David Eads 2020-05-06 15:07:41 UTC
The stock ClusterRoles are maintained during kube-apiserver startup, not by an operator or a controller.  To recover when this happens, you can force a redeployment of the kube-apiserver to trigger a restart, and the role will be fixed.
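
For reference, one way to force such a redeployment is to bump forceRedeploymentReason on the kubeapiserver operator resource (a hedged sketch; the reason string is arbitrary but must change on each run):

$ oc patch kubeapiserver cluster --type=merge \
    -p '{"spec":{"forceRedeploymentReason":"restore-basic-user-'"$(date +%s)"'"}}'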

A better solution, having the kube-apiserver run a controller to continuously maintain these roles, is technically possible but currently expensive in terms of QPS.  It could be done, but it won't be something quickly actionable.

Comment 3 Stefan Schimanski 2020-05-19 09:26:15 UTC
There is a workaround, and this has been the behaviour since RBAC was created.

As we don't promise to repair every possible modification in the cluster, this is not a bug or a regression. Repairing RBAC roles in real time would be a feature; please move it to Jira.

Comment 4 Christoph Blecker 2020-06-11 00:08:56 UTC
Created https://issues.redhat.com/browse/RFE-970 for this