Bug 1831802 - CVO does not reconcile/restore basic-user ClusterRole
Summary: CVO does not reconcile/restore basic-user ClusterRole
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.3.z
Hardware: All
OS: All
Priority: low
Severity: low
Target Milestone: ---
Assignee: Stefan Schimanski
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-05 16:39 UTC by Rogerio Bastos
Modified: 2020-06-11 00:08 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-19 09:26:15 UTC
Target Upstream Version:
Embargoed:



Description Rogerio Bastos 2020-05-05 16:39:55 UTC
Description of problem:
Currently, the basic-user ClusterRole is not managed/restored by the CVO. If a user edits the ClusterRole and leaves an empty list of rules, several cluster components stop working.

The basic-user ClusterRole is used by the image registry, Prometheus, and other components. If a user manages to erase its permissions, various errors start occurring in the cluster:

1) Failure to pull images

2) Blocked pod creation due to image pull failures:
Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/cli:latest": 
rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized: unable to validate token: Forbidden

3) Errors from other ServiceAccounts, such as Prometheus (a direct check is sketched after this list):
time="2020-05-05T01:28:38.007246679Z" level=error msg="Get user failed with error: users.user.openshift.io \"~\" is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot get resource \"users\" in API group \"user.openshift.io\" at the cluster scope"



Version-Release number of the following components:
OpenShift 4.3.12

How reproducible:
Every time you perform the ClusterRole update described below

Steps to Reproduce:
As a cluster user, edit the basic-user ClusterRole and clear the rules array:
Example: 
$ oc edit clusterrole basic-user --as-group=dedicated-admins --as=myded-admin
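One way to confirm that the rules were actually removed (a hypothetical verification step, not part of the original report) is to print the rules field of the ClusterRole, which comes back empty after the edit:

$ oc get clusterrole basic-user -o jsonpath='{.rules}'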



Actual results:
The errors listed above start occurring in the cluster

Expected results:
The CVO should reconcile the basic-user ClusterRole back to its default/initial configuration

Comment 1 Christoph Blecker 2020-05-05 17:10:18 UTC
You can reproduce this with a user that is *not* cluster-admin but *has* been given access to patch clusterroles:
oc patch clusterrole/basic-user --type=json -p '[{"op": "remove", "path": "/rules" }]'

This will cause the impacts described above on the registry and Prometheus.

Comment 2 David Eads 2020-05-06 15:07:41 UTC
The stock ClusterRoles are maintained during kube-apiserver startup, not by an operator or a controller. To recover if this happens, you can force a redeployment of the kube-apiserver to trigger a restart, and the role will be restored.
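For example, one way such a redeployment is typically forced (sketched here as an assumption; the reason string is arbitrary and only needs to change between invocations) is to patch the kubeapiserver operator config:

$ oc patch kubeapiserver cluster --type=merge -p '{"spec":{"forceRedeploymentReason":"restore-default-clusterroles-1"}}'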

A better solution, having the kube-apiserver run a controller to continuously maintain these roles, is technically possible but currently expensive in terms of QPS. It could be done, but it won't be quickly actionable.

Comment 3 Stefan Schimanski 2020-05-19 09:26:15 UTC
There is a workaround, and this has been the behaviour since RBAC was created.

As we don't promise to repair every possible change in the cluster, this is not a bug or a regression. Repairing RBAC roles in real time would be a feature; please move it to Jira.

Comment 4 Christoph Blecker 2020-06-11 00:08:56 UTC
Created https://issues.redhat.com/browse/RFE-970 for this

