1974414 – Uninstalling kube-descheduler clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5 removes some clusterrolebindings

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.8.z
Assignee:	Vu Dinh
QA Contact:	RamaKasturi
Docs Contact:
URL:
Whiteboard:
Depends On:	1970910
Blocks:	1975453

Reported:	2021-06-21 15:43 UTC by Vu Dinh
Modified:	2021-07-27 23:13 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 23:13:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 23:13:37 UTC

Description Vu Dinh 2021-06-21 15:43:52 UTC

This bug was initially created as a copy of Bug #1970910

Description of problem:
Uninstalling kube-descheduler clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5 removes some clusterrolebindings, causing the cluster to become unusable.

Version-Release number of selected component (if applicable):
clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5

How reproducible:
Always. 

Steps to Reproduce:
1. Create a fresh installation of OCP 4.6
2. oc create -f aio-cluster-kube-descheduler-operator.yaml
3. oc create -f kubedescheduler-cluster.yaml
4. Check the CSV and clusterrolebindings:
oc get clusterrolebinding -A | wc -l
oc get csv
NAME                                                               DISPLAY                     VERSION                             REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5   Kube Descheduler Operator   4.6.0-202106010807.p0.git.5db84c5              Pending
5. oc delete csv clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5
6. Wait for OLM to remove clusterrolebindings
7. oc get clusterrolebinding -A | wc -l
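The wc -l count in steps 4 and 7 shows that bindings disappeared but not which ones. A minimal sketch for capturing the names instead, assuming a POSIX shell with oc logged in to the cluster; the helper names snapshot_crbs and removed_crbs and the /tmp paths in the usage comments are hypothetical and not part of the original reproducer:

```shell
# Hypothetical helpers (not from the original report).

# Write the sorted names of all clusterrolebindings to the file given as $1.
snapshot_crbs() {
  oc get clusterrolebinding -o name | LC_ALL=C sort > "$1"
}

# Print names present in the "before" snapshot ($1) but absent from the
# "after" snapshot ($2), i.e. the bindings that were removed.
removed_crbs() {
  LC_ALL=C comm -23 "$1" "$2"
}

# Usage around step 5 of the reproducer:
#   snapshot_crbs /tmp/crb-before.txt
#   oc delete csv clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5
#   snapshot_crbs /tmp/crb-after.txt
#   removed_crbs /tmp/crb-before.txt /tmp/crb-after.txt
```

An empty result from removed_crbs means no binding present before the CSV deletion is missing afterwards.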

Actual results:
The number of clusterrolebindings drops sharply.


Expected results:
Only the clusterrolebindings associated with the operator's namespace should be removed.

Additional info:
Adding yaml files mentioned in reproducer steps.

Comment 1 RamaKasturi 2021-06-23 10:59:57 UTC

Hello Vu Dinh,

   One question about this bug: do we need to test with the same version of the descheduler you provided at [1] on a 4.8 cluster as well? Also, after deleting the CSV, I do not see any clusterrolebindings of the namespaces being deleted.

[1] docker.io/dinhxuanvu/descheduler-index:v1

Thanks
kasturi

Comment 3 Vu Dinh 2021-06-23 16:11:47 UTC

Hey Rama,

Yes, please use the same version for descheduler operator.

Vu

Comment 4 RamaKasturi 2021-06-23 16:26:02 UTC

Moving the bug to the verified state, as I did not see any CRB getting deleted after deleting the CSV. Below are the steps I followed to verify the bug.

Steps followed:
===================
1) Install the latest 4.8 cluster
2) Create a namespace called 'openshift-kube-descheduler-operator'
3) Create an operatorgroup using the YAML below
[knarra@knarra ~]$ cat /tmp/operatorgroup.yaml 
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-kube-descheduler-operator
  namespace: openshift-kube-descheduler-operator
spec:
  targetNamespaces:
    - openshift-kube-descheduler-operator
4) Create a catalogsource with the index image using the YAML below
[knarra@knarra ~]$ cat /tmp/catalogsource.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: qe-app-registry
  namespace: openshift-kube-descheduler-operator
spec:
  sourceType: grpc
  image: docker.io/dinhxuanvu/descheduler-index:v1

5) Create a subscription using the YAML file below

[knarra@knarra ~]$ cat /tmp/subscription.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-kube-descheduler-operator
  namespace: openshift-kube-descheduler-operator
spec:
  channel: stable
  name: cluster-kube-descheduler-operator
  source: qe-app-registry
  sourceNamespace: openshift-kube-descheduler-operator

Now you can see that the CSV is in the Pending state with the error "one or more requirements couldn't be found"

Events:
  Type    Reason               Age                    From                        Message
  ----    ------               ----                   ----                        -------
  Normal  RequirementsUnknown  2m15s                  operator-lifecycle-manager  requirements not yet checked
  Normal  RequirementsNotMet   2m14s (x2 over 2m15s)  operator-lifecycle-manager  one or more requirements couldn't be found
[knarra@knarra ~]$ oc get clusterrolebinding -A | wc -l
200
[knarra@knarra ~]$ oc get csv
NAME                                                               DISPLAY                     VERSION                             REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5   Kube Descheduler Operator   4.6.0-202106010807.p0.git.5db84c5              Pending
[knarra@knarra ~]$ oc delete csv clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5
clusterserviceversion.operators.coreos.com "clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5" deleted
[knarra@knarra ~]$ oc get clusterrolebinding -A | wc -l
200
[knarra@knarra ~]$ oc get clusterrolebinding -A | wc -l
200
[knarra@knarra ~]$ oc get clusterrolebinding -A | wc -l
200
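The repeated manual counts above can be sketched as a small polling check. This is an illustration only, assuming a POSIX shell with oc logged in; the helper names crb_count and crb_count_stable and their parameters are hypothetical. Because -o name prints no header row, the number it reports should be one lower than the wc -l figures shown above.

```shell
# Hypothetical helpers (not from the original comment).

# Print the current number of clusterrolebindings (no header line counted).
crb_count() {
  oc get clusterrolebinding -o name | wc -l | tr -d ' '
}

# Sample the count several times and fail as soon as it differs from the
# expected value.
#   $1 = expected count, $2 = number of samples (default 3),
#   $3 = seconds to wait between samples (default 10)
crb_count_stable() {
  expected=$1
  samples=${2:-3}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$samples" ]; do
    current=$(crb_count)
    if [ "$current" -ne "$expected" ]; then
      echo "clusterrolebinding count changed: expected $expected, got $current" >&2
      return 1
    fi
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
  return 0
}

# Usage: record the count before deleting the CSV, then confirm it holds.
#   before=$(crb_count)
#   oc delete csv clusterkubedescheduleroperator.4.6.0-202106010807.p0.git.5db84c5
#   crb_count_stable "$before" 6 30
```

The sample count and interval are arbitrary; they should cover however long OLM takes to garbage-collect resources on the cluster being tested.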

Comment 6 XiuJuan Wang 2021-07-02 09:51:31 UTC

Should this bug's target release be set to 4.8.0? It seems the bug was verified on version 4.8.0.

Comment 7 RamaKasturi 2021-07-02 10:24:56 UTC

Hello XiuJuan,

  I am not sure whether the target version should be set to 4.8.0; this bug was created only for the purpose of backporting to 4.6. The bug was seen in OCP 4.6, and the customer needed a fix because they are not willing to upgrade to 4.7. So we had to backport all the way from 4.9 to 4.6, even though this issue does not appear on OCP 4.7+. Maybe Vu Dinh would be a good contact to ask whether we can set the target release to 4.8.0?

Thanks
kasturi

Comment 9 errata-xmlrpc 2021-07-27 23:13:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
