Bug 2313736 - [Provider] Rook deploys provisioner and plugin pods after deleting rook-ceph-operator pod when ROOK_CSI_DISABLE_DRIVER is True
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Parth Arora
QA Contact: Jilju Joy
URL:
Whiteboard: isf-provider
Depends On:
Blocks:
 
Reported: 2024-09-20 10:06 UTC by Jilju Joy
Modified: 2024-10-30 14:35 UTC
CC: 3 users

Fixed In Version: 4.17.0-109
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:35:46 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage rook pull 734 0 None open Bug 2313736: csi: fix the disable driver flag in the csi driver reconcile 2024-09-24 13:07:40 UTC
Github rook rook pull 14746 0 None open csi: fix the csi driver reconcile 2024-09-20 12:08:37 UTC
Red Hat Issue Tracker OCSBZM-9294 0 None None None 2024-10-03 13:07:18 UTC
Red Hat Product Errata RHSA-2024:8676 0 None None None 2024-10-30 14:35:49 UTC

Description Jilju Joy 2024-09-20 10:06:55 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
After running a set of automated tests, it was observed that the deployments csi-rbdplugin-provisioner and csi-cephfsplugin-provisioner and the daemonsets csi-cephfsplugin and csi-rbdplugin were deployed on the provider cluster even though ROOK_CSI_DISABLE_DRIVER is "true".

Upon further investigation, it was found that deleting the rook-ceph-operator pod triggers this. If the ocs-client-operator-controller-manager pod is restarted afterwards, these deployments and daemonsets, which are owned by the rook-ceph-operator deployment, are deleted again.
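
A compact way to reproduce and observe this, assuming the operator pod carries the usual app=rook-ceph-operator label (per the transcript below, the stray csi-* resources appear within a minute of the operator restarting):

% oc delete pod -l app=rook-ceph-operator
% oc get deployments,daemonsets | grep -E "csi-(rbd|cephfs)plugin"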


Initial status:

% oc get daemonsets
NAME                                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
openshift-storage.cephfs.csi.ceph.com-nodeplugin   3         3         3       3            3           <none>          41h
openshift-storage.rbd.csi.ceph.com-nodeplugin      3         3         3       3            3           <none>          41h
% oc get deployments | grep -E "provisioner|ctrlplugin"
openshift-storage.cephfs.csi.ceph.com-ctrlplugin     2/2     2            2           41h
openshift-storage.rbd.csi.ceph.com-ctrlplugin        2/2     2            2           41h
% 
% 
% oc get pods | grep ocs-client-operator-controller-manager
ocs-client-operator-controller-manager-9bd575ccb-rxgjs            2/2     Running     0              5m2s
% 
% oc get pods | grep rook-ceph-operator                    
rook-ceph-operator-8654886f75-vz9z7                               1/1     Running     0              21h
% 
% 

Delete the rook-ceph-operator pod.

% oc delete pod rook-ceph-operator-8654886f75-vz9z7
pod "rook-ceph-operator-8654886f75-vz9z7" deleted
% 
% 
% 
% oc get pods | grep rook-ceph-operator            
rook-ceph-operator-8654886f75-qxs7g                               1/1     Running     0              33s
% 
% 
New CSI daemonsets and deployments created by Rook:
% oc get daemonsets
NAME                                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin                                   3         3         3       3            3           <none>          27s
csi-rbdplugin                                      3         3         0       3            0           <none>          27s
openshift-storage.cephfs.csi.ceph.com-nodeplugin   3         3         3       3            3           <none>          41h
openshift-storage.rbd.csi.ceph.com-nodeplugin      3         3         3       3            3           <none>          41h
% 
% 
% oc get deployments | grep -E "provisioner|ctrlplugin"
csi-cephfsplugin-provisioner                         2/2     2            2           43s
csi-rbdplugin-provisioner                            2/2     2            2           43s
openshift-storage.cephfs.csi.ceph.com-ctrlplugin     2/2     2            2           41h
openshift-storage.rbd.csi.ceph.com-ctrlplugin        2/2     2            2           41h
% 
% 
Delete the ocs-client-operator-controller-manager pod.
% oc delete pod ocs-client-operator-controller-manager-9bd575ccb-rxgjs
pod "ocs-client-operator-controller-manager-9bd575ccb-rxgjs" deleted
% 
% 
% oc get pods | grep ocs-client-operator-controller-manager     
ocs-client-operator-controller-manager-9bd575ccb-dpvfc            0/2     Init:0/1    0              69s
% 
% oc get pods | grep ocs-client-operator-controller-manager
ocs-client-operator-controller-manager-9bd575ccb-dpvfc            0/2     PodInitializing   0              91s
% 
% 
% oc get pods | grep ocs-client-operator-controller-manager
ocs-client-operator-controller-manager-9bd575ccb-dpvfc            2/2     Running     0              102s
% 
% 
The previously created daemonsets and deployments are deleted automatically:
% oc get daemonsets
NAME                                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
openshift-storage.cephfs.csi.ceph.com-nodeplugin   3         3         3       3            3           <none>          41h
openshift-storage.rbd.csi.ceph.com-nodeplugin      3         3         3       3            3           <none>          41h
% oc get deployments | grep -E "provisioner|ctrlplugin"
openshift-storage.cephfs.csi.ceph.com-ctrlplugin     2/2     2            2           41h
openshift-storage.rbd.csi.ceph.com-ctrlplugin        2/2     2            2           41h

% oc get cm ocs-operator-config -o yaml
apiVersion: v1
data:
  CSI_CLUSTER_NAME: 883d5e66-3214-42cf-8dec-17630a5f4328
  CSI_DISABLE_HOLDER_PODS: "true"
  CSI_ENABLE_TOPOLOGY: "false"
  CSI_TOPOLOGY_DOMAIN_LABELS: ""
  ROOK_CSI_DISABLE_DRIVER: "true"
  ROOK_CSI_ENABLE_NFS: "false"
  ROOK_CURRENT_NAMESPACE_ONLY: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2024-09-18T19:47:39Z"
  name: ocs-operator-config
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: OCSInitialization
    name: ocsinit
    uid: b1fdeeaf-706f-4897-a906-a595452b123c
  resourceVersion: "90610"
  uid: 62d174d7-5275-41cc-96f8-a227192a5da9
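
The flag alone can also be read with a jsonpath query:

% oc get cm ocs-operator-config -o jsonpath='{.data.ROOK_CSI_DISABLE_DRIVER}'
true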


% oc exec rook-ceph-operator-8654886f75-vz9z7 -- printenv | grep -i disable_driver
ROOK_CSI_DISABLE_DRIVER=true
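
Since pod names change on every restart, the same check can be run against the Deployment spec instead of a live pod; a sketch, assuming the variable is set directly as a container env entry rather than injected via envFrom:

% oc get deployment rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ROOK_CSI_DISABLE_DRIVER")].value}'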

The ownerReferences field for the provisioner deployments and the csi-cephfsplugin and csi-rbdplugin daemonsets is:

ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: true
    kind: Deployment
    name: rook-ceph-operator
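
Everything the operator re-created can be enumerated by filtering on that owner reference; a sketch, assuming jq is available:

% oc get deployments,daemonsets -o json | jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .name == "rook-ceph-operator")) | "\(.kind)/\(.metadata.name)"'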

=======================================================

Version of all relevant components (if applicable):

% oc get csv
NAME                                         DISPLAY                            VERSION               REPLACES                                     PHASE
cephcsi-operator.v4.17.0-101.stable          CephCSI operator                   4.17.0-101.stable                                                  Succeeded
ingress-node-firewall.v4.16.0-202408262007   Ingress Node Firewall Operator     4.16.0-202408262007   ingress-node-firewall.v4.16.0-202409051837   Succeeded
mcg-operator.v4.17.0-101.stable              NooBaa Operator                    4.17.0-101.stable                                                  Succeeded
metallb-operator.v4.17.0-202409182235        MetalLB Operator                   4.17.0-202409182235   metallb-operator.v4.17.0-202409161407        Succeeded
ocs-client-operator.v4.17.0-101.stable       OpenShift Data Foundation Client   4.17.0-101.stable                                                  Succeeded
ocs-operator.v4.17.0-101.stable              OpenShift Container Storage        4.17.0-101.stable                                                  Succeeded
odf-csi-addons-operator.v4.17.0-101.stable   CSI Addons                         4.17.0-101.stable                                                  Succeeded
odf-operator.v4.17.0-101.stable              OpenShift Data Foundation          4.17.0-101.stable                                                  Succeeded
odf-prometheus-operator.v4.17.0-101.stable   Prometheus Operator                4.17.0-101.stable                                                  Succeeded
recipe.v4.17.0-101.stable                    Recipe                             4.17.0-101.stable                                                  Succeeded
rook-ceph-operator.v4.17.0-101.stable        Rook-Ceph                          4.17.0-101.stable                                                  Succeeded


% oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.10   True        False         39h     Cluster version is 4.16.10

==================================================

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Reporting the first occurrence

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:
Yes
============================================
Steps to Reproduce:

1. On a 4.17 provider cluster, delete the rook-ceph-operator pod.
2. Check for the deployments csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner and the daemonsets csi-cephfsplugin and csi-rbdplugin. These should not be present (a quick check is sketched below).
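
A quick check for step 2 (on a correctly behaving cluster, both commands should return NotFound errors):

% oc get deployment csi-cephfsplugin-provisioner csi-rbdplugin-provisioner
% oc get daemonset csi-cephfsplugin csi-rbdplugin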
==============================================

Actual results:
The deployments csi-rbdplugin-provisioner and csi-cephfsplugin-provisioner and the daemonsets csi-cephfsplugin and csi-rbdplugin were deployed.

Expected results:
Rook should not deploy the CSI driver as long as ROOK_CSI_DISABLE_DRIVER is "true".

Additional info:
Must gather log: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2313736/

Comment 6 errata-xmlrpc 2024-10-30 14:35:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

