Description of problem (please be as detailed as possible and provide log snippets):

The cleanup job for CephFilesystemSubVolumeGroup reads the subvolumegroup name from `spec.name` of the CR. However, this field is optional and is not always set. If `spec.name` is not provided, the operator should fall back to using the CR name.

Version of all relevant components (if applicable):
ODF 4.16

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create an ODF storage cluster.
2. Create a CephFilesystemSubVolumeGroup resource without a `spec.name` entry.
3. Add the cleanup annotation to the CR.
4. Delete the CR and observe the cleanup job logs.

Actual results:
The cleanup job fails due to the missing environment variable:

```
$ k logs cleanup-svg-9b7989c82f0b72ee4f9e1b92444f221b-qn99p -f
2024/05/17 10:53:30 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
2024-05-17 10:53:30.650936 I | rookcmd: starting Rook v1.14.0-103.g3ddafdfe5 with arguments '/usr/local/bin/rook ceph clean CephFilesystemSubVolumeGroup'
2024-05-17 10:53:30.650954 I | rookcmd: flag values: --help=false, --log-level=DEBUG
2024-05-17 10:53:30.651594 C | rookcmd: cephFS SubVolumeGroup name is not available in the pod environment variables
```

Expected results:
Cleanup job should not fail.

Additional info:
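For illustration, here is a minimal Go sketch of the expected fallback: prefer `spec.name` and use the CR's `metadata.name` when it is empty, so the cleanup pod always receives a subvolume group name. The types and helper below are stand-ins for this report, not the actual Rook implementation.

```
// Hypothetical helper illustrating the expected fallback. The struct shapes
// mirror the CephFilesystemSubVolumeGroup CRs shown in this bug; they are
// simplified stand-ins, not the real Rook API types.
package main

import "fmt"

type ObjectMeta struct {
	Name string
}

type SubVolumeGroupSpec struct {
	// Name is optional in the CRD; it may be empty.
	Name string
}

type CephFilesystemSubVolumeGroup struct {
	ObjectMeta ObjectMeta
	Spec       SubVolumeGroupSpec
}

// subVolumeGroupName returns the Ceph subvolume group name the cleanup job
// should operate on: spec.name when set, otherwise the CR name.
func subVolumeGroupName(svg *CephFilesystemSubVolumeGroup) string {
	if svg.Spec.Name != "" {
		return svg.Spec.Name
	}
	return svg.ObjectMeta.Name
}

func main() {
	svg := &CephFilesystemSubVolumeGroup{
		ObjectMeta: ObjectMeta{Name: "cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee"},
		// spec.name intentionally left empty, as in the reproducer.
	}
	// This resolved value is what the operator should export to the cleanup
	// pod (e.g. as the subvolume group name environment variable).
	fmt.Println(subVolumeGroupName(svg))
}
```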
I have checked the scenario with client deletion from the UI (Storage / Storage Clients / trash icon) on a Provider cluster with odf-operator.v4.16.0-108.stable and a Client cluster with ocs-client-operator.v4.16.0-108.stable. The issue is not resolved.

Steps:

List the storage consumers and their client cluster IDs:

```
oc get storageconsumer -A -o jsonpath='{range .items[*]}{.metadata.name} {.status.client.clusterId}{"\n"}{end}'
storageconsumer-4f974f42-2301-460a-90a5-2607cabea062 4f974f42-2301-460a-90a5-2607cabea062
storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1 902cd8c9-4115-4424-b8d7-1cf4127135d1
storageconsumer-bcff8cf4-a660-4c2d-8834-051fcb373021 bcff8cf4-a660-4c2d-8834-051fcb373021
storageconsumer-bd1ac952-86c2-4d0e-a6b2-92020b41e959 bd1ac952-86c2-4d0e-a6b2-92020b41e959
storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f c01582a4-80f7-4aac-bf83-002fa086394f
storageconsumer-e5766b2d-e770-456f-8c9f-cc84140f8455 e5766b2d-e770-456f-8c9f-cc84140f8455
```

Compared with the storageclient CR, they don't refer to one another; its id is f452a26f-2163-4c73-a2ee-f8495ddd882c.

Step 1: switch to Client hcp415-bm3-i.

Step 2: get the clusterID:

```
oc get clusterversions.config.openshift.io version -n openshift-storage-client -o jsonpath='{.spec.clusterID}'
902cd8c9-4115-4424-b8d7-1cf4127135d1
```

Step 3: suspend the status-reporter cronjob and set a timer:

```
oc get cronjob -n openshift-storage-client
oc patch cronjob storageclient-e5fb1f06bee2517f-status-reporter -n openshift-storage-client -p '{"spec":{"suspend":true}}'
```

Step 4: switch back to the Provider.

Step 5: get the associated storagerequests on the Provider:

```
oc get storagerequest -A -o jsonpath='{range .items[?(@.metadata.ownerReferences[0].name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}'
openshift-storage storagerequest-1df0252fd0b9919690f9f97306a25b47
openshift-storage storagerequest-92c3e721d8d5240b525c08a20d466ab3
```

Step 6: get the associated cephblockpoolradosnamespace:

```
oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephradosnamespace-db0cd87b2701f9e25f84de4254f8911e
```

Step 7: get the associated cephfilesystemsubvolumegroup:

```
oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
```

Step 8: after 5 minutes, check that the heartbeat has stopped, the last heartbeat was updated 5 minutes ago, and the alert has appeared. Go to the Provider UI and delete the client with the clusterID saved in step 2 (902cd8c9-4115-4424-b8d7-1cf4127135d1).

Screen recording: https://drive.google.com/file/d/1AowHjJi7Il5_j_BvNHA3VOTv_pEneCtF/view?usp=sharing
This is an unrelated bug. Not related to this BZ. The CephFilesystemSubVolumeGroup (`cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee`) didn't even get the delete request.

```
❯ oc get cephfilesystemsubvolumegroups.ceph.rook.io -o yaml cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  creationTimestamp: "2024-05-23T11:29:43Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 2
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-data0
    ocs.openshift.io/storageconsumer-name: storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1
  name: cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageRequest
    name: storagerequest-92c3e721d8d5240b525c08a20d466ab3
    uid: 4ccc75f8-9eaa-400b-a69e-300de0e49461
  resourceVersion: "21498935"
  uid: 228fc842-a652-4230-bca2-879c136672b7
spec:
  dataPoolName: ""
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: bad4e5d041ba4bc96d0bc414eb8af6e9
  observedGeneration: 2
  phase: Ready
```

Moving it back to ON_QA.

Please open a new issue for this cluster. The StorageClient pods are crash looping with

```
❯ oc logs storageclient-737342087af10580-status-reporter-28617708-nmf28
W0530 09:50:00.976897 1 main.go:160] Failed to get clusterDNS "cluster": dnses.config.openshift.io "cluster" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "dnses" in API group "config.openshift.io" at the cluster scope
W0530 09:50:00.977062 1 main.go:164] Cluster Base Domain is empty.
F0530 09:50:01.021719 1 main.go:142] Failed to update mon configmap for storageClient d7fd5074-d22a-47c8-979c-67e1bf1e41d4: failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "configmaps" in API group "" in the namespace "openshift-storage"
```
-------- BM3 - hcp415-bm3-m

Get the clusterID:

```
oc get clusterversions.config.openshift.io version -n openshift-storage-client -o jsonpath='{.spec.clusterID}'
b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7
```

From the client:

```
oc get pvc -n hcp415-bm3-m-ns -o wide
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                 AGE   VOLUMEMODE
ocs-storagecluster-ceph-pvc-1   Bound    pvc-2095b84c-643c-4f4d-a362-833b4424bd9e   10Gi       RWO            kubevirt-csi-infra-default   15m   Filesystem
ocs-storagecluster-ceph-pvc-2   Bound    pvc-3bc01fd6-5a98-42c3-b028-b4b6105959a1   3Gi        RWO            storage-client-ceph-rbd      15m   Filesystem
ocs-storagecluster-ceph-pvc-3   Bound    pvc-659599a1-2713-4a4b-80ae-78caff8f1463   6Gi        RWO            storage-client-cephfs        15m   Filesystem
ocs-storagecluster-ceph-pvc-4   Bound    pvc-edfb80d7-cea5-4d87-b59f-6fb8fbe31c40   3Gi        RWO            kubevirt-csi-infra-default   15m   Filesystem
ocs-storagecluster-ceph-pvc-5   Bound    pvc-e602f843-ddbc-4855-a779-ffc614e085d4   5Gi        RWO            storage-client-ceph-rbd      15m   Filesystem
ocs-storagecluster-ceph-pvc-6   Bound    pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052   6Gi        RWO            storage-client-cephfs        15m   Filesystem
ocs-storagecluster-ceph-pvc-7   Bound    pvc-b8cfbabd-662e-4be3-b9cb-65d26a55a5ea   5Gi        RWO            kubevirt-csi-infra-default   15m   Filesystem
```

Get the associated storagerequests, cephblockpoolradosnamespace, and cephfilesystemsubvolumegroup on the Provider:

```
oc get storagerequest -A -o jsonpath='{range .items[?(@.metadata.ownerReferences[0].name=="storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}'
openshift-storage storagerequest-94a1aad6dfec7dc629d39fb0514c001b
openshift-storage storagerequest-eb0eb50567eed2694da91e19ca41dc4b

oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7")].metadata.name}'
cephradosnamespace-659dbb570cbf8b2230c8350927bd4c4e

oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7")].metadata.name}'
cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
```

The CephFilesystemSubVolumeGroup before deletion (no `spec.name` is set):

```
oc get cephfilesystemsubvolumegroups.ceph.rook.io cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf -n openshift-storage -oyaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  creationTimestamp: "2024-06-03T13:03:01Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 2
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-data0
    ocs.openshift.io/storageconsumer-name: storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7
  name: cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageRequest
    name: storagerequest-94a1aad6dfec7dc629d39fb0514c001b
    uid: 5346a776-5199-411c-9b45-c73ea48a63b1
  resourceVersion: "75312868"
  uid: bf3142ec-2b14-4276-9be0-84d12a84277c
spec:
  dataPoolName: ""
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: ce3a8e185680a4cb249a2a0180679fe0
  observedGeneration: 2
  phase: Ready
```

Subvolumes present in the group before cleanup:

```
sh-5.1$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
[ { "name": "csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4" }, { "name": "csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699" } ]
```

The same CR after the force-deletion annotation was added:

```
oc get cephfilesystemsubvolumegroups.ceph.rook.io cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf -n openshift-storage -oyaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  annotations:
    rook.io/force-deletion: "true"
  creationTimestamp: "2024-06-03T13:03:01Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 2
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-data0
    ocs.openshift.io/storageconsumer-name: storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7
  name: cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageRequest
    name: storagerequest-94a1aad6dfec7dc629d39fb0514c001b
    uid: 5346a776-5199-411c-9b45-c73ea48a63b1
  resourceVersion: "75751340"
  uid: bf3142ec-2b14-4276-9be0-84d12a84277c
spec:
  dataPoolName: ""
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: ce3a8e185680a4cb249a2a0180679fe0
  observedGeneration: 2
  phase: Ready
```

Cleanup job logs:

```
oc -n openshift-storage logs cleanup-svg-6d197fc1fbb3456bc7c42503614df334-cg6xg
2024/06/10 15:03:44 maxprocs: Leaving GOMAXPROCS=64: CPU quota undefined
2024-06-10 15:03:44.488227 I | rookcmd: starting Rook v4.16.0-0.a2396a5186cc038b22154e857e0f7865e709d06a with arguments '/usr/local/bin/rook ceph clean CephFilesystemSubVolumeGroup'
2024-06-10 15:03:44.488249 I | rookcmd: flag values: --help=false, --log-level=DEBUG
2024-06-10 15:03:44.488967 I | cleanup: starting clean up cephFS subVolumeGroup resource "cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf"
2024-06-10 15:03:44.488986 D | exec: Running command: ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:44.858858 I | cleanup: starting clean up of subvolume "csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4"
2024-06-10 15:03:44.858903 I | cleanup: OMAP value for the object "csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4" is "csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4"
2024-06-10 15:03:44.858951 D | exec: Running command: rados getomapval csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4 csi.volname -p ocs-storagecluster-cephfilesystem-metadata --namespace csi /dev/stdout --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:44.899078 I | cleanup: OMAP key for the OIMAP value "csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4" is "ceph.volume.pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052"
2024-06-10 15:03:44.899102 D | exec: Running command: rados rm csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:44.945237 I | cephclient: successfully deleted omap value "csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:44.945286 D | exec: Running command: rados rmomapkey csi.volumes.default ceph.volume.pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:44.987253 I | cephclient: successfully deleted omap key "ceph.volume.pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:44.987286 D | exec: Running command: ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4 --group_name cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:45.355556 D | exec: Running command: ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4 cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --force --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:45.714779 I | cleanup: starting clean up of subvolume "csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699"
2024-06-10 15:03:45.714806 I | cleanup: OMAP value for the object "csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699" is "csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699"
2024-06-10 15:03:45.714823 D | exec: Running command: rados getomapval csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699 csi.volname -p ocs-storagecluster-cephfilesystem-metadata --namespace csi /dev/stdout --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:45.756882 I | cleanup: OMAP key for the OIMAP value "csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699" is "ceph.volume.pvc-659599a1-2713-4a4b-80ae-78caff8f1463"
2024-06-10 15:03:45.756916 D | exec: Running command: rados rm csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:45.802948 I | cephclient: successfully deleted omap value "csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:45.802997 D | exec: Running command: rados rmomapkey csi.volumes.default ceph.volume.pvc-659599a1-2713-4a4b-80ae-78caff8f1463 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:45.848709 I | cephclient: successfully deleted omap key "ceph.volume.pvc-659599a1-2713-4a4b-80ae-78caff8f1463" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:45.848742 D | exec: Running command: ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699 --group_name cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:46.252544 D | exec: Running command: ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699 cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --force --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:46.639814 I | cleanup: successfully cleaned up cephFS subVolumeGroup "cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf"
```

The group is empty after cleanup:

```
ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
[]
```

ODF 4.16.0-118, Verified
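For reference, a rough Go sketch of the condition this verification exercises (not the actual Rook reconciler; names are illustrative): the cleanup job is launched only when the CR carries the `rook.io/force-deletion: "true"` annotation and has been marked for deletion, and the job is given the resolved subvolume group name, falling back to the CR name when `spec.name` is empty.

```
// Illustrative sketch only, under the assumptions stated above.
package main

import (
	"fmt"
	"time"
)

type CR struct {
	Name              string            // metadata.name
	SpecName          string            // spec.name, optional
	Annotations       map[string]string // metadata.annotations
	DeletionTimestamp *time.Time        // set once the CR is deleted
}

// shouldRunCleanup mirrors the condition observed in the test: force-deletion
// annotation set to "true" and the CR already marked for deletion.
func shouldRunCleanup(cr CR) bool {
	return cr.Annotations["rook.io/force-deletion"] == "true" && cr.DeletionTimestamp != nil
}

// resolvedName applies the fix verified above: fall back to the CR name when
// spec.name is empty, so the cleanup pod never starts without a name.
func resolvedName(cr CR) string {
	if cr.SpecName != "" {
		return cr.SpecName
	}
	return cr.Name
}

func main() {
	now := time.Now()
	cr := CR{
		Name:              "cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf",
		Annotations:       map[string]string{"rook.io/force-deletion": "true"},
		DeletionTimestamp: &now,
	}
	if shouldRunCleanup(cr) {
		fmt.Printf("launch cleanup job for subvolume group %q\n", resolvedName(cr))
	}
}
```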
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591