Bug 2280946
| Summary: | Cephblockpoolradosnamespace and subvolumegroups not deleted with storageconsumer deletion | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Jilju Joy <jijoy> |
| Component: | rook | Assignee: | Santosh Pillai <sapillai> |
| Status: | CLOSED ERRATA | QA Contact: | Jilju Joy <jijoy> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.16 | CC: | dosypenk, lgangava, odf-bz-bot, sapillai, tnielsen |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.16.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | isf-provider | | |
| Fixed In Version: | 4.16.0-106 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-07-17 13:23:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jilju Joy
2024-05-17 09:46:54 UTC
I have checked the scenario with client deletion from the UI (Storage / Storage Clients / trash icon) on a Provider cluster running odf-operator.v4.16.0-108.stable, with the Client cluster on ocs-client-operator.v4.16.0-108.stable.
The issue is not resolved.
Steps:
oc get storageconsumer -A -o jsonpath='{range .items[*]}{.metadata.name} {.status.client.clusterId}{"\n"}{end}'
storageconsumer-4f974f42-2301-460a-90a5-2607cabea062 4f974f42-2301-460a-90a5-2607cabea062
storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1 902cd8c9-4115-4424-b8d7-1cf4127135d1
storageconsumer-bcff8cf4-a660-4c2d-8834-051fcb373021 bcff8cf4-a660-4c2d-8834-051fcb373021
storageconsumer-bd1ac952-86c2-4d0e-a6b2-92020b41e959 bd1ac952-86c2-4d0e-a6b2-92020b41e959
storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f c01582a4-80f7-4aac-bf83-002fa086394f
storageconsumer-e5766b2d-e770-456f-8c9f-cc84140f8455 e5766b2d-e770-456f-8c9f-cc84140f8455
Compare with the storageclient CR; they don't refer to one another (a client-side listing sketch follows below):
id: f452a26f-2163-4c73-a2ee-f8495ddd882c
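For this comparison, the client-side storageclient CRs can be listed with something like the following (a minimal sketch; the `.status.id` field path is an assumption for this build and may need adjusting):
```
# On the client cluster: list storageclient CRs and the consumer ID recorded in their status.
# The .status.id path is an assumption; dump the full object if it comes back empty.
oc get storageclient -o jsonpath='{range .items[*]}{.metadata.name} {.status.id}{"\n"}{end}'
```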
step 1
switch to Client hcp415-bm3-i
step 2
get clusterID
oc get clusterversions.config.openshift.io version -n openshift-storage-client -o jsonpath='{.spec.clusterID}'
902cd8c9-4115-4424-b8d7-1cf4127135d1
step 3
oc get cronjob -n openshift-storage-client
oc patch cronjob storageclient-e5fb1f06bee2517f-status-reporter -n openshift-storage-client -p '{"spec":{"suspend":true}}'
set timer
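Once the check is complete, the heartbeat can presumably be restored by reverting the patch from step 3 (a sketch using the same cronjob name):
```
# Re-enable the status-reporter cronjob after the test.
oc patch cronjob storageclient-e5fb1f06bee2517f-status-reporter -n openshift-storage-client -p '{"spec":{"suspend":false}}'
```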
step 4
switch back to Provider
step 5
get associated storagerequests on Provider:
oc get storagerequest -A -o jsonpath='{range .items[?(@.metadata.ownerReferences[0].name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}'
openshift-storage storagerequest-1df0252fd0b9919690f9f97306a25b47
openshift-storage storagerequest-92c3e721d8d5240b525c08a20d466ab3
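The jsonpath filter above keys off the first ownerReference of each storagerequest; the link can be confirmed by dumping one of them (a sketch using the first name listed):
```
# Show the ownerReferences of a storagerequest to confirm it points at the storageconsumer.
oc get storagerequest storagerequest-1df0252fd0b9919690f9f97306a25b47 -n openshift-storage \
  -o jsonpath='{.metadata.ownerReferences[*].kind}{" "}{.metadata.ownerReferences[*].name}{"\n"}'
```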
step 6
get associated cephradosnamespace
oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephradosnamespace-db0cd87b2701f9e25f84de4254f8911e
step 7
get associated cephfilesystemsubvolumegroups
oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
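A less surgical way to run the same two checks as steps 6 and 7 (a sketch): show the labels and grep for the consumer name.
```
# Same checks as steps 6 and 7, eyeballing labels instead of a jsonpath filter.
oc get cephblockpoolradosnamespaces -n openshift-storage --show-labels | grep storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1
oc get cephfilesystemsubvolumegroups -n openshift-storage --show-labels | grep storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1
```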
step 8
after 5 min, check that the heartbeat stopped, that the last heartbeat was updated 5 min ago, and that the alert appeared (a CLI cross-check is sketched after this step)
go to UI of Provider and delete client with clusterID saved in step 2
902cd8c9-4115-4424-b8d7-1cf4127135d1
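Besides the UI, the stopped heartbeat can be cross-checked from the client side through the suspended cronjob's status (a sketch; `lastScheduleTime` should stop advancing once the job is suspended):
```
# On the client cluster: the last time the status-reporter cronjob was scheduled.
oc get cronjob storageclient-e5fb1f06bee2517f-status-reporter -n openshift-storage-client \
  -o jsonpath='{.status.lastScheduleTime}{"\n"}'
```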
screen recording https://drive.google.com/file/d/1AowHjJi7Il5_j_BvNHA3VOTv_pEneCtF/view?usp=sharing
I have tested the same with another client, this time while subvolumes and rbd images exist. Both the provider and the client are on ocs-client-operator.v4.16.0-108.stable.
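For context, the subvolumes and rbd images referenced below come from PVCs created on the client cluster. A minimal sketch of such a PVC follows; the storageClassName is an assumption and must match a class exposed by one of the client's storageclaims:
```
# Create a small RBD-backed PVC on the client cluster; adjust storageClassName to a real class.
oc apply -n default -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd   # assumed name; use the class created for the storageclaim
EOF
```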
(interpreter) ➜ ocs-ci git:(master) ✗ oc get storagerequest -A -o jsonpath='{range .items[?(@.metadata.ownerReferences[0].name=="storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}'
openshift-storage storagerequest-3da25b2c25c01bf68207bef8a66c6e27
openshift-storage storagerequest-9019dcf8665ea8ddc2187c82230b0b50
openshift-storage storagerequest-c78afdf62730a457f004413c0e308012
(interpreter) ➜ ocs-ci git:(master) ✗ oc get storageconsumer storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f -n openshift-storage
Error from server (NotFound): storageconsumers.ocs.openshift.io "storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f" not found
(interpreter) ➜ ocs-ci git:(master) ✗ oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f")].metadata.name}'
cephradosnamespace-60867fefece3a1865a2c3f08ae49d1b6 cephradosnamespace-86f0f753d45ee87382a8678d036d98ca
(interpreter) ➜ ocs-ci git:(master) ✗ oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f")].metadata.name}'
cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc
(interpreter) ➜ ocs-ci git:(master) ✗ toolbash
ocsinitialization.ocs.openshift.io/ocsinit patched
sh-5.1$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc
[
{
"name": "csi-vol-090e5c49-15f4-4d06-ae6a-8a42ef0f17ef"
},
{
"name": "csi-vol-41f55c47-0854-4abb-bd99-46f10301499b"
}
]
sh-5.1$ rbd ls -p ocs-storagecluster-cephblockpool --namespace cephradosnamespace-60867fefece3a1865a2c3f08ae49d1b6
csi-vol-65257493-e43c-4210-9fee-b9b97a8cd0df
screen recording - https://drive.google.com/file/d/1LMkmOCLK7_dhUGlKhEglMm-0RRDpChqi/view?usp=sharing
(In reply to Daniel Osypenko from comment #13)
> I have tested the same with another client when subvolumes and rbd images
> exist. Both provider and client are ocs-client-operator.v4.16.0-108.stable

Please provide the cluster where this is happening.

https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/38138/
copied to dm

(In reply to Daniel Osypenko from comment #15)
> https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/38138/
> copied to dm

Thanks Daniel
Checked the cluster. I see that the resources mentioned in the comment are deleted and the clean up jobs are working correctly.

CephSubVolumeGroup: `cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc`

❯ oc exec -it rook-ceph-tools-555d6cd8f7-7v529 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
sh-5.1$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc
Error ENOENT: subvolume group 'cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc' does not exist
sh-5.1$ exit
exit

Cleanup job logs:
```
❯ oc logs cleanup-svg-86529c108be12e0cc51fef8a3c8cc989-rjg7x
2024/05/30 15:12:33 maxprocs: Leaving GOMAXPROCS=64: CPU quota undefined
2024-05-30 15:12:33.243205 I | rookcmd: starting Rook v4.16.0-0.9b2caf9cc037c2396b89ae2116a98795b6acd978 with arguments '/usr/local/bin/rook ceph clean CephFilesystemSubVolumeGroup'
2024-05-30 15:12:33.243329 I | rookcmd: flag values: --help=false, --log-level=DEBUG
2024-05-30 15:12:33.243917 I | cleanup: starting clean up cephFS subVolumeGroup resource "cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc"
2024-05-30 15:12:33.243934 D | exec: Running command: ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-05-30 15:12:33.586566 I | cleanup: starting clean up of subvolume "csi-vol-090e5c49-15f4-4d06-ae6a-8a42ef0f17ef"
2024-05-30 15:12:33.586586 I | cleanup: OMAP value for the object "csi-vol-090e5c49-15f4-4d06-ae6a-8a42ef0f17ef" is "csi.volume.090e5c49-15f4-4d06-ae6a-8a42ef0f17ef"
2024-05-30 15:12:33.586607 D | exec: Running command: rados getomapval csi.volume.090e5c49-15f4-4d06-ae6a-8a42ef0f17ef csi.volname -p ocs-storagecluster-cephfilesystem-metadata --namespace csi /dev/stdout --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-05-30 15:12:33.626217 I | cleanup: OMAP key for the OIMAP value "csi.volume.090e5c49-15f4-4d06-ae6a-8a42ef0f17ef" is "ceph.volume.pvc-1b2d6de6-3edb-427e-aea0-bc595f4837a8"
2024-05-30 15:12:33.626237 D | exec: Running command: rados rm csi.volume.090e5c49-15f4-4d06-ae6a-8a42ef0f17ef -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-05-30 15:12:33.665008 I | cephclient: successfully deleted omap value "csi.volume.090e5c49-15f4-4d06-ae6a-8a42ef0f17ef" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-05-30 15:12:33.665045 D | exec: Running command: rados rmomapkey csi.volumes.default ceph.volume.pvc-1b2d6de6-3edb-427e-aea0-bc595f4837a8 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-05-30 15:12:33.702992 I | cephclient: successfully deleted omap key "ceph.volume.pvc-1b2d6de6-3edb-427e-aea0-bc595f4837a8" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-05-30 15:12:33.703012 D | exec: Running command: ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-090e5c49-15f4-4d06-ae6a-8a42ef0f17ef --group_name cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-05-30 15:12:34.032427 D | exec: Running command: ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-090e5c49-15f4-4d06-ae6a-8a42ef0f17ef cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc --force --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-05-30 15:12:34.347869 I | cleanup: starting clean up of subvolume "csi-vol-41f55c47-0854-4abb-bd99-46f10301499b"
2024-05-30 15:12:34.347889 I | cleanup: OMAP value for the object "csi-vol-41f55c47-0854-4abb-bd99-46f10301499b" is "csi.volume.41f55c47-0854-4abb-bd99-46f10301499b"
2024-05-30 15:12:34.347904 D | exec: Running command: rados getomapval csi.volume.41f55c47-0854-4abb-bd99-46f10301499b csi.volname -p ocs-storagecluster-cephfilesystem-metadata --namespace csi /dev/stdout --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-05-30 15:12:34.386386 I | cleanup: OMAP key for the OIMAP value "csi.volume.41f55c47-0854-4abb-bd99-46f10301499b" is "ceph.volume.pvc-e4d51546-ae4f-4df0-8e32-f8a96f3a602f"
2024-05-30 15:12:34.386418 D | exec: Running command: rados rm csi.volume.41f55c47-0854-4abb-bd99-46f10301499b -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-05-30 15:12:34.424939 I | cephclient: successfully deleted omap value "csi.volume.41f55c47-0854-4abb-bd99-46f10301499b" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-05-30 15:12:34.424974 D | exec: Running command: rados rmomapkey csi.volumes.default ceph.volume.pvc-e4d51546-ae4f-4df0-8e32-f8a96f3a602f -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-05-30 15:12:34.464664 I | cephclient: successfully deleted omap key "ceph.volume.pvc-e4d51546-ae4f-4df0-8e32-f8a96f3a602f" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-05-30 15:12:34.464698 D | exec: Running command: ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-41f55c47-0854-4abb-bd99-46f10301499b --group_name cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-05-30 15:12:34.816279 D | exec: Running command: ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-41f55c47-0854-4abb-bd99-46f10301499b cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc --force --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-05-30 15:12:35.151179 I | cleanup: successfully cleaned up cephFS subVolumeGroup "cephfilesystemsubvolumegroup-c103cd65023f6e7bb20f101b1122c7fc"
```

--------------------------------

CephRadosNamespace:

sh-5.1$ rbd ls -p ocs-storagecluster-cephblockpool --namespace cephradosnamespace-60867fefece3a1865a2c3f08ae49d1b6
rbd: namespace 'cephradosnamespace-60867fefece3a1865a2c3f08ae49d1b6' does not exist.
rbd: listing images failed: (2) No such file or directory
sh-5.1$

❯ oc get cephblockpoolradosnamespaces cephradosnamespace-60867fefece3a1865a2c3f08ae49d1b6 -o yaml
Error from server (NotFound): cephblockpoolradosnamespaces.ceph.rook.io "cephradosnamespace-60867fefece3a1865a2c3f08ae49d1b6" not found

----------------------

So moving this back to on_QA to verify again.

resources are left behind, please check it @sapillai
➜ oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephradosnamespace-db0cd87b2701f9e25f84de4254f8911e
➜ oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
and
➜ oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f")].metadata.name}'
cephradosnamespace-961a77489221c265904ce799b20e5bb2
➜ oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f")].metadata.name}'
cephfilesystemsubvolumegroup-7bff4824be367f35c0a05ba23a4ec6bf
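When deletion works as expected, these child resources are removed by rook cleanup jobs (see the logs above). Whether such jobs were created for the leftover resources can be checked with something like this sketch:
```
# Look for rook cleanup jobs and their pods in the provider namespace.
oc get jobs -n openshift-storage | grep cleanup
oc get pods -n openshift-storage | grep cleanup
```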
Verified in version:
OCP 4.16.0-ec.6
ODF 4.16.0-110
Same version of client and OCP in the hosted cluster. Client created on the hosted cluster using an agent.

Created two additional storageclaims on the client side. This created a new cephblockpoolradosnamespace and cephfilesystemsubvolumegroup, so there are two cephblockpoolradosnamespaces and two cephfilesystemsubvolumegroups associated with the storageconsumer.
Created PVCs using all 4 storageclasses. The created PVCs are RBD (Block and Filesystem volume mode) and CephFS (Filesystem volume mode). 1 GB of data was present on each volume at the time of deleting the hosted cluster. Snapshots of the PVCs were present.
Deleted the hosted cluster where the storageclient is present. After 20 minutes, deleted the storageclient from the Storage Clients page in the provider cluster UI. At the time of deletion, the last heartbeat was showing as 20 minutes ago in the UI.
The cephblockpoolradosnamespaces and cephfilesystemsubvolumegroups are deleted.
Test details and outputs are given in https://bugzilla.redhat.com/show_bug.cgi?id=2280813#c6

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:4591