Bug 2281580
| Summary: | CephSubVolumeGroup cleanup job fails due to missing subvolumegroup name env | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Santosh Pillai <sapillai> |
| Component: | rook | Assignee: | Santosh Pillai <sapillai> |
| Status: | CLOSED ERRATA | QA Contact: | Daniel Osypenko <dosypenk> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.16 | CC: | dosypenk, jijoy, odf-bz-bot, tnielsen |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | ODF 4.16.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | isf-provider | ||
| Fixed In Version: | 4.16.0-106 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-07-17 13:23:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Santosh Pillai
2024-05-20 04:52:38 UTC
I have checked the scenario of deleting a client from the UI (Storage / Storage Clients / trash icon) on a Provider cluster with odf-operator.v4.16.0-108.stable and a Client cluster with ocs-client-operator.v4.16.0-108.stable.
The issue is not resolved.
Steps:
oc get storageconsumer -A -o jsonpath='{range .items[*]}{.metadata.name} {.status.client.clusterId}{"\n"}{end}'
storageconsumer-4f974f42-2301-460a-90a5-2607cabea062 4f974f42-2301-460a-90a5-2607cabea062
storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1 902cd8c9-4115-4424-b8d7-1cf4127135d1
storageconsumer-bcff8cf4-a660-4c2d-8834-051fcb373021 bcff8cf4-a660-4c2d-8834-051fcb373021
storageconsumer-bd1ac952-86c2-4d0e-a6b2-92020b41e959 bd1ac952-86c2-4d0e-a6b2-92020b41e959
storageconsumer-c01582a4-80f7-4aac-bf83-002fa086394f c01582a4-80f7-4aac-bf83-002fa086394f
storageconsumer-e5766b2d-e770-456f-8c9f-cc84140f8455 e5766b2d-e770-456f-8c9f-cc84140f8455
Compare with the StorageClient CR: they don't reference one another.
id: f452a26f-2163-4c73-a2ee-f8495ddd882c
step 1
switch to Client hcp415-bm3-i
step 2
get clusterID
oc get clusterversions.config.openshift.io version -n openshift-storage-client -o jsonpath='{.spec.clusterID}'
902cd8c9-4115-4424-b8d7-1cf4127135d1
step 3
oc get cronjob -n openshift-storage-client
oc patch cronjob storageclient-e5fb1f06bee2517f-status-reporter -n openshift-storage-client -p '{"spec":{"suspend":true}}'
set timer
step 4
switch back to Provider
step 5
get associated storagerequests on Provider:
oc get storagerequest -A -o jsonpath='{range .items[?(@.metadata.ownerReferences[0].name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}'
openshift-storage storagerequest-1df0252fd0b9919690f9f97306a25b47
openshift-storage storagerequest-92c3e721d8d5240b525c08a20d466ab3
step 6
get associated cephradosnamespace
oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephradosnamespace-db0cd87b2701f9e25f84de4254f8911e
step 7
get associated cephfilesystemsubvolumegroups
oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1")].metadata.name}'
cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
step 8
After 5 minutes, check that the heartbeat has stopped, the last heartbeat was updated 5 minutes ago, and the alert has appeared.
Go to the Provider UI and delete the client with the clusterID saved in step 2:
902cd8c9-4115-4424-b8d7-1cf4127135d1
Screen recording: https://drive.google.com/file/d/1AowHjJi7Il5_j_BvNHA3VOTv_pEneCtF/view?usp=sharing
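The client-side CronJob toggle in step 3 and the provider-side lookups in steps 5 to 7 can be wrapped in a few shell functions for repeat runs. This is a sketch: the function names are hypothetical, it assumes (as the storageconsumer listing shows) that each StorageConsumer is named `storageconsumer-<clusterID>`, and it swaps the jsonpath label filters for `-l` label selectors. The StorageRequest lookup keeps the `ownerReferences` filter because that link is not a label.

```shell
# Hypothetical helpers for reproducing the steps above; not part of ODF.

# Step 3 (client cluster): suspend or resume a status-reporter CronJob.
# Usage: set_cronjob_suspend <namespace> <cronjob> <true|false>
set_cronjob_suspend() {
  local ns=$1 cj=$2 state=$3
  case "$state" in
    true|false) ;;
    *) echo "state must be true or false" >&2; return 2 ;;
  esac
  oc patch cronjob "$cj" -n "$ns" -p "{\"spec\":{\"suspend\":$state}}"
}

# Assumption: StorageConsumer names follow "storageconsumer-<clusterID>".
consumer_for_cluster() {
  printf 'storageconsumer-%s\n' "$1"
}

# Steps 5-7 (provider cluster): list resources tied to one client cluster.
# Usage: list_consumer_resources <clusterID>
list_consumer_resources() {
  local consumer
  consumer=$(consumer_for_cluster "$1")
  local selector="ocs.openshift.io/storageconsumer-name=$consumer"

  # StorageRequests are linked through ownerReferences, not a label.
  echo "# StorageRequests owned by $consumer"
  oc get storagerequest -A -o jsonpath="{range .items[?(@.metadata.ownerReferences[0].name==\"$consumer\")]}{.metadata.namespace} {.metadata.name}{\"\\n\"}{end}"

  echo "# CephBlockPoolRadosNamespaces labelled for $consumer"
  oc get cephblockpoolradosnamespaces -n openshift-storage -l "$selector" -o name

  echo "# CephFilesystemSubVolumeGroups labelled for $consumer"
  oc get cephfilesystemsubvolumegroups -n openshift-storage -l "$selector" -o name
}
```

For example, `set_cronjob_suspend openshift-storage-client storageclient-e5fb1f06bee2517f-status-reporter false` resumes the heartbeat after the test, and `list_consumer_resources 902cd8c9-4115-4424-b8d7-1cf4127135d1` repeats steps 5 to 7 on the provider.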
This is an unrelated bug, not related to this BZ.
The CephFilesystemSubVolumeGroup (`cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee`) didn't even get the delete request.
```
❯ oc get cephfilesystemsubvolumegroups.ceph.rook.io -o yaml cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  creationTimestamp: "2024-05-23T11:29:43Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 2
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-data0
    ocs.openshift.io/storageconsumer-name: storageconsumer-902cd8c9-4115-4424-b8d7-1cf4127135d1
  name: cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageRequest
    name: storagerequest-92c3e721d8d5240b525c08a20d466ab3
    uid: 4ccc75f8-9eaa-400b-a69e-300de0e49461
  resourceVersion: "21498935"
  uid: 228fc842-a652-4230-bca2-879c136672b7
spec:
  dataPoolName: ""
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: bad4e5d041ba4bc96d0bc414eb8af6e9
  observedGeneration: 2
  phase: Ready
```
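One way to confirm that an object never received a delete request is to read `.metadata.deletionTimestamp`: Kubernetes sets it when a delete is accepted, and the object then stays until its finalizers (here `cephfilesystemsubvolumegroup.ceph.rook.io`) are removed. The YAML above has no such field. The helper below is a hypothetical sketch around a plain `oc get -o jsonpath` call.

```shell
# Hypothetical helper: has Kubernetes accepted a delete for this object?
# Usage: deletion_requested <resource> <name> <namespace>
# Returns 0 and prints the timestamp if deletionTimestamp is set,
# 1 if it is not set, 2 if the object could not be read.
deletion_requested() {
  local ts
  ts=$(oc get "$1" "$2" -n "$3" -o jsonpath='{.metadata.deletionTimestamp}') || return 2
  if [ -n "$ts" ]; then
    echo "delete requested at $ts"
  else
    echo "no delete request"
    return 1
  fi
}
```

For this case: `deletion_requested cephfilesystemsubvolumegroups.ceph.rook.io cephfilesystemsubvolumegroup-d1926f84e3f12c97361f29e8e431bcee openshift-storage`.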
Moving it back to ON_QA. Please open a new issue for this cluster; the StorageClient pods are crash looping with:
```
❯ oc logs storageclient-737342087af10580-status-reporter-28617708-nmf28
W0530 09:50:00.976897 1 main.go:160] Failed to get clusterDNS "cluster": dnses.config.openshift.io "cluster" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "dnses" in API group "config.openshift.io" at the cluster scope
W0530 09:50:00.977062 1 main.go:164] Cluster Base Domain is empty.
F0530 09:50:01.021719 1 main.go:142] Failed to update mon configmap for storageClient d7fd5074-d22a-47c8-979c-67e1bf1e41d4: failed to fetch current csi config map: configmaps "ceph-csi-configs" is forbidden: User "system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter" cannot get resource "configmaps" in API group "" in the namespace "openshift-storage"
```
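Both `forbidden` errors can be checked directly with `oc auth can-i`, impersonating the status-reporter service account named in the log. This is a diagnostic sketch with a hypothetical helper name; it only reports the permissions and does not change any RBAC.

```shell
# Service account taken from the "forbidden" errors in the log above.
SA="system:serviceaccount:openshift-storage:ocs-client-operator-status-reporter"

# Print yes/no for each permission the status reporter failed to use.
# "|| true" keeps going if can-i exits non-zero on a "no" answer.
check_status_reporter_rbac() {
  printf 'get dnses.config.openshift.io (cluster scope): '
  oc auth can-i get dnses.config.openshift.io --as="$SA" || true
  printf 'get configmaps in openshift-storage: '
  oc auth can-i get configmaps -n openshift-storage --as="$SA" || true
}
```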
-------- BM3 - hcp415-bm3-m
oc get clusterversions.config.openshift.io version -n openshift-storage-client -o jsonpath='{.spec.clusterID}'
b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7
From the client:
oc get pvc -n hcp415-bm3-m-ns -o wide
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                 AGE   VOLUMEMODE
ocs-storagecluster-ceph-pvc-1   Bound    pvc-2095b84c-643c-4f4d-a362-833b4424bd9e   10Gi       RWO            kubevirt-csi-infra-default   15m   Filesystem
ocs-storagecluster-ceph-pvc-2   Bound    pvc-3bc01fd6-5a98-42c3-b028-b4b6105959a1   3Gi        RWO            storage-client-ceph-rbd      15m   Filesystem
ocs-storagecluster-ceph-pvc-3   Bound    pvc-659599a1-2713-4a4b-80ae-78caff8f1463   6Gi        RWO            storage-client-cephfs        15m   Filesystem
ocs-storagecluster-ceph-pvc-4   Bound    pvc-edfb80d7-cea5-4d87-b59f-6fb8fbe31c40   3Gi        RWO            kubevirt-csi-infra-default   15m   Filesystem
ocs-storagecluster-ceph-pvc-5   Bound    pvc-e602f843-ddbc-4855-a779-ffc614e085d4   5Gi        RWO            storage-client-ceph-rbd      15m   Filesystem
ocs-storagecluster-ceph-pvc-6   Bound    pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052   6Gi        RWO            storage-client-cephfs        15m   Filesystem
ocs-storagecluster-ceph-pvc-7   Bound    pvc-b8cfbabd-662e-4be3-b9cb-65d26a55a5ea   5Gi        RWO            kubevirt-csi-infra-default   15m   Filesystem
oc get storagerequest -A -o jsonpath='{range .items[?(@.metadata.ownerReferences[0].name=="storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}'
openshift-storage storagerequest-94a1aad6dfec7dc629d39fb0514c001b
openshift-storage storagerequest-eb0eb50567eed2694da91e19ca41dc4b
oc get cephblockpoolradosnamespaces -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7")].metadata.name}'
cephradosnamespace-659dbb570cbf8b2230c8350927bd4c4e
oc get cephfilesystemsubvolumegroups -n openshift-storage -o jsonpath='{.items[?(@.metadata.labels.ocs\.openshift\.io/storageconsumer-name=="storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7")].metadata.name}'
cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
oc get cephfilesystemsubvolumegroups.ceph.rook.io cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf -n openshift-storage -oyaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  creationTimestamp: "2024-06-03T13:03:01Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 2
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-data0
    ocs.openshift.io/storageconsumer-name: storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7
  name: cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageRequest
    name: storagerequest-94a1aad6dfec7dc629d39fb0514c001b
    uid: 5346a776-5199-411c-9b45-c73ea48a63b1
  resourceVersion: "75312868"
  uid: bf3142ec-2b14-4276-9be0-84d12a84277c
spec:
  dataPoolName: ""
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: ce3a8e185680a4cb249a2a0180679fe0
  observedGeneration: 2
  phase: Ready
sh-5.1$ ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
[
    {
        "name": "csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4"
    },
    {
        "name": "csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699"
    }
]
oc get cephfilesystemsubvolumegroups.ceph.rook.io cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf -n openshift-storage -oyaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  annotations:
    rook.io/force-deletion: "true"
  creationTimestamp: "2024-06-03T13:03:01Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 2
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-data0
    ocs.openshift.io/storageconsumer-name: storageconsumer-b29534a0-71b4-4c6b-8ae5-d5c7cadb82f7
  name: cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageRequest
    name: storagerequest-94a1aad6dfec7dc629d39fb0514c001b
    uid: 5346a776-5199-411c-9b45-c73ea48a63b1
  resourceVersion: "75751340"
  uid: bf3142ec-2b14-4276-9be0-84d12a84277c
spec:
  dataPoolName: ""
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: ce3a8e185680a4cb249a2a0180679fe0
  observedGeneration: 2
  phase: Ready
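Apart from `resourceVersion`, the second YAML differs from the first only by the `rook.io/force-deletion: "true"` annotation. Assuming Rook's force-deletion behaviour for subvolume groups (annotate, then delete, and the operator starts a cleanup job such as the `cleanup-svg-*` pod whose log follows), the same path could be exercised by hand on a test cluster. This is a sketch with a hypothetical function name, and it removes the data inside the group.

```shell
# Hypothetical helper. Assumes Rook's force-deletion behaviour: with the
# rook.io/force-deletion annotation set, deleting the CR makes the operator
# run a cleanup job that removes the subvolumes inside the group.
# WARNING: this deletes data in the group. Test clusters only.
# Usage: force_delete_svg <name> [namespace]
force_delete_svg() {
  local name=$1
  local ns=${2:-openshift-storage}
  oc annotate cephfilesystemsubvolumegroups.ceph.rook.io "$name" -n "$ns" \
    rook.io/force-deletion=true --overwrite &&
    oc delete cephfilesystemsubvolumegroups.ceph.rook.io "$name" -n "$ns" --wait=false
}
```

If the assumption holds, `oc get pods -n openshift-storage | grep cleanup-svg` should then list the cleanup pod.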
oc -n openshift-storage logs cleanup-svg-6d197fc1fbb3456bc7c42503614df334-cg6xg
2024/06/10 15:03:44 maxprocs: Leaving GOMAXPROCS=64: CPU quota undefined
2024-06-10 15:03:44.488227 I | rookcmd: starting Rook v4.16.0-0.a2396a5186cc038b22154e857e0f7865e709d06a with arguments '/usr/local/bin/rook ceph clean CephFilesystemSubVolumeGroup'
2024-06-10 15:03:44.488249 I | rookcmd: flag values: --help=false, --log-level=DEBUG
2024-06-10 15:03:44.488967 I | cleanup: starting clean up cephFS subVolumeGroup resource "cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf"
2024-06-10 15:03:44.488986 D | exec: Running command: ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:44.858858 I | cleanup: starting clean up of subvolume "csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4"
2024-06-10 15:03:44.858903 I | cleanup: OMAP value for the object "csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4" is "csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4"
2024-06-10 15:03:44.858951 D | exec: Running command: rados getomapval csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4 csi.volname -p ocs-storagecluster-cephfilesystem-metadata --namespace csi /dev/stdout --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:44.899078 I | cleanup: OMAP key for the OIMAP value "csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4" is "ceph.volume.pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052"
2024-06-10 15:03:44.899102 D | exec: Running command: rados rm csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:44.945237 I | cephclient: successfully deleted omap value "csi.volume.f50bc2fd-d004-4975-8993-8e8f389463a4" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:44.945286 D | exec: Running command: rados rmomapkey csi.volumes.default ceph.volume.pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:44.987253 I | cephclient: successfully deleted omap key "ceph.volume.pvc-d9fed82c-0153-4547-87ac-90f8ee8ca052" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:44.987286 D | exec: Running command: ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4 --group_name cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:45.355556 D | exec: Running command: ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-f50bc2fd-d004-4975-8993-8e8f389463a4 cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --force --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:45.714779 I | cleanup: starting clean up of subvolume "csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699"
2024-06-10 15:03:45.714806 I | cleanup: OMAP value for the object "csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699" is "csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699"
2024-06-10 15:03:45.714823 D | exec: Running command: rados getomapval csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699 csi.volname -p ocs-storagecluster-cephfilesystem-metadata --namespace csi /dev/stdout --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:45.756882 I | cleanup: OMAP key for the OIMAP value "csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699" is "ceph.volume.pvc-659599a1-2713-4a4b-80ae-78caff8f1463"
2024-06-10 15:03:45.756916 D | exec: Running command: rados rm csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:45.802948 I | cephclient: successfully deleted omap value "csi.volume.92bf24aa-e42b-4a40-9753-417e776a1699" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:45.802997 D | exec: Running command: rados rmomapkey csi.volumes.default ceph.volume.pvc-659599a1-2713-4a4b-80ae-78caff8f1463 -p ocs-storagecluster-cephfilesystem-metadata --namespace csi --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring
2024-06-10 15:03:45.848709 I | cephclient: successfully deleted omap key "ceph.volume.pvc-659599a1-2713-4a4b-80ae-78caff8f1463" for pool "ocs-storagecluster-cephfilesystem-metadata"
2024-06-10 15:03:45.848742 D | exec: Running command: ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699 --group_name cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:46.252544 D | exec: Running command: ceph fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-92bf24aa-e42b-4a40-9753-417e776a1699 cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf --force --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2024-06-10 15:03:46.639814 I | cleanup: successfully cleaned up cephFS subVolumeGroup "cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf"
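Read as a procedure, the log shows the order in which the cleanup job removes each subvolume: remove the CSI OMAP object and its reverse-mapping key from the `csi` namespace of the metadata pool, list snapshots, then force-remove the subvolume. The dry-run sketch below only prints that sequence for one subvolume. It is reconstructed from the log lines, not from Rook's source; it omits the connection flags and assumes the metadata pool is named `<filesystem>-metadata`, as on this cluster. The `ceph.volume.<pvc>` key is passed in because it comes from the `getomapval` step.

```shell
# Dry run only: prints the per-subvolume sequence seen in the cleanup-job log.
# Reconstructed from the log, not from Rook's source; connection flags
# (--cluster, --conf, --name, --keyring, --connect-timeout) are omitted.
# Assumption: the metadata pool is "<filesystem>-metadata".
# Usage: print_subvolume_cleanup <filesystem> <group> <subvolume> <omap-key>
print_subvolume_cleanup() {
  local fs=$1 group=$2 subvol=$3 key=$4
  local meta="${fs}-metadata"
  # csi-vol-<uuid> is tracked by the OMAP object csi.volume.<uuid>
  local obj="csi.volume.${subvol#csi-vol-}"
  # 1. Read csi.volname; the log then reports the key (ceph.volume.<pvc>).
  echo "rados getomapval $obj csi.volname -p $meta --namespace csi /dev/stdout"
  # 2. Remove the per-volume OMAP object.
  echo "rados rm $obj -p $meta --namespace csi"
  # 3. Remove the matching key from csi.volumes.default.
  echo "rados rmomapkey csi.volumes.default $key -p $meta --namespace csi"
  # 4. List snapshots (this log does not show the case where some exist).
  echo "ceph fs subvolume snapshot ls $fs $subvol --group_name $group --format json"
  # 5. Force-remove the subvolume from the group.
  echo "ceph fs subvolume rm $fs $subvol $group --force --format json"
}
```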
ceph fs subvolume ls ocs-storagecluster-cephfilesystem cephfilesystemsubvolumegroup-11d45f38d9584ca0aa47d17035ed0adf
[]
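The final check (an empty list after cleanup) can be turned into a pass/fail step for automated verification. A minimal sketch with a hypothetical helper name, relying only on `ceph fs subvolume ls ... --format json` as run in the log, executed where `ceph` works (for example the toolbox pod).

```shell
# Hypothetical pass/fail check for an emptied subvolume group.
# Usage: svg_is_empty <filesystem> <subvolumegroup>
# Returns 0 when the group lists no subvolumes, 1 when some remain,
# 2 when the ceph command itself fails.
svg_is_empty() {
  local out
  out=$(ceph fs subvolume ls "$1" "$2" --format json) || return 2
  # Drop all whitespace so "[]" and pretty-printed output compare alike.
  [ "$(printf '%s' "$out" | tr -d '[:space:]')" = "[]" ] && return 0
  return 1
}
```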
ODF 4.16.0-118, Verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591