Description of problem (please be as detailed as possible and provide log snippets):

This appears to be the same issue as https://github.com/ceph/ceph-csi/issues/4013. We have a customer hitting a similar problem in OpenShift Virtualization, and it is easy to reproduce. OpenShift Virtualization provides golden images for templates which are used as VM boot disks[1]. The source image is either a PVC (before 4.14) or a VolumeSnapshot. VMs created from these templates use CSI cloning to create the new PVC.

PVC and image details of the VM created from a rhel9 golden image:

~~~
rhel9-minimum-mammal   Bound   pvc-f9685b54-fd9e-44d0-b882-aa7d707e588b   30Gi   RWX   ocs-external-storagecluster-ceph-rbd   39s

# oc get pv pvc-f9685b54-fd9e-44d0-b882-aa7d707e588b -o json |jq '.spec.csi.volumeAttributes.imageName'
"csi-vol-8f886291-10e7-49f6-9d7f-69db4bc6d21e"

# rbd info sno/csi-vol-8f886291-10e7-49f6-9d7f-69db4bc6d21e |grep parent
	parent: sno/csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c@csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c
~~~

Below is the golden image and the corresponding RBD image:

~~~
# oc get volumesnapshot rhel9-6c486c3e5f8c -o json |jq '.status.boundVolumeSnapshotContentName'
"snapcontent-34a43465-0341-4c26-84af-9da741e91b81"

# oc get volumesnapshotcontent snapcontent-34a43465-0341-4c26-84af-9da741e91b81 -o json |jq '.status.snapshotHandle,.spec.source.volumeHandle'
"0001-0011-openshift-storage-0000000000000001-df6fed52-f9eb-4711-af8b-5d6fe1940a7c"
"0001-0011-openshift-storage-0000000000000001-0d5f7a5d-3650-4efc-830f-fd8be1b4bf06"

# rbd snap ls sno/csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c
SNAPID  NAME                                           SIZE    PROTECTED  TIMESTAMP
    89  csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c  30 GiB             Mon Feb 19 15:10:25 2024

# rbd info sno/csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c |grep parent
	op_features: clone-parent, clone-child
	parent: sno/csi-vol-0d5f7a5d-3650-4efc-830f-fd8be1b4bf06@1a58acf9-08dc-4a14-845f-44e918ff718e (trash b5aa5e1635793)
~~~

CDI deletes older PVCs when a new version of an OS image is imported and by default keeps only three versions of an image[2].
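For illustration, the VM boot-disk PVC is requested with a dataSource pointing at the golden-image VolumeSnapshot. A minimal sketch of such a claim (the PVC name is hypothetical; the snapshot name, storage class, size, and namespace are taken from the outputs above; in practice this request is generated by CDI/OpenShift Virtualization rather than written by hand):

~~~
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rhel9-vm-boot-disk               # hypothetical name for illustration
  namespace: nijin-cnv
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  storageClassName: ocs-external-storagecluster-ceph-rbd
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: rhel9-6c486c3e5f8c              # the golden-image VolumeSnapshot shown above
  resources:
    requests:
      storage: 30Gi
~~~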
Once the above golden image is deleted, cloning of VMs created from the image fails:

~~~
# oc delete volumesnapshot rhel9-6c486c3e5f8c

# rbd trash ls --pool sno |grep csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c
b5aa5204dc99d csi-snap-df6fed52-f9eb-4711-af8b-5d6fe1940a7c
~~~

The cloning of rhel9-minimum-mammal now fails:

~~~
# oc get pvc tmp-pvc-ac6e3d87-98a9-493f-a5dd-f7f7bed409f8 -n nijin-cnv -o json |jq '.spec.dataSource'
{
  "apiGroup": null,
  "kind": "PersistentVolumeClaim",
  "name": "rhel9-minimum-mammal"
}

I0219 15:45:29.679359 1 utils.go:206] ID: 298 Req-ID: pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f GRPC request: {"capacity_range":{"required_bytes":32212254720},"name":"pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f","parameters":{"clusterID":"openshift-storage","csi.storage.k8s.io/pv/name":"pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f","csi.storage.k8s.io/pvc/name":"tmp-pvc-ac6e3d87-98a9-493f-a5dd-f7f7bed409f8","csi.storage.k8s.io/pvc/namespace":"nijin-cnv","imageFeatures":"layering,deep-flatten,exclusive-lock,object-map,fast-diff","imageFormat":"2","pool":"sno"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":5}}],"volume_content_source":{"Type":{"Volume":{"volume_id":"0001-0011-openshift-storage-0000000000000001-8f886291-10e7-49f6-9d7f-69db4bc6d21e"}}}}
I0219 15:45:29.679512 1 rbd_util.go:1308] ID: 298 Req-ID: pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f setting disableInUseChecks: true image features: [layering fast-diff exclusive-lock object-map deep-flatten] mounter: rbd
I0219 15:45:29.681111 1 omap.go:88] ID: 298 Req-ID: pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f got omap values: (pool="sno", namespace="", name="csi.volume.8f886291-10e7-49f6-9d7f-69db4bc6d21e"): map[csi.imageid:b5aa5232a500f csi.imagename:csi-vol-8f886291-10e7-49f6-9d7f-69db4bc6d21e csi.volname:pvc-f9685b54-fd9e-44d0-b882-aa7d707e588b csi.volume.owner:openshift-virtualization-os-images]
I0219 15:45:29.709824 1 omap.go:88] ID: 298 Req-ID: pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f got omap values: (pool="sno", namespace="", name="csi.volumes.default"): map[]
E0219 15:45:29.739129 1 utils.go:210] ID: 298 Req-ID: pvc-f2322676-d577-48b9-abd1-2c8972a2ae5f GRPC error: rpc error: code = Internal desc = image not found: RBD image not found
~~~

Version of all relevant components (if applicable):
odf-operator.v4.14.4

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Users will be unable to clone VMs if the source VMs' golden images have been garbage collected by CDI.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create a VM from a golden image using the Ceph RBD storage class with block mode.
2. Delete the golden image VolumeSnapshot.
3. Try to clone the VM created in step 1 (a sketch of such a clone PVC is shown below).

Actual results:
PVC cloning fails with the error "RBD image not found".

Expected results:
Cloning should work.
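For reference, the failing CreateVolume request above corresponds to a clone PVC whose dataSource is the existing VM boot-disk PVC. A minimal sketch of such a claim (the PVC name is hypothetical; in practice the tmp-pvc is created by the virtualization clone workflow, but an equivalent manual request should hit the same error):

~~~
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tmp-pvc-clone-test              # hypothetical name for illustration
  namespace: nijin-cnv
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  storageClassName: ocs-external-storagecluster-ceph-rbd
  dataSource:
    kind: PersistentVolumeClaim          # CSI clone of the VM's boot-disk PVC
    name: rhel9-minimum-mammal
  resources:
    requests:
      storage: 30Gi
~~~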
Additional info:
[1] https://docs.openshift.com/container-platform/4.14/virt/virtual_machines/creating_vms_rh/virt-creating-vms-from-rh-images-overview.html#virt-about-golden-images_virt-creating-vms-from-rh-images-overview
[2] https://github.com/kubevirt/containerized-data-importer/blob/42ec627e3593c45027c2ebffb32f26a182c9cdee/pkg/controller/dataimportcron-controller.go#L769
Looks like the bug is still not fixed in 4.16.0-94. I've tried the first 4 steps out of the following (scenario specified in comment#3):

- Create PVC (created pvc1)
- Create Snapshot (created pvc1snap1; see the manifest sketch below)
- Delete PVC (deleted pvc1)
- Restore Snapshot into pvc-restore (tried to restore pvc1snap1 to a new PVC)

Expected result: the restore action should be enabled in the UI and should work.
Actual result: the option to restore is greyed out.

I did check whether it is possible to restore from a snapshot when the initial PVC is not deleted.
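For reference, the snapshot in the "Create Snapshot" step can be created with a manifest along these lines (a sketch; the VolumeSnapshotClass name is an assumption and may differ per cluster):

~~~
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pvc1snap1
  namespace: default
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass   # assumed default RBD snapshot class
  source:
    persistentVolumeClaimName: pvc1
~~~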
Due to comments #9 and #10, the BZ has failed QA and was reopened (status changed to Assigned).
@Rakshith, I've deployed another 4.16 cluster and reproduced this scenario again with an RWX block mode PVC, and the problem reproduced once again. To be more specific:

1) I created an RWX block mode PVC (ypersky-pvc1; a manifest sketch is shown after this comment).
2) I successfully created a snapshot (ypersky-pvc1-snapshot1).
3) I successfully restored this snapshot ypersky-pvc1-snapshot1 to a PVC (ypersky-pvc1-snapshot1-restore1), to make sure that Restore is possible while the initial PVC is not deleted.
4) I deleted ypersky-pvc1.
5) When I try to restore again from ypersky-pvc1-snapshot1, the Restore option is greyed out, as in the attached print screen.

I tried changing the access mode to each of RWO, RWX, and ROX; for every one of these options, Restore is not possible (greyed out).

You are welcome to check on this cluster: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-simple-deploy-odf-cluster/66/
The above cluster will be available for a few more days.

Reopening the BZ again.
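For reference, a sketch of the RWX block-mode source PVC used in step 1 (the PVC name and storage class are taken from these comments; the size is an assumption, any size should do for the reproduction):

~~~
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ypersky-pvc1
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 1Gi        # assumed size for illustration
~~~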
As for the BZ verification on the CLI: it is possible to restore a PVC from a snapshot on 4.16.0-94 when the parent PVC is deleted.

Scenario:
1) create pvc1
2) create ypersky-pvc1-snapshot1
3) delete pvc1
4) create pvc1-1-snap1-restore with the following command: oc create -f <restore_yaml>

The content of the yaml file is:

~~~
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1-snapshot1-restore1-cli
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: ypersky-pvc1-snapshot1
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
~~~

Moving this BZ to the verified state; a new BZ will be opened for the UI issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591