Description of problem:
If your default storage class does not support snapshots, the boot source images created by the DataImportCron in the openshift-virtualization-os-images namespace are imported as DVs/PVCs. When you switch the default storage class to OCS, you can re-import the images by deleting the old DVs: the DV/PVC is re-imported, a VolumeSnapshot object is created, and the DV/PVC is then removed automatically.

Alex (akalenyu) looked at it and sees 2 issues:
Issue 1: Snapshots are being made out of the previous storage class (when changing the SC from HPP to OCS).
Issue 2: When deleting the old storage class DVs, there may be a race where the snapshot got created but the DV was not recreated.

Version-Release number of selected component (if applicable):
4.14

How reproducible:
Always

Steps to Reproduce:
1. Have a default storage class that does not support snapshots (HPP).

2. See that DVs/PVCs were imported:
$ oc get dv -A
NAMESPACE                            NAME                          PHASE       PROGRESS   RESTARTS   AGE
openshift-virtualization-os-images   centos-stream8-b9b768dcd73b   Succeeded   100.0%                18h
openshift-virtualization-os-images   centos-stream9-362e1f1d9f11   Succeeded   100.0%                18h
openshift-virtualization-os-images   centos7-680e9b4e0fba          Succeeded   100.0%                18h
openshift-virtualization-os-images   fedora-f7cc15256f08           Succeeded   100.0%                18h
openshift-virtualization-os-images   rhel8-0da894200daa            Succeeded   100.0%                18h
openshift-virtualization-os-images   rhel9-b006ef7856b6            Succeeded   100.0%                18h

3. Make HPP non-default and make OCS the default (see the snapshot-capability check sketch after these steps):
$ oc patch storageclass ocs-storagecluster-ceph-rbd -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

4. Delete one DV:
$ oc delete dv -n openshift-virtualization-os-images rhel9-b006ef7856b6
datavolume.cdi.kubevirt.io "rhel9-b006ef7856b6" deleted

5. The DV did not get recreated (but should have been). A VolumeSnapshot was created, but it is not Ready:
$ oc get VolumeSnapshot -A
NAMESPACE                            NAME                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT   CREATIONTIME   AGE
openshift-virtualization-os-images   rhel9-b006ef7856b6   false        rhel9-b006ef7856b6                                         ocs-storagecluster-rbdplugin-snapclass                                    13s

[cloud-user@ocp-psi-executor ~]$ oc get VolumeSnapshot -n openshift-virtualization-os-images rhel9-b006ef7856b6 -oyaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  annotations:
    cdi.kubevirt.io/storage.import.lastUseTime: "2023-07-27T14:31:32.631870881Z"
  creationTimestamp: "2023-07-27T14:31:32Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  generation: 1
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.14.0
    cdi.kubevirt.io: ""
    cdi.kubevirt.io/dataImportCron: rhel9-image-cron
  name: rhel9-b006ef7856b6
  namespace: openshift-virtualization-os-images
  resourceVersion: "1182048"
  uid: d69181d0-4195-4b3f-91b4-ba3631f05249
spec:
  source:
    persistentVolumeClaimName: rhel9-b006ef7856b6
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
status:
  error:
    message: 'Failed to create snapshot content with error snapshot controller failed
      to update rhel9-b006ef7856b6 on API server: cannot get claim from snapshot'

6. See that 2 minutes later, the other VolumeSnapshots are created while the old DVs were not yet deleted:
$ oc get VolumeSnapshot -A
NAMESPACE                            NAME                          READYTOUSE   SOURCEPVC                     SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT                                    CREATIONTIME   AGE
openshift-virtualization-os-images   centos-stream8-b9b768dcd73b   false        centos-stream8-b9b768dcd73b                                         ocs-storagecluster-rbdplugin-snapclass   snapcontent-8455f2ea-0d70-4998-9fa5-bbc42133b1f5                  23s
openshift-virtualization-os-images   centos-stream9-362e1f1d9f11   false        centos-stream9-362e1f1d9f11                                         ocs-storagecluster-rbdplugin-snapclass   snapcontent-3eec6ff1-f73f-493f-b61b-58abfeec5b65                  23s
openshift-virtualization-os-images   centos7-680e9b4e0fba          false        centos7-680e9b4e0fba                                                ocs-storagecluster-rbdplugin-snapclass   snapcontent-76229453-37ff-40f6-8ce0-94e15a5b912c                  23s
openshift-virtualization-os-images   fedora-f7cc15256f08           false        fedora-f7cc15256f08                                                 ocs-storagecluster-rbdplugin-snapclass   snapcontent-94d05d80-20f5-4861-a7af-344f19842a61                  23s
openshift-virtualization-os-images   rhel8-0da894200daa            false        rhel8-0da894200daa                                                  ocs-storagecluster-rbdplugin-snapclass   snapcontent-df7f9a06-4a2e-41b1-8f04-a16758daf4e8                  23s
openshift-virtualization-os-images   rhel9-b006ef7856b6            false        rhel9-b006ef7856b6                                                  ocs-storagecluster-rbdplugin-snapclass                                                                    2m47s

7. See the yaml of another VolumeSnapshot, whose DV/PVC wasn't deleted and is still using the non-snapshot-capable HPP:
spec:
  source:
    persistentVolumeClaimName: centos-stream8-b9b768dcd73b
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
status:
  boundVolumeSnapshotContentName: snapcontent-8455f2ea-0d70-4998-9fa5-bbc42133b1f5
  error:
    message: 'Failed to check and update snapshot content: failed to take snapshot
      of the volume pvc-e59ee8cd-57d0-4ecf-906f-0ab7a1f8ba72: "rpc error: code = Internal
      desc = panic runtime error: invalid memory address or nil pointer dereference"'
    time: "2023-07-27T14:33:56Z"
  readyToUse: false

8. To fix the broken VolumeSnapshot of the first deleted DV, delete that VolumeSnapshot:
$ oc delete VolumeSnapshot -n openshift-virtualization-os-images rhel9-b006ef7856b6
volumesnapshot.snapshot.storage.k8s.io "rhel9-b006ef7856b6" deleted

9. This triggers the DV/PVC to re-import on OCS and to create a VolumeSnapshot that becomes ReadyToUse; the DV/PVC is then deleted automatically.

Actual results:
Re-importing requires more steps.

Expected results:
Re-importing should happen once we switch the storage class and delete the old DVs.
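Side note for anyone reproducing this: before deleting the old DVs it helps to confirm that the new default storage class is actually snapshot-capable, i.e. that a VolumeSnapshotClass exists whose driver matches the StorageClass provisioner. A minimal check sketch, assuming the storage class and snapshot class names used in the steps above (adjust to your cluster):

# Provisioner of the new default storage class:
$ oc get storageclass ocs-storagecluster-ceph-rbd -o jsonpath='{.provisioner}{"\n"}'

# Driver of the snapshot class CDI would use:
$ oc get volumesnapshotclass ocs-storagecluster-rbdplugin-snapclass -o jsonpath='{.driver}{"\n"}'

# The two values should match (a VolumeSnapshotClass whose driver equals the
# StorageClass provisioner); otherwise the boot sources cannot be kept as
# VolumeSnapshots and stay as DVs/PVCs.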
We can also hit the reverse situation: OCS was the default storage class and the DataImportCron images were imported and kept as VolumeSnapshots. If we then change the default storage class to HPP, new DVs/PVCs are not created unless we delete the VolumeSnapshots, and there are reconcile errors in the log.
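For completeness, a small sketch of how that scenario can be checked and worked around. The object names are the ones used elsewhere in this bug; the CDI controller deployment name and namespace below are assumptions for a typical OpenShift Virtualization install, adjust as needed:

# Look for DataImportCron reconcile errors in the CDI controller log
# (deployment/namespace assumed; adjust to your install):
$ oc logs -n openshift-cnv deployment/cdi-deployment | grep -i dataimportcron | grep -i error

# Current workaround, same as in the HPP->OCS direction: delete the stale
# VolumeSnapshot so the cron re-imports the boot source on the new default class.
$ oc delete volumesnapshot -n openshift-virtualization-os-images rhel9-b006ef7856b6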
*** Bug 2228606 has been marked as a duplicate of this bug. ***
Verified on 4.14.0:

Steps:
1. Made OCS the default storage class:
[cloud-user@ocp-psi-executor-xl ~]$ oc patch storageclass hostpath-csi-basic -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
storageclass.storage.k8s.io/hostpath-csi-basic patched
[cloud-user@ocp-psi-executor-xl ~]$ oc patch storageclass ocs-storagecluster-ceph-rbd -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
storageclass.storage.k8s.io/ocs-storagecluster-ceph-rbd patched

2. Deleted the rhel9 DV ($NS holds the namespace flag, i.e. -n openshift-virtualization-os-images):
[cloud-user@ocp-psi-executor-xl ~]$ oc delete dv $NS rhel9-a1947a1edca5
datavolume.cdi.kubevirt.io "rhel9-a1947a1edca5" deleted

An import started, and the DV got recreated:
[cloud-user@ocp-psi-executor-xl ~]$ oc get dv $NS rhel9-a1947a1edca5
NAME                 PHASE              PROGRESS   RESTARTS   AGE
rhel9-a1947a1edca5   ImportInProgress   82.94%                50s

3. After the import finished, a VolumeSnapshot was created on the OCS storage class:
[cloud-user@ocp-psi-executor-xl ~]$ oc get volumesnapshot $NS
NAME                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT                                    CREATIONTIME   AGE
rhel9-a1947a1edca5   true         rhel9-a1947a1edca5                           30Gi          ocs-storagecluster-rbdplugin-snapclass   snapcontent-e4a9b852-0b1c-4e47-b143-060428944515   51s            53s

4. The DV and the PVC were deleted:
[cloud-user@ocp-psi-executor-xl ~]$ oc get dv $NS
NAME                          PHASE       PROGRESS   RESTARTS   AGE
centos-stream8-894237fb27f8   Succeeded   100.0%                3d18h
centos-stream9-a37c5c3cb1d0   Succeeded   100.0%                3d18h
centos7-680e9b4e0fba          Succeeded   100.0%                3d18h
fedora-f7cc15256f08           Succeeded   100.0%                3d18h
rhel8-b8545b0b6174            Succeeded   100.0%                3d18h
[cloud-user@ocp-psi-executor-xl ~]$ oc get pvc $NS
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
centos-stream8-894237fb27f8   Bound    pvc-fb9907c2-3bf6-442c-834c-2e83e4916e7e   149Gi      RWO            hostpath-csi-basic   3d18h
centos-stream9-a37c5c3cb1d0   Bound    pvc-22920d27-fe5f-4f27-8532-a42cb3f523c7   149Gi      RWO            hostpath-csi-basic   3d18h
centos7-680e9b4e0fba          Bound    pvc-66459f77-eeff-49e6-9717-8f3a47d0a681   149Gi      RWO            hostpath-csi-basic   3d18h
fedora-f7cc15256f08           Bound    pvc-847e8a14-06c0-4c23-8e7a-f21919d2bdc0   149Gi      RWO            hostpath-csi-basic   3d18h
rhel8-b8545b0b6174            Bound    pvc-848690c2-73a6-434b-9735-9c8c25ce06c6   149Gi      RWO            hostpath-csi-basic   18m

5. Set HPP as the default SC again. The snapshot is still available and the DV was not re-imported:
[cloud-user@ocp-psi-executor-xl ~]$ oc get dv $NS
NAME                          PHASE       PROGRESS   RESTARTS   AGE
centos-stream8-894237fb27f8   Succeeded   100.0%                3d18h
centos-stream9-a37c5c3cb1d0   Succeeded   100.0%                3d18h
centos7-680e9b4e0fba          Succeeded   100.0%                3d18h
fedora-f7cc15256f08           Succeeded   100.0%                3d18h
rhel8-b8545b0b6174            Succeeded   100.0%                3d18h
[cloud-user@ocp-psi-executor-xl ~]$ oc get volumesnapshot $NS
NAME                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT                                    CREATIONTIME   AGE
rhel9-a1947a1edca5   true         rhel9-a1947a1edca5                           30Gi          ocs-storagecluster-rbdplugin-snapclass   snapcontent-e4a9b852-0b1c-4e47-b143-060428944515   4m54s          4m56s

6. Deleted the rhel9 VolumeSnapshot.

Result: the DV gets re-imported, and after it completes the PVC and DV are bound:
[cloud-user@ocp-psi-executor-xl ~]$ oc delete volumesnapshot $NS rhel9-a1947a1edca5
volumesnapshot.snapshot.storage.k8s.io "rhel9-a1947a1edca5" deleted
NAME                 PHASE       PROGRESS   RESTARTS   AGE
rhel9-a1947a1edca5   Succeeded   100.0%                4m25s
[cloud-user@ocp-psi-executor-xl ~]$ ^dv^pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
rhel9-a1947a1edca5   Bound    pvc-885fd0d5-de6e-4bd0-836a-36e4df1df647   149Gi      RWO            hostpath-csi-basic   4m33s
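For anyone re-running this verification, a small polling sketch that follows the hand-off between DV, PVC and VolumeSnapshot in one terminal. It assumes $NS holds the namespace flag as in the steps above; adjust names to your cluster:

# Namespace flag used throughout (assumption matching the steps above):
$ NS="-n openshift-virtualization-os-images"

# Poll until the import finishes, the snapshot turns READYTOUSE=true and the
# DV/PVC disappear (or, in the reverse direction, until the DV/PVC are bound):
$ while true; do oc get dv,pvc,volumesnapshot $NS; sleep 15; done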
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817