Bug 2229454

Summary: CSI volume cloning stuck
Product: Container Native Virtualization (CNV) Reporter: Roni Kishner <rkishner>
Component: Storage Assignee: Adam Litke <alitke>
Status: CLOSED NOTABUG QA Contact: Natalie Gavrielov <ngavrilo>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.14.0 CC: jpeimer
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-06 18:04:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Roni Kishner 2023-08-06 07:03:16 UTC
Description of problem:
When starting a VM whose DataVolume references a DataSource with a PVC source, the DV gets stuck in CSICloneInProgress status and the PVC gets stuck in Pending status.

The VM dataVolumeTemplates:
  dataVolumeTemplates:
  - apiVersion: cdi.kubevirt.io/v1beta1
    kind: DataVolume
    metadata:
      creationTimestamp: null
      name: fedora
    spec:
      pvc:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: "34087042032"
        storageClassName: ocs-storagecluster-ceph-rbd
      sourceRef:
        kind: DataSource
        name: fedora
        namespace: openshift-virtualization-os-images

Version-Release number of selected component (if applicable):
4.14

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with similar dataVolumeTemplates
2. Start the VM

Actual results:
The DV is stuck in CSICloneInProgress and the PVC is stuck in Pending

Expected results:
The DV and PVC should be created and the VMI should start

Additional info:
The docs mention that the source and destination PVCs must exist in the same namespace, so maybe this is related? - https://docs.openshift.com/container-platform/4.14/storage/container_storage_interface/persistent-storage-csi-cloning.html

Comment 5 Jenia Peimer 2023-08-06 11:19:03 UTC
We have a PVC in the 'openshift-virtualization-os-images' namespace on 'hostpath-csi-basic' (ReadWriteOnce, Filesystem),
and we are trying to clone it across namespaces to 'ocs-storagecluster-ceph-rbd'.

In the VM's dataVolumeTemplates we used the 'pvc' API and requested only ReadWriteMany, without specifying the volumeMode,
so the 'tmp-pvc-' ended up with volumeMode Filesystem.

OCS doesn't support the ReadWriteMany + Filesystem combination, so the tmp-pvc stays Pending:

[cloud-user@ocp-psi-executor jenia]$ oc get pvc -n openshift-virtualization-os-images tmp-pvc-f86706b6-9f6c-411f-8a05-de50f8b7a71f -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/clonePhase: Pending
    cdi.kubevirt.io/cloneType: csi-clone
    cdi.kubevirt.io/dataSourceNamespace: openshift-virtualization-os-images
    cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE2OTEzMTc0NjUsImlhdCI6MTY5MTMxNzE2NSwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJmZWRvcmEtZjdjYzE1MjU2ZjA4IiwibmFtZXNwYWNlIjoib3BlbnNoaWZ0LXZpcnR1YWxpemF0aW9uLW9zLWltYWdlcyIsIm5iZiI6MTY5MTMxNzE2NSwib3BlcmF0aW9uIjoiQ2xvbmUiLCJwYXJhbXMiOnsidGFyZ2V0TmFtZSI6ImZlZG9yYSIsInRhcmdldE5hbWVzcGFjZSI6Imluc3RhbmNlLXR5cGVzLXRlc3Qtdm0tb3ZlcnJpZGUtcHJlZiJ9LCJyZXNvdXJjZSI6eyJncm91cCI6IiIsInJlc291cmNlIjoicGVyc2lzdGVudHZvbHVtZWNsYWltcyIsInZlcnNpb24iOiJ2MSJ9fQ.pHz5m1qZ6eQDrlu39nFWhn1ff6VuoZQieUlp1YDRHi_YHZs791lOD0o4inPNwEt_ceW-0WSGyfESZqOVEqSpL4CQ_lfPaiJGXO_whdjK6nxUYbilzEkAZvBOw30UyG9PjZY5Cad-jtydZ7zCWV5rscwPqj87pyQNec6Uq9nEmv62fovrvm6NR1cVF9yKFAdyi3ep6smkaS_5nzneooR4e2ziZtXkoLoDvK3AJXb0uw6OUhMf2xa4MCIuXKiHH5bcS99N0wL832WNkoaH2kn6XFxdHBPhnJrmqzGCMfQDz8WJxHVuXKjRlZhbhjd-3Jh-N4Ko5nHtIgbMawQc_4XnUA
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: Clone Pending
    cdi.kubevirt.io/storage.condition.running.reason: Pending
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.populator.kind: VolumeCloneSource
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.usePopulator: "true"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2023-08-06T10:19:25Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.14.0
    cdi.kubevirt.io/OwnedByUID: f86706b6-9f6c-411f-8a05-de50f8b7a71f
    kubevirt.io/created-by: 6cca3fd0-6976-43dc-a266-c12f189912fc
  name: tmp-pvc-f86706b6-9f6c-411f-8a05-de50f8b7a71f
  namespace: openshift-virtualization-os-images
  resourceVersion: "111423"
  uid: fe316d3e-1300-4206-b44e-c9c628f85b94
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    apiGroup: null
    kind: PersistentVolumeClaim
    name: fedora-f7cc15256f08
  dataSourceRef:
    apiGroup: null
    kind: PersistentVolumeClaim
    name: fedora-f7cc15256f08
  resources:
    requests:
      storage: 149Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
status:
  phase: Pending



When I used the 'storage' API instead, the clone succeeded:

spec:
  dataVolumeTemplates:
  - apiVersion: cdi.kubevirt.io/v1beta1
    kind: DataVolume
    metadata:
      name: fedora
    spec:
      sourceRef:
        kind: DataSource
        name: fedora
        namespace: openshift-virtualization-os-images
      storage:
        resources:
          requests:
            storage: '34087042032'


DV:
instance-types-test-vm-override-pref     fedora        Succeeded          100.0%          5m26s

And the target 'fedora' PVC is on OCS, ReadWriteMany, Block.

Comment 6 Roni Kishner 2023-08-06 18:04:13 UTC
Either adding "volumeMode: Block" or changing the storage class to "hostpath-csi-basic" fixed it, as mentioned by Jenia.
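
For reference, a minimal sketch of the first option. It assumes the dataVolumeTemplates from the description are otherwise unchanged and that volumeMode is set under spec.pvc; the exact manifest that was tested is not recorded in this bug.

```yaml
dataVolumeTemplates:
- apiVersion: cdi.kubevirt.io/v1beta1
  kind: DataVolume
  metadata:
    name: fedora
  spec:
    pvc:
      accessModes:
      - ReadWriteMany
      # Assumed placement: request Block explicitly. When volumeMode is
      # omitted it defaults to Filesystem, and ReadWriteMany + Filesystem
      # left the tmp-pvc Pending on this storage class.
      volumeMode: Block
      resources:
        requests:
          storage: "34087042032"
      storageClassName: ocs-storagecluster-ceph-rbd
    sourceRef:
      kind: DataSource
      name: fedora
      namespace: openshift-virtualization-os-images
```

With the 'storage' API shown in comment 5, accessModes and volumeMode were left out entirely and the resulting PVC still came up as ReadWriteMany + Block.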