Bug 2227100

Summary: Cannot create a VM from template after creating a VolumeSnapshotClass
Product: Container Native Virtualization (CNV) Reporter: joherr
Component: Storage Assignee: Arnon Gilboa <agilboa>
Status: CLOSED DUPLICATE QA Contact: Natalie Gavrielov <ngavrilo>
Severity: medium Docs Contact:
Priority: high    
Version: 4.13.2   
Target Milestone: ---   
Target Release: 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.14.0.rhel9-1387 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-05 18:34:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description joherr 2023-07-27 21:29:13 UTC
Description of problem:
When using the Trident provisioner, I cannot create a VM from a template using a boot source after creating a VolumeSnapshotClass. Creating a VM works fine before the VolumeSnapshotClass is created.

If I add or set "allowVolumeExpansion: true" in the Trident storage class, it works. But there are no indications in the UI or DV status that this is the issue.
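Since nothing in the UI or DV status points at the storage class, a pre-flight check along these lines may help. This is only a minimal sketch: the helper name supports_expansion is hypothetical, and the manifest is assumed to be already loaded into a plain dict (for example from "oc get storageclass -o json").

```python
def supports_expansion(storage_class: dict) -> bool:
    """Return True only if the manifest explicitly sets allowVolumeExpansion: true."""
    return storage_class.get("allowVolumeExpansion") is True


# The StorageClass from this report as a plain dict, with the field left unset.
trident_sc = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "trident-cnvi-svm"},
    "provisioner": "csi.trident.netapp.io",
    "reclaimPolicy": "Delete",
}

if not supports_expansion(trident_sc):
    name = trident_sc["metadata"]["name"]
    print(f"{name}: allowVolumeExpansion is not true; cloning may stall as described here")
```

A missing field and an explicit false are treated the same way, which matches the "add or set" wording above.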


Version-Release number of selected component (if applicable):
OpenShift Virtualization 4.13.2
Trident Operator 23.04.0


How reproducible:
Consistent

Steps to Reproduce:
1. Install the Trident Operator.
2. Create a StorageClass for Trident without allowVolumeExpansion set to true.
~~~
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-cnvi-svm
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
allowVolumeExpansion: false
parameters:
  backendType: "ontap-nas"
~~~

3. Create a RHEL 9 VM from template and boot source. This works fine.

4. Create a VolumeSnapshotClass for Trident.
~~~
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: trident-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete
~~~

5. Try to create another RHEL 9 VM from template and boot source. This never completes. The DV phase stays in "SnapshotForSmartCloneInProgress".
~~~
...
  status:
    conditions:
    - lastHeartbeatTime: "2023-07-27T21:18:04Z"
      lastTransitionTime: "2023-07-27T21:18:03Z"
      message: No PVC found
      reason: NotFound
      status: Unknown
      type: Bound
    - lastHeartbeatTime: "2023-07-27T21:18:04Z"
      lastTransitionTime: "2023-07-27T21:18:03Z"
      status: "False"
      type: Ready
    - lastHeartbeatTime: "2023-07-27T21:18:03Z"
      lastTransitionTime: "2023-07-27T21:18:03Z"
      status: "False"
      type: Running
    phase: SnapshotForSmartCloneInProgress
~~~

The cdi-deployment pod logs show messages such as:
~~~
{"level":"error","ts":1690492689.26244,"logger":"controller.datavolume-pvc-clone-controller","msg":"Reconciler error","name":"rhel9-after-vsc","namespace":"winding","error":"persistentvolumeclaims \"cdi-tmp-1434dc2c-c012-406b-afc5-438912628dcf\" is forbidden: only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
~~~

6. Remove the StorageClass and recreate it with allowVolumeExpansion set to true.
~~~
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-cnvi-svm
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
  backendType: "ontap-nas"
~~~

7. Create a RHEL 9 VM from template and boot source. This now works (rather quickly).
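
The cdi-deployment log entry in step 5 is a single JSON object, so the resize failure can be picked out without reading the stack trace. The following is a rough sketch, not part of CDI: the function name resize_errors and the "must support resize" match string are my own choices, and the sample line is a shortened copy of the entry above with the stacktrace field dropped.

```python
import json

# Shortened copy of the log line from step 5 (stacktrace field omitted).
log_line = (
    '{"level":"error","ts":1690492689.26244,'
    '"logger":"controller.datavolume-pvc-clone-controller",'
    '"msg":"Reconciler error","name":"rhel9-after-vsc","namespace":"winding",'
    '"error":"persistentvolumeclaims \\"cdi-tmp-1434dc2c-c012-406b-afc5-438912628dcf\\" '
    'is forbidden: only dynamically provisioned pvc can be resized and the '
    'storageclass that provisions the pvc must support resize"}'
)

RESIZE_HINT = "must support resize"  # assumed marker text, taken from the error above


def resize_errors(lines):
    """Yield (namespace, name, error) for error entries that mention a resize failure."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue  # ignore JSON values that are not log objects
        if entry.get("level") == "error" and RESIZE_HINT in entry.get("error", ""):
            yield entry.get("namespace"), entry.get("name"), entry["error"]


for namespace, name, error in resize_errors([log_line]):
    print(f"{namespace}/{name}: {error}")
```

Feeding it the output of "oc logs" for the cdi-deployment pod, split into lines, would list the affected DataVolumes by namespace and name.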


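Actual results:
The DV stays in the "SnapshotForSmartCloneInProgress" phase and the VM is never created. Nothing in the UI or DV status points to the storage class as the cause.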


Expected results:
I would expect a notification in the UI or DV that this was the issue.


Additional info:

Comment 1 Adam Litke 2023-08-04 18:53:47 UTC
In this case we should be reverting to host-assisted cloning with a reason that the storage class does not support volume expansion.  Arnon, let's take care to fix the basic bug in 4.14 and also make sure the new host-assisted reason code works for this scenario in 4.15.  I believe you made a change to prefer csi-clone as the smart cloning strategy so let's make sure to test both csi-clone and the snapshot-based clone strategy.

Comment 2 Arnon Gilboa 2023-09-05 18:33:14 UTC
This is a clone of bz #2227013 which was fixed in v4.14.0.rhel9-1387.

For csi.trident.netapp.io we prefer snapshot as the smart cloning strategy, as it has a VolumeSnapshotClass, and the provisioner is not explicitly listed as csi-clone in the clone strategy table.
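
To illustrate the selection described here together with the fallback requested in comment 1, a simplified sketch follows. It is not the CDI code: the function and parameter names are hypothetical, the "no smart clone available" branch is my assumption, and the check is reduced to a single boolean for whether the storage class allows expansion.

```python
from typing import Optional, Tuple

NO_EXPANSION = "No volume expansion is possible"


def choose_clone_type(has_snapshot_class: bool,
                      listed_as_csi_clone: bool,
                      allows_volume_expansion: bool) -> Tuple[str, Optional[str]]:
    """Return (clone_type, fallback_reason); "copy" stands for host-assisted cloning."""
    if listed_as_csi_clone:
        preferred = "csi-clone"
    elif has_snapshot_class:
        preferred = "snapshot"
    else:
        # Assumption: with no smart clone option, host-assisted is used directly.
        return "copy", None
    if not allows_volume_expansion:
        # Fall back instead of waiting on a resize the storage class will reject.
        return "copy", NO_EXPANSION
    return preferred, None


# Scenario from this bug: VolumeSnapshotClass present, expansion not allowed.
print(choose_clone_type(True, False, False))
```

For the Trident setup in this bug the sketch yields host-assisted cloning with the "No volume expansion is possible" reason, and snapshot cloning once allowVolumeExpansion is true.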

However, in the above scenario we fall back to host-assisted cloning, so in the PVC events we will see:
  Type     Reason                Age   From              Message
  ----     ------                ----  ----              -------
  Warning  NoVolumeExpansion     71s   clone-populator   No volume expansion is possible

And in the PVC annotations:
               cdi.kubevirt.io/cloneFallbackReason: No volume expansion is possible
               cdi.kubevirt.io/clonePhase: Succeeded
               cdi.kubevirt.io/cloneType: copy  <--- host-assisted cloning

The fallback annotation and event are already covered in several functional tests.
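
For anyone checking this by script rather than by eye, the annotations above can be summarized as below. This is a small sketch with a hypothetical helper name, assuming the PVC annotations have already been fetched into a dict (for example from "oc get pvc -o json"); only the three annotation keys shown above are used.

```python
PREFIX = "cdi.kubevirt.io/"


def summarize_clone(annotations: dict) -> str:
    """Build a one-line summary from the clone annotations on a PVC."""
    clone_type = annotations.get(PREFIX + "cloneType", "unknown")
    phase = annotations.get(PREFIX + "clonePhase", "unknown")
    reason = annotations.get(PREFIX + "cloneFallbackReason")
    summary = f"cloneType={clone_type}, clonePhase={phase}"
    if reason:
        summary += f", fell back because: {reason}"
    return summary


# Annotation values copied from this comment.
pvc_annotations = {
    "cdi.kubevirt.io/cloneFallbackReason": "No volume expansion is possible",
    "cdi.kubevirt.io/clonePhase": "Succeeded",
    "cdi.kubevirt.io/cloneType": "copy",
}

print(summarize_clone(pvc_annotations))
# prints: cloneType=copy, clonePhase=Succeeded, fell back because: No volume expansion is possible
```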

Comment 3 Arnon Gilboa 2023-09-05 18:34:40 UTC

*** This bug has been marked as a duplicate of bug 2227013 ***