Bug 2227100

Summary: Cannot create a VM from template after creating a VolumeSnapshotClass
Product: Container Native Virtualization (CNV) Reporter: joherr
Component: Storage Assignee: Arnon Gilboa <agilboa>
Status: ASSIGNED --- QA Contact: Natalie Gavrielov <ngavrilo>
Severity: medium Docs Contact:
Priority: high    
Version: 4.13.2   
Target Milestone: ---   
Target Release: 4.14.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug

Description joherr 2023-07-27 21:29:13 UTC
Description of problem:
When using the Trident provisioner, I cannot create a VM from a template using a boot source after creating a VolumeSnapshotClass. Creating a VM works fine before the VolumeSnapshotClass is created.

If I add or set "allowVolumeExpansion: true" in the Trident storage class, it works. However, nothing in the UI or the DV status indicates that this is the issue.


Version-Release number of selected component (if applicable):
OpenShift Virtualization 4.13.2
Trident Operator 23.04.0


How reproducible:
Consistent

Steps to Reproduce:
1. Install the Trident Operator.
2. Create a StorageClass for Trident without allowVolumeExpansion set to true.
~~~
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-cnvi-svm
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
allowVolumeExpansion: false
parameters:
  backendType: "ontap-nas"
~~~

3. Create a RHEL 9 VM from template and boot source. This works fine.

4. Create a VolumeSnapshotClass for Trident.
~~~
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: trident-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete
~~~

5. Try to create another RHEL 9 VM from template and boot source. This never completes. The DV phase stays in "SnapshotForSmartCloneInProgress":
~~~
...
  status:
    conditions:
    - lastHeartbeatTime: "2023-07-27T21:18:04Z"
      lastTransitionTime: "2023-07-27T21:18:03Z"
      message: No PVC found
      reason: NotFound
      status: Unknown
      type: Bound
    - lastHeartbeatTime: "2023-07-27T21:18:04Z"
      lastTransitionTime: "2023-07-27T21:18:03Z"
      status: "False"
      type: Ready
    - lastHeartbeatTime: "2023-07-27T21:18:03Z"
      lastTransitionTime: "2023-07-27T21:18:03Z"
      status: "False"
      type: Running
    phase: SnapshotForSmartCloneInProgress
~~~

The cdi-deployment pod shows log messages such as:
~~~
{"level":"error","ts":1690492689.26244,"logger":"controller.datavolume-pvc-clone-controller","msg":"Reconciler error","name":"rhel9-after-vsc","namespace":"winding","error":"persistentvolumeclaims \"cdi-tmp-1434dc2c-c012-406b-afc5-438912628dcf\" is forbidden: only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
~~~

6. Remove the StorageClass and recreate it with allowVolumeExpansion set to true.
~~~
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-cnvi-svm
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
  backendType: "ontap-nas"
~~~

7. Create a RHEL 9 VM from template and boot source. This now works (rather quickly).
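
Since the only visible hint in step 5 is the reconciler error in the cdi-deployment log, a small filter along these lines might help spot it. This is a rough sketch, not part of CDI: the function name `find_resize_errors` is made up for illustration, and it assumes the log is one JSON object per line with the `level`, `name`, `namespace`, and `error` fields seen in the sample above.

```python
import json

# Substring taken from the error message in the cdi-deployment log above.
RESIZE_ERROR = "only dynamically provisioned pvc can be resized"


def find_resize_errors(log_lines):
    """Return (namespace, name, error) for each error entry caused by a forbidden PVC resize.

    Hypothetical helper: assumes JSON-per-line logs shaped like the sample in step 5.
    """
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON object
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        error = str(entry.get("error", ""))
        if entry.get("level") == "error" and RESIZE_ERROR in error:
            hits.append((entry.get("namespace"), entry.get("name"), error))
    return hits
```

Feeding it the lines of the saved pod log would list the affected DataVolumes by namespace and name, which at least ties the stuck DV back to the storage class setting.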


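Actual results:
VM creation never completes. The DV phase stays in "SnapshotForSmartCloneInProgress", with no indication of the cause in the UI or the DV status.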


Expected results:
I would expect a notification in the UI or in the DV status that the storage class not allowing volume expansion is the issue.


Additional info:

Comment 1 Adam Litke 2023-08-04 18:53:47 UTC
In this case we should be reverting to host-assisted cloning with a reason that the storage class does not support volume expansion.  Arnon, let's take care to fix the basic bug in 4.14 and also make sure the new host-assisted reason code works for this scenario in 4.15.  I believe you made a change to prefer csi-clone as the smart cloning strategy so let's make sure to test both csi-clone and the snapshot-based clone strategy.
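
For discussion, the fallback described in this comment could be sketched roughly as below. This is not CDI's actual code: the function name, the reason strings, and the `needs_expansion` flag (assumed here to mean the clone target has to be resized after cloning) are all hypothetical, and the csi-clone strategy is left out to keep the example short.

```python
def choose_clone_strategy(storage_class, has_snapshot_class, needs_expansion):
    """Pick a clone strategy and a reason string (hypothetical names, illustration only).

    storage_class: dict shaped like a StorageClass manifest.
    has_snapshot_class: True if a VolumeSnapshotClass exists for the provisioner.
    needs_expansion: assumption, True if the cloned PVC would have to be resized.
    """
    # A missing allowVolumeExpansion field is treated the same as false.
    can_expand = bool(storage_class.get("allowVolumeExpansion", False))

    if needs_expansion and not can_expand:
        # Fall back instead of retrying a resize that the API server forbids.
        return "host-assisted", "storage class does not support volume expansion"
    if has_snapshot_class:
        return "snapshot", "smart clone via VolumeSnapshotClass"
    return "host-assisted", "no VolumeSnapshotClass for the provisioner"
```

With the step 2 storage class (no volume expansion) and the step 4 VolumeSnapshotClass in place, this would return the host-assisted strategy together with a reason that could be surfaced in the DV status.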