Bug 2101168

Summary: [4.10.z] VM restore PVC uses exact source PVC request size
Product: Container Native Virtualization (CNV)
Component: Storage
Version: 4.10.3
Target Release: 4.10.3
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: skagan
Assignee: skagan
QA Contact: Yan Du <yadu>
CC: akalenyu, alitke, cnv-qe-bugs, jcall, mrashish, pelauter, skagan, yadu
Fixed In Version: CNV v4.10.3-6
Clone Of: 2086825
Bug Depends On: 2086825
Last Closed: 2022-07-20 16:06:49 UTC

Description skagan 2022-06-26 07:25:35 UTC
+++ This bug was initially created as a clone of Bug #2086825 +++

Description of problem:
The VM restore creates the restored PVC with exactly the source PVC's requested size (spec.resources.requests.storage), ignoring that the actually provisioned size (status.capacity.storage, and therefore volumesnapshot.status.restoreSize) may be larger.


Version-Release number of selected component (if applicable):
CNV 4.11.0
CNV 4.10.*


How reproducible:
100%

Steps to Reproduce:
1. Attempt to restore a VM whose PVC has spec.resources.requests.storage != status.capacity.storage (manifests below)

Actual results:
The Ceph CSI driver rejects a restore whose requested size is smaller than the source snapshot size, so the restored PVC stays Pending:
  Warning  ProvisioningFailed    116s (x11 over 9m24s)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-7d96f8b4d5-s6lzj_dc0077ff-a0f2-4034-ae60-c6e15766ee54  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": error getting handle for DataSource Type VolumeSnapshot by Name vmsnapshot-74690574-f6a4-457c-8682-c830f74d5e0b-volume-dv-disk: requested volume size 8053063680 is less than the size 8589934592 for the source snapshot vmsnapshot-74690574-f6a4-457c-8682-c830f74d5e0b-volume-dv-disk
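
As a sanity check (my addition, not part of the original report), the two sizes in that event can be read back directly; the restored PVC name is a placeholder, since the restore controller generates it:

  $ oc get volumesnapshot vmsnapshot-74690574-f6a4-457c-8682-c830f74d5e0b-volume-dv-disk \
      -o jsonpath='{.status.restoreSize}{"\n"}'
  # expected: 8Gi (8589934592 bytes, the source snapshot size from the event)
  $ oc get pvc <restored-pvc> -o jsonpath='{.spec.resources.requests.storage}{"\n"}'
  # expected: 7680Mi (8053063680 bytes, the requested size the provisioner rejects)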

Expected results:
Success

Additional info:
Reproduced on OCS with a 7.5Gi request, which the provisioner rounds up to 8Gi:
[akalenyu@localhost manifests]$ oc get pvc simple-dv -o yaml | grep storage:
      storage: 7680Mi
    storage: 8Gi
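
For clarity (my annotation, not part of the original output): the first grep hit is the requested size and the second is the provisioned capacity. The same two values can be read with labelled jsonpath:

  $ oc get pvc simple-dv -o jsonpath='requested: {.spec.resources.requests.storage}{"\n"}capacity:  {.status.capacity.storage}{"\n"}'
  requested: 7680Mi
  capacity:  8Gi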

[akalenyu@localhost manifests]$ cat dv.yaml 
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: simple-dv
  namespace: akalenyu
spec:
  source:
      http:
         url: "http://.../Fedora-Cloud-Base-34-1.2.x86_64.qcow2"
  pvc:
    storageClassName: ocs-storagecluster-ceph-rbd
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 7.5Gi
[akalenyu@localhost manifests]$ cat vm.yaml 
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: simple-vm
spec:
  running: true
  template:
    metadata:
      labels: {kubevirt.io/domain: simple-vm,
        kubevirt.io/vm: simple-vm}
    spec:
      domain:
        devices:
          disks:
          - disk: {bus: virtio}
            name: dv-disk
          - disk: {bus: virtio}
            name: cloudinitdisk
        resources:
          requests: {memory: 2048M}
      volumes:
      - dataVolume: {name: simple-dv}
        name: dv-disk
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
        name: cloudinitdisk
[akalenyu@localhost manifests]$ cat snapshot.yaml 
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: snap-simple-vm
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: simple-vm
[akalenyu@localhost manifests]$ cat restore.yaml 
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-simple-vm
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: simple-vm
  virtualMachineSnapshotName: snap-simple-vm
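
For convenience, a minimal end-to-end sequence tying the manifests above together (my sketch, not from the original report; the wait conditions and timeouts are illustrative, and the VM is stopped before restoring because the restore target must not be running):

  $ oc create -f dv.yaml
  $ oc wait dv/simple-dv --for=condition=Ready --timeout=10m        # wait for the import to complete
  $ oc create -f vm.yaml
  $ oc create -f snapshot.yaml
  $ oc wait virtualmachinesnapshot/snap-simple-vm --for=condition=Ready --timeout=5m
  $ virtctl stop simple-vm                                          # the target VM must be stopped before restoring
  $ oc create -f restore.yaml
  $ oc get pvc                                                      # on affected builds the restored PVC stays Pending with the event above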

--- Additional comment from Adam Litke on 2022-05-18 20:16:46 UTC ---

Shelly, is this the same as https://bugzilla.redhat.com/show_bug.cgi?id=2021354 ?

--- Additional comment on 2022-05-19 08:41:15 UTC ---

Adam, if you recall, I asked you what should be done with this issue. The bug you mentioned was caused by a bug in smart clone. It uncovered a possible scenario in which a user requests size X for a PVC, gets a larger actual size Y, and then writes more than X; with certain provisioners that leads to this VM restore failure. Since the bug you mentioned only hit a similar scenario by accident through the smart clone bug, Alex and I agreed to open a new bugzilla for this one.

--- Additional comment from Adam Litke on 2022-06-17 14:39:51 UTC ---

Increasing severity and requesting blocker? because this issue will prevent VM restore in almost all cases when a Filesystem mode PV is used: filesystem overhead causes the provisioned PVC size to be larger than requested. Peter, can you set blocker+?
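
For background (my addition): CDI pads Filesystem mode allocations to account for filesystem overhead, which is why the provisioned size ends up larger than the requested size in that case. Assuming the usual cluster-scoped CDIConfig object named "config" and its filesystemOverhead status field, the configured overhead can be inspected with something like:

  $ oc get cdiconfig config -o jsonpath='{.status.filesystemOverhead.global}{"\n"}'
  # typically 0.055, i.e. ~5.5% added on top of the requested size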

Comment 2 Yan Du 2022-07-06 06:44:40 UTC
Tested on CNV v4.10.3-16; the issue has been fixed.

  Normal  ProvisioningSucceeded  9m17s  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-668654579b-rbplg_e433fec3-74cf-4a9d-95ad-09474014add3  Successfully provisioned volume pvc-396044c4-be15-4015-b806-56468fb6ad55
  Normal  ImportSucceeded        8m14s  import-controller                                                                                                   Import Successful

  ----    ------                         ----             ----                -------
  Normal  VirtualMachineRestoreComplete  5s (x3 over 5s)  restore-controller  Successfully completed VirtualMachineRestore restore-simple-vm
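
A quick way to double-check the fix (my sketch; the restored PVC name is generated by the restore controller, so it is a placeholder here):

  $ oc get virtualmachinerestore restore-simple-vm -o jsonpath='{.status.complete}{"\n"}'
  # expected to report true once the restore finishes
  $ oc get pvc <restored-pvc> -o jsonpath='{.spec.resources.requests.storage}{"\n"}'
  # should now be at least the snapshot restoreSize (8Gi in this reproducer)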

Comment 10 errata-xmlrpc 2022-07-20 16:06:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.10.3 RPMs), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:5674