Bug 2033191 - Cloning a Block DV to a VM with a Filesystem PVC that is not big enough results in an endless loop - using the pvc API
Summary: Cloning a Block DV to a VM with a Filesystem PVC that is not big enough results in an endless loop - using the pvc API
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.1
Assignee: Maya Rashish
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-16 07:34 UTC by Jenia Peimer
Modified: 2022-12-01 21:10 UTC
CC List: 6 users

Fixed In Version: v4.11.1-20
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-01 21:10:19 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt containerized-data-importer pull 2299 0 None Merged Clone fs to block fails on size validation 2022-08-02 15:59:48 UTC
Github kubevirt containerized-data-importer pull 2336 0 None Merged Manual cherry pick 2299 to release v1.49 2022-08-02 15:59:48 UTC
Github kubevirt containerized-data-importer pull 2368 0 None Merged Fix failure to resize when the request image size is specified in milli- scale. 2022-08-15 08:19:41 UTC
Red Hat Issue Tracker CNV-15431 0 None None None 2022-11-30 14:27:37 UTC

Description Jenia Peimer 2021-12-16 07:34:29 UTC
Created attachment 1846503 [details]
full-log-and-o-yamls-of-pvc-and-vm

Description of problem:
Cloning a Block DV to a VM with a Filesystem PVC that is not big enough results in an endless clone retry loop when using the pvc API.

Version-Release number of selected component (if applicable):
4.10, 4.9, 4.8

How reproducible:
Always

Steps to Reproduce:
1. Create a DV with Block volume mode, size 30Gi.
2. Create a VM from this DV using Filesystem volume mode, size 30Gi.
3. Observe that the VM's DV clone restarts over and over, with no clear error message (except in the upload pod log).

Actual results:
An endless loop of clone attempts.

Expected results:
If the target size is too small (considering filesystem overhead),
we should either get a clear error message before the clone/import starts,
or the requested PVC size should be adjusted automatically.
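
For reference, a rough calculation, assuming the 5.5% filesystem overhead reported in the upload pod log below is the only factor:
  30Gi source = 32212254720 bytes
  minimum target request ≈ 32212254720 / (1 - 0.055) ≈ 31.75Gi
so a target request of roughly 32Gi should be enough to avoid the loop.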

Additional info:
This doesn't happen when using the 'storage' API for the target VM:
  dataVolumeTemplates:
    spec:
      storage:
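
For reference, a minimal sketch of such a dataVolumeTemplates entry using the 'storage' API (same values as in the VM yaml below, with 'pvc' replaced by 'storage'; a sketch, not an exact manifest):

  dataVolumeTemplates:
  - metadata:
      name: target-dv-vm-clone
    spec:
      storage:
        storageClassName: hostpath-provisioner
        volumeMode: Filesystem
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 30Gi
      source:
        pvc:
          namespace: "openshift-virtualization-os-images"
          name: "cirros-dv"

With the 'storage' API, CDI can adjust the requested PVC size for things like filesystem overhead when going from Block to Filesystem, which appears to be why the clone succeeds there.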

Source dv and target vm yamls:
$ cat dv-src.yaml 
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
 namespace: "openshift-virtualization-os-images"
 name: cirros-dv
spec:
 source:
     http:
        url: "http://.../rhel-images/rhel-83.qcow2"
        secretRef: ""  
 pvc:
   accessModes:
   - ReadWriteOnce
   resources:
     requests:
       storage: 30Gi
   storageClassName: ocs-storagecluster-ceph-rbd
   volumeMode: Block

$ cat vm-dv-clone-30.yaml 
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-dv-clone
  name: vm-dv-clone
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-dv-clone
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: root-disk
        resources:
          requests:
            memory: 64M
      volumes:
      - dataVolume:
          name: target-dv-vm-clone
        name: root-disk
  dataVolumeTemplates:
  - metadata:
      name: target-dv-vm-clone
    spec:
      pvc:
        storageClassName: hostpath-provisioner
        volumeMode: Filesystem
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 30Gi
      source:
        pvc:
          namespace: "openshift-virtualization-os-images"
          name: "cirros-dv"

$ oc get pvc -A
NAMESPACE                            NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
default                              target-dv-vm-clone            Bound    pvc-243763c7-e0e4-45bc-903c-853525139eb3   68Gi       RWO            hostpath-provisioner          14m
openshift-storage                    ocs-deviceset-0-data-0cb4kz   Bound    local-pv-c63a60c0                          70Gi       RWO            local-block                   10h
openshift-storage                    ocs-deviceset-0-data-1cxt7d   Bound    local-pv-a82afe98                          70Gi       RWO            local-block                   10h
openshift-storage                    ocs-deviceset-0-data-2vvt2x   Bound    local-pv-f3ca87ab                          70Gi       RWO            local-block                   10h
openshift-virtualization-os-images   cirros-dv                     Bound    pvc-87536c65-b6dd-4898-a5dc-dca020b5d74c   30Gi       RWO            ocs-storagecluster-ceph-rbd   17m

$ oc logs cdi-upload-target-dv-vm-clone 
I1216 07:01:31.857162       1 uploadserver.go:69] Running server on 0.0.0.0:8443
I1216 07:01:51.261542       1 uploadserver.go:374] Content type header is "blockdevice-clone"
I1216 07:01:51.261636       1 data-processor.go:340] Calculating available size
I1216 07:01:51.261767       1 data-processor.go:352] Checking out file system volume size.
I1216 07:01:51.261803       1 data-processor.go:360] Request image size not empty.
I1216 07:01:51.261828       1 data-processor.go:365] Target size 30Gi.
I1216 07:01:51.262002       1 data-processor.go:243] New phase: TransferDataFile
I1216 07:01:51.262161       1 util.go:174] Writing data...
I1216 07:04:57.516880       1 data-processor.go:243] New phase: Resize
W1216 07:04:57.692691       1 data-processor.go:322] Available space less than requested size, resizing image to available space 30440161280.
I1216 07:04:57.692708       1 data-processor.go:330] Calculated new size is < than current size, not resizing: requested size 30440161280, virtual size: 32212254720.
I1216 07:04:57.692719       1 data-processor.go:249] Validating image
E1216 07:04:57.895010       1 data-processor.go:240] Virtual image size 32212254720 is larger than available size 30440580710 (PVC size 32212254720, reserved overhead 0.055000%). A larger PVC is required.
Unable to resize disk image to requested size
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:229
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:172
kubevirt.io/containerized-data-importer/pkg/uploadserver.newUploadStreamProcessor
	/remote-source/app/pkg/uploadserver/uploadserver.go:439
kubevirt.io/containerized-data-importer/pkg/uploadserver.(*uploadServerApp).processUpload
	/remote-source/app/pkg/uploadserver/uploadserver.go:381
kubevirt.io/containerized-data-importer/pkg/uploadserver.(*uploadServerApp).uploadHandler.func1
	/remote-source/app/pkg/uploadserver/uploadserver.go:407
net/http.HandlerFunc.ServeHTTP
	/usr/lib/golang/src/net/http/server.go:2049
net/http.(*ServeMux).ServeHTTP
	/usr/lib/golang/src/net/http/server.go:2428
kubevirt.io/containerized-data-importer/pkg/uploadserver.(*uploadServerApp).ServeHTTP
	/remote-source/app/pkg/uploadserver/uploadserver.go:256
net/http.serverHandler.ServeHTTP
	/usr/lib/golang/src/net/http/server.go:2867
net/http.(*conn).serve
	/usr/lib/golang/src/net/http/server.go:1932
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1371
E1216 07:04:57.895161       1 uploadserver.go:387] Saving stream failed: Unable to resize disk image to requested size: Virtual image size 32212254720 is larger than available size 30440580710 (PVC size 32212254720, reserved overhead 0.055000%). A larger PVC is required.
I1216 07:04:59.676727       1 uploadserver.go:374] Content type header is "blockdevice-clone"
I1216 07:04:59.676820       1 data-processor.go:340] Calculating available size
E1216 07:04:59.678141       1 data-processor.go:344] exit status 1, blockdev: ioctl error on BLKGETSIZE64: Inappropriate ioctl for device
I1216 07:04:59.678160       1 data-processor.go:352] Checking out file system volume size.
I1216 07:04:59.678172       1 data-processor.go:360] Request image size not empty.
I1216 07:04:59.678181       1 data-processor.go:365] Target size 30Gi.
I1216 07:04:59.678275       1 util.go:39] deleting file: /data/disk.img
I1216 07:04:59.899644       1 data-processor.go:243] New phase: TransferDataFile
I1216 07:04:59.899811       1 util.go:174] Writing data...
I1216 07:07:36.388073       1 data-processor.go:243] New phase: Resize
W1216 07:07:36.574707       1 data-processor.go:322] Available space less than requested size, resizing image to available space 30440161280.
I1216 07:07:36.574733       1 data-processor.go:330] Calculated new size is < than current size, not resizing: requested size 30440161280, virtual size: 32212254720.
I1216 07:07:36.574743       1 data-processor.go:249] Validating image
E1216 07:07:36.767359       1 data-processor.go:240] Virtual image size 32212254720 is larger than available size 30440580710 (PVC size 32212254720, reserved overhead 0.055000%). A larger PVC is required.
Unable to resize disk image to requested size
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:229
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:172
kubevirt.io/containerized-data-importer/pkg/uploadserver.newUploadStreamProcessor
	/remote-source/app/pkg/uploadserver/uploadserver.go:439
kubevirt.io/containerized-data-importer/pkg/uploadserver.(*uploadServerApp).processUpload
	/remote-source/app/pkg/uploadserver/uploadserver.go:381
kubevirt.io/containerized-data-importer/pkg/uploadserver.(*uploadServerApp).uploadHandler.func1
	/remote-source/app/pkg/uploadserver/uploadserver.go:407
net/http.HandlerFunc.ServeHTTP
	/usr/lib/golang/src/net/http/server.go:2049
net/http.(*ServeMux).ServeHTTP
	/usr/lib/golang/src/net/http/server.go:2428
kubevirt.io/containerized-data-importer/pkg/uploadserver.(*uploadServerApp).ServeHTTP
	/remote-source/app/pkg/uploadserver/uploadserver.go:256
net/http.serverHandler.ServeHTTP
	/usr/lib/golang/src/net/http/server.go:2867
net/http.(*conn).serve
	/usr/lib/golang/src/net/http/server.go:1932
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1371


P.S. It would be great to fix the log message "reserved overhead 0.055000%" so that it reads either 5.5% or 0.055000 (without the % sign).
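
(The figures in the error message confirm the value is used as a fraction: 32212254720 * (1 - 0.055) ≈ 30440580710, which is exactly the reported "available size" - i.e. the overhead is 5.5%, not 0.055%.)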

Comment 1 Jenia Peimer 2021-12-16 07:48:20 UTC
Created attachment 1846504 [details]
full-log-and-o-yamls-of-pvc-and-vm

Comment 3 Yan Du 2021-12-22 13:25:09 UTC
Hi Maya, could you please take a look? Maybe the DV condition could carry a clear message, so the user knows how to fix it.

Comment 4 Jenia Peimer 2022-02-01 12:48:16 UTC
As @brybacki suggested, I reproduced this on CNV v4.10.0-641, but there are no relevant events yet:

$ oc get dv
NAME                          PHASE             PROGRESS   RESTARTS   AGE
target-dv-vm-clone            CloneInProgress   18.33%     2          12m

$ oc describe dv target-dv-vm-clone 
.....
  Phase:                   CloneInProgress
  Progress:                19.35%
  Restart Count:           2
Events:
  Type    Reason           Age   From                   Message
  ----    ------           ----  ----                   -------
  Normal  Pending          12m   datavolume-controller  PVC target-dv-vm-clone Pending
  Normal  Bound            12m   datavolume-controller  PVC target-dv-vm-clone Bound
  Normal  CloneInProgress  12m   datavolume-controller  Cloning from openshift-virtualization-os-images/cirros-dv into default/target-dv-vm-clone in progress

Comment 5 Maya Rashish 2022-07-13 10:42:29 UTC
Bartosz, prompted by the opposite problem (failing to clone fs->block because of inexact validation), has already merged a change for this to the 4.11 branch.
Now, when cloning block->fs with the exact same size, the clone does not start (it stays in CloneScheduled) and a warning event is emitted: "target resources requests storage size is smaller than the source".

However, while trying to reproduce this bug, I uncovered a similar issue - if we clone to a slightly larger PVC, the scratch space PVC ends up too small, resulting in similar "endless loop" behavior.

Comment 6 Maya Rashish 2022-08-07 09:43:49 UTC
Will backport this once 4.11.1 backports are possible.

Comment 7 Fuhui Yang 2022-10-20 10:08:51 UTC
Verified on version 4.11.1-20.

Steps to verify: 
1. Create a DV with Block volume mode, size: 30Gi.
2. Create a VM whose source is the DV from step 1, using Filesystem volume mode, size 30Gi, and storageClassName hostpath-csi-basic (see the sketch after these steps).
3. Check the status of the DV created in step 2 and the cdi-deployment log.
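
The dataVolumeTemplates entry for step 2 would look roughly like the one in the original description, with only the storage class changed (a sketch, not the exact manifest used; source pvc name/namespace as in the original description):

  dataVolumeTemplates:
  - metadata:
      name: target-dv-vm-clone
    spec:
      pvc:
        storageClassName: hostpath-csi-basic
        volumeMode: Filesystem
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 30Gi
      source:
        pvc:
          namespace: "openshift-virtualization-os-images"
          name: "cirros-dv"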

Results:
$ oc get dv
NAME                 PHASE            PROGRESS   RESTARTS   AGE
target-dv-vm-clone   CloneScheduled   N/A                   126m

$ oc get pod
No resources found in fuhui namespace.

$ oc describe dv
...
  Phase:                   CloneScheduled
  Progress:                N/A
Events:
  Type    Reason   Age   From                   Message
  ----    ------   ----  ----                   -------
  Normal  Pending  66s   datavolume-controller  PVC target-dv-vm-clone Pending
  Normal  Bound    65s   datavolume-controller  PVC target-dv-vm-clone Bound

$ oc logs cdi-deployment
...
{"level":"error", ... ,"msg":"Error validating clone spec, ignoring","PVC":"fuhui/target-dv-vm-clone","error":"target resources requests storage size is smaller than the source","errorVerbose":"target resources requests storage size is smaller than the source\n ...}

Comment 15 errata-xmlrpc 2022-12-01 21:10:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.11.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8750

