Description of problem:

* Created a 1Gi golden Fedora PVC, which imported the Fedora cloud qcow2 image.
* Created a DataVolume, sized 1Gi, with the PVC above as source; it failed with "no space left on device".
* Increased the DataVolume size to 8Gi (just to make sure) and it worked.

The cloning process seems to make the resulting file a non-sparse one. I mounted both LVs on one of the cluster nodes to run qemu-img info on them; here are the results:

[root@node02 ~]# qemu-img info /mnt/1/brick/disk.img
image: /mnt/1/brick/disk.img
file format: raw
virtual size: 4.0G (4294967296 bytes)
disk size: 778M

[root@node02 ~]# qemu-img info /mnt/2/brick/disk.img
image: /mnt/2/brick/disk.img
file format: raw
virtual size: 4.0G (4294967296 bytes)
disk size: 4.0G

Looking at the LVs, both belong to the same VG and there doesn't seem to be a single difference in the settings:

[root@node02 ~]# lvdisplay /dev/vg_729faf60b0a3b879b051db5abc4e94a1/brick_ec125589977d4a629c7020436615f4f6
  --- Logical volume ---
  LV Path                /dev/vg_729faf60b0a3b879b051db5abc4e94a1/brick_ec125589977d4a629c7020436615f4f6
  LV Name                brick_ec125589977d4a629c7020436615f4f6
  VG Name                vg_729faf60b0a3b879b051db5abc4e94a1
  LV UUID                4YcvuU-Bxaq-ScDv-qpww-OIsF-J611-yan3oi
  LV Write Access        read/write
  LV Creation host, time node02.example.com, 2018-12-12 06:35:54 -0500
  LV Pool name           tp_ec125589977d4a629c7020436615f4f6
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Mapped size            50.17%
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:19

[root@node02 ~]# lvdisplay /dev/vg_729faf60b0a3b879b051db5abc4e94a1/brick_0e488dded91b45787fa21a742b792753
  --- Logical volume ---
  LV Path                /dev/vg_729faf60b0a3b879b051db5abc4e94a1/brick_0e488dded91b45787fa21a742b792753
  LV Name                brick_0e488dded91b45787fa21a742b792753
  VG Name                vg_729faf60b0a3b879b051db5abc4e94a1
  LV UUID                5b7lTN-iUav-Cxm1-t5Fp-5WLL-pqqL-ZktxPu
  LV Write Access        read/write
  LV Creation host, time node02.example.com, 2018-12-11 12:02:38 -0500
  LV Pool name           tp_0e488dded91b45787fa21a742b792753
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Mapped size            78.98%
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:14

Also, running xfs_info against the LVs shows the same settings:

[root@node02 ~]# xfs_info /dev/vg_729faf60b0a3b879b051db5abc4e94a1/brick_ec125589977d4a629c7020436615f4f6
meta-data=/dev/mapper/vg_729faf60b0a3b879b051db5abc4e94a1-brick_ec125589977d4a629c7020436615f4f6 isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node02 ~]# xfs_info /dev/vg_729faf60b0a3b879b051db5abc4e94a1/brick_0e488dded91b45787fa21a742b792753
meta-data=/dev/mapper/vg_729faf60b0a3b879b051db5abc4e94a1-brick_0e488dded91b45787fa21a742b792753 isize=512    agcount=8, agsize=32768 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

So the problem doesn't seem to be related to either the filesystem or the LVs.
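For what it's worth, sparseness can also be confirmed without qemu-img by comparing the blocks actually allocated against the apparent file size. A minimal sketch, assuming the brick is mounted at the same path as above:

#!/bin/bash
# Compare allocated vs. apparent size to detect sparseness.
# The path is the example brick path from above; adjust as needed.
IMG=/mnt/1/brick/disk.img

# Apparent (logical) size in bytes vs. bytes actually allocated on disk.
apparent=$(stat --format=%s "$IMG")
allocated=$(( $(stat --format=%b "$IMG") * $(stat --format=%B "$IMG") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"
if [ "$allocated" -lt "$apparent" ]; then
    echo "$IMG is sparse"
else
    echo "$IMG is fully allocated (non-sparse)"
fi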
Version-Release number of selected component (if applicable):

OCP: v3.11.43
CDI: v1.3.0
OCS images:
registry.redhat.io/rhgs3/rhgs-server-rhel7       latest   53c83497482a2   431MB
registry.redhat.io/rhgs3/rhgs-volmanager-rhel7   latest   64d4857090bd8   327MB

How reproducible:
Always.

Steps to Reproduce:
1. Create a 1Gi golden PVC that imports the Fedora cloud qcow2 image.
2. Create a 1Gi DataVolume with the golden PVC as its source.
3. Watch the clone-target pod fail with "no space left on device".

Actual results:
The cloner pod ends in *Error*:

[root@workstation-ddc4 ~]# oc logs -f clone-target-pod-rbwlb -n 201dv
cloner: Starting clone target
cloner: check if the fifo pipe was created by the cloning source pod
/tmp/clone/image /
cloner: extract the image from /tmp/clone/socket/c662a4c2-fe01-11e8-8133-2cabcdef0010/pipe into /tmp/clone/image directory
./
./disk.img
tar: ./disk.img: Cannot write: No space left on device
tar: ./disk.img: Cannot utime: No space left on device

Expected results:
Is this the expected result? Or should the resulting file be a sparse file as well?

Additional info:

* Golden PVC definition:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.import.endpoint: https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.qcow2
    cdi.kubevirt.io/storage.import.importPodName: importer-golden-fedora-pvc-vdmmq
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"cdi.kubevirt.io/storage.import.endpoint":"https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.qcow2"},"labels":{"app":"containerized-data-importer"},"name":"golden-fedora-pvc","namespace":"201dvsource"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}}}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/glusterfs
  creationTimestamp: 2018-12-11T17:02:38Z
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
  name: golden-fedora-pvc
  namespace: 201dvsource
  resourceVersion: "33005"
  selfLink: /api/v1/namespaces/201dvsource/persistentvolumeclaims/golden-fedora-pvc
  uid: 904b11db-fd66-11e8-869d-2cabcdef0010
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: glusterfs-storage
  volumeName: pvc-904b11db-fd66-11e8-869d-2cabcdef0010
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound

* DataVolume definition:

apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1alpha1","kind":"DataVolume","metadata":{"annotations":{},"name":"example-pvc-dv","namespace":"201dv"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"8Gi"}}},"source":{"pvc":{"name":"golden-fedora-pvc","namespace":"201dvsource"}}}}
  creationTimestamp: 2018-12-12T11:35:54Z
  generation: 1
  name: example-pvc-dv
  namespace: 201dv
  resourceVersion: "55568"
  selfLink: /apis/cdi.kubevirt.io/v1alpha1/namespaces/201dv/datavolumes/example-pvc-dv
  uid: 15d83cc9-fe02-11e8-8133-2cabcdef0010
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 8Gi
  source:
    pvc:
      name: golden-fedora-pvc
      namespace: 201dvsource
status:
  phase: Succeeded

NOTE: This is the DV that succeeded; the original had `requests.storage: 1Gi`.
I've run some tests locally after seeing the cloner script uses `tar cv` to send the file to a named pipe, and I think that's the problem here. Please see the example below:

* Test #1 - tar cv:

⏚ [sjr:~/Downloads] % qemu-img info Fedora-Cloud-Base-29-1.2.x86_64.raw
image: Fedora-Cloud-Base-29-1.2.x86_64.raw
file format: raw
virtual size: 4.0G (4294967296 bytes)
disk size: 778M
⏚ [sjr:~/Downloads] % mkfifo /tmp/testfifo
⏚ [sjr:~/Downloads] % tar cv Fedora-Cloud-Base-29-1.2.x86_64.raw > /tmp/testfifo
Fedora-Cloud-Base-29-1.2.x86_64.raw

* Results #1 - tar cv:

⏚ [sjr:~/Downloads/tmp] 4s 2 % cat /tmp/testfifo | tar xv
Fedora-Cloud-Base-29-1.2.x86_64.raw
⏚ [sjr:~/Downloads/tmp] 10s % qemu-img info Fedora-Cloud-Base-29-1.2.x86_64.raw
image: Fedora-Cloud-Base-29-1.2.x86_64.raw
file format: raw
virtual size: 4.0G (4294967296 bytes)
disk size: 4.0G

* Test #2 - tar cv --sparse:

⏚ [sjr:~/Downloads] 14s % tar cv --sparse Fedora-Cloud-Base-29-1.2.x86_64.raw > /tmp/testfifo
Fedora-Cloud-Base-29-1.2.x86_64.raw

* Results #2 - tar cv --sparse:

⏚ [sjr:~/Downloads/tmp2] % cat /tmp/testfifo | tar xv
Fedora-Cloud-Base-29-1.2.x86_64.raw
⏚ [sjr:~/Downloads/tmp2] 11s % qemu-img info Fedora-Cloud-Base-29-1.2.x86_64.raw
image: Fedora-Cloud-Base-29-1.2.x86_64.raw
file format: raw
virtual size: 4.0G (4294967296 bytes)
disk size: 778M

Looking at the GNU tar manual [0], it looks quite safe to add this parameter. I'd be willing to send the PR myself if it looks like the right solution. Thoughts?

[0] https://www.gnu.org/software/tar/manual/html_node/sparse.html
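For anyone who wants to reproduce this without downloading the Fedora image, here is a minimal self-contained sketch showing the same behavior with a synthetic sparse file (all file names and sizes below are arbitrary test values, not anything from the cloner):

#!/bin/bash
# Demonstrate that piping a sparse file through tar without --sparse
# materializes the holes, while --sparse preserves them.
set -e
cd "$(mktemp -d)"

# Create a 1GiB sparse file: apparent size 1G, essentially no blocks allocated.
truncate --size=1G disk.img
du -h --apparent-size disk.img   # ~1.0G apparent
du -h disk.img                   # ~0 allocated

mkdir plain sparse

# Without --sparse: holes are read and written out as literal zeros,
# which the receiving filesystem must actually allocate.
tar cf - disk.img | tar xf - -C plain
du -h plain/disk.img             # ~1.0G allocated

# With --sparse on the creating side, holes survive the pipe.
tar cf - --sparse disk.img | tar xf - -C sparse
du -h sparse/disk.img            # ~0 allocated again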
*** Bug 1658614 has been marked as a duplicate of this bug. ***
Yes. The --sparse option would be a good idea. I'd love it if you could submit the PR.
Just posted the PR upstream: https://github.com/kubevirt/containerized-data-importer/pull/575
The PR https://github.com/kubevirt/containerized-data-importer/pull/575 has been merged. Thanks Sergi!
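For context, the fix itself is a one-flag change. Below is a hedged sketch of the cloner's source-side tar invocation before and after; the actual script line in the PR may differ, and $PIPE_PATH here just stands in for the /tmp/clone/socket/<uid>/pipe path seen in the pod logs:

# Before: tar reads the holes in disk.img as literal zeros, so the
# target side receives (and must store) the full virtual size.
tar cv -C /tmp/clone/image . > "$PIPE_PATH"

# After: --sparse makes tar detect holes and encode them as sparse
# members, so the extracted disk.img stays sparse on the target PVC.
tar cv --sparse -C /tmp/clone/image . > "$PIPE_PATH"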
Tested with OpenShift v3.11.59, virt-cdi-importer:v1.4.0-6.

1) I used the image you provided, https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.qcow2:

file format: raw
virtual size: 4.0G (4294967296 bytes)
disk size: 707M

and set the PVC and DV storage to 1Gi. The importer finished without throwing storage-limitation messages such as "no space", and it didn't resize the image to the available space. But in the end, no VM was created. See: http://pastebin.test.redhat.com/706431

2) I also tried the image http://fileshare.englab.nay.redhat.com/pub/libra/mnt/qa/scratch/OpenShift_QE/CleanImage/Fedora-Live-Workstation-x86_64-21-5.qcow2:

file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 889M

The importer was in a retry loop, and I couldn't capture any storage-limitation messages. I then expanded the DataVolume to 21Gi (my cluster's upper limit). After re-creating the DV-generated PVC I could see "no space" in the PVC events; it then bound successfully, and the import progress went further. Unfortunately, it was still in a retry loop and I never saw it finish.

My questions:
1) Now the image is a sparse image, so during creation it took up only as much actual disk space as the data it contains (707M), and a 1Gi PVC is sufficient. In this sense, the bug has been fixed.
2) Perhaps the retry loop is an environmental issue, like https://bugzilla.redhat.com/show_bug.cgi?id=1662099. Does that make sense?
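In case it helps others chasing similar retry loops, these standard commands pull the relevant events and logs (the names in angle brackets are placeholders for the actual namespace/PVC/pod names):

# Recent events in the namespace, oldest first; look for "no space"
# or provisioning failures.
oc get events -n <namespace> --sort-by=.lastTimestamp

# Events and binding status for a specific PVC.
oc describe pvc <pvc-name> -n <namespace>

# Importer pod logs; the pod name is recorded in the PVC's
# cdi.kubevirt.io/storage.import.importPodName annotation.
oc logs <importer-pod-name> -n <namespace>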
Qixuan, the bug, at least when I reported it, was affecting the cloner (image/process), not the importer, and the PR applies only to the cloner as well. Failures in the importer should not be related, I think. The importer, IIRC, was doing its job well when importing a sparse image; the problem was in the cloning process, as it was using tar without the --sparse option. Does that help?
Thanks, Sergi, that makes sense.

Tested with virt-cdi-importer:v1.4.0-8.

I used test image https://dl.fedoraproject.org/pub/fedora/linux/releases/27/CloudImages/x86_64/images/Fedora-Cloud-Base-27-1.6.x86_64.qcow2 with
https://raw.githubusercontent.com/qwang1/cnv-test-file/master/golden-pvc.yaml
https://raw.githubusercontent.com/qwang1/cnv-test-file/master/example-clone-dv.yaml

[root@cnv-executor-qwang-1-master1 ~]# qemu-img info Fedora-Cloud-Base-27-1.6.x86_64.qcow2
image: Fedora-Cloud-Base-27-1.6.x86_64.qcow2
file format: qcow2
virtual size: 4.0G (4294967296 bytes)
disk size: 222M
cluster_size: 65536
Format specific information:
    compat: 0.10

The CDI importer behavior changed recently: it checks whether the target size is smaller than the virtual size and rejects the import if so. Here the qcow2 image was 222M (ActualSize) but the VM reads 4.0Gi (VirtualSize), correct? @awels requests.storage should be >= 4.0G; otherwise I would get the error message "Virtual image size 4294967296 is larger than available size 1018417152, shrink not yet supported." So I set requests.storage to 5Gi in golden-pvc.yaml.

After the import completed, I used example-clone-dv.yaml, whose requests.storage was 1Gi. If the image were sparse, 1Gi would be sufficient.

[root@cnv-executor-qwang-1-master1 ~]# oc get dv
NAME               AGE
example-clone-dv   3s
[root@cnv-executor-qwang-1-master1 ~]# oc get pvc
NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
example-clone-dv   Bound     pvc-1f1532f1-30f6-11e9-8e8c-fa163eecabce   1Gi        RWO            glusterfs-storage   15s
golden-pvc         Bound     pvc-8b0af6ed-30ef-11e9-8e8c-fa163eecabce   5Gi        RWO            glusterfs-storage   47m
[root@cnv-executor-qwang-1-master1 ~]# oc get all
NAME                         READY     STATUS    RESTARTS   AGE
pod/clone-source-pod-s96bw   1/1       Running   0          10s
pod/clone-target-pod-rflmq   1/1       Running   0          10s

NAME                                                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/glusterfs-dynamic-1f1532f1-30f6-11e9-8e8c-fa163eecabce   ClusterIP   172.30.93.216    <none>        1/TCP     10s
service/glusterfs-dynamic-8b0af6ed-30ef-11e9-8e8c-fa163eecabce   ClusterIP   172.30.104.122   <none>        1/TCP     47m

[root@cnv-executor-qwang-1-master1 ~]# oc logs -f pod/clone-source-pod-7g77q
cloner: Starting clone source
cloner: creating fifo pipe
cloner: creating tarball of the image and redirecting it to /tmp/clone/socket/65c1e0fd-30f8-11e9-8e8c-fa163eecabce/pipe
/tmp/clone/image /
5270437888
./
./disk.img
/
cloner: finished writing image to /tmp/clone/socket/65c1e0fd-30f8-11e9-8e8c-fa163eecabce/pipe

[root@cnv-executor-qwang-1-master1 ~]# oc logs -f pod/clone-target-pod-tl7sf
cloner: Starting clone target
cloner: check if the fifo pipe was created by the cloning source pod
cloner: 0: fifo pipe has not been created by the source pod. Waiting 3 seconds before checking again...
cloner: check if the fifo pipe was created by the cloning source pod
/tmp/clone/image /
cloner: extract the image from /tmp/clone/socket/65c1e0fd-30f8-11e9-8e8c-fa163eecabce/pipe into /tmp/clone/image directory
I0215 08:05:26.340401       8 clonertarget.go:53] Starting cloner target
I0215 08:05:27.287010       8 clonertarget.go:102] Reading total size
I0215 08:05:27.287139       8 clonertarget.go:144] total size: 5270437888
I0215 08:05:27.288033       8 util.go:97] begin untar...
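As an aside, the virtual size the importer checks against can be read from the image before picking a PVC size. A small sketch, using the image from this test and assuming jq is available:

#!/bin/bash
# Read the virtual and actual sizes of an image in machine-readable
# form to decide what requests.storage must cover.
IMG=Fedora-Cloud-Base-27-1.6.x86_64.qcow2

virtual=$(qemu-img info --output=json "$IMG" | jq '."virtual-size"')
actual=$(qemu-img info --output=json "$IMG" | jq '."actual-size"')

echo "actual size:  $actual bytes (what a sparse copy occupies)"
echo "virtual size: $virtual bytes (what requests.storage must cover)"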
I0215 08:05:28.288366       8 clonertarget.go:127] 0.00
I0215 08:05:29.288524       8 clonertarget.go:127] 0.00
I0215 08:05:30.288685       8 clonertarget.go:127] 0.00
I0215 08:05:31.288801       8 clonertarget.go:127] 0.00
I0215 08:05:32.288994       8 clonertarget.go:127] 0.00
I0215 08:05:33.289126       8 clonertarget.go:127] 0.00
I0215 08:05:34.289256       8 clonertarget.go:127] 0.00
I0215 08:05:35.289385       8 clonertarget.go:127] 0.00
I0215 08:05:36.289516       8 clonertarget.go:127] 0.00
I0215 08:05:37.289750       8 clonertarget.go:127] 0.00
I0215 08:05:38.289916       8 clonertarget.go:127] 0.00
I0215 08:05:39.290044       8 clonertarget.go:127] 0.00
I0215 08:05:40.290213       8 clonertarget.go:127] 0.00
I0215 08:05:41.290428       8 clonertarget.go:127] 0.00
I0215 08:05:42.290567       8 clonertarget.go:127] 0.00
I0215 08:05:43.290787       8 clonertarget.go:127] 0.00
I0215 08:05:44.290949       8 clonertarget.go:127] 0.00
I0215 08:05:45.291809       8 clonertarget.go:127] 0.00
I0215 08:05:46.291948       8 clonertarget.go:127] 0.00
I0215 08:05:47.292076       8 clonertarget.go:127] 0.00
I0215 08:05:48.292229       8 clonertarget.go:127] 0.00
I0215 08:05:49.292358       8 clonertarget.go:127] 0.00
I0215 08:05:50.292484       8 clonertarget.go:127] 0.00
I0215 08:05:51.292628       8 clonertarget.go:127] 0.10
I0215 08:05:52.292795       8 clonertarget.go:127] 0.23
I0215 08:05:53.292930       8 clonertarget.go:127] 0.40
I0215 08:05:54.293059       8 clonertarget.go:127] 0.57
I0215 08:05:55.293186       8 clonertarget.go:127] 0.74
I0215 08:05:56.293332       8 clonertarget.go:127] 0.92
I0215 08:05:57.293458       8 clonertarget.go:127] 1.12
I0215 08:05:58.293610       8 clonertarget.go:127] 1.28
I0215 08:05:59.293786       8 clonertarget.go:127] 1.45
I0215 08:06:00.293934       8 clonertarget.go:127] 1.61
I0215 08:06:01.294053       8 clonertarget.go:127] 1.77
I0215 08:06:02.294189       8 clonertarget.go:127] 1.92
I0215 08:06:03.294343       8 clonertarget.go:127] 2.13
I0215 08:06:04.294495       8 clonertarget.go:127] 2.33
I0215 08:06:05.294619       8 clonertarget.go:127] 2.52
I0215 08:06:06.294803       8 clonertarget.go:127] 2.71
I0215 08:06:07.294936       8 clonertarget.go:127] 2.86
I0215 08:06:08.295104       8 clonertarget.go:127] 3.00
I0215 08:06:09.295264       8 clonertarget.go:127] 3.12
I0215 08:06:10.295421       8 clonertarget.go:127] 3.25
I0215 08:06:11.295574       8 clonertarget.go:127] 3.40
I0215 08:06:12.295701       8 clonertarget.go:127] 3.60
I0215 08:06:13.295844       8 clonertarget.go:127] 3.72
I0215 08:06:14.295970       8 clonertarget.go:127] 3.88
I0215 08:06:15.296087       8 clonertarget.go:127] 4.00
I0215 08:06:16.296229       8 clonertarget.go:127] 4.19
I0215 08:06:17.296390       8 clonertarget.go:127] 4.34
I0215 08:06:18.296562       8 clonertarget.go:127] 4.34
I0215 08:06:19.296753       8 clonertarget.go:127] 4.52
I0215 08:06:20.296888       8 clonertarget.go:127] 4.73
I0215 08:06:21.297016       8 clonertarget.go:127] 4.91
I0215 08:06:22.297140       8 clonertarget.go:127] 5.09
I0215 08:06:23.297319       8 clonertarget.go:127] 5.24
I0215 08:06:24.297464       8 clonertarget.go:127] 5.26
I0215 08:06:25.297609       8 clonertarget.go:127] 5.45
I0215 08:06:26.297755       8 clonertarget.go:127] 5.63
I0215 08:06:27.297881       8 clonertarget.go:127] 5.81
I0215 08:06:28.298059       8 clonertarget.go:127] 5.98
I0215 08:06:29.298227       8 clonertarget.go:127] 6.18
I0215 08:06:30.298361       8 clonertarget.go:127] 6.37
I0215 08:06:31.298494       8 clonertarget.go:127] 6.55
I0215 08:06:32.298623       8 clonertarget.go:127] 6.73
I0215 08:06:33.298760       8 clonertarget.go:127] 6.90
I0215 08:06:34.298907       8 clonertarget.go:127] 7.08
I0215 08:06:35.299053       8 clonertarget.go:127] 7.26
I0215 08:06:36.299179       8 clonertarget.go:127] 7.44
I0215 08:06:37.299345       8 clonertarget.go:127] 7.61
I0215 08:06:38.299489       8 clonertarget.go:127] 7.80
I0215 08:06:39.299967       8 clonertarget.go:127] 7.99
I0215 08:06:40.300124       8 clonertarget.go:127] 8.19
I0215 08:06:41.300232       8 clonertarget.go:127] 8.40
I0215 08:06:42.300401       8 clonertarget.go:127] 8.61
I0215 08:06:43.300600       8 clonertarget.go:127] 8.82
I0215 08:06:44.300834       8 clonertarget.go:127] 9.03
I0215 08:06:45.300981       8 clonertarget.go:127] 9.22
I0215 08:06:46.301105       8 clonertarget.go:127] 9.39
I0215 08:06:47.301823       8 clonertarget.go:127] 9.56
I0215 08:06:48.301949       8 clonertarget.go:127] 9.73
I0215 08:06:49.302086       8 clonertarget.go:127] 9.92
I0215 08:06:50.302248       8 clonertarget.go:127] 10.09
I0215 08:06:51.302403       8 clonertarget.go:127] 10.24
I0215 08:06:52.302536       8 clonertarget.go:127] 10.24
I0215 08:06:53.302725       8 clonertarget.go:127] 10.24
I0215 08:06:54.302851       8 clonertarget.go:127] 10.24
I0215 08:06:54.384040       8 clonertarget.go:98] clone complete
/
cloner: finished cloning image from /tmp/clone/socket/65c1e0fd-30f8-11e9-8e8c-fa163eecabce/pipe to /tmp/clone/image

If the cloned image were non-sparse, 1Gi couldn't hold it, so I think this bug has been fixed.

Meanwhile, some questions came up:
1) The image was imported to a PVC; where can I check it? I searched for "disk.img" on all of my nodes and got nothing.
2) Was the above clone progress truly complete? It was weird that it jumped from 10% to complete immediately.
3) Where is the cloned image? There is no directory called /tmp/clone/image.
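Regarding questions 1 and 3: /tmp/clone/image is just the mount point inside the (now terminated) target pod, so it won't exist on the node. On glusterfs-backed PVs the file lives on a brick. A hedged sketch of one way to find it; the PV name is taken from the "oc get pvc" output above, the gluster commands assume access to a gluster server node/pod, and the brick path pattern is the typical heketi layout (it may differ on your deployment):

# The PV spec carries the gluster volume name.
oc get pv pvc-1f1532f1-30f6-11e9-8e8c-fa163eecabce \
    -o jsonpath='{.spec.glusterfs.path}'
# -> e.g. vol_<hash>

# On a gluster server, list the bricks backing that volume...
gluster volume info vol_<hash> | grep ^Brick

# ...then look for disk.img under the reported brick directory;
# ls -s also shows the allocated size, confirming sparseness.
ls -lsh /var/lib/heketi/mounts/vg_*/brick_*/brick/disk.img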