Bug 1679134

Summary: Error uploading Windows image with virtctl tool
Product: Container Native Virtualization (CNV)
Component: Storage
Version: 1.4
Target Release: 2.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Juan Manuel Parrilla Madrid <jparrill>
Assignee: Irit Goihman <igoihman>
QA Contact: Qixuan Wang <qixuan.wang>
CC: alitke, bschmaus, cnv-qe-bugs, fsimonce, igoihman, jparrill, mhenriks, ncredi, sgordon, ycui
Fixed In Version: 1.9.3
Last Closed: 2019-07-24 20:15:51 UTC
Type: Bug
Bug Depends On: 1705080

Description Juan Manuel Parrilla Madrid 2019-02-20 12:32:01 UTC
Description of problem:

Trying to upload a qcow2 image to KubeVirt as a PVC using the virtctl image-upload command fails with "Unexpected return value 500".

Version-Release number of selected component (if applicable):

OCP 3.11
OCS 3.11
CNV 1.4

Packages:
kubevirt-ansible-0.12.3-1.f9f3ed4.noarch.rpm
kubevirt-cdi-manifests-1.5.1-2.c2a2fb4.noarch.rpm
kubevirt-manifests-0.13.3-1.g573d572.79a47ba.noarch.rpm
kubevirt-templates-0.4.1-1.23cf119.noarch.rpm
kubevirt-virtctl-0.13.3-1.g573d572.79a47ba.x86_64.rpm
kubevirt-virtctl-redistributable-0.13.3-1.g573d572.79a47ba.x86_64.rpm
python-kubevirt-0.1.0-1.b257d50.noarch.rpm

How reproducible:


Steps to Reproduce:
1. Deploy CNV using kubevirt-ansible
2. Use the virtctl image-upload command to upload the qcow2 image as a PVC:
```
virtctl image-upload --image-path /opt/extra/w2k.qcow2 --pvc-name windows-pvc-manual --uploadproxy-url https://`oc get route -n cdi cdi-uploadproxy-route --template='{{ .spec.host }}'` --insecure --pvc-size 50Gi
```
The Windows image comes from https://cloudbase.it/windows-cloud-images/
3. While the upload is running, watch the cdi-deployment pod with:
```
oc logs cdi-deployment-58c6dd6c98-qwfvh -n cdi -f
```

Actual results:

The cdi-deployment pod crashes with a Go error: https://pastebin.com/BXjBqkjZ
The upload keeps running but finishes with the error "Unexpected return value 500".

Expected results:

The image is uploaded into the PVC successfully.

Additional info:

Comment 1 Juan Manuel Parrilla Madrid 2019-02-20 12:40:47 UTC
More info:

With a Windows qcow2 (12 GB):

[root@unified-demo02-master0 ~]# virtctl image-upload --image-path /opt/extra/w2k.qcow2 --pvc-name windows-pvc-manual --uploadproxy-url https://`oc get route -n cdi cdi-uploadproxy-route --template='{{ .spec.host }}'` --insecure --pvc-size 50Gi
PVC myvms/windows-pvc-manual created
Waiting for PVC windows-pvc-manual upload pod to be running...
Pod now running
Uploading data to https://cdi-uploadproxy-route-cdi.apps.cnv-comm.10.8.120.153.nip.io

 11.18 GiB / 11.18 GiB [======================================================================================================================] 100.00% 12m16s

Unexpected return value 500

With a Fedora qcow2 (300 MiB):

[root@unified-demo02-master0 extra]# virtctl image-upload --image-path /opt/extra/Fedora-Cloud-Base-29-1.2.x86_64.qcow2 --pvc-name fedora-pvc-manual --uploadproxy-url https://`oc get route -n cdi cdi-uploadproxy-route --template='{{ .spec.host }}'` --insecure --pvc-size 8Gi
PVC myvms/fedora-pvc-manual created
Waiting for PVC fedora-pvc-manual upload pod to be running...
Pod now running
Uploading data to https://cdi-uploadproxy-route-cdi.apps.cnv-comm.10.8.120.153.nip.io

 293.25 MiB / 293.25 MiB [=======================================================================================================================] 100.00% 33s

Uploading /opt/extra/Fedora-Cloud-Base-29-1.2.x86_64.qcow2 completed successfully


It seems to happen only with big images.

Comment 3 Adam Litke 2019-02-20 14:56:01 UTC
Please post logs from the cdi uploadproxy pod and the relevant cdi uploadserver pod.  Also, please post the output of 'oc get -n cdi all'

Comment 4 Juan Manuel Parrilla Madrid 2019-02-20 15:24:02 UTC
Sure,

The CDI pods keep cycling through the following behaviour (the uploadproxy itself shows nothing relevant):
[root@unified-demo02-master0 ~]# oc get pods -w -n cdi
NAME                               READY     STATUS             RESTARTS   AGE
cdi-apiserver-7dcd78644b-dpqll     1/1       Running            0          1d
cdi-deployment-58c6dd6c98-qwfvh    0/1       CrashLoopBackOff   25         1d
cdi-operator-5cd78b89-wqtrv        1/1       Running            27         1d
cdi-uploadproxy-7b76f58c56-fm7jg   1/1       Running            0          1d
cdi-operator-5cd78b89-wqtrv        0/1       Error              27         1d
cdi-deployment-58c6dd6c98-qwfvh    1/1       Running            26         1d
cdi-operator-5cd78b89-wqtrv        0/1       CrashLoopBackOff   27         1d
cdi-deployment-58c6dd6c98-qwfvh    0/1       Error              26         1d
cdi-operator-5cd78b89-wqtrv        1/1       Running            28         1d
cdi-deployment-58c6dd6c98-qwfvh    0/1       CrashLoopBackOff   26         1d
cdi-deployment-58c6dd6c98-qwfvh    1/1       Running            27         1d


Logs from operator pod:
[root@unified-demo02-master0 ~]# oc logs -f cdi-operator-5cd78b89-wqtrv -n cdi
{"level":"info","ts":1550675793.7313886,"logger":"controler","msg":"Resource updated","Request.Namespace":"","Request.Name":"cdi","namespace":"cdi","name":"cdi-api","type":"*v1.Service"}
{"level":"info","ts":1550675793.7357757,"logger":"controler","msg":"Resource updated","Request.Namespace":"","Request.Name":"cdi","namespace":"cdi","name":"cdi-apiserver","type":"*v1.Deployment"}
{"level":"info","ts":1550675793.739132,"logger":"controler","msg":"Resource updated","Request.Namespace":"","Request.Name":"cdi","namespace":"cdi","name":"cdi-sa","type":"*v1.ServiceAccount"}
{"level":"info","ts":1550675793.7440128,"logger":"controler","msg":"Resource updated","Request.Namespace":"","Request.Name":"cdi","namespace":"cdi","name":"cdi-deployment","type":"*v1.Deployment"}
{"level":"info","ts":1550675793.7481909,"logger":"controler","msg":"Resource updated","Request.Namespace":"","Request.Name":"cdi","namespace":"cdi","name":"cdi-uploadproxy","type":"*v1.Service"}
{"level":"info","ts":1550675793.752048,"logger":"controler","msg":"Resource updated","Request.Namespace":"","Request.Name":"cdi","namespace":"cdi","name":"cdi-uploadproxy","type":"*v1.Deployment"}
{"level":"info","ts":1550675793.7523513,"logger":"controler","msg":"CDI is running","Request.Namespace":"","Request.Name":"cdi"}
I0220 15:19:09.774746       1 leaderelection.go:249] failed to renew lease cdi/cdi-controler-leader-election: failed to tryAcquireOrRenew context deadline exceeded
{"level":"error","ts":1550675949.7752702,"logger":"cmd","msg":"manager exited non-zero","error":"leader election lost","stacktrace":"kubevirt.io/containerized-data-importer/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/kubevirt.io/containerized-data-importer/cmd/cdi-operator/operator.go:106\nruntime.main\n\t/opt/rh/go-toolset-1.10/root/usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:198"}

Logs from UploadProxy:
[root@unified-demo02-master0 RDP]# oc logs cdi-uploadproxy-7b76f58c56-fm7jg -f
I0219 12:43:09.188515       1 uploadproxy.go:47] Note: increase the -v level in the api deployment for more detailed logging, eg. -v=2 or -v=3
W0219 12:43:09.189555       1 client_config.go:548] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0220 10:32:37.426169       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: glusterfs
I0220 10:46:08.290438       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms
I0220 10:58:39.189080       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms
I0220 11:14:30.680309       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms
I0220 11:32:11.573448       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms
I0220 12:05:52.667341       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms
I0220 12:21:18.893829       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms
I0220 12:37:40.614278       1 uploadproxy.go:149] Received valid token: pvc: fedora-pvc-manual, namespace: myvms
I0220 15:10:07.656014       1 uploadproxy.go:149] Received valid token: pvc: windows-pvc-manual, namespace: myvms

Logs from UploadServerPod (still there after the failure):
[root@unified-demo02-master0 ~]# oc logs cdi-upload-windows-pvc-manual -f
I0220 15:10:02.586735       1 uploadserver.go:59] Upload destination: /data/disk.img
I0220 15:10:02.588734       1 uploadserver.go:61] Running server on 0.0.0.0:8443
I0220 15:10:07.735313       1 dataStream.go:322] Saving stream to "/data/disk.img"...
I0220 15:10:07.889354       1 util.go:38] begin import...
I0220 15:17:53.265811       1 prlimit.go:107] ExecWithLimits qemu-img, [info --output=json /data/disk52fdfc072182654f.img]
I0220 15:17:55.896098       1 prlimit.go:107] ExecWithLimits qemu-img, [info --output=json /data/disk52fdfc072182654f.img]
I0220 15:17:55.923823       1 prlimit.go:107] ExecWithLimits qemu-img, [convert -p -f qcow2 -O raw /data/disk52fdfc072182654f.img /data/disk.img]
E0220 15:22:11.321794       1 prlimit.go:160]     (0.00/100%)    (1.01/100%)    (2.02/100%)    (3.03/100%)    (4.04/100%)    (5.04/100%)    (6.05/100%)    (7.06/100%)    (8.07/100%)    (9.09/100%)    (10.10/100%)    (11.10/100%)    (12.11/100%)    (13.12/100%)    (14.13/100%)    (15.14/100%)    (16.15/100%)    (17.16/100%)    (18.17/100%)    (19.17/100%)    (20.18/100%)    (21.19/100%)    (22.20/100%)    (23.20/100%)    (24.21/100%)    (25.22/100%)    (26.22/100%)    (27.23/100%)    (28.24/100%)    (29.25/100%)    (30.26/100%)    (31.27/100%)    (32.27/100%)    (33.28/100%)    (34.29/100%)    (35.30/100%)    (36.30/100%)    (37.31/100%)    (38.32/100%)    (39.33/100%)
E0220 15:22:20.814630       1 uploadserver.go:203] Saving stream failed: data stream copy failed: Local qcow to raw conversion failed: could not convert local qcow2 image to raw: qemu-img execution failed: signal: killed

Elements from CDI Namespace:
[root@unified-demo02-master0 ~]# oc get all -n cdi
NAME                                   READY     STATUS    RESTARTS   AGE
pod/cdi-apiserver-7dcd78644b-dpqll     1/1       Running   0          1d
pod/cdi-deployment-58c6dd6c98-qwfvh    1/1       Running   27         1d
pod/cdi-operator-5cd78b89-wqtrv        1/1       Running   29         1d
pod/cdi-uploadproxy-7b76f58c56-fm7jg   1/1       Running   0          1d

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/cdi-api           ClusterIP   172.30.25.110    <none>        443/TCP   1d
service/cdi-uploadproxy   ClusterIP   172.30.151.191   <none>        443/TCP   1d

NAME                              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cdi-apiserver     1         1         1            1           1d
deployment.apps/cdi-deployment    1         1         1            1           1d
deployment.apps/cdi-operator      1         1         1            1           1d
deployment.apps/cdi-uploadproxy   1         1         1            1           1d

NAME                                         DESIRED   CURRENT   READY     AGE
replicaset.apps/cdi-apiserver-7dcd78644b     1         1         1         1d
replicaset.apps/cdi-deployment-58c6dd6c98    1         1         1         1d
replicaset.apps/cdi-operator-5cd78b89        1         1         1         1d
replicaset.apps/cdi-uploadproxy-7b76f58c56   1         1         1         1d

NAME                                             HOST/PORT                                                     PATH      SERVICES          PORT      TERMINATION   WILDCARD
route.route.openshift.io/cdi-uploadproxy-route   cdi-uploadproxy-route-cdi.apps.cnv-comm.10.8.120.153.nip.io             cdi-uploadproxy   <all>     passthrough   None

Comment 5 Adam Litke 2019-02-20 16:03:37 UTC
Please provide:

qemu-img info <windows.qcow>

oc get pvc -o yaml <windows disk pvc>

Comment 6 Juan Manuel Parrilla Madrid 2019-02-20 16:17:40 UTC
PVC Yaml:

[root@unified-demo02-master0 ~]# oc get pvc -o yaml windows-pvc-manual
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.pod.phase: Running
    cdi.kubevirt.io/storage.upload.target: ""
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/glusterfs
  creationTimestamp: 2019-02-20T15:09:31Z
  finalizers:
  - kubernetes.io/pvc-protection
  name: windows-pvc-manual
  namespace: myvms
  resourceVersion: "341830"
  selfLink: /api/v1/namespaces/myvms/persistentvolumeclaims/windows-pvc-manual
  uid: 863ab0ad-3521-11e9-a251-001a4a2315d4
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: glusterfs-storage
  volumeName: pvc-863ab0ad-3521-11e9-a251-001a4a2315d4
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  phase: Bound

I tried to boot a VM from the PVC even though the upload failed; as expected, it fails with a "Not bootable device" error.

qcow2 info:

[root@unified-demo02-master0 ~]# qemu-img info /opt/extra/w2k.qcow2 
image: /opt/extra/w2k.qcow2
file format: qcow2
virtual size: 12G (13096714240 bytes)
disk size: 11G
cluster_size: 65536
Format specific information:
    compat: 0.10

Comment 8 Michael Henriksen 2019-02-20 18:52:40 UTC
The CPU resource limit we put on "qemu-img convert" (30 sec) seems to be kicking in.  I did the following:

The upload pod was still running when I accessed the system.  I used "oc cp" to copy the image file to the pod.  I then execed into the pod and ran "qemu-img convert" manually (no process limits).  While it was running, I was watching the process in top from another window.  The process never came close to hitting our memory limit (1G).  The convert operation did complete successfully after several minutes.  So, I think our upload operation was killed by prlimit.
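
A rough sketch of that manual check, assuming the pod name and paths from the earlier output (the exact commands used are an approximation):
```
# Copy the qcow2 into the running upload pod (pod/namespace from the earlier listings)
oc cp /opt/extra/w2k.qcow2 myvms/cdi-upload-windows-pvc-manual:/data/w2k.qcow2

# Run the conversion by hand, without the prlimit wrapper
oc exec -n myvms cdi-upload-windows-pvc-manual -- \
    qemu-img convert -p -f qcow2 -O raw /data/w2k.qcow2 /data/disk.img

# In a second terminal, watch the qemu-img process (assumes top is available in the container)
oc exec -it -n myvms cdi-upload-windows-pvc-manual -- top
```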

I then edited the CDI controller YAML to reference a cdi-uploadserver container I built locally and pushed to Docker Hub, with no CPU limits. I re-ran virtctl and the upload succeeded. Here is the command output:

[root@unified-demo02-master0 ~]# virtctl image-upload --image-path /opt/extra/w2k.qcow2 --pvc-name windows-pvc-manual --uploadproxy-url https://`oc get route -n cdi cdi-uploadproxy-route --template='{{ .spec.host }}'` --insecure --pvc-size 50Gi
Using existing PVC myvms/windows-pvc-manual
Uploading data to https://cdi-uploadproxy-route-cdi.apps.cnv-comm.10.8.120.153.nip.io

 11.18 GiB / 11.18 GiB [=====================================================================================================================================================================================================================================] 100.00% 50m26s

Uploading /opt/extra/w2k.qcow2 completed successfully

The PVC data can be verified at myvms/windows-pvc-manual.

I have otherwise put the system back into its original state.

Comment 9 Adam Litke 2019-02-20 21:06:36 UTC
Juan, can you change your flow to upload a compressed (.gz) raw file? I.e. do the conversion yourself and then compress the result. It should be about the same size to transfer but avoids the server-side conversion step that is being killed. For 1.4 we may want to suggest this workaround.
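
A sketch of that workaround on the client side, with illustrative file names; the conversion happens locally, so the upload server only has to write the decompressed raw data:
```
# Convert the qcow2 to raw locally, then compress it for transfer
qemu-img convert -p -f qcow2 -O raw /opt/extra/w2k.qcow2 /opt/extra/w2k.raw
gzip /opt/extra/w2k.raw

# Upload the compressed raw image instead of the qcow2 (same flags as before)
virtctl image-upload --image-path /opt/extra/w2k.raw.gz \
  --pvc-name windows-pvc-manual --pvc-size 50Gi --insecure \
  --uploadproxy-url https://`oc get route -n cdi cdi-uploadproxy-route --template='{{ .spec.host }}'`
```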

Comment 13 Qixuan Wang 2019-06-10 08:25:57 UTC
Tested with hco-bundle-registry:v2.0.0-24, virt-cdi-uploadproxy:v2.0.0-25, virt-cdi-uploadserver:v2.0.0-25 on Bare Metal.
The image uploaded to 100% but returned "Unexpected return value 504". During the "Validating image" phase the image was visible on the disk, but once the upload pod was gone and the disk was checked again, the image had been deleted as well.


[root@hp-dl380g10-08 ~]# qemu-img info win_12.qcow2
image: win_12.qcow2
file format: qcow2
virtual size: 21G (22548578304 bytes)
disk size: 7.9G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false


Gave a sufficiently large PVC for the upload:
[root@hp-dl380g10-08 ~]# virtctl image-upload --image-path=./win_12.qcow2 --uploadproxy-url=https://cdi-uploadproxy-kubevirt-hyperconverged.apps.working.oc4 --pvc-name=win12-1 --pvc-size=24Gi --insecure
PVC qwang3/win12-1 created
Waiting for PVC win12-1 upload pod to be running...
Pod now running
Uploading data to https://cdi-uploadproxy-kubevirt-hyperconverged.apps.working.oc4

 7.86 GiB / 7.86 GiB [========================================================================================================] 100.00% 1m57s

Unexpected return value 504


Checked where the upload pod was running.
[root@hp-dl380g10-08 ~]# oc get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE
cdi-upload-win12-1   1/1     Running   0          34s   10.129.1.61   working-dpp8b-worker-0-lxknt   <none>


On the node where the upload pod was running, we can see the scratch disk and the target disk:
[root@working-dpp8b-worker-0-lxknt local-sc]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    252:0    0 16.5G  0 disk
|-vda1 252:1    0    1M  0 part
|-vda2 252:2    0    1G  0 part /boot
`-vda3 252:3    0 15.5G  0 part /sysroot
vdb    252:16   0   25G  0 disk /var/lib/kubelet/pods/a7fffa4e-8b56-11e9-baa6-664f163f5f0f/volumes/kubernetes.io~local-volume/local-pv-317d78
vdc    252:32   0   25G  0 disk
vdd    252:48   0   25G  0 disk /var/lib/kubelet/pods/a7fffa4e-8b56-11e9-baa6-664f163f5f0f/volumes/kubernetes.io~local-volume/local-pv-d21b55
vde    252:64   0   25G  0 disk


[root@hp-dl380g10-08 ~]# oc logs -f cdi-upload-win12-1
I0610 08:06:42.215803       1 uploadserver.go:70] Upload destination: /data/disk.img
I0610 08:06:42.225598       1 uploadserver.go:72] Running server on 0.0.0.0:8443
2019/06/10 08:06:49 http: TLS handshake error from 10.130.1.64:45366: EOF
I0610 08:06:49.729136       1 data-processor.go:237] Calculating available size
I0610 08:06:49.841510       1 data-processor.go:245] Checking out file system volume size.
I0610 08:06:49.841656       1 data-processor.go:249] Request image size not empty.
I0610 08:06:49.841705       1 data-processor.go:254] Target size 24Gi.
I0610 08:06:49.842048       1 data-processor.go:167] New phase: TransferScratch
I0610 08:06:49.850385       1 util.go:140] Writing data...
I0610 08:08:17.306867       1 data-processor.go:167] New phase: Process
I0610 08:08:17.306960       1 data-processor.go:167] New phase: Convert
I0610 08:08:17.306970       1 data-processor.go:173] Validating image
I0610 08:10:06.611373       1 data-processor.go:167] New phase: Resize
I0610 08:10:06.964799       1 data-processor.go:230] Expanding image size to: 24Gi
I0610 08:10:07.005019       1 data-processor.go:167] New phase: Complete
I0610 08:10:07.016420       1 util.go:37] deleting file: /scratch/tmpimage
I0610 08:10:07.257361       1 uploadserver.go:254] Wrote data to /data/disk.img
I0610 08:10:07.257411       1 uploadserver.go:146] Shutting down http server after successful upload
I0610 08:10:07.757632       1 uploadserver.go:80] UploadServer successfully exited


After the upload pod finished its job, checked whether the image was uploaded:
[root@working-dpp8b-worker-0-lxknt local-sc]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    252:0    0 16.5G  0 disk
|-vda1 252:1    0    1M  0 part
|-vda2 252:2    0    1G  0 part /boot
`-vda3 252:3    0 15.5G  0 part /sysroot
vdb    252:16   0   25G  0 disk
vdc    252:32   0   25G  0 disk
vdd    252:48   0   25G  0 disk
vde    252:64   0   25G  0 disk


[root@hp-dl380g10-08 ~]# oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM            STORAGECLASS   REASON   AGE
local-pv-317d785d   25Gi       RWO            Delete           Bound       qwang3/win12-1   local-sc                17m
local-pv-7a67dcda   25Gi       RWO            Delete           Available                    local-sc                3d2h
local-pv-8ebf1a76   25Gi       RWO            Delete           Available                    local-sc                17m
local-pv-9275d680   25Gi       RWO            Delete           Available                    local-sc                17m
local-pv-b050a179   25Gi       RWO            Delete           Available                    local-sc                3d2h
local-pv-b6362b94   25Gi       RWO            Delete           Available                    local-sc                17m
local-pv-d21b55bf   25Gi       RWO            Delete           Available                    local-sc                104s
local-pv-f7bc4a6b   25Gi       RWO            Delete           Available                    local-sc                9m26s

Comment 14 Michael Henriksen 2019-06-10 17:11:11 UTC
504 is not returned anywhere in CDI code.  The OpenShift router -> cdi-uploadproxy connection must be timing out.
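
One way to check this (a sketch; route name and namespace as used in the following comment) is to look at the route's annotations, since the HAProxy timeout is controlled per route and an absent haproxy.router.openshift.io/timeout annotation means the router default applies:
```
# Show the annotations on the uploadproxy route; if no haproxy.router.openshift.io/timeout
# annotation is present, the router's default timeout is in effect
oc get route cdi-uploadproxy -n kubevirt-hyperconverged -o jsonpath='{.metadata.annotations}'
```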

Comment 15 Irit Goihman 2019-06-13 13:08:11 UTC
After investigating this issue, this is indeed a route connection timeout.
It happens with large qcow2 files that need extra time to validate and resize.
Configuring the uploadproxy route to override the default timeout solves the issue:

# oc annotate route cdi-uploadproxy -n kubevirt-hyperconverged --overwrite haproxy.router.openshift.io/timeout=2m

# virtctl image-upload --image-path=./win_12.qcow2 --uploadproxy-url=https://cdi-uploadproxy-kubevirt-hyperconverged.apps.working.oc4 --pvc-name=win12-1 --pvc-size=24Gi --insecure
PVC local-storage/win12-1 created
Waiting for PVC win12-1 upload pod to be running...
Pod now running
Uploading data to https://cdi-uploadproxy-kubevirt-hyperconverged.apps.working.oc4

 7.86 GiB / 7.86 GiB [==============================================================================================================================================================================] 100.00% 3m10s

Uploading ./win_12.qcow2 completed successfully

I will create a PR with the route annotation.

Comment 16 Irit Goihman 2019-06-13 14:22:39 UTC
The issue needs to be addressed with changes in virtctl image-upload.
virtctl keeps the connection open and waits until the image is converted to raw and resized, which causes timeouts when uploading large qcow2 images.
We will fix it in 2.1; for now we can set a high timeout to avoid the error messages.
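
As an interim measure, the timeout can simply be raised well above the expected validation and resize time; the value below is only illustrative, not a recommended default:
```
# Illustrative only: raise the HAProxy timeout on the uploadproxy route so large
# uploads are not cut off while CDI converts and resizes the image
oc annotate route cdi-uploadproxy -n kubevirt-hyperconverged --overwrite \
    haproxy.router.openshift.io/timeout=60m
```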

Comment 19 Qixuan Wang 2019-06-21 17:13:31 UTC
Tested with hco-bundle-registry:v2.0.0-30 and virt-cdi-xxx:v2.0.0-27 (CDI v1.9.3); the bug has been fixed, thanks.

Here are the verification steps:

[cnv-qe-jenkins@cnv-executor-qwang-cnv2 ~]$ virtctl image-upload --image-path=./win_12.qcow2 --uploadproxy-url=https://cdi-uploadproxy-kubevirt-hyperconverged.apps.working.oc4 --pvc-name=win12-1 --pvc-size=24Gi --insecure
PVC qwang1/win12-1 created
Waiting for PVC win12-1 upload pod to be running...
Pod now running
Uploading data to https://cdi-uploadproxy-kubevirt-hyperconverged.apps.working.oc4

 7.86 GiB / 7.86 GiB [========================================================================================================================] 100.00% 2m51s

Uploading ./win_12.qcow2 completed successfully

[cnv-qe-jenkins@cnv-executor-qwang-cnv2 ~]$ oc logs -f cdi-upload-win12-1
I0621 16:26:34.429962       1 uploadserver.go:70] Upload destination: /data/disk.img
I0621 16:26:34.430332       1 uploadserver.go:72] Running server on 0.0.0.0:8443
2019/06/21 16:26:43 http: TLS handshake error from 10.129.0.35:56478: EOF
I0621 16:26:43.756115       1 data-processor.go:237] Calculating available size
I0621 16:26:43.760013       1 data-processor.go:245] Checking out file system volume size.
I0621 16:26:43.760075       1 data-processor.go:249] Request image size not empty.
I0621 16:26:43.760112       1 data-processor.go:254] Target size 24Gi.
I0621 16:26:43.760416       1 data-processor.go:167] New phase: TransferScratch
I0621 16:26:43.764231       1 util.go:140] Writing data...
I0621 16:29:08.803901       1 data-processor.go:167] New phase: Process
I0621 16:29:08.804001       1 data-processor.go:167] New phase: Convert
I0621 16:29:08.804007       1 data-processor.go:173] Validating image
I0621 16:29:34.397822       1 data-processor.go:167] New phase: Resize
I0621 16:29:34.442107       1 data-processor.go:230] Expanding image size to: 24Gi
I0621 16:29:34.476646       1 data-processor.go:167] New phase: Complete
I0621 16:29:34.476778       1 util.go:37] deleting file: /scratch/tmpimage
I0621 16:29:34.780353       1 uploadserver.go:254] Wrote data to /data/disk.img
I0621 16:29:34.780541       1 uploadserver.go:146] Shutting down http server after successful upload
I0621 16:29:35.281110       1 uploadserver.go:80] UploadServer successfully exited

[cnv-qe-jenkins@cnv-executor-qwang-cnv2 ~]$ oc get pv | grep win12-1
local-pv-7d87de98   25Gi       RWO            Delete           Bound       qwang1/win12-1   local-sc                 4h47m


[cnv-qe-jenkins@cnv-executor-qwang-cnv2 ~]$ oc describe pv local-pv-7d87de98
Name:              local-pv-7d87de98
Labels:            <none>
Annotations:       pv.kubernetes.io/bound-by-controller=yes
                   pv.kubernetes.io/provisioned-by=local-volume-provisioner-host-172-16-0-38-ded6d8e9-9415-11e9-a85a-fa163eee6529
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      local-sc
Status:            Bound
Claim:             qwang1/win12-1
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          25Gi
Node Affinity:
  Required Terms:
    Term 0:        kubernetes.io/hostname in [host-172-16-0-38]
Message:
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /mnt/local-storage/local-sc/vde
Events:    <none>


[core@host-172-16-0-38 ~]$ sudo mkdir bug
[core@host-172-16-0-38 ~]$ sudo mount /mnt/local-storage/local-sc/vde bug
[core@host-172-16-0-38 ~]$ ls -lsh bug/
total 7.5G
7.5G -rw-r--r--. 1 1000550000 1000550000 24G Jun 21 16:29 disk.img

Comment 21 errata-xmlrpc 2019-07-24 20:15:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1850