Description of problem:

During the first baseline runs of migration from RHV to MTV using RHV as a provider, we noticed some suspicious results.

Test details: the VMs' OS version is RHEL 7.6, they are located on an FC Storage Domain using a single host (RHV version > rhv-release-4.4.7-6), and the target storage class is OCS. I verified that the VMs were created correctly, their data is intact, and the disk types are correct.

Attached are the results from the baseline cycles:
1. 100 GB sparse thin disk with 2 GB actual size (rhel76-100gb-os-only-thin) - migration took 2m26s / 2m34s
2. 100 GB sparse thin disk with 35 GB actual size (rhel76-100gb-30usage-thin) - migration took 4m9s / 4m10s
3. 100 GB sparse thin disk with 72 GB actual size (rhel76-100gb-70usage-thin) - migration took 3m38s / 3m48s

Open questions:
- Why does it take 2.5 minutes to transfer a 2 GB disk?
- Why is the 72 GB sparse disk faster than the 35 GB sparse disk?
- We see the same behavior on different OCP clouds (cloud20 & cloud38).

Importer issues (items 1-3 are feedback from Nir Soffer):
1. Not using image extents - performs one GET request.
2. Copying the zeroes over the wire, and likely writing all the zeroes to the target disk.
3. The importer pod first asks for the TotalSize of the oVirt disk object, and only checks Content-Length if the total size was zero.
4. Importer pod progress does not update correctly - Tzahi needs to check if importer memory is decreasing as disk flushes/writes happen.

The full logs from the baseline results can be found under:
https://drive.google.com/drive/folders/1d1PPrwNZ8XHmU-SPwhLBI89lPguKTeXp?usp=sharing

Logs:
1. ovirt-imageio-daemon.log
2. rhel76-100gb-30usage-thin.log
3. rhel76-100gb-70usage-thin.log
4. rhel76-100gb-os-only-2gb-usage-thin.log
5. vdsm.log

Version-Release number of selected component (if applicable):
* Cloud20
* OCP-48
* CNV-48-451
* OCS-48
* MTV-2.1.0-44
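To illustrate importer issues 1 and 2 above: a minimal sketch (not the actual CDI importer code) of how a client could use the ovirt-imageio extents API to copy only data extents instead of issuing one GET for the whole image. The extent layout shown (JSON objects with "start", "length", and "zero" fields) follows the ovirt-imageio /extents endpoint; the helper names and the sample image layout are hypothetical.

```python
# Sketch, assuming the ovirt-imageio daemon's extents API:
# GET /images/{ticket}/extents returns JSON like
#   [{"start": 0, "length": 65536, "zero": false}, ...]
# Zero extents never need to travel over the wire; on a sparse target
# they can simply be skipped (or hole-punched).

def data_extents(extents):
    """Keep only the extents that hold real data."""
    return [e for e in extents if not e["zero"]]

def bytes_over_wire(extents):
    """How many bytes must actually be transferred."""
    return sum(e["length"] for e in data_extents(extents))

# Hypothetical 100 GB sparse image with only 2 GB of actual data,
# mirroring the rhel76-100gb-os-only-thin case:
GiB = 1024 ** 3
extents = [
    {"start": 0, "length": 2 * GiB, "zero": False},
    {"start": 2 * GiB, "length": 98 * GiB, "zero": True},
]
print(bytes_over_wire(extents) // GiB)  # 2 - only the data is copied
```

With a single GET and no extent information, the same transfer moves all 100 GB, including 98 GB of zeroes, which matches the slow baseline numbers reported above.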
Retargeting to 4.10.0 to have enough time to investigate the solution and test it. It will also allow time to use govirtclient internally.
This should be fixed in CNV v4.10.0-524.
Tested again using red02 as a provider with the same VMs:
1. 100 GB sparse thin disk with 2 GB actual size (rhel76-100gb-os-only-thin) - migration took 00:00:18, data copy speed 111.1 MB/s
2. 100 GB sparse thin disk with 35 GB actual size (rhel76-100gb-30usage-thin) - migration took 00:04:59, data copy speed 100.33 MB/s
3. 100 GB sparse thin disk with 72 GB actual size (rhel76-100gb-70usage-thin) - migration took 00:10:02, data copy speed 116.28 MB/s

These results are more reasonable than the previous ones:
* In the previous cycle it took 2.5 minutes to transfer the 2 GB disk; now the same data took 18 seconds.
* Previously the 72 GB sparse disk was faster than the 35 GB sparse disk; now the 35 GB disk is faster than the 72 GB disk, as expected.

The average data copy speed is ~100 MB/s (800 Mb/s).

Version-Release number of selected component (if applicable):
* Cloud10
* OCP-4.10
* OCS-4.9.1
* CNV-v4.10.0-598
* MTV-2.3.0-15

The full logs from the baseline results can be found under:
https://drive.google.com/drive/folders/1IT7dKmfacIvMpyccQ593DgMQuu0vI-Ym?usp=sharing
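As a sanity check on the retest numbers above, the copy speed follows from actual data size and elapsed time, using decimal megabytes (1 MB = 10^6 bytes); the helper name is mine, not from any tool involved.

```python
# Average copy speed in MB/s (decimal MB), given the actual data size in GB
# and the elapsed copy time in seconds.
def copy_speed_mb_s(size_gb, seconds):
    return round(size_gb * 1000 / seconds, 1)

# The 2 GB disk copied in 18 seconds:
print(copy_speed_mb_s(2, 18))  # 111.1 - matches the reported figure

# MB/s to Mb/s is a factor of 8, hence ~100 MB/s == ~800 Mb/s:
print(100 * 8)  # 800
```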
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0947