Bug 1857815 - Copying HE-VM's disk to iSCSI storage volume during deployment of 4.4 HE takes too long.
Summary: Copying HE-VM's disk to iSCSI storage volume during deployment of 4.4 HE takes too long.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.0
Target Release: 2.6.1
Assignee: Arik
QA Contact: Avihai
URL:
Whiteboard:
Depends On: 1861682
Blocks:
 
Reported: 2020-07-16 15:45 UTC by Nikolai Sednev
Modified: 2022-04-20 06:30 UTC
CC: 10 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-03-17 13:09:05 UTC
oVirt Team: Storage
Embargoed:
mlehrer: needinfo-
pm-rhel: ovirt-4.5?
lsvaty: blocker-


Attachments
Screenshot from 2020-07-16 15-57-13.png (175.25 KB, image/png)
2020-07-16 15:45 UTC, Nikolai Sednev
Screenshot from 2020-07-14 15-00-47.png (202.25 KB, image/png)
2020-07-16 15:46 UTC, Nikolai Sednev
Screenshot from 2020-07-16 18-49-21.png (204.32 KB, image/png)
2020-07-16 15:50 UTC, Nikolai Sednev
ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200716172342-cou85b.log (652.90 KB, text/plain)
2020-07-20 10:52 UTC, Nikolai Sednev

Description Nikolai Sednev 2020-07-16 15:45:55 UTC
Created attachment 1701415 [details]
Screenshot from 2020-07-16 15-57-13.png

Description of problem:
The [ INFO  ] TASK [ovirt.hosted_engine_setup : Copy local VM disk to shared storage] task takes much longer than in 4.3 over iSCSI, on the same storage and the same host, with a different OS.
On 4.3 (RHEL 7.8) the HE-VM's disk is copied much faster than on 4.4 (RHEL 8.2). The I/O transfer sizes are also different.

This is how the iSCSI deployment uploads the image in 4.3: Screenshot from 2020-07-16 15-57-13.png. The disk is 59.3GB of 100GB (during deployment I create a 59GB disk, smaller than its actual volume on the iSCSI storage: "Please specify the size of the VM disk in GiB: [59]:").

And this is how it's done in 4.4: Screenshot from 2020-07-14 15-00-47.png; here the iSCSI volume is even larger (120GB), but the same 59GB disk is created during deployment.

Same host, same network, same storage; the iSCSI copy is handled by different code in 4.3 and 4.4, and the transfer sizes differ: in 4.3 it's >512KB, while in 4.4 it's <=8KB. The larger the transfer, the higher the throughput.
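
One way to confirm the transfer-size difference from the host side would be to watch the average request size on the iSCSI LUN while the copy runs (sdX below is a placeholder for the LUN's device):

    # Replace sdX with the SCSI device backing the HE storage domain LUN.
    # On RHEL 8 (sysstat 11.x) the relevant columns are rareq-sz/wareq-sz (KiB);
    # on RHEL 7 it is avgrq-sz (512-byte sectors).
    iostat -xm sdX 2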


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch


How reproducible:


Steps to Reproduce:
1. Deploy 4.3 HE over iSCSI and compare the HE-VM's disk copy time with the same deployment of 4.4 HE over iSCSI.

Actual results:
Copying HE-VM's disk during deployment of 4.4 HE takes too long.

Expected results:
Disk copy should be as fast in 4.4 as it was in 4.3.

Additional info:
Screenshots from iSCSI storage attached:
Screenshot from 2020-07-16 15-57-13.png
Screenshot from 2020-07-14 15-00-47.png

Comment 1 Nikolai Sednev 2020-07-16 15:46:31 UTC
Created attachment 1701416 [details]
Screenshot from 2020-07-14 15-00-47.png

Comment 2 Nikolai Sednev 2020-07-16 15:50:18 UTC
Created attachment 1701417 [details]
Screenshot from 2020-07-16 18-49-21.png

Comment 3 Nikolai Sednev 2020-07-16 15:52:45 UTC
Reproduction is very consistent; it happens every time on the same host.

Comment 4 Michal Skrivanek 2020-07-17 04:43:45 UTC
Unlikely to have anything to do with installer. Please verify storage throughput on respective bare RHELs and then we can compare something

Comment 6 Nikolai Sednev 2020-07-20 10:50:52 UTC
(In reply to Michal Skrivanek from comment #4)
> Unlikely to have anything to do with installer. Please verify storage
> throughput on respective bare RHELs and then we can compare something

Take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1854867#c16 and https://bugzilla.redhat.com/show_bug.cgi?id=1854867#c19; it might be related.

We definitely have an issue here. I also see in ovirt-hosted-engine-setup-20200716172342-cou85b.log:
2020-07-16 18:32:10,102+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Copy local VM disk to shared storage]
2020-07-16 19:14:11,289+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2020-07-16 19:14:11,891+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Verify copy of VM disk]

Copy started at 18:32:10 and got verified at 19:14:11. This is about 42 minutes!
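
For reference, these timestamps can be pulled out of the setup log with something like the following (assuming the task name and the timestamp are on the same line in the log file):

    # Print the start of the copy task and the start of the verify task:
    grep -E 'Copy local VM disk to shared storage|Verify copy of VM disk' \
        ovirt-hosted-engine-setup-20200716172342-cou85b.log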

Comment 7 Nikolai Sednev 2020-07-20 10:52:55 UTC
Created attachment 1701735 [details]
ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200716172342-cou85b.log

Comment 8 Lukas Svaty 2020-07-20 11:04:06 UTC
Did we try on multiple environments or is it present only on single one?

Comment 9 mlehrer 2020-07-20 13:05:13 UTC
(In reply to Lukas Svaty from comment #8)
> Did we try on multiple environments or is it present only on single one?

Assuming the problematic flow is 'Copy local VM disk to shared storage' and not the disk verification, we think the difference might be:

4.4: qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
4.3: qemu-img convert -n -O raw {{ local_vm_disk_path }} {{ he_virtio_disk_path }}

https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/ovirt-4.3/tasks/create_target_vm/03_hosted_engine_final_tasks.yml#L245


IIUC you just need to grab the value of local_vm_disk_path and he_virtio_disk_path from your logs and you can likely re-execute this to compare manually.
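
For example, something along these lines (the paths are placeholders to be filled in from the deployment log):

    # local_vm_disk_path and he_virtio_disk_path, taken from the deployment log:
    SRC=/path/to/local_vm_disk      # local_vm_disk_path
    DST=/path/to/he_virtio_disk     # he_virtio_disk_path

    # 4.4 variant: explicit qcow2 source format, cache disabled on both ends:
    time qemu-img convert -f qcow2 -O raw -t none -T none "$SRC" "$DST"

    # 4.3 variant: no target creation (-n), default cache:
    time qemu-img convert -n -O raw "$SRC" "$DST"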

Comment 10 Nikolai Sednev 2020-07-20 14:10:25 UTC
(In reply to mlehrer from comment #9)
> (In reply to Lukas Svaty from comment #8)
> > Did we try on multiple environments or is it present only on single one?
> 
> Assuming the problematic flow is 'Copy local VM disk to shared storage' and
> not the disk verification, we think the difference might be:
> 
> 4.4: qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path
> }} {{ he_virtio_disk_path }}
> 4.3: qemu-img convert -n -O raw {{ local_vm_disk_path }} {{
> he_virtio_disk_path }}
> 
> https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/ovirt-4.3/
> tasks/create_target_vm/03_hosted_engine_final_tasks.yml#L245
> 
> 
> IIUC you just need to grab the value of local_vm_disk_path and
> he_virtio_disk_path from your logs and you can likely re-execute this to
> compare manually.

These are exactly the lines that were changed between the versions; I verified one of the bugs related to this change.

Comment 11 Nikolai Sednev 2020-07-20 14:13:14 UTC
The local VM path is cleaned up once the deployment is done, so I can't really check that manually.

Comment 12 Nikolai Sednev 2020-07-23 12:20:05 UTC
In comparison, the FC deployment works just fine and the VM disk copy part is really fast; it takes a few minutes if not less.
I just checked that with:
ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch

The bug is iSCSI specific.

Comment 13 Nikolai Sednev 2020-07-23 13:52:09 UTC
(In reply to Lukas Svaty from comment #8)
> Did we try on multiple environments or is it present only on single one?

I tried it from pumas and from almas; the result was just the same. In both environments writing the disk took about 42 minutes over iSCSI.

Comment 14 mlehrer 2020-07-27 15:49:29 UTC
This BZ reports that the ovirt-hosted-engine deploy role takes a long time to copy a disk in a 4.4 deployment vs. 4.3.

Checking the ansible repository for the line of code that does the copy, we see the following difference:

4.4: qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
4.3: qemu-img convert -n -O raw {{ local_vm_disk_path }} {{ he_virtio_disk_path }}


We also see the same qemu-img convert -f qcow2 -O raw -t none -T none used in VDSM's disk-copy flows in 4.4 on iSCSI storage:
Copying a raw empty 100GB disk between iSCSI SDs: 8 min
Copying a qcow2 disk with 66% utilization of a 100GB disk between iSCSI SDs: 6 min 40 s [1]

Neither takes 44 minutes to copy 100GB

So in order to understand whether "qemu-img convert -n -O raw" vs "qemu-img convert -f qcow2 -O raw -t none -T none" is problematic, a disk image which contains partial data should be tested. Specifically, an image with 59GB of data out of 100GB (which is what gets deployed) should be put on a local disk and written to an iSCSI LUN on that storage.

Additionally, it's unclear whether the issue is really around copying or around zeroing - see the known issue:
BZ 1847192 - 'qemu-img convert uses possibly slow pre-zeroing on block storage'


I suggest contacting someone from the qemu side, and the corresponding dev owner of the ansible role, to provide more advice here.




[1] 2020-07-27 14:43:50,309+0000 DEBUG (tasks/0) [common.commands] /usr/bin/taskset --cpu-list 0-63 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img convert -p -t none -T none -f qcow2 -O qcow2 -n /rhev/data-center/mnt/blockSD/a51def3e-a684-44d2-aa90-9f6833af95e8/images/5a3a3cb2-a65a-449d-b4ba-48c4e8e4c374/221bf8be-4c2f-4dc8-a420-72d8355af479 /rhev/data-center/mnt/blockSD/a51def3e-a684-44d2-aa90-9f6833af95e8/images/842c8535-04c8-4616-9907-6c08dbc2581b/5b14c95b-cdaa-4af0-b92c-e3ef091a59dd (cwd None) (commands:153)

Comment 15 Nir Soffer 2020-08-03 11:35:12 UTC
Mordechai, I suggest you reproduce this issue in the scale lab before
we contact qemu folks.

This smells like an issue with a specific environment. Testing in the scale
lab will quickly prove this.

Important - when copying to block storage (ISCSI/FC), the command should be:

   qemu-img convert -f qcow2 -O raw -t none -T none -W src dst

The -W option can give up to 6 times better performance, but it should
be used only with raw volume on block storage (see bug 1511891).

If you reproduce the issue, please run the qemu-img command with strace
like this to understand what's happening:

    strace -f -tt -T -o convert.trace qemu-img convert ...
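
For example, filled in with placeholder paths:

    # Trace the copy and record per-syscall durations (-T):
    strace -f -tt -T -o convert.trace \
        qemu-img convert -f qcow2 -O raw -t none -T none -W /path/to/src.qcow2 /dev/vgname/lvname

    # Check whether zeroing/allocation calls show up and how long they take
    # (with -T the duration is appended in <seconds> at the end of each line):
    grep -E 'fallocate|BLKZEROOUT' convert.trace | tail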

Comment 16 Nikolai Sednev 2020-08-04 08:36:02 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1828888#c19; the issue is probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1855250

Comment 17 Nir Soffer 2020-08-17 16:22:22 UTC
This may be related to bug 1855250 but unlikely, see comment 15.

Comment 18 Tzahi Ashkenazi 2020-08-25 12:58:58 UTC
Tested on the following environment:
engine: red-03
host: f01-h11-000-r620.rdu2.scalelab.redhat.com
SD size: 1TB iSCSI DATA
empty disk on SD: 100GB preallocated
qcow file size: 66.6GB
utils: nmon, iostat
versions:
rhv-release-4.4.1-12-001.noarch
vdsm-4.40.22-1.el8ev.x86_64

1. source file - a new qcow file generated from an existing VM, with 66GB used out of 100GB
2. destination file - a new empty disk from the SD on iSCSI, 100GB preallocated, /dev/$vg_name/$lv_name | by new disk id

time qemu-img convert -f qcow2 -O raw -t none -T none -W /tmp/87838ad2-d7ea-47e8-970e-35253ed2b7d7.qcow2 /dev/44fd5f13-db6f-4c79-923c-87c83b030829/5e502e66-5ad5-483d-8567-2178d481330b
real 7m44.908s
user 0m3.554s
sys 0m12.806s

new disk - without the -W option : 
 time qemu-img convert -f qcow2 -O raw -t none -T none  /tmp/87838ad2-d7ea-47e8-970e-35253ed2b7d7.qcow2 /dev/44fd5f13-db6f-4c79-923c-87c83b030829/bd0cbce1-67bf-4aae-b985-464dadc77e70

real 8m3.674s
user 0m4.401s
sys 0m14.343s


time blkdiscard --zeroout /dev/44fd5f13-db6f-4c79-923c-87c83b030829/bd0cbce1-67bf-4aae-b985-464dadc77e70
real 0m49.677s
user 0m0.001s
sys 0m0.062s


From Nir, on the blkdiscard command:

So in the worst case, when using the qemu-img version that does unneeded zeroing, the operation
can be ~50 seconds slower because of that zeroing.

The bug in qemu-img is that it will do:

1. zero entire destination device (100g)
2. copy data from source to destination (65g)

Instead of:

1. copy data from source to destination (65g) 
2. zero unallocated/zero areas in destination (35g)
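
As a rough sanity check of the ~50 seconds figure (an estimate based on the blkdiscard run above, not a measurement):

    # Zeroing the full 100G LV took ~50s, which is the worst-case overhead if
    # qemu-img pre-zeroes the entire device; zeroing only the ~35G that is
    # actually unallocated would cost roughly a third of that:
    echo "35 * 49.7 / 100" | bc -l    # ~17s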



Future testing:
Test again with the -W flag on the new qemu version.

Comment 19 Nir Soffer 2020-09-23 09:51:21 UTC
Fixing the blocking bug: we depend on advanced virt, not on RHEL virt.
This bug can be tested since Sep 1.

Comment 20 Nir Soffer 2021-02-26 23:44:28 UTC
(In reply to Tzahi Ashkenazi from comment #18)

> time qemu-img convert -f qcow2 -O raw -t none -T none -W
> /tmp/87838ad2-d7ea-47e8-970e-35253ed2b7d7.qcow2
> /dev/44fd5f13-db6f-4c79-923c-87c83b030829/5e502e66-5ad5-483d-8567-
> 2178d481330b
> real 7m44.908s
> user 0m3.554s
> sys 0m12.806s
> 
> new disk - without the -W option : 
>  time qemu-img convert -f qcow2 -O raw -t none -T none 
> /tmp/87838ad2-d7ea-47e8-970e-35253ed2b7d7.qcow2
> /dev/44fd5f13-db6f-4c79-923c-87c83b030829/bd0cbce1-67bf-4aae-b985-
> 464dadc77e70
> 
> real 8m3.674s
> user 0m4.401s
> sys 0m14.343s

These results do not make sense. See similar test done when local file is
on NFS:
https://bugzilla.redhat.com/show_bug.cgi?id=1511891#c57

Copying relevant test results:

image       qemu-img    qemu-img/-W      dd    parallel-dd
----------------------------------------------------------
100/19G          242             41      165           128
100/52G          658            119      197           144
100/86G         1230            189      238           132

Your image contains 66g of data so we expect something between
the 100/52 and 100/86 results, around 135 seconds, but you got
464 seconds.

Maybe the qcow2 image was fully allocated? When you report results,
show the output of "qemu-img info filename".

The issue may be a slow local disk. Please also show the output of this command
on the host running the test:

    time qemu-img convert -n -f qcow2 -O raw -T none -W /path/to/image.qcow2 'json:{"file.driver":"null-co","file.size":"100g"}'

This will show how fast we can read from the local disk.

Here is an example on a local server SSD (Dell PERC H740P Mini):

# hostname -s
f02-h23-000-r640

# qemu-img info /var/tmp/nsoffer-v2v.qcow2 
image: /var/tmp/nsoffer-v2v.qcow2
file format: qcow2
virtual size: 100 GiB (107374182400 bytes)
disk size: 66.5 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# time qemu-img convert -p -n -f qcow2 -O raw -T none -W /var/tmp/nsoffer-v2v.qcow2 'json:{"file.driver":"null-co","file.size":"100g"}'
    (100.00/100%)

real	2m16.965s
user	0m4.963s
sys	0m11.690s

When copying this image we cannot be faster than that.

Same test when image is on a fast ssd (Dell Express Flash PM1725b 3.2TB SFF):

# time qemu-img convert -p -n -f qcow2 -O raw -T none -W /scratch/nsoffer-v2v.qcow2 'json:{"file.driver":"null-co","file.size":"100g"}'
    (100.00/100%)

real	0m20.752s
user	0m3.458s
sys	0m3.765s

Copying image to 100G raw volume from the slow ssd:

# time qemu-img convert -p -n -f qcow2 -O raw -t none -T none -W /var/tmp/nsoffer-v2v.qcow2 /dev/f7b5c299-df2a-42bc-85d7-b60027f14e00/1faddbe2-2760-4f10-b1a5-e41f511f66f9
    (100.00/100%)

real	2m29.164s
user	0m6.574s
sys	0m18.346s

And from the fast ssd:

# time qemu-img convert -p -n -f qcow2 -O raw -t none -T none -W /scratch/nsoffer-v2v.qcow2 /dev/f7b5c299-df2a-42bc-85d7-b60027f14e00/1faddbe2-2760-4f10-b1a5-e41f511f66f9
    (100.00/100%)

real	2m6.933s
user	0m6.214s
sys	0m15.495s

In this case we are limited by the storage server.

For reference, same copy without the -W flag, from the slow ssd:

# time qemu-img convert -p -n -f qcow2 -O raw -t none -T none /var/tmp/nsoffer-v2v.qcow2 /dev/f7b5c299-df2a-42bc-85d7-b60027f14e00/1faddbe2-2760-4f10-b1a5-e41f511f66f9
    (100.00/100%)

real	4m45.186s
user	0m9.602s
sys	0m20.103s

With iSCSI we are likely to get slower copies, in particular if iscsi
multipathing is not configured.
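
For completeness, whether multipathing is actually in use for the LUN can be checked on the host, for example:

    # Show the multipath topology (paths per LUN):
    multipath -ll

    # Show iSCSI sessions and the SCSI devices they provide:
    iscsiadm -m session -P 3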

Comment 21 Tzahi Ashkenazi 2021-03-11 17:04:03 UTC
Synced with Nir today.
I will run the test again on the new/stable release which contains the new qemu-img (ovirt-4.4.7)
on the same host > f02-h23-000-r640.rdu2.scalelab.redhat.com 
virtual size: 100 GiB (107374182400 bytes)
disk size: 66.5 GiB
on rhev-red-01

Comment 22 RHEL Program Management 2021-04-28 11:11:22 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 23 Eyal Shenitzky 2021-08-26 12:35:09 UTC
Tzahi, any progress on this issue? Does it still exist?

Comment 28 Arik 2022-03-16 13:36:44 UTC
Nikolai, with bz 1855250 fixed on RHEL 8.5 in Nov 2021, can you please give it another shot on 4.4.10?

Comment 29 Nikolai Sednev 2022-03-16 14:14:23 UTC
(In reply to Arik from comment #28)
> Nikolai, with bz 1855250 fixed on RHEL 8.5 in Nov 2021, can you please give
> it another shot on 4.4.10?

Its still pretty slow on Red Hat Enterprise Linux release 8.6 Beta (Ootpa)
ovirt-hosted-engine-setup-2.6.1-1.el8ev.noarch
ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch

I could test it over iSCSI using Silk storage. It might be an issue with this specific storage; if you have any alternative iSCSI storage, we can check there to see whether the issue has now been resolved.

Comment 30 Nikolai Sednev 2022-03-16 14:15:08 UTC
Created attachment 1866211 [details]
deployment over iscsi screenshot

Comment 31 Nir Soffer 2022-03-16 14:48:01 UTC
Comment 15 shows how qemu-img convert should be used to get good performance.
This is how vdsm copies images.

This bug is about copying the hosted engine VM to storage, which is done *outside*
of vdsm. The code doing this is not maintained by the storage team, so this
does not look like a storage bug.

Note that in oVirt 4.4.6 vdsm supports an external endpoint when copying disks:
https://github.com/oVirt/vdsm/blob/3b87510e20a45a5d5d17cd57f5fe5d475ceb2179/lib/vdsm/api/vdsm-api.yml#L8259

Hosted engine code can use the SDM.copy_data API to copy the engine image
to storage, instead of implementing the copy itself (and using the wrong flags).

Example usage:

# cat args.json
{
    "job_id": "job-id",
    "source": {
        "endpoint_type": "external",
        "url": "/path/to/image.qcow2",
        "create": false,
        "is_zero": true
    },
    "destination": {
        "endpoint_type": "div",
        "sd_id": "storage-domain-id",
        "img_id": "image-id",
        "vol_id": "volume-id"
    }
}

# vdsm-client -f args.json SDM copy_data

This can of course also be done using the vdsm client library.

Using this API ensures that the copy is done correctly and efficiently,
and if there is an issue the storage team can support it.

Comment 32 Arik 2022-03-16 14:54:23 UTC
(In reply to Nikolai Sednev from comment #29)
> (In reply to Arik from comment #28)
> > Nikolai, with bz 1855250 fixed on RHEL 8.5 in Nov 2021, can you please give
> > it another shot on 4.4.10?
> 
> Its still pretty slow on Red Hat Enterprise Linux release 8.6 Beta (Ootpa)
> ovirt-hosted-engine-setup-2.6.1-1.el8ev.noarch
> ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch
> 
> I could test it over iSCSI using Silk storage. It might be an issue with this
> specific storage; if you have any alternative iSCSI storage, we can check
> there to see whether the issue has now been resolved.

Yes, please do and please also check on a different environment

Comment 33 Nikolai Sednev 2022-03-16 17:30:12 UTC
(In reply to Arik from comment #32)
> (In reply to Nikolai Sednev from comment #29)
> > (In reply to Arik from comment #28)
> > > Nikolai, with bz 1855250 fixed on RHEL 8.5 in Nov 2021, can you please give
> > > it another shot on 4.4.10?
> > 
> > Its still pretty slow on Red Hat Enterprise Linux release 8.6 Beta (Ootpa)
> > ovirt-hosted-engine-setup-2.6.1-1.el8ev.noarch
> > ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch
> > 
> > I could test it over iSCSI using Silk storage. It might be an issue with this
> > specific storage; if you have any alternative iSCSI storage, we can check
> > there to see whether the issue has now been resolved.
> 
> Yes, please do and please also check on a different environment

The only iSCSI I currently have is Silk, so I can't see at the moment if it improves on another environment. 

Avihai, do you have any other iSCSI capable storage, other than Silk?

Comment 37 Sandro Bonazzola 2022-04-20 06:30:33 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

