Bug 1828888 - Copy disk task risky and not optimal
Summary: Copy disk task risky and not optimal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-ansible-collection
Classification: oVirt
Component: hosted-engine-setup
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.4.1
Target Release: 1.1.5
Assignee: Asaf Rachmani
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-28 14:22 UTC by Sandro Bonazzola
Modified: 2020-08-18 06:29 UTC
CC List: 6 users

Fixed In Version: ovirt-ansible-hosted-engine-setup-1.1.5
Doc Type: Enhancement
Doc Text:
This enhancement improves the reliability of copying local virtual machine disks to shared storage.
Clone Of:
Environment:
Last Closed: 2020-08-05 06:25:34 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
sbonazzo: testing_ack?


Attachments


Links
GitHub: oVirt/ovirt-ansible-hosted-engine-setup pull 326 (closed) - storage: Use the same flags that VDSM uses for converting image (last updated 2020-10-22 09:01:25 UTC)

Description Sandro Bonazzola 2020-04-28 14:22:17 UTC
Nir Soffer from the storage team reviewed the command used to copy the image and shared some notes:

The copy disk task uses this command:
  - name: Copy local VM disk to shared storage
    command: qemu-img convert -n -O raw {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
    environment: "{{ he_cmd_lang }}"
    become: true
    become_user: vdsm
    become_method: sudo
    changed_when: true

-n skips creation of the image - this should work, but it is not how vdsm copies images for file storage.
It is best to avoid this option, ensuring that the target disk is created correctly.

-f is missing - we never let qemu probe the image format.

Not using direct I/O. This should work, but using direct I/O is recommended, especially in a hyper-converged environment.

The suggestion is to change the command to:

    qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}

When copying to block storage, vdsm now uses "-n" - this was added to avoid a bug in qemu on 8.2 that has already been fixed there, so it shouldn't be necessary.
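
For reference, a commented sketch of the suggested invocation and of the block-storage variant mentioned above; the flag descriptions are standard qemu-img semantics, and the template variables are the ones already used by the role:

    # -f qcow2 : source image format (never let qemu probe it)
    # -O raw   : output image format
    # -t none  : destination cache mode (direct I/O)
    # -T none  : source cache mode (direct I/O)
    qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}

    # Block-storage variant used by vdsm, where the target volume already exists;
    # -n skips creation of the target image
    qemu-img convert -n -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}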

Comment 1 Asaf Rachmani 2020-05-25 14:49:13 UTC
AFAIK, the volume was already created by the engine via VDSM, so at this point we can be sure that the volume is there. Do you still think we need to avoid the -n option?

Comment 2 Nir Soffer 2020-05-25 15:24:55 UTC
-n should work, since the volume was already created, but -n is not
correct for all cases, and this makes your code more fragile.

I suggest using the same flags used by vdsm to ensure that we
get consistent results compatible with volumes created by vdsm.

The best solution would be to add a vdsm API or tool for copying images
that is part of vdsm, so vdsm developers are responsible for
getting the same results when disks are copied externally.

Until vdsm provides this, using the same code as used by vdsm is 
the safe approach.

Comment 3 Nir Soffer 2020-06-30 11:30:09 UTC
Vdsm is going to move to -n for all qemu-img convert calls
(used only for block storage now):
https://gerrit.ovirt.org/c/109784

So I think the -n flag is the right way for hosted engine
setup, since it keeps the volume preallocation.

Volume preallocation is not correct in the current version of vdsm.
The volume is always created as sparse, and we depended on the
-o preallocation=falloc argument of qemu-img convert to handle
preallocation during the convert process. Due to issues on
NFS 3, we are going to drop this and create the volume
as preallocated (if needed) when the volume is created.
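
A rough sketch of the difference, with src and dst as hypothetical placeholders (the exact vdsm calls may differ):

    # Old approach: qemu-img creates dst and preallocates it during convert
    qemu-img convert -f qcow2 -O raw -o preallocation=falloc -t none -T none src dst

    # Planned approach: create dst preallocated up front (e.g. with fallocate on
    # file storage; the size is only an example), then convert into the existing
    # volume with -n
    fallocate -l 60G dst
    qemu-img convert -n -f qcow2 -O raw -t none -T none src dst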

So the correct way to copy a volume would be:

    qemu-img convert -n -f {raw|qcow2} -O {raw|qcow2} -t none -T none src dst

Keeping these details synced with hosted engine setup does not
seem like the right way, so we should think about how to move this step
to vdsm. The simplest way would be to provide a vdsm-tool copy-volume
command that can be used to copy a volume from a storage domain to any
other location (which can be another storage domain).
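
To illustrate the idea, a purely hypothetical invocation of such a tool; no copy-volume verb exists in vdsm-tool today, and the name and options below are invented for illustration only:

    # Hypothetical only - not an existing vdsm-tool verb:
    vdsm-tool copy-volume --sd-uuid <src_sd_uuid> --img-uuid <img_uuid> --vol-uuid <vol_uuid> --dest <target_path_or_sd>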

Comment 4 Sandro Bonazzola 2020-06-30 11:37:15 UTC
The current implementation in hosted engine setup uses:
  qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
As per https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/pull/326/files
Do we need to move this bug back to ASSIGNED, or is this good enough so we can track the suggestion in comment #3 as a separate RFE for vdsm to add the copy-volume command to vdsm-tool?

Comment 5 Nir Soffer 2020-06-30 11:48:33 UTC
The current way is better than the old way, which did not use -t none -T none;
these options are very important.

Since current vdsm does not preallocate the volume correctly, whether or not
you use -n you will always create a sparse volume, and you cannot fix it in
hosted engine.

Once bug 1850267 is resolved, we need to add -n to the command so that copying
a volume will preserve the allocation.
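
If that happens, the change would presumably be limited to adding -n to the existing task; a sketch, assuming the rest of the task stays as it is today:

  - name: Copy local VM disk to shared storage
    command: >-
      qemu-img convert -n -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
    environment: "{{ he_cmd_lang }}"
    become: true
    become_user: vdsm
    become_method: sudo
    changed_when: true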

We can open an RFE for adding a way to copy disks from a storage domain
when the engine is not running.

Comment 13 SATHEESARAN 2020-07-10 07:35:01 UTC
From the RHHI-V point of view, testing is completed.
RHV hosted engine deployment completed with the changes in the package ovirt-ansible-hosted-engine-setup-1.1.5-1.el8ev.noarch.

The changed task looks like:

<snip>
  - name: Copy local VM disk to shared storage
    command: >-
      qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
    environment: "{{ he_cmd_lang }}"
    become: true
    become_user: vdsm
    become_method: sudo
    changed_when: true
</snip>

This change works well with hyperconverged and glusterfs storage domains.

Please continue with other storage domains too.

Comment 14 Nikolai Sednev 2020-07-13 10:36:00 UTC
Works for me on these components:
Software Version:4.4.1.8-0.7.el8ev
ovirt-ansible-hosted-engine-setup-1.1.5-1.el8ev.noarch
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.13.1.el8_2.x86_64 #1 SMP Tue Jul 7 14:03:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


Deployed on NFS and iSCSI successfully.

Comment 15 Nikolai Sednev 2020-07-13 11:05:56 UTC
  - name: Copy local VM disk to shared storage
    command: >-
      qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
    environment: "{{ he_cmd_lang }}"
    become: true
    become_user: vdsm
    become_method: sudo
    changed_when: true

Comment 16 Nikolai Sednev 2020-07-13 14:18:42 UTC
I've noticed that the copy took much longer now than it took before. The speed was about 24 MBps to flash storage over 10Gbps interfaces on both the host and the storage. The local disk is:
 *-disk
                description: ATA Disk
                product: ST1000NM0033-9ZM
                physical id: 0.0.0
                bus info: scsi@0:0.0.0
                logical name: /dev/sda
                version: GA06
                serial: Z1W1EYKH
                size: 931GiB (1TB)
                capabilities: partitioned partitioned:dos
                configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=f8c4a2c1

https://www.disctech.com/Seagate-Constellation-ES-3-ST1000NM0033-1TB-1000GB-Enterprise-SATA-Hard-Drive

The average read speed should be above 100 MBps, somewhere around 152 MBps, versus the 24 MBps that was observed.
The copy speed shouldn't have been limited by either the local disk read speed or the flash storage, which was not loaded at all during the deployment.
I copied the minimal required HE disk size of ~60GB to the flash storage, and the copy took too much time and was very slow.
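
One way to rule out the local disk as the bottleneck is to benchmark its sequential read speed directly on the host; a sketch, assuming /dev/sda is the source disk as in the lshw output above:

    # Cached and buffered sequential read benchmarks of the source disk
    hdparm -tT /dev/sda
    # Or read 1 GiB with direct I/O, bypassing the page cache
    dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct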

Comment 17 Nir Soffer 2020-07-13 15:14:31 UTC
(In reply to Nikolai Sednev from comment #16)
> I've noticed that copy took much longer now than it took before. 

Before, we used cached I/O, which is not recommended for an oVirt
setup. It can cause timeouts in sanlock when the kernel flushes
the page cache. The current code uses the same flags used by vdsm.

It is not clear how you measured 24 MiB/s, but you can easily
test this outside of the ansible script; just run:

    time qemu-img convert -f qcow2 -O raw -t none -T none local_vm_disk_path he_virtio_disk_path

This may be slow for NFS < 4.2 since it does not support efficient
zeroing.

This may be slow on block storage if it does not have fast zeroing,
due to an inefficient pre-zero step (see bug 1855250).
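
A sketch of such a standalone test with progress reporting (-p is the standard qemu-img progress flag; paths are placeholders); dividing the image size by the elapsed time gives the effective throughput:

    # Time the copy and show progress
    time qemu-img convert -p -f qcow2 -O raw -t none -T none local_vm_disk_path he_virtio_disk_path
    # For reference: ~60 GB at the observed ~24 MB/s would take roughly 42 minutes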

Comment 19 Nikolai Sednev 2020-07-14 00:27:16 UTC
(In reply to Nir Soffer from comment #17)
> (In reply to Nikolai Sednev from comment #16)
> > I've noticed that copy took much longer now than it took before. 
> 
> Before we use the cached I/O, which is not recommended for oVirt
> setup. This can cause timeouts in sanlock when the kernel flushes
> the page cache. Current code use the same flags used by vdsm.
> 
> It is not clear how did you measure 24 MiB/s, but you can easily
I monitored it from the iSCSI storage side, through its UI, specifically on my target volume.
> test this outside of the ansible script, just run:
> 
>     time qemu-img convert -f qcow2 -O raw -t none -T none local_vm_disk_path
> he_virtio_disk_path
This is exactly what the new ansible role does:
  - name: Copy local VM disk to shared storage
    command: >-
      qemu-img convert -f qcow2 -O raw -t none -T none {{ local_vm_disk_path }} {{ he_virtio_disk_path }}
    environment: "{{ he_cmd_lang }}"
    become: true
    become_user: vdsm
    become_method: sudo
    changed_when: true

> 
> This may be slow for NFS < 4.2 since it does not support efficient
> zeroing.
> 
> This may be low on block storage if it does not have fast zeroing
> due to inefficient pre-zero step (see bug 1855250).
The volume was clean and brand new; I'm not sure zeroing is required.

Comment 20 Sandro Bonazzola 2020-08-05 06:25:34 UTC
This bugzilla is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

