Bug 1927985 - [RFE] Speed up export-to-OVA on NFS by aligning loopback device offset
Summary: [RFE] Speed up export-to-OVA on NFS by aligning loopback device offset
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.4
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.0
Target Release: ---
Assignee: Shmuel Melamud
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On: 2021545
Blocks:
 
Reported: 2021-02-12 01:50 UTC by Germano Veit Michel
Modified: 2022-05-31 14:36 UTC (History)
6 users (show)

Fixed In Version: ovirt-engine-4.5.0
Doc Type: Enhancement
Doc Text:
With this release, padding between files has been added when exporting a virtual machine to an Open Virtual Appliance (OVA). The padding aligns the disks in the OVA to a block boundary of the underlying filesystem. As a result, disks are written faster during export, especially to an NFS partition.
Clone Of:
Environment:
Last Closed: 2022-05-26 16:22:27 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2022:4711 0 None None None 2022-05-26 16:22:43 UTC
oVirt gerrit 116115 0 master MERGED core: Align disks in OVA to 4K boundary 2021-10-18 18:54:47 UTC
oVirt gerrit 117570 0 master MERGED core: Add padding in OVA in compatibility versions 4.7+ 2021-12-12 11:51:47 UTC

Description Germano Veit Michel 2021-02-12 01:50:50 UTC
Description of problem:

pack_ova.py sets up a loopback device to build the OVA, writing each disk image directly at an offset within the OVA file.

https://github.com/oVirt/ovirt-engine/blob/master/packaging/ansible-runner-service-project/project/roles/ovirt-ova-pack/files/pack_ova.py#L52

def convert_disks(ova_path):
    for path, offset in six.iteritems(path_to_offset):
        print("converting disk: %s, offset %s" % (path, offset))
        output = check_output(['losetup', '--find', '--show', '-o', offset,
                               ova_path])
        loop = from_bytes(output.splitlines()[0])
        loop_stat = os.stat(loop)
        call(['udevadm', 'settle'])
        vdsm_user = pwd.getpwnam('vdsm')
        os.chown(loop, vdsm_user.pw_uid, vdsm_user.pw_gid)
        try:
            qemu_cmd = ("qemu-img convert -p -T none -O qcow2 '%s' '%s'"
                        % (path, loop))
            check_call(['su', '-p', '-c', qemu_cmd, 'vdsm'])

This offset can cause unaligned writes, which greatly reduces performance when the OVA file lives on an NFS mount served by another host. See:

# Write to loopback device backed on file on NFS, no offset
$ losetup --find --show /ova/loop_test.qcow2
$ time qemu-img convert -p -T none -O qcow2 /dev/640bd68d-bfde-45e6-9333-71316fc46893/210f2a85-55dd-4217-8889-39c51d3ef89e /dev/loop0
    (100.00/100%)

real	1m31.108s
user	0m5.053s
sys	0m20.775s

# Write to loopback device backed on file on NFS, with offset
$ losetup --find --show -o 14842 /ova/loop_test.qcow2
$ time qemu-img convert -p -T none -O qcow2 /dev/640bd68d-bfde-45e6-9333-71316fc46893/210f2a85-55dd-4217-8889-39c51d3ef89e /dev/loop0
    (100.00/100%)

real	18m18.530s
user	0m4.254s
sys	0m46.925s

The NFS share in this case was mounted with rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none
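
For reference, the offset used above (14842) is not a multiple of the typical 4 KiB filesystem block size, so every write through the loop device straddles block boundaries and forces read-modify-write cycles over NFS. A quick shell check (illustrative arithmetic only, not part of the original test):

$ echo $(( 14842 % 4096 ))                   # 2554 -> the offset is unaligned
$ echo $(( (14842 + 4095) / 4096 * 4096 ))   # 16384 -> next 4 KiB boundary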

Version-Release number of selected component (if applicable):
* RHV 4.3.10 + RHEL 7.8 - 3.10.0-1127.el7.x86_64 (test above)
* RHV 4.4.4  + RHEL 8.3 - 4.18.0-240.10.1.el8_3.x86_64 (test below)

On the latest RHVH 4.4.4 with RHEL 8.3 as the NFS server, the behavior is similar:
host2.kvm:/exports/nfs on /mnt/nfs type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.1,local_lock=none,addr=192.168.100.2)

# losetup --find --show -o 14842 /mnt/nfs/test_offset 
/dev/loop0
# losetup --find --show /mnt/nfs/test
/dev/loop1

# time qemu-img convert -O qcow2 -T none /dev/test/lv1 /dev/loop0 
real	0m3.852s
user	0m0.107s
sys	0m1.265s
# time qemu-img convert -O qcow2 -T none /dev/test/lv1 /dev/loop1

real	0m1.917s
user	0m0.135s
sys	0m0.477s

How reproducible:
Always, but on RHEL 8.3 the slowdown does not seem to be as pronounced as on a RHEL 7.8 host.

Steps to Reproduce:
1. Create loopback devices backed by files on NFS
$ truncate -s 2G /mnt/test0
$ truncate -s 2G /mnt/test1
$ losetup --find --show -o 14842 /mnt/test0
$ losetup --find --show /mnt/test1

2. Convert
$ qemu-img convert -p -T none -O qcow2 /dev/test/lv1 /dev/loop0
$ qemu-img convert -p -T none -O qcow2 /dev/test/lv1 /dev/loop1

Actual results:
* Much slower OVA export over NFS.
* Fails if engine ansible timeout is not tuned.

Additional info:
* On a local disk this does not seem to have a big impact.
* Regardless of the storage backing the NFS export, the offset case is slower, even when the export is backed by tmpfs on the NFS server.
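
To isolate the alignment effect, a third control case could be added to the reproducer with the offset rounded up to a 4 KiB boundary (an assumption for illustration, not part of the original report; the loop device name is whatever losetup prints):

$ truncate -s 2G /mnt/test2
$ losetup --find --show -o 16384 /mnt/test2
$ qemu-img convert -p -T none -O qcow2 /dev/test/lv1 /dev/loop2

If alignment is the culprit, the aligned-offset case should perform close to the no-offset case.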

Comment 3 Liran Rotenberg 2021-02-23 17:58:11 UTC
The important part that I tested is whether the engine times out:
Actual results:
* Much slower OVA export over NFS.
* Fails if engine ansible timeout is not tuned.
(from comment #0)

This did not fail for me after running for about 4 hours while pack_ova.py ran with:
while True:
    sleep(1000)

The engine kept the command running and failed only after I killed the process on the host.
This is because of the move to run the long-running export-to-OVA parts asynchronously within the engine.

Comment 4 Arik 2021-02-23 18:35:18 UTC
(In reply to Liran Rotenberg from comment #3)
> This is because the move to run the long export OVA parts in async within
> the engine.

Yeah, that's an important thing to note - we significantly changed the way the Ansible script is executed in 4.4.4.
So it makes sense that the export-to-OVA task no longer times out.

That said, we can align the offset in order to improve the time to export the OVA to NFS.
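
For illustration, the padding approach could look roughly like the following Python sketch (a hypothetical helper, not the actual engine change; it only shows the arithmetic for starting the next file in the OVA on a 4 KiB boundary):

FS_BLOCK = 4096  # target alignment for disk payloads inside the OVA

def padding_to_boundary(offset, boundary=FS_BLOCK):
    # Number of padding bytes needed so the next file starts on a
    # filesystem-block boundary.
    remainder = offset % boundary
    return 0 if remainder == 0 else boundary - remainder

# Example with the offset from the report:
# padding_to_boundary(14842) == 1542, so the next disk starts at 16384.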

Comment 9 Arik 2021-11-04 13:27:03 UTC
Need to consider forward compatibility

Comment 12 Arik 2022-05-03 14:28:04 UTC
Although we implemented the change, we don't notice a difference in QE environments.
We can't say that it's fixed, but on the other hand it may be improved in some scenarios / on some hardware.
Therefore, failing this bug for now and re-targeting it to 4.5.1; we'll try to investigate this a bit further by then.

Comment 19 Nisim Simsolo 2022-05-23 14:04:45 UTC
Verified:
ovirt-engine-4.5.0.6-0.7.el8ev
vdsm-4.50.0.13-1.el8ev.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.x86_64
ansible-runner-2.1.3-1.el8ev.noarch

Comment 24 errata-xmlrpc 2022-05-26 16:22:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

