Description of problem:

pack_ova.py sets up a loopback device to build the OVA, writing the disk image directly at an offset into the file:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/ansible-runner-service-project/project/roles/ovirt-ova-pack/files/pack_ova.py#L52

The relevant part of convert_disks():

    def convert_disks(ova_path):
        for path, offset in six.iteritems(path_to_offset):
            print("converting disk: %s, offset %s" % (path, offset))
            output = check_output(['losetup', '--find', '--show',
                                   '-o', offset, ova_path])
            loop = from_bytes(output.splitlines()[0])
            loop_stat = os.stat(loop)
            call(['udevadm', 'settle'])
            vdsm_user = pwd.getpwnam('vdsm')
            os.chown(loop, vdsm_user.pw_uid, vdsm_user.pw_gid)
            try:
                qemu_cmd = ("qemu-img convert -p -T none -O qcow2 '%s' '%s'"
                            % (path, loop))
                check_call(['su', '-p', '-c', qemu_cmd, 'vdsm'])

This offset can cause unaligned writes, greatly reducing performance when the backing file is on NFS served by another host. See:

    # Write to loopback device backed by a file on NFS, no offset
    $ losetup --find --show /ova/loop_test.qcow2
    $ time qemu-img convert -p -T none -O qcow2 /dev/640bd68d-bfde-45e6-9333-71316fc46893/210f2a85-55dd-4217-8889-39c51d3ef89e /dev/loop0
        (100.00/100%)

    real    1m31.108s
    user    0m5.053s
    sys     0m20.775s

    # Write to loopback device backed by a file on NFS, with offset
    $ losetup --find --show -o 14842 /ova/loop_test.qcow2
    $ time qemu-img convert -p -T none -O qcow2 /dev/640bd68d-bfde-45e6-9333-71316fc46893/210f2a85-55dd-4217-8889-39c51d3ef89e /dev/loop0
        (100.00/100%)

    real    18m18.530s
    user    0m4.254s
    sys     0m46.925s

The NFS share in this case was mounted with:
rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none

Version-Release number of selected component (if applicable):
* RHV 4.3.10 + RHEL 7.8 - 3.10.0-1127.el7.x86_64 (test above)
* RHV 4.4.4 + RHEL 8.3 - 4.18.0-240.10.1.el8_3.x86_64 (test below)

On the latest RHVH 4.4.4 with RHEL 8.3 as the NFS server, the same pattern appears:

    host2.kvm:/exports/nfs on /mnt/nfs type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.1,local_lock=none,addr=192.168.100.2)

    # losetup --find --show -o 14842 /mnt/nfs/test_offset
    /dev/loop0
    # losetup --find --show /mnt/nfs/test
    /dev/loop1

    # time qemu-img convert -O qcow2 -T none /dev/test/lv1 /dev/loop0

    real    0m3.852s
    user    0m0.107s
    sys     0m1.265s

    # time qemu-img convert -O qcow2 -T none /dev/test/lv1 /dev/loop1

    real    0m1.917s
    user    0m0.135s
    sys     0m0.477s

How reproducible:
Always, but on a RHEL 8.3 host the slowdown is not as pronounced as on a RHEL 7.8 host.

Steps to Reproduce:
1. Create loop devices backed by files on NFS:
   $ truncate -s 2G /mnt/test0
   $ truncate -s 2G /mnt/test1
   $ losetup --find --show -o 14842 /mnt/test0
   $ losetup --find --show /mnt/test1
2. Convert:
   $ qemu-img convert -p -T none -O qcow2 /dev/test/lv1 /dev/loop0
   $ qemu-img convert -p -T none -O qcow2 /dev/test/lv1 /dev/loop1

Actual results:
* Much slower OVA export over NFS.
* Fails if the engine ansible timeout is not tuned.

Additional info:
* Over a local disk this does not seem to have a big impact.
* No matter what storage backs the NFS export, the offset case is slower, even when the NFS server is backed by tmpfs.
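To make the alignment issue concrete, here is a minimal sketch, assuming a 4096-byte block/page size on the server (the is_aligned helper is ours, not part of pack_ova.py). An offset such as 14842 is not a multiple of the block size, so every block-sized write issued through the loop device straddles two blocks of the backing file, forcing the NFS client into read-modify-write cycles:

    BLOCK_SIZE = 4096  # assumed server-side block/page size

    def is_aligned(offset, block_size=BLOCK_SIZE):
        # An offset that is not a multiple of the block size shifts every
        # write so that it straddles two blocks of the backing file.
        return offset % block_size == 0

    print(is_aligned(14842))  # False: 14842 % 4096 == 2554
    print(is_aligned(16384))  # True: 16384 == 4 * 4096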
The important part that I tested is the timeout:

  Actual results:
  * Much slower OVA export over NFS.
  * Fails if the engine ansible timeout is not tuned.
  (from comment #0)

This did not fail for me in ~4 hours of running, while pack_ova.py ran with:

    while True:
        sleep(1000)

The engine kept the command running and failed only after I killed the process on the host. This is because of the move to run the long export-OVA parts asynchronously within the engine.
(In reply to Liran Rotenberg from comment #3)
> This is because of the move to run the long export-OVA parts asynchronously
> within the engine.

Yeah, that's an important thing to note - we significantly changed the way that ansible script is executed in 4.4.4, so it makes sense that the export-to-OVA task no longer times out.

That said, we can align the offset in order to improve the time it takes to export the OVA to NFS.
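A minimal sketch of that idea, assuming pack_ova.py may write zero padding between the OVF metadata and the disk data (the pad_to_alignment helper and the ALIGNMENT constant are illustrative, not the actual ovirt-engine patch):

    ALIGNMENT = 4096  # assumption: align each disk's start to the page size

    def pad_to_alignment(ova_file, offset, alignment=ALIGNMENT):
        """Zero-pad the OVA so the next disk starts on an aligned offset."""
        padding = (alignment - offset % alignment) % alignment
        if padding:
            ova_file.write(b'\0' * padding)
        return offset + padding

losetup would then be called with the padded value, e.g. check_output(['losetup', '--find', '--show', '-o', str(aligned), ova_path]), so that the writes issued by qemu-img land on block boundaries of the backing file.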
Need to consider forward compatibility
Although we implemented the change, we don't notice a difference in QE environments. We can't say that it's fixed, but on the other hand it may be improved in some scenarios / on some hardware. Therefore failing this bug for now and re-targeting it to 4.5.1; we'll try to investigate this a bit further by then.
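One quick thing worth checking while an export runs on a QE host is the actual offset handed to losetup; these are standard util-linux and Python calls, with 14842 being the unaligned offset from comment #0:

    import subprocess

    # List attached loop devices with their byte offsets and backing files;
    # an OFFSET that is a multiple of 4096 means the export path is aligned.
    out = subprocess.check_output(['losetup', '-l', '-O', 'NAME,OFFSET,BACK-FILE'])
    print(out.decode())

    print(14842 % 4096)  # -> 2554, the old unaligned remainder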
Verified on:
ovirt-engine-4.5.0.6-0.7.el8ev
vdsm-4.50.0.13-1.el8ev.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.x86_64
ansible-runner-2.1.3-1.el8ev.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711