Red Hat Bugzilla – Bug 1274065
[RFE] Hosted-Engine: Use a qcow2 image for the appliance
Last modified: 2018-02-05 05:56:53 EST
Description of problem:
When using the appliance flow in HE setup, the extraction is taking a large portion of the setup time.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Use appliance flow in
Actual results: Extraction process takes a long time
Expected results: Extraction process is quick
Raising the priority, because the flow could be nice, but is really slowed down due to this bug.
Ryan, could you quantify the slow down compared to the whole installation duration?
(In reply to Fabian Deutsch from comment #1)
> Raising the priority, because the flow could be nice, but is really slowed
> down due to this bug.
> Ryan, could you quantify the slow down compared to the whole installation
> duration?
It's not really a bug, it's an RFE: it could be faster but it's working.
The issue is that Python does not handle sparse files efficiently, so the real gain depends only on how sparse the image is.
Agreed, it's an RFE.
Are we running virt-sparsify and virt-sysprep on the image before packing it?
The files should be sparse, but as Simone says: Python does not handle those efficiently when extracting tars.
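To illustrate what "handling sparse files efficiently" would mean on extraction, here is a minimal sketch of a sparse-aware copy. It is not the code used by hosted-engine setup; `copy_sparse` and the 1 MiB block size are hypothetical choices for illustration. The idea is to seek over all-zero blocks instead of writing them, so the destination keeps its holes.

```python
import os

BLOCK = 1024 * 1024  # copy granularity; an arbitrary choice for this sketch


def copy_sparse(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    src is any binary stream; dst must be a seekable binary file.
    Returns the number of bytes actually written.
    """
    written = 0
    while True:
        chunk = src.read(block)
        if not chunk:
            break
        if chunk.count(0) == len(chunk):
            # All zeros: move the file position forward and leave a hole.
            dst.seek(len(chunk), os.SEEK_CUR)
        else:
            dst.write(chunk)
            written += len(chunk)
    # Fix the final length in case the data ended with a hole.
    dst.truncate(dst.tell())
    return written
```

Whether the skipped ranges really become holes on disk depends on the filesystem; the sketch only avoids writing the zeros.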
Simone, does the new installation flow in 4.2 make a difference here?
(In reply to Yaniv Kaul from comment #6)
> Simone, does the new installation flow in 4.2 make a difference here?
We are using the system tar via ansible with the --sparse option:
On my test system with a 7200 rpm disk, it takes about 50 seconds to extract a 2.4G sparse qcow2 image from an 800M ova file:
2017-11-16 12:14:51,291+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.v2_playbook_on_task_start:164 TASK [Extract appliance to local vm dir]
2017-11-16 12:15:41,756+0100 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.v2_runner_on_ok:120 changed: [localhost]
[root@c74he20171031h1 ~]# ls -lh /usr/share/ovirt-engine-appliance/ovirt-engine-appliance-4.2-20171114.1.el7.centos.ova
-rw-r--r--. 1 root root 808M 14 nov 14.59 /usr/share/ovirt-engine-appliance/ovirt-engine-appliance-4.2-20171114.1.el7.centos.ova
[root@c74he20171031h1 ~]# du -h /var/tmp/localvm/images/d1746233-6cf3-42b8-9efb-b481954a8d3f/d7cdd432-9bc6-47f6-a6d8-73340e59647b
[root@c74he20171031h1 ~]# file /var/tmp/localvm/images/d1746233-6cf3-42b8-9efb-b481954a8d3f/d7cdd432-9bc6-47f6-a6d8-73340e59647b
/var/tmp/localvm/images/d1746233-6cf3-42b8-9efb-b481954a8d3f/d7cdd432-9bc6-47f6-a6d8-73340e59647b: QEMU QCOW Image (v3), 53687091200 bytes
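The `ls -lh` and `du -h` calls above compare apparent size with allocated size. A rough Python equivalent, as a sketch: `disk_usage` is a hypothetical helper name, and the 512-byte unit for `st_blocks` is the Linux/POSIX convention, so this assumes a Unix-like host.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    apparent_bytes is what `ls -l` reports; allocated_bytes is roughly
    what `du` reports. A large gap between them means the file is sparse.
    """
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units
    return apparent, allocated
```

Note that the 53687091200 bytes printed by `file` is the virtual disk size recorded in the qcow2 header, which is a separate number from both of these.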
If we want to improve, I think we should evaluate dropping the RPM(OVA(QCOW2)) nesting and shipping the qcow2 disk directly in an rpm file, using it in place with a snapshot to revert to if something goes wrong.
Alternatively, we should use the RHEL cloud image and virt-customize it on the spot with the latest Engine, etc.
This should take a lot more time, but would reduce the initial size and ensure we use the latest and greatest.
85710 is a patch to package the appliance also as a qcow2 image.
Keeping current bug for using this image.
Based on some internal discussions, this is the current idea:
1. Package the appliance (also) as a qcow2 image, to be extracted directly to the local disk, so that one can start a libvirt vm from it immediately, without further extraction/copying. This is bug 1528987.
2. Use this image in hosted-engine setup. Currently we'll only do this in "node-zero". The local engine vm will run from the qcow2 image (probably with a snapshot, so that we can easily revert). Still TBD how to create the final engine vm disk image, perhaps some variation(s) on qemu-img copying.
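One possible reading of the two steps above, as a sketch only: the thread leaves the actual mechanism TBD. The assumption here is that the "snapshot" is a qcow2 overlay backed by the packaged image, and that the final disk is produced with `qemu-img convert`. All paths are hypothetical, and the helpers only build the command lines unless `run` is called.

```python
import subprocess


def overlay_cmd(base, overlay):
    """Command to create a qcow2 overlay backed by the packaged image.

    The VM writes to the overlay, so reverting means deleting the overlay.
    """
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", base, "-F", "qcow2", overlay]


def convert_cmd(src, dst, out_fmt="raw"):
    """Command to copy an image (including its backing chain) to a new file."""
    return ["qemu-img", "convert", "-O", out_fmt, src, dst]


def run(cmd):
    """Execute a command built above; raises if qemu-img fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    base = "/usr/share/ovirt-engine-appliance/appliance.qcow2"
    overlay = "/var/tmp/localvm/overlay.qcow2"
    print(" ".join(overlay_cmd(base, overlay)))
    print(" ".join(convert_cmd(overlay, "/var/tmp/localvm/final.raw")))
```

Keeping the packaged image read-only behind an overlay is what would make it usable "in place" without a copy.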
On a test deploy (on a vm on my laptop, with SSD), "Extract appliance to local vm dir" took 55 seconds, which is the time I hoped to save. Installing "ovirt-engine-appliance-qcow2-4.2-20171226.1.el7.centos.noarch.rpm", generated by jenkins for 85710, took 74 seconds. Installing the regular appliance took 9 seconds. So I am not sure we would save much this way.
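The comparison behind that conclusion, worked out from the three timings above. This assumes the two flows differ only in rpm install time and in the extraction step, and that the qcow2 flow needs no extraction at all; `compare_flows` is a hypothetical helper.

```python
def compare_flows(regular_install_s, extract_s, qcow2_install_s):
    """Return (current_total, qcow2_total, saving) in seconds.

    A negative saving means the qcow2 flow is slower overall.
    """
    current_total = regular_install_s + extract_s
    qcow2_total = qcow2_install_s
    return current_total, qcow2_total, current_total - qcow2_total


# Timings from the test deploy: 9 s install, 55 s extract, 74 s qcow2 install.
current, qcow2, saving = compare_flows(9, 55, 74)
print(f"current flow: {current}s, qcow2 flow: {qcow2}s, saving: {saving}s")
```

On this single machine the qcow2 package comes out about 10 seconds slower (74 s against 9 + 55 = 64 s), since the time saved on extraction moves into the rpm installation.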
The patch is for the appliance, removing from here. Also moving to NEW: this might be a simple case of premature optimization. We need to measure first on several different machines and see whether we can find a flow with a significant saving; otherwise, close the bug.