Bug 1813028

Summary: VM OVA is created with empty disk if loop device doesn't exist in the host during the export
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: vdsm
Assignee: Steven Rosenberg <srosenbe>
Status: CLOSED ERRATA
QA Contact: Nisim Simsolo <nsimsolo>
Severity: high
Docs Contact:
Priority: high
Version: 4.3.8
CC: ahadas, gianluca.cecchi, hhaberma, jortialc, kemyers, kmashalk, lsurette, michal.skrivanek, mtessun, nsimsolo, rdlugyhe, srevivo, srosenbe, thomas, ycui
Target Milestone: ovirt-4.4.1
Flags: lsvaty: testing_plan_complete-
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if you exported a virtual machine (VM) as an Open Virtual Appliance (OVA) file from a host that was missing a loop device, and imported the OVA elsewhere, the resulting VM had an empty disk (no OS) and could not run. This was caused by a timing and permissions issue related to the missing loop device. The current release fixes the timing and permission issues. As a result, the VM to OVA export includes the guest OS. Now, when you create a VM from the OVA, the VM can run.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-04 13:27:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Export OVA with 4 disks as seen from the Import Dialog (no flags)
Imported OVA with 4 disks and OS (no flags)

Description nijin ashok 2020-03-12 18:33:38 UTC
Description of problem:

It takes a few milliseconds for udev to set the correct permissions on a loop device when it is created for the first time. Initially, the loop device is created with root:root ownership and mode 600; the udev rule then fires and changes the ownership to root:disk with mode 660.

====
[root@dhcp0-2 ~]# losetup --find --show /var/tmp/test.out;ls -l /dev/loop0;sleep 1;ls -l /dev/loop0
/dev/loop0
brw-------. 1 root root 7, 0 Mar 12 23:22 /dev/loop0
brw-rw----. 1 root disk 7, 0 Mar 12 23:22 /dev/loop0

From the udev debug output:

Mar 12 23:22:52 dhcp0-2.ansirhv.redhat.com systemd-udevd[3406]: handling device node '/dev/loop0', devnum=b7:0, mode=0660, uid=0, gid=6
Mar 12 23:22:52 dhcp0-2.ansirhv.redhat.com systemd-udevd[3406]: set permissions /dev/loop0, 060660, uid=0, gid=6
====

In the pack_ova.py script, "qemu-img convert" is executed immediately after losetup, so the conversion can start before udev has applied the final permissions:

====
    # Attach the OVA file as a loop device at the disk's offset.
    output = check_output(['losetup', '--find', '--show', '-o', offset,
                           ova_path])
    loop = output.splitlines()[0]
    loop_stat = os.stat(loop)
    # Hand the device over to the vdsm user before converting.
    vdsm_user = pwd.getpwnam('vdsm')
    os.chown(loop, vdsm_user.pw_uid, vdsm_user.pw_gid)
    try:
        # The convert starts immediately, racing the udev rule that sets
        # the device's final ownership and mode.
        qemu_cmd = ("qemu-img convert -T none -O qcow2 '%s' '%s'"
                    % (path, loop))
        call(['su', '-p', '-c', qemu_cmd, 'vdsm'])
====

Although the script sets 36:36 (vdsm) ownership on the device, the process will not have sufficient permission at the time of the convert because udev may not have been triggered by then; when the udev rule does fire, it can reset the ownership the script just set.

So exporting the VM as an OVA fails with the errors below (this is a VM with 4 disks):

====
skipping disk: path=/rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/1d603271-d078-4ceb-b740-2089998aa0ac/8e568a87-f90a-46af-8b37-3e2e0d7ae02a size=1074135040
skipping disk: path=/rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/8ec686ae-4ecb-4366-9a39-b2cd687db2eb/a3d1d40a-6877-4f35-a4a8-34bc1bd69ba6 size=1074135040
skipping disk: path=/rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/a141ad6e-0abb-45fa-9d32-2401183dbc0f/28c50a3c-9829-4fba-8a62-885ee1b81e22 size=1074135040
skipping disk: path=/rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/4c9ddfbc-aded-4f89-a1b0-a9251164a089/ff6e995d-733f-400d-bd9d-c32d79c18beb size=1074135040
converting disk: /rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/4c9ddfbc-aded-4f89-a1b0-a9251164a089/ff6e995d-733f-400d-bd9d-c32d79c18beb, offset 3222422016
qemu-img: /dev/loop0: error while converting qcow2: Could not open device: Permission denied
converting disk: /rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/1d603271-d078-4ceb-b740-2089998aa0ac/8e568a87-f90a-46af-8b37-3e2e0d7ae02a, offset 15360
converting disk: /rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/8ec686ae-4ecb-4366-9a39-b2cd687db2eb/a3d1d40a-6877-4f35-a4a8-34bc1bd69ba6, offset 1074150912
qemu-img: /dev/loop1: error while converting qcow2: Could not open device: Permission denied
converting disk: /rhev/data-center/mnt/blockSD/ed2f3f39-fb26-41d6-bbbc-ddb14aa5f6d5/images/a141ad6e-0abb-45fa-9d32-2401183dbc0f/28c50a3c-9829-4fba-8a62-885ee1b81e22, offset 2148286464
====

As seen above, two of the disk conversions failed with "permission denied" because they ran just after loop0 and loop1 were created; the others worked because they used already-created loop devices.

I think pack_ova.py should explicitly set the permissions to 660 instead of waiting for udev to set them.
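
For illustration, a minimal sketch of that suggestion (hypothetical; not what was eventually merged, and attach_loop_device is an invented helper name) could set the mode explicitly right after the chown:

====
# Hypothetical variant of pack_ova.py's loop-device setup, illustrating
# the suggestion above: set mode 0660 explicitly instead of waiting for
# the udev rule. Sketch only; names mirror the snippet quoted earlier.
import os
import pwd
import stat
from subprocess import check_output

def attach_loop_device(ova_path, offset):
    output = check_output(['losetup', '--find', '--show', '-o', str(offset),
                           ova_path])
    loop = output.splitlines()[0]
    vdsm_user = pwd.getpwnam('vdsm')
    os.chown(loop, vdsm_user.pw_uid, vdsm_user.pw_gid)
    # Grant owner and group read/write explicitly (equivalent to chmod 660).
    os.chmod(loop, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)
    return loop
====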

Also, the worst part is that the engine reports the whole operation as successful even though the OVA is packed with those empty disks. The user only discovers the empty disk after uploading the OVA to another environment. The customer who reported this saw an empty, uninitialized disk in their Windows VM after importing the OVA into the destination environment.


Version-Release number of selected component (if applicable):

ovirt-engine-tools-4.3.8.2-0.4.el7.noarch
rhvm-4.3.8.2-0.4.el7.noarch

How reproducible:

100%

Steps to Reproduce:
1. Make sure there are no /dev/loop devices on the host. Rebooting the host clears the loop devices.
2. Try to export a VM as OVA and check the disk in the exported OVA. It will be empty.
3. Check the ovirt-export-ova-ansible log under "/var/log/ovirt-engine/ova/"; it will contain a "permission denied" error message.

Actual results:

VM OVA is created with an empty disk if a loop device doesn't exist on the host during the export.

Expected results:

Export of VM to OVA should work.

Additional info:

Comment 1 Ryan Barry 2020-03-13 01:49:17 UTC
Instead of hardcoding the permissions, let's poll the created device
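
A rough sketch of that polling idea (hypothetical; not the patch that was ultimately merged, and wait_for_udev_permissions is an invented helper name) might look like:

====
# Hypothetical polling helper: wait until udev has applied the expected
# group and mode to the freshly created loop device before converting.
import grp
import os
import stat
import time

def wait_for_udev_permissions(loop, timeout=5.0, interval=0.05):
    disk_gid = grp.getgrnam('disk').gr_gid
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        st = os.stat(loop)
        if st.st_gid == disk_gid and stat.S_IMODE(st.st_mode) == 0o660:
            return True
        time.sleep(interval)
    return False  # timed out; caller can fail or fall back to chmod
====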

Comment 2 Steven Rosenberg 2020-03-25 13:48:36 UTC
I could not reproduce this issue on the current master branch (oVirt 4.4.x). There was an ansible-runner issue that blocked testing this; it will be fixed in ansible-runner 1.4.5, to be released in the next EL8 ovirt-master-release repo. Fedora may take longer.

With that fix, I tested this on hosts where /dev/loop0 did not exist beforehand: with no disks, with 1 disk, and with 4 disks. I also performed a clean install and ran an export OVA on a VM with 4 disks, which was successful. /dev/loop0 did not exist before the export (only loop-control); after exporting, loop0 exists:


ls /dev/loop
loop0         loop-control

Comment 3 Steven Rosenberg 2020-03-25 13:52:18 UTC
Created attachment 1673444 [details]
Export OVA with 4 disks as seen from the Import Dialog

From a clean installation, I performed an export OVA on a VM with 4 disks. On the Import screen one can see 4 disks attached to the VM to be imported from the OVA file. Importing also succeeded, with 4 disks attached to the imported VM.

Comment 4 nijin ashok 2020-03-25 14:00:10 UTC
(In reply to Steven Rosenberg from comment #3)
> Created attachment 1673444 [details]
> Export OVA with 4 disks as seen from the Import Dialog
> 
> From a clean installation, performed an export OVA on a VM with 4 Disks.
> From the Import screen one can see that there are 4 disks attached to the VM
> to import obtained from the ova file. Importing also succeeded with 4 disks
> attached to the imported VM.

The export will appear successful; however, the disks will not have any contents because the "qemu-img convert" failed. Did you import the VM and check whether there is actual data? You may have to test with a VM that has an OS installed and some data written to it.

Also, check whether you see a "Permission denied" error in the Ansible log. I can reproduce this issue consistently.

Comment 5 nijin ashok 2020-03-25 14:05:05 UTC
(In reply to nijin ashok from comment #0)

> Also, the worst part is the engine reports the whole operation as successful
> and OVA is packed with those empty disks. The user only realizes about the
> empty disk when they upload the OVA to another environment. The reported
> customer also observed when they imported the OVA to the destination
> environment where they see an empty uninitialized disk in the Windows VM.
> 
> 
Please also check the above info.

Comment 6 Steven Rosenberg 2020-03-25 15:26:54 UTC
(In reply to nijin ashok from comment #4)
> (In reply to Steven Rosenberg from comment #3)
> > Created attachment 1673444 [details]
> > Export OVA with 4 disks as seen from the Import Dialog
> > 
> > From a clean installation, performed an export OVA on a VM with 4 Disks.
> > From the Import screen one can see that there are 4 disks attached to the VM
> > to import obtained from the ova file. Importing also succeeded with 4 disks
> > attached to the imported VM.
> 
> The export will be successful however it will not be having any contents as
> the "qemu-img convert" failed. Did you import the VM and check if there is
> actual data? You may have to test in a VM with an OS installed and has to
> put some data in it.
> 
> Also, check if you are seeing "Permission denied" error in the Ansible log.
> I can reproduce this issue consistently.

Please provide your logs so that we can synchronize our testing. Also, please describe how you verified that the contents are empty (for example, with "qemu-img convert" alone, or by importing); please provide more details.

Comment 7 Ryan Barry 2020-03-25 15:34:15 UTC
Arguably, just extract the OVA (it's a tarball) or upload elsewhere

Comment 8 nijin ashok 2020-03-25 16:00:13 UTC
(In reply to Steven Rosenberg from comment #6)

> Please provide your logs so that we can synchronize our testing. Also,
> please provide how you verified the contents are empty, such as just
> "qemu-img convert" or on importing, please provide more details.

I extracted the OVA and ran a hexdump to check whether my partition table and filesystem headers were present. Alternatively, you can simply attach the disk to any VM and check whether it contains data.

Also, please check whether you see a "permission denied" error in your "/var/log/ovirt-engine/ova/ovirt-export-ova-*" logs. If you see it there, then you have reproduced the issue.
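
As an illustration (an assumption on my part, not the exact check used in this report, and assuming the disk extracted from the OVA is a qcow2 image), the image's allocation can be inspected with "qemu-img map"; a disk whose conversion failed will typically have no data clusters:

====
# Illustrative check: report whether a qcow2 image has any allocated
# data clusters at all.
import json
import subprocess

def disk_has_data(image_path):
    out = subprocess.check_output(
        ['qemu-img', 'map', '--output=json', image_path])
    # Ranges with "data": true are backed by actual image data.
    return any(r.get('data') for r in json.loads(out))

# Example: disk_has_data('extracted_disk.qcow2') -> False for an empty export
====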

Comment 9 Steven Rosenberg 2020-03-25 17:03:40 UTC
Created attachment 1673581 [details]
Imported OVA with 4 disks and OS

I was able to import a VM with 4 disks, as shown in the screenshot, which also includes the sanity file I created on the original VM before exporting.

I also do not have any ovirt-export-ova-* files with "permission denied" in them.

More information on how to reproduce the issue might help.

Comment 12 Michal Skrivanek 2020-04-25 17:17:40 UTC
anything more to do?

Comment 13 Steven Rosenberg 2020-04-30 07:32:57 UTC
(In reply to Michal Skrivanek from comment #12)
> anything more to do?

No. Thank you.

Comment 14 Nisim Simsolo 2020-06-01 11:19:55 UTC
Verified:
ovirt-engine-4.4.1.1-0.5.el8ev
vdsm-4.40.17-1.el8ev.x86_64
libvirt-daemon-6.0.0-22.module+el8.2.1+6815+1c792dc8.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64

Verification scenario:
1. Make sure there are no /dev/loop devices on the host. Rebooting the host clears the loop devices.
2. Export the VM as OVA.
Verify the VM is exported as OVA successfully, with its disk.
3. Import the VM from that OVA.
Verify the VM is imported, run the VM, and verify it runs successfully.
4. Export the VM as OVA when /dev/loop devices exist on the host.
Verify the VM is exported successfully, with its disk.
5. Import the VM from that OVA.
Verify the VM is imported, run the VM, and verify it runs successfully.

Comment 16 Rolfe Dlugy-Hegwer 2020-06-24 18:34:40 UTC
VM OVA is created with empty disk if loop device doesn't exist in the host during the export 

Cause: When exporting a VM with at least one boot disk containing an installed guest OS to an OVA file for the first time on a newly installed host, where the /dev/loop0 device does not initially exist, the export fails to write the boot disk contents. The failure is due to invalid permissions, because udev has not yet finished processing the new device. This results in an empty boot disk with no contents, including no installed guest OS.

Consequence: When attempting to use the OVA file, such as by importing it back into the engine, the imported VM has no guest OS installed and will not run without reinstalling one.

Fix: To give udev time to set the permissions on the /dev/loop0 device, we call the "udevadm settle" command, which returns once udev has completed, before setting the user and group permissions on /dev/loop0. This ensures the processing is synchronized and that the permissions are correct when the image conversion program is launched.

Result: The export OVA succeeds in exporting the VM with the Guest OS intact.
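
Based on that description, the change presumably amounts to a single call between losetup and the chown in pack_ova.py, along these lines (a sketch of the described fix, not the verbatim patch; offset and ova_path as in the snippet quoted in the description):

====
# Sketch of the described fix: after creating the loop device, wait for
# udev to finish its event queue before changing ownership, so the udev
# rule cannot overwrite the permissions set below.
import os
import pwd
from subprocess import check_call, check_output

output = check_output(['losetup', '--find', '--show', '-o', offset,
                       ova_path])
loop = output.splitlines()[0]
# "udevadm settle" blocks until the udev event queue is empty, so the
# rule that sets the device's default ownership has already run.
check_call(['udevadm', 'settle'])
vdsm_user = pwd.getpwnam('vdsm')
os.chown(loop, vdsm_user.pw_uid, vdsm_user.pw_gid)
====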

Comment 20 Gianluca Cecchi 2020-07-30 14:36:19 UTC
any expected backport to 4.3? I think this is a major problem

Comment 21 Nir Soffer 2020-08-01 13:19:15 UTC
*** Bug 1862115 has been marked as a duplicate of this bug. ***

Comment 22 thomas 2020-08-01 13:54:42 UTC
(In reply to Gianluca Cecchi from comment #20)
> any expected backport to 4.3? I think this is a major problem

Especially in the context of a 4.3 -> 4.4 migration

Comment 27 errata-xmlrpc 2020-08-04 13:27:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246

Comment 28 Arik 2020-08-04 20:55:50 UTC
(In reply to Gianluca Cecchi from comment #20)
> any expected backport to 4.3? I think this is a major problem

But with a really simple fix I suppose -
Considering that there is no plan to release another version of 4.3 and that it's already fixed in 4.4, can't you just add the single line that was introduced in [1] to [2] (on the engine's host)?

[1] https://gerrit.ovirt.org/#/c/108115/6/packaging/ansible-runner-service-project/project/roles/ovirt-ova-pack/files/pack_ova.py
[2] ./usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-ova-pack/files/pack_ova.py

Comment 29 thomas 2020-08-04 21:51:33 UTC
I've added the line, and I'll test it for sure.

But your comment implies much more:
So far my impression was that oVirt is more of a CentOS than a Fedora. From what I understand, CentOS 7 has years of bug fixes left, while CentOS 8 is slowly starting to be usable.

I'm not deploying in the cloud, but in a lab with recycled or very cheap hardware, just where oVirt belongs, not the hottest iron used for production. I want flexibility, easy demoing, "LEGO" or "Erector Kit" not 99% production utilization: For that I'd need to buy support, go commercial.

Without such a fundamental thing as VM export, I'd consider oVirt 4.3 seriously broken now. Sure, this one is easy to fix: one line to add (and test).

But the mere fact that Red Hat is willing to leave it in that state terminates the usability of CentOS 7 four years ahead of the end of maintenance, without any way to do a step-by-step migration in which an infrastructure designed for high availability retains that high availability during the migration.

Also, without nested oVirt (that is, the ability to run oVirt 4.4 on oVirt 4.3 to do assurance tests), the migration requires a full tear-down/rebuild or dual infrastructure.

I understand the amount of coding effort that has to be invested in something that won't be needed afterwards, but I don't think this is how VMware would do it.

OK, this is oVirt, free software without support. But if this is the approach Red Hat will also take with RHEV, I wonder what that will do to the market share of vSphere's strongest competitor.

I can't ask you to change your minds without offering money.
But I'd say you either need to better communicate that oVirt is a toy, not usable for anything serious, or match CentOS 7 in terms of lifetime.

Of course, this is my very own personal opinion. But currently I'd never dare to recommend to my employer to replace their current vSphere farm with RHEV... As much as I originally liked the idea and invested more than a year of personal work into that vision so far.

Comment 30 thomas 2020-08-04 21:59:58 UTC
This is a "red wine after hours" comment, so please don't spill your morning coffee reading it:

When you put "POWERFUL OPEN SOURCE VIRTUALIZATION oVirt is a free open-source virtualization solution for your entire enterprise" on your home page and then argue that "Autopilot" doesn't mean you're allowed to relax your attention from full driving, you are committing a Musk mistake of overselling beyond what you'll be made liable for.

Comment 31 thomas 2020-08-04 23:53:12 UTC
I did the OVA export on 4.3 after adding the one line from the patch.
100GB logical size, 23GB actually allocated: I haven't yet imported the VM on the other side, but I am rather confident it will perform as expected.

One line of code, but a commitment to CentOS 7's life cycle is really at stake: whither does Red Hat go?