Bug 1871694

Summary: HostedEngine VM is broken after Cluster changed to UEFI
Product: Red Hat Enterprise Virtualization Manager Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine Assignee: Arik <ahadas>
Status: CLOSED ERRATA QA Contact: Nikolai Sednev <nsednev>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.4.1 CC: ahadas, aperotti, bcholler, dfodor, emarcus, gchakkar, lsvaty, mavital, michal.skrivanek
Target Milestone: ovirt-4.4.3 Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.3.5 Doc Type: Bug Fix
Doc Text:
Previously, changing a cluster's bios type to UEFI or UEFI+SecureBoot also changed the bios type of the Self-Hosted Engine Virtual Machine that runs within the cluster. As a result, the Self-Hosted Engine Virtual Machine failed to boot upon restart. In this release, the Self-Hosted Engine Virtual Machine is configured with a custom bios type, and does not change if the cluster's bios type changes.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-24 13:10:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2020-08-24 04:24:27 UTC
Description of problem:

In 4.4, Q35+UEFI is supported and the procedure to change the cluster to use this is described in [1].

However, if the user does this on a cluster containing the HostedEngine VM, the HE VM is converted as well, and this breaks the OS installation.
The VM will fail to restart.

Not only does the upgrade fail to handle the HostedEngine VM specifically by setting a custom machine/bios type, but these values are also locked in the UI,
so the user cannot fix them manually.

Version-Release number of selected component (if applicable):
rhvm-4.4.1.10-0.1.el8ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Change 4.4 cluster containing HE VM to UEFI
2. Check /var/run/ovirt-hosted-engine-ha/vm.conf on the hypervisor
  <os>
    <type arch="x86_64" machine="pc-q35-rhel8.2.0">hvm</type>
    <smbios mode="sysinfo"/>
    <loader readonly="yes" secure="no" type="pflash">/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
    <nvram template="/usr/share/OVMF/OVMF_VARS.fd">/var/lib/libvirt/qemu/nvram/77ab345a-7745-40c6-8662-379abefdb560.fd</nvram>
    <bios useserial="yes"/>
  </os>
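
As a rough illustration of what to look for in this fragment, the sketch below parses an `<os>` element like the one above and reports whether it requests UEFI firmware. It assumes the element has already been extracted as standalone XML text. The helper name `describe_firmware` is hypothetical, and treating a `pflash` loader as the UEFI indicator is an assumption based on the fragment shown here, not a statement about how the engine or the HA agent decides.

```python
import xml.etree.ElementTree as ET

# The <os> fragment from step 2, copied verbatim.
OS_XML = """
<os>
  <type arch="x86_64" machine="pc-q35-rhel8.2.0">hvm</type>
  <smbios mode="sysinfo"/>
  <loader readonly="yes" secure="no" type="pflash">/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
  <nvram template="/usr/share/OVMF/OVMF_VARS.fd">/var/lib/libvirt/qemu/nvram/77ab345a-7745-40c6-8662-379abefdb560.fd</nvram>
  <bios useserial="yes"/>
</os>
"""


def describe_firmware(os_xml):
    """Summarize machine type and firmware settings of an <os> element.

    Hypothetical helper: a pflash <loader> is assumed to mean UEFI.
    """
    root = ET.fromstring(os_xml.strip())
    type_el = root.find("type")
    loader = root.find("loader")
    uefi = loader is not None and loader.get("type") == "pflash"
    return {
        "machine": type_el.get("machine") if type_el is not None else None,
        "uefi": uefi,
        "secure_boot": uefi and loader.get("secure") == "yes",
        "loader_path": loader.text if loader is not None else None,
    }


info = describe_firmware(OS_XML)
print(info)
```

For the fragment above, this reports a Q35 machine type with a UEFI loader and secure boot disabled, which matches the state described under "Actual results".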


Actual results:
HE VM is set to Q35+UEFI, fails to boot due to UEFI

Expected results:
Either the HE VM is set with a custom machine type and bios type when the cluster settings are changed, or the user is blocked from making the change on the HE cluster.

Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/administration_guide/index#About_UEFI_Q35-cluster_opt_settings

Comment 2 Germano Veit Michel 2020-08-24 04:40:24 UTC
And the docs contain:

~~~
Warning

If the virtual machine’s BIOS type is set to Cluster default, changing the BIOS type of the cluster changes the BIOS type of the virtual machine. If the virtual machine has an operating system installed, changing the cluster BIOS type can cause booting the virtual machine to fail. 
~~~

But also:

~~~
You can configure a cluster’s default BIOS type, which determines the default BIOS type of any new virtual machines you create in that cluster. If necessary, you can override the cluster’s default BIOS type by specifying a different BIOS type when you create a virtual machine. 
~~~

So, what is the expected behaviour?
a) Applies only to new VMs (assuming this set custom_* for existing VMs)
b) Convert everything

In addition, vm_devices is wrong after changing from BIOS to UEFI, similar to BZ1829691. There is a mix of Q35 and i440fx stuff (USB and IDE controllers + cdrom on wrong bus).

Comment 3 Arik 2020-08-25 07:39:44 UTC
(In reply to Germano Veit Michel from comment #2)
> So, what is the expected behaviour?
> a) Applies only to new VMs (assuming this set custom_* for existing VMs)
> b) Convert everything

The expected behaviour is (b) - existing VMs that are not defined with custom bios type should also change

> 
> In addition, vm_devices is wrong after changing from BIOS to UEFI, similar
> to BZ1829691. There is a mix of Q35 and i440fx stuff (USB and IDE
> controllers + cdrom on wrong bus).

This is supposed to be fixed by https://gerrit.ovirt.org/c/109937 and https://gerrit.ovirt.org/c/110438
(although their corresponding bugs talk about import-VM)

Comment 4 Germano Veit Michel 2020-08-25 22:48:04 UTC
(In reply to Arik from comment #3)
> (In reply to Germano Veit Michel from comment #2)
> > So, what is the expected behaviour?
> > a) Applies only to new VMs (assuming this set custom_* for existing VMs)
> > b) Convert everything
> 
> The expected behaviour is (b) - existing VMs that are not defined with
> custom bios type should also change
> 
> > 
> > In addition, vm_devices is wrong after changing from BIOS to UEFI, similar
> > to BZ1829691. There is a mix of Q35 and i440fx stuff (USB and IDE
> > controllers + cdrom on wrong bus).
> 
> This is supposed to be fixed by https://gerrit.ovirt.org/c/109937 and
> https://gerrit.ovirt.org/c/110438
> (although their corresponding bugs talk about import-VM)

Nice, I think you can close BZ1829691 with these patches as well.

And IIUC, all that is needed here is to ignore the HostedEngine VM when changing the cluster setting.

Comment 10 Arik 2020-09-08 08:09:00 UTC
Regardless of whether the changes mentioned in comment 3 would actually fix it or not, it makes sense to me to prevent the bios type of the hosted engine VM from changing upon changes at the cluster-level as suggested in comment 4.

We can do that by setting a custom bios type on the hosted-engine VM:
(1) Explicitly set the bios type of the hosted-engine VM to q35+bios (rather than defaulting to cluster-default) on hosted-engine setup.
(2) Introduce an upgrade script that sets a custom bios type of the hosted-engine VM according to the bios type that the VM is set with for existing setups.
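
The idea behind option (2) can be sketched with a toy model. This is not engine code: the `Cluster` and `Vm` classes, the `custom_bios_type` field, and the `"q35+uefi"` label are hypothetical stand-ins chosen for illustration. The only assumption taken from this thread is that a VM without a custom bios type follows the cluster value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    bios_type: str


@dataclass
class Vm:
    name: str
    # None models "cluster default": the VM follows the cluster's bios type.
    custom_bios_type: Optional[str] = None
    is_hosted_engine: bool = False


def effective_bios_type(vm, cluster):
    """Return the bios type the VM would actually run with."""
    if vm.custom_bios_type is not None:
        return vm.custom_bios_type
    return cluster.bios_type


def pin_hosted_engine_bios_type(vms, cluster):
    """Freeze each hosted-engine VM at the bios type it currently resolves to."""
    for vm in vms:
        if vm.is_hosted_engine and vm.custom_bios_type is None:
            vm.custom_bios_type = effective_bios_type(vm, cluster)


cluster = Cluster(bios_type="q35+bios")
he = Vm("HostedEngine", is_hosted_engine=True)
guest = Vm("guest-1")

# Upgrade step: pin the HE VM before anyone changes the cluster.
pin_hosted_engine_bios_type([he, guest], cluster)

# Later, the cluster is switched to UEFI.
cluster.bios_type = "q35+uefi"

print(effective_bios_type(he, cluster))
print(effective_bios_type(guest, cluster))
```

In this model the hosted-engine VM keeps the value it had at upgrade time, while an ordinary VM left at cluster default still follows the cluster, consistent with behaviour (b) from comment 3.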

Comment 12 Nikolai Sednev 2020-10-01 14:17:27 UTC
Please fill in the "Fixed In Version" field.

Comment 13 Nikolai Sednev 2020-10-12 14:09:39 UTC
emulatedMachine=pc-q35-rhel8.2.0 stays the same for HE in both cases, when the cluster's bios type is changed to UEFI or to UEFI+SecureBoot.
Tested after the change by shutting down the engine's VM and then starting it again, both under global maintenance and in a normal HA environment (maintenance set to none).

Tested on:
ovirt-engine-setup-4.4.3.6-0.13.el8ev.noarch
rhvm-4.4.3.6-0.13.el8ev.noarch
ovirt-hosted-engine-setup-2.4.7-2.el8ev.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch
Linux 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.3 (Ootpa)
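
The emulatedMachine comparison from this comment could be scripted roughly as follows. This is a minimal sketch: it assumes the configuration is available as plain `key=value` text, which is how the `emulatedMachine=...` line above is written, and the helper name `read_conf_value` and the two sample strings are hypothetical.

```python
def read_conf_value(conf_text, key):
    """Return the value of the first `key=value` line matching key, or None."""
    for line in conf_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Hypothetical snapshots taken before and after the cluster bios type change.
before = "emulatedMachine=pc-q35-rhel8.2.0\n"
after = "emulatedMachine=pc-q35-rhel8.2.0\n"

unchanged = (
    read_conf_value(before, "emulatedMachine")
    == read_conf_value(after, "emulatedMachine")
)
print("emulatedMachine unchanged:", unchanged)
```

In practice the two snapshots would be read from the host before and after the cluster change; the in-line strings here only keep the example self-contained.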

Comment 14 Nikolai Sednev 2020-10-13 09:38:41 UTC
Moving qe_test_coverage to minus following our discussion with Beni, as this is a corner case and not simple to automate.
Will consider reopening if this turns out to happen frequently.

Comment 22 errata-xmlrpc 2020-11-24 13:10:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5179