1839598 – Cluster CPU Type "Secure AMD EPYC" results in unbootable VM OS

Summary: Cluster CPU Type "Secure AMD EPYC" results in unbootable VM OS

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	General
Sub Component:
Version:	4.4.0.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	bugs@ovirt.org
QA Contact:	Lucie Leistnerova
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-05-25 00:22 UTC by Stephen Panicho
Modified:	2020-09-07 06:48 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2020-09-07 06:48:14 UTC
oVirt Team:	Gluster
Embargoed:
Dependent Products:
Flags:	s.panicho: needinfo-

Description Stephen Panicho 2020-05-25 00:22:38 UTC

Description of problem:
My default cluster CPU type was automatically selected as "Secure AMD EPYC". I started a VM with a CentOS 8 ISO attached and the install completed successfully. However, on the first reboot after the install, the VM fails with "error: invalid arch independent ELF magic" and drops to the GRUB rescue shell.

A similar issue occurs (minus the complaint about ELF magic) if I try to boot a cloud image from the ovirt-image-repository storage domain.

To resolve this, I need to go into the VM's settings, then:
System -> Advanced Parameters -> Custom CPU
and change the value from "Secure AMD EPYC" to "EPYC,+ibpb,+virt-ssbd"
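
For reference, the override value above follows the QEMU-style "model,+flag" form. The snippet below is only a sketch for reading such a string: `parse_cpu_spec` is a hypothetical helper, not part of oVirt or libvirt, and it handles only the `+flag` / `-flag` syntax.

```python
def parse_cpu_spec(spec):
    """Split a QEMU-style CPU string into (model, enabled, disabled).

    Hypothetical helper for illustration; only '+flag' and '-flag'
    entries are understood.
    """
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty CPU specification")
    model, enabled, disabled = parts[0], [], []
    for flag in parts[1:]:
        if flag.startswith("+"):
            enabled.append(flag[1:])
        elif flag.startswith("-"):
            disabled.append(flag[1:])
        else:
            raise ValueError(f"unrecognised flag syntax: {flag!r}")
    return model, enabled, disabled


# The override value from this report.
model, enabled, disabled = parse_cpu_spec("EPYC,+ibpb,+virt-ssbd")
print(model, enabled, disabled)
```

Read this way, the override is the base "EPYC" model with the `ibpb` and `virt-ssbd` flags explicitly enabled.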

Version-Release number of selected component (if applicable):
oVirt 4.4 GA

How reproducible:
Always

Steps to Reproduce:
1. Have a default Cluster CPU Type of "Secure AMD EPYC"
2. Start a VM; it does not get past the BIOS/GRUB stage
3. Override CPU Type to "EPYC,+ibpb,+virt-ssbd"
4. VM boots into OS as expected.

Additional info:
As far as I can tell, these are the same CPU types, just with different names.
https://github.com/oVirt/ovirt-engine/blob/ovirt-engine-4.4.0.3/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql#L430
https://github.com/oVirt/ovirt-engine/blob/ovirt-engine-4.4.0.3/packaging/dbscripts/upgrade/pre_upgrade/0000_config.sql#L460

Comment 1 Stephen Panicho 2020-05-25 17:17:28 UTC

Sorry, I believe I misdiagnosed the root cause of this issue. There's something going on with disk images on my HC Gluster domain (strangely the Hosted Engine isn't affected).

Occasionally the VM does come up; other times it reports "not a bootable disk", "invalid arch independent ELF magic", or "not a correct xfs inode". If I migrate the very same disk image to an NFS domain, it boots every time.
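
One rough way to check whether reads from a storage domain return stable data is to hash the same image file several times and compare the digests. This is a sketch under assumptions, not a procedure confirmed in this report: the function names are illustrative, the image must not be in use by a running VM, and client-side caching may hide differences.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_consistent(path, attempts=3):
    """Hash the file several times; True if every digest matches."""
    digests = {sha256_of(path) for _ in range(attempts)}
    return len(digests) == 1
```

Running `sha256_of` on the copy in the Gluster domain and on the copy migrated to NFS would also show whether the two images are byte-identical.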

Comment 2 Michal Skrivanek 2020-06-08 12:06:47 UTC

OK. We will keep this open for a while in case you have additional information; otherwise I don't see anything to do here, assuming the AMD CPU type mentioned in the original description is not really the problem.

Comment 3 RHEL Program Management 2020-06-08 12:06:54 UTC

The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 4 Gobinda Das 2020-08-25 10:22:00 UTC

What version of Gluster was used, and are you using a CentOS-based deployment or ovirt-ng-node?

Comment 5 Stephen Panicho 2020-09-04 02:25:47 UTC

(In reply to Gobinda Das from comment #4)
> What version of Gluster was used, and are you using a CentOS-based deployment or
> ovirt-ng-node?

This was on a CentOS 8.1 machine running oVirt 4.4.0, which I believe uses Gluster 7. It's whatever was pulled in by vdsm-gluster at the time. Unfortunately, I have moved to a different setup since encountering this issue so I'm unable to provide further details.

Comment 6 Gobinda Das 2020-09-07 06:48:14 UTC

We had an issue earlier that was fixed in gluster-7.7 and later.
Please upgrade Gluster to 7.7 or higher and try again.
Ref patch: https://github.com/gluster/glusterfs/issues/1243
I am closing this bug for now; please feel free to reopen it if you still hit the issue after upgrading.
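
A small sketch for checking an installed version against the 7.7 threshold mentioned above. It assumes the version is available as text containing a "major.minor" number (for example, the first line printed by `gluster --version`); the helper names are illustrative.

```python
import re

# Minimum version named in this comment as containing the fix.
MIN_FIXED = (7, 7)


def parse_gluster_version(text):
    """Extract (major, minor) from text such as 'glusterfs 7.5'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def has_fix(text, minimum=MIN_FIXED):
    """True if the parsed version is at or above the minimum."""
    return parse_gluster_version(text) >= minimum
```

Comparing integer tuples rather than strings keeps a version such as 7.10 ordered after 7.7.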
