Bug 1622240 - "Unknown CPU model Broadwell-IBRS-SSBD"
Summary: "Unknown CPU model Broadwell-IBRS-SSBD"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.24
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.7
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks: 1629888
 
Reported: 2018-08-24 20:50 UTC by me
Modified: 2018-11-02 14:31 UTC (History)

Fixed In Version: ovirt-hosted-engine-setup-2.2.27-1.el7ev.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-02 14:31:20 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+


Attachments
hosted engine setup log2 (539.92 KB, text/plain)
2018-08-25 06:54 UTC, me
vdsm.log (4.96 MB, text/plain)
2018-08-27 16:26 UTC, me
agent.log (3.90 MB, text/plain)
2018-08-27 21:00 UTC, me


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1542531 | 0 | unspecified | CLOSED | Allow ServerCPUList and ClusterEmulatedMachines options to be visible over RESTAPI | 2021-02-22 00:41:40 UTC
oVirt gerrit | 93965 | 0 | master | MERGED | virt: get cpu model and machine type from engine | 2021-02-09 06:34:07 UTC
oVirt gerrit | 93968 | 0 | master | MERGED | ovf: wait for OVF_STORE content | 2021-02-09 06:34:07 UTC
oVirt gerrit | 93984 | 0 | ovirt-hosted-engine-setup-2.2 | MERGED | virt: get cpu model and machine type from engine | 2021-02-09 06:34:06 UTC
oVirt gerrit | 93985 | 0 | ovirt-hosted-engine-setup-2.2 | MERGED | ovf: wait for OVF_STORE content | 2021-02-09 06:34:07 UTC

Internal Links: 1542531

Description me 2018-08-24 20:50:56 UTC
Created attachment 1478642 [details]
old log

Description of problem:
I am trying to install hosted-engine. The Ansible script proceeds almost to the end,
then pauses at:
"TASK [Check engine VM health]"
and eventually fails with:
"Engine VM is not running, please check vdsm logs".
The VDSM logs contain the error:
"libvirtError: internal error: Unknown CPU model Broadwell-IBRS-SSBD"

Also, in the oVirt Node webui, the HostedEngine VM is created, but not started. If I
press Run, the following error is shown:

VM START action failed
error: Failed to start domain HostedEngine error: internal error: Unknown CPU model
Broadwell-IBRS-SSBD

I notice a Trello card was created to add similar Skylake & EPYC CPUs, but not
Broadwell:
https://trello.com/c/5nGTgQ9P/89-new-ibrs-and-ssbd-cpu-types#

Does the Broadwell-IBRS-SSBD CPU need to be added to this supported types list?


Version-Release number of selected component (if applicable):
oVirt v4.2.6 second release candidate.  I use oVirt Node.


Steps to Reproduce:
1. On a computer with an Intel i7-6800K CPU (perhaps any Broadwell, and probably any Broadwell-E: 6800K, 6850K, 6900K, or 6950X), try to install the hosted engine using the command: hosted-engine --deploy
2. Choose any options in the wizard

Actual results:
Error: "Engine VM is not running, please check vdsm logs"

Expected results:
hosted-engine is installed & started

Additional info:
Please see the attached hosted-engine setup log.

Comment 1 me 2018-08-25 06:54:46 UTC
Created attachment 1478710 [details]
hosted engine setup log2

Comment 2 Michal Skrivanek 2018-08-25 07:36:33 UTC
This is a problem in create_target_vm.yml in the he-setup code. The Cluster CPU Family name is not the CPU type in the VM configuration and cannot be derived from it by simply removing the "Intel" and "Family" strings. That works for the simple types ("Intel Nehalem Family" -> "Nehalem") but not for the recent ones, such as any of the SSBD types. Also, for the IBRS types it actually uses the wrong model, which will cause problems with live migration in the future.
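
To illustrate the point above, here is a minimal Python sketch. It is not the he-setup Ansible code; naive_model, lookup_model and FAMILY_TO_MODEL are hypothetical names. Joining the remaining words with dashes is an assumption, chosen because it reproduces the failing model name from this report, and the family name "Intel Broadwell IBRS SSBD Family" is likewise assumed. The table deliberately contains only the pair mentioned in this comment; real entries would have to come from the engine's own configuration.

```python
def naive_model(family_name):
    """Derive a model name by dropping 'Intel' and 'Family' (the fragile approach)."""
    words = [w for w in family_name.split() if w not in ("Intel", "Family")]
    # Assumption: the remaining words are joined with dashes.
    return "-".join(words)


# Hypothetical mapping table. Only the pair named in this report is listed;
# a real table would be read from the engine configuration, not typed by hand.
FAMILY_TO_MODEL = {
    "Intel Nehalem Family": "Nehalem",
}


def lookup_model(family_name, table=FAMILY_TO_MODEL):
    """Return the configured model, failing loudly instead of guessing."""
    try:
        return table[family_name]
    except KeyError:
        raise LookupError("no CPU model configured for %r" % family_name)


if __name__ == "__main__":
    for family in ("Intel Nehalem Family", "Intel Broadwell IBRS SSBD Family"):
        print(family, "->", naive_model(family))
```

The string-based version happens to be right for Nehalem and produces a name libvirt does not know for the Broadwell family, while the lookup version refuses to answer when it has no configured entry.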

Comment 3 Yaniv Lavi 2018-08-27 07:36:53 UTC
(In reply to Michal Skrivanek from comment #2)
> This is a problem in create_target_vm.yml in the he-setup code. The Cluster
> CPU Family name is not the CPU type in VM configuration and cannot be
> derived from that by the simple removal of "Intel" and "Family" strings. It
> works for the simple types ("Intel Nehalem Family" -> "Nehalem") but it
> won't work for the recent ones like any of the ssbd types. Also for ibrs
> it's actually using a wrong one and it will cause problems in future with
> live migration

Can we change the name in engine to be according to the same standard as the rest of the CPU family, so this parse would work on that name?

Comment 4 Michal Skrivanek 2018-08-27 09:31:27 UTC
(In reply to Yaniv Lavi from comment #3)
> Can we change the name in engine to be according to the same standard as the
> rest of the CPU family, so this parse would work on that name?

Those are two different things. It's not a matter of simple string substitution; it's a matter of configuration. The CPU Family name is a user-facing name, while the other is a technical detail of the VM's libvirt configuration. The HE code should never have used one as the other; it needs to use the same configuration to map one to the other if needed.

Comment 5 Simone Tiraboschi 2018-08-27 16:13:59 UTC
I posted a patch to correctly fetch the short model string via the REST API (see: 1542531), but I now expect that value to be little more than a placeholder, since at that stage ovirt-ha-agent should already consume the libvirt XML found in the OVF store.
Maybe there is also a race condition when starting the engine VM on the ovirt-ha-agent side.

Comment 6 Simone Tiraboschi 2018-08-27 16:16:44 UTC
Can you please also attach the vdsm and ovirt-ha-agent logs?

Comment 7 me 2018-08-27 16:26:36 UTC
Created attachment 1479014 [details]
vdsm.log

Comment 8 me 2018-08-27 16:29:05 UTC
Please see the attached vdsm.log.  I can't find an ovirt-ha-agent.log.  The only files named 'ovirt-ha-agent*' are executables.  Do you know where I can find ovirt-ha-agent logs?

Comment 9 Simone Tiraboschi 2018-08-27 20:38:17 UTC
(In reply to me from comment #8)
> Do you know where I can find
> ovirt-ha-agent logs?

Thanks, they are under /var/log/ovirt-hosted-engine-ha/*.log
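
For anyone who wants to skim those files for errors before attaching them, the following is a rough Python sketch. It assumes the log records are "::"-delimited in the way the agent.log excerpt quoted further down this thread appears to be; the field names (thread, level, timestamp, module, lineno, logger, message) are my own labels, not documented names, and parse_record and error_records are hypothetical helpers.

```python
import glob

LOG_GLOB = "/var/log/ovirt-hosted-engine-ha/*.log"

FIELDS = ("thread", "level", "timestamp", "module", "lineno", "logger", "message")


def parse_record(line):
    """Split one '::'-delimited record into named fields, or return None."""
    parts = line.rstrip("\n").split("::", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None  # continuation line (e.g. a traceback) or another format
    return dict(zip(FIELDS, parts))


def error_records(pattern=LOG_GLOB):
    """Yield (path, record) for every ERROR record in files matching pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            for line in handle:
                record = parse_record(line)
                if record and record["level"] == "ERROR":
                    yield path, record


if __name__ == "__main__":
    for path, record in error_records():
        print(path, record["timestamp"], record["message"])
```

Lines that do not split into seven fields are skipped rather than guessed at, so multi-line tracebacks are ignored by this sketch.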

Comment 10 me 2018-08-27 21:00:55 UTC
Created attachment 1479060 [details]
agent.log

Comment 11 Simone Tiraboschi 2018-08-28 07:27:43 UTC
(In reply to me from comment #10)
> Created attachment 1479060 [details]
> agent.log

Thanks,
the real issue comes from here:

MainThread::ERROR::2018-08-27 05:16:06,170::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

The local bootstrap engine VM was stopped before it had time to fill the OVF_STORE archive.
We definitely have a race condition there.
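
A common way to close this kind of race is to poll for the expected state, with a timeout, before moving on. The Python sketch below shows only that generic pattern; it is not the merged "ovf: wait for OVF_STORE content" patch, wait_until is a hypothetical helper, and the default timeout and interval are arbitrary values picked for illustration.

```python
import time


def wait_until(predicate, timeout=600.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the predicate succeeded, False if the deadline passed.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In this scenario the predicate would be some check that the OVF_STORE already holds the engine VM configuration, and the bootstrap VM would be shut down only after wait_until returns True. How that check is actually performed is outside the scope of this sketch.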

Comment 12 me 2018-09-16 17:29:17 UTC
I'm happy to help test a fix for this using my system, which has a Broadwell CPU, if you don't have one available.  Are there any instructions for how to build a bug-fix branch?  Many thanks

Comment 13 Simone Tiraboschi 2018-09-17 07:04:24 UTC
(In reply to me from comment #12)
> I'm happy to help test a fix for this using my system, which has a Broadwell
> CPU, if you don't have one available.  Are there any instructions for how to
> build a bug-fix branch?  Many thanks

autoreconf -ivf
./configure
make dist
rpmbuild -tb *.tar

Alternatively, we also have a repo with unreleased nightly-built RPMs:
https://resources.ovirt.org/pub/yum-repo/ovirt-release42-snapshot.rpm

Comment 14 Sandro Bonazzola 2018-09-24 07:22:48 UTC
Any update about the testing of this?

Comment 15 me 2018-09-24 09:17:29 UTC
This bug is fixed.  You can close it.

I still can't deploy the hosted engine because of a different error, but I haven't had time to collate the details into a new bug ticket and probably won't for a week or so.

Comment 16 meital avital 2018-10-08 07:35:03 UTC
Moving to verified according to comment 15

Comment 17 Sandro Bonazzola 2018-11-02 14:31:20 UTC
This bug is included in the oVirt 4.2.7 release, published on November 2nd, 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

