Bug 1622240

Summary: "Unknown CPU model Broadwell-IBRS-SSBD"
Product: [oVirt] ovirt-hosted-engine-setup Reporter: me
Component: General Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: meital avital <mavital>
Severity: high Docs Contact:
Priority: high    
Version: 2.2.24 CC: bugs, lsvaty, me, michal.skrivanek, ylavi
Target Milestone: ovirt-4.2.7 Flags: rule-engine: ovirt-4.2+
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.27-1.el7ev.noarch.rpm Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-02 14:31:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629888    
Attachments:
Description                Flags
hosted engine setup log2   none
vdsm.log                   none
agent.log                  none

Description me 2018-08-24 20:50:56 UTC
Created attachment 1478642 [details]
old log

Description of problem:
Trying to install hosted-engine. It proceeds to near the end of the Ansible script,
then pauses at:
"TASK [Check engine VM health]"
then eventually errors:
"Engine VM is not running, please check vdsm logs". 
The VDSM logs have error:
"libvirtError: internal error: Unknown CPU model Broadwell-IBRS-SSBD"

Also, in the oVirt Node webui, the HostedEngine VM is created, but not started. If I
press Run, the following error is shown:

VM START action failed
error: Failed to start domain HostedEngine error: internal error: Unknown CPU model
Broadwell-IBRS-SSBD

I notice a Trello card was created to add similar Skylake & EPYC CPUs, but not
Broadwell:
https://trello.com/c/5nGTgQ9P/89-new-ibrs-and-ssbd-cpu-types#

Does the Broadwell-IBRS-SSBD CPU need to be added to this supported types list?


Version-Release number of selected component (if applicable):
oVirt v4.2.6 second release candidate.  I use oVirt Node.


Steps to Reproduce:
1. On a computer with an Intel i7-6800K CPU (probably any Broadwell-E: 6800K, 6850K, 6900K, or 6950X; perhaps any Broadwell), try to install hosted-engine using the command: hosted-engine --deploy
2. Choose any options during the wizard

Actual results:
Error: "Engine VM is not running, please check vdsm logs"

Expected results:
hosted-engine is installed & started

Additional info:
Please see the attached hosted-engine setup log.

Comment 1 me 2018-08-25 06:54:46 UTC
Created attachment 1478710 [details]
hosted engine setup log2

Comment 2 Michal Skrivanek 2018-08-25 07:36:33 UTC
This is a problem in create_target_vm.yml in the he-setup code. The Cluster CPU Family name is not the CPU type in the VM configuration and cannot be derived from it by simply removing the "Intel" and "Family" strings. That works for the simple types ("Intel Nehalem Family" -> "Nehalem") but it won't work for the recent ones, such as any of the ssbd types. Also, for ibrs it is actually using the wrong type, and that will cause problems with live migration in the future.
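
To illustrate the failure described above, here is a minimal sketch of a purely string-based derivation. It is a hypothetical reconstruction, not the actual create_target_vm.yml logic: the function name naive_cpu_model, the hyphen-joining step, and the family name "Intel Broadwell IBRS SSBD Family" are assumptions chosen so that the result matches the model string in the reported libvirt error.

```python
def naive_cpu_model(cluster_cpu_family: str) -> str:
    """Hypothetical reconstruction: drop "Intel" and "Family", join the rest.

    This only mirrors the kind of string manipulation described in the
    comment above; it is not the he-setup code itself.
    """
    tokens = [t for t in cluster_cpu_family.split() if t not in ("Intel", "Family")]
    return "-".join(tokens)


# Simple family names happen to yield a usable model name.
print(naive_cpu_model("Intel Nehalem Family"))  # Nehalem

# Assumed family name: the result is the string libvirt rejected in this bug.
print(naive_cpu_model("Intel Broadwell IBRS SSBD Family"))  # Broadwell-IBRS-SSBD
```

The second result is a display name with its words rearranged, not a CPU model libvirt knows about, which is consistent with the "Unknown CPU model" error.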

Comment 3 Yaniv Lavi 2018-08-27 07:36:53 UTC
(In reply to Michal Skrivanek from comment #2)
> This is a problem in create_target_vm.yml in the he-setup code. The Cluster
> CPU Family name is not the CPU type in VM configuration and cannot be
> derived from that by the simple removal of "Intel" and "Family" strings. It
> works for the simple types ("Intel Nehalem Family" -> "Nehalem") but it
> won't work for the recent ones like any of the ssbd types. Also for ibrs
> it's actually using a wrong one and it will cause problems in future with
> live migration

Can we change the name in the engine to follow the same standard as the rest of the CPU families, so that this parsing would work on that name?

Comment 4 Michal Skrivanek 2018-08-27 09:31:27 UTC
(In reply to Yaniv Lavi from comment #3)
> Can we change the name in engine to be according to the same standard as the
> rest of the CPU family, so this parse would work on that name?

Those are two different things. It's not a matter of simple string substitutions; it's configuration. The CPU Family name is a user-facing name, while the other is a technical detail of the VM's libvirt configuration. The HE code should never have used one as the other; it needs to use the same configuration to map one to the other if needed.
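
The distinction above can be sketched as an explicit lookup instead of string manipulation. This is only an illustration: the table name, the function name, and the Broadwell entry are assumptions (the real mapping lives in the engine's configuration and should be read from there); only the Nehalem pair comes from this report.

```python
# Illustrative table: user-facing Cluster CPU Family name -> libvirt CPU model.
CPU_FAMILY_TO_LIBVIRT_MODEL = {
    "Intel Nehalem Family": "Nehalem",  # pair quoted in this bug report
    # Assumed value for illustration only; verify against the engine config.
    "Intel Broadwell IBRS SSBD Family": "Broadwell,+spec-ctrl,+ssbd",
}


def libvirt_cpu_model(family: str, table=CPU_FAMILY_TO_LIBVIRT_MODEL) -> str:
    """Look up the libvirt model for a family name; never guess from the name."""
    try:
        return table[family]
    except KeyError:
        raise ValueError(f"no libvirt CPU model configured for {family!r}") from None


print(libvirt_cpu_model("Intel Nehalem Family"))
```

Failing loudly on an unknown family name is a deliberate choice here: a missing entry surfaces at lookup time rather than as a VM that libvirt refuses to start.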

Comment 5 Simone Tiraboschi 2018-08-27 16:13:59 UTC
I posted a patch to correctly fetch the short model string via the REST API (see: 1542531), but I now expect that to be little more than a placeholder, since at that stage ovirt-ha-agent should already be consuming the libvirt XML found in the OVF store.
There may also be a race condition on the ovirt-ha-agent side when starting the engine VM.

Comment 6 Simone Tiraboschi 2018-08-27 16:16:44 UTC
Can you please also attach the vdsm and ovirt-ha-agent logs?

Comment 7 me 2018-08-27 16:26:36 UTC
Created attachment 1479014 [details]
vdsm.log

Comment 8 me 2018-08-27 16:29:05 UTC
Please see the attached vdsm.log.  I can't find an ovirt-ha-agent.log.  The only files named 'ovirt-ha-agent*' are executables.  Do you know where I can find the ovirt-ha-agent logs?

Comment 9 Simone Tiraboschi 2018-08-27 20:38:17 UTC
(In reply to me from comment #8)
> Do you know where I can find
> ovirt-ha-agent logs?

Thanks, /var/log/ovirt-hosted-engine-ha/*.log
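
For anyone collecting these logs, a small sketch that scans the files under that path for error entries may help. The function name find_error_lines and the "::ERROR::" marker are assumptions; the marker is based on the level field as it appears in agent log lines such as "MainThread::ERROR::...".

```python
import glob


def find_error_lines(log_glob="/var/log/ovirt-hosted-engine-ha/*.log",
                     marker="::ERROR::"):
    """Return (path, line) pairs for every log line containing the marker."""
    hits = []
    for path in sorted(glob.glob(log_glob)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as handle:
            for line in handle:
                if marker in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Calling find_error_lines() on the host and printing each pair gives a quick list of errors to paste into a bug report; reading the logs may require root privileges.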

Comment 10 me 2018-08-27 21:00:55 UTC
Created attachment 1479060 [details]
agent.log

Comment 11 Simone Tiraboschi 2018-08-28 07:27:43 UTC
(In reply to me from comment #10)
> Created attachment 1479060 [details]
> agent.log

Thanks,
the real issue comes from here:

MainThread::ERROR::2018-08-27 05:16:06,170::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

The local bootstrap engine VM was stopped before it had time to fill the OVF_STORE archive.
We definitely have a race condition there.
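
One generic way to close a race like this is to poll for a readiness condition, with a timeout, before tearing down the bootstrap VM. The sketch below shows only that polling pattern; wait_until and its parameters are hypothetical, the readiness check is left to the caller, and this is not necessarily how the actual fix was implemented.

```python
import time


def wait_until(check, timeout=600.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True if the condition was met, False if the deadline passed.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass a function that reports whether the OVF_STORE content is available (a hypothetical check, not shown here) and stop the bootstrap VM only after wait_until returns True.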

Comment 12 me 2018-09-16 17:29:17 UTC
I'm happy to help test a fix for this using my system, which has a Broadwell CPU, if you don't have one available.  Are there any instructions for how to build a bug-fix branch?  Many thanks

Comment 13 Simone Tiraboschi 2018-09-17 07:04:24 UTC
(In reply to me from comment #12)
> I'm happy to help test a fix for this using my system, which has a Broadwell
> CPU, if you don't have one available.  Are there any instructions for how to
> build a bug-fix branch?  Many thanks

autoreconf -ivf
./configure
make dist
rpmbuild -tb *.tar

Otherwise, we also have a repo with unreleased nightly-built RPMs:
https://resources.ovirt.org/pub/yum-repo/ovirt-release42-snapshot.rpm

Comment 14 Sandro Bonazzola 2018-09-24 07:22:48 UTC
Any update about the testing of this?

Comment 15 me 2018-09-24 09:17:29 UTC
This bug is fixed.  You can close.

Still can't deploy hosted engine because of a different error, but didn't have time to collate the details into a new bug ticket, prob won't have time for a week or so.

Comment 16 meital avital 2018-10-08 07:35:03 UTC
Moving to verified according to comment 15

Comment 17 Sandro Bonazzola 2018-11-02 14:31:20 UTC
This bug is included in the oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.