Created attachment 1478642 [details]
old log

Description of problem:
I am trying to install hosted-engine. The installer proceeds to near the end of the Ansible script, then pauses at:
"TASK [Check engine VM health]"
and eventually fails with:
"Engine VM is not running, please check vdsm logs"

The VDSM logs contain the error:
"libvirtError: internal error: Unknown CPU model Broadwell-IBRS-SSBD"

Also, in the oVirt Node web UI, the HostedEngine VM is created but not started. If I press Run, the following error is shown:
VM START action failed
error: Failed to start domain HostedEngine
error: internal error: Unknown CPU model Broadwell-IBRS-SSBD

I notice a Trello card was created to add similar Skylake & EPYC CPUs, but not Broadwell:
https://trello.com/c/5nGTgQ9P/89-new-ibrs-and-ssbd-cpu-types#

Does the Broadwell-IBRS-SSBD CPU need to be added to this list of supported types?

Version-Release number of selected component (if applicable):
oVirt v4.2.6, second release candidate. I use oVirt Node.

Steps to Reproduce:
1. On a computer with an Intel i7-6800K CPU (perhaps any Broadwell, probably any Broadwell-E: 6800K, 6850K, 6900K, and 6950X), try to install hosted-engine using the command:
   hosted-engine --deploy
2. Choose any options during the wizard.

Actual results:
Error: "Engine VM is not running, please check vdsm logs"

Expected results:
hosted-engine is installed and started.

Additional info:
Please see the attached hosted-engine setup log.
Created attachment 1478710 [details] hosted engine setup log2
This is a problem in create_target_vm.yml in the he-setup code. The Cluster CPU Family name is not the CPU type used in the VM configuration, and it cannot be derived from it by simply removing the "Intel" and "Family" strings. That works for the simple types ("Intel Nehalem Family" -> "Nehalem"), but it won't work for the recent ones, such as any of the ssbd types. Also, for ibrs it is actually using the wrong one, which will cause problems with live migration in the future.
(In reply to Michal Skrivanek from comment #2)
> This is a problem in create_target_vm.yml in the he-setup code. The Cluster
> CPU Family name is not the CPU type in VM configuration and cannot be
> derived from that by the simple removal of "Intel" and "Family" strings. It
> works for the simple types ("Intel Nehalem Family" -> "Nehalem") but it
> won't work for the recent ones like any of the ssbd types. Also for ibrs
> it's actually using a wrong one and it will cause problems in future with
> live migration

Can we change the name in the engine to follow the same standard as the rest of the CPU families, so that this parsing would work on that name?
(In reply to Yaniv Lavi from comment #3)
> Can we change the name in engine to be according to the same standard as the
> rest of the CPU family, so this parse would work on that name?

Those are two different things. It's not a matter of simple string substitution; it's configuration. The CPU Family name is a user-facing name, while the other is a technical detail of the VM's libvirt configuration. The HE code should never have used one as the other; if needed, it must use the same configuration to map one to the other.
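
To illustrate the distinction, here is a minimal Python sketch; it is not the actual he-setup or engine code. naive_model() mimics the string stripping described above, while mapped_model() uses an explicit lookup. CPU_MODEL_MAP, the family name "Intel Broadwell IBRS SSBD Family", and the model it maps to are all assumptions made for illustration, not values taken from the engine's configuration.

```python
# Hypothetical mapping from the user-facing cluster CPU family name to the
# libvirt CPU model. The entries are illustrative assumptions only; a real
# implementation would read them from the engine's own configuration.
CPU_MODEL_MAP = {
    "Intel Nehalem Family": "Nehalem",
    "Intel Broadwell IBRS SSBD Family": "Broadwell-IBRS",  # assumed value
}


def naive_model(family_name):
    """Derive a model by dropping 'Intel' and 'Family' (the fragile approach)."""
    words = [w for w in family_name.split() if w not in ("Intel", "Family")]
    return "-".join(words)


def mapped_model(family_name):
    """Look the model up explicitly and fail loudly for unknown families."""
    try:
        return CPU_MODEL_MAP[family_name]
    except KeyError:
        raise ValueError("no CPU model configured for %r" % family_name)


if __name__ == "__main__":
    for name in CPU_MODEL_MAP:
        print(name, "->", naive_model(name), "vs", mapped_model(name))
```

With these assumed entries, the naive approach yields "Nehalem" for the simple case but "Broadwell-IBRS-SSBD" for the newer family, which is the string libvirt rejected in this report.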
I posted a patch to correctly fetch the short model string via the REST API (see: 1542531), but I now expect that value to be little more than a placeholder, since ovirt-ha-agent at that stage should already consume the libvirt XML found in the OVF store. There may also be a race condition when starting the engine VM on the ovirt-ha-agent side.
Can you please also attach the vdsm and ovirt-ha-agent logs?
Created attachment 1479014 [details] vdsm.log
Please see the attached vdsm.log. I can't find an ovirt-ha-agent.log. The only files named 'ovirt-ha-agent*' are executables. Do you know where I can find the ovirt-ha-agent logs?
(In reply to me from comment #8)
> Do you know where I can find
> ovirt-ha-agent logs?

Thanks,
/var/log/ovirt-hosted-engine-ha/*.log
Created attachment 1479060 [details] agent.log
(In reply to me from comment #10)
> Created attachment 1479060 [details]
> agent.log

Thanks, the real issue comes from here:

MainThread::ERROR::2018-08-27 05:16:06,170::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

The bootstrap local engine VM was stopped before it had time to fill the OVF_STORE archive. We definitely have a race condition there.
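
For anyone who wants to check their own agent.log for the same symptom, here is a rough Python sketch. The field layout (thread, level, timestamp, module, line number, logger, then "(function) message") is inferred only from the single line quoted above, so other lines may not match; the names AgentLogEntry, parse_agent_line and find_ovf_fallbacks are hypothetical helpers, not part of ovirt-hosted-engine-ha.

```python
import re
from collections import namedtuple

# Field names are assumptions based on the one log line quoted in this bug.
AgentLogEntry = namedtuple(
    "AgentLogEntry",
    "thread level timestamp module lineno logger function message",
)

# The last "::"-separated field looks like "(function_name) free text".
_TAIL = re.compile(r"\((?P<function>[^)]*)\)\s*(?P<message>.*)", re.DOTALL)


def parse_agent_line(line):
    """Split one agent.log line into fields; return None if it does not match."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    tail = _TAIL.match(parts[6])
    if tail is None:
        return None
    return AgentLogEntry(
        *parts[:6],
        function=tail.group("function"),
        message=tail.group("message")
    )


def find_ovf_fallbacks(lines):
    """Yield ERROR entries whose message mentions OVF_STORE."""
    for line in lines:
        entry = parse_agent_line(line)
        if entry and entry.level == "ERROR" and "OVF_STORE" in entry.message:
            yield entry
```

It could be used by opening one of the files under /var/log/ovirt-hosted-engine-ha/ and passing the file object to find_ovf_fallbacks(); each entry returned carries the timestamp, which helps when comparing against the time the bootstrap VM was stopped.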
I'm happy to help test a fix for this using my system, which has a Broadwell CPU, if you don't have one available. Are there any instructions for how to build a bug-fix branch? Many thanks
(In reply to me from comment #12)
> I'm happy to help test a fix for this using my system, which has a Broadwell
> CPU, if you don't have one available. Are there any instructions for how to
> build a bug-fix branch? Many thanks

autoreconf -ivf
./configure
make dist
rpmbuild -tb *.tar

Alternatively, we also have a repo with unreleased, nightly-built RPMs:
https://resources.ovirt.org/pub/yum-repo/ovirt-release42-snapshot.rpm
Any update on testing this?
This bug is fixed. You can close it. I still can't deploy hosted engine because of a different error, but I haven't had time to collate the details into a new bug ticket, and I probably won't have time for a week or so.
Moving to verified according to comment 15
This bugzilla is included in oVirt 4.2.7 release, published on November 2nd 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.