Bug 1783180
Summary: | Cannot deploy HostedEngine on EPYC processor: Host CPU does not provide required features: virt-ssbd | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Juan Orti <jortialc> |
Component: | vdsm | Assignee: | Nobody <nobody> |
Status: | CLOSED ERRATA | QA Contact: | Nikolai Sednev <nsednev> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | 4.3.6 | CC: | anderson, lsurette, michal.skrivanek, mkalinin, mnl, mtessun, o.freyermuth, rdlugyhe, rhbugzilla, srevivo, stowellt, ycui |
Target Milestone: | ovirt-4.4.0 | Keywords: | TestOnly, Triaged |
Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | rhv-4.4.0-29 | Doc Type: | Bug Fix |
Doc Text: | Previously, a problem with AMD EPYC CPUs that were missing the virt-ssbd CPU flag prevented Hosted Engine installation. The current release fixes this issue. | ||
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-08-04 13:27:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1744281 | ||
Bug Blocks: |
Description
Juan Orti
2019-12-13 08:32:29 UTC
Maybe duplicate of bug 1745181 ?

I am wondering if this RHEL 7.8 bug is going to help resolve this issue: BZ#1744281.

(In reply to Marina Kalinin from comment #6)
> I am wondering if this RHEL 7.8 bug is going to help resolve this issue:
> BZ#1744281.

It should. And it should be testable on RHEL 8 too.

Worked for me on
ocelot05 ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7451 24-Core Processor
Stepping: 2
CPU MHz: 2898.364
CPU max MHz: 2300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4599.43
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-5,24-29
NUMA node1 CPU(s): 6-11,30-35
NUMA node2 CPU(s): 12-17,36-41
NUMA node3 CPU(s): 18-23,42-47

rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
Deployed over NFS.

Adding a "me too" comment for oVirt 4.4 hosted engine, new deployment on EPYC CPUs. The HostedEngineLocal VM runs fine during setup because it requires 'amd-ssbd' instead of 'virt-ssbd'. After copying everything to shared storage, setup tries to bring up the HostedEngine VM with a requirement for 'virt-ssbd', which is unavailable with qemu on CentOS 8.1 for EPYC CPUs, according to 'virsh domcapabilities'. So HostedEngine can't start, and you get the incompatible CPU error from the original report after the deploy finally fails out at "Waiting for VM status". You can 'virsh dumpxml HostedEngine | sed 's/virt-ssbd/amd-ssbd/' >/tmp/he.xml ; virsh create /tmp/he.xml' and the HostedEngine VM comes right up in a temporary fashion, allowing you to change the cluster CPU type from 'Secure AMD EPYC' to 'AMD EPYC'. After that change, restart the whole kit to get a working, but hacky, 4.4 cluster running.
Probably not the fix you'd want for production.

On CentOS 7.8 w/ EPYC, virt-ssbd is available, and I can do a fresh oVirt 4.3 hosted engine deploy to the hosts:

centos7 ~]# virsh domcapabilities | grep ssbd
<feature policy='require' name='virt-ssbd'/>

On CentOS 8.1 w/ EPYC (same servers of course), virt-ssbd is not available/usable but oVirt 4.4 still configures the HE VM to require it, rendering it unstartable:

centos8 ~]# virsh domcapabilities | grep ssbd
<feature policy='require' name='amd-ssbd'/>

~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7302 16-Core Processor
Stepping: 0
CPU MHz: 3287.912
BogoMIPS: 5989.27
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

4.4 is on el8 and there the relevant bug is bug 1797092

(In reply to Mark R. from comment #12)
> You can 'virsh dumpxml HostedEngine | sed 's/virt-ssbd/amd-ssbd/'
> >/tmp/he.xml ; virsh create /tmp/he.xml' and the HostedEngine VM comes right
> up in a temporary fashion, allowing you to change the cluster CPU type from
> 'Secure AMD EPYC' to 'AMD EPYC'. After that change, restart the whole kit to
> get a working, but hacky, 4.4 cluster running. Probably not the fix you'd
> want for production.

Great! This saved my day. Being new to all of this, could you give me a hint what's left to do to complete the "hosted-engine --deploy" invocation which was left unfinished due to the problem?

(In reply to Michael Lipp from comment #15)
> (In reply to Mark R. from comment #12)
> > You can 'virsh dumpxml HostedEngine | sed 's/virt-ssbd/amd-ssbd/'...
>
> Great! This saved my day.

As there is still some activity around this bug report, I should maybe mention that Mark R.'s comment put me on the right track (it made me understand the problem). The eventual solution in my case, however, is the workaround described in https://bugzilla.redhat.com/show_bug.cgi?id=1798004#c16.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246
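
For readers who land here before applying the erratum, the temporary workaround from comment #12 can be sketched as a small shell helper. This is only an illustration of the one-liner quoted in the thread, not a supported fix: the function names swap_ssbd_flag and start_with_amd_ssbd are made up for this sketch, and it assumes the affected domain is named HostedEngine and that virsh on the host can read and create domains.

```shell
#!/bin/sh
# Sketch of the temporary workaround from comment #12.
# Function names are hypothetical; only the virsh dumpxml / sed / virsh create
# steps come from the thread.

# Read libvirt domain XML on stdin and replace the virt-ssbd CPU feature
# name with amd-ssbd (first occurrence on each line, as in the original sed).
swap_ssbd_flag() {
    sed 's/virt-ssbd/amd-ssbd/'
}

# Start a transient copy of the named domain from the rewritten XML.
# Nothing is persisted: the engine will generate virt-ssbd again until the
# cluster CPU type is changed or the fixed packages are installed.
start_with_amd_ssbd() {
    domain="$1"
    xml="$(mktemp)" || return 1
    virsh dumpxml "$domain" | swap_ssbd_flag > "$xml" &&
        virsh create "$xml"
    status=$?
    rm -f "$xml"
    return "$status"
}
```

On an affected host this would be invoked as `start_with_amd_ssbd HostedEngine` after the deploy fails at "Waiting for VM status". As the comment notes, the VM only comes up in a temporary fashion, so the cluster CPU type still has to be changed from 'Secure AMD EPYC' to 'AMD EPYC' afterwards.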