Bug 2030006

Summary: [RHEL8.2Z - OSP16.1] Wrongly support "Cascadelake-Server" on physical host without avx512_vnni cpu flag
Product: Red Hat Enterprise Linux 8
Reporter: Priscila <pveiga>
Component: libvirt
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Luyao Huang <lhuang>
Severity: high
Docs Contact:
Priority: medium
Version: 8.2
CC: ailan, cmayapka, dhill, dyuan, gveitmic, jdenemar, jiyan, lhuang, lijin, lmen, rbalakri, smooney, vasanth.mohanraj, virt-bugs, virt-maint, xuzhang, yalzhang
Target Milestone: rc
Keywords: Triaged, Upstream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1761678
Environment:
Last Closed: 2022-07-04 23:23:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1761678
Bug Blocks: 1840010

Comment 4 John Ferlan 2021-12-08 18:24:23 UTC
Updating needinfo to Jiri, who resolved bug 1761678, which this bug was cloned from.

Comment 5 Jiri Denemark 2021-12-10 12:35:59 UTC
I'm confused. The fix was already backported in libvirt-6.0.0-21 and
released with RHEL-AV 8.2.1 in July 2020. See bug 1840010. So I would say
there's nothing to do here.

But just to be sure... what issues do you see and with what libvirt release
exactly?

Comment 6 David Hill 2021-12-22 17:38:03 UTC
We get the following failure when migrating from cascadelake to skylake:

2021-11-23 11:21:07.243 7 INFO nova.compute.manager [req-a8f868ee-6dff-49b1-aa6e-bc08b6b56014 - - - - -] [instance: e9cd1ff8-56f3-496e-a948-6d90c5532487] During the sync_power process the instance has moved from host overcloud-novacompute-8.localdomain to host overcloud-novacompute-7.localdomain
2021-11-23 11:21:13.232 7 INFO nova.compute.manager [req-582a0e19-5770-43c4-a48a-9d9a207507db f04d44cea47a4197b14111e32f22f3c8 7ff9802d352043b0b0f6648441b12324 - default default] [instance: e9cd1ff8-56f3-496e-a948-6d90c5532487] Post operation of migration started
2021-11-23 11:21:30.561 7 WARNING nova.compute.manager [req-b20eb4d0-27c6-4925-81a6-630741b33996 1edce6bc10fa4a13bebea68dd17ce68e 3aa0f6362c874915b069ebddb82aea3f - default default] [instance: e9cd1ff8-56f3-496e-a948-6d90c5532487] Received unexpected event network-vif-plugged-a90fe5f2-012e-4e65-b3e3-48574f729860 for instance with vm_state active and task_state None.
2021-11-23 11:21:32.615 7 WARNING nova.compute.manager [req-8ae6142f-acdc-4213-a568-58a7ad826cfb 1edce6bc10fa4a13bebea68dd17ce68e 3aa0f6362c874915b069ebddb82aea3f - default default] [instance: e9cd1ff8-56f3-496e-a948-6d90c5532487] Received unexpected event network-vif-plugged-a90fe5f2-012e-4e65-b3e3-48574f729860 for instance with vm_state active and task_state None.
2021-11-23 11:33:36.592 7 INFO nova.virt.libvirt.driver [req-310c7a72-80ba-41e0-ad90-0e538d8cf291 f04d44cea47a4197b14111e32f22f3c8 7ff9802d352043b0b0f6648441b12324 - default default] Instance launched has CPU info: {"arch": "x86_64", "model": "Cascadelake-Server", "vendor": "Intel", "topology": {"cells": 2, "sockets": 1, "cores": 24, "threads": 2}, "features": ["tsc-deadline", "mca", "acpi", "apic", "stibp", "smep", "pat", "monitor", "mce", "xsaveopt", "bmi2", "vme", "mpx", "avx512f", "mtrr", "rdctl-no", "vmx", "3dnowprefetch", "dtes64", "avx512dq", "rdtscp", "avx2", "xgetbv1", "cmov", "avx512cd", "sse4.1", "rdrand", "intel-pt", "tsc", "erms", "pni", "cx8", "cx16", "xtpr", "ht", "tsc_adjust", "clflushopt", "fpu", "xsavec", "pku", "bmi1", "md-clear", "invpcid", "avx512vl", "tm", "arat", "skip-l1dfl-vmentry", "ds_cpl", "adx", "smap", "ss", "clflush", "syscall", "fsgsbase", "sse4.2", "spec-ctrl", "avx512bw", "nx", "lahf_lm", "msr", "de", "pse36", "clwb", "pdpe1gb", "fxsr", "est", "arch-capabilities", "f16c", "aes", "pbe", "abm", "ssbd", "ds", "rtm", "pse", "mds-no", "x2apic", "dca", "sep", "smx", "pcid", "pclmuldq", "hle", "popcnt", "fma", "sse", "ssse3", "pge", "lm", "pdcm", "tm2", "invtsc", "avx", "tsx-ctrl", "xsaves", "mmx", "ibrs-all", "rdseed", "movbe", "pae", "xsave", "sse2", "avx512vnni"]}
2021-11-23 11:33:36.595 7 ERROR nova.virt.libvirt.driver [req-310c7a72-80ba-41e0-ad90-0e538d8cf291 f04d44cea47a4197b14111e32f22f3c8 7ff9802d352043b0b0f6648441b12324 - default default] CPU doesn't have compatibility.

even with:

        nova::compute::libvirt::libvirt_cpu_model: 'Skylake-Server-IBRS'

set.

Comment 7 yalzhang@redhat.com 2022-01-10 05:37:11 UTC
(In reply to David Hill from comment #6)
> "mmx", "ibrs-all", "rdseed", "movbe", "pae", "xsave", "sse2", "avx512vnni"]}
> 2021-11-23 11:33:36.595 7 ERROR nova.virt.libvirt.driver
> [req-310c7a72-80ba-41e0-ad90-0e538d8cf291 f04d44cea47a4197b14111e32f22f3c8
> 7ff9802d352043b0b0f6648441b12324 - default default] CPU doesn't have
> compatibility.

Cascadelake is newer than Skylake, and avx512vnni is not supported on Skylake, so I think this is the expected result.

> even with:
> 
>         nova::compute::libvirt::libvirt_cpu_model: 'Skylake-Server-IBRS'
> 
> set.
Do you mean that migrating from Cascadelake-Server to Skylake failed with the guest XML below? I will try to reproduce it when I get the appropriate hardware.
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Skylake-Server-IBRS</model>
  </cpu>
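
The compatibility argument in this comment can be illustrated with a small Python sketch. It takes an abbreviated copy of the "CPU info" JSON from the nova log in comment #6 and reports which of those features a destination host would lack. The destination feature set is a hypothetical stand-in for a Skylake host and is not taken from this bug; the function name is likewise only illustrative.

```python
import json

# Abbreviated copy of the "CPU info" JSON from the nova log in comment #6;
# only a handful of the logged features are kept for readability.
SOURCE_CPU_INFO = json.loads("""
{"arch": "x86_64", "model": "Cascadelake-Server", "vendor": "Intel",
 "features": ["avx512f", "avx512dq", "avx512cd", "avx512bw", "avx512vl",
              "avx512vnni", "sse4.2", "avx2"]}
""")

# ASSUMPTION: hypothetical feature set for a Skylake destination host.
# It is not data from this bug; it simply omits avx512vnni, as this comment describes.
DEST_HOST_FEATURES = {
    "avx512f", "avx512dq", "avx512cd", "avx512bw", "avx512vl",
    "sse4.2", "avx2",
}


def missing_on_destination(cpu_info, dest_features):
    """Return the source CPU features that the destination does not offer."""
    return sorted(set(cpu_info["features"]) - set(dest_features))


missing = missing_on_destination(SOURCE_CPU_INFO, DEST_HOST_FEATURES)
print("Features missing on destination:", missing)
```

A non-empty result means that a guest exposing the full source feature list could not run unchanged on the destination, which matches the "CPU doesn't have compatibility" error quoted above.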

Comment 9 Germano Veit Michel 2022-03-24 04:20:35 UTC
(In reply to David Hill from comment #6)
> 7ff9802d352043b0b0f6648441b12324 - default default] Instance launched has
> CPU info: {"arch": "x86_64", "model": "Cascadelake-Server", ......
> 
> even with:
> 
>         nova::compute::libvirt::libvirt_cpu_model: 'Skylake-Server-IBRS'

If nova set the CPU to Skylake, why is the log above saying it launched an instance with Cascadelake? Isn't this nova launching a VM with the wrong CPU?

I've just tested the hypothesis in comment #7, and it works for me:

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Skylake-Server-noTSX-IBRS</model>

Migrates fine from one host machine to another, both being Cascadelake-Server-noTSX.
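
One way to check which CPU model a guest was actually defined with is to read the <cpu> element of its domain XML (for example, the output of virsh dumpxml). The following is a minimal Python sketch of that check; it uses the complete <cpu> fragment from comment #7 as sample input, and the helper name is only illustrative.

```python
import xml.etree.ElementTree as ET

# Sample input: the <cpu> fragment quoted in comment #7.
GUEST_CPU_XML = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Skylake-Server-IBRS</model>
</cpu>
"""


def guest_cpu_model(xml_text):
    """Extract CPU mode, model and fallback policy from guest XML.

    Accepts either a bare <cpu> fragment or a full <domain> document.
    """
    root = ET.fromstring(xml_text)
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None:
        return None
    model = cpu.find("model")
    return {
        "mode": cpu.get("mode"),
        "model": model.text if model is not None else None,
        "fallback": model.get("fallback") if model is not None else None,
    }


info = guest_cpu_model(GUEST_CPU_XML)
print(info)
```

If the reported model differs from the one configured in nova, the guest was started with a different CPU definition than intended.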

David, could you please provide more information so we can move on with this bug or close it?

Thanks!

Comment 10 Germano Veit Michel 2022-06-21 23:00:05 UTC
I think this could be another instance of the problem described in this KCS:
https://access.redhat.com/solutions/2891431

cpu_model_extra_flags needs to be set in nova, not just cpu_model.

So I'm afraid this is not a bug, just somewhat incorrect configuration.
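
A quick way to spot this kind of configuration is to check whether the [libvirt] section of nova.conf sets cpu_model without cpu_model_extra_flags. The Python sketch below does that for a sample configuration; the sample text and the helper name are assumptions for illustration and are not taken from the affected environment. Which extra flags are appropriate depends on the hosts involved and is not shown here.

```python
import configparser

# ASSUMPTION: illustrative nova.conf content, not from the affected deployment.
SAMPLE_NOVA_CONF = """
[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS
"""


def libvirt_cpu_settings(conf_text):
    """Return cpu_model and cpu_model_extra_flags from the [libvirt] section."""
    # Interpolation is disabled so '%' characters in other options are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["libvirt"] if parser.has_section("libvirt") else {}
    return {
        "cpu_model": section.get("cpu_model"),
        "cpu_model_extra_flags": section.get("cpu_model_extra_flags"),
    }


settings = libvirt_cpu_settings(SAMPLE_NOVA_CONF)
only_model_set = bool(settings["cpu_model"]) and not settings["cpu_model_extra_flags"]
print(settings, "cpu_model set without extra flags:", only_model_set)
```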

Can someone still reproduce this? If not, I suggest we close this bug.