The "topoext" flag may crash guests if used on older CPU models. We need to check that the mode="host-model" code won't trigger the same crash, and make sure it does not enable "topoext" by default, just as "-cpu host" no longer does.

See equivalent QEMU commit:

    commit 7210a02c58572b2686a3a8d610c6628f87864aed
    Author: Eduardo Habkost <ehabkost>
    Date:   Thu Aug 9 19:18:52 2018 -0300

        i386: Disable TOPOEXT by default on "-cpu host"

        Enabling TOPOEXT is always allowed, but it can't be enabled
        blindly by "-cpu host" because it may make guests crash if the
        rest of the cache topology information isn't provided or isn't
        consistent.

        This addresses the bug reported at:
        https://bugzilla.redhat.com/show_bug.cgi?id=1613277

        Signed-off-by: Eduardo Habkost <ehabkost>
        Message-Id: <20180809221852.15285-1-ehabkost>
        Tested-by: Richard W.M. Jones <rjones>
        Reviewed-by: Babu Moger <babu.moger>
        Signed-off-by: Eduardo Habkost <ehabkost>

+++ This bug was initially created as a clone of Bug #1614612 +++

Description of problem:
Starting a VM with host-passthrough CPU conf on some hosts caused a guest kernel panic.

Version-Release number of selected component (if applicable):
libvirt-4.5.0-6.el7.x86_64
qemu-kvm-rhev-2.12.0-9.el7.x86_64
kernel-3.10.0-931.el7.x86_64
For guest: kernel-3.10.0-931.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure VM with the following conf, start VM and check the console output

# virsh dumpxml test1 --inactive |grep cpu
  <vcpu placement='static'>1</vcpu>
  <cpu mode='host-passthrough' check='partial'/>

# virsh start test1
Domain test1 started

# virsh console test1
Connected to domain test1
Escape character is ^]
[ 110.720038] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
[ 110.721000] IP: [<ffffffffad8b69c2>] __queue_work+0x32/0x3e0
[ 110.721000] PGD 0
[ 110.721000] Oops: 0000 [#1] SMP
[ 110.721000] Modules linked in:
[ 110.721000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-931.el7.x86_64 #1
[ 110.721000] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[ 110.721000] task: ffffffffae418480 ti: ffffffffae400000 task.ti: ffffffffae400000
[ 110.721000] RIP: 0010:[<ffffffffad8b69c2>] [<ffffffffad8b69c2>] __queue_work+0x32/0x3e0
[ 110.721000] RSP: 0000:ffff8c68ffc03e20 EFLAGS: 00010046
[ 110.721000] RAX: 0000000000000082 RBX: 0000000000000087 RCX: 0000000000000000
[ 110.721000] RDX: ffffffffae4ee9a0 RSI: 0000000000000000 RDI: 0000000000001400
[ 110.721000] RBP: ffff8c68ffc03e58 R08: 0000000000000000 R09: 0000000000004000
[ 110.721000] R10: ffffffffaea36bc8 R11: 0000000000007ffe R12: ffffffffae4ee9a0
[ 110.721000] R13: 0000000000001400 R14: 0000000000000000 R15: ffffffffae2c1551
[ 110.721000] FS: 0000000000000000(0000) GS:ffff8c68ffc00000(0000) knlGS:0000000000000000
[ 110.721000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 110.721000] CR2: 0000000000000102 CR3: 0000000019410000 CR4: 00000000000006b0
[ 110.721000] Call Trace:
[ 110.721000] <IRQ>
[ 110.721000] [<ffffffffad8b6fc5>] queue_work_on+0x45/0x50
[ 110.721000] [<ffffffffadc81a26>] credit_entropy_bits+0x1c6/0x290
[ 110.721000] [<ffffffffadc82734>] ? add_interrupt_randomness+0x1c4/0x230
[ 110.721000] [<ffffffffadc82734>] add_interrupt_randomness+0x1c4/0x230
[ 110.721000] [<ffffffffad9494df>] handle_irq_event_percpu+0x3f/0x80
[ 110.721000] [<ffffffffad94955c>] handle_irq_event+0x3c/0x60
[ 110.721000] [<ffffffffad94c663>] handle_level_irq+0x73/0xd0
[ 110.721000] [<ffffffffad82e564>] handle_irq+0xe4/0x1a0
[ 110.721000] [<ffffffffad89f028>] ? __local_bh_enable+0x28/0x90
[ 110.721000] [<ffffffffadf7553d>] do_IRQ+0x4d/0xf0
[ 110.721000] [<ffffffffadf67362>] common_interrupt+0x162/0x162
[ 110.721000] <EOI>
[ 110.721000] [<ffffffffadf674a6>] ? retint_restore_args+0x6/0x36
[ 110.721000] [<ffffffffad86a511>] ? native_cpuid+0x11/0x20
[ 110.721000] [<ffffffffad83c5fe>] find_num_cache_leaves.isra.0+0x6e/0xa0
[ 110.721000] [<ffffffffad83dc39>] init_amd_cacheinfo+0x99/0xb0
[ 110.721000] [<ffffffffad841f40>] init_amd+0xb0/0x880
[ 110.721000] [<ffffffffad83f772>] identify_cpu+0x1c2/0x4d0
[ 110.721000] [<ffffffffae594f30>] identify_boot_cpu+0x10/0xa9
[ 110.721000] [<ffffffffae594fff>] check_bugs+0x21/0x22e
[ 110.721000] [<ffffffffae586198>] start_kernel+0x41d/0x467
[ 110.721000] [<ffffffffae585b7b>] ? repair_env_string+0x5c/0x5c
[ 110.721000] [<ffffffffae585120>] ? early_idt_handler_array+0x120/0x120
[ 110.721000] [<ffffffffae58572f>] x86_64_start_reservations+0x24/0x26
[ 110.721000] [<ffffffffae585885>] x86_64_start_kernel+0x154/0x177
[ 110.721000] [<ffffffffad8000d5>] start_cpu+0x5/0x14
[ 110.721000] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 80 40 43 ae f6 c4 02 0f 85 de 02 00 00 <41> f6 86 02 01 00 00 01 0f 85 78 02 00 00 49 c7 c7 48 7b 01 00
[ 110.721000] RIP [<ffffffffad8b69c2>] __queue_work+0x32/0x3e0
[ 110.721000] RSP <ffff8c68ffc03e20>
[ 110.721000] CR2: 0000000000000102
[ 110.721000] ---[ end trace 66ea57364ef8c66f ]---
[ 110.721000] Kernel panic - not syncing: Fatal exception in interrupt

Actual results:
As step-1 shows

Expected results:
VM should start successfully

Additional info:
If I do not configure a CPU for the guest and just use QEMU emulation, the VM starts normally on this same host.
The output of "cat /proc/cpuinfo" on the host and the guest dumpxml are in the attachment.
This will be addressed in the next major release.
So QEMU correctly reports topoext as disabled in the expansion of "host" CPU model and libvirt therefore does not explicitly ask QEMU to enable topoext. However, topoext may still be enabled when QEMU starts...

For example, on a host with AMD EPYC 7401 24-Core Processor virsh domcapabilities will show

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='perfctr_core'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='disable' name='monitor'/>
    </mode>

and a domain with host-model CPU will be started with

    -cpu EPYC-IBPB,\
    x2apic=on,\
    tsc-deadline=on,\
    hypervisor=on,\
    tsc_adjust=on,\
    arch-capabilities=on,\
    cmp_legacy=on,\
    perfctr_core=on,\
    virt-ssbd=on,\
    monitor=off

which exactly matches the host-model CPU definition from domcapabilities. But once QEMU is started, the live domain definition will change to

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='perfctr_core'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='disable' name='monitor'/>
      <feature policy='disable' name='svm'/>
      <feature policy='require' name='topoext'/>
    </cpu>

where you can see topoext is actually enabled.
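To make the difference between the two definitions easier to spot, here is a minimal Python sketch that diffs the feature lists of the host-model definition and the live definition quoted above. The helper names (feature_policies, diff_features) are made up for illustration and are not part of libvirt; the XML is copied from this comment.

```python
import xml.etree.ElementTree as ET

# CPU definitions copied from the comment above.
HOST_MODEL_XML = """
<mode name='host-model' supported='yes'>
  <model fallback='forbid'>EPYC-IBPB</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='perfctr_core'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='disable' name='monitor'/>
</mode>
"""

LIVE_XML = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>EPYC-IBPB</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='perfctr_core'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='svm'/>
  <feature policy='require' name='topoext'/>
</cpu>
"""


def feature_policies(cpu_xml: str) -> dict[str, str]:
    """Map each <feature name=...> in a CPU XML fragment to its policy."""
    root = ET.fromstring(cpu_xml)
    return {f.get("name"): f.get("policy") for f in root.iter("feature")}


def diff_features(before: dict[str, str], after: dict[str, str]):
    """Return (features only in `after`, features only in `before`)."""
    added = {n: p for n, p in after.items() if n not in before}
    removed = {n: p for n, p in before.items() if n not in after}
    return added, removed


added, removed = diff_features(feature_policies(HOST_MODEL_XML),
                               feature_policies(LIVE_XML))
print(added)    # {'svm': 'disable', 'topoext': 'require'}
print(removed)  # {'invtsc': 'require'}
```

The sketch only compares the explicitly listed features; it knows nothing about what the EPYC-IBPB model itself implies, which is the part that matters here.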
The problem is a difference between libvirt's definition of the EPYC-IBPB CPU model (I'll be ignoring the -IBPB suffix further on as the difference is irrelevant) and the definition used by QEMU. While libvirt's EPYC CPU model does not contain the topoext feature, the EPYC CPU model is defined in QEMU as follows (most irrelevant parts were removed):

    {
        .name = "EPYC",
        .level = 0xd,
        .vendor = CPUID_VENDOR_AMD,
        .family = 23,
        .model = 1,
        .stepping = 2,
        ...
        .features[FEAT_8000_0001_ECX] =
            CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
            CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
            CPUID_EXT3_TOPOEXT,
        ...
    },

In other words, "-cpu EPYC" will implicitly enable topoext, which is detected by libvirt after starting QEMU and thus the feature is added into the live definition. If libvirt's version of EPYC contained topoext, the host-model would explicitly disable topoext, but since libvirt thinks topoext is not enabled implicitly by the model, it sees no need to explicitly disable it.

Ironically enough, topoext is listed in .no_autoenable_flags in QEMU and yet some CPU models enable it without explicit request. This looks like a QEMU bug to me. But if QEMU is correct, libvirt will need to do something to fix this.
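The mechanism can be illustrated with a toy Python model. This is only a sketch of the reasoning above, not libvirt or QEMU code: the function names are hypothetical, and the feature sets are small illustrative subsets, not the real EPYC definitions.

```python
def host_model_overrides(model_features: set[str], wanted: set[str]) -> dict[str, bool]:
    """Explicit on/off overrides needed to turn a named model into `wanted`,
    computed from one party's view of what the model contains."""
    overrides = {name: True for name in sorted(wanted - model_features)}
    overrides.update({name: False for name in sorted(model_features - wanted)})
    return overrides


def effective_features(model_features: set[str], overrides: dict[str, bool]) -> set[str]:
    """Features actually enabled once the overrides are applied to a model."""
    enabled = set(model_features)
    for name, on in overrides.items():
        if on:
            enabled.add(name)
        else:
            enabled.discard(name)
    return enabled


# Illustrative subsets only (assumption): the two definitions differ in topoext.
LIBVIRT_EPYC = {"abm", "sse4a", "monitor"}
QEMU_EPYC = LIBVIRT_EPYC | {"topoext"}

# Host expansion as reported by QEMU for "-cpu host": topoext is off.
WANTED = {"abm", "sse4a", "x2apic"}

# The overrides are computed from libvirt's view of the model ...
overrides = host_model_overrides(LIBVIRT_EPYC, WANTED)
print(overrides)  # {'x2apic': True, 'monitor': False}

# ... but applied on top of QEMU's view of the same model name.
print(sorted(effective_features(QEMU_EPYC, overrides)))
# ['abm', 'sse4a', 'topoext', 'x2apic']

# If libvirt's model contained topoext, it would be disabled explicitly.
fixed = host_model_overrides(QEMU_EPYC, WANTED)
print(fixed)  # {'x2apic': True, 'monitor': False, 'topoext': False}
```

The point of the toy is that the overrides are correct relative to the model definition they were computed against, so any feature present only in the other party's definition slips through unchanged.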
Having topoext enabled implicitly by EPYC but not by "-cpu host" is done on purpose: the feature is supposed to be hidden behind the CPU model because it is more complex than a boolean on/off option. In retrospect, making the feature directly configurable on the command line was a mistake: in all cases where it works, the feature is already enabled implicitly.

Now, to the current situation: If "host-model" never includes topoext out of the box, this is correct. If the domain XML is updated to include topoext because QEMU did enable the feature implicitly, this makes the config redundant but also correct. So, it looks like everything is working as expected?
Ah so you're saying QEMU will implicitly enable topoext only if it can be safely enabled, right?
(In reply to Jiri Denemark from comment #5)
> Ah so you're saying QEMU will implicitly enable topoext only if it can be
> safely enabled, right?

Correct.
OK, everything works right then, no libvirt work needed.
Verified this bug on libvirt-4.5.0-30.module+el8.1.0+3574+3a63752b.x86_64.

Other components:
kernel-4.18.0-120.el8.x86_64
qemu-kvm-2.12.0-82.module+el8.1.0+3738+0d8c0249.x86_64

Other ENV info:
On physical host:
# lscpu
Model name: AMD EPYC 7251 8-Core Processor

# virsh domcapabilities
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      ...

S1: When starting the VM with "host-passthrough" CPU conf, the VM starts successfully. There is NO topoext info in the active dumpxml or on the qemu cmd line of the VM, and NO topoext cpu flag in the guest OS.

S2: When starting the VM with "host-model" conf as "virsh domcapabilities" shows, the topoext flag and CPU feature are displayed in the guest OS and in the dumpxml/qemu cmd line of the VM.

# virsh dumpxml topoext |grep "<cpu" -A20
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    ...
    <feature policy='require' name='topoext'/>
    ...
  </cpu>

# ps -ef |grep topoext
-cpu EPYC-IBPB,x2apic=on,tsc-deadline=on,hypervisor=on,tsc_adjust=on,cmp_legacy=on, ** topoext=on **,perfctr_core=on,virt-ssbd=on,monitor=off,svm=off,invtsc=on

# virsh console topoext
(In guest)
# lscpu |grep topo
Flags: ... topoext

The test result is as expected; moving this bug to VERIFIED.
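For anyone repeating the check in S2 on many hosts, the "-cpu" argument can be inspected programmatically instead of with grep. Below is a minimal Python sketch; parse_cpu_arg is a hypothetical helper, it only handles the "name=value" form seen in this comment, and the "** **" emphasis markers around topoext=on have been removed from the copied string.

```python
def parse_cpu_arg(arg: str) -> tuple[str, dict[str, str]]:
    """Split a QEMU "-cpu" value into the model name and its name=value properties."""
    model, *props = arg.split(",")
    features: dict[str, str] = {}
    for prop in props:
        name, _, value = prop.partition("=")
        features[name.strip()] = value.strip()
    return model.strip(), features


# Value of -cpu from the ps output above (emphasis markers removed).
CPU_ARG = ("EPYC-IBPB,x2apic=on,tsc-deadline=on,hypervisor=on,tsc_adjust=on,"
           "cmp_legacy=on,topoext=on,perfctr_core=on,virt-ssbd=on,monitor=off,"
           "svm=off,invtsc=on")

model, features = parse_cpu_arg(CPU_ARG)
print(model)                    # EPYC-IBPB
print(features.get("topoext"))  # on
```

A missing key (features.get("topoext") returning None) means the feature was not mentioned on the command line at all, which is different from it being disabled with "off".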
Hi Jiri

https://bugzilla.redhat.com/show_bug.cgi?id=1619798#c8
Regarding this comment: if topoext is enabled once the VM has started successfully, should this flag also be displayed in the output of "virsh domcapabilities" in the first place?

# rpm -qa libvirt qemu-kvm kernel
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
kernel-4.18.0-147.el8.x86_64
libvirt-4.5.0-35.module+el8.1.0+4227+b2722cb3.x86_64

# virsh domcapabilities |grep "<mode name='host-model'" -A15
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='perfctr_core'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='disable' name='monitor'/>
      <feature policy='disable' name='svm'/>
      *** No topoext here ***
    </mode>

# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 --inactive |grep "<cpu" -A3
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

# virsh dumpxml avocado-vt-vm1 |grep "<cpu" -A15
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='require' name='topoext'/>
    *** Topoext is enabled here ***
  </cpu>

However, if I run the following command, "virsh hypervisor-cpu-compare" reports: "CPU described in avocado-vt-vm1.xml is incompatible with the CPU provided by hypervisor on the host".

# virsh dumpxml avocado-vt-vm1 >> avocado-vt-vm1.xml
# virsh hypervisor-cpu-compare avocado-vt-vm1.xml
CPU described in avocado-vt-vm1.xml is incompatible with the CPU provided by hypervisor on the host

Since the VM can start successfully on this host, I think the CPU conf should not be reported as incompatible. Or should this message change?
The issue from comment 9 is tracked in Bug 1765445 - Cmd "virsh Hypervisor-cpu-compare" outputs wrong result with VM's active dumpxml as input because of topoext
Given that this bug is VERIFIED and RHEL AV 8.1.0 shipped (went GA) on 11 Nov 2019, I am closing this bug report.