Bug 1619798 - "topoext" flag may crash guests if enabled blindly
Summary: "topoext" flag may crash guests if enabled blindly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.1
Assignee: Jiri Denemark
QA Contact: jiyan
URL:
Whiteboard:
Depends On:
Blocks: 1649160
Reported: 2018-08-21 18:29 UTC by Eduardo Habkost
Modified: 2020-03-11 22:39 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1614612
Environment:
Last Closed: 2020-03-11 22:39:36 UTC
Type: Bug
Target Upstream Version:




Links
System            ID       Priority  Status  Summary  Last Updated
Red Hat Bugzilla  1622421  None      None    None     2019-11-25 01:42:20 UTC

Internal Links: 1622421

Description Eduardo Habkost 2018-08-21 18:29:17 UTC
The "topoext" flag may crash guests if used on older CPU models. We need to check whether the mode="host-model" code can trigger the same crash, and make sure it does not enable "topoext" by default, matching the new "-cpu host" behavior. See the equivalent QEMU commit:

commit 7210a02c58572b2686a3a8d610c6628f87864aed
Author: Eduardo Habkost <ehabkost@redhat.com>
Date:   Thu Aug 9 19:18:52 2018 -0300

    i386: Disable TOPOEXT by default on "-cpu host"
    
    Enabling TOPOEXT is always allowed, but it can't be enabled
    blindly by "-cpu host" because it may make guests crash if the
    rest of the cache topology information isn't provided or isn't
    consistent.
    
    This addresses the bug reported at:
    https://bugzilla.redhat.com/show_bug.cgi?id=1613277
    
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    Message-Id: <20180809221852.15285-1-ehabkost@redhat.com>
    Tested-by: Richard W.M. Jones <rjones@redhat.com>
    Reviewed-by: Babu Moger <babu.moger@amd.com>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>



+++ This bug was initially created as a clone of Bug #1614612 +++

Description of problem:
Starting a VM with a host-passthrough CPU configuration on some hosts causes a guest kernel panic.

Version-Release number of selected component (if applicable):
libvirt-4.5.0-6.el7.x86_64
qemu-kvm-rhev-2.12.0-9.el7.x86_64
kernel-3.10.0-931.el7.x86_64

For guest: kernel-3.10.0-931.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure the VM with the following configuration, start it, and check the console output:
# virsh dumpxml test1 --inactive |grep cpu
  <vcpu placement='static'>1</vcpu>
  <cpu mode='host-passthrough' check='partial'/>

# virsh start test1
Domain test1 started

# virsh console test1
Connected to domain test1
Escape character is ^]
[  110.720038] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
[  110.721000] IP: [<ffffffffad8b69c2>] __queue_work+0x32/0x3e0
[  110.721000] PGD 0 
[  110.721000] Oops: 0000 [#1] SMP 
[  110.721000] Modules linked in:
[  110.721000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-931.el7.x86_64 #1
[  110.721000] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[  110.721000] task: ffffffffae418480 ti: ffffffffae400000 task.ti: ffffffffae400000
[  110.721000] RIP: 0010:[<ffffffffad8b69c2>]  [<ffffffffad8b69c2>] __queue_work+0x32/0x3e0
[  110.721000] RSP: 0000:ffff8c68ffc03e20  EFLAGS: 00010046
[  110.721000] RAX: 0000000000000082 RBX: 0000000000000087 RCX: 0000000000000000
[  110.721000] RDX: ffffffffae4ee9a0 RSI: 0000000000000000 RDI: 0000000000001400
[  110.721000] RBP: ffff8c68ffc03e58 R08: 0000000000000000 R09: 0000000000004000
[  110.721000] R10: ffffffffaea36bc8 R11: 0000000000007ffe R12: ffffffffae4ee9a0
[  110.721000] R13: 0000000000001400 R14: 0000000000000000 R15: ffffffffae2c1551
[  110.721000] FS:  0000000000000000(0000) GS:ffff8c68ffc00000(0000) knlGS:0000000000000000
[  110.721000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  110.721000] CR2: 0000000000000102 CR3: 0000000019410000 CR4: 00000000000006b0
[  110.721000] Call Trace:
[  110.721000]  <IRQ> 
[  110.721000]  [<ffffffffad8b6fc5>] queue_work_on+0x45/0x50
[  110.721000]  [<ffffffffadc81a26>] credit_entropy_bits+0x1c6/0x290
[  110.721000]  [<ffffffffadc82734>] ? add_interrupt_randomness+0x1c4/0x230
[  110.721000]  [<ffffffffadc82734>] add_interrupt_randomness+0x1c4/0x230
[  110.721000]  [<ffffffffad9494df>] handle_irq_event_percpu+0x3f/0x80
[  110.721000]  [<ffffffffad94955c>] handle_irq_event+0x3c/0x60
[  110.721000]  [<ffffffffad94c663>] handle_level_irq+0x73/0xd0
[  110.721000]  [<ffffffffad82e564>] handle_irq+0xe4/0x1a0
[  110.721000]  [<ffffffffad89f028>] ? __local_bh_enable+0x28/0x90
[  110.721000]  [<ffffffffadf7553d>] do_IRQ+0x4d/0xf0
[  110.721000]  [<ffffffffadf67362>] common_interrupt+0x162/0x162
[  110.721000]  <EOI> 
[  110.721000]  [<ffffffffadf674a6>] ? retint_restore_args+0x6/0x36
[  110.721000]  [<ffffffffad86a511>] ? native_cpuid+0x11/0x20
[  110.721000]  [<ffffffffad83c5fe>] find_num_cache_leaves.isra.0+0x6e/0xa0
[  110.721000]  [<ffffffffad83dc39>] init_amd_cacheinfo+0x99/0xb0
[  110.721000]  [<ffffffffad841f40>] init_amd+0xb0/0x880
[  110.721000]  [<ffffffffad83f772>] identify_cpu+0x1c2/0x4d0
[  110.721000]  [<ffffffffae594f30>] identify_boot_cpu+0x10/0xa9
[  110.721000]  [<ffffffffae594fff>] check_bugs+0x21/0x22e
[  110.721000]  [<ffffffffae586198>] start_kernel+0x41d/0x467
[  110.721000]  [<ffffffffae585b7b>] ? repair_env_string+0x5c/0x5c
[  110.721000]  [<ffffffffae585120>] ? early_idt_handler_array+0x120/0x120
[  110.721000]  [<ffffffffae58572f>] x86_64_start_reservations+0x24/0x26
[  110.721000]  [<ffffffffae585885>] x86_64_start_kernel+0x154/0x177
[  110.721000]  [<ffffffffad8000d5>] start_cpu+0x5/0x14
[  110.721000] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 80 40 43 ae f6 c4 02 0f 85 de 02 00 00 <41> f6 86 02 01 00 00 01 0f 85 78 02 00 00 49 c7 c7 48 7b 01 00 
[  110.721000] RIP  [<ffffffffad8b69c2>] __queue_work+0x32/0x3e0
[  110.721000]  RSP <ffff8c68ffc03e20>
[  110.721000] CR2: 0000000000000102
[  110.721000] ---[ end trace 66ea57364ef8c66f ]---
[  110.721000] Kernel panic - not syncing: Fatal exception in interrupt


Actual results:
As shown in step 1.

Expected results:
VM should start successfully

Additional info:
If no CPU is configured for the guest and QEMU's default emulated CPU model is used instead, the VM starts normally on the same host.

The host's "cat /proc/cpuinfo" output and the guest's dumpxml are attached.

Comment 1 Jaroslav Suchanek 2019-05-20 11:50:35 UTC
This will be addressed in the next major release.

Comment 3 Jiri Denemark 2019-07-23 14:16:23 UTC
So QEMU correctly reports topoext as disabled in the expansion of "host" CPU
model and libvirt therefore does not explicitly ask QEMU to enable topoext.
However, topoext may still be enabled when QEMU starts...

For example, on a host with an AMD EPYC 7401 24-Core Processor, virsh
domcapabilities shows

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='perfctr_core'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='disable' name='monitor'/>
    </mode>

and a domain with a host-model CPU is started with

    -cpu EPYC-IBPB,\
         x2apic=on,\
         tsc-deadline=on,\
         hypervisor=on,\
         tsc_adjust=on,\
         arch-capabilities=on,\
         cmp_legacy=on,\
         perfctr_core=on,\
         virt-ssbd=on,\
         monitor=off

which exactly matches the host-model CPU definition from domcapabilities. But
once QEMU is started, the live domain definition will change to

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='require' name='topoext'/>
  </cpu>

where you can see topoext is actually enabled.

The problem is a difference between libvirt's definition of the EPYC-IBPB CPU
model and the definition used by QEMU (I'll ignore the -IBPB suffix from here
on, as the difference is irrelevant). While libvirt's EPYC CPU model does not
contain the topoext feature, QEMU defines the EPYC CPU model as follows (most
irrelevant parts removed):

    {
        .name = "EPYC",
        .level = 0xd,
        .vendor = CPUID_VENDOR_AMD,
        .family = 23,
        .model = 1,
        .stepping = 2,
        ...
        .features[FEAT_8000_0001_ECX] =
            CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
            CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
            CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
            CPUID_EXT3_TOPOEXT,
        ...
    },

In other words, "-cpu EPYC" will implicitly enable topoext, which is detected
by libvirt after starting QEMU and thus the feature is added into the live
definition.

If libvirt's version of EPYC contained topoext, the host-model CPU would
explicitly disable topoext. But since libvirt thinks the model does not enable
topoext implicitly, it sees no need to disable it explicitly.

Ironically enough, topoext is listed in .no_autoenable_flags in QEMU, and yet
some CPU models enable it without an explicit request. This looks like a QEMU
bug to me. But if QEMU is correct, libvirt will need to do something to fix
this.

Comment 4 Eduardo Habkost 2019-07-23 14:44:54 UTC
Having topoext enabled implicitly by EPYC but not by "-cpu host" is intentional: the feature is supposed to be hidden behind the CPU model because it is more complex than a boolean on/off option.

In retrospect, making the feature directly configurable on the command line was a mistake: in all cases where it works, the feature is already enabled implicitly.

Now, to the current situation:

If "host-model" never includes topoext out of the box, this is correct.
If the domain XML is updated to include topoext because QEMU did enable the feature implicitly, this makes the config redundant, but still correct.

So, it looks like everything is working as expected?

Comment 5 Jiri Denemark 2019-07-23 15:06:35 UTC
Ah so you're saying QEMU will implicitly enable topoext only if it can be
safely enabled, right?

Comment 6 Eduardo Habkost 2019-07-23 15:25:04 UTC
(In reply to Jiri Denemark from comment #5)
> Ah so you're saying QEMU will implicitly enable topoext only if it can be
> safely enabled, right?

Correct.

Comment 7 Jiri Denemark 2019-07-24 10:35:09 UTC
OK, everything works right then, no libvirt work needed.

Comment 8 jiyan 2019-07-25 06:51:54 UTC
Verified this bug on libvirt-4.5.0-30.module+el8.1.0+3574+3a63752b.x86_64.

Other components:
kernel-4.18.0-120.el8.x86_64
qemu-kvm-2.12.0-82.module+el8.1.0+3738+0d8c0249.x86_64

Other ENV info:
On physical host:
# lscpu
Model name:          AMD EPYC 7251 8-Core Processor

# virsh domcapabilities
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
...

S1: When the VM is started with a "host-passthrough" CPU configuration, it starts successfully. There is no topoext in the active dumpxml or in the VM's QEMU command line, and no topoext CPU flag in the guest OS.

S2: When the VM is started with the "host-model" configuration shown by "virsh domcapabilities", the topoext flag appears in the guest OS, and the topoext CPU feature appears in the VM's dumpxml and QEMU command line:
# virsh dumpxml topoext |grep "<cpu" -A20
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    ...
    <feature policy='require' name='topoext'/>
    ...
  </cpu>

# ps -ef |grep topoext
-cpu EPYC-IBPB,x2apic=on,tsc-deadline=on,hypervisor=on,tsc_adjust=on,cmp_legacy=on, ** topoext=on  **,perfctr_core=on,virt-ssbd=on,monitor=off,svm=off,invtsc=on

# virsh console topoext
(In guest) # lscpu |grep topo
Flags:               ... topoext

The test results are as expected; moving this bug to VERIFIED.

Comment 9 jiyan 2019-10-10 09:03:16 UTC
Hi Jiri, regarding https://bugzilla.redhat.com/show_bug.cgi?id=1619798#c8:
if topoext is enabled once the VM has started successfully, shouldn't this flag also be displayed in the output of "virsh domcapabilities" in the first place?

# rpm -qa libvirt qemu-kvm kernel
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
kernel-4.18.0-147.el8.x86_64
libvirt-4.5.0-35.module+el8.1.0+4227+b2722cb3.x86_64

# virsh domcapabilities |grep "<mode name='host-model'" -A15
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='perfctr_core'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='disable' name='monitor'/>
      <feature policy='disable' name='svm'/>     *** No topoext here ***
    </mode>

# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 --inactive |grep "<cpu" -A3
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

# virsh dumpxml avocado-vt-vm1 |grep "<cpu" -A15
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='require' name='topoext'/>  *** Topoext is enabled here ***
  </cpu>

However, when I run the following commands, "virsh hypervisor-cpu-compare" reports "CPU described in avocado-vt-vm1.xml is incompatible with the CPU provided by hypervisor on the host".

# virsh dumpxml avocado-vt-vm1 >> avocado-vt-vm1.xml 

# virsh hypervisor-cpu-compare avocado-vt-vm1.xml 
CPU described in avocado-vt-vm1.xml is incompatible with the CPU provided by hypervisor on the host

Since the VM starts successfully on this host, I don't think its CPU configuration should be reported as incompatible. Or should this message be changed?

Comment 10 jiyan 2019-10-25 06:33:48 UTC
The issue from comment 9 is tracked in Bug 1765445 - Cmd "virsh Hypervisor-cpu-compare" outputs wrong result with VM's active dumpxml as input because of topoext

Comment 11 Jeff Nelson 2020-03-11 22:39:36 UTC
Given that this bug is VERIFIED and RHEL AV 8.1.0 shipped (went GA) on 11 Nov 2019, I am closing this bug report.

